Started 1 mo 20 days ago
Took 17 hr on green-dragon-08

Failed Build #5456 (Aug 30, 2019 12:07:39 PM)

  • : 370501
  • : 370493
  • : 370390
  • : 364589
  • : 370502
  • : 370482
  1. Make `vector` unconditionally move elements when exceptions are disabled.

    `std::vector<T>` is free to choose between copy and move operations when it needs to resize. The standard only mandates that the correct exception safety guarantees are provided. When exceptions are disabled, these guarantees are trivially satisfied, meaning `vector` is free to optimize its implementation by moving instead of copying.

    This patch makes `std::vector` unconditionally move elements when exceptions are disabled.

    This optimization is conforming according to the current standard wording.

    There are concerns that moving in `-fno-exceptions` mode will be a surprise to users. For example, a user may be surprised to find their code is slower with exceptions enabled than with them disabled. I'm sympathetic to this surprise, but I don't think it should block this optimization.
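
    A minimal sketch of the trade-off described above, assuming a hypothetical element type `Tracked` whose move constructor is not `noexcept`. With exceptions enabled, a conforming `vector` is expected to copy such elements on reallocation to keep the strong guarantee; with this patch and exceptions disabled, libc++ would move them instead. The helper `growAndCount` is illustrative only and is not part of libc++.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical element type: copyable, and movable with a move constructor
// that is deliberately NOT noexcept.
struct Tracked {
  static int copies;
  static int moves;
  Tracked() = default;
  Tracked(const Tracked&) { ++copies; }
  Tracked(Tracked&&) /* not noexcept */ { ++moves; }
};
int Tracked::copies = 0;
int Tracked::moves = 0;

// Builds a vector of n elements, forces exactly one reallocation, and returns
// {copies, moves} performed while transferring the elements.
inline std::pair<int, int> growAndCount(std::size_t n) {
  std::vector<Tracked> v(n);          // n default-constructed elements
  Tracked::copies = 0;
  Tracked::moves = 0;
  v.reserve(v.capacity() + 1);        // larger than capacity, so it reallocates
  return {Tracked::copies, Tracked::moves};
}
```

    Calling `growAndCount(8)` transfers each of the 8 elements exactly once; whether that shows up as copies or as moves depends on the standard library and on whether exceptions are enabled in the build.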

    Reviewers: mclow.lists, ldionne, rsmith

    Reviewed By: ldionne

    Subscribers: zoecarver, christof, dexonsmith, libcxx-commits

    Tags: #libc

    Differential Revision: (detail/ViewSVN)
    by ericwf
  2. gn build: Merge r370500 (detail/ViewSVN)
    by nico
  3. [MachinePipeliner] Separate schedule emission, NFC

    This is the first stage in refactoring the pipeliner and making it more
    accessible for backends to override and control. This separates the logic and
    state required to *emit* a schedule from the logic that *computes* and
    validates a schedule.

    This will enable (a) new schedule emitters and (b) new modulo scheduling
    implementations to coexist.


    Differential Revision: (detail/ViewSVN)
    by jamesm
  4. [llvm-ifs][IFS] llvm Interface Stubs merging + object file generation tool.

    This tool merges interface stub files to produce a merged interface stub file
    or a stub library. Currently, for stub library generation, it can produce an
    ELF .so stub file, or a TBD file (experimental). It will be used by the clang
    -emit-interface-stubs compilation pipeline to merge and assemble the per-CU
    stub files into a stub library.

    The new IFS format is as follows:

    --- !experimental-ifs-v1
    IfsVersion:      1.0
    Triple:          <llvm triple>
    ObjectFileFormat: <ELF | TBD>
    Symbols:
      _ZSymbolName: { Type: <type>, etc... }

    Differential Revision: (detail/ViewSVN)
    by zer0
  5. [DAGCombine] ReduceLoadWidth - remove duplicate SDLoc. NFCI.

    SDLoc(N0) and SDLoc(cast<LoadSDNode>(N0)) should be equivalent. (detail/ViewSVN)
    by rksimon
  6. [TargetLowering] SimplifyDemandedBits ADD/SUB/MUL - correctly inherit SDNodeFlags from the original node.

    Just disable NSW/NUW flags. This matches what we're already doing for the other situations for these nodes, it was just missed for the demanded constant case.

    Noticed by inspection - confirmed in offline discussion with @spatel. I've checked we have test coverage in the x86 extract-bits.ll and extract-lowbits.ll tests (detail/ViewSVN)
    by rksimon
  7. GlobalISel: Fix missing pass dependency (detail/ViewSVN)
    by arsenm
  8. [X86] Pass v32i16/v64i8 in zmm registers on KNL target.

    gcc and icc pass these types in zmm registers.

    This patch implements a quick hack to override the register
    type before calling convention handling to one that is legal.
    Longer term we might want to do something similar to 256-bit
    integer registers on AVX1 where we just split all the operations.

    Fixes PR42957

    Differential Revision: (detail/ViewSVN)
    by ctopper
  9. [ValueTypes] Add v16f16 and v32f16 to EVT::getEVTString and Tablegen's getEnumName

    Missed these when I added the enum entries (detail/ViewSVN)
    by ctopper
  10. [clang-scan-deps] NFC, remove outdated implementation comment

    There's no need to purge symlinked entries in the FileManager,
    as the new FileEntryRef API allows us to compute dependencies more
    accurately when the FileManager is reused. (detail/ViewSVN)
    by arphaman
  11. gn build: Merge r370490 (detail/ViewSVN)
    by nico
  12. MemTag: unchecked load/store optimization.

    MTE allows memory access to bypass tag check iff the address argument
    is [SP, #imm]. This change takes advantage of this to demote uses of
    tagged addresses to regular FrameIndex operands, reducing register
    pressure in large functions.

    MO_TAGGED target flag is used to signal that the FrameIndex operand
    refers to memory that might be tagged, and needs to be handled with
    care. Such operand must be lowered to [SP, #imm] directly, without a
    scratch register.

    The transformation pass attempts to predict when the offset will be
    out of range and disable the optimization.
    AArch64RegisterInfo::eliminateFrameIndex has an escape hatch in case
    this prediction has been wrong, but it is quite inefficient and should
    be avoided.

    Reviewers: pcc, vitalybuka, ostannard

    Subscribers: mgorny, javed.absar, kristof.beyls, hiraditya, llvm-commits

    Tags: #llvm

    Differential Revision: (detail/ViewSVN)
    by eugenis
  13. [DAGCombine] visitVSELECT - remove equivalent getValueType() call. NFCI. (detail/ViewSVN)
    by rksimon
  14. FileManager: Remove ShouldCloseOpenFile argument from getBufferForFile, NFC

    Remove this dead code.  We always close it. (detail/ViewSVN)
    by Duncan P. N. Exon Smith
  15. [INSTRUCTIONS] Add support of const for getLoadStorePointerOperand() and
    Reviewer: hsaito, sebpop, reames, hfinkel, mkuper, bogner, haicheng,
    arsenm, lattner, chandlerc, grosser, rengolin
    Reviewed By: reames
    Subscribers: wdng, llvm-commits, bmahjour
    Tag: LLVM
    Differential Revision: (detail/ViewSVN)
    by whitneyt
  16. [Attributor] Fix: do not pretend to preserve the CFG (detail/ViewSVN)
    by jdoerfert
  17. [X86] Merge X86InstrInfo::loadRegFromAddr/storeRegToAddr into their only call site.

    I'm looking at unfolding broadcast loads on AVX512 which will
    require refactoring this code to select broadcast opcodes instead
    of regular load/stores in some cases. Merging them to avoid
    further complicating their interfaces. (detail/ViewSVN)
    by ctopper
  18. [clangd] Add highlighting for macro expansions.


    Reviewers: hokein, ilya-biryukov

    Subscribers: MaskRay, jkorous, arphaman, kadircet, cfe-commits

    Tags: #clang

    Differential Revision: (detail/ViewSVN)
    by jvikstrom
  19. Revert [Clang Interpreter] Initial patch for the constexpr interpreter

    This reverts r370476 (git commit a5590950549719d0d9ea69ed164b0c8c0f4e02e6) (detail/ViewSVN)
    by nand
  20. [Attributor] Use existing function information for the call site

    Instead of recomputing information for call sites we now use the
    function information directly. This is always valid and once we have
    call site specific information we can improve here.

    This patch also bootstraps attributes that are created on-demand through
    an initial update call. Information that is known will then directly be
    available in the new attribute without causing an iteration delay.

    The tests show how this improves the iteration count.

    Reviewers: sstefan1, uenoku

    Subscribers: hiraditya, bollu, llvm-commits

    Tags: #llvm

    Differential Revision: (detail/ViewSVN)
    by jdoerfert
  21. [Attributor] Manifest load/store alignment generally

    Any pointer could have load/store users, not only floating ones, so we
    move the manifest logic for alignment into the AAAlignImpl class.

    Reviewers: uenoku, sstefan1

    Subscribers: hiraditya, bollu, llvm-commits

    Tags: #llvm

    Differential Revision: (detail/ViewSVN)
    by jdoerfert
  22. [DAGCombine] visitVSELECT - remove duplicate getOperand calls. NFCI. (detail/ViewSVN)
    by rksimon
  23. [Clang Interpreter] Initial patch for the constexpr interpreter

    This patch introduces the skeleton of the constexpr interpreter,
    capable of evaluating simple constexpr functions consisting of
    if statements. The interpreter is described in more detail in the
    RFC. Further patches will add more features.

    Reviewers: Bigcheese, jfb, rsmith

    Subscribers: bruno, uenoku, ldionne, Tyker, thegameg, tschuett, dexonsmith, mgorny, cfe-commits

    Tags: #clang

    Differential Revision: (detail/ViewSVN)
    by nand
  24. [InstCombine][AMDGPU] Simplify tbuffer loads

    Summary: Add missing tbuffer loads intrinsics in SimplifyDemandedVectorElts.

    Reviewers: arsenm, nhaehnle

    Reviewed By: arsenm

    Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

    Tags: #llvm

    Differential Revision: (detail/ViewSVN)
    by Piotr Sobczak
  25. [llvm-nm] Small fix to Expected<StringRef>

    Differential Revision: (detail/ViewSVN)
    by sidneym
  26. [clangd] Added highlighting for structured bindings.

    Summary: Structured bindings are represented by a BindingDecl, and the decl that a DeclRefExpr points to is that BindingDecl. This adds an additional if statement in the addToken function to highlight them.

    Reviewers: hokein, ilya-biryukov

    Subscribers: MaskRay, jkorous, arphaman, kadircet, cfe-commits

    Tags: #clang

    Differential Revision: (detail/ViewSVN)
    by jvikstrom
  27. [yaml2obj][obj2yaml] - Use a single "Other" field instead of "Other", "Visibility" and "StOther".

    Currently we can encode the 'st_other' field of a symbol using 3 fields.
    'Visibility' is used to encode STV_* values.
    'Other' is used to encode everything except the visibility, but it can't handle arbitrary values.
    'StOther' is used to encode arbitrary values when 'Visibility'/'Other' are not helpful enough.

    The 'st_other' field is used to encode symbol visibility and platform-dependent
    flags and values. The difficulty in encoding it is that it consists of a Visibility part (STV_* values),
    which are enumeration values, and the Other part, which is different and inconsistent.

    For MIPS the Other part contains flags for all STO_MIPS_* values except STO_MIPS_MIPS16.
    (As the comment in ELFDumper says: "Someones in their infinite wisdom decided to make
    STO_MIPS_MIPS16 flag overlapped with other ST_MIPS_xxx flags."...)

    And for PPC64 the Other part might actually encode any value.

    This patch implements custom logic for handling the st_other and removes
    'Visibility' and 'StOther' fields.

    Here is an example of a new YAML style this patch allows:

    - Name:  foo
      Other: [ 0x4 ]
    - Name:  bar
      Other: [ STV_PROTECTED, 4 ]
    - Name:  zed

    Differential revision: (detail/ViewSVN)
    by grimar
  28. [DAGCombine] visitVSELECT - use getShiftAmountTy for shift amounts. (detail/ViewSVN)
    by rksimon
  29. [DAGCombine] visitMULHS - use getScalarValueSizeInBits() to make safe for vector types.

    This is hidden behind a (scalar-only) isOneConstant(N1) check at the moment, but once we get around to adding vector support we need to ensure we're dealing with the scalar bit width, not the total. (detail/ViewSVN)
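
    A small worked example of why the scalar bit width matters, assuming the guarded fold is the usual rewrite of `mulhs x, 1` into an arithmetic shift right by (bit width - 1). The names `mulhs32` and `signFill32` are hypothetical and model a single i32 lane; for a vector such as v4i32 the total width would be 128 bits, and shifting each lane by 127 would be wrong.

```cpp
#include <cassert>
#include <cstdint>

// High 32 bits of the signed 64-bit product: what MULHS yields for one i32 lane.
inline int32_t mulhs32(int32_t a, int32_t b) {
  return static_cast<int32_t>((static_cast<int64_t>(a) * b) >> 32);
}

// Shift right by (scalar bit width - 1). This assumes >> on a negative signed
// value is an arithmetic shift, which mainstream compilers implement and
// C++20 guarantees.
inline int32_t signFill32(int32_t x) {
  constexpr int ScalarBits = 32;  // per-element width, not lanes * width
  return x >> (ScalarBits - 1);
}
```

    For any `x`, `mulhs32(x, 1)` and `signFill32(x)` agree: both give 0 for non-negative inputs and -1 for negative ones.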
    by rksimon
  30. [mips] Merge common checks under the same check prefix. NFC (detail/ViewSVN)
    by atanasyan
  31. [RISCV] Fix a couple of tests' CHECKs (detail/ViewSVN)
    by luismarques
  32. Remove an extra ";", NFC. (detail/ViewSVN)
    by hokein
  33. [X86] Add tests for rotate matching. NFC (detail/ViewSVN)
    by deadalnix
  34. [CodeGen] Introduce MachineBasicBlock::replacePhiUsesWith helper and use it. NFC

    Found a couple of places in the code where all the PHI nodes
    of an MBB are updated, replacing references to one MBB with
    references to another MBB instead.

    This patch simply refactors the code to use a common helper
    (MachineBasicBlock::replacePhiUsesWith) for such PHI node

    Reviewers: t.p.northover, arsenm, uabelho

    Subscribers: wdng, hiraditya, jsji, llvm-commits

    Tags: #llvm

    Differential Revision: (detail/ViewSVN)
    by bjope
  35. [ASTImporter] Do not look up lambda classes

    Consider this code:
          void f() {
            auto L0 = [](){};
            auto L1 = [](){};
          }

    First we import `L0` then `L1`. Currently we end up having only one
    CXXRecordDecl for the two different lambdas. And that is a problem if
    the body of their op() is different. This happens because when we import
    `L1` then lookup finds the existing `L0` and since they are structurally
    equivalent we just map the imported L0 to be the counterpart of L1.

    We have the same problem in this case:
          template <typename F0, typename F1>
          void f(F0 L0 = [](){}, F1 L1 = [](){}) {}


    In StructuralEquivalenceContext we could distinguish lambdas only by
    their source location in these cases. But the lambdas are actually
    structurally equivalent; they differ only by the source location.

    Thus, the solution is to disable lookup completely if the decl in
    the "from" context is a lambda.
    However, that could have other problems: what if the lambda is defined
    in a header file and included in several TUs? I think we'd have as many
    duplicates as we have includes. I think we could live with that,
    because the lambda classes are TU local anyway, we cannot just access
    them from another TU.
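
    The language rule behind this can be checked directly. The sketch below uses a hypothetical helper, `identicalLambdasShareType`, to show that two lambdas with token-for-token identical bodies still have distinct closure types, which is why mapping both onto one CXXRecordDecl is incorrect.

```cpp
#include <cassert>
#include <type_traits>

// Two lambdas with identical (empty) bodies. Each lambda expression introduces
// its own unique closure type, so the two types are never the same.
inline bool identicalLambdasShareType() {
  auto L0 = []() {};
  auto L1 = []() {};
  return std::is_same<decltype(L0), decltype(L1)>::value;
}
```

    `identicalLambdasShareType()` returns `false` on any conforming compiler.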

    Reviewers: a_sidorin, a.sidorin, shafik

    Subscribers: rnkovacs, dkrupp, Szelethus, gamesh411, cfe-commits

    Tags: #clang

    Differential Revision: (detail/ViewSVN)
    by martong
  36. [DAGCombine] visitMULHS/visitMULHU - isBuildVectorAllZeros doesn't mean node is all zeros

    Return a proper zero vector, just in case some elements are undef.

    Noticed by inspection after dealing with a similar issue in PR43159. (detail/ViewSVN)
    by rksimon
  37. Fix Wdocumentation warning. NFCI. (detail/ViewSVN)
    by rksimon
  38. [llvm-objcopy] Allow the visibility of symbols created by --binary and
    --add-symbol to be specified with --new-symbol-visibility (detail/ViewSVN)
    by chrisj
  39. [ASTImporter] Propagate errors during import of overridden methods.

    If importing overridden methods fails for a method it can be seen
    incorrectly as non-virtual. To avoid this inconsistency the method
    is marked with import error to avoid later use of it.

    Reviewers: martong, a.sidorin, shafik, a_sidorin

    Reviewed By: martong, shafik

    Subscribers: rnkovacs, dkrupp, Szelethus, gamesh411, cfe-commits

    Tags: #clang

    Differential Revision: (detail/ViewSVN)
    by balazske
  40. [Attributor] Implement AANoAliasCallSiteArgument initialization

    Summary: This patch adds an appropriate `initialize` method for `AANoAliasCallSiteArgument`.

    Reviewers: jdoerfert, sstefan1

    Reviewed By: jdoerfert

    Subscribers: hiraditya, llvm-commits

    Tags: #llvm

    Differential Revision: (detail/ViewSVN)
    by uenoku
  41. [Clangd] ExtractFunction Added checks for broken control flow

    - Added checks for broken control flow
    - Added unittests

    Reviewers: sammccall, kadircet

    Subscribers: ilya-biryukov, MaskRay, jkorous, arphaman, cfe-commits

    Tags: #clang

    Differential Revision: (detail/ViewSVN)
    by sureyeaah
  42. [LoopIdiomRecognize] BCmp loop idiom recognition

    @mclow.lists brought this issue up on IRC.
    It is a reasonably common problem to compare two values for equality.
    Those may be just integers, strings or arrays of integers.

    In C, there are the `memcmp()` and `bcmp()` functions.
    In C++, there is the `std::equal()` algorithm.
    One can also write that function manually.

    libstdc++'s `std::equal()` is specialized to directly call `memcmp()` for
    various types, but not `std::byte` from C++2a.

    libc++ does not do anything like that; it simply relies on plain C++
    `operator==()`. (GOOD!)

    So there are likely performance opportunities here.
    Let's compare the performance of a naive `std::equal()` (no `memcmp()`) with one that
    uses `memcmp()` (in this case, compiled with the modified compiler). {F8768213}

    #include <algorithm>
    #include <cassert>
    #include <cmath>
    #include <cstdint>
    #include <iterator>
    #include <limits>
    #include <random>
    #include <type_traits>
    #include <utility>
    #include <vector>

    #include "benchmark/benchmark.h"

    // Naive element-wise comparison: the loop shape being benchmarked.
    template <class T>
    bool equal(T* a, T* a_end, T* b) noexcept {
      for (; a != a_end; ++a, ++b) {
        if (*a != *b) return false;
      }
      return true;
    }

    template <typename T>
    std::vector<T> getVectorOfRandomNumbers(size_t count) {
      std::random_device rd;
      std::mt19937 gen(rd());
      std::uniform_int_distribution<T> dis(std::numeric_limits<T>::min(),
                                           std::numeric_limits<T>::max());
      std::vector<T> v;
      std::generate_n(std::back_inserter(v), count,
                      [&dis, &gen]() { return dis(gen); });
      assert(v.size() == count);
      return v;
    }

    // Both buffers hold the same data, so the whole range is compared.
    struct Identical {
      template <typename T>
      static std::pair<std::vector<T>, std::vector<T>> Gen(size_t count) {
        auto Tmp = getVectorOfRandomNumbers<T>(count);
        return std::make_pair(Tmp, std::move(Tmp));
      }
    };

    // The buffers differ at the middle element, so comparison stops halfway.
    struct InequalHalfway {
      template <typename T>
      static std::pair<std::vector<T>, std::vector<T>> Gen(size_t count) {
        auto V0 = getVectorOfRandomNumbers<T>(count);
        auto V1 = V0;
        V1[V1.size() / size_t(2)]++;  // just change the value.
        return std::make_pair(std::move(V0), std::move(V1));
      }
    };

    template <class T, class Gen>
    void BM_bcmp(benchmark::State& state) {
      const size_t Length = state.range(0);

      const std::pair<std::vector<T>, std::vector<T>> Data =
          Gen::template Gen<T>(Length);
      const std::vector<T>& a = Data.first;
      const std::vector<T>& b = Data.second;
      assert(a.size() == Length && b.size() == a.size());

      for (auto _ : state) {
        const bool is_equal = equal(a.data(), a.data() + a.size(), b.data());
        // Keep the result alive so the comparison is not optimized away.
        benchmark::DoNotOptimize(is_equal);
      }
      state.SetComplexityN(Length);
      state.counters["eltcnt"] =
          benchmark::Counter(Length, benchmark::Counter::kIsIterationInvariant);
      state.counters["eltcnt/sec"] =
          benchmark::Counter(Length, benchmark::Counter::kIsIterationInvariantRate);
      const size_t BytesRead = 2 * sizeof(T) * Length;
      state.counters["bytes_read/iteration"] =
          benchmark::Counter(BytesRead, benchmark::Counter::kDefaults,
                             benchmark::Counter::OneK::kIs1024);
      state.counters["bytes_read/sec"] = benchmark::Counter(
          BytesRead, benchmark::Counter::kIsIterationInvariantRate,
          benchmark::Counter::OneK::kIs1024);
    }

    template <typename T>
    static void CustomArguments(benchmark::internal::Benchmark* b) {
      const size_t L2SizeBytes = []() {
        for (const benchmark::CPUInfo::CacheInfo& I :
             benchmark::CPUInfo::Get().caches) {
          if (I.level == 2) return I.size;
        }
        return 0;
      }();
      // What is the largest range we can check to always fit within given L2 cache?
      const size_t MaxLen = L2SizeBytes / /*total bufs*/ 2 /
                            /*maximal elt size*/ sizeof(T) / /*safety margin*/ 2;
      b->RangeMultiplier(2)->Range(1, MaxLen)->Complexity(benchmark::oN);
    }

    BENCHMARK_TEMPLATE(BM_bcmp, uint8_t, Identical)
        ->Apply(CustomArguments<uint8_t>);
    BENCHMARK_TEMPLATE(BM_bcmp, uint16_t, Identical)
        ->Apply(CustomArguments<uint16_t>);
    BENCHMARK_TEMPLATE(BM_bcmp, uint32_t, Identical)
        ->Apply(CustomArguments<uint32_t>);
    BENCHMARK_TEMPLATE(BM_bcmp, uint64_t, Identical)
        ->Apply(CustomArguments<uint64_t>);

    BENCHMARK_TEMPLATE(BM_bcmp, uint8_t, InequalHalfway)
        ->Apply(CustomArguments<uint8_t>);
    BENCHMARK_TEMPLATE(BM_bcmp, uint16_t, InequalHalfway)
        ->Apply(CustomArguments<uint16_t>);
    BENCHMARK_TEMPLATE(BM_bcmp, uint32_t, InequalHalfway)
        ->Apply(CustomArguments<uint32_t>);
    BENCHMARK_TEMPLATE(BM_bcmp, uint64_t, InequalHalfway)
        ->Apply(CustomArguments<uint64_t>);

    $ ~/src/googlebenchmark/tools/ --no-utest benchmarks build-{old,new}/test/llvm-bcmp-bench
    RUNNING: build-old/test/llvm-bcmp-bench --benchmark_out=/tmp/tmpb6PEUx
    2019-04-25 21:17:11
    Running build-old/test/llvm-bcmp-bench
    Run on (8 X 4000 MHz CPU s)
    CPU Caches:
      L1 Data 16K (x8)
      L1 Instruction 64K (x4)
      L2 Unified 2048K (x4)
      L3 Unified 8192K (x1)
    Load Average: 0.65, 3.90, 4.14
    Benchmark                                         Time             CPU   Iterations UserCounters...
    BM_bcmp<uint8_t, Identical>/512000           432131 ns       432101 ns         1613 bytes_read/iteration=1000k bytes_read/sec=2.20706G/s eltcnt=825.856M eltcnt/sec=1.18491G/s
    BM_bcmp<uint8_t, Identical>_BigO               0.86 N          0.86 N
    BM_bcmp<uint8_t, Identical>_RMS                   8 %             8 %
    BM_bcmp<uint16_t, Identical>/256000          161408 ns       161409 ns         4027 bytes_read/iteration=1000k bytes_read/sec=5.90843G/s eltcnt=1030.91M eltcnt/sec=1.58603G/s
    BM_bcmp<uint16_t, Identical>_BigO              0.67 N          0.67 N
    BM_bcmp<uint16_t, Identical>_RMS                 25 %            25 %
    BM_bcmp<uint32_t, Identical>/128000           81497 ns        81488 ns         8415 bytes_read/iteration=1000k bytes_read/sec=11.7032G/s eltcnt=1077.12M eltcnt/sec=1.57078G/s
    BM_bcmp<uint32_t, Identical>_BigO              0.71 N          0.71 N
    BM_bcmp<uint32_t, Identical>_RMS                 42 %            42 %
    BM_bcmp<uint64_t, Identical>/64000            50138 ns        50138 ns        10909 bytes_read/iteration=1000k bytes_read/sec=19.0209G/s eltcnt=698.176M eltcnt/sec=1.27647G/s
    BM_bcmp<uint64_t, Identical>_BigO              0.84 N          0.84 N
    BM_bcmp<uint64_t, Identical>_RMS                 27 %            27 %
    BM_bcmp<uint8_t, InequalHalfway>/512000      192405 ns       192392 ns         3638 bytes_read/iteration=1000k bytes_read/sec=4.95694G/s eltcnt=1.86266G eltcnt/sec=2.66124G/s
    BM_bcmp<uint8_t, InequalHalfway>_BigO          0.38 N          0.38 N
    BM_bcmp<uint8_t, InequalHalfway>_RMS              3 %             3 %
    BM_bcmp<uint16_t, InequalHalfway>/256000     127858 ns       127860 ns         5477 bytes_read/iteration=1000k bytes_read/sec=7.45873G/s eltcnt=1.40211G eltcnt/sec=2.00219G/s
    BM_bcmp<uint16_t, InequalHalfway>_BigO         0.50 N          0.50 N
    BM_bcmp<uint16_t, InequalHalfway>_RMS             0 %             0 %
    BM_bcmp<uint32_t, InequalHalfway>/128000      49140 ns        49140 ns        14281 bytes_read/iteration=1000k bytes_read/sec=19.4072G/s eltcnt=1.82797G eltcnt/sec=2.60478G/s
    BM_bcmp<uint32_t, InequalHalfway>_BigO         0.40 N          0.40 N
    BM_bcmp<uint32_t, InequalHalfway>_RMS            18 %            18 %
    BM_bcmp<uint64_t, InequalHalfway>/64000       32101 ns        32099 ns        21786 bytes_read/iteration=1000k bytes_read/sec=29.7101G/s eltcnt=1.3943G eltcnt/sec=1.99381G/s
    BM_bcmp<uint64_t, InequalHalfway>_BigO         0.50 N          0.50 N
    BM_bcmp<uint64_t, InequalHalfway>_RMS             1 %             1 %
    RUNNING: build-new/test/llvm-bcmp-bench --benchmark_out=/tmp/tmpQ46PP0
    2019-04-25 21:19:29
    Running build-new/test/llvm-bcmp-bench
    Run on (8 X 4000 MHz CPU s)
    CPU Caches:
      L1 Data 16K (x8)
      L1 Instruction 64K (x4)
      L2 Unified 2048K (x4)
      L3 Unified 8192K (x1)
    Load Average: 1.01, 2.85, 3.71
    Benchmark                                         Time             CPU   Iterations UserCounters...
    BM_bcmp<uint8_t, Identical>/512000            18593 ns        18590 ns        37565 bytes_read/iteration=1000k bytes_read/sec=51.2991G/s eltcnt=19.2333G eltcnt/sec=27.541G/s
    BM_bcmp<uint8_t, Identical>_BigO               0.04 N          0.04 N
    BM_bcmp<uint8_t, Identical>_RMS                  37 %            37 %
    BM_bcmp<uint16_t, Identical>/256000           18950 ns        18948 ns        37223 bytes_read/iteration=1000k bytes_read/sec=50.3324G/s eltcnt=9.52909G eltcnt/sec=13.511G/s
    BM_bcmp<uint16_t, Identical>_BigO              0.08 N          0.08 N
    BM_bcmp<uint16_t, Identical>_RMS                 34 %            34 %
    BM_bcmp<uint32_t, Identical>/128000           18627 ns        18627 ns        37895 bytes_read/iteration=1000k bytes_read/sec=51.198G/s eltcnt=4.85056G eltcnt/sec=6.87168G/s
    BM_bcmp<uint32_t, Identical>_BigO              0.16 N          0.16 N
    BM_bcmp<uint32_t, Identical>_RMS                 35 %            35 %
    BM_bcmp<uint64_t, Identical>/64000            18855 ns        18855 ns        37458 bytes_read/iteration=1000k bytes_read/sec=50.5791G/s eltcnt=2.39731G eltcnt/sec=3.3943G/s
    BM_bcmp<uint64_t, Identical>_BigO              0.32 N          0.32 N
    BM_bcmp<uint64_t, Identical>_RMS                 33 %            33 %
    BM_bcmp<uint8_t, InequalHalfway>/512000        9570 ns         9569 ns        73500 bytes_read/iteration=1000k bytes_read/sec=99.6601G/s eltcnt=37.632G eltcnt/sec=53.5046G/s
    BM_bcmp<uint8_t, InequalHalfway>_BigO          0.02 N          0.02 N
    BM_bcmp<uint8_t, InequalHalfway>_RMS             29 %            29 %
    BM_bcmp<uint16_t, InequalHalfway>/256000       9547 ns         9547 ns        74343 bytes_read/iteration=1000k bytes_read/sec=99.8971G/s eltcnt=19.0318G eltcnt/sec=26.8159G/s
    BM_bcmp<uint16_t, InequalHalfway>_BigO         0.04 N          0.04 N
    BM_bcmp<uint16_t, InequalHalfway>_RMS            29 %            29 %
    BM_bcmp<uint32_t, InequalHalfway>/128000       9396 ns         9394 ns        73521 bytes_read/iteration=1000k bytes_read/sec=101.518G/s eltcnt=9.41069G eltcnt/sec=13.6255G/s
    BM_bcmp<uint32_t, InequalHalfway>_BigO         0.08 N          0.08 N
    BM_bcmp<uint32_t, InequalHalfway>_RMS            30 %            30 %
    BM_bcmp<uint64_t, InequalHalfway>/64000        9499 ns         9498 ns        73802 bytes_read/iteration=1000k bytes_read/sec=100.405G/s eltcnt=4.72333G eltcnt/sec=6.73808G/s
    BM_bcmp<uint64_t, InequalHalfway>_BigO         0.16 N          0.16 N
    BM_bcmp<uint64_t, InequalHalfway>_RMS            28 %            28 %
    Comparing build-old/test/llvm-bcmp-bench to build-new/test/llvm-bcmp-bench
    Benchmark                                                  Time             CPU      Time Old      Time New       CPU Old       CPU New
    BM_bcmp<uint8_t, Identical>/512000                      -0.9570         -0.9570        432131         18593        432101         18590
    BM_bcmp<uint16_t, Identical>/256000                     -0.8826         -0.8826        161408         18950        161409         18948
    BM_bcmp<uint32_t, Identical>/128000                     -0.7714         -0.7714         81497         18627         81488         18627
    BM_bcmp<uint64_t, Identical>/64000                      -0.6239         -0.6239         50138         18855         50138         18855
    BM_bcmp<uint8_t, InequalHalfway>/512000                 -0.9503         -0.9503        192405          9570        192392          9569
    BM_bcmp<uint16_t, InequalHalfway>/256000                -0.9253         -0.9253        127858          9547        127860          9547
    BM_bcmp<uint32_t, InequalHalfway>/128000                -0.8088         -0.8088         49140          9396         49140          9394
    BM_bcmp<uint64_t, InequalHalfway>/64000                 -0.7041         -0.7041         32101          9499         32099          9498

    What can we tell from the benchmark?
    * Performance of the naive equality check somewhat improves with element size,
      maxing out at eltcnt/sec=1.58603G/s for uint16_t, or bytes_read/sec=19.0209G/s
      for uint64_t. I think that this instability implies performance problems.
    * Performance of the `memcmp()`-aware benchmark always maxes out at around
      bytes_read/sec=51.2991G/s for every type. That is 2.6x the throughput of the
      naive variant!
    * The eltcnt/sec metric for the `memcmp()`-aware benchmark maxes out at
      eltcnt/sec=27.541G/s for uint8_t (was: eltcnt/sec=1.18491G/s, so 24x) and
      decreases linearly with element size.
      For uint64_t, it's ~4x+ the elements/second.
    * The call is obviously more expensive than the loop for small element counts.
      As can be seen from the full output {F8768210}, `memcmp()` is almost
      universally worse, independent of the element size (and thus buffer size), when
      the element count is less than 8.
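
    For reading the comparison table, the sketch below shows how the Time column relates to the raw timings, assuming it reports (new - old) / old as the printed rows suggest. The helper names `relativeChange` and `speedup` are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Relative change as it appears in the comparison table: (new - old) / old.
// Negative values mean the new build is faster.
inline double relativeChange(double oldNs, double newNs) {
  return (newNs - oldNs) / oldNs;
}

// Speedup factor: how many times faster the new build is than the old one.
inline double speedup(double oldNs, double newNs) { return oldNs / newNs; }
```

    Applied to the first row of the table (432131 ns old, 18593 ns new), `relativeChange` gives about -0.957, matching the printed -0.9570, and `speedup` gives roughly 23x.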

    So all in all, the bcmp idiom does indeed offer untapped performance headroom.
    This diff implements said idiom recognition. I think reasonable test
    coverage is present, but do tell if there is anything obvious missing.
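
    To make the idiom concrete, here is a minimal sketch of the kind of loop in question and the single library call it is equivalent to. The function names `equalLoop` and `equalBytes` are hypothetical, and `std::memcmp` stands in for `bcmp()` because only the equal / not-equal outcome is used; this illustrates the equivalence and is not the pass's actual output.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Naive element-wise equality with early exit: the loop shape of interest.
inline bool equalLoop(const unsigned* a, const unsigned* b, std::size_t n) {
  for (std::size_t i = 0; i != n; ++i)
    if (a[i] != b[i]) return false;
  return true;
}

// The same answer from one call over the same byte range. Valid here because
// unsigned has no padding bits, so equal values have equal bytes.
inline bool equalBytes(const unsigned* a, const unsigned* b, std::size_t n) {
  return std::memcmp(a, b, n * sizeof(unsigned)) == 0;
}
```

    Both functions return the same result for any pair of buffers of length `n`; the benchmark above measures how much faster the second form can be for large `n`.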

    Now, quality. This does build and pass the test-suite, at least
    without any non-bundled elements. {F8768216} {F8768217}
    This transform fires 91 times:
    $ /build/test-suite/utils/ -m loop-idiom.NumBCmp result-new.json
    Tests: 1149
    Metric: loop-idiom.NumBCmp

    Program                                         result-new

    MultiSourc...Benchmarks/7zip/7zip-benchmark    79.00
    MultiSource/Applications/d/make_dparser         3.00
    SingleSource/UnitTests/vla                      2.00
    MultiSource/Applications/Burg/burg              1.00
    MultiSourc.../Applications/JM/lencod/lencod     1.00
    MultiSource/Applications/lemon/lemon            1.00
    MultiSource/Benchmarks/Bullet/bullet            1.00
    MultiSourc...e/Benchmarks/MallocBench/gs/gs     1.00     1.00
    MultiSourc...Prolangs-C/simulator/simulator     1.00
    The size changes are:
    I'm not sure what's going on with SingleSource/UnitTests/vla.test yet, did not look.
    $ /build/test-suite/utils/ -m size..text result-{old,new}.json --filter-hash
    Tests: 1149
    Same hash: 907 (filtered out)
    Remaining: 242
    Metric: size..text

    Program                                        result-old result-new diff
    test-suite...ingleSource/UnitTests/vla.test   753.00     833.00     10.6%
    test-suite...marks/7zip/7zip-benchmark.test   1001697.00 966657.00  -3.5%
    test-suite...ngs-C/simulator/simulator.test   32369.00   32321.00   -0.1%
    test-suite...plications/d/make_dparser.test   89585.00   89505.00   -0.1%
    test-suite...ce/Applications/Burg/burg.test   40817.00   40785.00   -0.1%
    test-suite.../Applications/lemon/lemon.test   47281.00   47249.00   -0.1%
    test-suite...TimberWolfMC/timberwolfmc.test   250065.00  250113.00   0.0%
    test-suite...chmarks/MallocBench/gs/gs.test   149889.00  149873.00  -0.0%
    test-suite...ications/JM/lencod/lencod.test   769585.00  769569.00  -0.0%
    test-suite.../Benchmarks/Bullet/bullet.test   770049.00  770049.00   0.0%
    test-suite...HMARK_ANISTROPIC_DIFFUSION/128    NaN        NaN        nan%
    test-suite...HMARK_ANISTROPIC_DIFFUSION/256    NaN        NaN        nan%
    test-suite...CHMARK_ANISTROPIC_DIFFUSION/64    NaN        NaN        nan%
    test-suite...CHMARK_ANISTROPIC_DIFFUSION/32    NaN        NaN        nan%
    test-suite...ENCHMARK_BILATERAL_FILTER/64/4    NaN        NaN        nan%
    Geomean difference                                                   nan%
             result-old    result-new       diff
    count  1.000000e+01  10.00000      10.000000
    mean   3.152090e+05  311695.40000  0.006749
    std    3.790398e+05  372091.42232  0.036605
    min    7.530000e+02  833.00000    -0.034981
    25%    4.243300e+04  42401.00000  -0.000866
    50%    1.197370e+05  119689.00000 -0.000392
    75%    6.397050e+05  639705.00000 -0.000005
    max    1.001697e+06  966657.00000  0.106242

    I don't have timings though.

    And now to the code. The basic idea is to completely replace the whole loop.
    If we can't fully kill it, don't transform.
    I have left one or two comments in the code, so hopefully it can be understood.

    Also, there are a few TODOs that I have left for follow-ups:
    * widening of `memcmp()`/`bcmp()`
    * step smaller than the comparison size
    * Metadata propagation
    * more than two blocks as long as there is still a single backedge?
    * ???

    Reviewers: reames, fhahn, mkazantsev, chandlerc, craig.topper, courbet

    Reviewed By: courbet

    Subscribers: hiraditya, xbolva00, nikic, jfb, gchatelet, courbet, llvm-commits, mclow.lists

    Tags: #llvm

    Differential Revision: (detail/ViewSVN)
    by lebedevri
  43. [NFC] SCEVExpander: add SetCurrentDebugLocation() / getCurrentDebugLocation() wrappers

    The internal `Builder` is private, which means there is
    currently no way to set the debuginfo locations for `SCEVExpander`.
    This only adds the wrappers, but does not use them anywhere.
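
    The shape of such a change can be sketched with stand-in types. `ToyDebugLoc`, `ToyBuilder` and `ToyExpander` below are hypothetical and are not the LLVM classes; only the two wrapper names come from the commit title:

```cpp
// Stand-in for a debug location; purely illustrative.
struct ToyDebugLoc {
  int Line = 0;
};

// Stand-in for the internal builder that owns the current location.
class ToyBuilder {
  ToyDebugLoc Cur;

public:
  void SetCurrentDebugLocation(ToyDebugLoc L) { Cur = L; }
  ToyDebugLoc getCurrentDebugLocation() const { return Cur; }
};

// The builder stays private; thin forwarding wrappers expose only the
// debug-location accessors to clients.
class ToyExpander {
  ToyBuilder Builder;

public:
  void SetCurrentDebugLocation(ToyDebugLoc L) {
    Builder.SetCurrentDebugLocation(L);
  }
  ToyDebugLoc getCurrentDebugLocation() const {
    return Builder.getCurrentDebugLocation();
  }
};
```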

    Reviewers: mkazantsev, sanjoy, gberry, jyknight, dneilson

    Reviewed By: sanjoy

    Subscribers: javed.absar, llvm-commits

    Tags: #llvm

    Differential Revision: (detail/ViewSVN)
    by lebedevri
  44. [clangd] Collecting main file macro expansion locations in ParsedAST.

    Summary: TokenBuffer does not collect macro expansions inside macro arguments, which is needed for semantic highlighting. Therefore, collect macro expansions in the main file in a PPCallback when building the ParsedAST instead.

    Reviewers: hokein, ilya-biryukov

    Subscribers: MaskRay, jkorous, arphaman, kadircet, cfe-commits

    Tags: #clang

    Differential Revision: (detail/ViewSVN)
    by jvikstrom
  45. [Tooling] Migrated APIs that take ownership of objects to unique_ptr
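
    A minimal sketch of the general pattern, assuming nothing about the actual Tooling APIs that were touched (`Action` and `ToolRunner` are hypothetical names): taking a `std::unique_ptr` by value makes the ownership transfer visible in the signature and at the call site.

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical object whose ownership is handed over.
struct Action {
  int Id;
};

// Hypothetical API that takes ownership of the objects it is given.
class ToolRunner {
  std::vector<std::unique_ptr<Action>> Actions;

public:
  // Callers must std::move() into this parameter, so the transfer is explicit.
  void addAction(std::unique_ptr<Action> A) { Actions.push_back(std::move(A)); }
  std::size_t size() const { return Actions.size(); }
};
```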

    Subscribers: jkorous, arphaman, kadircet, cfe-commits

    Tags: #clang

    Differential Revision: (detail/ViewSVN)
    by gribozavr
  46. [LiveDebugValues] Insert entry values after bundles

    Change LiveDebugValues so that it inserts entry values after the bundle
    which contains the clobbering instruction. Previously it would insert
    the debug value after the bundle head using insertAfter(), breaking the

    Reviewers: djtodoro, NikolaPrica, aprantl, vsk

    Reviewed By: vsk

    Subscribers: hiraditya, llvm-commits

    Tags: #debug-info, #llvm

    Differential Revision: (detail/ViewSVN)
    by dstenb
  47. [clangd] Add .vscode-test to .gitignore.

    Reviewers: jvikstrom

    Subscribers: ilya-biryukov, MaskRay, jkorous, arphaman, kadircet, cfe-commits

    Tags: #clang

    Differential Revision: (detail/ViewSVN)
    by hokein
  48. [CodeGen]: fix error message for "=r" asm constraint

    Nico Weber reported that the following code:
      char buf[9];
      asm("" : "=r" (buf));

    yields the "impossible constraint in asm: can't store struct into a register"
    error message, although |buf| is not a struct (see

    Make the error message more generic and add a test for it.
    Also make sure other tests in x86_64-PR42672.c check for the full error

    Reviewers: eli.friedman, thakis

    Subscribers: cfe-commits

    Tags: #clang

    Differential Revision: (detail/ViewSVN)
    by glider
  49. vim: add `immarg` keyword

    The `immarg` attribute was added in r355981. (detail/ViewSVN)
    by svenvh
  50. gn build: Merge r370441 (detail/ViewSVN)
    by nico
  51. [ADT] Removed VariadicFunction

    It is not used. It uses macro-based unrolling instead of variadic
    templates, so it is not idiomatic anymore, and therefore it is a
    questionable API to keep "just in case".
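
    For comparison, here is a small sketch of the variadic-template style that replaces macro-based unrolling. The names `sumAll` and `sum` are made up for illustration and do not reproduce the removed class: a parameter pack is expanded into a list of pointers and forwarded to a single implementation function.

```cpp
#include <initializer_list>

// Hypothetical implementation function that receives all arguments at once.
inline int sumAll(std::initializer_list<const int *> Args) {
  int Total = 0;
  for (const int *P : Args)
    Total += *P;
  return Total;
}

// One variadic template covers every arity; no per-arity macro expansion.
// This sketch assumes every argument is an int.
template <typename... Ts> int sum(const Ts &...Xs) {
  return sumAll({&Xs...});
}
```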

    Subscribers: mgorny, dmgreen, dexonsmith, llvm-commits

    Tags: #llvm

    Differential Revision: (detail/ViewSVN)
    by gribozavr
  52. [LLD] [COFF] Support merging resource object files

    Extend WindowsResourceParser to support using a ResourceSectionRef for
    loading resources from an object file.

    Only allow merging resource object files in mingw mode; keep the
    existing error on multiple resource objects in link mode.

    If there is only one resource object file and no .res resources,
    don't parse and recreate the .rsrc section, but just link it in without
    inspecting it. This allows users to produce any .rsrc section (outside
    of what the parser supports), just like before. (I don't have a specific
    need for this, but it reduces the risk of this new feature.)

    Separate out the .rsrc section chunks in InputFiles.cpp, and only include
    them in the list of section chunks to link if we've determined that there
    was only a single resource object. (We need to keep other chunks from
    those object files, as they can legitimately contain other sections as
    well, in addition to .rsrc section chunks.)

    Differential Revision: (detail/ViewSVN)
    by mstorsjo
  53. [WindowsResource] Remove use of global variables in WindowsResourceParser

    Instead of updating a global variable counter for the next index of
    strings and data blobs, pass along a reference to actual data/string
    vectors and let the TreeNode insertion methods add their data/strings to
    the vectors when a new entry is needed.

    Additionally, if the resource tree had duplicates that were ignored
    with -force:multipleres in lld, we no longer store all versions of the
    duplicated resource data; we now only keep the one that actually ends
    up referenced.
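
    The first part of this can be pictured with a small sketch. `ToyNode` and `addStringNode` are hypothetical and not the parser's real types: instead of bumping a global counter, the insertion helper receives the string vector by reference and derives the next index from its size.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical tree node that refers to a string by index.
struct ToyNode {
  std::uint32_t StringIndex;
};

// The caller owns the table; the index is simply the position at which
// the new entry is appended, so no global state is needed.
inline ToyNode addStringNode(std::vector<std::string> &StringTable,
                             const std::string &Name) {
  ToyNode Node{static_cast<std::uint32_t>(StringTable.size())};
  StringTable.push_back(Name);
  return Node;
}
```

    Because the table is passed in, two independent parses cannot interfere with each other's indices.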

    Differential Revision: (detail/ViewSVN)
    by mstorsjo
  54. [WindowsResource] Avoid duplicating the input filenames for each resource. NFC.

    Differential Revision: (detail/ViewSVN)
    by mstorsjo
  55. [COFF] Add a ResourceSectionRef method for getting resource contents

    This allows llvm-readobj to print the contents of each resource
    when printing resources from an object file or executable, like it
    already does for plain .res files.

    This requires providing the whole COFFObjectFile to ResourceSectionRef.

    This supports both object files and executables. For executables,
    the DataRVA field is used as is to look up the right section.

    For object files, ideally we would need to complete linking of them
    and fix up all relocations to know what the DataRVA field would end up
    being. In practice, the only thing that makes sense for an RVA field
    is an ADDR32NB relocation. Thus, find a relocation pointing at this
    field, verify that it has the expected type, locate the symbol it
    points at, look up the section the symbol points at, and read from the
    right offset in that section.

    This works both for GNU windres object files (which use one single
    .rsrc section, with all relocations against the base of the .rsrc
    section, with the original value of the DataRVA field being the
    offset of the data from the beginning of the .rsrc section) and
    cvtres object files (with two separate .rsrc$01 and .rsrc$02 sections,
    and one symbol per data entry, with the original pre-relocated DataRVA
    field being set to zero).

    Differential Revision: (detail/ViewSVN)
    by mstorsjo
  56. [MIPS GlobalISel] Lower uitofp

    Add custom lowering for G_UITOFP for MIPS32.

    Differential Revision: (detail/ViewSVN)
    by petar.avramovic
  57. [MIPS GlobalISel] Lower fptoui

    Add lowering for G_FPTOUI. The algorithm is similar to the SDAG version
    in TargetLowering::expandFP_TO_UINT.
    Lower G_FPTOUI for MIPS32.
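
    As an illustration of this kind of expansion (a C++ sketch of the usual select-based trick for a 32-bit float, not the code in the patch; `fptouiViaFptosi` is a made-up name), an unsigned conversion can be built from a signed one by handling values at or above 2^31 separately:

```cpp
#include <cstdint>

// Convert a float in [0, 2^32) to uint32_t using only a signed conversion.
// Inputs outside that range are not handled by this sketch.
inline std::uint32_t fptouiViaFptosi(float X) {
  const float Threshold = 2147483648.0f; // 2^31
  if (X < Threshold) {
    // Small values fit in the signed range, so convert directly.
    return static_cast<std::uint32_t>(static_cast<std::int32_t>(X));
  }
  // Large values: subtract 2^31, convert as signed, then put 2^31 back
  // by setting the top bit of the result.
  std::int32_t Low = static_cast<std::int32_t>(X - Threshold);
  return static_cast<std::uint32_t>(Low) ^ 0x80000000u;
}
```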

    Differential Revision: (detail/ViewSVN)
    by petar.avramovic
  58. [CodeGen] Fix lowering for returning the result of an extractvalue

    When the number of return values exceeds the number of registers available,
    SelectionDAGBuilder::visitRet transforms a function's return to use a
    pointer to a buffer to hold return values. When the returned value is an
    operator such as extractvalue, the value may have a non-zero result number.
    Add that number to the indexing when obtaining the values to store.

    This fixes

    Differential Revision: (detail/ViewSVN)
    by djg
  59. [clangd] Add distinct highlightings for static fields and methods

    Reviewers: hokein, ilya-biryukov, jvikstrom

    Reviewed By: hokein

    Subscribers: MaskRay, jkorous, arphaman, kadircet, cfe-commits

    Tags: #clang

    Differential Revision: (detail/ViewSVN)
    by nridge
  60. [PowerPC][NFC] Use inline Subtarget->isPPC64()

    To be consistent with all the other instances. (detail/ViewSVN)
    by jsji
  61. [PowerPC][NFC] Use -mtriple in RUN line, remove target triple in tls.ll

    To avoid confusion, especially when -mtriple is also added for PPC32. (detail/ViewSVN)
    by jsji
  62. [PPC32] Emit R_PPC_GOT_TPREL16 instead of R_PPC_GOT_TPREL16_LO

    Unlike ppc64, which has ADDISgotTprelHA+LDgotTprelL pairs,
    ppc32 just uses LDgotTprelL32, so it does not make much sense to use
    _LO without a paired _HA.

    Emit R_PPC_GOT_TPREL16 instead of R_PPC_GOT_TPREL16_LO to match GCC, and
    to get better linker relocation checks. Note, R_PPC_GOT_TPREL16_{HA,LO}
    don't have good linker support:

    (a) lld does not support R_PPC_GOT_TPREL16_{HA,LO}.
    (b) Top of tree ld.bfd does not support R_PPC_GOT_REL16_HA Initial-Exec -> Local-Exec relaxation:

      // a.o
      addis 3, 3, tsd_tls@got@tprel@ha
      lwz 3, tsd_tls@got@tprel@l(3)
      add 3, 3, tsd_tls@tls
      // b.o
      .section .tdata,"awT"; .globl tsd_tls; tsd_tls:

      // ld/ld-new a.o b.o
      internal error, aborting at ../../bfd/elf32-ppc.c:7952 in ppc_elf_relocate_section

    Reviewed By: adalava

    Differential Revision: (detail/ViewSVN)
    by maskray
  63. [clang-scan-deps] NFC, refactor the DependencyScanningWorker to use a consumer
    to report the dependencies to the client

    This will allow the scanner to report modular dependencies to the consumer.
    This will also allow the scanner to accept regular cc1 clang invocations, e.g.
    in an implementation of a libclang C API for clang-scan-deps, that I will add
    follow-up patches for in the future. (detail/ViewSVN)
    by arphaman
  64. [X86] Explicitly list all the always trivially rematerializable instructions.

    Add a default with an llvm_unreachable for anything we don't expect.

    This seems safer than just blindly returning true for anything
    missing from the switch. (detail/ViewSVN)
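
    A minimal sketch of the pattern, with a hypothetical opcode enum and `std::abort()` standing in for `llvm_unreachable` (neither the enum nor the function name is taken from the X86 backend):

```cpp
#include <cstdlib>

// Hypothetical opcodes; the real list lives in the X86 backend.
enum class ToyOpcode { LoadImm, LoadZero, SetAllOnes, Add };

// Every opcode known to be trivially rematerializable is listed explicitly.
// Anything else is treated as a programming error instead of silently
// returning true.
inline bool isAlwaysTriviallyRematerializable(ToyOpcode Op) {
  switch (Op) {
  case ToyOpcode::LoadImm:
  case ToyOpcode::LoadZero:
  case ToyOpcode::SetAllOnes:
    return true;
  default:
    std::abort(); // stand-in for llvm_unreachable("unexpected opcode")
  }
}
```

    With this shape, a newly added opcode that reaches the function fails loudly until someone decides which bucket it belongs in.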
    by ctopper

Started by upstream project Clang Stage 2: cmake, R -g Tsan, using Stage 1 RA build number 18208
originally caused by:

Started by upstream project Clang Stage 2: cmake, R -g Tsan, using Stage 1 RA build number 18209
originally caused by:

Started by upstream project Clang Stage 2: cmake, R -g Tsan, using Stage 1 RA build number 18210
originally caused by:

Started by upstream project Clang Stage 2: cmake, R -g Tsan, using Stage 1 RA build number 18211
originally caused by:

Started by upstream project Clang Stage 2: cmake, R -g Tsan, using Stage 1 RA build number 18212
originally caused by:

Started by upstream project Clang Stage 2: cmake, R -g Tsan, using Stage 1 RA build number 18213
originally caused by:

Started by upstream project Clang Stage 2: cmake, R -g Tsan, using Stage 1 RA build number 18214
originally caused by:

Started by upstream project Clang Stage 2: cmake, R -g Tsan, using Stage 1 RA build number 18215
originally caused by:

Started by upstream project Clang Stage 2: cmake, R -g Tsan, using Stage 1 RA build number 18216
originally caused by:

Started by upstream project Clang Stage 2: cmake, R -g Tsan, using Stage 1 RA build number 18217
originally caused by:

Started by upstream project Clang Stage 2: cmake, R -g Tsan, using Stage 1 RA build number 18218
originally caused by:

Started by upstream project Clang Stage 2: cmake, R -g Tsan, using Stage 1 RA build number 18219
originally caused by:

Started by upstream project Clang Stage 2: cmake, R -g Tsan, using Stage 1 RA build number 18220
originally caused by:

This run spent:

  • 17 hr waiting;
  • 17 hr build duration;
  • 1 day 11 hr total from scheduled to completion.

Identified problems

Missing test results

The test result file Jenkins is looking for does not exist after the build.
Indication 1

Ninja target failed

Below is a link to the first failed ninja target.
Indication 2

Regression test failed

This build failed because a regression test in the test suite FAILed. See the test report for details.
Indication 3

Compile Error

This build failed because of a compile error. Below is a list of all errors in the build log:
Indication 4