Test Result: 675 tests failing out of a total of 1,401 tests.
Build stability: All recent builds failed.
Build History


#84 (Dec 3, 2018 5:57:49 AM)

  1. [PDB] Support PDB-backed expressions evaluation (+ fix stuck test)

    This patch contains several small fixes, which make it possible to evaluate
    expressions on Windows using information from PDB. The changes are:
    - several sanity checks;
    - make IRExecutionUnit::MemoryManager::getSymbolAddress not return a magic
      value on a failure, because callers expect 0 in this case;
    - require the entry point to be a file address, not an RVA, in ObjectFilePECOFF;
    - do not crash on a debuggee second-chance exception, since it may be an
      expression evaluation crash. Also fix detection of crashed threads in tests;
    - create parameter declarations for functions in the AST to make it possible to call
      debuggee functions from expressions;
    - relax name searching rules for variables, functions, namespaces and types. Now
      it works just like in the DWARF plugin;
    - fix endless recursion in SymbolFilePDB::ParseCompileUnitFunctionForPDBFunc.

    Reviewers: zturner, asmith, stella.stamenova

    Reviewed By: stella.stamenova, asmith

    Tags: #lldb

    Differential Revision: — aleksandr.urakov / detail
  2. [CodeComplete] Cleanup access checking in code completion

    Summary: Also fixes a crash (see the added 'accessibility-crash.cpp' test).

    Reviewers: ioeric, kadircet

    Reviewed By: kadircet

    Subscribers: cfe-commits

    Differential Revision: — ibiryukov / detail
  3. [Sema] Avoid CallExpr::setNumArgs in Sema::BuildCallToObjectOfClassType

    CallExpr::setNumArgs is the only thing that prevents storing the arguments
    of a call expression in a trailing array since it might resize the argument
    array. setNumArgs is only called in 3 places in Sema, and for all of them it
    is possible to avoid it.

    This deals with the call to setNumArgs in BuildCallToObjectOfClassType.
    Instead of constructing the CXXOperatorCallExpr first and later calling
    setNumArgs if we have default arguments, we first construct a large
    enough SmallVector, do the promotion/check of the arguments, and
    then construct the CXXOperatorCallExpr.

    Incidentally, this also avoids reallocating the arguments when the
    call operator has default arguments, but this is not the primary goal.

    Differential Revision:

    Reviewed By: aaron.ballman — brunoricci / detail
  4. [AMDGPU] Add sdwa support for ADD|SUB U64 decomposed Pseudos

    The introduction of S_{ADD|SUB}_U64_PSEUDO instructions, which are
    decomposed into pairs of VOP3 instructions, precludes the use of SDWA
    to encode a constant. SDWA (Sub-DWord Addressing) is supported on VOP1
    and VOP2 instructions, but not on VOP3 instructions.

    We desire to fold the bit-and operand into the instruction encoding
    for the V_ADD_I32 instruction. This requires that we transform the
    VOP3 into a VOP2 form of the instruction (_e32).
      %19:vgpr_32 = V_AND_B32_e32 255,
          killed %16:vgpr_32, implicit $exec
      %47:vgpr_32, %49:sreg_64_xexec = V_ADD_I32_e64
          %26.sub0:vreg_64, %19:vgpr_32, implicit $exec
      %48:vgpr_32, dead %50:sreg_64_xexec = V_ADDC_U32_e64
          %26.sub1:vreg_64, %54:vgpr_32, killed %49:sreg_64_xexec, implicit $exec

    which then allows the SDWA encoding and becomes
      %47:vgpr_32 = V_ADD_I32_sdwa
          0, %26.sub0:vreg_64, 0, killed %16:vgpr_32, 0, 6, 0, 6, 0,
          implicit-def $vcc, implicit $exec
      %48:vgpr_32 = V_ADDC_U32_e32
          0, %26.sub1:vreg_64, implicit-def $vcc, implicit $vcc, implicit $exec

    Differential Revision: — ronlieb / detail
  5. [AST] Fix an uninitialized bug in the bits of FunctionDecl

    FunctionDeclBits.IsCopyDeductionCandidate was not initialized.
    This caused a warning with valgrind. — brunoricci / detail
  6. Portable Python script across Python versions

    Python3 does not support tuple destructuring in function parameters.

    Differential Revision: — serge_sans_paille / detail
  7. [AST][NFC] Pack CXXDeleteExpr

    Use the newly available space in the bit-fields of Stmt.
    This saves 8 bytes per CXXDeleteExpr. NFC. — brunoricci / detail
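Entry 6 of build #84 refers to tuple parameter unpacking (for example `def f((a, b)):`), which PEP 3113 removed in Python 3. A minimal sketch of the portable rewrite follows; the function `sort_key` and the sample data are hypothetical and are not taken from the script that was actually patched.

```python
# Python 2 only; a SyntaxError under Python 3 (PEP 3113):
#
#     def sort_key((name, count)):
#         return count
#
# Portable form: take the tuple as one parameter and unpack it in the body.
def sort_key(item):
    _name, count = item
    return count


pairs = [("b", 2), ("a", 3), ("c", 1)]
print(sorted(pairs, key=sort_key))  # [('c', 1), ('b', 2), ('a', 3)]
```

The same one-line unpacking in the body works unchanged on both Python 2 and Python 3.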

#83 (Dec 3, 2018 4:20:46 AM)

  1. Portable Python script across versions

    Have all classes derive from object: that is implicitly the default in Python3,
    but it needs to be done explicitly in Python2.

    Differential Revision: — serge_sans_paille / detail
  2. Portable Python script across Python versions

    Python2 supports the following two equivalent constructs:

    raise ExceptionType, exception_value
    raise ExceptionType(exception_value)

    Only the latter is supported by Python3.

    Differential Revision: — serge_sans_paille / detail
  3. [Analyzer] Actually check for -model-path being a directory

    The original patch (r348038) clearly contained a typo and checked
    for '-ctu-dir' twice. — ibiryukov / detail
  4. [Analysis] Properly prepare test env in test/Analysis/undef-call.c

    The test expects '%T/ctudir' to be present, but does not create it. — ibiryukov / detail
  5. [clang] Do not read from 'test/SemaCXX/Inputs' inside 'test/AST'

    Our integration relies on test inputs being taken from the same directory as the
    test itself. — ibiryukov / detail
  6. ARM: use target-specific SUBS node when combining cmp with cmov.

    This has two positive effects. First, using a custom node prevents
    recombination leading to an infinite loop since the output DAG is notionally a
    little more complex than the input one. Using a flag-setting instruction also
    allows the subtraction to be folded with the related comparison more easily. — Tim Northover / detail
  7. [NFC][AArch64] Split out backend features

    This patch splits backend features currently
    hidden behind architecture versions.

    For example, currently the only way to activate
    the complex numbers extension is by targeting a v8.3
    architecture, whereas after the patch this extension
    can be added separately.

    This refactoring is required by the new command-line proposal:

    Reviewers: DavidSpickett, olista01, t.p.northover

    Subscribers: kristof.beyls, bryanpkc, javed.absar, pbarrio

    Differential revision: — dnsampaio / detail
  8. [OpenCL][Sema] Improve BuildResolvedCallExpr handling of builtins

    This is a follow-up on, addressing a few issues.

    - adds a FIXME for later improvement for specific builtins: I previously only checked the OpenCL ones and ensured tests cover those.
    - fixed the CallExpr type.

    Reviewers: riccibruno

    Reviewed By: riccibruno

    Subscribers: yaxunl, Anastasia, kristina, svenvh, cfe-commits

    Differential Revision: — mantognini / detail
  9. [CMake] Add LLVM_EXTERNALIZE_DEBUGINFO_OUTPUT_DIR for custom dSYM target directory on Darwin

    Summary: When using `LLVM_EXTERNALIZE_DEBUGINFO` in LLDB, the default dSYM location for the shared library in LLDB.framework is inside the framework bundle. With `LLVM_EXTERNALIZE_DEBUGINFO_OUTPUT_DIR` we can easily fix that. I consider it a useful feature to be able to set a global output directory for external debug info (rather than having a target-specific one). Only implemented for Darwin so far.

    Reviewers: beanz, aprantl

    Reviewed By: aprantl

    Subscribers: mgorny, aprantl, #lldb, lldb-commits, llvm-commits

    Differential Revision: — stefan.graenitz / detail
  10. [RISCV] Fix test/MC/Disassembler/RISCV/invalid-instruction.txt after rL347988

    The test for [0x00 0x00] failed due to the introduction of c.unimp.

    This particular test is unnecessary now that c.unimp was defined (and is
    tested in test/MC/RISCV/rv32c-valid.s). — asb / detail
  11. [CMake] Store path to vendor-specific headers in clang-headers target property

    LLDB.framework wants a copy of these headers. With this change LLDB can easily glob for the list of files:
    get_target_property(clang_include_dir clang-headers RUNTIME_OUTPUT_DIRECTORY)
    file(GLOB_RECURSE clang_vendor_headers RELATIVE ${clang_include_dir} "${clang_include_dir}/*")

    By default `RUNTIME_OUTPUT_DIRECTORY` is unset for custom targets like `clang-headers`.

    Reviewers: aprantl, JDevlieghere, davide, friss, dexonsmith

    Reviewed By: JDevlieghere

    Subscribers: mgorny, #lldb, cfe-commits, llvm-commits

    Differential Revision: — stefan.graenitz / detail
  12. [llvm-dwarfdump] - Stop printing the bogus empty section name on invalid dwarf.

    When there is no .debug_addr section for some reason,
    llvm-dwarfdump would print the bogus empty section name when dumping ranges
    in .debug_info:

    DW_AT_ranges [DW_FORM_rnglistx]   (indexed (0x0) rangelist = 0x00000004
        [0x0000000000000000, 0x0000000000000001) ""
        [0x0000000000000000, 0x0000000000000002) "")

    That happens because the code uses 0 (zero) as the default section index.
    It should use -1ULL instead, because 0 is technically a valid section index
    in ELF, while -1ULL is a special constant that means "no section available".

    This is mostly a fix for the overall correctness/safety of the code,
    but a test case is provided too.

    Differential revision: — grimar / detail
  13. [ARM][MC] Move information about variadic register defs into tablegen

    Currently, variadic operands on an MCInst are assumed to be uses,
    because they come after the defs. However, this is not always the case,
    for example, the Arm/Thumb LDM instructions write to a variable number of registers.

    This adds a property of instruction definitions which can be used to
    mark variadic operands as defs. This only affects MCInst, because
    MachineInstruction already tracks use/def per operand in each instance
    of the instruction, so can already represent this.

    This property can then be checked in MCInstrDesc, allowing us to remove
    some special cases in ARMAsmParser::isITBlockTerminator.

    Differential revision: — olista01 / detail
  14. [ARM][Asm] Debug trace for the processInstruction loop

    In the Arm assembly parser, we first match an instruction, then call
    processInstruction to possibly change it to a different encoding, to
    match rules in the architecture manual which can't be expressed by the
    table-generated matcher.

    This adds debug printing so that this process is visible when using the
    -debug option.

    To support this, I've added a new overload of MCInst::dump_pretty which
    takes the opcode name as a StringRef, since we don't have an InstPrinter
    instance in the assembly parser. Instead, we can get the same
    information directly from the MCInstrInfo.

    Differential revision: — olista01 / detail
  15. [KMSAN] Enable -msan-handle-asm-conservative by default

    This change enables conservative assembly instrumentation in KMSAN builds
    by default.
    It's still possible to disable it with -msan-handle-asm-conservative=0
    if something breaks. It's now impossible to enable conservative
    instrumentation for userspace builds, but it's not used anyway. — glider / detail
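Entry 12 of build #83 (llvm-dwarfdump) replaces 0 with -1ULL as the "no section" default. A small Python model of why a sentinel outside the valid index range matters; `UNDEF_SECTION_INDEX` and `format_range` are hypothetical names, and the output only imitates the dump format quoted in that entry.

```python
# Hypothetical model: 0 is a real ELF section index (the null section), so it
# cannot double as "no section"; the all-ones 64-bit value (-1ULL) can.
UNDEF_SECTION_INDEX = (1 << 64) - 1


def format_range(low, high, section_index, section_names):
    text = f"[0x{low:016x}, 0x{high:016x})"
    # Only print a section name when a section was actually resolved.
    if section_index != UNDEF_SECTION_INDEX:
        text += f' "{section_names[section_index]}"'
    return text


names = [""]  # index 0: the unnamed null section
print(format_range(0, 1, UNDEF_SECTION_INDEX, names))  # [0x0000000000000000, 0x0000000000000001)
print(format_range(0, 1, 0, names))  # [0x0000000000000000, 0x0000000000000001) ""
```

With 0 as the default, the second call's bogus `""` is what a missing section would print.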

#82 (Dec 3, 2018 1:50:11 AM)

  1. [GlobalISel] Fix test irtranslator-stackprotect-check.ll

    Fix for commit r347862. Use correct AArch64 triple in test
    CodeGen/AArch64/GlobalISel/irtranslator-stackprotect-check.ll. — petr.pavlu / detail

#80 (Dec 3, 2018 12:42:22 AM)

  1. [ARM] FP16: support vld1.16 for vector loads with post-increment

    Differential Revision: — sjoerdmeijer / detail

#72 (Dec 2, 2018 9:15:28 PM)

  1. [PowerPC] Fix inconsistent ImmMustBeMultipleOf for same instruction

    There are 4 instructions which have inconsistent ImmMustBeMultipleOf in the
    function PPCInstrInfo::instrHasImmForm, they are LFS, LFD, STFS, STFD.
    These four instructions should set the ImmMustBeMultipleOf to 1 instead of 4.

    Reviewed By: steven.zhang

    Differential Revision: — zhangkang / detail
  2. [NFC] [PowerPC] add a routine in PPCTargetLowering to determine if a global is accessed as got-indirect or not.

    In theory, we should let the PPC target determine how to lower the TOC Entry for globals.
    And the PPCTargetLowering requires this query to do some optimization for TOC_Entry.

    Differential Revision: — qshanz / detail

#65 (Dec 2, 2018 3:19:08 PM)

  1. [gn build] Fix cosmetic bug in

    Before, #cmakedefine FOO resulted in #define FOO  with a trailing space if FOO
    was set to something truthy. Make it so that it's just #define FOO without a
    trailing space.

    No functional difference.

    Differential Revision: — nico / detail
  2. [gn build] Slightly simplify write_cmake_config.

    Before, the script had a bunch of special cases for #cmakedefine and
    #cmakedefine01 and then did general variable substitution. Now, the script
    always does general variable substitution for all lines and handles the special
    cases afterwards.

    This has no observable effect for the inputs we use, but is easier to explain
    and slightly easier to implement.

    Also add a link to CMake's configure_file() in the docstring.

    (The new behavior doesn't quite match CMake on lines like #cmakedefine ${FOO},
    but nobody does that.)

    Differential Revision: — nico / detail
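The two gn build entries above describe how write_cmake_config handles `#cmakedefine`. Below is a hypothetical Python sketch of the new ordering (general `${VAR}` / `@VAR@` substitution on every line first, `#cmakedefine` / `#cmakedefine01` handled afterwards) and of the no-trailing-space fix. It is not the actual script, and its notion of CMake falsiness is a simplification.

```python
import re

# Simplified stand-in for CMake's false constants (the real rules have more cases).
FALSY = {"", "0", "OFF", "NO", "FALSE", "N", "IGNORE", "NOTFOUND"}


def is_truthy(value):
    v = value.upper()
    return v not in FALSY and not v.endswith("-NOTFOUND")


def configure_line(line, values):
    # 1) General substitution of ${VAR} and @VAR@ on every line.
    line = re.sub(r"\$\{(\w+)\}|@(\w+)@",
                  lambda m: values.get(m.group(1) or m.group(2), ""), line)
    # 2) Special cases afterwards.
    m = re.match(r"#cmakedefine01\s+(\w+)\s*$", line)
    if m:
        var = m.group(1)
        return f"#define {var} {1 if is_truthy(values.get(var, '')) else 0}"
    m = re.match(r"#cmakedefine\s+(\w+)(?:\s+(.*))?$", line)
    if m:
        var, rest = m.group(1), m.group(2)
        if not is_truthy(values.get(var, "")):
            return f"/* #undef {var} */"
        # Emit "#define FOO" with no trailing space when there is no value.
        return f"#define {var} {rest}" if rest else f"#define {var}"
    return line


values = {"FOO": "1", "BAR": "OFF", "VER": "7"}
for src in ["#cmakedefine FOO", "#cmakedefine BAR", "#cmakedefine01 FOO",
            "#define VERSION ${VER}"]:
    print(repr(configure_line(src, values)))
```

Because substitution runs first, a line such as `#cmakedefine FOO @VER@` becomes `#define FOO 7` without a dedicated special case.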

#63 (Dec 2, 2018 2:16:13 PM)

  1. [gn build] Add build files for llvm/lib/Analysis and llvm/lib/ProfileData

    Differential Revision: — nico / detail

#60 (Dec 2, 2018 11:57:47 AM)

  1. [X86] Add a DAG combine to turn stores of vXi1 on pre-avx512 targets into a bitcast and a store of a iX scalar. — ctopper / detail
  2. [X86] Fix bad comment. NFC — ctopper / detail

#58 (Dec 2, 2018 9:42:18 AM)

  1. Replace FullComment member being visited with parameter

    Reviewers: aaron.ballman

    Subscribers: cfe-commits

    Differential Revision: — steveire / detail
  2. Extend the CommentVisitor with parameter types

    This has precedent in the StmtVisitor.  This change will make it
    possible to clean up the comment handling in ASTDumper.

    Reviewers: aaron.ballman

    Subscribers: cfe-commits

    Differential Revision: — steveire / detail
  3. Remove unnecessary methods

    The base class calls VisitExpr — steveire / detail
  4. [test] Fix use of 'sort -b' in SimpleLoopUnswitch on NetBSD

    Add '-k 1' to 'sort -b' calls in SimpleLoopUnswitch tests, as required
    for sort implementation on NetBSD.  The '-b' modifier is ineffective
    if specified without any key.  Per the manpage:

      Note that the -b option has no effect unless key fields are specified.

    Differential Revision: — mgorny / detail
  5. [test] Fix ScalarEvolution test to allow __func__ with prototype

    Fix ScalarEvolution/solve-quadratic.ll test to account for __func__
    output listing the complete function prototype rather than just its
    name, as it does on NetBSD.

    Example Linux output:

      GetQuadraticEquation: addrec coeff bw: 4
      GetQuadraticEquation: equation -2x^2 + -2x + -4, coeff bw: 5, multiplied by 2

    Example NetBSD output:

      llvm::Optional<std::tuple<llvm::APInt, llvm::APInt, llvm::APInt, llvm::APInt, unsigned int> > GetQuadraticEquation(const llvm::SCEVAddRecExpr*): addrec coeff bw: 4
      llvm::Optional<std::tuple<llvm::APInt, llvm::APInt, llvm::APInt, llvm::APInt, unsigned int> > GetQuadraticEquation(const llvm::SCEVAddRecExpr*): equation -2x^2 + -2x + -4, coeff bw: 5, multiplied by 2

    Differential Revision: — mgorny / detail
  6. [test] Fix BugPoint/compile-custom.ll to use detected python exec

    Spawn the custom compile command in BugPoint/compile-custom.ll via
    %python rather than relying on implicit 'env python' shebang, in order
    to fix it on systems that don't have 'python' executable such as NetBSD.

    Differential Revision: — mgorny / detail
  7. Fix whitespace — steveire / detail

#57 (Dec 2, 2018 8:43:02 AM)

  1. Add dump tests for ArrayInitLoopExpr and ArrayInitIndexExpr — steveire / detail
  2. [ValueTracking] Support funnel shifts in computeKnownBits()

    If the shift amount is known, we can determine the known bits of the
    output based on the known bits of two inputs.

    This is essentially the same functionality as implemented in D54869,
    but for ValueTracking rather than InstCombine SimplifyDemandedBits.

    Differential Revision: — nikic / detail
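Entry 2 above adds funnel-shift support to computeKnownBits(). A hypothetical Python model of the constant-shift case for fshl (fshr is symmetric), with known bits represented as a pair of (known-zero, known-one) masks; it illustrates the idea, not LLVM's C++ implementation.

```python
def fshl(x, y, z, bw):
    """Funnel shift left: the high bw bits of (x:y) << (z mod bw)."""
    mask = (1 << bw) - 1
    s = z % bw
    if s == 0:
        return x & mask
    return ((x << s) | ((y & mask) >> (bw - s))) & mask


def known_bits_fshl(x_known, y_known, z, bw):
    """Known bits of fshl(x, y, z) for a constant z.

    x_known and y_known are (known_zero, known_one) masks. With a constant shift
    amount every output bit comes from exactly one input bit, so applying the
    same funnel shift to each mask moves its known bits to their output positions.
    """
    zero = fshl(x_known[0], y_known[0], z, bw)
    one = fshl(x_known[1], y_known[1], z, bw)
    return zero, one


# i8: low nibble of x known zero, y known to be all ones, shift by 4.
print([hex(m) for m in known_bits_fshl((0x0F, 0x00), (0x00, 0xFF), 4, 8)])  # ['0xf0', '0xf']
```

In that example the result is fully known: the high nibble comes from x's known-zero low nibble and the low nibble from y's known-one high nibble.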

#56 (Dec 2, 2018 6:16:33 AM)

  1. [SelectionDAG] fold constant with undef vector per element

    This makes the SDAG behavior consistent with the way we do this in IR.
    It's possible that we were getting the wrong answer before. For example,
    'xor undef, undef --> 0' but 'xor undef, C --> undef'.

    But the most practical improvement is likely as shown in the tests here -
    for FP, we were overconstraining undef lanes to NaN, and that can prevent
    vector simplifications/narrowing (see D51553). — spatel / detail
  2. [DAGCombiner] guard against an oversized shift crash

    This change prevents the crash noted in the post-commit comments
    for rL347478:

    We can't guarantee that an oversized shift amount is folded away,
    so we have to check for it.

    Note that I committed an incomplete fix for that crash with:

    But as discussed here:
    ...we have to try harder.

    So I'm not sure how to expose the bug now (and apparently no fuzzers have found
    a way yet either).

    On the plus side, we have discovered that we're missing real optimizations by
    not simplifying nodes sooner, so the earlier fix still has value, and there's
    likely more value in extending that so we can simplify more opcodes and simplify
    when doing RAUW and/or putting nodes on the combiner worklist.

    Differential Revision: — spatel / detail
  3. [ValueTracking] add helper function for testing implied condition; NFCI

    We were duplicating code around the existing isImpliedCondition() that
    checks for a predecessor block/dominating condition, so make that a
    wrapper call. — spatel / detail
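Entry 1 of build #56 folds vector constants lane by lane. A hypothetical Python model of the per-element rules it quotes for xor, using None to stand for an undef lane; it is not the SelectionDAG code and only covers xor.

```python
UNDEF = None  # stands for an undef lane in this model


def fold_xor_lane(a, b):
    if a is UNDEF and b is UNDEF:
        return 0  # 'xor undef, undef --> 0'
    if a is UNDEF or b is UNDEF:
        return UNDEF  # 'xor undef, C --> undef'
    return a ^ b


def fold_xor_vector(lhs, rhs):
    # Fold each lane independently instead of deciding for the whole vector.
    return [fold_xor_lane(a, b) for a, b in zip(lhs, rhs)]


print(fold_xor_vector([1, UNDEF, UNDEF, 6], [3, 5, UNDEF, UNDEF]))  # [2, None, 0, None]
```

Keeping a lane undef, rather than forcing it to a concrete value such as NaN for FP, is what leaves later simplifications free to choose that lane's value.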

#44 (Dec 2, 2018 12:18:03 AM)

  1. [X86] Simplify LowerBITCAST code for v2i32/v4i16/v8i8/i64->mmx/i64/f64 bitcast.

    Previously this code generated its own extracts and build_vector. But we can use a simpler concat_vectors or scalar_to_vector operation and let type legalization do additional legalization of those operations. — ctopper / detail

#39 (Dec 1, 2018 9:53:41 PM)

  1. [X86] Add custom type legalization for v2i32/v4i16/v8i8->mmx bitcasts to avoid a store/load to/from the stack.

    Widen the input to a 128-bit vector by padding with undef elements. Then use a movdq2q to convert from an xmm register to an mmx register. — ctopper / detail
  2. [X86] Custom type legalize v2i32/v4i16/v8i8->i64 bitcasts in 64-bit mode similar to what's done when the destination is f64.

    The generic legalizer will fall back to a stack spill that uses a truncating store. That store will get expanded into a shuffle and non-truncating store on pre-avx512 targets. Once that happens, the stack store/load pair will be combined away, leaving behind the shuffle and bitcasts. On avx512 targets the truncating store is legal, so it doesn't get folded away.

    By custom legalizing it we can avoid this churn and maybe produce better code. — ctopper / detail

#29 (Dec 1, 2018 2:42:07 PM)

  1. OpenCL: Improve vector printf warnings

    The vector modifier is considered separate, so
    don't treat it as a conversion specifier.

    This is still not warning on some cases, like
    using a type that isn't a valid vector element.

    Fixes bug 39652 — arsenm / detail

#28 (Dec 1, 2018 2:16:06 PM)

  1. OpenCL: Extend argument promotion rules to vector types

    The spec is ambiguous on whether vector types are allowed to be
    implicitly converted. The only legal context in OpenCL where I think
    this can be used is printf, where it seems necessary. — arsenm / detail
  2. [X86] Add vXi8 division/remainder by non-splat constant test cases to prepare for an upcoming patch. — ctopper / detail

#27 (Dec 1, 2018 1:48:53 PM)

  1. [MachineOutliner][AArch64] Improve checks for stack instructions

    If we know that we'll definitely save LR to a register, there's no reason to
    pre-check whether or not a stack instruction is unsafe to fix up.

    This makes it so that we check for that condition before mapping instructions.

    This allows us to outline more, since we don't pessimise as many instructions.

    Also update some tests, since we outline more. — paquette / detail
  2. Replace w16/w17 in machine-outliner.mir with w11/w12

    These registers should not be used here, since they are intra-procedure-call
    scratch registers in AArch64. — paquette / detail

#24 (Dec 1, 2018 11:36:52 AM)

  1. [X86] Don't use zero_extend_vector_inreg for mulhu lowering with sse 4.1

    Summary: With sse4.1 we use two zero_extend_vector_inreg and a pshufd to expand the v16i8 input into two v8i16 vectors for the multiply. That's 3 shuffles to extend one operand. The other operand is usually constant as this is mostly used by division by constant optimization. Pre sse4.1 we use a punpckhbw and a punpcklbw with a zero vector. That's two shuffles and an xor and a copy due to tied register constraints. That seems maybe better than the 3 shuffles. With AVX we avoid the copy so that's obviously better.

    Reviewers: spatel, RKSimon

    Reviewed By: RKSimon

    Subscribers: llvm-commits

    Differential Revision: — ctopper / detail
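The summary of entry 1 in build #24 concerns how the v16i8 operands of a multiply-high are widened to v8i16. As a scalar Python model of what each lane computes (not of the shuffle sequences), an unsigned 8-bit multiply-high zero-extends both operands to 16 bits, multiplies, and keeps the high byte. The example uses it for an unsigned division by 3, the kind of division-by-constant lowering the summary mentions; the constant 171 (= ceil(2^9 / 3)) and the extra shift by 1 are chosen for this illustration and are not taken from the patch.

```python
def mulhu_u8(a, b):
    """High byte of the 16-bit product of two unsigned 8-bit values."""
    assert 0 <= a <= 0xFF and 0 <= b <= 0xFF
    # The product of two zero-extended 8-bit values always fits in 16 bits.
    return (a * b) >> 8


def udiv3_u8(x):
    # x // 3 == (x * 171) >> 9 for every 8-bit x, i.e. mulhu(x, 171) >> 1.
    return mulhu_u8(x, 171) >> 1


print([udiv3_u8(x) for x in (0, 2, 3, 200, 255)])  # [0, 0, 1, 66, 85]
```

Since one operand is usually a constant, only the other operand needs the runtime zero-extension that the patch is choosing shuffles for.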

#20 (Dec 1, 2018 6:54:28 AM)

  1. [TTI] Reduction costs only need to include a single extract element cost (REAPPLIED)

    We were adding the entire scalarization extraction cost for reductions, which returns the total cost of extracting every element of a vector type.

    For reductions we don't need to do this - we just need to extract the 0'th element after the reduction pattern has completed.

    Fixes PR37731

    Rebased and reapplied after being reverted in rL347541 due to PR39774 - which was fixed by D54955/rL347759 and D55017/rL347997

    Differential Revision: — rksimon / detail

#16 (Dec 1, 2018 4:57:59 AM)

  1. [AMDGPU] Split 64-Bit XNOR to 64-Bit NOT/XOR

    The identity ~(x ^ y) == (~x ^ y) == (x ^ ~y) allows XNOR (XOR/NOT) to turn into NOT/XOR. Handling this case with its own split means we can make the NOT remain in the scalar unit. Previously, we split 64-bit XNOR into two 32-bit XNOR, then lowered. Now, we get three instructions (s_not, v_xor, v_xor) rather than four in the case where either of the sources is a scalar 64-bit.

    Add test cases to xnor.ll to attempt XNOR Vx, Sy and XNOR Sx, Vy. Also add a test that uses the opposite identity such that (~x ^ y) on the scalar unit (or vector for gfx906) can generate XNOR. This already worked, but I didn't see a test for it.

    Differential: — gsellers / detail
  2. [llvm-readobj] Improve dynamic section iteration NFC. — Xing / detail
  3. [SelectionDAG] Improve SimplifyDemandedBits to SimplifyDemandedVectorElts simplification

    D52935 introduced the ability for SimplifyDemandedBits to call SimplifyDemandedVectorElts through BITCASTs if the demanded bit mask entirely covered the sub element.

    This patch relaxes this to demanding an element if we need any bit from it.

    Differential Revision: — rksimon / detail
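The identity in entry 1 of build #16 can be checked with a small Python model of 64-bit values. `xnor64_split` is a hypothetical model of the new lowering: one NOT on the scalar operand followed by two 32-bit XORs, mirroring the s_not, v_xor, v_xor sequence named in the entry. It models only the arithmetic, not register classes.

```python
MASK64 = (1 << 64) - 1
MASK32 = (1 << 32) - 1


def not64(x):
    return ~x & MASK64


def xnor64(x, y):
    return not64(x ^ y)


def xnor64_split(x_scalar, y_vector):
    # One 64-bit NOT on the scalar operand (stays in the scalar unit) ...
    nx = not64(x_scalar)
    # ... then the XOR is done as two independent 32-bit halves.
    lo = (nx ^ y_vector) & MASK32
    hi = ((nx >> 32) ^ (y_vector >> 32)) & MASK32
    return (hi << 32) | lo


x, y = 0x0123456789ABCDEF, 0xFFFFFFFF00000000
print(hex(xnor64(x, y)), hex(xnor64_split(x, y)))  # 0x123456776543210 0x123456776543210
```

Moving the NOT onto whichever operand is scalar is what lets the result use three instructions instead of two full 32-bit XNOR expansions.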
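Entry 3 of build #16 relaxes the mapping from demanded bits to demanded vector elements across a BITCAST. A hypothetical Python sketch of the relaxed rule (an element is demanded if any of its bits is demanded), assuming a little-endian layout in which element i occupies bits [i*elt_bits, (i+1)*elt_bits) of the bitcast scalar; the function name is illustrative, not LLVM's.

```python
def demanded_elts_from_bits(demanded_bits, elt_bits, num_elts):
    """Return the indices of vector elements that contain any demanded bit."""
    elt_mask = (1 << elt_bits) - 1
    return [
        i
        for i in range(num_elts)
        # Element i is needed if at least one of its bits is demanded.
        if (demanded_bits >> (i * elt_bits)) & elt_mask
    ]


# A v4i8 bitcast to i32 where only bit 0 and bits 8-15 are demanded:
print(demanded_elts_from_bits(0x0000FF01, 8, 4))  # [0, 1]
```

Element 0 counts as demanded even though only one of its eight bits is needed, which a rule requiring full coverage of the element would not allow.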

#15 (Dec 1, 2018 4:12:38 AM)

  1. [InstCombine] Support ssub.sat canonicalization for non-splats

    Extend ssub.sat(X, C) -> sadd.sat(X, -C) canonicalization to also
    support non-splat vector constants. This is done by generalizing
    the implementation of the isNotMinSignedValue() helper to return
    true for constants that are non-splat, but don't contain any
    signed min elements.

    Differential Revision: — nikic / detail
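The ssub.sat canonicalization in build #15 is only valid when negating C does not overflow. A hypothetical 8-bit Python model checks the rewrite element by element for a non-splat constant. `can_canonicalize` is an analogue of the isNotMinSignedValue() check described in the entry, not the helper itself.

```python
BITS = 8
SMIN, SMAX = -(1 << (BITS - 1)), (1 << (BITS - 1)) - 1


def wrap(v):
    """Reduce an integer to a signed BITS-bit value (two's complement wraparound)."""
    v &= (1 << BITS) - 1
    return v - (1 << BITS) if v > SMAX else v


def saturate(v):
    return max(SMIN, min(SMAX, v))


def ssub_sat(x, c):
    return saturate(x - c)


def sadd_sat(x, c):
    return saturate(x + c)


def can_canonicalize(consts):
    # -C is only representable when no element is the signed minimum, so a
    # non-splat constant is fine as long as none of its elements is SMIN.
    return all(c != SMIN for c in consts)


consts = [1, -5, 127]  # a non-splat vector constant
assert can_canonicalize(consts)
assert all(ssub_sat(x, c) == sadd_sat(x, wrap(-c))
           for x in range(SMIN, SMAX + 1) for c in consts)

# With SMIN the rewrite would be wrong: -(-128) wraps back to -128.
print(ssub_sat(0, SMIN), sadd_sat(0, wrap(-SMIN)))  # 127 -128
```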

#13 (Dec 1, 2018 1:14:33 AM)

  1. Correct indentation. — void / detail

#12 (Dec 1, 2018 12:50:25 AM)

  1. Specify constant context in constant emitter

    The constant emitter may need to evaluate the expression in a constant context.
    For example, global initializer lists. — void / detail

#11 (Dec 1, 2018 12:01:36 AM)

  1. [X86] Remove stale FIXME from test case. NFC

    This was fixed in r346581. I just forgot to remove it. — ctopper / detail

#9 (Nov 30, 2018 9:53:56 PM)

  1. [ThinLTO] Allow importing of functions with var args

    Follow up to D54270, which allowed importing of var args functions
    unless they called va_start. As pointed out in the post-commit comments
    on that patch, the inliner can handle functions that call va_start in
    certain situations as well. Go ahead and enable importing of all var
    args functions. Measurements on a large binary show that this increases
    imports and binary size by an insignificant amount.

    Reviewers: davidxl

    Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, llvm-commits

    Differential Revision: — tejohnson / detail

#8 (Nov 30, 2018 9:08:44 PM)

  1. [RISCV] Remove RV64I SLLW/SRLW/SRAW patterns and add new test cases

    As noted by Eli Friedman,
    the RV64I shift patterns for SLLW/SRLW/SRAW make some incorrect assumptions.
    SRAW assumed that (sext_inreg foo, i32) could only be produced when
    sign-extending an i32. However, it can be produced by input such as:

    define i64 @tricky_ashr(i64 %a, i64 %b) {
      %1 = shl i64 %a, 32
      %2 = ashr i64 %1, 32
      %3 = ashr i64 %2, %b
      ret i64 %3
    }

    It's important not to select sraw in the above case, because sraw only uses
    the lower 5 bits of the shift amount, while a shift of 32-63 would be valid.

    Similarly, the patterns for srlw assumed (and foo, 0xffffffff) would only be
    produced when zero-extending a value that was originally i32 in LLVM IR. This
    is obviously incorrect.

    This patch removes the SLLW/SRLW/SRAW shift patterns for the time being and
    adds test cases that would demonstrate a miscompile if the incorrect patterns
    were re-added. — asb / detail
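The miscompile described in build #8 can be reproduced with a Python model of the two operations: the IR function `tricky_ashr` from the entry, and RV64 SRAW (only the low 5 bits of the shift amount are used, and the 32-bit result is sign-extended to 64 bits). The model is only an illustration; shift amounts of 64 or more, which are poison for ashr, are not modeled.

```python
MASK64 = (1 << 64) - 1


def to_signed(v, bits):
    """Interpret the low `bits` bits of v as a two's-complement integer."""
    v &= (1 << bits) - 1
    return v - (1 << bits) if v >> (bits - 1) else v


def tricky_ashr(a, b):
    # %1 = shl i64 %a, 32 ; %2 = ashr i64 %1, 32  => sign-extend the low 32 bits.
    x = to_signed(a, 32)
    # %3 = ashr i64 %2, %b for b in 0..63 (Python's >> is arithmetic on ints).
    return (x >> b) & MASK64


def sraw(rs1, rs2):
    # RV64 SRAW: arithmetic shift of the low 32 bits of rs1 by rs2[4:0],
    # with the 32-bit result sign-extended to 64 bits.
    return (to_signed(rs1, 32) >> (rs2 & 31)) & MASK64


print(hex(tricky_ashr(0x1234, 32)), hex(sraw(0x1234, 32)))  # 0x0 0x1234
```

The two agree for shift amounts 0-31 and diverge for 32-63, which is exactly the case the removed pattern ignored.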

#4 (Nov 30, 2018 6:42:02 PM)

  1. [Basic] Move DiagnosticsEngine::dump from .h to .cpp

    The two LLVM_DUMP_METHOD methods have an undefined reference to clang::DiagnosticsEngine::DiagStateMap::dump.

    tools/clang/tools/extra/clangd/benchmarks/IndexBenchmark links in
    clangDaemon but does not link in clangBasic explicitly, which causes a
    linker error "undefined symbol" in !NDEBUG + -DBUILD_SHARED_LIBS=on builds.

    Move LLVM_DUMP_METHOD methods to .cpp to fix IndexBenchmark. They should
    be unconditionally defined as they are also used by non-dump-method #pragma clang __debug diag_mapping. — maskray / detail
  2. [projects] Use add_llvm_external_project for implicit projects

    This allows disabling implicit projects via the LLVM_TOOL_*_BUILD
    variables, similar to how implicit tools can be disabled. They'll still
    be enabled by default, since add_llvm_external_project defaults the
    LLVM_TOOL_*_BUILD variables to ON for in-tree implicit projects.

    Differential Revision: — smeenai / detail
  3. [X86][LoopVectorize] Replace -mcpu=skylake-avx512 with -mattr=avx512f in some tests that failed when experimenting with defaulting to -mprefer-vector-width=256 for skylake-avx512. — ctopper / detail

#3 (Nov 30, 2018 5:35:09 PM)

  1. Relax test to also work on Windows. — Adrian Prantl / detail
  2. Honor -fdebug-prefix-map when creating function names for the debug info.

    This adds a callback to PrintingPolicy to allow CGDebugInfo to remap
    file paths according to -fdebug-prefix-map. Otherwise the debug info
    (particularly function names for C++ lambdas) may contain paths that
    should have been remapped in the debug info.


    Differential Revision: — Adrian Prantl / detail
  3. Use RequireNullTerminator=false in identify_magic.

    identify_magic does not need the file to be null terminated.  Passing
    true here causes the file reading code to decide not to use mmap in
    some rare cases (which happen to be true 100% of the time in PDB files)
    which can lead to very large files failing to load.  Since it was
    probably just an accident that we were passing true here (since it is
    the default function parameter), this should be strictly an improvement. — zturner / detail
  4. [lit] Add a generic build script with a lit substitution.

    This adds a script called as well as a lit substitution
    called %build that we can use to invoke it.  The idea is that
    this allows a lit test to build test inferiors without having
    to worry about architecture / platform specific differences,
    command line syntax, finding / configuring a proper toolchain,
    and other issues.  They can simply write something like:

    %build --arch=32 -o %t.exe %p/Inputs/foo.cpp

    and it will just work.  This paves the way for being able to
    run lit tests with multiple configurations, platforms, and
    compilers with a single test.

    Differential Revision: — zturner / detail
  5. [NVPTX] Add lowering of i128 numbers as struct fields

    Addition to D34555 - override VTs computation with ComputePTXValueVTs
    for struct fields.

    Author: Denys Zariaiev

    Differential Revision: — tra / detail
  6. [X86] Replace '-mcpu=skx' with -mattr=avx512f or -mattr=avx512bw in interleave/strided load/store cost model tests. — ctopper / detail
  7. [windows] Fix two minor bugs on Windows

    1. In ProcessWindows, if we fail to allocate memory, we need to return LLDB_INVALID_ADDRESS rather than 0 or nullptr, as that is the invalid address that LLDB looks for.
    2. In RegisterContextWindows, in ReadAllRegisterValues, always create a new buffer. This is what the other platforms do, and data_sp is always null in all tested scenarios on Windows as well. — stella.stamenova / detail
  8. [gn build] Add action to generate VCSRevision.h and use it to add llvm/lib/Object/

    Differential Revision: — nico / detail

#2 (Nov 30, 2018 3:51:41 PM)

  1. Revert "Revert r347417 "Re-Reinstate 347294 with a fix for the failures.""

    It seems the two failing tests can be simply fixed after r348037

    Fix 3 cases in Analysis/builtin-functions.cpp
    Delete the bad CodeGen/builtin-constant-p.c for now — maskray / detail
  2. [codeview] Remove dead macros for codeview record serialization, NFC

    These weren't needed when we went to the yaml IO style of serialization,
    which has "mapOptional". — rnk / detail
  3. LegacyDivergenceAnalysis: fix uninitialized value

    Change-Id: I014502e431a68f7beddf169f6a3d19dac5dd2c26 — nha / detail
  4. AMDGPU: Divergence-driven selection of scalar buffer load intrinsics

    Moving SMRD to VMEM in SIFixSGPRCopies is rather bad for performance if
    the load is really uniform. So select the scalar load intrinsics directly
    to either VMEM or SMRD buffer loads based on divergence analysis.

    If an offset happens to end up in a VGPR -- either because a floating
    point calculation was involved, or due to other remaining deficiencies
    in SIFixSGPRCopies -- we use v_readfirstlane.

    There is some unrelated churn in tests since we now select MUBUF offsets
    in a unified way with non-scalar buffer loads.

    Change-Id: I170e6816323beb1348677b358c9d380865cd1a19

    Reviewers: arsenm, alex-t, rampitec, tpr

    Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, llvm-commits

    Differential Revision: — nha / detail
  5. AMDGPU: Fix various issues around the VirtReg2Value mapping

    The VirtReg2Value mapping is crucial for getting consistently
    reliable divergence information into the SelectionDAG. This
    patch fixes a bunch of issues that lead to incorrect divergence
    info and introduces tight assertions to ensure we don't regress:

    1. VirtReg2Value is generated lazily; there were some cases where
       a lookup was performed before all relevant virtual registers were
       created, leading to an out-of-sync mapping. Those cases were:

      - Complex code to lower formal arguments that generated CopyFromReg
        nodes from live-in registers (fixed by never querying the mapping
        for live-in registers).

      - Code that generates CopyToReg for formal arguments that are used
        outside the entry basic block (fixed by never querying the
        mapping for Register nodes, which don't need the divergence info).

    2. For complex values that are lowered to a sequence of registers,
       all registers must be reflected in the VirtReg2Value mapping.

    I am not adding any new tests, since I'm not actually aware of any
    bugs that these problems are causing with trunk as-is. However,
    I recently added a test case (in r346423) which fails when D53283 is
    applied without this change. Also, the new assertions should provide
    most of the effective test coverage.

    There is one test change in sdwa-peephole.ll. The underlying issue
    is that since the divergence info is now correct, the DAGISel will
    select V_OR_B32 directly instead of S_OR_B32. This leads to an extra
    COPY which affects the behavior of MachineLICM in a way that ends up
    with the S_MOV_B32 with the constant in a different basic block than
    the V_OR_B32, which is presumably what defeats the peephole.

    Reviewers: alex-t, arsenm, rampitec

    Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

    Differential Revision: — nha / detail
  6. [DA] GPUDivergenceAnalysis for unstructured GPU kernels

    This is patch #3 of the new DivergenceAnalysis


    The GPUDivergenceAnalysis is intended to eventually supersede the existing
    LegacyDivergenceAnalysis. The existing LegacyDivergenceAnalysis produces
    incorrect results on unstructured Control-Flow Graphs:


    This patch adds the option -use-gpu-divergence-analysis to the
    LegacyDivergenceAnalysis to turn it into a transparent wrapper for the GPUDivergenceAnalysis.

    Reviewers: nhaehnle

    Reviewed By: nhaehnle

    Subscribers: jholewinski, jvesely, jfb, llvm-commits, alex-t, sameerds, arsenm, nhaehnle

    Differential Revision: — nha / detail
  7. [x86] add tests for undef + partial undef constant folding; NFC

    Keep this file sync'd with the instsimplify version (rL348045). — spatel / detail
  8. [X86] Split skylake-avx512 run lines in SLP vectorizer tests to cover -mprefer-vector-width=256 and -mprefer-vector-width=512.

    This will make these tests immune if we ever change the default behavior of -march=skylake-avx512 to prefer 256 bit vectors. — ctopper / detail
  9. [InstSimplify] add tests for undef + partial undef constant folding; NFC

    These tests should probably go under a separate test file because they
    should fold with just -constprop, but they're similar to the scalar
    tests already in here. — spatel / detail