1. [MC] Enable .file support on COFF and diagnose it on unsupported targets Summary: The "single parameter" .file directive appears to be an ELF-only feature that is intended to insert the main source filename into the string table. I noticed that if you assemble an ELF .s file for COFF, typically it will assert right away on a .file directive near the top of the file. My first change was to make this emit a proper error in the asm parser so that we don't assert so easily. However, COFF actually does have some support for this directive, and if you emit an object file, llvm-mc does not assert. When emitting a COFF object, MC will take those file names and create "debug" symbol table entries for them. I'm not familiar with these kinds of symbol table entries, and I'm not aware of any users of them, but @compnerd added them a while ago. They don't introduce absolute paths, and most main source file paths are short enough that this extra entry shouldn't cause any problems, so I enabled the flag in MCAsmInfoCOFF that indicates that it's supported. This has the side effect of adding an extra debug symbol to every object produced by clang, which is a pretty big functional change. My question is, should we keep the functionality or remove it in the name of symbol table minimalism? Reviewers: mstorsjo, compnerd Subscribers: hiraditya, compnerd, llvm-commits Differential Revision:
  2. Silence warning in assert introduced in rL349973. Subscribers: llvm-commits Differential Revision:
  3. [llvm] API for encoding/decoding DWARF discriminators. Summary: Added a pair of APIs for encoding/decoding the 3 components of a DWARF discriminator described in the base discriminator, the duplication factor (useful in profile-guided optimization) and the copy index (used to identify copies of code in cases like loop unrolling). The encoding packs 3 unsigned values in 32 bits. This CL addresses 2 issues: - communicates overflow back to the user - supports encoding all 3 components together. Current APIs assume a sequencing of events. For example, creating a new discriminator based on an existing one by changing the base discriminator was not supported. Reviewers: davidxl, danielcdh, wmi, dblaikie Reviewed By: dblaikie Subscribers: zzheng, dmgreen, aprantl, JDevlieghere, llvm-commits Differential Revision:
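To illustrate entry 3, here is a minimal sketch of packing three unsigned components into 32 bits while reporting overflow to the caller. The fixed 12/12/8-bit layout and the names `encodeDiscriminator`, `decodeDiscriminator`, and `withBase` are assumptions made for this example; they are not LLVM's actual encoding or API.

```cpp
#include <cstdint>
#include <optional>
#include <tuple>

// Hypothetical fixed-width layout (NOT LLVM's real encoding):
// bits 0-11 base discriminator, bits 12-23 duplication factor,
// bits 24-31 copy index.
constexpr unsigned kBaseBits = 12;
constexpr unsigned kDupBits = 12;
constexpr unsigned kCopyBits = 8;

// Encodes all three components together. Returns std::nullopt when any
// component does not fit, so overflow is communicated back to the caller.
std::optional<uint32_t> encodeDiscriminator(uint32_t Base, uint32_t Dup,
                                            uint32_t Copy) {
  if (Base >= (1u << kBaseBits) || Dup >= (1u << kDupBits) ||
      Copy >= (1u << kCopyBits))
    return std::nullopt;
  return Base | (Dup << kBaseBits) | (Copy << (kBaseBits + kDupBits));
}

// Decodes a value produced by encodeDiscriminator into (base, dup, copy).
std::tuple<uint32_t, uint32_t, uint32_t> decodeDiscriminator(uint32_t D) {
  uint32_t Base = D & ((1u << kBaseBits) - 1);
  uint32_t Dup = (D >> kBaseBits) & ((1u << kDupBits) - 1);
  uint32_t Copy = D >> (kBaseBits + kDupBits);
  return std::make_tuple(Base, Dup, Copy);
}

// Builds a new discriminator from an existing one by changing only the base.
std::optional<uint32_t> withBase(uint32_t Encoded, uint32_t NewBase) {
  auto Parts = decodeDiscriminator(Encoded);
  return encodeDiscriminator(NewBase, std::get<1>(Parts), std::get<2>(Parts));
}
```

Because encoding takes all three components at once, `withBase` does not depend on the order in which the components were originally set.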
  4. Reapply: DebugInfo: Assume an absence of ranges or high_pc on a CU means the CU is empty (devoid of code addresses) Originally committed in r349333, reverted in r349353. GCC emitted these unconditionally on/before 4.4/March 2012 Clang emitted these unconditionally on/before 3.5/March 2014 This improves performance when parsing CUs (especially those using split DWARF) that contain no code ranges (such as the mini CUs that may be created by ThinLTO importing - though generally they should be/are avoided, especially for Split DWARF because it produces a lot of very small CUs, which don't scale well in a bunch of other ways too (including size)). The revert was due to a (Google internal) test that had some checked in old object files missing DW_AT_ranges. That's since been fixed.
  5. [IR] Add Instruction::isLifetimeStartOrEnd, NFC Instruction::isLifetimeStartOrEnd() checks whether an Instruction is an llvm.lifetime.start or an llvm.lifetime.end intrinsic. This was suggested as a cleanup in D55967. Differential Revision:
  6. [TextAPI][elfabi] Fix failing tests from D56020
  7. [X86] Add isel patterns to match BMI/TBMI instructions when lowering has turned the root nodes into one of the flag producing binops. This fixes the patterns that have or/and as a root. 'and' is handled differently since they usually have a CMP wrapped around them. I had to look for uses of the CF flag because all these nodes have non-standard CF flag behavior. A real or/xor would always clear CF. In practice we shouldn't be using the CF flag from these nodes as far as I know. Differential Revision:
  8. Fix comment typo.
  9. Fix `static_assert()` scope in `CombinedAllocator`. It should be at the class scope and not inside the `Init(...)` function because we want to error out as soon as the wrong type is constructed. At the function scope the `static_assert` is only checked if the function might be called. This is a follow up to r349957. rdar://problem/45284065
  10. Fix `static_assert()` scope in `SizeClassAllocator32`. It should be at the class scope and not inside the `Init(...)` function because we want to error out as soon as the wrong type is constructed. At the function scope the `static_assert` is only checked if the function might be called. This is a follow up to r349138. rdar://problem/45284065
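The difference described in entries 9 and 10 can be sketched with a pair of toy templates. The type names below (`LocalView`, `RemoteView`, `CheckedInInit`, `CheckedInClass`) are hypothetical stand-ins, not the sanitizer allocator types.

```cpp
#include <type_traits>

struct LocalView {};
struct RemoteView {};

// static_assert inside a member function: only evaluated when Init() is
// instantiated, i.e. when some code actually uses it.
template <typename View> struct CheckedInInit {
  void Init() {
    static_assert(std::is_same<View, LocalView>::value,
                  "only LocalView is supported");
  }
};

// static_assert at class scope: evaluated as soon as the class template is
// instantiated with a given View.
template <typename View> struct CheckedInClass {
  static_assert(std::is_same<View, LocalView>::value,
                "only LocalView is supported");
  void Init() {}
};

// Compiles: the wrong type is constructed, but Init() is never instantiated,
// so the function-scope check never fires.
CheckedInInit<RemoteView> SilentlyAccepted;

// Fine: the class-scope check passes for the supported type.
CheckedInClass<LocalView> Accepted;

// Would fail to compile as soon as the type is constructed:
// CheckedInClass<RemoteView> Rejected;
```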
  11. [DAGCombiner] simplify code leading to scalarizeExtractedVectorLoad; NFC
  12. Introduce `AddressSpaceView` template parameter to `CombinedAllocator`. Summary: This is a follow up to . For the ASan and LSan allocators the type declarations have been modified so that it's possible to create a combined allocator type that consistently uses a different type of `AddressSpaceView`. We intend to use this in future patches. For the other sanitizers they just use `LocalAddressSpaceView` by default because we have no plans to use these allocators in an out-of-process manner. rdar://problem/45284065 Reviewers: kcc, dvyukov, vitalybuka, cryptoad, eugenis, kubamracek, george.karpenkov, yln Subscribers: #sanitizers, llvm-commits Differential Revision:
  13. [X86] Don't allow optimizeCompareInstr to replace a CMP with BEXTR if the sign flag is used. The BEXTR instruction documents the SF bit as undefined. The TBM BEXTR instruction has the same issue, but I'm not sure how to test it. With the control being an immediate we can determine the sign bit is 0 or the BEXTR would have been removed. Fixes PR40060 Differential Revision:
  14. Switch from static_cast<> to cast<>, update identifier for coding conventions; NFC.
  15. Introduce `AddressSpaceView` template parameter to `SizeClassAllocator64`. Summary: This is a follow up patch to r349138. This patch makes a `AddressSpaceView` a type declaration in the allocator parameters used by `SizeClassAllocator64`. For ASan, LSan, and the unit tests the AP64 declarations have been made templated so that `AddressSpaceView` can be changed at compile time. For the other sanitizers we just hard-code `LocalAddressSpaceView` because we have no plans to use these allocators in an out-of-process manner. rdar://problem/45284065 Reviewers: kcc, dvyukov, vitalybuka, cryptoad, eugenis, kubamracek, george.karpenkov Subscribers: #sanitizers, llvm-commits Differential Revision:
  16. [clang-tidy] Be more liberal about literal zeroes in abseil checks Summary: Previously, we'd only match on literal floating or integral zeroes, but I've now also learned that some users spell that value as int{0} or float{0}, which also need to be matched. Differential Revision:
  17. Convert some ObjC retain/release msgSends to runtime calls. It is faster to directly call the ObjC runtime for methods such as retain/release instead of sending a message to those functions. Differential Revision: Reviewed By: rjmccall
  18. AMDGPU: Don't peel off the offset if the resulting base could possibly be negative in Indirect addressing. Summary: Don't peel off the offset if the resulting base could possibly be negative in Indirect addressing. This is because the M0 field is unsigned. This patch achieves a similar goal as, but keeps the optimization if the base is known unsigned. Reviewers: arsemn Differential Revision:
  19. [TextAPI][elfabi] Fix YAML support for weak symbols Weak symbols are supposed to be supported in the ELF TextAPI implementation, but the YAML handler didn't read or write the `Weak` member of ELFSymbol. This change adds the YAML mapping and updates tests to ensure correct behavior. Differential Revision:
  20. [Sema][NFC] Fix a Wimplicit-fallthrough warning in CheckSpecializationInstantiationRedecl All cases are covered so add an llvm_unreachable. NFC.
  21. [AST][NFC] Remove stale comment in CXXRecordDecl::is(Virtually)DerivedFrom. The "this" capture was removed in r291939.
  22. [libcxx] Remove unused macro _LIBCPP_HAS_UNIQUE_TYPEINFO Summary: We already have the negation of that as _LIBCPP_HAS_NONUNIQUE_TYPEINFO. Having both defined is confusing, since only one of them is used. Reviewers: EricWF, mclow.lists Subscribers: christof, jkorous, dexonsmith, libcxx-commits Differential Revision:
  23. [BasicAA] Fix AA bug on dynamic allocas and stackrestore Summary: BasicAA has special logic for unescaped allocas, which normally applies equally well to dynamic and static allocas. However, llvm.stackrestore has the power to end the lifetime of dynamic allocas, without referring to them directly. stackrestore is already marked with the most conservative memory modification attributes, but because the alloca is not escaped, the normal logic produces incorrect results. I think BasicAA needs a special case here to teach it about the relationship between dynamic allocas and stackrestore. Fixes PR40118 Reviewers: gbiv, efriedma, george.burgess.iv Subscribers: hiraditya, llvm-commits Differential Revision:
  24. [RuntimeUnrolling] NFC: Add TODO and comments in connectProlog Currently, runtime unrolling does not support loops where multiple exiting blocks exit to the latchExit. Added TODO and other code clarifications for ConnectProlog code.
  25. [analyzer] Tests quickfix.
  26. Remove stat cache chaining as it's no longer needed after PTH support has been removed Stat cache chaining was implemented for a StatListener in the PTH writer so that it could write out the stat information to PTH. r348266 removed support for PTH, and it doesn't seem like there are other uses of stat cache chaining. We can remove the chaining support. Differential Revision:
  27. Switch from cast<> to dyn_cast<>. This avoids a potential failed assertion that is happening on someone's out-of-tree build.
  28. Revert "Revert rL349876 from cfe/trunk: [analyzer] Perform escaping in RetainCountChecker on type mismatch even for inlined functions" This reverts commit b44b33f6e020a2c369da2b0c1d53cd52975f2526. Revert the revert with the fix.
  29. [analyzer] Correct the summary violation diagnostics for the retain count checker It should be in the past tense.
  30. [x86] add movddup specialization for build vector lowering (PR37502) This is admittedly a narrow fix for the problem: ...but as the XOP restriction shows, it's a maze to get this right. In the motivating example, note that we have movddup before SSE4.1 and again with AVX2. That's because insertps isn't available pre-SSE41 and vbroadcast is (more generally) available with AVX2 (and the splat is reduced to movddup via isel pattern). Differential Revision:
  31. [ARM] Set Defs = [CPSR] for COPY_STRUCT_BYVAL, as it clobbers CPSR. Fixes PR35023. Reviewers: MatzeB, t.p.northover, sunfish, qcolombet, efriedma Reviewed By: efriedma Differential Revision:
  32. [AST][NFC] Fix Wsign-compare warning introduced in CXXOperatorCallExpr
  33. [Sema][NFC] Fix Wimplicit-fallthrough warning in getCursorKindForDecl All cases are covered so add an llvm_unreachable. NFC.
  34. [NFC] Fix typo in comment
  35. [SelectionDAG] Remove KnownBits output parameter version. Completes the work started by @bogner in rL340594.
  36. [clang-tidy] Add export-fixes flag to clang-tidy-diff Differential Revision:
  37. [x86] remove excess check lines; NFC Forgot that the integer variants have an extra 's'.
  38. [x86] move misplaced tests; NFC Mixed up integer and FP in rL349923.
  39. [GlobalISel][AArch64] Add support for widening G_FCEIL This adds support for widening G_FCEIL in LegalizerHelper and AArch64LegalizerInfo. More specifically, it teaches the AArch64 legalizer to widen G_FCEIL from a 16-bit float to a 32-bit float when the subtarget doesn't support full FP 16. This also updates AArch64/f16-instructions.ll to show that we perform the correct transformation.
  40. [AST][NFC] Pack CXXOperatorCallExpr Use the space available in the bit-fields of Stmt. This saves 8 bytes per CXXOperatorCallExpr. NFC.
  41. [x86] add tests for possible horizontal op transform; NFC
  42. ReleaseNotes: Document removal of add_llvm_loadable_module CMake macro This was removed in r349839.
  43. [x86] move test for movddup; NFC This adds an AVX512 run as suggested in D55936. The test didn't really belong with other build vector tests because that's not the pattern here. I don't see much value in adding 64-bit RUNs because they wouldn't exercise the isel patterns that we're aiming to expose.
  44. [pstl] Initial integration with LLVM's CMake Summary: This commit adds a check-pstl CMake target that will run the tests we currently have for pstl. Those tests are not using LLVM lit yet, but switching them over should be a transparent change. With this change, we can start relying on the `check-pstl` target for workflows and CI. Note that this commit purposefully does not support the pre-monorepo layout (with subprojects in projects/), since LLVM is moving towards the monorepo layout anyway. Reviewers: jfb Subscribers: mgorny, jkorous, dexonsmith, libcxx-commits, mclow.lists, rodgert Differential Revision:
  45. [AArch64] Refactor Exynos predicate (NFC) Change order of conditions in predicate.
  46. [Sanitizer] Move the unit test in the right place.
  47. [Sanitizer] Enable strtonum in FreeBSD Reviewers: krytarowski, vitalybuka Reviewed By: krytarowski Differential Revision:
  48. [XCore] Always use the version of computeKnownBits that returns a value. NFCI. Continues the work started by @bogner in rL340594 to remove uses of the KnownBits output parameter version.
  49. [Sparc] Always use the version of computeKnownBits that returns a value. NFCI. Continues the work started by @bogner in rL340594 to remove uses of the KnownBits output parameter version.
  50. [AMDGPU] Always use the version of computeKnownBits that returns a value. NFCI. Continues the work started by @bogner in rL340594 to remove uses of the KnownBits output parameter version.
  51. [WebAssembly] Always use the version of computeKnownBits that returns a value. NFCI. Continues the work started by @bogner in rL340594 to remove uses of the KnownBits output parameter version.
  52. [AST] Store the callee and argument expressions of CallExpr in a trailing array. Since CallExpr::setNumArgs has been removed, it is now possible to store the callee expression and the argument expressions of CallExpr in a trailing array. This saves one pointer per CallExpr, CXXOperatorCallExpr, CXXMemberCallExpr, CUDAKernelCallExpr and UserDefinedLiteral. Given that CallExpr is used as a base of the above classes we cannot use llvm::TrailingObjects. Instead we store the offset in bytes from the this pointer to the start of the trailing objects and manually do the casts + arithmetic. Some notes: 1.) I did not try to fit the number of arguments in the bit-fields of Stmt. This leaves some space for future additions and avoid the discussion about whether x bits are sufficient to hold the number of arguments. 2.) It would be perfectly possible to recompute the offset to the trailing objects before accessing the trailing objects. However the trailing objects are frequently accessed and benchmarks show that it is slightly faster to just load the offset from the bit-fields. Additionally, because of 1), we have plenty of space in the bit-fields of Stmt. Differential Revision: Reviewed By: rjmccall
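The layout idea in entry 52 can be sketched as follows: a base class records the byte offset from `this` to a trailing array that lives after the most-derived object, so the base can reach the array without knowing the derived type. The classes `CallLike` and `MemberCallLike` and the helper `destroy` are hypothetical, the trailing elements are plain `int` values rather than expression pointers, and the offset is an ordinary member rather than a bit-field. This is an illustration under those assumptions, not Clang's implementation.

```cpp
#include <cstddef>
#include <new>

class CallLike {
  unsigned NumArgs;
  // Bytes from `this` to the first trailing slot. This sketch assumes single
  // inheritance with the base subobject at offset 0 of the derived object.
  unsigned OffsetToTrailing;

protected:
  CallLike(unsigned NumArgs, unsigned OffsetToTrailing)
      : NumArgs(NumArgs), OffsetToTrailing(OffsetToTrailing) {
    // Construct the trailing ints in the extra storage past the object.
    for (unsigned I = 0; I != NumArgs; ++I)
      new (args() + I) int(0);
  }

public:
  static CallLike *create(unsigned NumArgs) {
    void *Mem = ::operator new(sizeof(CallLike) + NumArgs * sizeof(int));
    return new (Mem) CallLike(NumArgs, sizeof(CallLike));
  }

  // Manual cast + arithmetic: works for any derived class because the
  // offset was recorded at construction time.
  int *args() {
    return reinterpret_cast<int *>(reinterpret_cast<char *>(this) +
                                   OffsetToTrailing);
  }
  unsigned numArgs() const { return NumArgs; }
};

class MemberCallLike : public CallLike {
  int Extra = 42; // Derived-class data sits between the base and the array.

  explicit MemberCallLike(unsigned NumArgs)
      : CallLike(NumArgs, sizeof(MemberCallLike)) {}

public:
  static MemberCallLike *create(unsigned NumArgs) {
    void *Mem = ::operator new(sizeof(MemberCallLike) + NumArgs * sizeof(int));
    return new (Mem) MemberCallLike(NumArgs);
  }
  int extra() const { return Extra; }
};

// Must be called with a pointer to the most-derived type.
template <typename T> void destroy(T *Obj) {
  Obj->~T();
  ::operator delete(static_cast<void *>(Obj));
}
```

Storing the offset costs a few bits per object but avoids recomputing it from the dynamic class on every access, which is the trade-off the entry describes.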
  53. [ARM] Always use the version of computeKnownBits that returns a value. NFCI. Continues the work started by @bogner in rL340594 to remove uses of the KnownBits output parameter version.
  54. [AArch64] Always use the version of computeKnownBits that returns a value. NFCI. Continues the work started by @bogner in rL340594 to remove uses of the KnownBits output parameter version.
  55. [SelectionDAG] Always use the version of computeKnownBits that returns a value. NFCI. Continues the work started by @bogner in rL340594 to remove uses of the KnownBits output parameter version.
  56. [SystemZ] Always use the version of computeKnownBits that returns a value. NFCI. Continues the work started by @bogner in rL340594 to remove uses of the KnownBits output parameter version.
  57. [Lanai] Always use the version of computeKnownBits that returns a value. NFCI. Continues the work started by @bogner in rL340594 to remove uses of the KnownBits output parameter version.
  58. [Sema][NFC] Remove some unnecessary calls to getASTContext. The AST context is already easily available. NFC.
  59. [PPC] Always use the version of computeKnownBits that returns a value. NFCI. Continues the work started by @bogner in rL340594 to remove uses of the KnownBits output parameter version.
  60. [X86] Always use the version of computeKnownBits that returns a value. NFCI. Continues the work started by @bogner in rL340594 to remove uses of the old KnownBits output parameter version.
  61. [AST][NFC] Pass the AST context to one of the ctor of DeclRefExpr. All of the other constructors already take a reference to the AST context. This avoids calling Decl::getASTContext in most cases. Additionally move the definition of the constructor from Expr.h to Expr.cpp since it is calling DeclRefExpr::computeDependence. NFC.
  62. [AArch64] Adding missing REQUIRES in aarch64 dwarf test
  63. [xray] [tests] Detect and handle missing LLVMTestingSupport gracefully Add code to properly test for presence of LLVMTestingSupport library when performing a stand-alone build, and skip tests requiring it when it is not present. Since the library is not installed, llvm-config reported empty --libs for it and the tests failed to link with undefined references. Skipping the two fdr_* test files is better than failing to build, and should be good enough until we find a better solution. NB: both installing LLVMTestingSupport and building it automatically from within compiler-rt sources are non-trivial. The former due to dependency on gtest, the latter due to tight integration with LLVM source tree. Differential Revision:
  64. [ADT] IntervalMap: add overlaps(a, b) method Summary: This function checks whether the mappings in the interval map overlap with the given range [a;b]. The motivation is to enable checking for overlap before inserting a new interval into the map. Reviewers: vsk, dblaikie Subscribers: dexonsmith, kristina, llvm-commits Differential Revision:
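The use case in entry 64, checking for overlap before inserting, can be sketched with a toy closed-interval map built on `std::map`. `ToyIntervalMap` is a hypothetical stand-in written for this example; it does not reproduce `llvm::IntervalMap` or its interface beyond the idea of an `overlaps(a, b)` query.

```cpp
#include <map>

// Toy map of non-overlapping closed intervals [start, stop].
class ToyIntervalMap {
  std::map<int, int> Intervals; // start -> stop

public:
  // Returns true if the closed range [A, B] intersects any stored interval.
  // Precondition: A <= B.
  bool overlaps(int A, int B) const {
    // Intervals starting after B cannot overlap. Among the rest, the one
    // with the largest start also has the largest stop, because stored
    // intervals never overlap each other.
    auto It = Intervals.upper_bound(B);
    if (It == Intervals.begin())
      return false;
    --It;
    return It->second >= A;
  }

  // Inserts [A, B] only if it does not overlap an existing interval.
  bool insert(int A, int B) {
    if (overlaps(A, B))
      return false;
    Intervals.emplace(A, B);
    return true;
  }
};
```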
  65. [CMake] Print out the list of sanitizers that the sanitizer_common tests will run against. Summary: This is a change requested by Vitaly Buka as prerequisite to landing Reviewers: vitalybuka, kubamracek Subscribers: mgorny, #sanitizers, llvm-commits Differential Revision:
  66. [NewPM] -print-module-scope -print-after now prints module even after invalidated Loop/SCC -print-after IR printing generally can not print the IR unit (Loop or SCC) which has just been invalidated by the pass. However, when working in -print-module-scope mode even if Loop was invalidated there is still a valid module that we can print. Since we can not access invalidated IR unit from AfterPassInvalidated instrumentation point we can remember the module to be printed *before* pass. This change introduces BeforePass instrumentation that stores all the information required for module printing into the stack and then after pass (in AfterPassInvalidated) just print whatever has been placed on stack. Reviewed By: philip.pfaffe Differential Revision:
  67. [Dwarf/AArch64] Return address signing B key dwarf support - When signing return addresses with -msign-return-address=<scope>{+<key>}, either the A key instructions or the B key instructions can be used. To correctly authenticate the return address, the unwinder/debugger must know which key was used to sign the return address. - When an exception is thrown or a break point reached, it may be necessary to unwind the stack. To accomplish this, the unwinder/debugger must be able to first authenticate the return address if it has been signed. - To enable this, the augmentation string of CIEs has been extended to allow inclusion of a 'B' character. Functions that are signed using the B key variant of the instructions should have an FDE whose associated CIE has a 'B' in the augmentation string. - One must also be able to preserve these semantics when first stepping from a high level language into assembly and then, as a second step, into an object file. To achieve this, I have introduced a new assembly directive '.cfi_b_key_frame', that tells the assembler the current frame uses return address signing with the B key. - This ensures that the FDE is associated with a CIE that has 'B' in the augmentation string. Differential Revision:
  68. Revert rL349876 from cfe/trunk: [analyzer] Perform escaping in RetainCountChecker on type mismatch even for inlined functions The fix done in D55465 did not previously apply when the function was inlined. rdar://46889541 Differential Revision: ........ Fixes broken buildbot:
  69. [clangd] Cleanup syntax errors in the test, NFC.
  70. [X86][SSE] Auto upgrade PADDS/PSUBS intrinsics to SADD_SAT/SSUB_SAT generic intrinsics (llvm) This auto upgrades the signed SSE saturated math intrinsics to SADD_SAT/SSUB_SAT generic intrinsics. Clang counterpart: Differential Revision:
  71. Fix warning about unused variable [NFC]
  72. [Sema] Produce diagnostics when C++17 aligned allocation/deallocation functions that are unavailable on Darwin are explicitly called or called from deleting destructors. rdar://problem/40736230 Differential Revision:
  73. [WebAssembly] Fix invalid machine instrs in -O0, verify in tests Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision:
  74. Fix test case breakages caused by lexically_relative change
  75. Don't forward declare _FilesystemClock in C++03
  76. Fix copy paste error in file_clock tests
  77. Implement LWG 3096: path::lexically_relative is confused by trailing slashes path("/dir/").lexically_relative("/dir"); now returns "." instead of ""
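The behavior change in entry 77 can be observed with a short program. The function name `printRelativePaths` is made up for this sketch, and the second result assumes a standard library that implements LWG 3096.

```cpp
#include <filesystem>
#include <iostream>

namespace fs = std::filesystem;

void printRelativePaths() {
  // Ordinary case: yields the relative path b/c.
  std::cout << fs::path("/a/b/c").lexically_relative("/a") << '\n';

  // Trailing-slash case from the entry: yields "." once LWG 3096 is
  // implemented; before the fix the result was an empty path.
  std::cout << fs::path("/dir/").lexically_relative("/dir") << '\n';
}
```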
  78. Implement LWG 3065: Make path operators friends. This prevents things like: using namespace std::filesystem; auto x = L"a/b" == std::string("a/b");
  79. Implement LWG 3145: file_clock breaks ABI for C++17 implementations. This patch adds std::chrono::file_clock, but without breaking the existing ABI for std::filesystem.
  80. AMDGPU/GlobalISel: RegBankSelect for
  81. Implement LWG 2936: Path comparison is defined in terms of the generic format This patch implements path::compare according to the current spec. The only observable change is the ordering of "/foo" and "foo", which orders the two paths based on having or not having a root directory (instead of lexically comparing "/" to "foo").
  82. AMDGPU/GlobalISel: RegBankSelect for some fp ops
  83. GlobalISel: Correct example PartialMapping table When I try to use this, it seems like the second half needs to start where the previous part left off.
  84. AMDGPU/GlobalISel: Redo legality for build_vector It seems better to avoid using the callback if possible since there are coverage assertions which are disabled if this is used. Also fix missing tests. Only test the legal cases since it seems legalization for build_vector is quite lacking.
  85. Mark two filesystem LWG issues as complete - nothing to do
  86. [analyzer] Perform escaping in RetainCountChecker on type mismatch even for inlined functions The fix done in D55465 did not previously apply when the function was inlined. rdar://46889541 Differential Revision:
  87. [analyzer] Fix a bug in RetainCountDiagnostics while printing a note on mismatched summary in inlined functions Previously, we were not printing a note at all if at least one of the parameters was not annotated. rdar://46888422 Differential Revision:
  88. [memcpyopt] Add debug logs when forwarding memcpy src to dst
  89. [mingw] Don't mangle thiscall like fastcall etc GCC does not mangle it when it is not explicit in the source. The mangler as currently written cannot differentiate between explicit and implicit calling conventions, so we can't match GCC. Explicit thiscall conventions are rare, so mangle as if the convention was implicit to be as ABI compatible as possible. Also fixes some tests using %itanium_abi_triple in some configurations as a side effect. Fixes PR40107.
  90. [LoopUnroll] Don't verify domtree by default with +Asserts. This verification is linear in the size of the function, so it can cause a quadratic compile-time explosion in a function with many loops to unroll. Differential Revision:
  91. [X86] Autogenerate complete checks. NFC
  92. [X86] Refactor hasNoCarryFlagUses and hasNoSignFlagUses in X86ISelDAGToDAG.cpp to translate opcode to condition code using the helpers in X86InstrInfo.cpp. This shortens the switches in X86ISelDAGToDAG.cpp to only need to check condition code instead of a list of opcodes. This also fixes a bug where the memory forms of SETcc were missing from hasNoCarryFlagUses.
  93. [X86] Add memory forms of some SETCC instructions to hasNoCarryFlagUses. Found while working on another patch
  94. [driver] [analyzer] Fix --analyze -Xanalyzer after r349863. If an -analyzer-config is passed through -Xanalyzer, it is not found while looking for -Xclang. Additionally, don't emit -analyzer-config-compatibility-mode for *every* -analyzer-config flag we encounter; one is enough. rdar://problem/46504165
  95. Revert "Revert "[driver] [analyzer] Fix a backward compatibility issue after r348038."" This reverts commit 144927939587b790c0536f4ff08245043fc8d733. Fixes the bug in the original commit.
  96. [analyzer] RetainCount: Suppress retain detection heuristic on some CM methods. If it ends with "Retain" like CFRetain and returns a CFTypeRef like CFRetain, then it is not necessarily a CFRetain. But it is indeed true that these two return something retained. Differential Revision: rdar://problem/39390714
  97. [ARM] Complete the Thumb1 shift+and->shift+shift transforms. This saves materializing the immediate. The additional forms are less common (they don't usually show up for bitfield insert/extract), but they're still relevant. I had to add a new target hook to prevent DAGCombine from reversing the transform. That isn't the only possible way to solve the conflict, but it seems straightforward enough. Differential Revision:
  98. [zorg] Print all IPs on Android bot
  99. [CodeGen] Fix a test from r349848 by replacing `objc_` with `llvm.objc.`
  100. Revert "[asan] Disable test on powerpc64be" Now the test is passing on that bot. Some incremental build issues? This reverts commit e00b5a5229ae02088d9f32a4e328eaa08abaf354.
  101. [CodeGen] Fix assertion on emitting cleanup for object with inlined inherited constructor and non-trivial destructor. Fixes assertion > Assertion failed: (isa<X>(Val) && "cast<Ty>() argument of incompatible type!"), function cast, file llvm/Support/Casting.h, line 255. It was triggered by trying to cast `FunctionDecl` to `CXXMethodDecl` as `CGF.CurCodeDecl` in `CallBaseDtor::Emit`. It was happening because cleanups were emitted in `ScalarExprEmitter::VisitExprWithCleanups` after destroying `InlinedInheritingConstructorScope`, so `CodeGenFunction.CurCodeDecl` didn't correspond to expected cleanup decl. Fix the assertion by emitting cleanups before leaving `InlinedInheritingConstructorScope` and changing `CurCodeDecl`. Test cases based on a patch by Shoaib Meenai. Fixes PR36748. rdar://problem/45805151 Reviewers: rsmith, rjmccall Reviewed By: rjmccall Subscribers: jkorous, dexonsmith, cfe-commits, smeenai, compnerd Differential Revision:
  102. [InstCombine] [NFC] testcases for canonicalize MUL with NEG operand
  103. Fix Windows build failures caused by r349839
  104. Add support for namespaces on #pragma clang attribute Namespaces are introduced by adding an "identifier." before a push/pop directive. Pop directives with namespaces can only pop an attribute group that was pushed with the same namespace. Push and pop directives that don't opt into namespaces have the same semantics. This is necessary to prevent a pitfall of using multiple #pragma clang attribute directives spread out in a large file, particularly when macros are involved. It isn't easy to see which pop corresponds to which push, so it's easy to inadvertently pop the wrong group. Differential revision:
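A small sketch of the namespaced push/pop form from entry 104 follows. The namespace names `Tracing` and `Audit`, the annotation strings, and the functions are invented for illustration. The pragmas are guarded so that other compilers skip them, and they require a Clang version that includes this change.

```cpp
#if defined(__clang__)
// Two independent attribute groups, each pushed under its own namespace.
#pragma clang attribute Tracing.push (__attribute__((annotate("traced"))), apply_to = function)
#pragma clang attribute Audit.push (__attribute__((annotate("audited"))), apply_to = function)
#endif

int first() { return 1; } // Under Clang, receives both annotations.

#if defined(__clang__)
// Pops the group pushed as Audit; the Tracing group stays active.
#pragma clang attribute Audit.pop
#endif

int second() { return 2; } // Under Clang, receives only the "traced" annotation.

#if defined(__clang__)
#pragma clang attribute Tracing.pop
#endif

int callBoth() { return first() + second(); }
```

Naming each group makes it visible at the pop site which push is being closed, which is the pitfall the entry describes for large files and macro-generated directives.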
  105. [asan] Disable test on powerpc64be
  106. Revert "[driver] [analyzer] Fix a backward compatibility issue after r348038." This reverts commits r349824, r349828, r349835. More buildbot failures were noticed. Differential Revision: rdar://problem/46504165
  107. [zorg] Print local IP of Android build bot
  108. [ObjC] Messages to 'self' in class methods that return 'instancetype' should use the pointer to the class as the result type of the message Prior to this commit, messages to self in class methods were treated as instance methods to a Class value. When these methods returned instancetype the compiler only saw id through the instancetype, and not the Interface *. This caused problems when that return value was a receiver in a message send, as the compiler couldn't select the right method declaration and had to rely on a selection from the global method pool. This commit modifies the semantics of such message sends and uses class messages that are dispatched to the interface that corresponds to the class that contains the class method. This ensures that instancetypes are correctly interpreted by the compiler. This change is safe under ARC (as self can't be reassigned), however, it also applies to MRR code as we are assuming that the user isn't doing anything unreasonable. rdar://20940997 Differential Revision:
  109. cmake: Remove uses of add_llvm_loadable_module macro This was removed from llvm in r349839.
  110. cmake: Remove add_llvm_loadable_module() Summary: This function is very similar to add_llvm_library(), so this patch merges it into add_llvm_library() and replaces all calls to add_llvm_loadable_module(lib ...) with add_llvm_library(lib MODULE ...) Reviewers: philip.pfaffe, beanz, chandlerc Reviewed By: philip.pfaffe Subscribers: chapuni, mgorny, llvm-commits Differential Revision:
  111. [zorg] Report results of adb push on Android bot
  112. [zorg] Simplify stage1 build on sanitizer bots
  113. [gn check] Unbreak check-lld if llvm_install_binutils_symlinks is false The check-lld target was missing the dependency on llvm-nm and llvm-objdump in that case. Differential Revision:
  114. [driver] [analyzer] Fix redundant test output. The -c flag causes a .o file to appear every time we run a test. Remove it. Differential Revision: rdar://problem/46504165
  115. [gn build] Add build file for clang/lib/CodeGen and llvm/lib/ProfileData/Coverage Differential Revision:
  116. [gn build] Add build files for clang/lib/{Frontend,Frontend/Rewrite,Serialization} Differential Revision:
  117. [gn build] Add build file for clang/lib/Driver Mostly boring, except for the spurious dependency on StaticAnalyzer/Checkers -- see comments in the code. Differential Revision:
  118. [gn build] Add build file for clang/lib/Parse Nothing really interesting. One thing to consider is where the clang_tablegen() invocations that generate files that are private to a library should be. The CMake build puts them in clang/include/clang/Parse (in this case), but maybe putting them right in clang/lib/Parse/ makes more sense. (For clang_tablegen() calls that generate .inc files used by the public headers, putting the call in the public BUILD file makes sense.) For now, I've put the build file in the public header folder, since that matches CMake and what I did in the last 2 clang patches, but I'm not sure I like this. Differential Revision:
  119. [gn build] Add build files for clang-format and lib/{Format,Rewrite,Tooling/Core,Tooling/Inclusions} Differential Revision:
  120. [zorg] Skip svn update with "BUILDBOT_REVISION=-"
  121. [driver] [analyzer] Fix buildbots after r349824. Buildbots can't find the linker, which we don't really need in our tests. Differential Revision: rdar://problem/46504165
  122. [zorg] Report exception of svn update failed
  123. [llvm-objcopy] [COFF] Avoid memcpy() with null parameters in more places. NFC. This fixes all cases of errors in asan+ubsan builds. Also use std::copy instead of if+memcpy in the previously updated spot, for consistency.
  124. Declares __cpu_model as dso local __builtin_cpu_supports and __builtin_cpu_is use information in __cpu_model to decide cpu features. Before this change, __cpu_model was not declared as dso local. The generated code looks up the address in GOT when reading __cpu_model. This makes it impossible to use these functions in ifunc, because at that time GOT entries have not been relocated. This change makes it dso local. Differential Revision:
  125. [driver] [analyzer] Fix a backward compatibility issue after r348038. Since r348038 we emit an error every time an -analyzer-config option is not found. The driver, however, suppresses this error with another flag, -analyzer-config-compatibility-mode, so backwards compatibility is maintained, while analyzer developers still enjoy the new typo-free experience. The backwards compatibility turns out to be still broken when the -analyze action is not specified; it is still possible to specify -analyzer-config in that case. This should be fixed now. Patch by Kristóf Umann! Differential Revision: rdar://problem/46504165
  126. [CodeGen] Generate llvm.loop.parallel_accesses instead of llvm.mem.parallel_loop_access metadata. Instead of generating llvm.mem.parallel_loop_access metadata, generate on instructions and llvm.loop.parallel_accesses on loops. There is one access group per generated loop. This is clang part of D52116/r349725. Differential Revision:
  127. [GlobalISel][AArch64] Add G_FCEIL to isPreISelGenericFloatingPointOpcode If you don't do this, then if you hit a G_LOAD in getInstrMapping, you'll end up with GPRs on the G_FCEIL instead of FPRs. This causes a fallback. Add it to the switch, and add a test verifying that this happens.
  128. Make the "too many braces in scalar initialization" extension cause SFINAE failures.
  129. DebugInfo: Fix for missing comp_dir handling with r349207 When deciding lazily whether a CU would be split or non-split I accidentally dropped some handling for the line tables comp_dir (by doing it lazily it was too late to be handled properly by the MC line table code). Move that bit of the code back to the non-lazy place.
  130. [sanitizer] Support running without fd 0,1,2. Summary: Support running with no open file descriptors (as may happen to "init" process on linux). * Remove a check that writing to stderr succeeds. * When opening a file (ex. for log_path option), dup the new fd out of [0, 2] range to avoid confusing the program. (2nd attempt, this time without the sanitizer_rtems change) Reviewers: pcc, vitalybuka Subscribers: kubamracek, llvm-commits Differential Revision:
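The "dup the new fd out of [0, 2] range" step described in the commit above can be sketched as follows. This is only an illustration in Python for a POSIX system, not the sanitizer runtime's actual code; the helper names `move_fd_above_stdio` and `open_log` are hypothetical.

```python
import fcntl
import os

STDIO_MAX_FD = 2  # descriptors 0, 1 and 2 are stdin, stdout and stderr


def move_fd_above_stdio(fd):
    """Return a descriptor greater than 2 that refers to the same open file."""
    if fd > STDIO_MAX_FD:
        return fd
    # F_DUPFD returns the lowest free descriptor >= the requested minimum.
    new_fd = fcntl.fcntl(fd, fcntl.F_DUPFD, STDIO_MAX_FD + 1)
    os.close(fd)  # release the low slot so the program can use it as stdio
    return new_fd


def open_log(path):
    """Open a log file (hypothetical helper) and keep it out of [0, 2]."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    return move_fd_above_stdio(fd)
```

If a process starts with descriptors 0 to 2 closed, a plain `os.open` would return 0, and later writes to "stdin" by the program would land in the log file; moving the descriptor avoids that confusion.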
  131. Fix the example checker plugin after r349812.
  132. Fix build failures from r349812 due to a missing argument.
  133. Allow direct navigation to static analysis checker documentation through SARIF exports. This adds anchors to all of the documented checks so that you can directly link to a check by a stable name. This is useful because the SARIF file format has a field for specifying a URI to documentation for a rule and some viewers, like CodeSonar, make use of this information. These links are then exposed through the SARIF exporter.
  134. [Sema] Don't try to account for the size of an incomplete type in CheckArrayAccess When checking that the array access is not out-of-bounds in CheckArrayAccess it is possible that the type of the base expression after IgnoreParenCasts is incomplete, even though the type of the base expression before IgnoreParenCasts is complete. In this case we have no information about whether the array access is out-of-bounds and we should just bail-out instead. This fixes PR39746 which was caused by trying to obtain the size of an incomplete type. Differential Revision: Reviewed By: efriedma
  135. [zorg] Uprev clang on sanitizer bots
  136. [zorg] Fix STEP_FAILURE report on Android bot
  137. [llvm-objcopy] [COFF] Don't call memcpy() with a null argument. NFC. It is invalid to call memcpy with a null pointer, even if the size is zero. This should fix the sanitizer buildbot.
  138. [ConstantFolding] Consolidate and extend bitcount intrinsic tests; NFC Move constant folding tests into ConstantFolding/bitcount.ll and drop various tests in other places. Add coverage for undefs.
  139. [ConstantFolding] Add undef tests for overflow intrinsics; NFC
  140. [ConstantFolding] Regenerate test checks; NFC Bring overflow-ops.ll into current format. Remove redundant entry blocks.
  141. [ConstantFolding] Add tests for funnel shifts with undef operands; NFC
  142. [ConstantFolding] Add tests for sat add/sub with undefs; NFC
  143. [ConstantFolding] Split up saturating add/sub tests; NFC Split each test into a separate function.
  144. [MC] [AArch64] Correctly resolve ":abs_g1:3" etc. We have to treat constructs like this as if they were "symbolic", to use the correct codepath to resolve them. This mostly only affects movz etc. because the other uses of classifySymbolRef conservatively treat everything that isn't a constant as if it were a symbol. Differential Revision:
  145. [MC] [AArch64] Support resolving fixups for abs_g0 etc. This requires a bit more code than other fixups, to distinguish between abs_g0/abs_g1/etc. Actually, I think some of the other fixups are missing some checks, but I won't try to address that here. I haven't seen any real-world code that uses a construct like this, but it clearly should work, and we're considering using it in the implementation of localescape/localrecover on Windows (see I've verified that binutils produces the same code as llvm-mc for the testcase. This currently doesn't include support for the *_s variants (that requires a bit more work to set the opcode). Differential Revision:
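As a rough illustration of what the abs_g0/abs_g1/etc. modifiers in the two commits above select: each one picks a 16-bit group of an absolute value, which is what a movz/movk immediate can hold. The sketch below is in Python, the function name `abs_g` is hypothetical, and the overflow checks performed by the checking variants are deliberately left out.

```python
def abs_g(value, group):
    """Return bits [16*group + 15 : 16*group] of a 64-bit absolute value."""
    if not 0 <= group <= 3:
        raise ValueError("group must be 0, 1, 2 or 3")
    return (value >> (16 * group)) & 0xFFFF


def rebuild(chunks):
    """Recombine the four 16-bit groups, lowest group first."""
    return sum(chunk << (16 * i) for i, chunk in enumerate(chunks))
```

Under this reading, an expression like ":abs_g1:3" from the earlier commit title evaluates to `abs_g(3, 1)`, which is 0, because the constant 3 fits entirely in group 0.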
  146. Revert "[analyzer] pr38668: Do not attempt to cast loaded values..." This reverts commit r349701. The patch was incorrect. The whole point of CastRetrievedVal() is to handle the case in which the type from which the cast is made (i.e., the "type" of value `V`) has nothing to do with the type of the region it was loaded from (i.e., `R->getValueType()`). Differential Revision: rdar://problem/45062567
  147. [zorg] Fix STEP_FAILURE report on Android bot
  148. [X86] Auto upgrade XOP/AVX512 rotation intrinsics to generic funnel shift intrinsics (clang) This emits FSHL/FSHR generic intrinsics for the XOP VPROT and AVX512 VPROL/VPROR rotation intrinsics. LLVM counterpart: Differential Revision:
  149. [X86] Auto upgrade XOP/AVX512 rotation intrinsics to generic funnel shift intrinsics (llvm) This emits FSHL/FSHR generic intrinsics for the XOP VPROT and AVX512 VPROL/VPROR rotation intrinsics. Clang counterpart: Differential Revision:
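The reason a rotation can be expressed as a funnel shift is that a funnel shift with both data operands equal is a rotate. A minimal Python sketch of the scalar semantics, with hypothetical helper names and a default width of 32 bits chosen only for illustration:

```python
def fshl(a, b, shift, width=32):
    """Funnel shift left: concatenate a:b, shift left, keep the high half."""
    mask = (1 << width) - 1
    s = shift % width  # the shift amount is taken modulo the bit width
    concat = ((a & mask) << width) | (b & mask)
    return (concat >> (width - s)) & mask


def fshr(a, b, shift, width=32):
    """Funnel shift right: concatenate a:b, shift right, keep the low half."""
    mask = (1 << width) - 1
    s = shift % width
    concat = ((a & mask) << width) | (b & mask)
    return (concat >> s) & mask


def rotl(x, shift, width=32):
    return fshl(x, x, shift, width)


def rotr(x, shift, width=32):
    return fshr(x, x, shift, width)
```

With `shift % width == 0`, `fshl` returns `a` and `fshr` returns `b`, so the rotate forms return `x` unchanged, which matches the modulo behaviour expected of a rotate.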
  150. [LAA] Avoid generating RT checks for known deps preventing vectorization. If we found unsafe dependences other than 'unknown', we already know at compile time that they are unsafe and the runtime checks should always fail. So we can avoid generating them in those cases. This should have no negative impact on performance as the runtime checks that would be created previously should always fail. As a sanity check, I measured the test-suite, spec2k and spec2k6 and there were no regressions. Reviewers: Ayal, anemet, hsaito Reviewed By: Ayal Differential Revision:
  151. Add missing -oso-prepend-path to dsymutil test. Thanks to Galina Kistanova for pointing this out!
  152. [CMake] Add libunwind when 'all' is being passed as LLVM_ENABLE_PROJECTS Reviewers: zturner Subscribers: mgorny, jkorous, dexonsmith, llvm-commits Differential Revision:
  153. Use @llvm.objc.clang.arc.use intrinsic instead of clang.arc.use function. Calls to this function are deleted in the ARC optimizer. However when the ARC optimizer was updated to use intrinsics instead of functions (r349534), the corresponding clang change (r349535) to use intrinsics missed this one so it wasn't being deleted.
  154. [libcxx] Fix order checking in unordered_multimap tests. Some tests assume that iteration through an unordered multimap elements will return them in the same order as at the container creation. This assumption is not true since the container is unordered, so that no specific order of elements is ever guaranteed for such container. This patch introduces checks verifying that any iteration will return elements exactly from a set of valid values and without repetition, but in no particular order. Reviewed as Thanks to Andrey Maksimov for the patch.
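The kind of order-independent check the commit above describes (every element comes from the valid set, none is repeated beyond its expected count, and nothing is missing) can be sketched as below. This is a Python illustration of the idea, not the libc++ test helper; the name `check_unordered_range` is hypothetical.

```python
from collections import Counter


def check_unordered_range(actual, expected):
    """True if actual is a permutation of expected (as multisets)."""
    remaining = Counter(expected)
    for item in actual:
        if remaining[item] == 0:
            return False  # unexpected value, or one repetition too many
        remaining[item] -= 1
    # Every expected element must have been seen exactly once.
    return sum(remaining.values()) == 0
```

For a multimap, the elements under one key can be checked this way with `expected` set to the values inserted for that key.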
  155. Add PLATFORM constants for iOS, tvOS, and watchOS simulators Summary: Add PLATFORM constants for iOS, tvOS, and watchOS simulators, as well as human readable names for these constants, to the Mach-O file format header files. rdar://46854119 Reviewers: ab, davide Reviewed By: ab, davide Subscribers: llvm-commits Differential Revision:
  156. [BPF] Disable relocation for .BTF.ext section Build llvm with assertions on, and then build bcc against this llvm. Run any bcc tool with debug=8 (turning on -g for clang compilation), and you will get the following assertion error: /home/yhs/work/llvm/lib/ExecutionEngine/RuntimeDyld/RuntimeDyldELF.cpp:888: void llvm::RuntimeDyldELF::resolveBPFRelocation(const llvm::SectionEntry&, uint64_t, uint64_t, uint32_t, int64_t): Assertion `Value <= (4294967295U)' failed. The .BTF.ext ELF section uses Fixups to get the instruction offsets. The data width of the Fixup is 4 bytes since we only need the insn offset within the section. This caused the above error, though, since R_BPF_64_32 expects a 4-byte value and the Runtime Dyld tried to resolve the actual insn address, which is 8 bytes. Actually, the offset within the section is all we need. Therefore, there is no need to perform any kind of relocation for the .BTF.ext section, and such a relocation will actually cause an incorrect result. This patch changed BPFELFObjectWriter::getRelocType() such that for Fixup Kind FK_Data_4, if the relocation Target is a temporary symbol, let us skip the relocation (ELF::R_BPF_NONE). Acked-by: Alexei Starovoitov <> Signed-off-by: Yonghong Song <>
  157. [CodeView] Emit global variables within lexical scopes to limit visibility Emit static locals within the correct lexical scope so variables with the same name will not confuse the debugger into getting the wrong value. Differential Revision:
  158. Correct the diagnose_if attribute documentation. Fixes PR35845.
  159. [InstCombine] Preserve access-group metadata. Preserve metadata when combining store instructions. This was forgotten in r349725. Fixes
  160. [x86] add test to show missed movddup load fold; NFC
  161. Test commit Fix a simple typo.
  162. [Hexagon] Add patterns for funnel shifts
  163. [clangd] Try to work around test failure by increasing the timeouts Ideally we'd figure out a way to run this test without any sleeps; this workaround is only there to avoid annoying people with test failures around the holiday period when everyone is on vacation.
  164. [clangd] Expose FileStatus to LSP. Summary: Add an LSP extension "textDocument/clangd.fileStatus" to emit file-status information. Reviewers: ilya-biryukov Subscribers: javed.absar, ioeric, MaskRay, jkorous, arphaman, kadircet, cfe-commits Differential Revision:
  165. [SelectionDAGBuilder] Enable funnel shift building to custom rotates This patch enables funnel shift -> rotate building for all ROTL/ROTR custom/legal operations. AFAICT X86 was the last target that was missing modulo support (PR38243), but I've tried to CC stakeholders for every target that has ROTL/ROTR custom handling for their final OK. Differential Revision:
  166. [RISCV] Properly evaluate fixup_riscv_pcrel_lo12 This is an update to D43157 to correctly handle fixup_riscv_pcrel_lo12. Notable changes: rebased onto trunk; handle and test S-type; test case pcrel-hilo.s is merged into relocations.s. D43157 description: VK_RISCV_PCREL_LO has to be handled specially. The MCExpr inside is actually the location of an auipc instruction with a VK_RISCV_PCREL_HI fixup pointing to the real target. Differential Revision: Patch by Chih-Mao Chen and Michael Spencer.
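The special handling described above comes from how a PC-relative offset is split across an auipc and a following instruction: the low 12 bits are computed from the address of the auipc, not from the address of the instruction that carries the low part. A Python sketch of the arithmetic, with hypothetical function names and offsets assumed to fit in 32 bits:

```python
def split_pcrel(auipc_addr, target_addr):
    """Split target - auipc_addr into (hi20, lo12); lo12 is signed."""
    offset = target_addr - auipc_addr
    # Add 0x800 so that hi20 compensates for the sign extension of lo12.
    hi20 = ((offset + 0x800) >> 12) & 0xFFFFF
    lo12 = offset & 0xFFF
    if lo12 >= 0x800:
        lo12 -= 0x1000
    return hi20, lo12


def recombine(hi20, lo12):
    """Rebuild the signed offset from the two fields."""
    hi = hi20 - (1 << 20) if hi20 & 0x80000 else hi20  # sign-extend 20 bits
    return (hi << 12) + lo12
```

Note that the instruction carrying the low part usually sits at `auipc_addr + 4` or later, yet its immediate is still derived from `auipc_addr`; this is why the fixup has to look through to the auipc rather than use its own location.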
  167. [X86][AVX512] Don't custom lower v16i8 rotations. As discussed on D55747, the expansion to (wider) shifts is better on all AVX512 cases, not just BWI.
  168. [Sanitizer] Enable vis api on FreeBSD Reviewers: krytarowski Reviewed By: krytarowski Differential Revision:
  169. [SystemZ] "Generic" vector assembler instructions should clobber CC There are several vector instructions which may or may not set the condition code register, depending on the value of an argument. For codegen, we use two versions of the instruction, one that sets CC and one that doesn't, which hard-code appropriate values of that argument. But we also have a "generic" version of the instruction that is used for the assembler/disassembler. These generic versions should always be considered to clobber CC just to be safe.
  170. Fix gcc7 -Wdangling-else warning. NFCI.
  171. [InstCombine] Make x86 PADDS/PSUBS constant folding tests generic As discussed on D55894, this replaces the existing PADDS/PSUBS intrinsics with the sadd/ssub.sat generic intrinsics and moves the tests out of the x86 subfolder. PR40110 has been raised to fix the regression with constant folding vectors containing undef elements.
  172. [clang-tidy] Use translationUnitDecl() instead of a custom matcher.
  173. [gn build] Add build files for clang/lib/{Analysis,Edit,Sema} Differential Revision:
  174. [gn build] Add build files for clang/lib/Lex and clang/lib/AST Differential Revision:
  175. [Sema][NFC] Add test for static_assert diagnostics with constexpr template functions.
  176. [Driver] Fix accidentally reversed condition in r349752
  177. [SystemZ] Improve testing of vecintrin.h intrinsics This adds assembly-level tests to verify that the high-level intrinsics generate the instructions they're supposed to. These tests would have caught the codegen bugs I just fixed.
  178. Replace getOS() == llvm::Triple::*BSD with isOS*BSD() [NFCI] Replace multiple comparisons of getOS() value with FreeBSD, NetBSD, OpenBSD and DragonFly with matching isOS*BSD() methods. This should improve the consistency of coding style without changing the behavior. Direct getOS() comparisons were left whenever used in switch or switch- like context. Differential Revision:
  179. [SystemZ] Fix wrong codegen caused by typos in vecintrin.h The following two bugs in SystemZ high-level vector intrinsics are fixed by this patch: - The float case of vec_insert_and_zero should generate a VLLEZF pattern, but currently erroneously generates VLLEZLF. - The float and double versions of vec_orc erroneously generate and-with-complement instead of or-with-complement. The patch also fixes a couple of typos in the associated test.
  180. [clangd] Don't miss the expected type in merge. Reviewers: ilya-biryukov Subscribers: ioeric, MaskRay, jkorous, arphaman, kadircet, cfe-commits Differential Revision:
  181. [SystemZ] Make better use of VLLEZ This patch fixes two deficiencies in current code that recognizes the VLLEZ idiom: - For the floating-point versions, we have ISel patterns that match on a bitconvert as the top node. In more complex cases, that bitconvert may already have been merged into something else. Fix the patterns to match the inner nodes instead. - For the 64-bit integer versions, depending on the surrounding code, we may get either a DAG tree based on JOIN_DWORDS or one based on INSERT_VECTOR_ELT. Use a PatFrags to simply match both variants.
  182. [SystemZ] Make better use of VGEF/VGEG Current code in SystemZDAGToDAGISel::tryGather refuses to perform any transformation if the Load SDNode has more than one use. This (erroneously) counts uses of the chain result, which prevents the optimization in many cases unnecessarily. Fixed by this patch.
  183. Re-land r349731 "[CodeGen][ExpandMemcmp] Add an option for allowing overlapping loads." Update PPC IR following GEP->bitcast to bitcast->GEP->bitcast change.
  184. [SystemZ] Make better use of VLDEB We already have special code (DAG combine support for FP_ROUND) to recognize cases where we can use a vector version of VLEDB to perform two floating-point truncates in parallel, but equivalent support for VLDEB (vector floating-point extends) has been missing so far. This patch adds corresponding DAG combine support for FP_EXTEND.
  185. Revert "[sanitizer] Support running without fd 0,1,2." This reverts commit r349699. Reason: the commit breaks compilation of when building for RTEMS.
  186. [X86][SSE] Auto upgrade PADDS/PSUBS intrinsics to SADD_SAT/SSUB_SAT generic intrinsics (llvm) Pulled out of D55894 to match the clang changes in D55890. Differential Revision:
  187. [X86][SSE] Auto upgrade PADDS/PSUBS intrinsics to SADD_SAT/SSUB_SAT generic intrinsics (clang) This emits SADD_SAT/SSUB_SAT generic intrinsics for the SSE signed saturated math intrinsics. LLVM counterpart: Differential Revision:
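For readers unfamiliar with the saturating operations these intrinsics map to, the per-lane arithmetic is: compute the exact sum or difference, then clamp it to the signed range of the lane. A small Python sketch, using hypothetical helper names and 16-bit lanes as in PADDSW/PSUBSW:

```python
def sadd_sat(a, b, bits=16):
    """Signed saturating add of two values already in the signed range."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, a + b))


def ssub_sat(a, b, bits=16):
    """Signed saturating subtract of two values already in the signed range."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, a - b))


def vector_sadd_sat(xs, ys, bits=16):
    """Apply sadd_sat lane by lane, as a vector instruction would."""
    return [sadd_sat(x, y, bits) for x, y in zip(xs, ys)]
```

Unlike wrapping addition, `sadd_sat(32767, 1)` stays at 32767 instead of wrapping around to -32768.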
  188. [X86] Update PADDSW/PSUBSW intrinsic usage with generic saturated intrinsics. As discussed on D55894, this makes no difference to the actual test.
  189. [llvm-objcopy] Use ELFOSABI_NONE instead of 0. NFC. This was requested during the review of D55886. (sorry, forgot to address this)
  190. [asan] Revert still Android-incompatible tests enabled in r349736
  191. [X86] Change 'simple nonmem' intrinsic test to not use PADDSW Those intrinsics will be autoupgraded soon to @llvm.sadd.sat generics (D55894), so to keep a x86-specific case I'm replacing it with @llvm.x86.sse2.pmulhu.w
  192. [llvm-objcopy] - Do not drop the OS/ABI and ABIVersion fields of ELF header This is, Patch teaches llvm-objcopy to preserve OS/ABI and ABIVersion fields of ELF header. (Currently, it drops them to zero). Differential revision:
  193. [yaml2obj/obj2yaml] - Support dumping/parsing ABI version. These tools were assuming ABI version is 0, that is not always true. Patch teaches them to work with that field. Differential revision:
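The two fields discussed in the two commits above are single bytes in the `e_ident` array at the start of every ELF file: index 7 (EI_OSABI) and index 8 (EI_ABIVERSION). A Python sketch that reads them from raw header bytes; the function name `read_osabi` is hypothetical, and this is not how llvm-objcopy or obj2yaml are implemented:

```python
EI_NIDENT = 16      # size of e_ident
EI_OSABI = 7        # index of the OS/ABI byte
EI_ABIVERSION = 8   # index of the ABI version byte
ELFOSABI_NONE = 0


def read_osabi(header):
    """Return (osabi, abi_version) from the first bytes of an ELF file."""
    if len(header) < EI_NIDENT or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF header")
    return header[EI_OSABI], header[EI_ABIVERSION]
```

A tool that copies an object file should carry both bytes over unchanged rather than resetting them to `ELFOSABI_NONE` and 0.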
  194. [asan] Fix and re-enable a few tests on Android
  195. [InstCombine][AMDGPU] Handle more buffer intrinsics Summary: Include the following intrinsics in the InstCombine simplification: * amdgcn_raw_buffer_load * amdgcn_raw_buffer_load_format * amdgcn_struct_buffer_load * amdgcn_struct_buffer_load_format Change-Id: I14deceff74bcb21179baf6aa6e94bf39e7d63d5d Reviewers: arsenm Reviewed By: arsenm Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision:
  196. [MSan] Don't emit __msan_instrument_asm_load() calls LLVM treats void* pointers passed to assembly routines as pointers to sized types. We used to emit calls to __msan_instrument_asm_load() for every such void*, which sometimes led to false positives. A less error-prone (and truly "conservative") approach is to unpoison only assembly output arguments.
  197. Revert r349731 "[CodeGen][ExpandMemcmp] Add an option for allowing overlapping loads." Forgot to update PowerPC tests for the GEP->bitcast change.
  198. [NFC] Fix trailing semicolon after function. lib/Analysis/VectorUtils.cpp:482:2: warning: extra ‘;’ [-Wpedantic]
  199. [CodeGen][ExpandMemcmp] Add an option for allowing overlapping loads. Summary: This allows expanding {7,11,13,14,15,21,22,23,25,26,27,28,29,30,31}-byte memcmp in just two loads on X86. These were previously calling memcmp. Reviewers: spatel, gchatelet Subscribers: llvm-commits Differential Revision:
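The idea behind overlapping loads is that an N-byte equality comparison, with N between one and two load widths, can be covered by one load at the start and one load ending at the last byte; the bytes in the middle are simply compared twice. The Python sketch below models only the equality case, with slices standing in for loads, and the function name is hypothetical:

```python
def equal_with_two_loads(a, b, n, load_size):
    """Compare the first n bytes of a and b using two possibly overlapping loads."""
    if not load_size <= n <= 2 * load_size:
        raise ValueError("n must be covered by exactly two loads")
    # First load covers bytes [0, load_size).
    head_equal = a[:load_size] == b[:load_size]
    # Second load ends at byte n, so it may overlap the first one.
    tail_start = n - load_size
    tail_equal = a[tail_start:n] == b[tail_start:n]
    return head_equal and tail_equal
```

For example, a 7-byte comparison uses two 4-byte loads at offsets 0 and 3, and a 15-byte comparison uses two 8-byte loads at offsets 0 and 7.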
  200. [HWASAN] Add support for memory intrinsics This patch complements D55117 by implementing __hwasan_mem* functions in the runtime Differential revision:
  201. [Sema] Better static assert diagnostics for expressions involving temporaries/casts/.... Summary: Handles expressions such as: - `std::is_const<T>()` - `std::is_const<T>()()`; - `std::is_same<decltype(U()), V>::value`; Reviewers: aaron.ballman, Quuxplusone Subscribers: cfe-commits, llvm-commits Differential Revision:
  202. [HWASAN] Add support for memory intrinsics Differential revision:
  203. [PowerPC] Implement the isSelectSupported() target hook Summary: PowerPC has scalar selects (isel) and vector mask selects (xxsel). However, PowerPC does not have vector CR selects, and it does not support scalar condition selects on vectors. In addition to implementing this hook, isSelectSupported() should return false when the SelectSupportKind is ScalarCondVectorVal, so that predictable selects are converted into branch sequences. Reviewed By: steven.zhang, hfinkel Differential Revision:
  204. [DAGCombiner] Fix a place that was creating a SIGN_EXTEND with an extra operand.
  205. Introduce llvm.loop.parallel_accesses and metadata. The current llvm.mem.parallel_loop_access metadata has a problem in that it uses LoopIDs. LoopID unfortunately is not a loop identifier. It is neither unique (there's even a regression test assigning the same LoopID to multiple loops; can otherwise happen if passes such as LoopVersioning make copies of entire loops) nor persistent (every time a property is removed/added from a LoopID's MDNode, it will also receive a new LoopID; this happens e.g. when calling Loop::setLoopAlreadyUnrolled()). Since most loop transformation passes change the loop attributes (even if it is just to mark that a loop should not be processed again as llvm.loop.isvectorized does, for the versioned and unversioned loop), the parallel access information is lost for any subsequent pass. This patch unlinks LoopIDs and parallel accesses. llvm.mem.parallel_loop_access metadata on instruction is replaced by metadata. points to a distinct MDNode with no operands (avoiding the problem to ever need to add/remove operands), called "access group". Alternatively, it can point to a list of access groups. The LoopID then has an attribute llvm.loop.parallel_accesses with all the access groups that are parallel (no dependencies carried by this loop). This intentionally avoids any kind of "ID". Loops that are clones/have their attributes modified retain the llvm.loop.parallel_accesses attribute. Access instructions that are cloned point to the same access group. It is not necessary for each access to have its own "ID" MDNode, but those memory access instructions with the same behavior can be grouped together. The behavior of llvm.mem.parallel_loop_access is not changed by this patch, but should be considered deprecated. Differential Revision:
  206. [WebAssembly] Emit a splat for v128 IMPLICIT_DEF Summary: This is a code size savings and is also important to get runnable code while engines do not support v128.const. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision:
  207. Fix build errors introduced by r349712 on aarch64 bots.
  208. [WebAssembly] Gate unimplemented SIMD ops on flag Summary: Gates v128.const, f32x4.sqrt, f32x4.div, i8x16.extract_lane_u, and i16x8.extract_lane_u on the --wasm-enable-unimplemented-simd flag, since these ops are not implemented yet in V8. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision:
  209. Remove pointless casts.
  210. AMDGPU: Make i1/i64/v2i32 and/or/xor legal The 64-bit types do depend on the register bank, but that's another issue to deal with later.
  211. AMDGPU/GlobalISel: Fix ValueMapping tables for i1 This was incorrectly selecting SGPR for any i1 values, e.g. G_TRUNC to i1 from a VGPR was still an SGPR.
  212. [X86] Disable custom widening of signed/unsigned add/sub saturation intrinsics under -x86-experimental-vector-widening-legalization. Generic legalization should take care of this.
  213. [AArch64][GlobalISel] Implement selection of G_MERGE of two s32s into s64. This code pattern is an unfortunate side effect of the way some types get split at call lowering. Ideally we'd either not generate it at all or combine it away in the legalizer artifact combiner. Until then, add selection support anyway which is a significant proportion of our current fallbacks on CTMark. rdar://46491420
  214. [binutils] NFC: fix clang-tidy warning: use empty() instead of size() == 0
  215. AMDGPU/GlobalISel: RegBankSelect for fp conversions
  216. AMDGPU/GlobalISel: Legality/regbankselect for atomicrmw/atomic_cmpxchg
  217. [asan] Undo special treatment of linkonce_odr and weak_odr Summary: On non-Windows these are already removed by ShouldInstrumentGlobal. On Windows we will wait until we get actual issues with that. Reviewers: pcc Subscribers: hiraditya, llvm-commits Differential Revision:
  218. [asan] Prevent folding of globals with redzones Summary: ICF prevented by removing unnamed_addr and local_unnamed_addr for all sanitized globals. Also in general unnamed_addr is not valid here as address now is important for ODR violation detector and redzone poisoning. Before the patch ICF on globals caused: 1. false ODR reports when we register global on the same address more than once 2. globals buffer overflow if we fold variables of smaller type inside of large type. Then the smaller one will poison redzone which overlaps with the larger one. Reviewers: eugenis, pcc Subscribers: hiraditya, llvm-commits Differential Revision:
  219. [asan] Disable test incompatible with new Android
  220. [gn build] Make `ninja check-lld` also run LLD's unit tests And add build files for gtest. With this, the build files for //lld are complete. Differential Revision:
  221. [DwarfExpression] Fix a typo in a doxygen comment. NFC.
  222. [gn build] Add check-lld target and make it work Also add a build file for llvm-lit, which in turn needs llvm/tools/llvm-config. With this, check-lld runs and passes all of lld's lit tests. It doesn't run any of its unit tests yet. Running just ninja -C out/gn will build all prerequisites needed to run tests, but it won't run the tests (so that the build becomes clean after one build). Running ninja -C out/gn check-lld will build prerequisites if needed and run the tests. The check-lld target never becomes clean and runs tests every time. llvm-config's build file is a bit gnarly: Everything not needed to run tests is basically stubbed out. Also, to generate we shell out to llvm-build at build-time. It would be much nicer to get the library dependencies by using the dependency data the GN build contains ( Differential Revision:
  223. [analyzer] pr38668: Do not attempt to cast loaded values of non-scalar types. It is expected to have the same object (memory region) treated as if it has different types in different program points. The correct behavior for RegionStore when an object is stored as an object of type T1 but loaded as an object of type T2 is to store the object as if it has type T1 but cast it to T2 during load. Note that the cast here is some sort of a "reinterpret_cast" (even in C). For instance, if you store a float and load an integer, you won't have your float rounded to an integer; instead, you will have garbage. Admit that we cannot perform the cast as long as types we're dealing with are non-trivial (neither integers, nor pointers). Of course, if the cast is not necessary (eg, T1 == T2), we can still load the value just fine. Differential Revision: rdar://problem/45062567
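The "store a float and load an integer" remark above refers to reinterpreting the stored bytes rather than converting the value. A short Python illustration using the standard `struct` module; the function name is hypothetical, and the example says nothing about how RegionStore itself represents values:

```python
import struct


def load_float_bits_as_uint32(value):
    """Store value as a 32-bit float, then load the same bytes as an integer."""
    stored = struct.pack("<f", value)       # the "store as T1" step
    return struct.unpack("<I", stored)[0]   # the "load as T2" step
```

For 1.5 the reinterpreted result is 0x3FC00000 (1069547520), whereas a value conversion such as `int(1.5)` gives 1, which is the difference the commit message points out.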
  224. [sanitizer] Support running without fd 0,1,2. Summary: Support running with no open file descriptors (as may happen to "init" process on linux). * Remove a check that writing to stderr succeeds. * When opening a file (ex. for log_path option), dup the new fd out of [0, 2] range to avoid confusing the program. Reviewers: pcc, vitalybuka Subscribers: kubamracek, llvm-commits Differential Revision:
  225. [analyzer] GenericTaint: Fix formatting to prepare for incoming improvements. Patch by Gábor Borsik! Differential Revision:
  226. [analyzer] Improve modeling for returning an object from the top frame with RVO. Static Analyzer processes the program function-by-function, sometimes diving into other functions ("inlining" them). When an object is returned from an inlined function, Return Value Optimization is modeled, and the returned object is constructed at its return location directly. When an object is returned from the function from which the analysis has started (the top stack frame of the analysis), the return location is unknown. Model it with a SymbolicRegion based on a conjured symbol that is specifically tagged for that purpose, because this is generally the correct way to symbolicate unknown locations in Static Analyzer. Fixes leak false positives when an object is returned from top frame in C++17: objects that are put into a SymbolicRegion-based memory region automatically "escape" and no longer get reported as leaks. This only applies to C++17 return values with destructors, because it produces a redundant CXXBindTemporaryExpr in the call site, which confuses our liveness analysis. The actual fix for liveness analysis is still pending, but it is no longer causing problems. Additionally, re-enable temporary destructor tests in C++17. Differential Revision: rdar://problem/46217550
  227. [X86] Remove TLI variable from ReplaceNodeResults. NFC We're already in X86TargetLowering which is a derived class of TargetLowering. We can just call methods directly.
  228. AMDGPU: Add patterns for v4i16/v4f16 -> v4i16/v4f16 bitcasts Reviewers: arsenm, tstellar Reviewed By: arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision:
  229. [CodeGenPrepare] Fix bad IR created by large offset GEP splitting. Creating the IR builder, then modifying the CFG, leads to an IRBuilder where the BB and insertion point are inconsistent, so new instructions have the wrong parent. Modified an existing test because the test wasn't covering anything useful (the "invoke" was not actually an invoke by the time we hit the code in question). Differential Revision:
  230. Disable -faddsig by default for PS4 target.
  231. Fix test commit Seems that was actually an eight-space tab...
  232. Test commit Replace tab with 4 spaces.
  233. [llvm-mca] Rename directory for the Cortex tests (NFC)
  234. [llvm-mca] Update Exynos test cases (NFC)
  235. [AArch64] Improve Exynos predicates Expand the predicate `ExynosResetPred` to include all forms of immediate moves.
  236. [AArch64] Use canonical copy idiom Use only the canonical form of the alias for register transfers in the `IsCopyIdiomPred` predicate.
  237. Revert "[BDCE][DemandedBits] Detect dead uses of undead instructions" This reverts commit r349674. It causes a failure in test-suite enc-3des.execution_time.
  238. [analyzer] CStringChecker: Add the forgotten test file. Differential Revision: rdar://problem/45366551
  239. [analyzer] CStringChecker: Fix a crash on C++ overloads of standard functions. It turns out that it's not all that uncommon to have a C++ overload of, say, memcpy that receives a structure (or two) by reference (or by value, if it's being copied from) and copies memory from it (or into it, if it's passed by reference). In this case the argument will be of structure type (recall that expressions of reference type do not exist: instead, C++ classifies expressions into prvalues and lvalues and xvalues). In this scenario we crash because we are trying to assume that, say, a memory region is equal to an empty CompoundValue (the non-lazy one; this is what makeZeroVal() returns for compound types and it represents a prvalue of an object that is initialized with an empty initializer list). Add defensive checks. Differential Revision: rdar://problem/45366551
  240. [llvm-ar] Simplify string table get-or-insert pattern with .insert, NFC
  241. [x86] add test to show ddup hole; NFC (PR37502)
  242. [gn build] Add build file for clang/lib/Basic and dependencies, 2nd try Adds a build file for clang-tblgen and an action for running it, and uses that to process all the .td files in include/clang/Basic. Also adds an action to write include/clang/Config/config.h and include/clang/Basic/ Differential Revision: (The previous commit of this contained unrelated changes, so I reverted the whole previous commit and I'm now landing only what I intended to land.)
  243. Revert 349677, it contained a whole bunch of stuff I did not mean to commit
  244. [gn build] Add build file for clang/lib/Basic and dependencies Adds a build file for clang-tblgen and an action for running it, and uses that to process all the .td files in include/clang/Basic. Also adds an action to write include/clang/Config/config.h and include/clang/Basic/ Differential Revision:
  245. [libcxx] Use custom allocator's `construct` in C++03 when available. Makes libc++ behavior consistent between C++03 and C++11. Can use `decltype` in C++03 because `include/__config` defines a macro when `decltype` is not available. Reviewers: mclow.lists, EricWF, erik.pilkington, ldionne Reviewed By: ldionne Subscribers: dexonsmith, cfe-commits, howard.hinnant, ldionne, christof, jkorous, Quuxplusone Differential Revision:
  246. [BDCE][DemandedBits] Detect dead uses of undead instructions This (mostly) fixes BDCE currently detects instructions that don't have any demanded bits and replaces their uses with zero. However, if an instruction has multiple uses, then some of the uses may be dead (have no demanded bits) even though the instruction itself is still live. This patch extends DemandedBits/BDCE to detect such uses and replace them with zero. While this will not immediately render any instructions dead, it may lead to simplifications (in the motivating case, by converting a rotate into a simple shift), break dependencies, etc. The implementation tries to strike a balance between analysis power and complexity/memory usage. Originally I wanted to track demanded bits on a per-use level, but ultimately we're only really interested in whether a use is entirely dead or not. I'm using an extra set to track which uses are dead. However, as initially all uses are dead, I'm not storing uses whose user is also dead. This case is checked separately instead. The test case has a couple of cases that are not simplified yet. In particular, we're only looking at uses of instructions right now. I think it would make sense to also extend this to arguments. Furthermore, DemandedBits doesn't yet know some of the tricks that InstCombine does for the demanded bits of bitwise or/and/xor in combination with known bits information. Differential Revision:
  247. Re-land "Fix MSVC dependency issue between Clang-tablegen and LLVM-tablegen" (was reverted by mistake)
  248. [X86] Remove a bunch of 'else' after returns in reduceVMULWidth. NFC This reduces indentation and makes it obvious this function always returns something.
  249. llvm-dwarfdump: Improve/fix pretty printing of array dimensions This is to address post-commit feedback from Paul Robinson on r348954. The original commit misinterprets count and upper bound as the same thing (I thought I saw GCC producing an upper bound the same as Clang's count, but GCC correctly produces an upper bound that's one less than the count (in C, that is, where arrays are zero indexed)). I want to preserve the C-like output for the common case, so in the absence of a lower bound the count (or one greater than the upper bound) is rendered between []. In the trickier cases, where a lower bound is specified, a half-open range is used (eg: lower bound 1, count 2 would be "[1, 3)") and unknown parts use a '?' (eg: "[1, ?)" or "[?, 7)" or "[?, ? + 3)"). Reviewers: aprantl, probinson, JDevlieghere Differential Revision:
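The rendering rules described in this entry can be sketched roughly as follows. This is a standalone illustration, not the llvm-dwarfdump code: the function name `renderDimension`, its parameters, and the `[]` fallback when nothing is known are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical helper. HasLowerBound says whether a lower bound is specified
// at all; LowerBound holds its value when that value is a known constant.
// UpperBound is inclusive, as in DWARF.
std::string renderDimension(bool HasLowerBound,
                            std::optional<std::int64_t> LowerBound,
                            std::optional<std::int64_t> Count,
                            std::optional<std::int64_t> UpperBound) {
  assert((HasLowerBound || !LowerBound) && "value given without a lower bound");
  if (!HasLowerBound) {
    // Common C-like case: print the count, or one more than the upper bound.
    if (Count)
      return "[" + std::to_string(*Count) + "]";
    if (UpperBound)
      return "[" + std::to_string(*UpperBound + 1) + "]";
    return "[]"; // assumption: nothing known, print empty brackets
  }
  // Lower bound specified: render a half-open range, '?' for unknown parts.
  std::string Lo = LowerBound ? std::to_string(*LowerBound) : std::string("?");
  std::string Hi = "?";
  if (UpperBound)
    Hi = std::to_string(*UpperBound + 1);
  else if (Count && LowerBound)
    Hi = std::to_string(*LowerBound + *Count);
  else if (Count)
    Hi = "? + " + std::to_string(*Count);
  return "[" + Lo + ", " + Hi + ")";
}
```

With these assumptions, a lower bound of 1 and a count of 2 renders as the half-open range from the entry, and an unknown lower bound with a count of 3 renders with the '?' placeholders.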
  250. PR40096: Forwards-compatible with C++20 rule regarding aggregates not having user-declared ctors Looks like these were in place to make these types move-only. That's generally not a feature that the type should prescribe (unless it's an inherent limitation) - instead leaving it up to the users of a type.
  251. [ThinLTO] Remove dllimport attribute from locally defined symbols Summary: The LTO/ThinLTO driver currently creates invalid bitcode by setting symbols marked dllimport as dso_local. The compiler often has access to the definition (often dllexport) and the declaration (often dllimport) of an object at link-time, leading to a conflicting declaration. This patch resolves the inconsistency by removing the dllimport attribute. Reviewers: tejohnson, pcc, rnk, echristo Reviewed By: rnk Subscribers: dmikulin, wristow, mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, dang, llvm-commits Differential Revision:
  252. [sanitizer] Remove spurious semi-colon Summary: An extra ';' at the end of a namespace triggers a pedantic warning: ``` .../sanitizer_common/sanitizer_type_traits.h:42:2: warning: extra ‘;’ [-Wpedantic] }; // namespace __sanitizer ``` Reviewers: eugenis, delcypher Reviewed By: eugenis Subscribers: kubamracek, #sanitizers, llvm-commits Differential Revision:
  253. [GlobalISel][AArch64] Add support for @llvm.ceil This adds a G_FCEIL generic instruction and uses it in AArch64. This adds selection for floating point ceil where it has a supported, dedicated instruction. Other cases aren't handled here. It updates the relevant gisel tests and adds a select-ceil test. It also adds a check to arm64-vcvt.ll which ensures that we don't fall back when we run into one of the relevant cases.
  254. Work around GCC 9.0 regression
  255. [llvm-mca] Rename an error variable.
  256. [X86] Don't match TESTrr from (cmp (and X, Y), 0) during isel. Defer to post processing The (cmp (and X, Y) 0) pattern is greedy and ends up forming a TESTrr and consuming the and when it might be better to use one of the BMI/TBM instructions like BLSR or BLSI. This patch removes the pattern from isel and adds a post processing check to combine TESTrr+ANDrr into just a TESTrr. With this patch we are able to select the BMI/TBM instructions, but we'll also emit a TESTrr when the result is compared to 0. In many cases the peephole pass will be able to use optimizeCompareInstr to remove the TEST, but it's probably not perfect. Differential Revision:
  257. [X86] Fix assert fails in pass X86AvoidSFBPass Fixes The function removeRedundantBlockingStores is supposed to remove any blocking stores contained in each other in lockingStoresDispSizeMap. But it currently looks only at the previous one, which will miss some cases that result in an assert. This patch refines the function to check all previous layouts until it finds the uncontained one, so all redundant stores will be removed. Patch by Pengfei Wang Differential Revision:
  258. [llvm-mca] Add an error handler for error from parseCodeRegions Summary: It's a bit tricky to add a test for the failing path right now, binary support will have an easier path to exercise the path here. * Ran clang-format. Reviewers: andreadb Reviewed By: andreadb Subscribers: tschuett, gbedwell, llvm-commits Differential Revision:
  259. [OPENMP]Mark the loop as started when initialized. Need to mark the loop as started when the initialization statement is found. It is required to prevent possible incorrect loop iteration variable detection during template instantiation and fix the compiler crash during the codegen.
  260. Revert r349517 "[CMake] Default options for faster executables on MSVC"
  261. [CodeComplete] Properly determine qualifiers of 'this' in a lambda Summary: Clang used to pick up the qualifiers of the lambda's call operator (which is always const) and failed to show non-const methods of 'this' in completion results. Reviewers: kadircet Reviewed By: kadircet Subscribers: cfe-commits Differential Revision:
  262. Revert r349517 "[CMake] Default options for faster executables on MSVC"
  263. [AArch64] Improve the Exynos M3 pipeline model
  264. [llvm-mca] Split test (NFC) Split the Exynos test of the register offset addressing mode into separate loads and stores tests.
  265. [Driver] [NetBSD] Add -D_REENTRANT when using sanitizers NetBSD intends to support only reentrant interfaces in interceptors. When -lpthread is used without _REENTRANT defined, things are not guaranteed to work. This is especially important for <stdio.h> and sanitization of interfaces around FILE. Some APIs have alternative modes depending on the _REENTRANT definition, and NetBSD intends to support sanitization of the _REENTRANT ones. Differential Revision:
  266. [Driver] Add .hasAnySanitizer() to SanitizerArgs Add a simple method to query whether any sanitizer was enabled, via SanitizerArgs. This will be used in the NetBSD driver to pass additional definitions that are required by all sanitizers. Differential Revision:
  267. [Basic] Correct description of SanitizerSet.empty() Differential Revision:
  268. [Driver] Disable -faddrsig by default on NetBSD Avoid passing -faddrsig by default on NetBSD. This platform is still using old GNU binutils that crashes on executables containing those sections. Differential Revision:
  269. Regenerate test
  270. [sanitizer_common] Fix sha2 interceptors not to use vars in array len Fix the sha2 interceptor macros to use a constant for array parameter length rather than referencing the extern variable. Since the digest length is provided in hash name, reuse the macro parameter for it. Verify that the calculated value matches the one provided by system headers. Differential Revision:
  271. Test commit Fix typos.
  272. [X86] Remove already upgraded llvm.x86.avx512.mask.padds/psubs tests Duplicate tests have already been moved to avx512bw-intrinsics-upgrade.ll
  273. [ValueTracking] remove unused parameters from helper functions; NFC
  274. [BPF] Generate BTF DebugInfo under BPF target This patch implements BTF (BPF Type Format). The BTF is the debug info format for BPF, introduced in the below linux patch: and further extended several times, e.g., The main advantage of implementing in LLVM is: . better integration/deployment as no extra tools are needed. . bpf JIT based compilation (like bcc, bpftrace, etc.) can get BTF without much extra effort. . BTF line_info needs selective source codes, which can be easily retrieved when inside the compiler. This patch implemented BTF generation by registering a BPF specific DebugHandler in BPFAsmPrinter. Signed-off-by: Yonghong Song <> Differential Revision:
  275. Add missing include to test. NFC
  276. [gn build] Merge r349605
  277. [Object] Deduplicate long archive member names Summary: Import libraries as created by llvm-dlltool always use the same archive member name for every object file (namely, the DLL library name). Ensure that long names are not repeatedly stored in the string table. Reviewed By: ruiu Differential Revision:
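The deduplication described in this entry boils down to a get-or-insert on a map from name to string-table offset. A minimal sketch, assuming a hypothetical `LongNameTable` type (the real archive writer is organized differently) and the GNU-style `/\n` terminator for illustration only:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical builder for a long-name string table.
struct LongNameTable {
  std::string Data;                            // concatenated, terminated names
  std::map<std::string, std::size_t> Offsets;  // name -> offset into Data

  // Returns the offset of Name, appending it only on first use.
  std::size_t getOrInsert(const std::string &Name) {
    assert(Name.find('\n') == std::string::npos && "name contains terminator");
    // insert() leaves an existing entry untouched and reports, via .second,
    // whether a new entry was created.
    auto Result = Offsets.insert({Name, Data.size()});
    if (Result.second) {
      Data += Name;
      Data += "/\n"; // illustrative GNU-style terminator
    }
    return Result.first->second;
  }
};
```

When every member of an import library carries the same DLL name, repeated calls return the same offset and the table holds that name once.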
  278. [clang-tidy] Diagnose abseil-duration-comparison on macro arguments Summary: This change relaxes the requirements on the utility `rewriteExprFromNumberToDuration` function, and introduces new checking inside of the `abseil-duration-comparison` check to allow macro argument expression transformation. Differential Revision:
  279. [OpenMP] Fix data sharing analysis in nested clause Without this patch, clang doesn't complain that X needs explicit data sharing attributes in the following: ``` #pragma omp target teams default(none) { #pragma omp parallel num_threads(X) ; } ``` However, clang does produce that complaint after the braces are removed. With this patch, clang complains in both cases. Reviewed By: ABataev Differential Revision:
  280. [compiler-rt][builtins][PowerPC] Enable builtins tests on PowerPC 64 bit LE This patch aims to enable the tests for the compiler-rt builtin functions (that currently already exist within compiler-rt) for PowerPC 64bit LE (ppc64le). Previously when unit tests are run, these tests would be reported as UNSUPPORTED. This patch updates the REQUIRES line for each test (to enable for ppc64le), and each test is linked against compiler-rt when running. Differential Revision:
  281. Test commit
  282. [clangd] Fix a syntax error on the test.
  283. [X86][SSE] Auto upgrade PADDUS/PSUBUS intrinsics to UADD_SAT/USUB_SAT generic intrinsics (clang) Sibling patch to D55855, this emits UADD_SAT/USUB_SAT generic intrinsics for the SSE saturated math intrinsics instead of expanding to an IR code sequence that could be difficult to reassemble. Differential Revision:
  284. [X86][SSE] Auto upgrade PADDUS/PSUBUS intrinsics to UADD_SAT/USUB_SAT generic intrinsics (llvm) Now that we use the generic ISD opcodes, we can use the generic intrinsics directly as well. This fixes the poor fast-isel codegen by not expanding to an easily broken IR code sequence. I'm intending to deal with the signed saturation equivalents as well. Clang counterpart: Differential Revision:
  285. [SelectionDAG] Optional handling of UNDEF elements in matchBinaryPredicate (part 2 of 2) Now that SimplifyDemandedBits/SimplifyDemandedVectorElts is simplifying vector elements, we're seeing more constant BUILD_VECTOR containing undefs. This patch provides opt-in support for UNDEF elements in matchBinaryPredicate, passing NULL instead of the result ConstantSDNode* argument. I've updated the (or (and X, c1), c2) -> (and (or X, c2), c1|c2) fold to demonstrate its use, which I believe is safe for undef cases. Differential Revision:
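For fully defined constants, the fold mentioned in this entry follows from distributing AND over OR. The sketch below checks the identity exhaustively on 8-bit values; it is a scalar model with hypothetical function names and says nothing about the undef-element handling that the patch itself is concerned with.

```cpp
#include <cstdint>

// (or (and X, c1), c2)
std::uint8_t beforeFold(std::uint8_t X, std::uint8_t C1, std::uint8_t C2) {
  return static_cast<std::uint8_t>((X & C1) | C2);
}

// (and (or X, c2), c1|c2)
std::uint8_t afterFold(std::uint8_t X, std::uint8_t C1, std::uint8_t C2) {
  return static_cast<std::uint8_t>((X | C2) & (C1 | C2));
}

// Exhaustively compares both forms over all 8-bit X, c1 and c2.
bool foldHoldsForAll8BitValues() {
  for (unsigned X = 0; X < 256; ++X)
    for (unsigned C1 = 0; C1 < 256; ++C1)
      for (unsigned C2 = 0; C2 < 256; ++C2) {
        auto A = beforeFold(static_cast<std::uint8_t>(X),
                            static_cast<std::uint8_t>(C1),
                            static_cast<std::uint8_t>(C2));
        auto B = afterFold(static_cast<std::uint8_t>(X),
                           static_cast<std::uint8_t>(C1),
                           static_cast<std::uint8_t>(C2));
        if (A != B)
          return false;
      }
  return true;
}
```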
  286. [SelectionDAG] Optional handling of UNDEF elements in matchBinaryPredicate (part 1 of 2) Now that SimplifyDemandedBits/SimplifyDemandedVectorElts is simplifying vector elements, we're seeing more constant BUILD_VECTOR containing undefs. This patch provides opt-in support for UNDEF elements in matchBinaryPredicate, passing NULL instead of the result ConstantSDNode* argument. Differential Revision:
  287. Portable Python script across Python versions urllib2 has been renamed to urllib and the library layout has changed. Work around that in a consistent manner. Differential Revision:
  288. [Index] Index parameters in lambda expressions. Summary: This fixes clangd failing to find references for lambda parameters. Reviewers: ilya-biryukov Subscribers: ioeric, arphaman, kadircet, cfe-commits Differential Revision:
  289. [TargetLowering] Fix propagation of undefs in zero extension ops (PR40091) As described on PR40091, we have several places where zext (and zext_vector_inreg) fold an undef input into an undef output. For zero extensions this is incorrect, as the output should be guaranteed to at least have the new upper bits set to zero. SimplifyDemandedVectorElts is the worst offender (and it's the most likely to cause new undefs to appear) but DAGCombiner's tryToFoldExtendOfConstant has a similar issue. Thanks to @dmgreen for catching this. Differential Revision:
  290. Let TableGen write output only if it changed, instead of doing so in cmake, attempt 2 This relands r330742: """ Let TableGen write output only if it changed, instead of doing so in cmake. Removes one subprocess and one temp file from the build for each tablegen invocation. No intended behavior change. """ In particular, if you see rebuilds after this change that you didn't see before this change, that's unintended and it's fine to revert this change again (but let me know). r330742 got reverted because some people reported that llvm-tblgen ran on every build after it. This could happen if the depfile output got deleted without deleting the main .inc output. To fix, make TableGen always write the depfile, but keep writing the main .inc output only if it has changed. This matches what we did in cmake before. Differential Revision:
  291. [clang-tidy] use "const SourceManager&" parameter, NFC.
  292. Fix test MC/AMDGPU/reloc.s Missed this change in r349620 Change-Id: I5123e31ed4bb99ad6903b9ede4de4dbe2cc6d453
  293. [X86][SSE] Remove use of SSE ADDS/SUBS saturation intrinsics from schedule/stack tests These are due to be upgraded soon, but good to replace them with generic llvm sadd_sat/ssub_sat intrinsics now. The avx512 masked cases need doing as well but require a bit of tidyup first.
  294. AMDGPU: Use an ABS32_LO relocation for SCRATCH_RSRC_DWORD1 Summary: Using HI here makes no logical sense, since the dword is only 32 bits to begin with. Current Mesa master does not look at the relocation type at all, so this change is fine. Future Mesa will rely on this, however. Change-Id: I91085707834c4ac0370926602b93c94b90e44cb1 Reviewers: arsenm, rampitec, mareko Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision:
  295. Reimplement Thread Static Data ASan routines with TLS Summary: Thread Static Data cannot be used in early init on NetBSD and FreeBSD. Reuse the ASan TSD API for compatibility with existing code with an alternative implementation using Thread Local Storage. New version uses Thread Local Storage to store a pointer with thread specific data. The destructor from TSD has been replaced with a TLS destructor that is called upon thread exit. Reviewers: joerg, vitalybuka, jfb Reviewed By: vitalybuka Subscribers: dim, emaste, ro, jfb, devnexen, kubamracek, mgorny, llvm-commits, #sanitizers Tags: #sanitizers Differential Revision:
  296. [clangd] Unify path canonicalizations in the codebase Summary: There were a few different places where we canonicalized paths, each one had its own flavor. This patch tries to unify them all under one place. Reviewers: ilya-biryukov Subscribers: ioeric, MaskRay, jkorous, arphaman, cfe-commits Differential Revision:
  297. [llvm-objdump] - Fix one more BB. Should fix the /home/grosser/buildslave/polly-amd64-linux/llvm.src/tools/llvm-objdump/llvm-objdump.cpp:539:25: error: conditional expression is ambiguous; 'std::string' (aka 'basic_string<char>') can be converted to 'typename std::remove_reference<StringRef>::type' (aka 'llvm::StringRef') and vice versa Target = Demangle ? demangle(*SymName) : *SymName;
  298. [SelectionDAG] Optional handling of UNDEF elements in matchUnaryPredicate Now that SimplifyDemandedBits/SimplifyDemandedVectorElts are simplifying vector elements, we're seeing more constant BUILD_VECTOR containing UNDEFs. This patch provides opt-in handling of UNDEF elements in matchUnaryPredicate, passing NULL instead of the ConstantSDNode* argument. I've updated SelectionDAG::simplifyShift to demonstrate its use. Differential Revision:
  299. [X86][SSE] Remove SSE ADDUS/SUBUS saturation intrinsics from schedule/stack tests These are already being autoupgraded, currently to an IR sequence, but best to replace them with generic llvm uadd_sat/usub_sat intrinsics (which D55855 will be doing shortly anyhow). The avx512 masked cases need doing as well but require a bit of tidyup first.
  300. [llvm-objdump] - Fix BB. Move the helper method before the first invocation in the file.
  301. [llvm-objdump] - Demangle the symbols when printing symbol table and relocations. This is, llvm-objdump does not demangle the symbols when it prints the symbol table and/or relocations. This patch teaches it to do that. Differential revision:
  302. AMDGPU/InsertWaitcnts: Update VGPR/SGPR bounds when brackets are merged Summary: Fix an issue where VGPR/SGPR bounds are not properly extended when brackets are merged. This manifests as missing waitcnt insertions when multiple brackets are forwarded to a successor block and the first forward has lower VGPR/SGPR bounds. Irreducible loop test has been extended based on a CTS failure detected for GFX9. Reviewers: nhaehnle Reviewed By: nhaehnle Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, jfb, llvm-commits Differential Revision:
  303. [ARM GlobalISel] Support G_CONSTANT for Thumb2 All we have to do is mark it as legal. This allows us to select a lot of new patterns handled by TableGen. This patch adds tests for them and splits up the existing test file for binary operators into 2 files, one for arithmetic ops and one for logical ones.
  304. tsan: align default value of detect_deadlocks flag with actual behavior I tricked myself into thinking that deadlock detection is off by default in TSan by looking at the default value of the detect_deadlocks flag and outdated docs. (Created a pull request to update docs.) I even managed to confuse others:!topic/thread-sanitizer/xYvnAYwtoDk However, the default value is overwritten in code ( The TSan/deadlock tests also rely on this This change aligns the default value of the flag with the actual default behavior. Author: yln (Julian Lettner) Reviewed in:
  305. AMDGPU/GlobalISel: Regbankselect for fsub
  306. [llvm-objcopy] [COFF] Fix the Object forward declaration This fixes build warnings with clang, and linker errors with MSVC.
  307. [llvm-objcopy] Initial COFF support This is an initial implementation of no-op passthrough copying of COFF with objcopy. Differential Revision:
  308. Use "EvaluateAsRValue" instead of as a known int, because if it's not a known integer we want to emit a diagnostic instead of asserting.
  309. Revert accidentally included code.
  310. [DebugInfo] Make AsmPrinter struct HandlerInfo and Handlers protected In AsmPrinter, make struct HandlerInfo and SmallVector Handlers protected, so target extended AsmPrinter will be able to add their own handlers. Signed-off-by: Yonghong Song <> Differential Revision:
  311. [bugpoint][PR29027] Reduce function attributes Summary: In addition to reducing the functions in an LLVM module, bugpoint now reduces the function attributes associated with each of the remaining functions. To test this, add a -bugpoint-crashfuncattr test pass, which crashes if a function in the module has a "bugpoint-crash" attribute. A test case demonstrates that the IR is reduced to just that one attribute. Reviewers: MatzeB, silvas, davide, reames Reviewed By: reames Subscribers: reames, llvm-commits Differential Revision:
  312. Fix use-after-free with profile remapping. We need to keep the underlying profile reader alive as long as the profile data, because the profile data may contain StringRefs referring to strings in the reader's name table.
  313. [PowerPC]Exploit P9 vabsdu for unsigned vselect patterns For type v4i32/v8i16/v16i8, do the following transforms: (vselect (setcc a, b, setugt), (sub a, b), (sub b, a)) -> (vabsd a, b) (vselect (setcc a, b, setuge), (sub a, b), (sub b, a)) -> (vabsd a, b) (vselect (setcc a, b, setult), (sub b, a), (sub a, b)) -> (vabsd a, b) (vselect (setcc a, b, setule), (sub b, a), (sub a, b)) -> (vabsd a, b) Differential Revision:
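All four select patterns in this entry compute the same per-lane value, the unsigned absolute difference. A scalar sketch for a single 32-bit lane (function names are hypothetical, and this models only the arithmetic, not the DAG combine):

```cpp
#include <cstdint>

// One lane of each select form from the commit message.
std::uint32_t viaUgt(std::uint32_t A, std::uint32_t B) { return A > B ? A - B : B - A; }
std::uint32_t viaUge(std::uint32_t A, std::uint32_t B) { return A >= B ? A - B : B - A; }
std::uint32_t viaUlt(std::uint32_t A, std::uint32_t B) { return A < B ? B - A : A - B; }
std::uint32_t viaUle(std::uint32_t A, std::uint32_t B) { return A <= B ? B - A : A - B; }

// Reference: larger operand minus smaller operand.
std::uint32_t absDiff(std::uint32_t A, std::uint32_t B) {
  return A > B ? A - B : B - A;
}

// True when every select form matches the reference for this pair.
bool allSelectFormsAgree(std::uint32_t A, std::uint32_t B) {
  std::uint32_t Ref = absDiff(A, B);
  return viaUgt(A, B) == Ref && viaUge(A, B) == Ref &&
         viaUlt(A, B) == Ref && viaUle(A, B) == Ref;
}
```

The forms only differ in which branch handles `A == B`, and both branches yield zero there, which is why the strict and non-strict comparisons are interchangeable.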
  314. [gn build] Add build file for llvm-objcopy Needed by check-lld. This should've been part of r349486 but I messed up. Differential Revision:
  315. Re-land "Fix MSVC dependency issue between Clang-tablegen and LLVM-tablegen" Previously, when compiling Visual Studio targets, one could see random build errors. This was caused by tablegen projects using the same build folders. This workaround simply chains tablegen projects. Differential Revision:
  316. Add llvm-objdump man page Differential Revision:
  317. [asan] Disable ODR test on Android
  318. [Driver] Also obey -nostdlib++ when rewriting -lstdc++. Reviewers: pirama Reviewed By: pirama Subscribers: cfe-commits Differential Revision:
  319. [AArch64] Simplify the Exynos M3 pipeline model
  320. [AArch64] Fix instructions order (NFC)
  321. [llvm-mca] Improve test (NFC) Add more instruction variations for Exynos.
  322. Portability fix: add missing includes and static_casts. Reviewed as Thanks to Andrey Maksimov for the patch.
  323. [DebugInfo] Move several private headers to include directory This patch moved the following files in lib/CodeGen/AsmPrinter/ AsmPrinterHandler.h DbgEntityHistoryCalculator.h DebugHandlerBase.h to include/llvm/CodeGen directory. Such a change will enable Target to extend DebugHandlerBase and emit Target specific debug info sections. Signed-off-by: Yonghong Song <> Differential Revision:
  324. Emit ASM input in a constant context Summary: Some ASM input constraints (e.g., "i" and "n") require immediate values. At O0, very few code transformations are performed. So if we cannot resolve to an immediate when emitting the ASM input we shouldn't delay its processing. Reviewers: rsmith, efriedma Reviewed By: efriedma Subscribers: rehana, efriedma, craig.topper, jyknight, cfe-commits Differential Revision:
  325. [InstCombine] add tests for extract of vector load; NFC There's a mismatch internally about how we are handling these patterns. We count loads as cheapToScalarize(), but then we don't actually scalarize them, so that can leave extra instructions compared to where we started when scalarizing other ops. If it's cheapToScalarize, then we should be scalarizing.
  326. Preserve the linkage for objc* intrinsics as clang will set them to weak_external in some cases Clang uses weak linkage for objc runtime functions when they are not available on the platform. The intrinsic has this linkage so we just need to pass that on to the runtime call.
  327. Add nonlazybind to objc_retain/objc_release when converting from intrinsics. For performance reasons, clang set nonlazybind on these functions. Now that we are using intrinsics instead of runtime calls, we should set this attribute when creating the runtime functions.
  328. [LAA] Introduce enum for vectorization safety status (NFC). This patch adds a VectorizationSafetyStatus enum, which will be extended in a follow up patch to distinguish between 'safe with runtime checks' and 'known unsafe' dependences. Reviewers: anemet, anna, Ayal, hsaito Reviewed By: Ayal Differential Revision:
  329. [asan] Restore ODR-violation detection on vtables Summary: unnamed_addr is still useful for detecting ODR violations on vtables. Still, unnamed_addr with lld and --icf=safe or --icf=all can trigger false reports, which can be avoided with --icf=none or by using private aliases with -fsanitize-address-use-odr-indicator Reviewers: eugenis Reviewed By: eugenis Subscribers: kubamracek, hiraditya, llvm-commits Differential Revision:
  330. [LoopVectorize] auto-generate complete checks; NFC The first test claims to show that the vectorizer will generate a vector load/loop, but then this file runs other passes which might scalarize that op. I'm removing instcombine from the RUN line here to break that dependency. Also, I'm generating full checks to make it clear exactly what the vectorizer has done.
  331. Rewrite objc intrinsics to runtime methods in PreISelIntrinsicLowering instead of SDAG. SelectionDAG currently changes these intrinsics to function calls, but that won't work for other ISel's. Also we want to eventually support nonlazybind and weak linkage coming from the front-end which we can't do in SelectionDAG.
  332. [OPENMP] parsing and sema support for 'close' map-type-modifier A map clause with the close map-type-modifier is a hint to prefer that the variables are mapped using a copy into faster memory. Patch by Ahsan Saghir (saghir) Differential Revision:
  333. [AArch64] Avoid crashing on .seh directives in assembly Differential Revision:
  334. [InstCombine] auto-generate complete checks; NFC
  335. Fix errors with the Clang natvis file. This updates the FunctionProtoType visualizer to use the proper bits for determining parameter information and the DeclarationName visualizer to use the detail namespace. It also adds support for viewing newer special declaration names (like deduction guides). Patch with help of Bruno Ricci.
  336. Revert r349541 (Fix MSVC dependency issue between Clang-tablegen and LLVM-tablegen)
  337. [asan] In llvm.asan.globals, allow entries to be non-GlobalVariable and skip over them Looks like there are valid reasons why we need to allow bitcasts in llvm.asan.globals, see discussion at Let's look through bitcasts when iterating over entries in the llvm.asan.globals list. Differential Revision:
  338. [AARCH64] Added test case for PR40091
  339. [CodeGen] Handle mixed-width ops in mixed-sign mul-with-overflow lowering The special lowering for __builtin_mul_overflow introduced in r320902 fixed an ICE seen when passing mixed-sign operands to the builtin. This patch extends the special lowering to cover mixed-width, mixed-sign operands. In a few common scenarios, calls to muloti4 will no longer be emitted. This should address the latest comments in PR34920 and work around the link failure seen in: Testing: - check-clang - A/B output comparison with: Differential Revision:
  340. Fix MSVC dependency issue between Clang-tablegen and LLVM-tablegen Previously, when compiling Visual Studio targets, one could see random build errors. This was caused by tablegen projects using the same build folders. This workaround simply chains tablegen projects. Differential Revision:
  341. [OPENMP][NVPTX]Emit shared memory buffer for reduction as 128 bytes buffer. It seems that nvlink has a bug in its support for weakly linked symbols. It does not allow defining several shared memory buffers with different sizes, even with weak linkage. Instead, we always use a 128-byte buffer to keep nvlink from emitting the error message.
  342. [llvm-mca] Update the Exynos test cases (NFC) Add more entropy to the test cases.
  343. [llvm-mca] Dump mask in hex Dump the resources masks as hexadecimal.
  344. Generate objc intrinsics instead of runtime calls as the ARC optimizer now works only on intrinsics Differential Revision: Reviewers: rjmccall
  345. Change the objc ARC optimizer to use the new objc.* intrinsics We're moving ARC optimisation and ARC emission in clang away from runtime methods and towards intrinsics. This is the part which actually uses the intrinsics in the ARC optimizer when both analyzing the existing calls and emitting new ones. Differential Revision: Reviewers: ahatanak
  346. [X86] Add BSR to isUseDefConvertible. We already had BSF here as part of __builtin_ffs improvements and I was just wondering yesterday whether we should have BSR there. This addresses one issue from PR40090.
  347. [InstCombine] Simplify cttz/ctlz + icmp eq/ne into mask check Checking whether a number has a certain number of trailing / leading zeros means checking whether it is of the form XXXX1000 / 0001XXXX, which can be done with an and+icmp. Related to As a next step, this can be extended to non-equality predicates. Differential Revision:
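The rewrite in this entry can be illustrated on 8-bit values: `cttz(X) == N` holds exactly when the low N+1 bits of X are a one followed by N zeros, and `ctlz(X) == N` when the top N+1 bits are N zeros followed by a one. The helpers below are a hypothetical scalar model for N in 0..7, not the InstCombine code.

```cpp
#include <cassert>
#include <cstdint>

// Reference counts; both return 8 for X == 0.
unsigned cttz8(std::uint8_t X) {
  unsigned N = 0;
  while (N < 8 && ((X >> N) & 1u) == 0)
    ++N;
  return N;
}
unsigned ctlz8(std::uint8_t X) {
  unsigned N = 0;
  while (N < 8 && ((X >> (7 - N)) & 1u) == 0)
    ++N;
  return N;
}

// cttz(X) == N as and + compare: low N+1 bits must equal 1 << N.
bool cttzEqualsViaMask(std::uint8_t X, unsigned N) {
  assert(N < 8);
  unsigned Mask = (1u << (N + 1)) - 1u;
  return (X & Mask) == (1u << N);
}

// ctlz(X) == N as and + compare: top N+1 bits must equal 1 << (7 - N).
bool ctlzEqualsViaMask(std::uint8_t X, unsigned N) {
  assert(N < 8);
  unsigned Mask = 0xFFu & ~((1u << (7 - N)) - 1u);
  return (X & Mask) == (1u << (7 - N));
}

// Exhaustive comparison against the reference counts.
bool masksAgreeWithCounts() {
  for (unsigned X = 0; X < 256; ++X)
    for (unsigned N = 0; N < 8; ++N) {
      auto V = static_cast<std::uint8_t>(X);
      if ((cttz8(V) == N) != cttzEqualsViaMask(V, N))
        return false;
      if ((ctlz8(V) == N) != ctlzEqualsViaMask(V, N))
        return false;
    }
  return true;
}
```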
  348. [AMDGPU] Removed the unnecessary operand size-check-assert from processBaseWithConstOffset(). Summary: 32bit operand sizes are guaranteed by the opcode check AMDGPU::V_ADD_I32_e64 and AMDGPU::V_ADDC_U32_e64. Therefore, we don't need any additional operand size-check-assert. Author: FarhanaAleen
  349. DebugInfo: Fix missing local imported entities after r349207 Post commit review/bug reported by Pavel Labath - thanks!
  350. [SCCP] Get rid of redundant call for getPredicateInfoFor (NFC). We can use the result fetched a few lines above.
  351. [X86] Don't use SplitOpsAndApply to create ISD::UADDSAT/ISD::USUBSAT nodes. Let type legalization and op legalization deal with it. Now that we've switched to target independent nodes we can rely on generic infrastructure to do the legalization for us.
  352. [OPENMP][NVPTX]Added extra sync point to the inter-warp copy function. The parallel reduction operation requires an extra synchronization point in the inter-warp copy function to avoid divergence.
  353. [InstCombine] refactor isCheapToScalarize(); NFC As the FIXME indicates, this has the potential to go overboard. So I'm not sure if it's even worth keeping this vs. iteratively doing simple matches, but we might as well clean it up.
  354. Rework the C strings tests to use ASSERT_SAME_TYPE. NFC there. Also change cwchar.pass.cpp to avoid constructing a couple things from zero - since apparently they can be enums in some weird C library. NFC there, either, since the values were never used.
  355. [X86] Use SADDSAT/SSUBSAT instead of ADDS/SUBS Migrate the X86 backend from X86ISD opcodes ADDS and SUBS to generic ISD opcodes SADDSAT and SSUBSAT. This also improves codegen for @llvm.sadd.sat() and @llvm.ssub.sat() intrinsics. This is a followup to D55787 and part of PR40056. Differential Revision:
  356. [X86] Create PSUBUS from (add (umax X, C), -C) InstCombine seems to canonicalize our PSUB pattern into a max with the constant and an add with an inverse of the constant. This patch recognizes this pattern and turns it into PSUBUS. Future work could improve undef element handling. Fixes some of PR40053 Differential Revision:
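The equivalence this entry relies on is that an unsigned saturating subtract of C equals `umax(X, C)` followed by a wrapping add of `-C`. A scalar sketch on one 8-bit lane, with hypothetical function names and an exhaustive check:

```cpp
#include <algorithm>
#include <cstdint>

// Unsigned saturating subtract: clamps at zero instead of wrapping.
std::uint8_t usubSat(std::uint8_t X, std::uint8_t C) {
  if (X > C)
    return static_cast<std::uint8_t>(X - C);
  return 0;
}

// (add (umax X, C), -C) with 8-bit wrapping arithmetic.
std::uint8_t umaxThenAddNeg(std::uint8_t X, std::uint8_t C) {
  std::uint8_t Max = std::max(X, C);
  std::uint8_t NegC = static_cast<std::uint8_t>(0u - C); // -C modulo 256
  return static_cast<std::uint8_t>(Max + NegC);          // wraps modulo 256
}

// Compares both forms over every 8-bit X and C.
bool psubusPatternHoldsForAll8BitValues() {
  for (unsigned X = 0; X < 256; ++X)
    for (unsigned C = 0; C < 256; ++C)
      if (usubSat(static_cast<std::uint8_t>(X), static_cast<std::uint8_t>(C)) !=
          umaxThenAddNeg(static_cast<std::uint8_t>(X), static_cast<std::uint8_t>(C)))
        return false;
  return true;
}
```

Because the max is never smaller than C, adding `-C` never actually underflows, so the wrapping add produces the clamped result.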
  357. Buildfix for r345516 (Clang compilation failing).
  358. [CMake] Default options for faster executables on MSVC - Disable incremental linking by default. /INCREMENTAL adds extra thunks in the EXE, which makes execution slower. - Set /MT (static CRT lib) by default instead of CMake's default /MD (dll CRT lib). The previous default /MD causes all DLL functions to be thunked, thus making execution slower (memcmp, memset, etc.) - Adds LLVM_ENABLE_INCREMENTAL_LINK which is set to OFF by default. Differential revision:
  359. [llvm-symbolizer] Omit stderr output when symbolizing a crash Differential revision:
  360. [InstCombine] add tests for scalarization; NFC We miss pattern matching a splat constant if it has undef elements.
  361. Add FMF management to common fp intrinsics in GlobalIsel Summary: This is the initial code change to facilitate managing FMF flags from Instructions to MI wrt Intrinsics in Global Isel. Eventually the GlobalObserver interface will be added as well, where FMF additions can be tracked for the builder and CSE. Reviewers: aditya_nandakumar, bogner Reviewed By: bogner Subscribers: rovka, kristof.beyls, javed.absar Differential Revision:
  362. [LoopVectorize] Rename pass options. NFC. Rename: NoUnrolling to InterleaveOnlyWhenForced and AlwaysVectorize to !VectorizeOnlyWhenForced Contrary to what the name 'AlwaysVectorize' suggests, it does not unconditionally vectorize all loops, but applies a cost model to determine whether vectorization is profitable to all loops. Hence, passing false will disable the cost model, except when a loop is marked with llvm.loop.vectorize.enable. The 'OnlyWhenForced' suffix (suggested by @hfinkel in D55716) better matches this behavior. Similarly, 'NoUnrolling' disables the profitability cost model for interleaving (a term to distinguish it from unrolling by the LoopUnrollPass); rename it for consistency. Differential Revision:
  363. [X86][SSE] Don't use 'sign bit select' vXi8 ROTL lowering for constant rotation amounts Noticed by @spatel on D55747 - we get much better codegen if we use the regular shift expansion.
  364. [LoopUnroll] Honor '#pragma unroll' even with -fno-unroll-loops. When using clang with `-fno-unroll-loops` (implicitly added with `-O1`), the LoopUnrollPass is not added to the (legacy) pass pipeline. This also means that it will not process any loop metadata such as llvm.loop.unroll.enable (which is generated by #pragma unroll or WarnMissedTransformationsPass emits a warning that a forced transformation has not been applied (see Such explicit transformations should take precedence over disabling heuristics. This patch unconditionally adds LoopUnrollPass to the optimizing pipeline (that is, it is still not added with `-O0`), but passes a flag indicating whether automatic unrolling is dis-/enabled. This is the same approach as LoopVectorize uses. The new pass manager's pipeline builder has no option to disable unrolling, hence the problem does not apply. Differential Revision:
  365. [Driver][PS4] Do not implicitly link against asan or ubsan if -nostdlib or -nodefaultlibs on PS4. NFC for targets other than PS4. Respect -nostdlib and -nodefaultlibs when enabling asan or ubsan. Differential Revision:
  366. [NFC] Fix usage of Builder.insert(new Bitcast...)in CodeGenFunction This is exactly a "CreateBitCast", so refactor this to get rid of a 'new'. Note that this slightly changes the test, as the Builder is now seemingly smart enough to fold one of the bitcasts into the annotation call. Change-Id: I1733fb1fdf91f5c9d88651067130b9a4e7b5ab67
  367. Portable Python script across Python version Make scripts more future-proof by importing most __future__ stuff. Differential Revision:
  368. Portable Python script across Python version commands.getoutput has been moved to the subprocess module in Python3 Differential Revision:
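A common way to handle the move described in entry 368 is a guarded import. This is a sketch of the general idiom, not necessarily the code in the patch, and the `echo` command is only an assumed example that needs a POSIX-like shell.

```python
try:
    # Python 2: getoutput lives in the `commands` module.
    from commands import getoutput
except ImportError:
    # Python 3: the same helper is provided by `subprocess`.
    from subprocess import getoutput

# getoutput runs the command through the shell and strips the trailing newline.
result = getoutput("echo hello")
print(result)
```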
  369. [clangd] Try to fix buildbot failure after r349496 Increase timeout from 10ms to 100ms. See
  370. Portable Python script across Python version In Python3, dict.items, dict.keys, dict.values, zip, map and filter no longer return lists; they return lazy views or iterators instead. The portability patch consists in forcing an extra `list` call if the result is actually used as a list. `map` calls are replaced by list comprehensions and `filter` calls by filtered list comprehensions. Differential Revision:
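The rewrites that entry 370 describes look roughly like the following. The data and variable names are made up for illustration; the point is that each expression yields a real list under both Python 2 and Python 3.

```python
sizes = {"a.o": 10, "b.o": 25, "c.o": 40}

# dict.keys() is a view in Python 3; wrap it in list() before indexing or sorting in place.
names = list(sizes.keys())
names.sort()

# zip() is lazy in Python 3; force a list when the result is reused or indexed.
pairs = list(zip(names, [sizes[n] for n in names]))

# map(f, xs) becomes a list comprehension.
doubled = [size * 2 for _, size in pairs]

# filter(pred, xs) becomes a filtered list comprehension.
large = [name for name, size in pairs if size > 20]

print(names)    # ['a.o', 'b.o', 'c.o']
print(pairs[0])  # ('a.o', 10)
print(doubled)  # [20, 50, 80]
print(large)    # ['b.o', 'c.o']
```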
  371. [X86][SSE] Don't use 'sign bit select' vXi8 ROTL lowering for splat rotation amounts Noticed by @spatel on D55747 - we get much better codegen if we use the regular shift expansion.
  372. [MIPS GlobalISel] Select G_SDIV, G_UDIV, G_SREM and G_UREM Add support for s64 libcalls for G_SDIV, G_UDIV, G_SREM and G_UREM and use integer type of correct size when creating arguments for CLI.lowerCall. Select G_SDIV, G_UDIV, G_SREM and G_UREM for types s8, s16, s32 and s64 on MIPS32. Differential Revision:
  373. Emit -Wformat properly for bit-field promotions. Only explicitly look through integer and floating-point promotion where the result type is actually a promotion, which is not always the case for bit-fields in C. Patch by Bevin Hansson.
  374. [clangd] BackgroundIndex rebuilds symbol index periodically. Summary: Currently, background index rebuilds symbol index on every indexed file, which can be inefficient. This patch makes it only rebuild symbol index periodically. As the rebuild no longer happens too often, we could also build more efficient dex index. Reviewers: ilya-biryukov, kadircet Reviewed By: kadircet Subscribers: dblaikie, MaskRay, jkorous, arphaman, jfb, cfe-commits Differential Revision:
  375. [AST] Unify the code paths of traversing lambda expressions. Summary: This is supposed to be a non-functional change. We have two code paths when traversing lambda expressions: 1) traverse the function proto typeloc when parameters and return type are explicit; 2) otherwise fall back to traversing parameter decls and return type loc individually. This patch unifies the code path to always traverse parameters and return type, rather than relying on traversing the full type-loc. Reviewers: ilya-biryukov Subscribers: arphaman, cfe-commits Differential Revision:
  376. Fix a gcc -Wpedantic warning
  377. [gn build] Add build file for llvm-pdbutil Needed for check-lld. Differential Revision:
  378. [PowerPC] Make no-PIC default to match GCC - CLANG Make -fno-PIC default on PowerPC for Little Endian Linux. Differential Revision:
  379. [gn build] Add build file for llvm-bcanalyzer Needed for check-lld. Differential Revision:
  380. [gn build] Add build files for llvm-ar, llvm-nm, llvm-objdump, llvm-readelf Also add build files for deps DebugInfo/Symbolize, ToolDrivers/dll-tool. Also add gn/build/libs/xar (needed by llvm-objdump). Also delete an incorrect part of the symlink description in // (it used to be true before I made the symlink step write a stamp file; now it's no longer true). These are all binaries needed by check-lld that need symlinks. Differential Revision:
  381. [libcxx] Remove XFAILs for older macOS versions That test doesn't fail anymore since r349378, since the assertions that r349378 removed must have been bugs in the dylib at some point.
  382. [X86][SSE] Add shift combine 'out of range' tests with UNDEFs Shows failure to simplify out of range shift amounts to UNDEF if any element is UNDEF.
  383. [X86] Use UADDSAT/USUBSAT instead of ADDUS/SUBUS Replace the X86ISD opcodes ADDUS and SUBUS with generic ISD opcodes UADDSAT and USUBSAT. As a side-effect, this also makes codegen for the @llvm.uadd.sat and @llvm.usub.sat intrinsics reasonable. This only replaces use in the X86 backend, and does not move any of the ADDUS/SUBUS X86 specific combines into generic codegen. Differential Revision:
  384. [SelectionDAG][X86] Fix [US](ADD|SUB)SAT vector legalization, add tests Integer result promotion needs to use the scalar size, and we need support for result widening. This is in preparation for D55787.
  385. [docs] Improve HowToCrossCompilerBuiltinsOnArm Some recent experience on llvm-dev pointed out some errors in the document: - Assumption of ninja - Use of --march rather than -march - Problems with host include files when a multiarch setup was used - Insufficient target information passed to assembler - Instructions on using the cmake cache file BaremetalARM.cmake were incomplete There was also insufficient guidance on what to do when various stages failed due to misconfiguration or missing steps. Summary of changes: - Fixed problems above - Added a troubleshooting section with common errors. - Cleared up one "at time of writing" that is no longer a problem. Differential Revision:
  386. [llvm-dwarfdump] - Do not error out on R_X86_64_DTPOFF64/R_X86_64_DTPOFF32 relocations. If we have the following code (test.cpp): thread_local int tdata = 24; and build an .o file with debug information: clang --target=x86_64-pc-linux -c bar.cpp -g then the object produced may have R_X86_64_DTPOFF64/R_X86_64_DTPOFF32 relocations. (clang emits R_X86_64_DTPOFF64 and gcc emits R_X86_64_DTPOFF32 for the code above for me) Currently, llvm-dwarfdump fails to compute this TLS relocation when dumping the object and reports an error: failed to compute relocation: R_X86_64_DTPOFF64, Invalid data was encountered while parsing the file This relocation represents the offset in the TLS block and is resolved by the linker, but this info is unavailable at the point when the object file is dumped by this tool. The patch adds a simple evaluation for such relocations to avoid emitting errors. The resulting behavior seems to be equal to GNU dwarfdump. Differential revision:
  387. [MIPS GlobalISel] ClampScalar G_AND G_OR and G_XOR Add narrowScalar for G_AND and G_XOR. Legalize G_AND G_OR and G_XOR for types other than s32 with clampScalar on MIPS32. Differential Revision:
  388. [AArch64] - Return address signing dwarf support - Reapply changes initially introduced in r343089 - The architecture info is no longer loaded whenever a DWARFContext is created - The runtimes libraries (sanitizers) make use of the dwarf context classes but do not initialise the target info - The architecture of the object can be obtained without loading the target info - Adding a method to the dwarf context to get this information and multiplex the string printing later on Differential Revision:
  389. [X86][AVX] Add 256/512-bit vector funnel shift tests Extra coverage for D55747
  390. [X86][SSE] Add 128-bit vector funnel shift tests Extra coverage for D55747
  391. [IPO][AVR] Create new Functions in the default address space specified in the data layout This modifies the IPO pass so that it respects any explicit function address space specified in the data layout. In targets with nonzero program address spaces, all functions should, by default, be placed into the default program address space. This is required for Harvard architectures like AVR. Without this, the functions will be marked as residing in data space, and thus not be callable. This has no effect on any in-tree official backends, as none use an explicit program address space in their data layouts. Patch by Tim Neumann.
  392. AMDGPU: Legalize/regbankselect frame_index
  393. AMDGPU: Legalize/regbankselect fma
  394. [TargetLowering] Fallback from SimplifyDemandedVectorElts to SimplifyDemandedBits For opcodes not covered by SimplifyDemandedVectorElts, SimplifyDemandedBits might be able to help now that it supports demanded elts as well.
  395. SROA: preserve alignment tags on loads and stores. When splitting up an alloca's uses we were dropping any explicit alignment tags, which means they default to the ABI-required default alignment and this can cause miscompiles if the real value was smaller. Also refactor the TBAA metadata into a parent class since it's shared by both children anyway.
  396. GlobalISel: Improve crash on invalid mapping If NumBreakDowns is 0, BreakDown is null. This trades a null dereference for an assert somewhere else.
  397. AMDGPU/GlobalISel: Legalize/regbankselect fneg/fabs/fsub
  398. [X86][SSE] Move VSRAI sign extend in reg fold into SimplifyDemandedBits (VSRAI (VSHLI X, C1), C1) --> X iff NumSignBits(X) > C1 This works better as part of SimplifyDemandedBits than part of the general combine.
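The condition in entry 398 can be sanity-checked by brute force. The sketch below models a single 8-bit lane with hypothetical helpers (`num_sign_bits`, `shl`, `ashr`); it checks the arithmetic claim only and is not the SimplifyDemandedBits code.

```python
BITS = 8


def to_signed(value):
    """Interpret the low BITS bits of value as a two's-complement integer."""
    value &= (1 << BITS) - 1
    return value - (1 << BITS) if value >> (BITS - 1) else value


def num_sign_bits(x):
    """Count the leading bits of x that equal its sign bit (sign bit included)."""
    u = x & ((1 << BITS) - 1)
    sign = u >> (BITS - 1)
    count = 0
    for i in range(BITS - 1, -1, -1):
        if ((u >> i) & 1) != sign:
            break
        count += 1
    return count


def shl(x, c):
    """Left shift with wraparound to BITS bits (models VSHLI on one lane)."""
    return to_signed(x << c)


def ashr(x, c):
    """Arithmetic right shift; Python's >> on negative ints already sign-extends."""
    return x >> c


# The shift pair round-trips exactly when NumSignBits(X) > C1.
agrees = all(
    (ashr(shl(x, c), c) == x) == (num_sign_bits(x) > c)
    for x in range(-128, 128)
    for c in range(BITS)
)
print(agrees)  # True
```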
  399. [X86][SSE] Replace (VSRLI (VSRAI X, Y), 31) -> (VSRLI X, 31) fold. This fold was incredibly specific - replace with a SimplifyDemandedBits fold to remove a VSRAI if only the original sign bit is demanded (it's guaranteed to stay the same). Test change is merely a rescheduling.
  400. Introduce control flow speculation tracking pass for AArch64 The pass implements tracking of control flow miss-speculation into a "taint" register. That taint register can then be used to mask off registers with sensitive data when executing under miss-speculation, a.k.a. "transient execution". This pass is aimed at mitigating against SpectreV1-style vulnerabilities. At the moment, it implements the tracking of miss-speculation of control flow into a taint register, but doesn't implement a mechanism yet to then use that taint register to mask off vulnerable data in registers (something for a follow-on improvement). Possible strategies to mask out vulnerable data that can be implemented on top of this are: - speculative load hardening to automatically mask off data loaded in registers. - using intrinsics to mask off data in registers as indicated by the programmer (see For AArch64, the following implementation choices are made. Some of these are different than the implementation choices made in the similar pass implemented in X86SpeculativeLoadHardening.cpp, as the instruction set characteristics result in different trade-offs. - The speculation hardening is done after register allocation. With a relative abundance of registers, one register is reserved (X16) to be the taint register. X16 is expected to not clash with other register reservation mechanisms with very high probability because: . The AArch64 ABI doesn't guarantee X16 to be retained across any call. . The only way to request X16 to be used as a programmer is through inline assembly. In the rare case a function explicitly demands to use X16/W16, this pass falls back to hardening against speculation by inserting a DSB SYS/ISB barrier pair which will prevent control flow speculation. - It is easy to insert mask operations at this late stage as we have mask operations available that don't set flags. 
- The taint variable contains all-ones when no miss-speculation is detected, and contains all-zeros when miss-speculation is detected. Therefore, when masking, an AND instruction (which only changes the register to be masked, no other side effects) can easily be inserted anywhere that's needed. - The tracking of miss-speculation is done by using a data-flow conditional select instruction (CSEL) to evaluate the flags that were also used to make conditional branch direction decisions. Speculation of the CSEL instruction can be limited with a CSDB instruction - so the combination of CSEL + a later CSDB gives the guarantee that the flags as used in the CSEL aren't speculated. When conditional branch direction gets miss-speculated, the semantics of the inserted CSEL instruction is such that the taint register will contain all zero bits. One key requirement for this to work is that the conditional branch is followed by an execution of the CSEL instruction, where the CSEL instruction needs to use the same flags status as the conditional branch. This means that the conditional branches must not be implemented as one of the AArch64 conditional branches that do not use the flags as input (CB(N)Z and TB(N)Z). This is implemented by ensuring in the instruction selectors to not produce these instructions when speculation hardening is enabled. This pass will assert if it does encounter such an instruction. - On function call boundaries, the miss-speculation state is transferred from the taint register X16 to be encoded in the SP register as value 0. Future extensions/improvements could be: - Implement this functionality using full speculation barriers, akin to the x86-slh-lfence option. This may be more useful for the intrinsics-based approach than for the SLH approach to masking. Note that this pass already inserts the full speculation barriers if the function for some niche reason makes use of X16/W16. 
- no indirect branch misprediction gets protected/instrumented; but this could be done for some indirect branches, such as switch jump tables. Differential Revision:
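The taint-register idea in entry 400 can be modelled in a few lines. This is a deliberately simplified software model under stated assumptions: `csel`, `update_taint` and `mask` are hypothetical names, a 64-bit register is assumed, and real miss-speculation is of course not observable from Python. It only illustrates why an all-ones/all-zeros taint combined with AND gives the masking behaviour described.

```python
ALL_ONES = (1 << 64) - 1  # taint value when no miss-speculation is detected


def csel(cond, if_true, if_false):
    """Model of a conditional select driven by the flags."""
    return if_true if cond else if_false


def update_taint(taint, actual_cond, predicted_cond):
    """Keep the taint if the branch went the way the flags say, else zero it.

    On the path the predictor chose, the inserted select re-evaluates the
    same flags as the branch; a mismatch means execution is miss-speculated.
    """
    return csel(actual_cond == predicted_cond, taint, 0)


def mask(value, taint):
    """AND with the taint: unchanged when all-ones, zero when all-zeros."""
    return value & taint


secret = 0xDEADBEEF
ok = update_taint(ALL_ONES, actual_cond=True, predicted_cond=True)
bad = update_taint(ALL_ONES, actual_cond=False, predicted_cond=True)
print(hex(mask(secret, ok)))   # 0xdeadbeef
print(hex(mask(secret, bad)))  # 0x0
```

Once the taint is zero it stays zero through later updates, which matches the intent that everything executed after a miss-predicted branch remains masked.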
  401. Portable Python script across Python version In Python2, division between integers yields an integer, while it yields a float in Python3. Use a combination of from __future__ import division and // operator to get a portable behavior. Differential Revision:
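A short illustration of the two operators mentioned in entry 401. The values are arbitrary; the printed results assume Python 3, or Python 2 with `from __future__ import division` as the first statement of the file.

```python
total_bytes = 7
chunks = 2

# True division: always a float once the __future__ import (or Python 3) is in effect.
average = total_bytes / chunks

# Floor division: use // wherever an integer result (an index, a count) is required.
per_chunk = total_bytes // chunks

# Floor division rounds toward negative infinity, not toward zero.
negative = -7 // 2

print(average)    # 3.5
print(per_chunk)  # 3
print(negative)   # -4
```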
  402. Portable Python script across Python version Using from __future__ import print_function it is possible to have a compatible behavior of `print(...)` across Python version. Differential Revision:
  403. [unittests] Remove superfluous semicolon, fixing warnings with GCC. NFC.
  404. [Driver] Automatically enable -munwind-tables if -fseh-exceptions is enabled For targets where SEH exceptions are used by default (on MinGW, only x86_64 so far), -munwind-tables are added automatically. If -fseh-exceptions is enabled on a target where SEH exceptions are available but not enabled by default yet (aarch64), we need to pass -munwind-tables if -fseh-exceptions was specified. Differential Revision:
  405. [AArch64] [MinGW] Allow enabling SEH exceptions The default still is dwarf, but SEH exceptions can now be enabled optionally for the MinGW target. Differential Revision:
  406. [X86] Add test cases to show isel failing to match BMI blsmsk/blsi/blsr when the flag result is used. A similar thing happens to TBM instructions which we already have tests for.
  407. Portable Python script across Python version ConfigParser module has been renamed as configparser in Python3 Differential Revision:
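One usual way to cope with the rename in entry 407 is an aliasing import, sketched below. The section and option names are invented for the example, and this may differ from what the patch itself does.

```python
try:
    import configparser  # Python 3 name
except ImportError:
    import ConfigParser as configparser  # Python 2 name, aliased to the new one

# RawConfigParser is available under both module names.
parser = configparser.RawConfigParser()
parser.add_section("build")
parser.set("build", "jobs", "4")

jobs = parser.getint("build", "jobs")
print(jobs)  # 4
```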
  408. Portable Python script across Python version Replace `xrange(...)` by either `range(...)` or `list(range(...))` depending on the context. Differential Revision:
  409. Portable Python script across Python version dicts no longer have the `has_key` method in Python3. Instead, one can use the `in` keyword which already works in Python2.
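The replacement described in entry 409 is a one-line change at each call site. The dictionary below is a made-up example.

```python
options = {"verbose": True, "jobs": 4}

# Python 2 only: options.has_key("verbose")
# Portable spelling that works on both Python 2 and Python 3:
has_verbose = "verbose" in options
has_quiet = "quiet" in options

print(has_verbose)  # True
print(has_quiet)    # False
```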
  410. [PowerPC][NFC]Update vabsd cases with vselect test cases Power9 VABSDU* instructions can be exploited for some special vselect sequences. Check in the original test case here, later the exploitation patch will update this and reviewers can check the differences easily.
  411. [PowerPC] Exploit power9 new instruction setb Check the expected patterns feeding to SELECT_CC like: (select_cc lhs, rhs, 1, (sext (setcc [lr]hs, [lr]hs, cc2)), cc1) (select_cc lhs, rhs, -1, (zext (setcc [lr]hs, [lr]hs, cc2)), cc1) (select_cc lhs, rhs, 0, (select_cc [lr]hs, [lr]hs, 1, -1, cc2), seteq) (select_cc lhs, rhs, 0, (select_cc [lr]hs, [lr]hs, -1, 1, cc2), seteq) Further transform the sequence to comparison + setb if one of them matches. Differential Revision:
  412. [ExprConstant] Handle compound assignment when LHS has integral type and RHS has floating point type Fixes PR39858 Differential Revision:
  413. [NFC] Add new test to cover the lhs scheduling issue for P9.
  414. Automatic variable initialization Summary: Add an option to initialize automatic variables with either a pattern or with zeroes. The default is still that automatic variables are uninitialized. Also add attributes to request uninitialized on a per-variable basis, mainly to disable initialization of large stack arrays when deemed too expensive. This isn't meant to change the semantics of C and C++. Rather, it's meant to be a last-resort when programmers inadvertently have some undefined behavior in their code. This patch aims to make undefined behavior hurt less, which security-minded people will be very happy about. Notably, this means that there's no inadvertent information leak when: - The compiler re-uses stack slots, and a value is used uninitialized. - The compiler re-uses a register, and a value is used uninitialized. - Stack structs / arrays / unions with padding are copied. This patch only addresses stack and register information leaks. There's many more infoleaks that we could address, and much more undefined behavior that could be tamed. Let's keep this patch focused, and I'm happy to address related issues elsewhere. To keep the patch simple, only some `undef` is removed for now, see `replaceUndef`. The padding-related infoleaks are therefore not all gone yet. This will be addressed in a follow-up, mainly because addressing padding-related leaks should be a stand-alone option which is implied by variable initialization. There are three options when it comes to automatic variable initialization: 0. Uninitialized This is C and C++'s default. It's not changing. Depending on code generation, a programmer who runs into undefined behavior by using an uninitialized automatic variable may observe any previous value (including program secrets), or any value which the compiler saw fit to materialize on the stack or in a register (this could be to synthesize an immediate, to refer to code or data locations, to generate cookies, etc). 1. 
Pattern initialization This is the recommended initialization approach. Pattern initialization's goal is to initialize automatic variables with values which will likely transform logic bugs into crashes down the line, are easily recognizable in a crash dump, without being values which programmers can rely on for useful program semantics. At the same time, pattern initialization tries to generate code which will optimize well. You'll find the following details in `patternFor`: - Integers are initialized with repeated 0xAA bytes (infinite scream). - Vectors of integers are also initialized with infinite scream. - Pointers are initialized with infinite scream on 64-bit platforms because it's an unmappable pointer value on architectures I'm aware of. Pointers are initialized to 0x000000AA (small scream) on 32-bit platforms because 32-bit platforms don't consistently offer unmappable pages. When they do it's usually the zero page. As people try this out, I expect that we'll want to allow different platforms to customize this, let's do so later. - Vectors of pointers are initialized the same way pointers are. - Floating point values and vectors are initialized with a negative quiet NaN with repeated 0xFF payload (e.g. 0xffffffff and 0xffffffffffffffff). NaNs are nice (here, anyway) because they propagate on arithmetic, making it more likely that entire computations become NaN when a single uninitialized value sneaks in. - Arrays are initialized to their homogeneous elements' initialization value, repeated. Stack-based Variable-Length Arrays (VLAs) are runtime-initialized to the allocated size (no effort is made for negative size, but zero-sized VLAs are untouched even if technically undefined). - Structs are initialized to their heterogeneous elements' initialization values. Zero-size structs are initialized as 0xAA since they're allocated a single byte. - Unions are initialized using the initialization for the largest member of the union. 
Expect the values used for pattern initialization to change over time, as we refine heuristics (both for performance and security). The goal is truly to avoid injecting semantics into undefined behavior, and we should be comfortable changing these values when there's a worthwhile point in doing so. Why so much infinite scream? Repeated byte patterns tend to be easy to synthesize on most architectures, and otherwise memset is usually very efficient. For values which aren't entirely repeated byte patterns, LLVM will often generate code which does memset + a few stores. 2. Zero initialization Zero initialize all values. This has the unfortunate side-effect of providing semantics to otherwise undefined behavior, programs therefore might start to rely on this behavior, and that's sad. However, some programmers believe that pattern initialization is too expensive for them, and data might show that they're right. The only way to make these programmers wrong is to offer zero-initialization as an option, figure out where they are right, and optimize the compiler into submission. Until the compiler provides acceptable performance for all security-minded code, zero initialization is a useful (if blunt) tool. I've been asked for a fourth initialization option: user-provided byte value. This might be useful, and can easily be added later. Why is an out-of-band initialization mechanism desired? We could instead use -Wuninitialized! Indeed we could, but then we're forcing the programmer to provide semantics for something which doesn't actually have any (it's uninitialized!). It's then unclear whether `int derp = 0;` lends meaning to `0`, or whether it's just there to shut that warning up. It's also way easier to use a compiler flag than it is to manually and intelligently initialize all values in a program. Why not just rely on static analysis? Because it cannot reason about all dynamic code paths effectively, and it has false positives. 
It's a great tool, could get even better, but it's simply incapable of catching all uses of uninitialized values. Why not just rely on memory sanitizer? Because it's not universally available, has a 3x performance cost, and shouldn't be deployed in production. Again, it's a great tool, it'll find the dynamic uses of uninitialized variables that your test coverage hits, but it won't find the ones that you encounter in production. What's the performance like? Not too bad! Previous publications [0] have cited 2.7 to 4.5% averages. We've committed a few patches over the last few months to address specific regressions, both in code size and performance. In all cases, the optimizations are generally useful, but variable initialization benefits from them a lot more than regular code does. We've got a handful of other optimizations in mind, but the code is in good enough shape and has found enough latent issues that it's a good time to get the change reviewed, checked in, and have others kick the tires. We'll continue reducing overheads as we try this out on diverse codebases. Is it a good idea? Security-minded folks think so, and apparently so does the Microsoft Visual Studio team [1] who say "Between 2017 and mid 2018, this feature would have killed 49 MSRC cases that involved uninitialized struct data leaking across a trust boundary. It would have also mitigated a number of bugs involving uninitialized struct data being used directly.". They seem to use pure zero initialization, and claim to have taken the overheads down to within noise. Don't just trust Microsoft though, here's another relevant person asking for this [2]. It's been proposed for GCC [3] and LLVM [4] before. What are the caveats? A few! - Variables declared in unreachable code, and used later, aren't initialized. This includes goto, Duff's device, and other objectionable uses of switch. This should instead be a hard-error in any serious codebase. - Volatile stack variables are still weird. 
That's pre-existing, it's really the language's fault and this patch keeps it weird. We should deprecate volatile [5]. - As noted above, padding isn't fully handled yet. I don't think these caveats make the patch untenable because they can be addressed separately. Should this be on by default? Maybe, in some circumstances. It's a conversation we can have when we've tried it out sufficiently, and we're confident that we've eliminated enough of the overheads that most codebases would want to opt-in. Let's keep our precious undefined behavior until that point in time. How do I use it: 1. On the command-line: -ftrivial-auto-var-init=uninitialized (the default) -ftrivial-auto-var-init=pattern -ftrivial-auto-var-init=zero -enable-trivial-auto-var-init-zero-knowing-it-will-be-removed-from-clang 2. Using an attribute: int dont_initialize_me __attribute((uninitialized)); [0]: [1]: [2]: [3]: [4]: [5]: I've also posted an RFC to cfe-dev: <rdar://problem/39131435> Reviewers: pcc, kcc, rsmith Subscribers: JDevlieghere, jkorous, dexonsmith, cfe-commits Differential Revision:
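To make the pattern values listed in entry 414 concrete, the following Python sketch reproduces the bit patterns the message describes. The helper names are hypothetical and this is not the `patternFor` implementation; it only decodes the stated patterns to show what they look like as values.

```python
import math
import struct


def int_pattern(num_bytes):
    """Integer made of repeated 0xAA bytes (the 'infinite scream')."""
    return int("AA" * num_bytes, 16)


def pointer_pattern(pointer_bits):
    """Repeated 0xAA on 64-bit targets, 0x000000AA on 32-bit targets."""
    return int_pattern(8) if pointer_bits == 64 else 0x000000AA


def float_pattern(num_bytes):
    """Decode an all-0xFF bit pattern as an IEEE-754 float (4) or double (8)."""
    fmt = {4: "<f", 8: "<d"}[num_bytes]
    return struct.unpack(fmt, b"\xFF" * num_bytes)[0]


print(hex(int_pattern(4)))       # 0xaaaaaaaa
print(hex(pointer_pattern(64)))  # 0xaaaaaaaaaaaaaaaa
print(hex(pointer_pattern(32)))  # 0xaa

nan64 = float_pattern(8)
print(math.isnan(nan64))           # True: all-ones decodes as NaN
print(math.copysign(1.0, nan64))   # -1.0: the sign bit is set
print(math.isnan(nan64 + 1.0))     # True: NaN propagates through arithmetic
```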
  415. [X86] Add test case for PR40060. NFC
  416. [X86] Const correct some helper functions X86InstrInfo.cpp. NFC
  417. [NFC] Fix test case issue with a wrong label check.
  418. [CaptureTracking] Pass MaxUsesToExplore from wrappers to the actual implementation This is a follow up for rL347910. In the original patch I somehow forgot to pass the limit from wrappers to the function which actually does the job.
  419. [PowerPC] Improve vec_abs on P9 Improve the current vec_abs support on P9, generate ISD::ABS node for vector types, combine ABS node to VABSD node for some special cases to make use of P9 VABSD* insns, do custom lowering to vsub(vneg later)+vmax if it has no combination opportunity. Differential Revision:
  420. [Support] Fix GNU/kFreeBSD build Patch by James Clarke. Differential Revision:
  421. [codeview] Update comment on aligning symbol records
  422. [FileCheck] Try to fix test on windows due to r349418
  423. [codeview] Align symbol records to save 441MB during linking clang.pdb In PDBs, symbol records must be aligned to four bytes. However, in the object file, symbol records may not be aligned. MSVC does not pad out symbol records to make sure they are aligned. That means the linker has to do extra work to insert the padding. Currently, LLD calculates the required space with alignment, and copies each record one at a time while padding them out to the correct size. It has a fast path that avoids this copy when the records are already aligned. This change fixes a bug in that codepath so that the copy is actually saved, and tweaks LLVM's symbol record emission to align symbol records. Here's how things compare when doing a plain clang Release+PDB build: - objs are 0.65% bigger (negligible) - link is 3.3% faster (negligible) - saves allocating 441MB - new LLD high water mark is ~1.05GB
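The padding arithmetic implied by entry 423 is the standard round-up-to-a-power-of-two computation. The sketch below uses hypothetical helper names and is not the LLD or LLVM code; it just shows how much padding a record of a given size needs to reach four-byte alignment.

```python
def align_to(size, alignment=4):
    """Round size up to the next multiple of alignment (a power of two)."""
    return (size + alignment - 1) & ~(alignment - 1)


def padding_needed(size, alignment=4):
    """Bytes of padding required after a record of the given size."""
    return align_to(size, alignment) - size


for record_size in (5, 6, 8, 13):
    print(record_size, align_to(record_size), padding_needed(record_size))
# 5 8 3
# 6 8 2
# 8 8 0
# 13 16 3
```

A record that is already a multiple of four needs no padding, which is the case the fast path in the entry relies on.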
  424. Recommit r348806: DebugInfo: Use symbol difference for CU length to simplify assembly reading/editing Mucking about simplifying a test case ( ) I stumbled across something I've hit before - that LLVM's (GCC's does too, FWIW) assembly output includes a hardcoded length for a DWARF unit in its header. Instead we could emit a label difference - making the assembly easier to read/edit (though potentially at a slight (I haven't tried to observe it) performance cost of delaying/sinking the length computation into the MC layer). Fix: Predicated all the changes (including creating the labels, even if they aren't used/needed) behind the NVPTX useSectionsAsReferences, avoiding emitting labels in NVPTX where ptxas can't parse them. Reviewers: JDevlieghere, probinson, ABataev Differential Revision:
  425. hwasan: Allow range of frame descriptors to be empty. As of r349413 it's now possible for a binary to contain an empty hwasan frame section. Handle that case simply by doing nothing. Differential Revision:
  426. [libcxx] Handle AppleClang 9 and 10 in XFAILs for aligned allocation tests I forgot that those don't behave like Clang trunk, again.
  427. [libcxx] Properly mark aligned allocation macro test as XFAIL on OS X This test was initially marked as XFAIL using `XFAIL: macosx10.YY`, and was then moved to `UNSUPPORTED: macosx10.YY`. The intent is to mark the test as XFAILing when a deployment target older than macosx10.14 is used, and the right way to do this is `XFAIL: availability=macosx10.YY`.
  428. [FileCheck] Annotate input dump (final tweaks) Apply final suggestions from probinson for this patch series plus a few more tweaks: * Improve various docs, for MatchType in particular. * Rename some members of MatchType. The main problem was that the term "final match" became a misnomer when CHECK-COUNT-<N> was created. * Split InputStartLine, etc. declarations into multiple lines. Differential Revision: Reviewed By: probinson
  429. [FileCheck] Annotate input dump (7/7) This patch implements annotations for diagnostics reporting CHECK-NOT failed matches. These diagnostics are enabled by -vv. As for diagnostics reporting failed matches for other directives, these annotations mark the search ranges using `X~~`. The difference here is that failed matches for CHECK-NOT are successes not errors, so they are green not red when colors are enabled. For example: ``` $ FileCheck -dump-input=help The following description was requested by -dump-input=help to explain the input annotations printed by -dump-input=always and -dump-input=fail: - L: labels line number L of the input file - T:L labels the only match result for a pattern of type T from line L of the check file - T:L'N labels the Nth match result for a pattern of type T from line L of the check file - ^~~ marks good match (reported if -v) - !~~ marks bad match, such as: - CHECK-NEXT on same line as previous match (error) - CHECK-NOT found (error) - CHECK-DAG overlapping match (discarded, reported if -vv) - X~~ marks search range when no match is found, such as: - CHECK-NEXT not found (error) - CHECK-NOT not found (success, reported if -vv) - CHECK-DAG not found after discarded matches (error) - ? marks fuzzy match when no match is found - colors success, error, fuzzy match, discarded match, unmatched input If you are not seeing color above or in input dumps, try: -color $ FileCheck -vv -dump-input=always check5 < input5 |& sed -n '/^<<<</,$p' <<<<<< 1: abcdef check:1 ^~~ not:2 X~~ 2: ghijkl not:2 ~~~ check:3 ^~~ 3: mnopqr not:4 X~~~~~ 4: stuvwx not:4 ~~~~~~ 5: eof:4 ^ >>>>>> $ cat check5 CHECK: abc CHECK-NOT: foobar CHECK: jkl CHECK-NOT: foobar $ cat input5 abcdef ghijkl mnopqr stuvwx ``` Reviewed By: george.karpenkov, probinson Differential Revision:
  430. [FileCheck] Annotate input dump (6/7) This patch implements input annotations for diagnostics reporting CHECK-DAG discarded matches. These diagnostics are enabled by -vv. These annotations mark discarded match ranges using `!~~` because they are bad matches even though they are not errors. CHECK-DAG discarded matches create another case where there can be multiple match results for the same directive. For example: ``` $ FileCheck -dump-input=help The following description was requested by -dump-input=help to explain the input annotations printed by -dump-input=always and -dump-input=fail: - L: labels line number L of the input file - T:L labels the only match result for a pattern of type T from line L of the check file - T:L'N labels the Nth match result for a pattern of type T from line L of the check file - ^~~ marks good match (reported if -v) - !~~ marks bad match, such as: - CHECK-NEXT on same line as previous match (error) - CHECK-NOT found (error) - CHECK-DAG overlapping match (discarded, reported if -vv) - X~~ marks search range when no match is found, such as: - CHECK-NEXT not found (error) - CHECK-DAG not found after discarded matches (error) - ? marks fuzzy match when no match is found - colors success, error, fuzzy match, discarded match, unmatched input If you are not seeing color above or in input dumps, try: -color $ FileCheck -vv -dump-input=always check4 < input4 |& sed -n '/^<<<</,$p' <<<<<< 1: abcdef dag:1 ^~~~ dag:2'0 !~~~ discard: overlaps earlier match 2: cdefgh dag:2'1 ^~~~ check:3 X~ error: no match found >>>>>> $ cat check4 CHECK-DAG: abcd CHECK-DAG: cdef CHECK: efgh $ cat input4 abcdef cdefgh ``` This shows that the line 3 CHECK fails to match even though its pattern appears in the input because its search range starts after the line 2 CHECK-DAG's match range. 
The trouble might be that the line 2 CHECK-DAG's match range is later than expected because its first match range overlaps with the line 1 CHECK-DAG match range and thus is discarded. Because `!~~` for CHECK-DAG does not indicate an error, it is not colored red. Instead, when colors are enabled, it is colored cyan, which suggests a match that went cold. Reviewed By: george.karpenkov, probinson Differential Revision:
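The discard behaviour in the check4/input4 example can be modelled roughly as below. This is a simplified sketch that assumes literal (non-regex) patterns; `match_dag_group` and `search_start_after` are hypothetical names, and the code is not FileCheck's actual implementation.

```python
def match_dag_group(patterns, text):
    """Match literal patterns in order, discarding overlapping matches.

    Returns (accepted, discarded); each entry is (pattern, start, end).
    """
    accepted, discarded = [], []
    for pat in patterns:
        pos = 0
        while True:
            start = text.find(pat, pos)
            if start == -1:
                raise ValueError(f"no match for {pat!r}")
            end = start + len(pat)
            # A candidate clashes if its range intersects an accepted range.
            clash = next(
                (m for m in accepted if start < m[2] and m[1] < end), None
            )
            if clash is None:
                accepted.append((pat, start, end))
                break
            discarded.append((pat, start, end))
            pos = clash[2]  # resume searching after the earlier match
    return accepted, discarded


def search_start_after(accepted):
    """Offset where the next plain CHECK would start searching."""
    return max(end for _, _, end in accepted)
```

With the input `"abcdef\ncdefgh\n"` and the patterns `abcd` and `cdef`, the first `cdef` candidate is discarded and the second is accepted on line 2. A later search for `efgh` then starts past the point where that text begins, which is the failure shown in the dump.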
  431. [FileCheck] Annotate input dump (5/7) This patch implements input annotations for diagnostics enabled by -v, which report good matches for directives. These annotations mark match ranges using `^~~`. For example: ``` $ FileCheck -dump-input=help The following description was requested by -dump-input=help to explain the input annotations printed by -dump-input=always and -dump-input=fail: - L: labels line number L of the input file - T:L labels the only match result for a pattern of type T from line L of the check file - T:L'N labels the Nth match result for a pattern of type T from line L of the check file - ^~~ marks good match (reported if -v) - !~~ marks bad match, such as: - CHECK-NEXT on same line as previous match (error) - CHECK-NOT found (error) - X~~ marks search range when no match is found, such as: - CHECK-NEXT not found (error) - ? marks fuzzy match when no match is found - colors success, error, fuzzy match, unmatched input If you are not seeing color above or in input dumps, try: -color $ FileCheck -v -dump-input=always check3 < input3 |& sed -n '/^<<<</,$p' <<<<<< 1: abc foobar def check:1 ^~~ not:2 !~~~~~ error: no match expected check:3 ^~~ >>>>>> $ cat check3 CHECK: abc CHECK-NOT: foobar CHECK: def $ cat input3 abc foobar def ``` -vv enables these annotations for FileCheck's implicit EOF patterns as well. For an example where EOF patterns become relevant, see patch 7 in this series. If colors are enabled, `^~~` is green to suggest success. -v plus color enables highlighting of input text that has no final match for any expected pattern. The highlight uses a cyan background to suggest a cold section. This highlighting can make it easier to spot text that was intended to be matched but that failed to be matched in a long series of good matches. CHECK-COUNT-<num> good matches are another case where there can be multiple match results for the same directive. Reviewed By: george.karpenkov, probinson Differential Revision:
  432. [FileCheck] Annotate input dump (4/7) This patch implements input annotations for diagnostics that report unexpected matches for CHECK-NOT. Like wrong-line matches for CHECK-NEXT, CHECK-SAME, and CHECK-EMPTY, these annotations mark match ranges using red `!~~` to indicate bad matches that are errors. For example: ``` $ FileCheck -dump-input=help The following description was requested by -dump-input=help to explain the input annotations printed by -dump-input=always and -dump-input=fail: - L: labels line number L of the input file - T:L labels the only match result for a pattern of type T from line L of the check file - T:L'N labels the Nth match result for a pattern of type T from line L of the check file - !~~ marks bad match, such as: - CHECK-NEXT on same line as previous match (error) - CHECK-NOT found (error) - X~~ marks search range when no match is found, such as: - CHECK-NEXT not found (error) - ? marks fuzzy match when no match is found - colors error, fuzzy match If you are not seeing color above or in input dumps, try: -color $ FileCheck -v -dump-input=always check3 < input3 |& sed -n '/^<<<</,$p' <<<<<< 1: abc foobar def not:2 !~~~~~ error: no match expected >>>>>> $ cat check3 CHECK: abc CHECK-NOT: foobar CHECK: def $ cat input3 abc foobar def ``` Reviewed By: george.karpenkov, probinson Differential Revision:
  433. [FileCheck] Annotate input dump (3/7) This patch implements input annotations for diagnostics that report wrong-line matches for the directives CHECK-NEXT, CHECK-SAME, and CHECK-EMPTY. Instead of the usual `^~~`, which is used by later patches for good matches, these annotations use `!~~` to mark the bad match ranges so that this category of errors is visually distinct. Because such matches are errors, these annotations are red when colors are enabled. For example: ``` $ FileCheck -dump-input=help The following description was requested by -dump-input=help to explain the input annotations printed by -dump-input=always and -dump-input=fail: - L: labels line number L of the input file - T:L labels the only match result for a pattern of type T from line L of the check file - T:L'N labels the Nth match result for a pattern of type T from line L of the check file - !~~ marks bad match, such as: - CHECK-NEXT on same line as previous match (error) - X~~ marks search range when no match is found, such as: - CHECK-NEXT not found (error) - ? marks fuzzy match when no match is found - colors error, fuzzy match If you are not seeing color above or in input dumps, try: -color $ FileCheck -v -dump-input=always check2 < input2 |& sed -n '/^<<<</,$p' <<<<<< 1: foo bar next:2 !~~ error: match on wrong line >>>>>> $ cat check2 CHECK: foo CHECK-NEXT: bar $ cat input2 foo bar ``` Reviewed By: george.karpenkov, probinson Differential Revision:
  434. [FileCheck] Annotate input dump (2/7) This patch implements input annotations for diagnostics that suggest fuzzy matches for directives for which no matches were found. Instead of using the usual `^~~`, which is used by later patches for good matches, these annotations use `?` so that fuzzy matches are visually distinct. No tildes are included as these diagnostics (independently of this patch) currently identify only the start of the match. For example: ``` $ FileCheck -dump-input=help The following description was requested by -dump-input=help to explain the input annotations printed by -dump-input=always and -dump-input=fail: - L: labels line number L of the input file - T:L labels the only match result for a pattern of type T from line L of the check file - T:L'N labels the Nth match result for a pattern of type T from line L of the check file - X~~ marks search range when no match is found - ? marks fuzzy match when no match is found - colors error, fuzzy match If you are not seeing color above or in input dumps, try: -color $ FileCheck -v -dump-input=always check1 < input1 |& sed -n '/^<<<</,$p' <<<<<< 1: ; abc def 2: ; ghI jkl next:3'0 X~~~~~~~~ error: no match found next:3'1 ? possible intended match >>>>>> $ cat check1 CHECK: abc CHECK-SAME: def CHECK-NEXT: ghi CHECK-SAME: jkl $ cat input1 ; abc def ; ghI jkl ``` This patch introduces the concept of multiple "match results" per directive. In the above example, the first match result for the CHECK-NEXT directive is the failed match, for which the annotation shows the search range. The second match result is the fuzzy match. Later patches will introduce other cases of multiple match results per directive. When colors are enabled, `?` is colored magenta. That is, it doesn't indicate the actual error, which a red `X~~` marker indicates, but its color suggests it's closely related. Reviewed By: george.karpenkov, probinson Differential Revision:
  435. [FileCheck] Annotate input dump (1/7) Extend FileCheck to dump its input annotated with FileCheck's diagnostics: errors, good matches if -v, and additional information if -vv. The goal is to make it easier to visualize FileCheck's matching behavior when debugging. Each patch in this series implements input annotations for a particular category of FileCheck diagnostics. While the first few patches alone are somewhat useful, the annotations become much more useful as later patches implement annotations for -v and -vv diagnostics, which show the matching behavior leading up to the error. This first patch implements boilerplate plus input annotations for error diagnostics reporting that no matches were found for a directive. These annotations mark the search ranges of the failed directives. Instead of using the usual `^~~`, which is used by later patches for good matches, these annotations use `X~~` so that this category of errors is visually distinct. For example: ``` $ FileCheck -dump-input=help The following description was requested by -dump-input=help to explain the input annotations printed by -dump-input=always and -dump-input=fail: - L: labels line number L of the input file - T:L labels the match result for a pattern of type T from line L of the check file - X~~ marks search range when no match is found - colors error If you are not seeing color above or in input dumps, try: -color $ FileCheck -v -dump-input=always check1 < input1 |& sed -n '/^Input file/,$p' Input file: <stdin> Check file: check1 -dump-input=help describes the format of the following dump. Full input was: <<<<<< 1: ; abc def 2: ; ghI jkl next:3 X~~~~~~~~ error: no match found >>>>>> $ cat check1 CHECK: abc CHECK-SAME: def CHECK-NEXT: ghi CHECK-SAME: jkl $ cat input1 ; abc def ; ghI jkl ``` Some additional details related to the boilerplate: * Enabling: The annotated input dump is enabled by `-dump-input`, which can also be set via the `FILECHECK_OPTS` environment variable. 
Accepted values are `help`, `always`, `fail`, or `never`. As shown above, `help` describes the format of the dump. `always` is helpful when you want to investigate a successful FileCheck run, perhaps for an unexpected pass. `-dump-input-on-failure` and `FILECHECK_DUMP_INPUT_ON_FAILURE` remain as a deprecated alias for `-dump-input=fail`. * Diagnostics: The usual diagnostics are not suppressed in this mode and are printed first. For brevity in the example above, I've omitted them using a sed command. Sometimes they're perfectly sufficient, and then they make debugging quicker than if you were forced to hunt through a dump of long input looking for the error. If you think they'll get in the way sometimes, keep in mind that it's pretty easy to grep for the start of the input dump, which is `<<<`. * Colored Annotations: The annotated input is colored if colors are enabled (enabling colors can be forced using -color). For example, errors are red. However, as in the above example, colors are not vital to reading the annotations. I don't know how to test color in the output, so any hints here would be appreciated. Reviewed By: george.karpenkov, zturner, probinson Differential Revision:
  436. [X86] Add baseline tests for D55780 This adds tests for (add (umax X, C), -C) as part of fixing PR40053
  437. Fix ms-layout_version declspec test and add missing new test Now that MSVC compatibility versions are stored as a four digit number (1912) instead of a two digit number (19), we need to adjust how we handle this attribute. Also add a new test that was intended to be part of r349414.
  438. Update Microsoft name mangling scheme for exception specifiers in the type system Summary: The msvc exception specifier for noexcept function types has changed from the prior default of "Z" to "_E" if the function cannot throw when compiling with /std:C++17. Patch by Zachary Henkel! Reviewers: zturner, rnk Reviewed By: rnk Subscribers: cfe-commits Differential Revision:
  439. hwasan: Move ctor into a comdat. Differential Revision:
  440. [VFS] Add isLocal to ProxyFileSystem and add unit tests. Differential Revision:
  441. [libcxx][NFC] Properly indent nested #ifdefs and #defines I just realized I had always been reading this wrong because of the lack of indentation, so I'm re-indenting this properly.
  442. [X86][SSE] Improve immediate vector shift known bits handling. Convert VSRAI to VSRLI if the sign bit is known zero and improve KnownBits output for all shift instructions. Fixes the poor codegen comments in D55768.
  443. [WebAssembly] Fix assembler parsing of br_table. Summary: We use `variable_ops` in the tablegen defs to denote the list of branch targets in `br_table`, but unlike other uses of `variable_ops` (e.g. call) these branch targets need to actually be encoded in the instruction. The existing tables for `variable_ops` cause no operands to be accepted by the assembly matcher. Following the example of ARM: we introduce a new operand type to capture this list, and we use the same {} syntax as ARM as well to differentiate them from regular integer operands. Also removed definition and use of TSFlags in tablegen defs, since `br_table` now has a non-variable_ops immediate operand, so the previous logic of only the variable_ops arguments being labels didn't make sense anymore. Reviewers: dschuff, aheejin, sunfish Subscribers: javed.absar, sbc100, jgravelle-google, kristof.beyls, llvm-commits Differential Revision:
  444. [X86] Add T1MSKC and TZMSK to isDefConvertible used by optimizeCompareInstr. These seem to have been missed when the other TBM instructions were added.
  445. [codeview] Flush labels before S_DEFRANGE* fragments This was a pre-existing bug that could be triggered with assembly like this: .p2align 2 .LtmpN: .cv_def_range "..." I noticed this when attempting to change clang to emit aligned symbol records.
  446. Don't trigger sanitizer initialization from `sysctlbyname` and `sysctl` interceptor. Summary: This fixes the `ThreadSanitizer-x86_64-iossim` testsuite which broke when r348770 landed. The root cause of the problem is that early on during the iOS simulator init process a call to `sysctlbyname` is issued. If the TSan initializer is triggered at this point it will eventually trigger a call to `__cxa_at_exit(...)`. This call then aborts because the library implementing this function has not yet had its initialization function called. rdar://problem/46696934 Reviewers: kubamracek, george.karpenkov, devnexen, vitalybuka, krytarowski Subscribers: #sanitizers, llvm-commits Differential Revision:
  447. [X86][SSE] Split SimplifyDemandedBitsForTargetNode X86ISD::VSRLI/VSRAI handling. First step towards adding more capable combines to fix comments in D55768.
  448. [AggressiveInstCombine] convert rotate with guard branch into funnel shift (PR34924) Now that we have funnel shift intrinsics, it should be safe to convert this form of rotate to it. In the worst case (a target that doesn't have rotate instructions), we will expand this into a branch-less sequence of ALU ops (neg/and/and/lshr/shl/or) in the backend, so it's still very likely to be a perf improvement over the original code. The motivating source code pattern for this is shown in: Background: I looked at several different options before deciding where to try this - instcombine, simplifycfg, CGP - because it doesn't fit cleanly anywhere AFAIK. The backend (CGP, SDAG, GlobalIsel?) is too late for what we're trying to accomplish. We want to have the IR converted before we reach things like vectorization because the reduced code can make a loop much simpler to transform. Technically, this could be included in instcombine, but it's a large pattern match that includes control-flow, so it just felt wrong to stuff into there (although I have a draft of that patch). Similarly, this could be part of simplifycfg, but all of this pattern matching is a stretch. So we're left with our relatively new dumping ground for homeless transforms: aggressive-instcombine. This only runs at -O3, but that seems like a reasonable limitation given that source code has many options to avoid this pattern (including the recently added clang intrinsics for rotates). I'm including a PhaseOrdering test because we require the teamwork of 3 passes (aggressive-instcombine, instcombine, simplifycfg) to get this into the minimal IR form that we want. That test shows a bug with the new pass manager that's independent of this change (but it will be masked if we canonicalize harder to funnel shift intrinsics in instcombine). Differential Revision:
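The equivalence this transform relies on can be checked with a small model. The sketch assumes 32-bit values and a shift amount in the range 0..31; `rotl_guarded` and `fshl` are illustrative Python stand-ins for the guarded source pattern and the funnel-shift-left semantics, not LLVM code.

```python
BITS = 32
MASK = (1 << BITS) - 1


def rotl_guarded(x: int, s: int) -> int:
    """Source pattern: branch around the shifts when the amount is zero."""
    if s == 0:
        return x
    return ((x << s) | (x >> (BITS - s))) & MASK


def fshl(a: int, b: int, s: int) -> int:
    """Funnel shift left: concatenate a:b, shift left by s modulo BITS,
    and keep the high BITS bits of the double-width result."""
    s %= BITS
    wide = ((a << BITS) | b) << s
    return (wide >> BITS) & MASK
```

Passing the same value as both operands makes `fshl(x, x, s)` a rotate, and it needs no branch because a shift amount of zero is well defined for the funnel shift.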
  449. DebugInfo: Update gold plugin tests due to CU attribute reordering in r349207
  450. [analyzer] MoveChecker: Squash the bit field because it causes a GCC warning. The warning seems spurious (GCC bug 51242), but the bit field is simply not worth the hassle. rdar://problem/41349073
  451. Make test/Driver/darwin-sdk-version.c pass on hosts < macOS10.14 The test test/Driver/darwin-sdk-version.c from r349380 checks if the macOS deployment target can be correctly inferred from the SDK version. When the SDK version is > host version, the driver will pick the host version, so the old test failed on macOS < 10.14. This commit makes this test more resilient by using an older SDK version.
  452. [Sanitizer] capsicum variadic api subset Reviewers: markj, vitalybuka Reviewed By: markj Differential Revision:
  453. [SDAG] Clarify the origin of chain in REG_SEQUENCE in comment, NFC
  454. [SelectionDAG] Fix noop detection for vectors in AssertZext/AssertSext in getNode The assertion type is always supposed to be a scalar type. So if the result VT of the assertion is a vector, we need to get the scalar VT before we can compare them. Similarly for the assert above it. I don't have a test case because I don't know of any place we violate this today. A coworker found this while trying to use r347287 on the 6.0 branch without also having r336868
  455. [InstCombine] don't widen an arbitrary sequence of vector ops (PR40032) The problem is shown specifically for a case with vector multiply here: ...and this might mask the original backend bug for ARM shown in: As the test diffs here show, we weren't (and probably still aren't) doing these kinds of transforms in a principled way. We are producing more or equal wide instructions than we started with in some cases, so we still need to restrict/correct other transforms from overstepping. If there are perf regressions from this change, we can either carve out exceptions to the general IR rules, or improve the backend to do these transforms when we know the transform is profitable. That's probably similar to a change like D55448. Differential Revision:
  456. Fix build after r349380
  457. Fix FP comparisons when SSE isn't available
  458. Convert (CMP (srl/shl X, C), 0) to (CMP (and X, C'), 0) when only the zero flag is used. This allows a TEST to be used and can be combined with any AND that may already exist as an input to the shift. This was already done in EmitTest, but was easily tricked by multiple uses because the setcc might be used by multiple instructions. Once the SETCC and users are legalized then we can look for the shift to be used by a single CMP, but the CMP itself can have multiple users. This appears to fix the case in PR39968.
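The rewrite works because a shifted value is zero exactly when the bits that survive the shift are all zero. The following sketch checks this for an assumed 32-bit width; the helper names are hypothetical, and the masks are derived from the shift semantics, not copied from the X86 backend.

```python
BITS = 32
MASK = (1 << BITS) - 1


def srl_is_zero(x: int, c: int) -> bool:
    """(x >> c) == 0 for an unsigned BITS-wide x."""
    return ((x & MASK) >> c) == 0


def srl_mask(c: int) -> int:
    """Bits that survive a logical right shift by c: bits c..BITS-1."""
    return MASK & ~((1 << c) - 1)


def shl_is_zero(x: int, c: int) -> bool:
    """(x << c) == 0 after truncation to BITS bits."""
    return ((x << c) & MASK) == 0


def shl_mask(c: int) -> int:
    """Bits that survive a left shift by c: bits 0..BITS-c-1."""
    return MASK >> c
```

In both cases the zero test on the shift result agrees with a zero test on `x & mask`, which is the form a TEST instruction can evaluate directly.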
  459. [NFC] Test commit: tweak whitespace in comment
  460. [darwin][arm64] use the "cyclone" CPU for Darwin even when `-arch` is not specified The -target option allows the user to specify the build target using LLVM triple. The triple includes the arch, and so the -arch option is redundant. This should work just as well without the -arch. However, the driver has a bug in which it doesn't target the "Cyclone" CPU for darwin if -target is used without -arch. This commit fixes this issue. rdar://46743182 Differential Revision:
  461. [Driver] Don't override '-march' when using '-arch x86_64h' On Darwin, using '-arch x86_64h' would always override the option passed through '-march'. This patch allows users to use '-march' with x86_64h, while keeping the default to 'core-avx2' Differential Revision:
  462. [darwin] parse the SDK settings from SDKSettings.json if it exists and pass in the -target-sdk-version to the compiler and backend This commit adds support for reading the SDKSettings.json file in the Darwin driver. This file is used by the driver to determine the SDK's version, and it uses that information to pass it down to the compiler using the new -target-sdk-version= option. This option is then used to set the appropriate SDK Version module metadata introduced in r349119. Note: I had to adjust the two ast tests as the SDKROOT environment variable on macOS caused SDK version to be picked up for the compilation of source file but not the AST. rdar://45774000 Differential Revision:
  463. [test] Add target_info for NetBSD, and XFAIL some of locale tests Add a target_info definition for NetBSD. The definition is based on the one used by FreeBSD, with libcxxrt replaced by libc++abi, and using llvm-libunwind since we need to use its unwinder implementation to build anyway. Additionally, XFAIL the 30 tests that fail because of non-implemented locale features. According to the manual, NetBSD implements only LC_CTYPE part of locale handling. However, there is a locale database in the system and locale specifications are validated against it, so it makes sense to list the common locales as supported. If I'm counting correctly, this change enables additional 43 passing tests. Differential Revision:
  464. [test] [re.traits] Remove asserts failing due to invalid UTF-8 Remove the two test cases for \xDA and \xFA with UTF-8 locale, as both characters alone are invalid in UTF-8 (short sequences). Upon removing them, the test passes on Linux again (and also on NetBSD, after adding appropriate locale configuration). Differential Revision:
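The claim that these bytes are invalid on their own can be confirmed with any strict UTF-8 decoder. Here is a minimal sketch using Python's built-in codec; `is_valid_utf8` is a hypothetical helper name.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False
```

A lone `b"\xDA"` or `b"\xFA"` is rejected, whereas the code point U+00DA encodes to the two-byte sequence `b"\xC3\x9A"`, which is accepted.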
  465. NFC: remove unused variable D55768 removed its use.
  466. AsmParser: test .double NaN and .double inf Summary: It looks like this support was added to match GNU AS, but only tests .float and not .double. I asked RedHat folks to confirm that 0x7fffffffffffffff was indeed the right value for NaN. Same for infinity, but it only has positive / negative encodings. Reviewers: scanon, rjmccall Subscribers: jkorous, dexonsmith, llvm-commits Differential Revision:
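The NaN bit pattern quoted above can be checked by reinterpreting it as an IEEE 754 double. This is a small sketch using Python's `struct` module; `bits_to_double` is a hypothetical helper. The infinity patterns come from the IEEE 754 binary64 format, not from the commit message.

```python
import math
import struct


def bits_to_double(bits: int) -> float:
    """Reinterpret a 64-bit integer as an IEEE 754 binary64 value."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


NAN_BITS = 0x7FFFFFFFFFFFFFFF      # value quoted in the commit message
POS_INF_BITS = 0x7FF0000000000000  # IEEE 754 +infinity (assumed here)
NEG_INF_BITS = 0xFFF0000000000000  # IEEE 754 -infinity (assumed here)
```

An all-ones exponent with a nonzero mantissa is a NaN, and the same exponent with a zero mantissa is an infinity whose sign comes from the top bit.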
  467. [AMDGPU][MC][DOC] A fix for build failure in r349370
  468. [TargetLowering] Add DemandedElts mask to SimplifyDemandedBits (PR40000) This is an initial patch to add the necessary support for a DemandedElts argument to SimplifyDemandedBits, more closely matching computeKnownBits and to help improve vector codegen. I've added only a small amount of the changes necessary to get at least one test to update - a lot more can be done but I'd like to add these methodically with proper test coverage, at the same time the hope is to slowly move some/all of SimplifyDemandedVectorElts into SimplifyDemandedBits as well. Differential Revision:
  469. Unbreak green dragon bots w/o __builtin_launder
  470. [AMDGPU][MC][DOC] A fix for build failure in r349368
  471. [InstSimplify] Simplify saturating add/sub + icmp If a saturating add/sub has one constant operand, then we can determine the possible range of outputs it can produce, and simplify an icmp comparison based on that. The implementation is based on a similar existing mechanism for simplifying binary operator + icmps. Differential Revision:
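The range reasoning can be illustrated with unsigned 8-bit values, which are small enough to check exhaustively. All names below are hypothetical; the sketch models the idea only and does not mirror the InstSimplify code.

```python
BITS = 8
UMAX = (1 << BITS) - 1


def uadd_sat(x: int, c: int) -> int:
    """Unsigned saturating add."""
    return min(x + c, UMAX)


def usub_sat(x: int, c: int) -> int:
    """Unsigned saturating subtract."""
    return max(x - c, 0)


def uadd_sat_range(c: int):
    """Inclusive range of uadd_sat(x, c) over every x."""
    return (c, UMAX)


def usub_sat_range(c: int):
    """Inclusive range of usub_sat(x, c) over every x."""
    return (0, UMAX - c)


def fold_ult(value_range, k: int):
    """Fold 'value <u k' to True or False when the range decides it,
    otherwise return None."""
    lo, hi = value_range
    if hi < k:
        return True
    if lo >= k:
        return False
    return None
```

For example, `uadd_sat(x, 10)` can never be below 10, so a comparison such as `uadd_sat(x, 10) <u 10` folds to false.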
  472. [AMDGPU][MC][DOC] Updated AMD GPU assembler description Stage 2: added detailed description of operands See bug 36572:
  473. FastIsel: take care to update iterators when removing instructions. We keep a few iterators into the basic block we're selecting while performing FastISel. Usually this is fine, but occasionally code wants to remove already-emitted instructions. When this happens we have to be careful to update those iterators so they're not pointing at dangling memory.
  474. Expect Clang diagnostics in std::launder test
  475. Add missing include file.
  476. [CodeComplete] Fix test failure on different host and target configs This should fix PR40033.
  477. [PDB] Add some helper functions for working with scopes.
  478. [MS Demangler] Add a helper function to print a Node as a string.
  479. [libcxx] Speeding up partition_point/lower_bound/upper_bound This is a re-application of r345525, which had been reverted for fear of a regression. Reviewed as Thanks to Denis Yaroshevskiy for the patch.
  480. Build ASTImporterTest.cpp with /bigobj on MSVC builds to keep llvm-clang-x86_64-expensive-checks-win buildbot happy
  481. [MIPS GlobalISel] Remove switch statement (fix r349346 for MSVC) Temporarily remove switch statement without any case labels in function legalizeCustom in order to fix r349346 for MSVC.
  482. ARM: use acquire/release instruction variants when available. These features (fairly) recently got split out into their own feature, so we should make CodeGen use them when available. The main change here is that the check used to be based on the triple, but now it's based on CPU features.
  483. [MCA] Add support for BeginGroup/EndGroup.
  484. Revert "DebugInfo: Assume an absence of ranges or high_pc on a CU means the CU is empty (devoid of code addresses)" This reverts commit r349333. It caused internal test to fail. I have sent more information to the author.
  485. [MCA] Don't assume that createMCInstrAnalysis() always returns a valid pointer. Class InstrBuilder wrongly assumed that llvm targets were always able to return a non-null pointer when createMCInstrAnalysis() was called on them. This was causing crashes when simulating executions for targets that don't provide an MCInstrAnalysis object. This patch fixes the issue by making MCInstrAnalysis optional.
  486. [ASTImporter] Add importer specific lookup Summary: There are certain cases when normal C/C++ lookup (localUncachedLookup) does not find AST nodes. E.g.: Example 1: template <class T> struct X { friend void foo(); // this is never found in the DC of the TU. }; Example 2: // The fwd decl to Foo is not found in the lookupPtr of the DC of the // translation unit decl. struct A { struct Foo *p; }; In these cases we create a new node instead of returning with the old one. To fix it we create a new lookup table which holds every node and we are not interested in any C++ specific visibility considerations. Simply, we must know if there is an existing Decl in a given DC. Reviewers: a_sidorin, a.sidorin Subscribers: mgorny, rnkovacs, dkrupp, Szelethus, cfe-commits Differential Revision:
  487. Regenerate test in prep for SimplifyDemandedBits improvements.
  488. [ASTImporter] Fix redecl chain of classes and class templates Summary: The crux of the issue that is being fixed is that lookup could not find previous decls of a friend class. The solution involves making the friend declarations visible in their decl context (i.e. adding them to the lookup table). Also, we simplify `VisitRecordDecl` greatly. This fix involves two other repairs (without these the unittests fail): (1) We could not handle the addition of injected class types properly when a redecl chain was involved, now this is fixed. (2) DeclContext::removeDecl failed if the lookup table in Vector form did not contain the to be removed element. This caused troubles in ASTImporter::ImportDeclContext. This is also fixed. Reviewers: a_sidorin, balazske, a.sidorin Subscribers: rnkovacs, dkrupp, Szelethus, cfe-commits Differential Revision:
  489. [clangd] Change diskbackedstorage to be atomic Summary: There was a chance that multiple clangd instances could try to write the same shard, in which case we would most likely get a malformed file. This patch changes the writing mechanism to first write to a temporary file and then rename it to the real destination, which POSIX guarantees to be atomic. Reviewers: ilya-biryukov Subscribers: ioeric, MaskRay, jkorous, arphaman, jfb, cfe-commits Differential Revision:
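The write-then-rename approach can be sketched as follows. This is a generic illustration of the technique, not clangd's code; `write_atomically` is a hypothetical helper. It assumes the temporary file sits in the same directory as the destination, so the rename stays on one filesystem.

```python
import os
import tempfile


def write_atomically(path: str, data: bytes) -> None:
    """Write data to a temporary file, then rename it over path."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before renaming
        os.replace(tmp_path, path)  # readers see the old or the new file
    except BaseException:
        # Clean up the temporary file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Each writer gets its own temporary file, so concurrent writers never interleave bytes in the destination. Whichever rename happens last determines the final contents.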
  490. [AggressiveInstCombine] add test for rotate insertion point; NFC As noted in D55604 - we need a test to make sure that the new intrinsic is inserted into a valid position.
  491. [MIPS GlobalISel] Lower G_UADDE and narrowScalar G_ADD Lower G_UADDE and legalize G_ADD using narrowScalar on MIPS32. Differential Revision:
  492. [clangd] Only reduce priority of a thread for indexing. Summary: We'll soon have tasks pending for reading shards from disk, we want them to have normal priority. Because: - They are not CPU intensive, mostly IO bound. - Give a good coverage for the project at startup, therefore it is worth spending some cycles. - We have only one task per whole CDB rather than one task per file. Reviewers: ilya-biryukov Subscribers: ioeric, MaskRay, jkorous, arphaman, jfb, cfe-commits Differential Revision:
  493. Revert rC349281 '[analyzer][MallocChecker][NFC] Document and reorganize some functions' Accidentally committed earlier with the same commit title, but really it should've been "Revert rC349283 '[analyzer][MallocChecker] Improve warning messages on double-delete errors'"
  494. Fix "enumeral mismatch in conditional expression" gcc7 warning. NFCI.
  495. Fix "enumeral mismatch in conditional expression" gcc7 warnings. NFCI.
  496. Revert rCTE349288 'Fix a lit test failure after MallocChecker changes'
  497. Revert rC349281 '[analyzer][MallocChecker][NFC] Document and reorganize some functions'
  498. [AArch64] Re-run load/store optimizer after aggressive tail duplication The Load/Store Optimizer runs before Machine Block Placement. At O3 the Tail Duplication Threshold is set to 4 instructions and this can create new opportunities for the Load/Store Optimizer. It seems worthwhile to run it once again.
  499. Reverting bitfield size to attempt to fix a windows buildbot
  500. [Docs] Expand -fstack-protector and -fstack-protector-all Improve the description of these command line options by providing specific heuristic information, as outlined for the ssp function attribute(s) in LLVM's documentation. Also rewords -fstack-protector-all for affinity. Differential Revision:
  501. DebugInfo: Assume an absence of ranges or high_pc on a CU means the CU is empty (devoid of code addresses) GCC emitted these unconditionally on/before 4.4/March 2012 Clang emitted these unconditionally on/before 3.5/March 2014 This improves performance when parsing CUs (especially those using split DWARF) that contain no code ranges (such as the mini CUs that may be created by ThinLTO importing - though generally they should be/are avoided, especially for Split DWARF because it produces a lot of very small CUs, which don't scale well in a bunch of other ways too (including size)).
  502. [llvm-mca] Move llvm-mca library to llvm/lib/MCA. Summary: See PR38731. Reviewers: andreadb Subscribers: mgorny, javed.absar, tschuett, gbedwell, andreadb, RKSimon, llvm-commits Differential Revision:
  503. [X86] Add test case for PR39968. NFC
  504. [X86] Fix bad operand lookup for cmov introduced in r349315 The CC is operand 2 not operand 3.
  505. [Power9][NFC]update vabsd case for better dumping Appended options -ppc-vsr-nums-as-vr and -ppc-asm-full-reg-names to get the more descriptive output. Also removed useless function attributes.
  506. [analyzer] MoveChecker: Enable by default as cplusplus.Move. This checker warns you when you re-use an object after moving it. Mostly developed by Peter Szecsi! Differential Revision:
  507. [analyzer] MoveChecker: Add an option to suppress warnings on locals. Re-using a moved-from local variable is most likely a bug because there's rarely a good motivation for not introducing a separate variable instead. We plan to keep emitting such warnings by default. Introduce a flag that allows disabling warnings on local variables that are not of a known move-unsafe type. If it doesn't work out as we expected, we'll just flip the flag. We still warn on move-unsafe objects and unsafe operations on known move-safe objects. Differential Revision:
  508. Speculatively re-apply "[analyzer] MoveChecker: Add checks for dereferencing..." This re-applies commit r349226 that was reverted in r349233 due to failures on clang-x64-windows-msvc. Specify enum type as unsigned for use in bit field. Otherwise overflows may cause UB. Differential Revision:
  509. [Power9][NFC]Make pre-inc-disable case more robust With some patches adopted for Power9 vabsd* insns, some CHECKs can't get the expected results. But it's a false alarm; we should make the case more robust.
  510. [gn build] Add build files for opt and its dependency Transforms/Coroutines Needed for check-lld. Differential Revision:
  511. [EarlyCSE] If DI can't be salvaged, mark it as unavailable. Fixes PR39874.
  512. [InstCombine] Add cttz/ctlz + select non-bitwidth tests; NFC
  513. [InstCombine] Regenerate test checks; NFC Also drop unnecessary entry blocks and avoid use of anonymous variables.
  514. [analyzer] Fix some expressions staying live too long. Add a debug checker. StaticAnalyzer uses the CFG-based RelaxedLiveVariables analysis in order to, in particular, figure out values of which expressions are still needed. When the expression becomes "dead", it is garbage-collected during the dead binding scan. Expressions that constitute branches/bodies of control flow statements, eg. `E1' in `if (C1) E1;' but not `E2' in `if (C2) { E2; }', were kept alive for too long. This caused false positives in MoveChecker because it relies on cleaning up loop-local variables when they go out of scope, but some of those live-for-too-long expressions were keeping a reference to those variables. Fix liveness analysis to correctly mark these expressions as dead. Add a debug checker, debug.DumpLiveStmts, in order to test expressions liveness. Differential Revision:
  515. [X86] Pull out constant splat rotation detection. We had 3 different approaches - consistently use getTargetConstantBitsFromNode and allow undef elts.
  516. [InstCombine] Make cttz/ctlz knownbits tests more robust; NFC Tests checking for the addition of !range metadata should be preserved if cttz/ctlz + icmp is optimized.
  517. Regenerate test (merges X86+X64 cases). NFCI.
  518. [X86] Remove truncation handling from EmitTest. Replace it with a DAG combine. I'd like to try to move a lot of the flag matching out of EmitTest and push it to isel or isel preprocessing. This is a step towards that. The test-shrink-bug.ll change is an improvement because we are no longer interfering with test shrink handling in isel. The pr34137.ll change is a regression, but the IR came from -O0 and was not reduced by InstCombine. So it contains a lot of redundancies like duplicate loads that made it combine poorly.
  519. [X86] Autogenerate complete checks. NFC
  520. Revert "[InstCombine] Regenerate test checks; NFC" This reverts commit r349311. Didn't check this carefully enough...
  521. [InstCombine] Regenerate test checks; NFC
  522. [InstCombine] Add more tests for cttz/ctlz + icmp; NFC Test cases other than icmp with the bitwidth.
  523. [InstCombine] Add additional saturating add/sub + icmp tests; NFC These test comparisons with saturating add/sub in non-canonical form.
  524. Thread safety analysis: Avoid intermediate copies [NFC] The main reason is to reduce the number of constructor arguments though, especially since many of them had the same type.
  525. [InstCombine] regenerate test checks; NFC
  526. [InstCombine] add tests for vector widening transforms (PR40032); NFC
  527. [test] [support] Use socket()+bind() to create unix sockets portably Replace the mknod() call with socket() + bind() for creating unix sockets. The mknod() method is not portable and does not work on NetBSD while binding the socket should work on all systems supporting unix sockets. Differential Revision:
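The socket() + bind() approach from entry 527 can be sketched like this. It is an illustrative example rather than the test's actual code; the helper names `createUnixSocketFile` and `isSocket` are assumptions.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/un.h>
#include <unistd.h>

// Hypothetical helper: create a unix-domain socket file at Path by binding
// a socket to it. Returns true on success.
bool createUnixSocketFile(const std::string &Path) {
  sockaddr_un Addr;
  std::memset(&Addr, 0, sizeof(Addr));
  Addr.sun_family = AF_UNIX;
  if (Path.size() >= sizeof(Addr.sun_path))
    return false; // The path does not fit in sun_path.
  std::memcpy(Addr.sun_path, Path.c_str(), Path.size() + 1);

  int FD = ::socket(AF_UNIX, SOCK_STREAM, 0);
  if (FD < 0)
    return false;
  int RC = ::bind(FD, reinterpret_cast<sockaddr *>(&Addr), sizeof(Addr));
  ::close(FD); // The file system entry stays behind after the descriptor is closed.
  return RC == 0;
}

// Returns true if Path exists and is a socket.
bool isSocket(const std::string &Path) {
  struct stat St;
  return ::stat(Path.c_str(), &St) == 0 && S_ISSOCK(St.st_mode);
}
```

The caller is expected to unlink the path afterwards, since bind() fails if the file already exists.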
  528. [x86] increment/decrement constant vector with min/max in vsetcc lowering (PR39859) This is part of fixing PR39859: We have a crippled vector ISA, so we have to invert a typical fold and create min/max here. As discussed in the bug report, we can probably do better by using saturating subtract when it's available, but we should have this improvement for the min/max patterns regardless. Alive proofs: Differential Revision:
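As I read entry 528, the fold rests on rewriting an unsigned compare against a constant as a min/max followed by an equality test, after incrementing or decrementing the constant. The scalar sketch below checks that identity on 8-bit values; the function names are hypothetical and the code models only the arithmetic, not the x86 lowering itself.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// x >u C  <=>  umax(x, C + 1) == x, valid only when C + 1 does not wrap.
bool ugtViaUMax(uint8_t X, uint8_t C) {
  assert(C != UINT8_MAX && "C + 1 must not overflow");
  uint8_t CPlus1 = static_cast<uint8_t>(C + 1);
  return std::max(X, CPlus1) == X;
}

// x <u C  <=>  umin(x, C - 1) == x, valid only when C - 1 does not wrap.
bool ultViaUMin(uint8_t X, uint8_t C) {
  assert(C != 0 && "C - 1 must not underflow");
  uint8_t CMinus1 = static_cast<uint8_t>(C - 1);
  return std::min(X, CMinus1) == X;
}

// Exhaustively compares both rewrites with the plain comparisons.
bool identitiesHoldForAllBytes() {
  for (int X = 0; X <= 255; ++X) {
    for (int C = 0; C <= 255; ++C) {
      uint8_t X8 = static_cast<uint8_t>(X);
      uint8_t C8 = static_cast<uint8_t>(C);
      if (C8 != UINT8_MAX && ugtViaUMax(X8, C8) != (X8 > C8))
        return false;
      if (C8 != 0 && ultViaUMin(X8, C8) != (X8 < C8))
        return false;
    }
  }
  return true;
}
```

The overflow guards matter: when the constant is already the largest (or smallest) value, adjusting it wraps around and the rewrite no longer holds.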
  529. [DAGCombiner] allow hoisting vector bitwise logic ahead of truncates The transform performs a bitwise logic op in a wider type followed by truncate when both inputs are truncated from the same source type: logic_op (truncate x), (truncate y) --> truncate (logic_op x, y) There are a bunch of other checks that should prevent doing this when it might be harmful. We already do this transform for scalars in this spot. The vector limitation was shared with a check for the case when the operands are extended. I'm not sure if that limit is needed either, but that would be a separate patch. Differential Revision:
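The identity behind entry 529, logic_op (truncate x), (truncate y) --> truncate (logic_op x, y), holds because bitwise operations act on each bit independently. A scalar sketch with hypothetical function names, using a 32-bit to 16-bit truncate as a stand-in for the vector case:

```cpp
#include <cassert>
#include <cstdint>

// Narrow first, then apply the logic op (the form before the transform).
uint16_t andOfTruncs(uint32_t X, uint32_t Y) {
  return static_cast<uint16_t>(static_cast<uint16_t>(X) &
                               static_cast<uint16_t>(Y));
}

// Apply the logic op in the wide type, then narrow once (the form after).
uint16_t truncOfAnd(uint32_t X, uint32_t Y) {
  return static_cast<uint16_t>(X & Y);
}

// The same pair for xor; or behaves identically.
uint16_t xorOfTruncs(uint32_t X, uint32_t Y) {
  return static_cast<uint16_t>(static_cast<uint16_t>(X) ^
                               static_cast<uint16_t>(Y));
}

uint16_t truncOfXor(uint32_t X, uint32_t Y) {
  return static_cast<uint16_t>(X ^ Y);
}
```

The rewritten form needs one truncate instead of two, which is the potential saving; whether it is profitable on a given target is what the other checks mentioned in the entry decide.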
  530. Update the list of platforms & archs
  531. Use backquotes to avoid a sphinx unexpected error: Unknown target name: "bootstrap".
  532. Thread safety analysis: Allow scoped releasing of capabilities Summary: The pattern is problematic with C++ exceptions, and not as widespread as scoped locks, but it's still used by some, for example Chromium. We are a bit stricter here at join points, patterns that are allowed for scoped locks aren't allowed here. That could still be changed in the future, but I'd argue we should only relax this if people ask for it. Fixes PR36162. Reviewers: aaron.ballman, delesley, pwnall Reviewed By: delesley, pwnall Subscribers: pwnall, cfe-commits Differential Revision:
  533. Document the usage of BOOTSTRAP_XXX with stage2 builds
  534. [SelectionDAG] Add FSHL/FSHR support to computeKnownBits Also exposes an issue in DAGCombiner::visitFunnelShift where we were assuming the shift amount had the result type (after legalization it'll have the targets shift amount type).
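To make entry 534 concrete, here is a scalar sketch of funnel shifts and of how known bits could be propagated through one when the shift amount is a constant. `KnownBits32` and the function names are hypothetical stand-ins, not LLVM's `KnownBits` API, and the sketch assumes the usual semantics in which the shift amount is taken modulo the bit width.

```cpp
#include <cassert>
#include <cstdint>

// fshl: concatenate A:B, shift left by Amt % 32, keep the high 32 bits.
uint32_t fshl32(uint32_t A, uint32_t B, uint32_t Amt) {
  uint32_t S = Amt % 32;
  if (S == 0)
    return A; // Avoid an out-of-range shift by 32.
  return (A << S) | (B >> (32 - S));
}

// fshr: concatenate A:B, shift right by Amt % 32, keep the low 32 bits.
uint32_t fshr32(uint32_t A, uint32_t B, uint32_t Amt) {
  uint32_t S = Amt % 32;
  if (S == 0)
    return B;
  return (A << (32 - S)) | (B >> S);
}

// Hypothetical known-bits record: a set bit in Zero (One) means that bit
// of the value is known to be 0 (1).
struct KnownBits32 {
  uint32_t Zero;
  uint32_t One;
};

// With a constant amount, every result bit is a copy of exactly one source
// bit, so the known masks move the same way the value bits do.
KnownBits32 knownBitsFSHL(KnownBits32 A, KnownBits32 B, uint32_t Amt) {
  return {fshl32(A.Zero, B.Zero, Amt), fshl32(A.One, B.One, Amt)};
}
```

For example, if A is known to be 0x000000FF and nothing is known about B, then after `fshl` by 8 the upper 16 bits are known zero, bits 8 to 15 are known one, and the low 8 bits (which come from B) stay unknown.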
  535. [X86] Add computeKnownBits tests for funnel shift intrinsics
  536. Improve the comment in previous
  537. Expand TSan sysroot workaround to NetBSD
  538. [test] [ctime] Ignore -Wformat-zero-length warnings Explicitly disable the -Wformat-zero-length diagnostic when running ctime tests, since one of the test cases passes zero-length format string to strftime(). When strftime() is appropriately decorated with __attribute__(format, ...), this caused the test to fail because of this warning (e.g. on NetBSD). Differential Revision:
  539. [regex] Use distinct __regex_word on NetBSD NetBSD defines character classes up to 0x2000. Use 0x8000 as a safe __regex_word that hopefully will not collide with other values in the foreseeable future. Differential Revision:
  540. [gn build] Merge r349167
  541. [gn build] Add build files for obj2yaml, yaml2obj, and lib/ObjectYAML The two executables are needed by check-lld. Differential Revision:
  542. [gn build] Add build files for llvm-as, llvm-dis, llvm-dwarfdump, llvm-mc, FileCheck, count, not These executables are needed by check-lld. Differential Revision:
  543. Fix a lit test failure after MallocChecker changes
  544. [X86] Autogenerate complete checks. NFC
  545. [X86] Begin cleaning up combineOr -> SHLD/SHRD. NFCI. In preparation for converting to funnel shifts.
  546. [X86] Lower to SHLD/SHRD on slow machines for optsize Use consistent rules for when to lower to SHLD/SHRD for slow machines - fixes a weird issue where funnel shift gets expanded but then X86ISelLowering's combineOr sees the optsize and combines to SHLD/SHRD, but now with the modulo amount guard......
  547. [X86] Add optsize SHLD/SHRD tests
  548. [analyzer][MallocChecker] Improve warning messages on double-delete errors Differential Revision:
  549. [analyzer][MallocChecker][NFC] Document and reorganize some functions This patch merely reorganizes some things, and features no functional change. In detail: * Provided documentation, or moved existing documentation in more obvious places. * Added dividers. (the //===----------===// thing). * Moved getAllocationFamily, printAllocDeallocName, printExpectedAllocName and printExpectedDeallocName in the global namespace on top of the file where AllocationFamily is declared, as they are very strongly related. * Moved isReleased and MallocUpdateRefState near RefState's definition for the same reason. * Realloc modeling was very poor in terms of variable and structure naming, as well as documentation, so I renamed some of them and added much needed docs. * Moved function IdentifierInfos to a separate struct, and moved isMemFunction, isCMemFunction and isStandardNewDelete inside it. This makes the patch affect quite a lot of lines, should I extract it to a separate one? * Moved MallocBugVisitor out of MallocChecker. * Preferred switches to long else-if branches in some places. * Neatly organized some RUN: lines. Differential Revision:
  550. [analyzer][NFC] Merge ClangCheckerRegistry to CheckerRegistry Now that CheckerRegistry lies in Frontend, we can finally eliminate ClangCheckerRegistry. Fortunately, this also provides us with a DiagnosticsEngine, so I went ahead and removed some parameters from its methods. Differential Revision:
  551. Link examples/clang-interpreter against clangSerialization
  552. Fix a compilation error in examples/
  553. Add NetBSD support in needsRuntimeRegistrationOfSectionRange. Use linker script magic to get data/cnts/name start/end.
  554. Register kASan shadow offset for NetBSD/amd64 The NetBSD x86_64 kernel uses the 0xdfff900000000000 shadow offset.
  555. [analyzer][NFC] Move CheckerRegistry from the Core directory to Frontend ClangCheckerRegistry is a very non-obvious, poorly documented, weird concept. It derives from CheckerRegistry, and is placed in lib/StaticAnalyzer/Frontend, whereas its base is located in lib/StaticAnalyzer/Core. It was, from what I can imagine, used to circumvent the problem that the registry functions of the checkers are located in the clangStaticAnalyzerCheckers library, but that library depends on clangStaticAnalyzerCore. However, clangStaticAnalyzerFrontend depends on both of those libraries. One can make the observation however, that CheckerRegistry has no place in Core, it isn't used there at all! The only place where it is used is Frontend, which is where it ultimately belongs. This move implies that since include/clang/StaticAnalyzer/Checkers/ClangCheckers.h only contained a single function: class CheckerRegistry; void registerBuiltinCheckers(CheckerRegistry &registry); it had to be repurposed, as CheckerRegistry is no longer available to clangStaticAnalyzerCheckers. It was renamed to BuiltinCheckerRegistration.h, which actually describes it a lot better -- it does not contain the registration functions for checkers, but only those generated by the tblgen files. Differential Revision:
  556. [analyzer] Prefer return values to out-params in CheckerRegistry.cpp Renaming collectCheckers to getEnabledCheckers Changing the functionality to acquire all enabled checkers, rather than collect checkers for a specific CheckerOptInfo (for example, collecting all checkers for { "core", true }, which meant enabling all checkers from the core package, which was an unnecessary complication). Removing CheckerOptInfo, instead of storing whether the option was claimed via a field, we handle errors immediately, as getEnabledCheckers can now access a DiagnosticsEngine. Realize that the remaining information it stored is directly accessible through AnalyzerOptions.CheckerControlList. Fix a test with -analyzer-disable-checker -verify accidentally left in.
  557. [CodeGen] Enhance machine PHIs optimization Summary: Make the machine PHIs optimization work for a single value register taken from several different copies. This is the first step to fix PR38917. This change makes it possible to get rid of redundant PHIs (see opt_phis2.mir test) to make the subsequent optimizations (like CSE) possible and simpler. For instance, before this patch code like this: `%b = COPY %z ... %a = PHI %bb1, %a; %bb2, %b` could be optimized to: `%a = %b` but code like this: `%c = COPY %z ... %b = COPY %z ... %a = PHI %bb1, %a; %bb2, %b; %bb3, %c` would remain unchanged. With this patch the latter case will be optimized to: `%a = %z`. Committed on behalf of: Anton Afanasyev Reviewers: RKSimon, MatzeB Subscribers: llvm-commits Differential Revision:
  558. Regenerate neon copy tests. NFCI.
  559. [analyzer] Assume that we always have a SubEngine available The removed codepath was dead. Differential Revision:
  560. Fix -Wunused-variable warning. NFCI.
  561. [TargetLowering] Add ISD::OR + ISD::XOR handling to SimplifyDemandedVectorElts Differential Revision:
  562. Enable test/msan/ for NetBSD
  564. [InstSimplify] Add tests for saturating add/sub + icmp; NFC If a saturating add/sub with a constant operand is compared to another constant, we should be able to determine that the condition is always true/false in some cases (but currently don't).
  565. [libclang] Add dependency on clangSerialization to unbreak -DBUILD_SHARED_LIBS=1 build after rC349237 Frontend headers have undefined reference on the symbol `clang::PCHContainerOperations::PCHContainerOperations()` through some shared_ptr usage. Any dependents will get the undefined reference which can only be resolved by explicit dependency on clangSerialization (due to -z defs).
  566. [mips] Fix test typo in rL348914 RUN; -> RUN:
  567. Fix internal_sleep() for NetBSD This is a follow up of a similar fix for Linux from D55692.
  568. [MinGW] Produce a vtable and RTTI for dllexported classes without a key function This matches what GCC does in these situations. This fixes compiling Qt in debug mode. In release mode, references to the vtable of this particular class ends up optimized away, but in debug mode, the compiler creates references to the vtable, which is expected to be dllexported from a different DLL. Make sure the dllexported version actually ends up emitted. Differential Revision:
  569. Fix typo in test cases as well.
  570. hwasan: Fix typo: Previosly -> Previously.
  571. Fix static assert diagnostic checks in i386
  572. [Power9][NFC] add setb exploitation test case Add an original test case for setb before the exploitation actually takes effect, later we can check the difference. Differential Revision:
  573. Fix includes and dependencies for libclang Remove unneeded includes Add needed include Remove dependency on Serialization
  574. Try 2: Fix bug in buildbot start script
  575. Fix bug in buildbot start script
  576. Rework docker setup to make it easier to work around bugs on buildbots
  577. Revert "[analyzer] MoveChecker: Add checks for dereferencing a smart pointer..." This reverts commit r349226. Fails on an MSVC buildbot.
  578. Move static analyzer core diagnostics to common.
  579. [analyzer] Fix unknown block calls to have zero parameters. Right now they report to have one parameter with null decl, because initializing an ArrayRef of pointers with a nullptr yields an ArrayRef to an array of one null pointer. Fixes a crash in the OSObject section of RetainCountChecker. Differential Revision:
  580. [analyzer] ObjCDealloc: Fix a crash when a class attempts to deallocate a class. The checker wasn't prepared to see the dealloc message sent to the class itself rather than to an instance, as if it was +dealloc. Additionally, it wasn't prepared for pure-unknown or undefined self values. The new guard covers that as well, but it is annoying to test because both kinds of values shouldn't really appear and we generally want to get rid of all of them (by modeling unknown values with symbols and by warning on use of undefined values before they are used). The CHECK: directive for FileCheck at the end of the test looks useless, so I removed it. Differential Revision:
  581. [analyzer] ObjCContainers: Track index values. Use trackExpressionValue() (previously known as trackNullOrUndefValue()) to track the index value in the report, so that the user knows what Static Analyzer thinks the index is. Additionally, implement printState() to help debugging the checker later. Differential Revision:
  582. [analyzer] MoveChecker: Add checks for dereferencing a smart pointer after move. Calling operator*() or operator->() on a null STL smart pointer is undefined behavior. Smart pointers are specified to become null after being moved from. So we can't warn on arbitrary method calls, but these two operators definitely make no sense. The new bug is fatal because it's an immediate UB, unlike other use-after-move bugs. The work on a more generic null smart pointer dereference checker is still pending. Differential Revision:
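A small example of the pattern entry 582 targets. The standard specifies that a `std::unique_ptr` is null after being moved from, so dereferencing it is undefined behavior; the sketch below only inspects the pointer and never dereferences it. `Widget` and `movedFromIsNull` are hypothetical names.

```cpp
#include <cassert>
#include <memory>
#include <utility>

struct Widget {
  int Value = 42;
};

// Returns true if the source pointer is null after ownership moves to Q.
bool movedFromIsNull() {
  std::unique_ptr<Widget> P = std::make_unique<Widget>();
  std::unique_ptr<Widget> Q = std::move(P);
  // Writing P->Value or *P at this point would dereference a null pointer,
  // which is the kind of use the entry describes the checker reporting.
  return P == nullptr && Q->Value == 42;
}
```

Comparing `P` against `nullptr`, or calling `reset()` on it, is well defined after the move; only `operator*` and `operator->` are the problem.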
  583. [analyzer] MoveChecker: NFC: De-duplicate a few checks. No functional change intended. Differential Revision:
  584. [SILoadStoreOptimizer] Use std::abs to avoid truncation. Using regular abs() causes the following warning error: absolute value function 'abs' given an argument of type 'int64_t' (aka 'long') but has parameter of type 'int' which may cause truncation of value [-Werror,-Wabsolute-value] (uint32_t)abs(Dist) > MaxDist) { ^ lib/Target/AMDGPU/SILoadStoreOptimizer.cpp:1369:19: note: use function 'std::abs' instead which causes a bot to fail:
  585. [X86] Rename hasNoSignedComparisonUses to hasNoSignFlagUses. Add the instructions that only modify the O flag to the waiver list. The only caller of this turns CMP with 0 into TEST. CMP with 0 and TEST both set OF to 0 so we should have no issues with instructions that only use OF. Though I don't think there's any reason we would read just OF after a compare with 0 anyway. So this probably isn't an observable change.
  586. [X86] Make hasNoCarryFlagUses/hasNoSignedComparisonUses take an SDValue that indicates which result is the flag result. NFCI hasNoCarryFlagUses hardcoded that the flag result is 1 and used that to filter which uses were of interest. hasNoSignedComparisonUses just assumes the only result is flags and checks whether any user of the node is a CopyToReg instruction. After this patch we now do a result number check in both and rely on the caller to provide the result number. This shouldn't change behavior it was just an odd difference between the two functions that I noticed.
  587. [WebAssembly] Check if the section order is correct Summary: This patch checks if the section order is correct when reading a wasm object file in `WasmObjectFile` and converting YAML to wasm object in yaml2wasm. (It is not possible to check when reading YAML because it is handled exclusively by the YAML reader.) This checks the ordering of all known sections (core sections + known custom sections). This also adds section ID DataCount section that will be scheduled to be added in near future. Reviewers: sbc100 Subscribers: dschuff, mgorny, jgravelle-google, sunfish, llvm-commits Differential Revision:
  588. [NewGVN] Update use counts for SSA copies when replacing them by their operands. The current code relies on LeaderUseCount to determine if we can remove an SSA copy, but in that case the LeaderUseCount does not refer to the SSA copy. If an SSA copy is a dominating leader, we use the operand as dominating leader instead. This means we removed a user of an SSA copy and we should decrement its use count, so we can remove the SSA copy once it becomes dead. Fixes PR38804. Reviewers: efriedma, davide Reviewed By: davide Differential Revision:
  589. [Util] Refer to [s|z]exts of args when converting dbg.declares (fix PR35400) When converting dbg.declares, if the described value is a [s|z]ext, refer to the ext directly instead of referring to its operand. This fixes a narrowing bug (the debugger got the sign of a variable wrong, see The main reason to refer to the ext's operand was that an optimization may remove the ext itself, leading to a dropped variable. Now that InstCombine has been taught to use replaceAllDbgUsesWith (r336451), this is less of a concern. Other passes can/should adopt this API as needed to fix dropped variable bugs. Differential Revision:
  590. [NVPTX] Lower instructions that expand into libcalls. The change is an effort to split and refactor abandoned D34708 into smaller parts. Here the behaviour of unsupported instructions is changed to match the behaviour of explicit intrinsics calls. Currently LLVM crashes with: > Assertion getInstruction() && "Not a call or invoke instruction!" failed. With this patch LLVM produces a more sensible error message: > Cannot select: ... i32 = ExternalSymbol'__foobar' Author: Denys Zariaiev <> Differential Revision:
  591. Mangle calling conventions into function pointer types where GCC does Summary: GCC 5.1 began mangling these Windows calling conventions into function types, since they can be used for overloading. They've always been mangled in the MS ABI, but they are new to the Itanium mangler. Note that the calling convention doesn't appear as part of the main declaration, it only appears on function parameter types and other types. Fixes PR39860 Reviewers: rjmccall, efriedma Subscribers: cfe-commits Differential Revision:
  592. [libFuzzer] make len_control less aggressive
  593. Add AddressSpace mangling to MS mode All of the symbols demangle on llvm-undname and This address space qualifier is useful for when we want to use opencl C++ in Windows mode. Additionally, C++ address-space using functions will now be usable on windows. Differential Revision: Change-Id: Ife4506613c3cce778a783456d62117fbf7d83c26
  594. DebugInfo: Avoid using split DWARF when the split unit would be empty. In ThinLTO many split CUs may be effectively empty because of the lack of support for cross-unit references in split DWARF. Using a split unit in those cases is just a waste/overhead - and turned out to be one contributor to a significant symbolizer performance issue when global variable debug info was being imported (see r348416 for the primary fix) due to symbolizers seeing CUs with no ranges, assuming there might still be addresses covered and walking into the split CU to see if there are any ranges (when that split CU was in a DWP file, that meant loading the DWP and its index, the index was extra large because of all these fractured/empty CUs... and so was very expensive to load). (the 3rd fix which will follow, is to assume that a CU with no ranges is empty rather than merely missing its CU level range data - and to not walk into its DIEs (split or otherwise) in search of address information that is generally not present)
  595. Revert "Add extension to always default-initialize nullptr_t." This reverts commit 46efdf2ccc2a80aefebf8433dbf9c7c959f6e629. Richard Smith commented just after I submitted this that this is the wrong solution. Reverting so that I can fix differently.
  596. [codeview] Add begin/endSymbolRecord helpers, NFC Previously beginning a symbol record was excessively verbose. Now it's a bit simpler. This follows the same pattern as begin/endCVSubsection.
  597. DebugInfo: Move addAddrBase from DwarfUnit to DwarfCompileUnit Only CUs need an address table reference.
  598. [Hexagon] Add patterns for shifts of v2i16 This fixes
  599. Add extension to always default-initialize nullptr_t. Core issue 1013 suggests that having an uninitialized std::nullptr_t be UB is a bit foolish, since there is only a single valid value. This DR reports that DR616 fixes it, which does so by making lvalue-to-rvalue conversions from nullptr_t be equal to nullptr. However, just implementing that results in warnings/etc in many places. In order to fix all situations where nullptr_t would seem uninitialized, this patch instead (as an otherwise transparent extension) default initializes uninitialized VarDecls of nullptr_t. Differential Revision: Change-Id: I84d72a9290054fa55341e8cbdac43c8e7f25b885
  600. [GlobalISel] LegalizerHelper: Implement fewerElementsVector for G_LOAD/G_STORE Reviewers: aemerson, dsanders, bogner, paquette, aditya_nandakumar Reviewed By: dsanders Subscribers: rovka, kristof.beyls, javed.absar, tschuett, llvm-commits Differential Revision:
  601. [Hexagon] Use IMPLICIT_DEF to any-extend 32-bit values to 64 bits
  602. Using llvm::find_if() instead of a range-based for loop; NFC. This addresses post-commit review feedback from r349188.
  603. [AMDGPU] Promote constant offset to the immediate by finding a new base with 13bit constant offset from the nearby instructions. Summary: Promote constant offset to immediate by recomputing the relative 13bit offset from nearby instructions. E.g. s_movk_i32 s0, 0x1800 v_add_co_u32_e32 v0, vcc, s0, v2 v_addc_co_u32_e32 v1, vcc, 0, v6, vcc s_movk_i32 s0, 0x1000 v_add_co_u32_e32 v5, vcc, s0, v2 v_addc_co_u32_e32 v6, vcc, 0, v6, vcc global_load_dwordx2 v[5:6], v[5:6], off global_load_dwordx2 v[0:1], v[0:1], off => s_movk_i32 s0, 0x1000 v_add_co_u32_e32 v5, vcc, s0, v2 v_addc_co_u32_e32 v6, vcc, 0, v6, vcc global_load_dwordx2 v[5:6], v[5:6], off global_load_dwordx2 v[0:1], v[5:6], off offset:2048 Author: FarhanaAleen Reviewed By: arsenm, rampitec Subscribers: llvm-commits, AMDGPU Differential Revision:
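The arithmetic in the example from entry 603 can be checked with a short sketch: the second load reuses the base computed for offset 0x1000 and carries the difference to 0x1800 as an immediate. The helper names are hypothetical, and treating the 13-bit field as signed (range -4096 to 4095) is an assumption the entry does not state.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Assumption: the immediate is a 13-bit signed field.
bool fitsInSigned13(int64_t V) { return V >= -4096 && V <= 4095; }

// If Offset is reachable from AnchorOffset with a 13-bit immediate, return
// that immediate; otherwise report that the anchor cannot be reused.
std::optional<int64_t> immediateFromAnchor(int64_t Offset,
                                           int64_t AnchorOffset) {
  int64_t Delta = Offset - AnchorOffset;
  if (!fitsInSigned13(Delta))
    return std::nullopt;
  return Delta;
}
```

For the offsets in the entry, 0x1800 - 0x1000 = 0x800 = 2048, which matches the `offset:2048` on the rewritten load.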
  604. [Clang] Add __builtin_launder Summary: This patch adds `__builtin_launder`, which is required to implement `std::launder`. Additionally GCC provides `__builtin_launder`, so this brings Clang in line with GCC. I'm not exactly sure what magic `__builtin_launder` requires, but based on previous discussions this patch applies a ``. As noted in previous discussions, this may not be enough to correctly handle vtables. Reviewers: rnk, majnemer, rsmith Reviewed By: rsmith Subscribers: kristina, Romain-Geissler-1A, erichkeane, amharc, jroelofs, cfe-commits, Prazek Differential Revision:
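For context on what `std::launder` is for, here is a small C++17 sketch: an object with a const member is replaced in the same storage, and the new object is reached through `std::launder`. The type `Item` and the function `readAfterReplace` are hypothetical, and the example shows library-level usage only, not how the builtin is implemented.

```cpp
#include <cassert>
#include <new>

struct Item {
  const int N;
};

// Replaces the object living in Storage and reads the new one.
int readAfterReplace() {
  alignas(Item) unsigned char Storage[sizeof(Item)];
  Item *First = new (Storage) Item{1};
  First->~Item();        // End the lifetime of the first object.
  new (Storage) Item{2}; // Create a second object in the same storage.
  // Under C++17 rules the old pointer may not be used to reach the new
  // object because of the const member; std::launder yields a usable one.
  return std::launder(reinterpret_cast<Item *>(Storage))->N;
}
```

Without the laundering step a compiler would be entitled to assume `N` still holds the value it had when the first object was constructed.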
  605. Add missing includes and forward decls to unbreak build
  606. [OPENMP][NVPTX]Improved interwarp copy function. An inlined runtime with the current implementation of the interwarp copy function leads to undefined behavior because the barriers are not implemented quite correctly. Start using the generic __kmpc_barrier function instead of the custom-made barriers.
  607. [analyzer] MoveChecker Pt.6: Suppress the warning for the move-safe STL classes. Some C++ standard library classes provide additional guarantees about their state after move. Suppress warnings on such classes until a more precise behavior is implemented. Warnings for locals are not suppressed anyway because it's still most likely a bug. Differential Revision:
  608. [analyzer] MoveChecker: Improve invalidation policies. If a moved-from object is passed into a conservatively evaluated function by pointer or by reference, we assume that the function may reset its state. Make sure it doesn't apply to const pointers and const references. Add a test that demonstrates that it does apply to rvalue references. Additionally, make sure that the object is invalidated when its contents change for reasons other than invalidation caused by evaluating a call conservatively. In particular, when the object's fields are manipulated directly, we should assume that some sort of reset may be happening. Differential Revision:
  609. Tolerate Clang's new static_assert messages
  610. Update our SARIF support from 10-10 to 11-28. Functional changes include: * The run.files property is now an array instead of a mapping. * fileLocation objects now have a fileIndex property specifying the array index into run.files. * The resource.rules property is now an array instead of a mapping. * The result object was given a ruleIndex property that is an index into the resource.rules array. * rule objects now have their "id" field filled out in addition to the name field. * Updated the schema and spec version numbers to 11-28.
  611. [libcxx] Mark some tests as still failing on macosx10.14
  612. [SDAG] Ignore chain operand in REG_SEQUENCE when emitting instructions
  613. [AArch64] Simplify the scheduling predicates (NFC) The instruction encodings make it unnecessary to distinguish extended W-form from X-form instructions.
  614. [TransformWarning] Do not warn about missed transformations in optnone functions. Optimization transformations are intentionally disabled by the 'optnone' function attribute. Therefore do not warn if transformation metadata is still present. Using the legacy pass manager structure, the `skipFunction` method takes care of the optnone attribute (already called before this patch). For the new pass manager, there is no equivalent, so we check for the 'optnone' attribute manually. Differential Revision:
  615. When resolving a merge conflict, I put something inside an #ifdef. Fixed.
  616. [x86] add tests for extractelement of FP binops; NFC
  617. Implement P1209 - Adopt Consistent Container Erasure from Library Fundamentals 2 for C++20. Reviewed as
  618. [ARM] make test immune to scalarization improvements; NFC
  619. [x86] make tests immune to scalarization improvements; NFC
  620. [globalisel][combiner] Fix r349167 for release mode bots This test relies on -debug-only which is unavailable in non-asserts builds.
  621. [ADT] Fix bugs in SmallBitVector. Fixes: * find_last/find_last_unset - off-by-one error * Compound assignment ops and operator== when mixing big/small modes Patch by Brad Moody Differential Revision:
  622. Fix Visual Studio PointerIntPair visualizer Patch by: Trass3r Differential Revision:
  623. [libcxx] Make sure use_system_cxx_lib does not override cxx_runtime_root for DYLD_LIBRARY_PATH Otherwise, even specifying a runtime root different from the library we're linking against won't work -- the library we're linking against is always used. This is undesirable if we try testing something like linking against a recent libc++.dylib but running the tests against an older version (the back-deployment use case).
  624. [Transforms] Preserve metadata when converting invoke to call. The `changeToCall` function did not preserve the invoke's metadata. Currently, there is probably no metadata that depends on being applied on a CallInst or InvokeInst. Therefore we can replace the instruction's metadata. This fixes Suggested-by: Moritz Kreutzer <> Differential Revision:
  625. [MS Demangler] Fail gracefully on invalid pointer types. Once we detect a 'P', we know a pointer type is upcoming, so we make some assumptions about the output that follows. If those assumptions didn't hold, we would assert. Instead, we should fail gracefully and propagate the error up.
  626. [MS Demangler] Add a regression test for an invalid mangled name.
  627. [globalisel][combiner] Make the CombinerChangeObserver a MachineFunction::Delegate Summary: This allows us to register it with the MachineFunction delegate and be notified automatically about erasure and creation of instructions. However, we still need explicit notification for modifications such as those caused by setReg() or replaceRegWith(). There is a catch with this though. The notification for creation is delivered before any operands can be added. While appropriate for scheduling combiner work. This is unfortunate for debug output since an opcode by itself doesn't provide sufficient information on what happened. As a result, the work list remembers the instructions (when debug output is requested) and emits a more complete dump later. Another nit is that the MachineFunction::Delegate provides const pointers which is inconvenient since we want to use it to schedule future modification. To resolve this GISelWorkList now has an optional pointer to the MachineFunction which describes the scope of the work it is permitted to schedule. If a given MachineInstr* is in this function then it is permitted to schedule work to be performed on the MachineInstr's. An alternative to this would be to remove the const from the MachineFunction::Delegate interface, however delegates are not permitted to modify the MachineInstr's they receive. In addition to this, the observer has three interface changes. * erasedInstr() is now erasingInstr() to indicate it is about to be erased but still exists at the moment. * changingInstr() and changedInstr() have been added to report changes before and after they are made. This allows us to trace the changes in the debug output. * As a convenience changingAllUsesOfReg() and finishedChangingAllUsesOfReg() will report changingInstr() and changedInstr() for each use of a given register. 
This is primarily useful for changes caused by MachineRegisterInfo::replaceRegWith() With this in place, both combine rules have been updated to report their changes to the observer. Finally, make some cosmetic changes to the debug output and make Combiner and CombinerHelp Reviewers: aditya_nandakumar, bogner, volkan, rtereshin, javed.absar Reviewed By: aditya_nandakumar Subscribers: mgorny, rovka, kristof.beyls, llvm-commits Differential Revision:
  628. [AArch64] make test immune to scalarization improvements; NFC This is explicitly implementing what the comment says rather than relying on the implicit zext of a constant operand.
  629. Fix a crash in llvm-undname with invalid types.
  630. [SystemZ] make test immune to scalarization improvements; NFC The undef operands mean this test is probably still too fragile to accomplish what the comments suggest.
  631. [Hexagon] make test immune to scalarization improvements; NFC
  632. [x86] auto-generate complete checks; NFC
  633. [x86] regenerate test checks; NFC
  634. [x86] make tests immune to scalarization improvements; NFC
  635. Mark as passing for NetBSD and asan-dynamic-runtime
  636. NFC. Adding an empty line to test the updated commit credentials.
  637. Set shared_libasan_path in lit tests for NetBSD Reuse the Linux code path.
  638. Implement -frecord-command-line (-frecord-gcc-switches) Implement options in clang to enable recording the driver command-line in an ELF section. Implement a new special named metadata, llvm.commandline, to support frontends embedding their command-line options in IR/ASM/ELF. This differs from the GCC implementation in some key ways: * In GCC there is only one command-line possible per compilation-unit, in LLVM it mirrors llvm.ident and multiple are allowed. * In GCC individual options are separated by NULL bytes, in LLVM entire command-lines are separated by NULL bytes. The advantage of the GCC approach is to clearly delineate options in the face of embedded spaces. The advantage of the LLVM approach is to support merging multiple command-lines unambiguously, while handling embedded spaces with escaping. Differential Revision: Clang Differential Revision:
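The difference between the two layouts described in the entry above can be illustrated with a small model. This is only a sketch: the function names and the exact escaping rule (a backslash before each space and backslash) are assumptions made for the example, not the behavior of GCC or clang.

```python
# Hypothetical model of the two layouts; not code from GCC or LLVM.

def gcc_style(options):
    """One command line per compilation unit; options are separated by NUL bytes."""
    return b"\0".join(opt.encode() for opt in options)


def escape(option):
    # Assumed escaping rule: put a backslash before backslashes and spaces.
    return option.replace("\\", "\\\\").replace(" ", "\\ ")


def llvm_style(command_lines):
    """Several command lines; whole command lines are separated by NUL bytes."""
    joined = (" ".join(escape(opt) for opt in line) for line in command_lines)
    return b"\0".join(line.encode() for line in joined)


single = gcc_style(["clang", "-DNAME=a b"])
merged = llvm_style([["clang", "-DNAME=a b"], ["clang", "-O2"]])
```

In the first layout an embedded space needs no escaping because the NUL byte already delimits each option. In the second layout the NUL byte is free to delimit whole command lines, so several of them can be merged into one section.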
  639. [dexp] Change FuzzyFind to also print scope of symbols Summary: When there are multiple symbols in the result of a fuzzy find with the same name, one has to perform an additional query to figure out which of those symbols are coming from the "interesting" scope. This patch prints the scope in fuzzy find results to get rid of the second symbol. Reviewers: hokein Subscribers: ilya-biryukov, ioeric, jkorous, arphaman, cfe-commits Differential Revision:
  640. [RegAllocGreedy] IMPLICIT_DEF values shouldn't prefer registers It costs nothing to spill an IMPLICIT_DEF value (the only spill code that's generated is a KILL of the value), so when creating split constraints if the live-out value is IMPLICIT_DEF the exit constraint should be DontCare instead of PrefReg. Differential Revision:
  641. clang-include-fixer.el: support remote files Summary: Support remote files (e.g., Tramp) in the Emacs integration for clang-include-fixer Reviewers: klimek Reviewed By: klimek Subscribers: cfe-commits Differential Revision:
  642. [clangd] Use buildCompilerInvocation to simplify the HeadersTests, NFC.
  643. [ARM GlobalISel] Thumb2: casts between int and ptr Mark as legal and add tests. Nothing special to do.
  644. [ARM GlobalISel] Remove duplicate test. NFCI Fixup for r349026. I forgot to delete these test functions from the original file when I moved them to arm-legalize-exts.mir.
  645. [clangd] Fix memory leak in ClangdTests. Summary: createInvocationFromCommandLine sets DisableFree to true by default, which leads to a memory leak in clangd. The fix is to use the `BuildCompilationInvocation` to create CI with the correct options (DisableFree is false). Fix Reviewers: kadircet Subscribers: ilya-biryukov, ioeric, MaskRay, jkorous, arphaman, cfe-commits Differential Revision:
  646. [clangd] Fix an assertion failure in background index. Summary: When indexing a file which contains an uncompilable error, we will trigger an assertion failure -- the IndexFileIn data is not set, but we access them in the background index. Reviewers: kadircet Subscribers: ilya-biryukov, ioeric, MaskRay, jkorous, arphaman, cfe-commits Differential Revision:
  647. [ARM GlobalISel] Minor refactoring. NFCI Refactor the ARMInstructionSelector to cache some opcodes in the constructor instead of checking all the time if we're in ARM or Thumb mode.
  648. [ARM GlobalISel] Allow simple binary ops in Thumb2 Mark G_ADD, G_SUB, G_MUL, G_AND, G_OR and G_XOR as legal for both ARM and Thumb2. Extract the legalizer tests for these opcodes into another file. Add tests for the instruction selector.
  649. [TableGen:AsmWriter] Cope with consecutive tied operands. When you define an instruction alias as a subclass of InstAlias, you specify all the MC operands for the instruction it expands to, except for operands that are tied to a previous one, which you leave out in the expectation that the Tablegen output code will fill them in automatically. But the code in Tablegen's AsmWriter backend that skips over a tied operand was doing it using 'if' instead of 'while', because it wasn't expecting to find two tied operands in sequence. So if an instruction updates a pair of registers in place, so that its MC representation has two input operands tied to the output ones (for example, Arm's UMLAL instruction), then any alias which wants to expand to a special case of that instruction is likely to fail to match, because the indices of subsequent operands will be off by one in the generated printAliasInstr function. This patch re-indents some existing code, so it's clearest when viewed as a diff with whitespace changes ignored. Reviewers: fhahn, rengolin, sdesmalen, atanasyan, asb, jholewinski, t.p.northover, kparzysz, craig.topper, stoklund Reviewed By: rengolin Subscribers: javed.absar, kristof.beyls, llvm-commits Differential Revision:
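The 'if' versus 'while' distinction in the entry above can be shown with a toy model. This is a sketch only: `map_alias_operands` and the operand layout are hypothetical and do not reproduce TableGen's actual data structures.

```python
def map_alias_operands(tied, skip_all=True):
    """Map each alias operand to an instruction operand index.

    tied[i] is True when instruction operand i is tied to an earlier operand
    and is therefore left out of the alias definition.
    skip_all=True models the 'while' loop; False models the old single 'if'.
    """
    mapping = []
    i = 0
    for _ in range(tied.count(False)):
        if skip_all:
            while i < len(tied) and tied[i]:
                i += 1  # skip every consecutive tied operand
        elif i < len(tied) and tied[i]:
            i += 1      # skip at most one tied operand
        mapping.append(i)
        i += 1
    return mapping


# Hypothetical layout: four ordinary operands, two consecutive tied inputs,
# then two trailing operands.
layout = [False, False, False, False, True, True, False, False]
fixed = map_alias_operands(layout, skip_all=True)
buggy = map_alias_operands(layout, skip_all=False)
```

With a single skip, the operand after the tied pair lands on the second tied slot, so every later index is off by one.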
  650. Revert rL349136: [llvm-exegesis] Optimize ToProcess in dbScan Summary: Use `vector<char> Added + vector<size_t> ToProcess` to replace `SetVector ToProcess` We also check `Added[P]` to avoid enqueueing a point more than once, which also saves us a `ClusterIdForPoint_[Q].isUndef()` check. Reviewers: courbet, RKSimon, gchatelet, john.brawn, lebedev.ri Subscribers: tschuett, llvm-commits Differential Revision: ........ Patch wasn't approved and breaks buildbots
  651. Introduce `AddressSpaceView` template parameter to `SizeClassAllocator32`, `FlatByteMap`, and `TwoLevelByteMap`. Summary: This is a follow up patch to r346956 for the `SizeClassAllocator32` allocator. This patch makes `AddressSpaceView` a template parameter both to the `ByteMap` implementations (but makes `LocalAddressSpaceView` the default), some `AP32` implementations and is used in `SizeClassAllocator32`. The actual changes to `ByteMap` implementations and `SizeClassAllocator32` are very simple. However the patch is large because it requires changing all the `AP32` definitions, and users of those definitions. For ASan and LSan we make `AP32` and `ByteMap` templated types that take a single `AddressSpaceView` argument. This has been done because we will instantiate the allocator with a type that isn't `LocalAddressSpaceView` in the future patches. For the allocators used in the other sanitizers (i.e. HWAsan, MSan, Scudo, and TSan) use of `LocalAddressSpaceView` is hard coded because we do not intend to instantiate the allocators with any other type. In the cases where untemplated types have become templated on a single `AddressSpaceView` parameter (e.g. `PrimaryAllocator`) their name has been changed to have a `ASVT` suffix (Address Space View Type) to indicate they are templated. The only exception to this are the `AP32` types due to the desire to keep the type name as short as possible. In order to check that template is instantiated in the correct way a `static_assert(...)` has been added that checks that the `AddressSpaceView` type used by `Params::ByteMap::AddressSpaceView` matches the `Params::AddressSpaceView`. This uses the new `sanitizer_type_traits.h` header. rdar://problem/45284065 Reviewers: kcc, dvyukov, vitalybuka, cryptoad, eugenis, kubamracek, george.karpenkov Subscribers: mgorny, llvm-commits, #sanitizers Differential Revision:
  652. [DAGCombiner][X86] Prevent visitSIGN_EXTEND from returning N when (sext (setcc)) already has the target desired type for the setcc Summary: If the setcc already has the target desired type we can reach the getSetCC/getSExtOrTrunc after the MatchingVecType check with the exact same types as the nodes we started with. This causes VsetCC to be CSEd to N0 and the getSExtOrTrunc will CSE to N. When we return N, the caller will think that meant we called CombineTo and did our own worklist management. But that's not what happened. This prevents target hooks from being called for the node. To fix this, I've now returned SDValue if the setcc is already the desired type. But to avoid some regressions in X86 I've had to disable one of the target combines that wasn't being reached before in the case of a (sext (setcc)). If we get vector widening legalization enabled that entire function will be deleted anyway so hopefully this is only for the short term. Reviewers: RKSimon, spatel Subscribers: llvm-commits Differential Revision:
  653. [llvm-exegesis] Optimize ToProcess in dbScan Summary: Use `vector<char> Added + vector<size_t> ToProcess` to replace `SetVector ToProcess` We also check `Added[P]` to avoid enqueueing a point more than once, which also saves us a `ClusterIdForPoint_[Q].isUndef()` check. Reviewers: courbet, RKSimon, gchatelet, john.brawn, lebedev.ri Subscribers: tschuett, llvm-commits Differential Revision:
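The worklist pattern named in the entry above, a flag array paired with a plain vector in place of a set-backed vector, can be sketched as follows. This is a generic illustration, not the llvm-exegesis clustering code; `reachable` and the adjacency-list input are assumptions for the example.

```python
def reachable(neighbors, start):
    """Collect every point reachable from start, enqueueing each point once."""
    added = [False] * len(neighbors)  # plays the role of vector<char> Added
    to_process = [start]              # plays the role of vector<size_t> ToProcess
    added[start] = True
    head = 0
    while head < len(to_process):
        p = to_process[head]
        head += 1
        for q in neighbors[p]:
            if not added[q]:          # one flag lookup replaces a set membership test
                added[q] = True
                to_process.append(q)
    return to_process


# Hypothetical neighborhood graph: point 4 is isolated.
graph = [[1, 2], [0, 2], [0, 1, 3], [2], []]
order = reachable(graph, 0)
```

Because a point is flagged at the moment it is enqueued, it can never appear in the worklist twice, which is the property the set-backed container provided before.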
  654. [ThinLTO] Fix test added in rL349076
  655. [sanitizer] Fix nolibc internal_sleep Reviewers: kubamracek, vitalybuka Reviewed By: vitalybuka Subscribers: delcypher, llvm-commits, #sanitizers Differential Revision:
  656. [Object] Rename getRelrRelocationType to getRelativeRelocationType Summary: The two utility functions were added in D47919 to support SHT_RELR. However, these are just relative relocation types and need not be named Relr. Reviewers: phosek, dberris Reviewed By: dberris Subscribers: llvm-commits Differential Revision:
  657. [clang-tidy] Remove extra config.h includes It's included in a new header ClangTidyForceLinker.h and should not be included the second time. Follow up for the
  658. [clang-tidy] Share the forced linking code between clang-tidy tool and plugin Extract code that forces linking to the separate header and include it in both plugin and standalone tool. Try 2: missing header guard and "clang/Config/config.h" are added to the new header. Differential Revision:
  659. [llvm-xray] Use correct variable name This fixes the compiler error introduced in r349129.
  660. [llvm-xray] Store offset pointers in temporaries DataExtractor::getU64 modifies the OffsetPtr, which we also pass to RelocateOrElse, which breaks on Windows. This addresses the issue introduced in r349120. Differential Revision:
  661. Update google benchmark again
  662. Update google benchmark version
  663. Fix up diagnostics. Move some diagnostics around between Diagnostic* files. Diagnostics used in multiple places were moved to Diagnostics listed in the wrong place (ie, Sema diagnostics listed in were moved to the correct places. One diagnostic split into two so that the diagnostic string is in the .td file instead of in code. Cleaned up the diagnostic includes after all the changes.
  664. [gn build] Merge r348963 and r349076
  665. [clang-tidy] Improve google-objc-function-naming diagnostics 📙 Summary: The diagnostics from google-objc-function-naming check will be more actionable if they provide a brief description of the requirements from the Google Objective-C style guide. The more descriptive diagnostics may help clarify that functions in the global namespace must have an appropriate prefix followed by Pascal case (engineers working previously with static functions might not immediately understand the different requirements of static and non-static functions). Test Notes: Verified against the clang-tidy tests. Reviewers: benhamilton, aaron.ballman Reviewed By: benhamilton Subscribers: MyDeveloperDay, xazax.hun, cfe-commits Differential Revision:
  666. Revert "[clang-tidy] Share the forced linking code between clang-tidy tool and plugin" This reverts commit r349038 as it was causing test failures:
  667. [llvm-xray] Support for PIE When the instrumented binary is linked as PIE, we need to apply the relative relocations to sleds. This is handled by the dynamic linker at runtime, but when processing the file we have to do it ourselves. Differential Revision:
  668. [macho] save the SDK version stored in module metadata into the version min and build version load commands in the object file This commit introduces a new metadata node called "SDK Version". It will be set by the frontend to mark the platform SDK (macOS/iOS/etc) version which was used during that particular compilation. This node is used when machine code is emitted, by either saving the SDK version into the appropriate macho load command (version min/build version), or by emitting the assembly for these load commands with the SDK version specified as well. The assembly for both load commands is extended by allowing it to contain the sdk_version X, Y [, Z] trailing directive to represent the SDK version respectively. rdar://45774000 Differential Revision:
  669. Revert "Try to update the test to fix the breakage With the new warning, we are showing one more output in the test." This reverts commit r349064. This wasn't updating the right test. Causing (note the different line number from the previous revert): ====================================================================== FAIL: test_diagnostic_warning (tests.cindex.test_diagnostics.TestDiagnostics) ---------------------------------------------------------------------- Traceback (most recent call last): File "/Users/buildslave/jenkins/workspace/clang-stage1-configure-RA/llvm/tools/clang/bindings/python/tests/cindex/", line 18, in test_diagnostic_warning self.assertEqual(len(tu.diagnostics), 2) AssertionError: 1 != 2
  670. Revert "Make -Wstring-plus-int warns even if when the result is not out of bounds" This reverts commit r349054. It's causing: FAILED: tools/clang/bindings/python/tests/CMakeFiles/check-clang-python FAIL: test_diagnostic_range (tests.cindex.test_diagnostics.TestDiagnostics) ---------------------------------------------------------------------- Traceback (most recent call last): File "/Users/buildslave/jenkins/workspace/clang-stage1-configure-RA/llvm/tools/clang/bindings/python/tests/cindex/", line 55, in test_diagnostic_range self.assertEqual(len(tu.diagnostics), 1) AssertionError: 2 != 1 ====================================================================== FAIL: test_diagnostic_warning (tests.cindex.test_diagnostics.TestDiagnostics) ---------------------------------------------------------------------- Traceback (most recent call last): File "/Users/buildslave/jenkins/workspace/clang-stage1-configure-RA/llvm/tools/clang/bindings/python/tests/cindex/", line 18, in test_diagnostic_warning self.assertEqual(len(tu.diagnostics), 2) AssertionError: 1 != 2
  671. Windows ASan: Instrument _msize_base() Summary: A recent update to the VS toolchain in chromium [1] broke the windows ASan bot because the new toolchain calls _msize_base() instead of _msize() in a number of _aligned_* UCRT routines. Instrument _msize_base() as well. [1] Reviewers: rnk, #sanitizers, vitalybuka Reviewed By: rnk, #sanitizers, vitalybuka Subscribers: vitalybuka, kubamracek, llvm-commits Differential Revision:
  672. [Builtins][X86] Provide implementations of __lzcnt16, __lzcnt, __lzcnt64 for MS compatibility. Remove declarations from intrin.h and implementations from lzcntintrin.h intrin.h had forward declarations for these and lzcntintrin.h had implementations that were only available with -mlzcnt or a -march that supported the lzcnt feature. For MS compatibility we should always have these builtins available regardless of X86 being the target or the CPU supporting the lzcnt instruction. The backends should be able to gracefully fall back to something supported even if it's just shifts and bit ops. Unfortunately, gcc also implements 2 of the 3 function names here on X86 when lzcnt feature is enabled. This patch adds builtins for these for MSVC compatibility and drops the forward declarations from intrin.h. To keep the gcc compatibility the two intrinsics that collided have been turned into macros that use the X86 specific builtins with the lzcnt feature check. These macros are only defined when _MSC_VER is not defined. Without them being macros we can get a redefinition error because -ms-extensions doesn't seem to set _MSC_VER but does make the MS builtins available. Should fix PR40014 Differential Revision:
  673. Silence CMP0048 warning in the benchmark utility library I'm testing this in LLVM before sending it upstream. Part of PR38874
  674. [gn build] Add infrastructure to create symlinks and use it to create lld's symlinks This is slightly involved, see the comments in the code. The GN build now builds a functional lld! Differential Revision:
  675. [DAGCombiner] clean up visitEXTRACT_VECTOR_ELT This isn't quite NFC, but I don't know how to expose any outward diffs from these changes. Mostly, this was confusing because it used 'VT' to refer to the operand type rather than the usual type of the input node. There's also a large block at the end that is dedicated solely to matching loads, but that wasn't obvious. This could probably be split up into separate functions to make it easier to see. It's still not clear to me when we make certain transforms because the legality and constant conditions are intertwined in a way that might be improved.
  676. [X86] Demote EmitTest to a helper function of EmitCmp. Route all callers except EmitCmp through EmitCmp. This requires the two callers to manifest a 0 to make EmitCmp call EmitTest. I'm looking into changing how we combine TEST and flag setting instructions to not be part of lowering. And instead be part of DAG combine or isel. Which will mean EmitTest will probably become gutted and maybe disappear entirely.
  677. Revert "Switch Android from TLS_SLOT_TSAN(8) to TLS_SLOT_SANITIZER(6)" Breaks sanitizer-android buildbot. This reverts commit 85e02baff327e7b67ea5b47897302901abb2aa5d.
  678. Revert "[hwasan] Android: Switch from TLS_SLOT_TSAN(8) to TLS_SLOT_SANITIZER(6)" Breaks sanitizer-android buildbot. This reverts commit af8443a984c3b491c9ca2996b8d126ea31e5ecbe.
  679. [AArch64] Fix Exynos predicates (NFC) Fix the logic in the definition of the `ExynosShiftExPred` as a more specific version of `ExynosShiftPred`. But, since `ExynosShiftExPred` is not used yet, this change has NFC.
  680. [SampleFDO] handle ProfileSampleAccurate when initializing function entry count ProfileSampleAccurate is used to indicate the profile has exact match to the code to be optimized. Previously ProfileSampleAccurate is handled in ProfileSummaryInfo::isColdCallSite and ProfileSummaryInfo::isColdBlock. A better solution is to initialize function entry count to 0 when ProfileSampleAccurate is true, so we don't have to handle ProfileSampleAccurate in multiple places. Differential Revision:
  681. [CUDA] Make all host-side shadows of device-side variables undef. The host-side code can't (and should not) access the values that may only exist on the device side. E.g. address of a __device__ function does not exist on the host side as we don't generate the code for it there. Differential Revision:
  682. Attempt to fix code completion test to handle LLP64 platforms
  683. Fix test after -Wstring-plus-int warning was enabled Use array indexing instead of addition.
  684. Revert r348971: [AMDGPU] Support for "uniform-work-group-size" attribute This patch breaks RADV (and probably RadeonSI as well)
  685. Fix debug-info-abspath.c on Windows by removing /tmp/t.o line This object seemed unused, so I believe we can just remove this compiler invocation without losing any test coverage.
  686. Update the scan-build to generate SARIF. This updates the scan-build perl script to allow outputting to sarif in a more natural fashion by specifying -sarif as a command line argument, similar to how -plist is already supported.
  687. AMDGPU/GlobalISel: Legalize/regbankselect block_addr
  688. [libc++] Fix _LIBCPP_EXPORTED_FROM_ABI when visibility annotations are disabled Fixes a bug where functions would get exported when building with -fvisibility=hidden and defining _LIBCPP_DISABLE_VISIBILITY_ANNOTATIONS. No visibility annotations should be added in this case. The new logic for _LIBCPP_EXPORTED_FROM_ABI matches that of the other visibility annotations around it. Differential Revision:
  689. Reapply "[MemCpyOpt] memset->memcpy forwarding with undef tail" Currently memcpyopt optimizes cases like memset(a, byte, N); memcpy(b, a, M); to memset(a, byte, N); memset(b, byte, M); if M <= N. Often this allows further simplifications down the line, which drop the first memset entirely. This patch extends this optimization for the case where M > N, but we know that the bytes a[N..M] are undef due to alloca/lifetime.start. This situation arises relatively often for Rust code, because Rust does not initialize trailing structure padding and loves to insert redundant memcpys. This also fixes The previous version of this patch did not perform dependency checking properly: While the dependency is checked at the position of the memset, the used size must be that of the memcpy. Previously the size of the memset was used, which missed modification in the region MemSetSize..CopySize, resulting in miscompiles. The added tests cover variations of this issue. Differential Revision:
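The rewrite described in the entry above can be checked on a small byte-buffer model. This is a sketch under stated assumptions: the helper names are made up for the example, buffers are zero-initialized `bytearray`s, and the "undef" tail of `a` is simply modeled as bytes whose value the program must not rely on.

```python
def memset(buf, byte, n):
    buf[:n] = bytes([byte]) * n


def memcpy(dst, src, m):
    dst[:m] = src[:m]


def original(byte, n, m, size=16):
    a, b = bytearray(size), bytearray(size)
    memset(a, byte, n)
    memcpy(b, a, m)      # copies whatever a holds in its first m bytes
    return a, b


def forwarded(byte, n, m, size=16):
    a, b = bytearray(size), bytearray(size)
    memset(a, byte, n)
    memset(b, byte, m)   # the memcpy has been replaced by a second memset
    return a, b


same = original(0xAB, 8, 4) == forwarded(0xAB, 8, 4)   # M <= N
_, b_orig = original(0xAB, 8, 12)                      # M > N
_, b_fwd = forwarded(0xAB, 8, 12)
```

For M <= N both versions leave identical buffers. For M > N they agree on the first N bytes of `b` and differ only in `b[N..M]`, which is why the extension is valid only when `a[N..M]` is known to be undef and nothing modifies that region between the memset and the memcpy.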
  690. Implement a small subset of the C++ `type_traits` header inside sanitizer_common so we can avoid depending on system C++ headers. Summary: In particular we implement the `is_same<T,U>` templated type. This is useful for doing compile-time comparison of types in `static_assert`s. The plan is to use this in another patch ( ). Reviewers: kcc, dvyukov, vitalybuka, cryptoad, eugenis, kubamracek, george.karpenkov Subscribers: mgorny, #sanitizers, llvm-commits Differential Revision:
  691. [ThinLTO] Compute synthetic function entry count Summary: This patch computes the synthetic function entry count on the whole program callgraph (based on module summary) and writes the entry counts to the summary. After function importing, this count gets attached to the IR as metadata. Since it adds a new field to the summary, this bumps up the version. Reviewers: tejohnson Subscribers: mehdi_amini, inglorion, llvm-commits Differential Revision:
  692. [llvm] Address base discriminator overflow in X86DiscriminateMemOps Summary: Macros are expanded on a single line. In case of large expansions, with sufficiently many instructions with memory operands (and when -fdebug-info-for-profiling is requested), we may be unable to generate new base discriminator values - new values overflow (base discriminators may not be larger than 2^12). This CL warns instead of asserting in such a case. A subsequent CL will add APIs to check for overflow before creating new debug info. See Reviewers: davidxl, wmi, gbedwell Reviewed By: davidxl Subscribers: aprantl, llvm-commits Differential Revision:
  693. [llvm-size][libobject] Add explicit "inTextSegment" methods similar to "isText" section methods to calculate size correctly. Summary: llvm-size uses "isText()" etc. which seem to indicate whether the section contains code-like things, not whether or not it will actually go in the text segment when in a fully linked executable. The unit test added (elf-sizes.test) shows some types of sections that cause discrepancies versus the GNU size tool. llvm-size is not correctly reporting sizes of things mapping to text/data segments, at least for ELF files. This fixes pr38723. Reviewers: echristo, Bigcheese, MaskRay Reviewed By: MaskRay Subscribers: llvm-commits Differential Revision:
  694. [clang-tidy] Add the abseil-duration-subtraction check Summary: This check uses the context of a subtraction expression as well as knowledge about the Abseil Time types, to infer the type of the second operand of some subtraction expressions in Duration conversions. For example: absl::ToDoubleSeconds(duration) - foo can become absl::ToDoubleSeconds(duration - absl::Seconds(foo)) This ensures that time calculations are done in the proper domain, and also makes it easier to further deduce the types of the second operands to these expressions. Reviewed By: JonasToth Tags: #clang-tools-extra Differential Revision:
  695. [CostModel][X86] Don't count 2 shuffles on the last level of a pairwise arithmetic or min/max reduction This is split from D55452 with the correct patch this time. Pairwise reductions require two shuffles on every level but the last. On the last level the two shuffles are <1, u, u, u...> and <0, u, u, u...>, but <0, u, u, u...> will be dropped by InstCombine/DAGCombine as being an identity shuffle. Differential Revision:
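The shuffle count implied by the entry above can be worked through with simple arithmetic. This sketch counts shuffles only and assumes a power-of-two element count with one level per halving step; it does not reproduce the X86 cost tables, and `pairwise_shuffles` is a hypothetical helper.

```python
def pairwise_shuffles(num_elements, count_identity_on_last_level=False):
    """Count shuffles in a pairwise reduction over a power-of-two vector."""
    assert num_elements > 0 and num_elements & (num_elements - 1) == 0
    levels = num_elements.bit_length() - 1   # log2(num_elements)
    if levels == 0:
        return 0
    # Two shuffles on every level except the last; on the last level the
    # <0, u, u, ...> shuffle is an identity and is expected to be dropped.
    last = 2 if count_identity_on_last_level else 1
    return 2 * (levels - 1) + last


before = pairwise_shuffles(8, count_identity_on_last_level=True)
after = pairwise_shuffles(8)
```

For an 8-element vector there are three levels, so the count falls from 6 to 5 once the identity shuffle on the last level is no longer charged.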
  696. [libcxx] Fix pop_back() tests to make sure they don't always just pass
  697. [CMake] llvm_codesign workaround for Xcode double-signing errors Summary: When using Xcode to build LLVM with code signing, the post-build rule is executed even if the actual build-step was skipped. This causes double-signing errors. We can currently only avoid it by passing the `--force` flag. Plus some polishing for my previous patch D54443. Reviewers: beanz, kubamracek Reviewed By: kubamracek Subscribers: #lldb, mgorny, llvm-commits Differential Revision:
  698. [LoopUtils] Use i32 instead of `void`. The actual type of the first argument of the @dbg intrinsic doesn't really matter as we're setting it to `undef`, but the bitcode reader is picky about `void` types.
  699. Don't add unnecessary compiler flags to llvm-config output Summary: llvm-config --cxxflags --cflags, should only output the minimal flags required to link against the llvm libraries. They currently contain all flags used to compile llvm including flags like -g, -pedantic, -Wall, etc, which users may not always want. This changes the llvm-config output to only include flags that have been explicitly added to the COMPILE_FLAGS property of the llvm-config target by the llvm build system. Output from llvm-config when running cmake with: cmake -G Ninja .. -DCMAKE_CXX_FLAGS=-funroll-loops Before: --cppflags: -I$HEADERS_DIR/llvm/include -I$HEADERS_DIR/llvm/build/include -D_GNU_SOURCE -D_DEBUG -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS --cflags: -I$HEADERS_DIR/llvm/include -I$HEADERS_DIR/llvm/build/include -fPIC -Werror=date-time -Wall -Wextra -Wno-unused-parameter -Wwrite-strings \ -Wno-missing-field-initializers -pedantic -Wno-long-long -Wimplicit-fallthrough \ -Wno-comment -fdiagnostics-color -g -D_GNU_SOURCE -D_DEBUG -D__STDC_CONSTANT_MACROS \ -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS --cxxflags: -I$HEADERS_DIR/llvm/include -I$HEADERS_DIR/llvm/build/include\ -funroll-loops -fPIC -fvisibility-inlines-hidden -Werror=date-time -std=c++11 -Wall \ -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wno-missing-field-initializers \ -pedantic -Wno-long-long -Wimplicit-fallthrough -Wno-maybe-uninitialized \ -Wno-class-memaccess -Wno-noexcept-type -Wdelete-non-virtual-dtor -Wno-comment \ -fdiagnostics-color -g -fno-exceptions -fno-rtti -D_GNU_SOURCE -D_DEBUG \ -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS" After: --cppflags: -I$HEADERS_DIR/llvm/include -I$HEADERS_DIR/llvm/build/include \ -D_GNU_SOURCE -D_DEBUG -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS --cflags: -I$HEADERS_DIR/llvm/include -I$HEADERS_DIR/llvm/build/include \ -D_GNU_SOURCE -D_DEBUG -D__STDC_CONSTANT_MACROS
-D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS --cxxflags: -I$HEADERS_DIR/llvm/include -I$HEADERS_DIR/llvm/build/include \ -std=c++11 -fno-exceptions -fno-rtti \ -D_GNU_SOURCE -D_DEBUG -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS Reviewers: sylvestre.ledru, infinity0, mgorny Reviewed By: sylvestre.ledru, mgorny Subscribers: mgorny, dmgreen, llvm-commits Differential Revision:
  700. Correctly handle skewed streams in drop_front() method. When calling BinaryStreamArray::drop_front(), if the stream is skewed it means we must never drop the first bytes of the stream since offsets which occur in records assume the existence of those bytes. So if we want to skip the first record in a stream, then what we really want to do is just set the begin pointer to the next record. But we shouldn't actually remove those bytes from the underlying view of the data.
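The idea in the entry above, advancing the begin pointer without trimming the underlying view, can be sketched with a toy record array. This is an illustration only: `RecordView`, its fixed record length, and `read_at` are assumptions for the example and do not mirror `BinaryStreamArray`'s interface.

```python
class RecordView:
    """Toy record array over a byte buffer with a leading skew."""

    def __init__(self, data, begin, record_len):
        self.data = data              # underlying view; never trimmed
        self.begin = begin            # offset of the first visible record
        self.record_len = record_len

    def front(self):
        return self.data[self.begin:self.begin + self.record_len]

    def drop_front(self):
        # Move the begin offset to the next record; keep every byte in place.
        return RecordView(self.data, self.begin + self.record_len, self.record_len)

    def read_at(self, offset, length):
        # Offsets stored in records are relative to the start of the full data.
        return self.data[offset:offset + length]


stream = b"HDR!AAAABBBB"              # 4 bytes of skew, then two 4-byte records
view = RecordView(stream, begin=4, record_len=4)
rest = view.drop_front()
```

After `drop_front()` the first visible record is `BBBB`, yet an offset of 0 still resolves to the skew bytes, so offsets recorded inside records stay valid.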
  701. Reinstate DW_AT_comp_dir support after D55519. The DIFile used by the CU is special and distinct from the main source file. Its directory part specifies what becomes the DW_AT_comp_dir (the compilation directory), even if the source file was specified with an absolute path. To support the .dwo workflow, a valid DW_AT_comp_dir is necessary even if source files were specified with an absolute path.
  702. Try to update the test to fix the breakage With the new warning, we are showing one more output in the test.
  703. [CodeComplete] Adhere to LLVM naming style in CodeCompletionTest. NFC Also reuses the same variable across multiple tests to reduce the chance of accidentally referencing the previous test.
  704. [CodeComplete] Temporarily disable failing assertion Found a case in the clang codebase where the assertion fires. Disabled to avoid crashing assertion-enabled builds until I re-add the missing operation. Will restore the assertion alongside the upcoming fix.
  705. [MachO][TLOF] Add support for local symbols in the indirect symbol table On 32-bit archs, before, we would assume that an indirect symbol will never have local linkage. This can lead to miscompiles where the symbol's value would be 0 and the linker would use that value, because the indirect symbol table would contain the value `INDIRECT_SYMBOL_LOCAL` for that specific symbol. Differential Revision:
  706. Fix CodeCompleteTest.cpp for older gcc plus ccache builds Some versions of gcc, especially when invoked through ccache (-E), can have trouble with raw string literals inside macros. This moves the string out of the macro.
  707. [DAGCombiner] after simplifying demanded elements of vector operand of extract, revisit the extract; 2nd try This is a retry of rL349051 (reverted at rL349056). I changed the check for dead-ness from number of uses to an opcode test for DELETED_NODE based on existing similar code. Differential Revision:
  708. [X86][SSE] Add SSE vector imm/var shift support to SimplifyDemandedVectorEltsForTargetNode
  709. revert rL349051: [DAGCombiner] after simplifying demanded elements of vector operand of extract, revisit the extract This causes an address sanitizer bot failure:
  710. Recommit r349041: [tblgen][disasm] Separate encodings from instructions Removed const from the ArrayRef<const EncodingAndInst> to avoid the std::vector<const EncodingAndInst> that G++ saw
  711. Make -Wstring-plus-int warn even when the result is not out of bounds Summary: Patch by Arnaud Bienner Reviewers: sylvestre.ledru, thakis Reviewed By: thakis Subscribers: cfe-commits Differential Revision:
  712. [CodeComplete] Fill preferred type on binary expressions Reviewers: kadircet Reviewed By: kadircet Subscribers: arphaman, cfe-commits Differential Revision:
  713. [X86][SSE] Fix all remaining modulo vector rotation amounts (PR38243) There's still a couple of minor SimplifyDemandedElts regressions in some of the shift amount splats that will be fixed in future patches.
  714. [DAGCombiner] after simplifying demanded elements of vector operand of extract, revisit the extract Differential Revision:
  715. [CodeComplete] Set preferred type to bool on conditions Reviewers: kadircet Reviewed By: kadircet Subscribers: cfe-commits Differential Revision:
  716. [clangd] Enable cross-namespace completions by default in clangd Summary: Code completion will suggest symbols from any scope (incl. inaccessible scopes) when there's no qualifier explicitly specified. E.g. {F7689815} As we are assigning relatively low scores for cross-namespace completion items, the overall code completion quality doesn't regress. The feature has been tried out by a few folks, and the feedback is generally positive, so I think it should be ready to be enabled by default. Reviewers: hokein, ilya-biryukov, kadircet Reviewed By: hokein, ilya-biryukov Subscribers: MaskRay, jkorous, arphaman, cfe-commits Differential Revision:
  717. [Sparc] Add membar assembler tags Summary: The Sparc V9 membar instruction can enforce different types of memory orderings depending on the value in its immediate field. In the architectural manual the type is selected by combining different assembler tags into a mask. This patch adds support for these tags. Reviewers: jyknight, venkatra, brad Reviewed By: jyknight Subscribers: fedor.sergeev, jrtc27, jfb, llvm-commits Differential Revision:
  718. [X86][SSE] Fix modulo rotation amounts for v8i16/v16i16/v4i32 (PR38243)
  719. Revert r349041: [tblgen][disasm] Separate encodings from instructions One of the GCC based bots is objecting to a vector of const EncodingAndInst's: In file included from /usr/include/c++/8/vector:64, from /export/users/atombot/llvm/clang-atom-d525-fedora-rel/llvm/utils/TableGen/CodeGenInstruction.h:22, from /export/users/atombot/llvm/clang-atom-d525-fedora-rel/llvm/utils/TableGen/FixedLenDecoderEmitter.cpp:15: /usr/include/c++/8/bits/stl_vector.h: In instantiation of 'class std::vector<const {anonymous}::EncodingAndInst, std::allocator<const {anonymous}::EncodingAndInst> >': /export/users/atombot/llvm/clang-atom-d525-fedora-rel/llvm/utils/TableGen/FixedLenDecoderEmitter.cpp:375:32: required from here /usr/include/c++/8/bits/stl_vector.h:351:21: error: static assertion failed: std::vector must have a non-const, non-volatile value_type static_assert(is_same<typename remove_cv<_Tp>::type, _Tp>::value, ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ /usr/include/c++/8/bits/stl_vector.h:354:21: error: static assertion failed: std::vector must have the same value_type as its allocator static_assert(is_same<typename _Alloc::value_type, _Tp>::value, ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  720. [Sparc] Use float register for integer constrained with "f" in inline asm Summary: Constraining an integer value to a floating point register using "f" causes an llvm_unreachable to trigger. This patch allows i32 integers to be placed in a single precision float register and i64 integers to be placed in a double precision float register. This matches the behavior of GCC. For other types the llvm_unreachable is removed to instead trigger an error message that points out the offending line. Reviewers: jyknight, venkatra Reviewed By: jyknight Subscribers: eraman, fedor.sergeev, jrtc27, llvm-commits Differential Revision:
  721. [PowerPC][NFC] Sorting out Pseudo related classes to avoid confusion There are several kinds of pseudo instructions in the PowerPC backend, e.g.: * ISel pseudo-instructions, which have let usesCustomInserter=1 in td; ExpandISelPseudos -> EmitInstrWithCustomInserter will deal with them. * Post-RA pseudo instructions, which have let isPseudo = 1 in td, or standard pseudos (SUBREG_TO_REG, COPY etc.); ExpandPostRAPseudos -> expandPostRAPseudo will expand them. * Multi-instruction pseudo operations; PPCAsmPrinter::EmitInstruction will expand them. * Pseudo instructions in CodeEmitter, which have an encoding of 0. Currently, in td files especially, we do not distinguish Post-RA pseudo instructions and pseudo instructions in CodeEmitter very clearly. This patch: * Renames the Pseudo<> class to PPCEmitTimePseudo, which means an encoding of 0 in CodeEmitter * Introduces a new class PPCPostRAExpPseudo <> for the previous PostRA Pseudo * Introduces a new class PPCCustomInserterPseudo <> for the previous ISel Pseudo Differential Revision:
  722. [mir] Fix uninitialized variable in r349035 noticed by clang-atom-d525-fedora-rel and 3 other bots
  723. [Sanitizer] capsicum further support of the API Reviewers: vitalybuka, krytarowski, emaste Reviewed By: emaste Differential Revision:
  724. [tblgen][disasm] Separate encodings from instructions Summary: Separate the concept of an encoding from an instruction. This will enable the definition of additional encodings for the same instruction which can be used to support variable length instruction sets in the disassembler (and potentially assembler but I'm not working towards that right now) without causing an explosion in the number of Instruction records that CodeGen then has to pick between. Reviewers: bogner, charukcs Reviewed By: bogner Subscribers: kparzysz, llvm-commits Differential Revision:
  725. [X86][SSE] Merge the vXi16/vXi32 vector rotation expansion cases. NFCI. Merged the repeated code into a single if().
  726. [clang-tidy] Share the forced linking code between clang-tidy tool and plugin Extract code that forces linking to the separate header and include it in both plugin and standalone tool Differential Revision:
  727. [SystemZ] Pass copy-hinted regs first from getRegAllocationHints(). When computing register allocation hints for a GRX32Bit register, make sure that any of the hinted registers that are also copy hints are returned first in the list. Review: Ulrich Weigand.
  728. [mir] Serialize DILocation inline when not possible to use a metadata reference Summary: Sometimes MIR-level passes create DILocations that were not present in the LLVM-IR. For example, it may merge two DILocations together to produce a DILocation that points to line 0. Previously, the address of these DILocations were printed which prevented the MIR from being read back into LLVM. With this patch, DILocations will use metadata references where possible and fall back on serializing them inline like so: MOV32mr %stack.0.x.addr, 1, _, 0, _, %0, debug-location !DILocation(line: 1, scope: !15) Reviewers: aprantl, vsk, arphaman Reviewed By: aprantl Subscribers: probinson, llvm-commits Tags: #debug-info Differential Revision:
  729. [X86][BWI] Don't custom lower vXi8 rotations. We always expand to shifts anyhow - test changes are just different scheduling only.
  730. [clangd] Refine the way of checking a declaration is referenced by the written code. Summary: The previous solution (checking the AST) is not a reliable way to determine whether a declaration is explicitly referenced by the source code; we are still missing a few cases. Reviewers: ilya-biryukov Subscribers: ioeric, MaskRay, jkorous, arphaman, kadircet, cfe-commits Differential Revision:
  731. [clangd] Avoid emitting Queued status when we are able to acquire the Barrier. Reviewers: ilya-biryukov Subscribers: javed.absar, ioeric, MaskRay, jkorous, arphaman, kadircet, cfe-commits Differential Revision:
  732. [clangd] Move the utility function to anonymous namespace, NFC.
  733. [NFC][PowerPC] add verify-machineinstrs check After rL349029 and rL348566, sj-ctr-loop.ll is ok for verify-machineinstrs check.
  734. [PowerPC] intrinsic should not have flag isBarrier. Differential Revision:
  735. [DAGCombine] Moved X86 rotate_amount % bitwidth == 0 early out to DAGCombiner Remove common code from custom lowering (code is still safe if somehow a zero value gets used).
  736. [ARM GlobalISel] Support exts and truncs for Thumb2 Mark G_SEXT, G_ZEXT and G_ANYEXT to 32 bits as legal and add support for them in the instruction selector. This uses handwritten code again because the patterns that are generated with TableGen are tuned for what the DAG combiner would produce and not for simple sext/zext nodes. Luckily, we only need to update the opcodes to use the Thumb2 variants, everything else can be reused from ARM.
  737. [TargetLowering] Add ISD::ROTL/ROTR vector expansion Move existing rotation expansion code into TargetLowering and set it up for vectors as well. Ideally this would share more of the funnel shift expansion, but we handle the shift amount modulo quite differently at the moment. Begun removing x86 vector rotate custom lowering to use the expansion.
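As a rough illustration of entries 735 and 737, a rotate can be expanded into two shifts and an OR, with the amount reduced modulo the bit width and an early out when that reduced amount is zero. This Python sketch models a single scalar lane only; the function names and the explicit `width` parameter are assumptions for illustration, not the TargetLowering API.

```python
def rotl(x, amount, width):
    """Rotate x left by amount, expanded into two shifts and an OR."""
    mask = (1 << width) - 1
    x &= mask
    amount %= width              # rotation amounts are taken modulo width
    if amount == 0:              # early out: amount % width == 0
        return x
    return ((x << amount) | (x >> (width - amount))) & mask


def rotr(x, amount, width):
    """Rotate right, expressed as a left rotate by the complementary amount."""
    return rotl(x, width - (amount % width), width)
```

The early out also keeps the right shift in range: without it, `x >> (width - 0)` would shift by the full width.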
  738. [RISCV] Add support for the various RISC-V FMA instruction variants Adds support for the various RISC-V FMA instructions (fmadd, fmsub, fnmsub, fnmadd). Whether a fused add or subtract is used, and whether the product is negated, depends on which arguments to the llvm.fma.* intrinsic are negated. In the tests, extraneous fadd instructions were added to avoid the negation being performed using a xor trick, which prevented the proper FMA forms from being selected and thus tested. The FMA instruction patterns might seem incorrect (e.g., fnmadd: -rs1 * rs2 - rs3), but they should be correct. The misleading names were inherited from MIPS, where the negation happens after computing the sum. The llvm.fmuladd.* intrinsics still do not generate RISC-V FMA instructions, as that depends on TargetLowering::isFMAFasterThanFMulAndFAdd. Some comments in the test files about what type of instructions are tested there were updated, to better reflect the current content of those test files. Differential Revision: Patch by Luís Marques.
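The sign conventions of the four RISC-V FMA variants in entry 738 can be written out as plain arithmetic. This Python sketch shows only which terms are negated; Python floats round the product and the sum separately, so it does not model the single rounding of a real fused operation.

```python
def fmadd(rs1, rs2, rs3):
    return (rs1 * rs2) + rs3


def fmsub(rs1, rs2, rs3):
    return (rs1 * rs2) - rs3


def fnmsub(rs1, rs2, rs3):
    # Negated product, addend kept: -(rs1 * rs2) + rs3
    return -(rs1 * rs2) + rs3


def fnmadd(rs1, rs2, rs3):
    # Negated product, addend subtracted: -(rs1 * rs2) - rs3
    return -(rs1 * rs2) - rs3
```

Seen this way, negating the addend of an fma call corresponds to fmsub, negating one multiplicand to fnmsub, and negating both to fnmadd.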
  739. [AArch64] Catch some more CMN opportunities. Fixes
  740. Add new interceptors for cdbr(3) and cdbw(3) API from NetBSD Summary: cdb - formats of the constant database. cdbr, cdbr_open, cdbr_open_mem, cdbr_entries, cdbr_get, cdbr_find, cdbr_close - constant database access methods. cdbw_open, cdbw_put, cdbw_put_data, cdbw_put_key, cdbw_stable_seeder, cdbw_output, cdbw_close - creates constant databases. Add a dedicated test for this API. Reviewers: vitalybuka, joerg Reviewed By: vitalybuka Subscribers: kubamracek, llvm-commits, mgorny, #sanitizers Tags: #sanitizers Differential Revision:
  741. [OpenCL] Add generic AS to 'this' pointer Address spaces are cast into generic before invoking the constructor. Added support for a trailing Qualifiers object in FunctionProtoType. Note: This recommits the previously reverted patch, but now it is committed together with a fix for lldb. Differential Revision:
  742. Add new interceptors for vis(3) API in NetBSD Summary: Add interceptors for the NetBSD style of vis(3) present inside libc: - vis - nvis - strvis - stravis - strnvis - strvisx - strnvisx - strenvisx - svis - snvis - strsvis - strsnvis - strsvisx - strsnvisx - strsenvisx - unvis - strunvis - strnunvis - strunvisx - strnunvisx Add a dedicated test verifying the installed interceptors. Based on original work by Yang Zheng. Reviewers: vitalybuka, joerg Reviewed By: vitalybuka Subscribers: tomsun.0.7, kubamracek, llvm-commits, mgorny, #sanitizers Tags: #sanitizers Differential Revision:
  743. [CodeGen] Allow memcpy/memset to generate small overlapping stores. Summary: All targets either just return false here or properly model `Fast`, so I don't think there is any reason to prevent CodeGen from doing the right thing here. Subscribers: nemanjai, javed.absar, eraman, jsji, llvm-commits Differential Revision:
  744. [asan] Don't check ODR violations for particular types of globals Summary: private and internal: should not trigger ODR at all. unnamed_addr: the current ODR checking approach fails and reports false violations if a linker merges such globals. linkonce_odr, weak_odr: could cause similar problems, and they are already not instrumented for ELF. Reviewers: eugenis, kcc Subscribers: kubamracek, hiraditya, llvm-commits Differential Revision:
  745. AMDGPU/GlobalISel: Legalize f64 fadd/fmul
  746. Fix missing C++ mode comment in header
  747. AMDGPU/GlobalISel: RegBankSelect some simple operations
  748. AMDGPU/GlobalISel: Test cleanups Remove IR and registers sections
  749. Portable Python script across Python version SocketServer has been renamed socketserver in Python3. Differential Revision:
  750. Portable Python script across Python version Queue module has been renamed into queue in Python3 Differential Revision:
  751. Portable Python script across Python version Use higher-level and more compatible threading module to start a new thread. Differential Revision:
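The three portability changes in entries 749 to 751 follow a common pattern, sketched here under the assumption that the scripts need to run on both Python 2 and Python 3. The `run_in_thread` helper is hypothetical and not taken from the patches; it only shows the `threading` module being used to start a thread.

```python
try:
    import socketserver                    # Python 3 name
except ImportError:
    import SocketServer as socketserver    # Python 2 name

try:
    import queue                           # Python 3 name
except ImportError:
    import Queue as queue                  # Python 2 name

import threading


def run_in_thread(target, *args):
    """Start target(*args) on a daemon thread via the threading module."""
    worker = threading.Thread(target=target, args=args)
    worker.daemon = True
    worker.start()
    return worker


results = queue.Queue()
worker = run_in_thread(results.put, "done")
worker.join()
```

Aliasing the old module name to the new one keeps the rest of the script free of version checks.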
  752. [X86] Remove assert leftover from when i1 was a legal type. Add more accurate assert. NFC
  753. [AMDGPU] Fix build failure, second attempt Some compilers complain that variable is captured and some complain when it is not. Switch to [&].
  754. [AMDGPU] Fix build failure Fixed error 'lambda capture 'CondReg' is not required to be captured for this use'.
  755. [clang] Add AST matcher for block expressions 🔍 Summary: This change adds a new AST matcher for block expressions. Test Notes: Ran the clang unit tests. Reviewers: aaron.ballman Reviewed By: aaron.ballman Subscribers: cfe-commits Differential Revision:
  756. [AMDGPU] Simplify negated condition Optimize sequence: %sel = V_CNDMASK_B32_e64 0, 1, %cc %cmp = V_CMP_NE_U32 1, %1 $vcc = S_AND_B64 $exec, %cmp S_CBRANCH_VCC[N]Z => $vcc = S_ANDN2_B64 $exec, %cc S_CBRANCH_VCC[N]Z It is the negation pattern inserted by DAGCombiner::visitBRCOND() in the rebuildSetCC(). Differential Revision:
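The rewrite in entry 756 rests on a per-lane boolean identity: selecting 1 where %cc is set, comparing that against 1 for inequality, and masking with exec gives the same lanes as exec AND NOT %cc. This Python sketch checks the identity exhaustively on a toy 4-lane wave; the lane count and function names are assumptions for illustration, and lane masks are modelled as plain integers.

```python
WAVE = 4  # assumed toy wave size for an exhaustive check


def original(exec_mask, cc):
    """Model of V_CNDMASK_B32 + V_CMP_NE_U32 + S_AND_B64."""
    sel = [1 if (cc >> lane) & 1 else 0 for lane in range(WAVE)]  # 0 or 1 per lane
    cmp_mask = 0
    for lane in range(WAVE):
        if sel[lane] != 1:                 # compare-not-equal against 1
            cmp_mask |= 1 << lane
    return exec_mask & cmp_mask            # AND with exec


def simplified(exec_mask, cc):
    """Model of S_ANDN2_B64: exec AND NOT cc."""
    return exec_mask & ~cc & ((1 << WAVE) - 1)


all_equal = all(
    original(e, c) == simplified(e, c)
    for e in range(1 << WAVE)
    for c in range(1 << WAVE)
)
```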
  757. Revert r348645 - "[MemCpyOpt] memset->memcpy forwarding with undef tail" This revision caused truncated memsets for structs with padding. See:
  758. Remove unused Args parameter from EmitFunctionBody, NFC
  759. [analyzer] RunLoopAutoreleaseLeakChecker: Come up with a test for r348822. Statement memoization was removed in r348822 because it was noticed to cause memory corruption. This was happening because a reference to an object in a DenseMap was used after being invalidated by inserting a new key into the map. This test case crashes reliably under ASan (i.e., when Clang is built with -DLLVM_USE_SANITIZER="Address") on at least some machines before r348822 and doesn't crash after it.
  760. [LoopUtils] Prefer a set over a map. NFCI.
  761. [test] Add a set of tests for constant folding deopt operands with CVP For anyone curious, the first test example is illustrative of a real code idiom produced by branching on the result of a three way comparison.
  762. [Support] Fix FileNameLength passed to SetFileInformationByHandle The rename_internal function used for Windows has a minor bug where the filename length is passed as a character count instead of a byte count. Windows internally ignores this field, but other tools that hook NT APIs may use the documented behavior: MSDN documentation specifying the size should be in bytes: Patch by Ben Hillis. Differential Revision:
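The character-count versus byte-count distinction in entry 762 comes from Windows wide strings being UTF-16, so each code unit takes two bytes. A small Python sketch of the arithmetic follows; the helper name and the sample file name are made up for illustration.

```python
def file_name_length_in_bytes(name):
    """Length of name as UTF-16 data, in bytes, without a terminator."""
    return len(name.encode("utf-16-le"))


name = "out.o"
char_count = len(name)                          # what the bug passed
byte_count = file_name_length_in_bytes(name)    # what the field expects
```

For names made only of Basic Multilingual Plane characters, the byte count is exactly twice the character count.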
  763. [libcxx] Add assertion in deque::pop_back when popping from an empty deque Also, add tests making sure that vector and deque both catch the problem when assertions are enabled. Otherwise, deque would segfault and vector would never terminate.
  764. [gn build] Fix defines define on Windows On Windows, we won't go into the `host_os != "win"` block, so `defines` won't have been defined, and we'll run into an undefined identifier error when we try to later append to it. Unconditionally define it at the start and append to it everywhere else. Differential Revision:
  765. [globalisel] Add GISelChangeObserver::changingInstr() Summary: In addition to knowing that an instruction has changed, it's also useful to know when it's about to change. For example, an observer might print the instruction so you can track the changes in a debug log, it might remove it from some queue while it's being worked on, or it might want to change several instructions as a single transaction and act on all the changes at once. Added changingInstr() to all existing uses of changedInstr() Reviewers: aditya_nandakumar Reviewed By: aditya_nandakumar Subscribers: rovka, kristof.beyls, llvm-commits Differential Revision:
  766. Emit a proper diagnostic when attempting to forward inalloca arguments The previous assertion was relatively easy to trigger, and likely will be easy to trigger going forward. EmitDelegateCallArg is relatively popular. This cleanly diagnoses PR28299 while I work on a proper solution.
  767. [WebAssembly] Update dylink section parsing This updates the format of the dylink section in accordance with recent "spec" change: Differential Revision:
  768. [LoopDeletion] Update debug values after loop deletion. When loops are deleted, we don't keep track of variables modified inside the loops, so the DI will contain the wrong value for these. e.g. int b() { int i; for (i = 0; i < 2; i++) ; patatino(); return a; -> 6 patatino(); 7 return a; 8 } 9 int main() { b(); } (lldb) frame var i (int) i = 0 We instead mark these values as unavailable by inserting a @llvm.dbg.value(undef to make sure we don't end up printing an incorrect value in the debugger. We could consider doing something fancier, e.g. for constants, in the future. PR39868. (rdar://problem/46418795) Differential Revision:
  769. [InstCombine] Fix negative GEP offset evaluation for 32-bit pointers This fixes The evaluateGEPOffsetExpression() function simplifies GEP offsets for use in comparisons against zero, basically by converting X*Scale+Offset==0 to X+Offset/Scale==0 if Scale divides Offset. However, before this is done, Offset is masked down to the pointer size. This gives incorrect results for negative Offsets, because we basically end up dividing the 32-bit offset *zero* extended to 64 bits (rather than sign extended). Fix this by explicitly sign extending the truncated value. Differential Revision:
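A worked example of the arithmetic in entry 769, using an Offset of -8 and a Scale of 4 chosen purely for illustration. Masking the offset to 32 bits and then dividing the zero-extended value gives a large positive quotient, while sign extending first gives the expected -2.

```python
def zext32_to_64(value):
    """Zero-extend a 32-bit pattern: the upper bits become 0."""
    return value & 0xFFFFFFFF


def sext32_to_64(value):
    """Sign-extend a 32-bit pattern to a signed Python integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


offset, scale = -8, 4
masked = offset & 0xFFFFFFFF             # 0xFFFFFFF8 after masking to 32 bits

wrong = zext32_to_64(masked) // scale    # divides 4294967288 by 4
right = sext32_to_64(masked) // scale    # divides -8 by 4
```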
  770. [hwasan] Link ubsan_cxx to shared runtime library. Summary: This is needed for C++-specific ubsan and cfi error reporting to work. Reviewers: kcc, vitalybuka Subscribers: srhines, kubamracek, mgorny, llvm-commits Differential Revision:
  771. [llvm-objcopy] Change Segment::Type from uint64_t to uint32_t Summary: In both Elf{32,64}_Phdr, the field Elf{32,64}_Word p_type is uint32_t. Also reorder the fields to be similar to Elf64_Phdr (which is different from Elf32_Phdr but quite similar). Reviewers: rupprecht, jhenderson, jakehehrlich, alexshap, espindola Reviewed By: rupprecht Subscribers: emaste, arichardson, llvm-commits Differential Revision:
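To make the layout point in entry 771 concrete, here is a sketch of the two program header formats using Python's struct module, assuming little-endian files. In both, p_type is a 4-byte word; the field order differs because Elf64_Phdr places p_flags directly after p_type. The sample header values are made up.

```python
import struct

# Elf32_Phdr: p_type, p_offset, p_vaddr, p_paddr, p_filesz, p_memsz, p_flags, p_align
ELF32_PHDR = struct.Struct("<IIIIIIII")

# Elf64_Phdr: p_type, p_flags, p_offset, p_vaddr, p_paddr, p_filesz, p_memsz, p_align
ELF64_PHDR = struct.Struct("<IIQQQQQQ")

PT_LOAD = 1

# Pack a made-up 64-bit header and read the 32-bit p_type field back.
raw = ELF64_PHDR.pack(PT_LOAD, 5, 0, 0x400000, 0x400000, 0x1000, 0x1000, 0x1000)
p_type = ELF64_PHDR.unpack(raw)[0]
```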
  772. Switch Android from TLS_SLOT_TSAN(8) to TLS_SLOT_SANITIZER(6) Summary: The TLS_SLOT_TSAN slot is available starting in N, but its location (8) is incompatible with the proposed solution for implementing ELF TLS on Android (i.e. bump ARM/AArch64 alignment to reserve an 8-word TCB). Instead, starting in Q, Bionic replaced TLS_SLOT_DLERROR(6) with TLS_SLOT_SANITIZER(6). Switch compiler-rt to the new slot. Reviewers: eugenis, srhines, enh Reviewed By: eugenis Subscribers: ruiu, srhines, kubamracek, javed.absar, kristof.beyls, delcypher, llvm-commits, #sanitizers Differential Revision:
  773. [hwasan] Android: Switch from TLS_SLOT_TSAN(8) to TLS_SLOT_SANITIZER(6) Summary: The change is needed to support ELF TLS in Android. See D55581 for the same change in compiler-rt. Reviewers: srhines, eugenis Reviewed By: eugenis Subscribers: srhines, llvm-commits Differential Revision:
  774. Revert "Declares __cpu_model as dso local" This reverts r348978
  775. [PhaseOrdering] add test for funnel shift (rotate); NFC As mentioned in D55604, there are 2 bugs here: 1. The new pass manager is speculating wildly by default. 2. The old pass manager is not converting this to funnel shift.
  776. [hwasan] Verify Android TLS slot at startup. Summary: Add a check that TLS_SLOT_TSAN / TLS_SLOT_SANITIZER, whichever android_get_tls_slot is using, is not conflicting with TLS_SLOT_DLERROR. Reviewers: rprichard, vitalybuka Subscribers: srhines, kubamracek, llvm-commits Differential Revision:
  777. Declares __cpu_model as dso local __builtin_cpu_supports and __builtin_cpu_is use information in __cpu_model to decide cpu features. Before this change, __cpu_model was not declared as dso local. The generated code looks up the address in GOT when reading __cpu_model. This makes it impossible to use these functions in ifunc, because at that time GOT entries have not been relocated. This change makes it dso local. Differential Revision:
  778. [AST] Store "UsesADL" information in CallExpr. Summary: Currently the Clang AST doesn't store information about how the callee of a CallExpr was found, specifically whether it was found using ADL. However, this information is invaluable to tooling. Consider a tool which renames usages of a function. If the original CallExpr was formed using ADL, then the tooling may need to additionally qualify the replacement. Without information about how the callee was found, the tooling is left scratching its head. Additionally, we want to be able to match ADL calls as quickly as possible, which means avoiding computing the answer on the fly. This patch changes `CallExpr` to store whether its callee was found using ADL. It does not change the size of any AST nodes. Reviewers: fowles, rsmith, klimek, shafik Reviewed By: rsmith Subscribers: aaron.ballman, riccibruno, calabrese, titus, cfe-commits Differential Revision:
  779. [globalisel] Rename GISelChangeObserver's erasedInstr() to erasingInstr() and related nits. NFC Summary: There's little of interest that can be done to an already-erased instruction. You can't inspect it, write it to a debug log, etc. It ought to be a notification that we're about to erase it. Rename the function to clarify the timing of the event and reflect current usage. Also fixed one case where we were trying to print an erased instruction. Reviewers: aditya_nandakumar Reviewed By: aditya_nandakumar Subscribers: rovka, kristof.beyls, llvm-commits Differential Revision:
  780. [X86] Don't emit MULX by default with BMI2 MULX has somewhat improved register allocation constraints compared to the legacy MUL instruction. Both output registers are encoded instead of fixed to EAX/EDX, but EDX is used as input. It also doesn't touch flags. Unfortunately, the encoding is longer. Preferring it whenever BMI2 is enabled is probably not optimal. Choosing it should somehow be a function of register allocation constraints like converting adds to three address. gcc and icc definitely don't pick MULX by default. Not sure what if any rules they have for using it. Differential Revision:
  781. Fix for llvm-dwarfdump changes for subroutine types
  782. [test] [depr.c.headers] XFAIL uchar.h on NetBSD
  783. [X86] Move stack folding test for MULX to a MIR test. Add a MULX32 case as well A future patch may stop using MULX by default so use MIR to ensure we're always testing MULX. Add the 32-bit case that we couldn't do in the 64-bit mode IR test due to it being promoted to a 64-bit mul.
  784. [AMDGPU] Support for "uniform-work-group-size" attribute Updated the annotate-kernel-features pass to support the propagation of uniform-work-group attribute from the kernel to the called functions. Once this pass is run, all kernels, even the ones which initially did not have the attribute, will be able to indicate whether or not they have uniform work group size depending on the value of the attribute. Differential Revision:
  785. Support: use internal `call_once` on PPC64le Use the replacement execute once threading support in LLVM for PPC64le. It seems that GCC does not define `__ppc__` and so we would actually call out to the C++ runtime there which is not what the current code intended. Check both `__ppc__` and `__PPC__`. This avoids the need for checking the endianness. Thanks to nemanjai for the hint about GCC's behaviour and the fact that the reviewed condition could be simplified. Original patch by Sarvesh Tamba!
  786. Teach __builtin_unpredictable to work through implicit casts. The __builtin_unpredictable implementation is confused by any implicit casts, which happen in C++. This patch strips those off so that if/switch statements now work with it in C++. Change-Id: I73c3bf4f1775cd906703880944f4fcdc29fffb0a
  787. [test] [filesystems] NetBSD can do symlink permissions too
  788. [test] [filesystems] Extend FreeBSD tv_sec==-1 workaround to NetBSD NetBSD also uses tv_sec==-1 as error status indicator, and does not support setting such a value.
  789. [X86] Added missing constant pool checks. NFCI. So the extra checks in D55600 don't look like a regression.
  790. DebugInfo/DWARF: Pretty print subroutine types Doesn't handle varargs and other fun things, but it's a start. (also doesn't print these strictly as valid C++ when it's a pointer to function, it'll print as "void(int)*" instead of "void (*)(int)")
  791. [AMDGPU] Emit MessagePack HSA Metadata for v3 code object Continue to present HSA metadata as YAML in ASM and when output by tools (e.g. llvm-readobj), but encode it in Messagepack in the code object. Differential Revision:
  792. DebugInfo/DWARF: Improve dumping of pointers to members ('int foo::*' rather than 'int*')
  793. DebugInfo/DWARF: Refactor type dumping to dump types, rather than DIEs that reference types This lays the foundation for dumping types not referenced by DW_AT_type attributes (in the near-term, that'll be DW_AT_containing_type for a DW_TAG_ptr_to_member_type - in the future, potentially dumping the pretty printed name next to the DW_TAG for the type, rather than only when the type is referenced from elsewhere)
  794. DebugInfo/DWARF: Refactor getAttributeValueAsReferencedDie to accept a DWARFFormValue Save searching for the attribute again when you already have the DWARFFormValue at hand.
  795. [X86] Emit SBB instead of SETCC_CARRY from LowerSELECT. Break false dependency on the SBB input. I'm hoping we can just replace SETCC_CARRY with SBB. This is another step towards that. I've explicitly used zero as the input to the setcc to avoid a false dependency that we've had with the SETCC_CARRY. I changed one of the patterns that used NEG to instead use an explicit compare with 0 on the LHS. We needed the zero anyway to avoid the false dependency. The negate would clobber its input register. By using a CMP we can avoid that which could be useful. Differential Revision:
  796. Fix Wdocumentation warning. NFCI.
  797. [ConstantFold] Use getMinSignedBits for APInt in isIndexInRangeOfArrayType. Indices for getelementptr can be signed so we should use getMinSignedBits instead of getActiveBits here. The function later calls getSExtValue to get the int64_t value, which also checks getMinSignedBits. This fixes Reviewers: mssimpso, efriedma, davide Reviewed By: efriedma Differential Revision:
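The difference between the two bit counts in entry 797 can be sketched in Python for a fixed-width value. The function names mirror the APInt methods but are re-implemented here as an illustration, and the 128-bit width and sample values are assumptions; the commit does not say which values triggered the problem.

```python
def active_bits(value, width):
    """Bits needed for the unsigned reading of a width-bit value."""
    return (value & ((1 << width) - 1)).bit_length()


def min_signed_bits(value, width):
    """Bits needed to hold the value as a two's-complement signed number."""
    unsigned = value & ((1 << width) - 1)
    signed = unsigned - (1 << width) if unsigned >> (width - 1) else unsigned
    magnitude = ~signed if signed < 0 else signed
    return magnitude.bit_length() + 1
```

For a 128-bit index of -1, the unsigned count is 128 while the signed count is 1. For 2**63 the unsigned count is 64 but the signed count is 65, so the value passes a 64-bit check on active bits yet does not fit in an int64_t.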
  798. [X86] Added missing constant pool checks. NFCI. So the extra checks in D55600 don't look like a regression.
  799. llvm-dwarfdump: Dump array dimensions in stringified type names
  800. [SelectionDAG] Add a generic isSplatValue function This patch introduces a generic function to determine whether a given vector type is known to be a splat value for the specified demanded elements, recursing up the DAG looking for BUILD_VECTOR or VECTOR_SHUFFLE splat patterns. It also keeps track of the elements that are known to be UNDEF - it returns true if all the demanded elements are UNDEF (as this may be useful under some circumstances), so this needs to be handled by the caller. A wrapper variant is also provided that doesn't take the DemandedElts or UndefElts arguments for cases where we just want to know if the SDValue is a splat or not (with/without UNDEFS). I had hoped to completely remove the X86 local version of this function, but I'm seeing some regressions in shift/rotate codegen that will take a little longer to fix and I hope to get this in sooner so I can continue work on PR38243 which needs more capable splat detection. Differential Revision:
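A much simplified model of the contract described in entry 800: given a vector and a mask of demanded elements, report whether the demanded elements form a splat and which of them are undef. This Python sketch works on a flat list and does not recurse through DAG nodes; `UNDEF`, the function name, and the list-based masks are assumptions for illustration.

```python
UNDEF = None  # stand-in for an undef element


def is_splat_value(elements, demanded):
    """Return (is_splat, undef_mask) for the demanded lanes of a flat vector."""
    lanes = range(len(elements))
    undef_mask = [demanded[i] and elements[i] is UNDEF for i in lanes]
    defined = {
        elements[i]
        for i in lanes
        if demanded[i] and elements[i] is not UNDEF
    }
    # Zero or one distinct defined value counts as a splat, so a vector whose
    # demanded lanes are all undef also returns True; the caller must check
    # undef_mask to handle that case.
    return len(defined) <= 1, undef_mask
```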
  801. [NVPTX] do not rely on cached subtarget info. If a module has function references, but no functions themselves, we may end up never calling runOnMachineFunction and therefore would never initialize nvptxSubtarget field which would eventually cause a crash. Instead of relying on nvptxSubtarget being initialized by one of the methods, retrieve subtarget info directly. Differential Revision:
  802. Change CallGraph print to show the fully qualified name CallGraph previously would just show the normal name of a function, which gets really confusing when using it on large C++ projects. This patch switches the printName call to a printQualifiedName, so that the namespaces are included. Change-Id: Ie086d863f6b2251be92109ea1b0946825b28b49a
  803. [LV] Fix signed/unsigned comparison warning.
  804. [gn build] Merge r348944
  805. [docs] Use correct ending quotes.
  806. [x86] allow 8-bit adds to be promoted by convertToThreeAddress() to form LEA This extends the code that handles 16-bit add promotion to form LEA to also allow 8-bit adds. That allows us to combine add ops with register moves and save some instructions. This is another step towards allowing add truncation in generic DAGCombiner (see D54640). Differential Revision:
  807. [gn build] Add all non-test build files for lld processing has a potentially interesting part which I've punted on for now (LLD_REVISION and LLD_REPOSITORY are set to empty strings for now). lld now builds in the gn build. But no symlinks to it are created yet, so it can't be meaningfully run yet. Differential Revision:
  808. [Unroll/UnrollAndJam/Vectorizer/Distribute] Add followup loop attributes. When multiple loop transformations are defined in a loop's metadata, their order of execution is defined by the order of their respective passes in the pass pipeline. For instance, #pragma clang loop unroll_and_jam(enable) #pragma clang loop distribute(enable) is the same as #pragma clang loop distribute(enable) #pragma clang loop unroll_and_jam(enable) and will try to loop-distribute before Unroll-And-Jam because the LoopDistribute pass is scheduled before the UnrollAndJam pass. UnrollAndJamPass only supports one inner loop, i.e. it will necessarily fail after loop distribution. It is not possible to specify another execution order. Also, the order of passes in the pipeline is subject to change between versions of LLVM, optimization options and which pass manager is used. This patch adds 'followup' attributes to various loop transformation passes. These attributes define which attributes the resulting loop of a transformation should have. For instance, !0 = !{!0, !1, !2} !1 = !{!"llvm.loop.unroll_and_jam.enable"} !2 = !{!"llvm.loop.unroll_and_jam.followup_inner", !3} !3 = !{!"llvm.loop.distribute.enable"} defines a loop ID (!0) to be unrolled-and-jammed (!1) and then the attribute !3 to be added to the jammed inner loop, which contains the instruction to distribute the inner loop. Currently, in both pass managers, pass execution is in a fixed order and UnrollAndJamPass will not execute again after LoopDistribute. We hope to fix this in the future by allowing pass managers to run passes until a fixpoint is reached, use Polly to perform these transformations, or add a loop transformation pass which takes the order issue into account. For mandatory/forced transformations (e.g. by having been declared by #pragma omp simd), the user must be notified when a transformation could not be performed. 
It is not possible for the responsible pass to emit such a warning because the transformation might be 'hidden' in a followup attribute when it is executed, or it is not present in the pipeline at all. For this reason, this patch introduces a WarnMissedTransformations pass, to warn about orphaned transformations. Since this changes the user-visible diagnostic message when a transformation is applied, two test cases in the clang repository need to be updated. To ensure that no other transformation is executed before the intended one, the attribute `llvm.loop.disable_nonforced` can be added which should disable transformation heuristics before the intended transformation is applied. E.g. it would be surprising if a loop is distributed before a #pragma unroll_and_jam is applied. With more supported code transformations (loop fusion, interchange, stripmining, offloading, etc.), transformations can be used as building blocks for more complex transformations (e.g. stripmining+stripmining+interchange -> tiling). Reviewed By: hfinkel, dmgreen Differential Revision:
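The followup mechanism in entry 808 can be pictured with a toy model. This is only a sketch, not how LLVM represents or rewrites loop metadata: the loop's attributes are modelled as a plain dict, and `apply_unroll_and_jam` is a hypothetical helper that pretends the transformation succeeded and returns the attributes of the jammed inner loop.

```python
# Toy model: loop metadata as {attribute name: list of operands}.
# The attribute names come from the commit message; the dict layout and
# the helper below are assumptions made for illustration only.

ENABLE = "llvm.loop.unroll_and_jam.enable"
FOLLOWUP_INNER = "llvm.loop.unroll_and_jam.followup_inner"


def apply_unroll_and_jam(loop_attrs):
    """Return the attributes of the jammed inner loop.

    If the loop is not marked for unroll-and-jam, it is returned unchanged.
    Otherwise the inner loop receives exactly the attributes listed in the
    followup attribute (an empty set if there is none).
    """
    if ENABLE not in loop_attrs:
        return dict(loop_attrs)
    followups = loop_attrs.get(FOLLOWUP_INNER, [])
    return {name: [] for name in followups}


loop = {
    ENABLE: [],
    FOLLOWUP_INNER: ["llvm.loop.distribute.enable"],
}
inner = apply_unroll_and_jam(loop)
print(inner)
```

In this model the distribute request only becomes visible on the inner loop after unroll-and-jam has run, which is the ordering the metadata in the commit message asks for.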
  809. [Driver] Add support for -fembed-bitcode for assembly file Summary: Handle -fembed-bitcode for assembly inputs. When the input file is assembly, write a marker as "__LLVM,__asm" section. Fix Reviewers: compnerd, dexonsmith Reviewed By: compnerd Subscribers: rjmccall, dblaikie, jkorous, cfe-commits Differential Revision:
  810. Make clang::CallGraph look into template instantiations Clang's CallGraph analysis doesn't use the RecursiveASTVisitor's setting to go into template instantiations. The result is that anything wanting to do call graph analysis ends up missing any template function calls. Change-Id: Ib4af44ed59f15d43f37af91622a203146a3c3189
  811. [SampleFDO] Extend profile-sample-accurate option to cover isFunctionColdInCallGraph For SampleFDO, when a callsite doesn't appear in the profile, it will not be marked as cold callsite unless the option -profile-sample-accurate is specified. But profile-sample-accurate doesn't cover function isFunctionColdInCallGraph which is used to decide whether a function should be put into text.unlikely section, so even if the user knows the profile is accurate and specifies profile-sample-accurate, those functions not appearing in the sample profile are still not put into text.unlikely section right now. The patch fixes that. Differential Revision:
  812. Basic: make `int_least64_t` and `int_fast64_t` match on Darwin The Darwin targets use `int64_t` and `uint64_t` to define the `int_least64_t` and `int_fast64_t` types. The underlying type is actually a `long long`. Match the types to allow the printf specifiers to work properly and have the compiler vended macros match the implementation on the target.
  813. [ExprConstant] Improve memchr/memcmp for type mismatch and multibyte element types Summary: `memchr` and `memcmp` operate upon the character units of the object representation; that is, the `size_t` parameter expresses the number of character units. The constant folding implementation is updated in this patch to account for multibyte element types in the arrays passed to `memchr`/`memcmp` and, in the case of `memcmp`, to account for the possibility that the arrays may have differing element types (even when they are byte-sized). Actual inspection of the object representation is not implemented. Comparisons are done only between elements with the same object size; that is, `memchr` will fail when inspecting at least one character unit of a multibyte element. The integer types are assumed to have two's complement representation with 0 for `false`, 1 for `true`, and no padding bits. `memcmp` on multibyte elements will only be able to fold in cases where enough elements are equal for the answer to be 0. Various tests are added to guard against incorrect folding for cases that miscompile on some system or other prior to this patch. At the same time, the unsigned 32-bit `wchar_t` testing in `test/SemaCXX/constexpr-string.cpp` is restored. Reviewers: rsmith, aaron.ballman, hfinkel Reviewed By: rsmith Subscribers: cfe-commits Differential Revision:
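The folding rule for `memcmp` described in entry 813 can be sketched as follows. This is a simplified model, not clang's implementation: both arrays are assumed to share one element size, values are assumed to be non-negative integers, and `fold_memcmp` is a hypothetical name. It returns `None` when the fold has to give up.

```python
def fold_memcmp(lhs, rhs, elem_size, n_bytes):
    """Try to constant-fold memcmp over two arrays of equal element size.

    lhs, rhs  : lists of non-negative ints (element values)
    elem_size : size of each element in bytes
    n_bytes   : the size_t argument, counted in character units
    Returns 0, -1, 1, or None when folding is not possible.
    """
    index = 0
    remaining = n_bytes
    while remaining > 0:
        if index >= len(lhs) or index >= len(rhs):
            return None  # would read past an array: refuse to fold
        if lhs[index] != rhs[index]:
            if elem_size == 1:
                # Byte-sized elements compare directly as character units.
                return -1 if lhs[index] < rhs[index] else 1
            # Multibyte elements differ: the answer depends on the object
            # representation, which this model does not inspect.
            return None
        remaining -= elem_size
        index += 1
    return 0


print(fold_memcmp([1, 2, 3], [1, 2, 3], 4, 12))  # all elements equal
print(fold_memcmp([1, 2], [1, 3], 4, 8))         # multibyte mismatch
```

The multibyte case only folds when every inspected element is equal, which matches the commit's statement that the answer must be 0 for the fold to succeed.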
  814. [AMDGPU] Extend the SI Load/Store optimizer to combine more things. I've extended the load/store optimizer to be able to produce dwordx3 loads and stores. This change allows many more load/stores to be combined, and results in much more optimal code for our hardware. Differential Revision:
  815. [mips] Enable using of integrated assembler in all cases.
  816. [mips] Enable using of integrated assembler in all cases.
  817. [AggressiveInstCombine] add tests for rotates with branch; NFC
  818. Remove TODO leftover from my development branch Accidentally checked in a TODO line from r348899. This removes it. Change-Id: I74b59c0ecfe147af8a08dd7fd10893a4ca351d6d
  819. Revert "[OpenCL] Add generic AS to 'this' pointer" Reverting because the patch broke lldb.
  820. [CUDA][OPENMP][NVPTX]Improve logic of the debug info support. Summary: Added support for the -gline-directives-only option + fixed logic of the debug info for CUDA devices. If optimization level is O0, then options --[no-]cuda-noopt-device-debug do not affect the debug info level. If the optimization level is >O0, debug info options are used + --no-cuda-noopt-device-debug is used or no --cuda-noopt-device-debug is used, the optimization level for the device code is kept and the emission of the debug directives is used. If the opt level is > O0, debug info is requested + --cuda-noopt-device-debug option is used, the optimization is disabled for the device code + required debug info is emitted. Reviewers: tra, echristo Subscribers: aprantl, guansong, JDevlieghere, cfe-commits Differential Revision:
  821. [clang-fuzzer] Add explicit dependency on clangSerialization for clangHandleCXX after rC348907 This library was breaking my -DBUILD_SHARED_LIBS=1 build. rC348915 seemed to miss this case. As this seems an "obvious" fix, I am committing without pre-commit review as per the LLVM developer policy.
  822. [OpenCL] Add generic AS to 'this' pointer Address spaces are cast into generic before invoking the constructor. Added support for a trailing Qualifiers object in FunctionProtoType. Differential Revision:
  823. [TargetLowering] Add ISD::AND handling to SimplifyDemandedVectorElts If either of the operand elements are zero then we know the result element is going to be zero (even if the other element is undef). Differential Revision:
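The rule in entry 823 can be illustrated with per-lane bitmasks. This is a sketch under assumptions: bit i of each mask describes vector lane i, `and_known_elts` is a hypothetical name, and the handling of undef lanes (undef only if both inputs are undef and the lane is not already known zero) is my reading rather than something the commit message spells out.

```python
def and_known_elts(lhs_zero, lhs_undef, rhs_zero, rhs_undef):
    """Combine known-zero / known-undef lane masks for a vector AND.

    A result lane is known zero if either operand lane is known zero,
    even when the other operand lane is undef.
    """
    known_zero = lhs_zero | rhs_zero
    # Assumption: a lane stays undef only if both inputs are undef and
    # the lane was not already proven to be zero.
    known_undef = lhs_undef & rhs_undef & ~known_zero
    return known_zero, known_undef


# Lane 0: lhs undef, rhs zero -> result is zero, not undef.
zero, undef = and_known_elts(0b0000, 0b0001, 0b0001, 0b0000)
print(bin(zero), bin(undef))
```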
  824. Regenerate knownbits test. NFCI. A future SimplifyDemandedBits patch will affect this code and I want to ensure the codegen diff is obvious.
  825. [ASTImporter] Remove import of definition from GetAlreadyImportedOrNull Summary: a_sidorin Reviewers: a.sidorin Subscribers: rnkovacs, dkrupp, Szelethus, cfe-commits Differential Revision:
  826. [AMDGPU] Set metadata access for explicit section Summary: This patch provides a means to set Metadata section kind for a global variable, if its explicit section name is prefixed with ".AMDGPU.metadata." This could be useful to make the global variable go to an ELF section without any section flags set. Reviewers: dstuttard, tpr, kzhuravl, nhaehnle, t-tye Reviewed By: dstuttard, kzhuravl Subscribers: llvm-commits, arsenm, jvesely, wdng, yaxunl, t-tye Differential Revision:
  827. [lit]Add llvm-readelf to tool substitutions Reviewed by: rnk, alexsahp Differential Revision:
  828. [ARM GlobalISel] Select load/store for Thumb2 Unfortunately we can't use TableGen for this because it doesn't yet support predicates on the source pattern root. Therefore, add a bit of handwritten code to the instruction selector to handle the most basic cases. Also mark them as legal and extract their legalizer test cases to a new test file.
  829. [OpenCL] Fix for TBAA information of pointer after addresspacecast Summary: When addresspacecast is generated resulting pointer should preserve TBAA information from original value. Reviewers: rjmccall, yaxunl, Anastasia Reviewed By: rjmccall Subscribers: asavonic, kosarev, cfe-commits, llvm-commits Differential Revision:
  830. [SystemZ] Minor cleanup of SchedModels Some fixes of a few InstRWs for z13 and z14. Review: Ulrich Weigand
  831. Add explicit dependency on clangSerialization after rC348911
  832. Add explicit dependency on clangSerialization for a bunch of components to fix -DBUILD_SHARED_LIBS=on build This is a more thorough fix of rC348911. The story about -DBUILD_SHARED_LIBS=on build after rC348907 (Move PCHContainerOperations from Frontend to Serialization) is: 1. defines PCHContainerReader dtor, ... 2. clangFrontend and clangTooling define classes inheriting from PCHContainerReader, thus their DSOs have undefined references on PCHContainerReader dtor 3. Components depending on either clangFrontend or clangTooling cannot be linked unless they have explicit dependency on clangSerialization due to the default linker option -z defs. The explicit dependency could be avoided if libclang{Frontend,Tooling}.so had these undefined references. This patch adds the explicit dependency on clangSerialization to make them build.
  833. [mips] Use llvm-mc -triple option instead of combination of arch,target-abi,mcpu. NFC
  834. Fix compiler warning about unused variable [NFC]
  835. [Intrinsic] Signed Fixed Point Multiplication Intrinsic Add an intrinsic that takes 2 signed integers with the scale of them provided as the third argument and performs fixed point multiplication on them. This is a part of implementing fixed point arithmetic in clang where some of the more complex operations will be implemented as intrinsics. Differential Revision:
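As a rough illustration of what entry 835 computes: fixed point multiplication multiplies the two raw integers at full precision and then shifts the product right by the scale. The sketch below is not the intrinsic's definition. The function name, the default bit width, rounding toward negative infinity (Python's `>>`), and raising on overflow are all assumptions made for the example.

```python
def smul_fix(a, b, scale, bits=32):
    """Multiply two signed fixed-point values that share the same scale.

    a, b  : raw integer representations (value = raw / 2**scale)
    scale : number of fractional bits
    bits  : width of the result type (assumed, for the range check)
    """
    product = a * b              # Python ints do not overflow
    result = product >> scale    # arithmetic shift, rounds toward -infinity
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    if not lo <= result <= hi:
        raise OverflowError("result does not fit in %d bits" % bits)
    return result


# 1.5 * 1.5 with 2 fractional bits: raw 6 * raw 6 -> raw 9, i.e. 2.25
print(smul_fix(6, 6, 2))
```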
  836. [CodeGen] Fix -DBUILD_SHARED_LIBS=on build after rC348907
  837. [X86] Combine vpmovdw+vpacksswb into vpmovdb. This is similar to the combine we already have for vpmovdw+vpackuswb.
  838. [X86] Add a few more fptosi test cases to demonstrate -x86-experimental-vector-widening legalization not combining vpacksswb+vpmovdw. We are able to combine vpackuswb+vpmovdw, but we didn't have packsswb+vpmovdw at the time that combine was added.
  839. [gn build] Add build files for DebugInfo/{DWARF,PDB}, Option, ToolDrivers/llvm-lib, and WindowsManifest The diff in targets.gni is due to me running `gn format` on all .gn and .gni files. llvm_enable_dia_sdk is in a gni file because I'm going to have to read it when writing the lit invocations for check-llvm and check-lld. I've never had the DIA sdk installed locally so I never tested building with it enabled -- it probably doesn't Just Work and needs some path to diaguids.lib. We can finish that once somebody needs it. Differential Revision:
  840. Move PCHContainerOperations from Frontend to Serialization Fix a layering violation. Frontend depends on Serialization, so anything used by both should be in Serialization.
  841. [ConstantInt] Check active bits before calling getZExtValue. Without this check, we hit an assertion in getZExtValue, if the constant value does not fit into a uint64_t. As getZExtValue returns a uint64_t, should we update getAggregateElement to take a uint64_t as well? This fixes Reviewers: efriedma, craig.topper, spatel Reviewed By: efriedma Differential Revision:
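The guard in entry 841 amounts to checking how many bits a constant needs before narrowing it to 64 bits. A minimal sketch of that idea, with hypothetical names and Python's `int.bit_length()` standing in for the active-bits query:

```python
def element_index_or_none(index_value, num_elements):
    """Return a usable element index, or None if the constant is unusable.

    index_value  : non-negative integer constant of arbitrary width
    num_elements : number of elements in the aggregate
    """
    # Check the width first: narrowing a wider value to 64 bits is the
    # step that would otherwise trip the assertion.
    if index_value.bit_length() > 64:
        return None
    if index_value >= num_elements:
        return None
    return index_value


print(element_index_or_none(2, 4))
print(element_index_or_none(1 << 70, 4))
```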
  842. [gn build] Add build files for lib/LTO, lib/Linker, lib/Passes, lib/Transforms/{IPO,Instrumentation,ObjCARC} Differential Revision:
  843. Implement IMAGE_REL_AMD64_SECREL for RuntimeDyldCOFFX86_64 lldb on Windows uses the ExecutionEngine for expression evaluation and hits the llvm_unreachable due to this relocation. Thus, implement the relocation and add a test to verify its function.
  844. [gn build] Add build files for Target/X86/... and for tools/llc The tablegen setup for Target/X86 is a bit different from the CMake build: In the CMake build, Target/X86/CMakeLists.txt has a single tablegen target that does everything. But some of the generated files are only used privately by a subproject, so in the GN build some of the tablegen invocations are smaller-scoped, mostly for build cleanliness. (It helps also a tiny bit with build parallelism since now e.g. the cpp files in MCTargetDesc can build after just 3 .inc files are generated instead of being blocked on all 13. But it's not a big win, since things depending on Target still need to wait for all 11, even though all .inc file use is internal to lib/Target.) Also add a build file for llc, since now all its dependencies have build files. Differential Revision:
  845. [codeview] Look through typedefs in getCompleteTypeIndex Summary: Any time a symbol record, whether it's S_UDT, S_LOCAL, or S_[GL]DATA32, references a record type, it should use the complete type index, even if there's a typedef in the way. Fixes the compiler part of PR39853. Reviewers: zturner, aganea Subscribers: hiraditya, arphaman, llvm-commits Differential Revision:
  846. [GISel] Add parentheses to an assert because gcc is mean.
  847. Replace Const-Member checking with non-recursive version. As reported in PR39946, these two implementations cause stack overflows to occur when a type recursively contains itself. While this only happens when an incomplete version of itself is used by membership (and thus an otherwise invalid program), the crashes might be surprising. The solution here is to replace the recursive implementation with one that uses a std::vector as a queue. Old values are kept around to prevent re-checking already checked types. Change-Id: I582bb27147104763d7daefcfee39d91f408b9fa8
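The approach in entry 847 (an explicit queue plus a record of already-checked types, instead of recursion) can be sketched like this. The data model is hypothetical: `fields_of` maps a record name to a list of `(is_const, field_type_name)` pairs, and a `deque` stands in for the `std::vector` queue mentioned in the commit.

```python
from collections import deque


def has_const_member(record, fields_of):
    """Return True if `record` or any nested record has a const field.

    Uses a worklist rather than recursion, so a type that (invalidly)
    contains itself cannot overflow the stack.
    """
    seen = {record}              # types already queued, never re-checked
    queue = deque([record])
    while queue:
        current = queue.popleft()
        for is_const, field_type in fields_of.get(current, []):
            if is_const:
                return True
            if field_type in fields_of and field_type not in seen:
                seen.add(field_type)
                queue.append(field_type)
    return False


# "A" contains itself; the visited set keeps the loop from spinning.
self_recursive = {"A": [(False, "A")]}
print(has_const_member("A", self_recursive))
```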
  848. Revert "debuginfo: Use symbol difference for CU length to simplify assembly reading/editing" Temporarily reverts commit r348806 due to strange asm compilation issues in certain modes (combination of asan+cuda+other things). Will provide repro soon.
  849. [coroutines] Improve suspend point simplification Summary: Enable suspend point simplification for cases where: * and coro.suspend are in different basic blocks * where there are intervening intrinsics Reviewers: modocache, tks2103, lewissbaker Reviewed By: modocache Subscribers: EricWF, llvm-commits Differential Revision:
  850. [Debuginfo] Prevent CodeGenPrepare from dropping debuginfo references. This fixes PR39845. CodeGenPrepare employs a transactional model when performing optimizations, i.e. it changes the IR to attempt an optimization and rolls back the change when it finds the change inadequate. It is during the rollback that references to locals were dropped from debug value intrinsics. This patch reinstates debuginfo references during rollbacks. Reviewers: aprantl, vsk Differential Revision:
  851. [ConstantFolding] Handle leading zero-size elements in load folding Struct types may have leading zero-size elements like [0 x i32], in which case the "real" element at offset 0 will not necessarily coincide with the 0th element of the aggregate. ConstantFoldLoadThroughBitcast() wants to drill down the element at offset 0, but currently always picks the 0th aggregate element to do so. This patch changes the code to find the first non-zero-size element instead, for the struct case. The motivation behind this change is Rust is fond of emitting [0 x iN] separators between struct elements to enforce alignment, which prevents constant folding in this particular case. The additional tests with [4294967295 x [0 x i32]] check that we don't end up unnecessarily looping over a large number of zero-size elements of a zero-size array. Differential Revision:
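The selection rule from entry 851 is small enough to show directly. This is an illustrative model only: struct elements are `(type_name, size_in_bytes)` pairs and `first_nonzero_size_element` is a hypothetical name, not an LLVM function.

```python
def first_nonzero_size_element(struct_elems):
    """Return the index of the first element that occupies storage.

    struct_elems: list of (type_name, size_in_bytes) pairs.
    Returns None if every element is zero-sized.
    """
    for index, (_, size) in enumerate(struct_elems):
        if size != 0:
            return index
    return None


# A leading [0 x i32] separator sits at offset 0 but holds no data,
# so the "real" element at offset 0 is element 1.
layout = [("[0 x i32]", 0), ("i32", 4), ("[0 x i32]", 0), ("i64", 8)]
print(first_nonzero_size_element(layout))
```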
  852. [GISel]: Add MachineIRBuilder support for passing in Flags while building Add the ability to pass in flags to buildInstr calls. Currently no validation is performed but that can be easily performed based on the opcode (if necessary). Reviewed by: paquette.
  853. Revert r348889; it fails some tests.
  854. Stop stripping comments from AST matcher example code. The AST matcher documentation dumping script was being a bit over-zealous about stripping comment markers, which ended up causing comments in example code to stop being comments. Fix that by only stripping comments at the start of a line, rather than removing any forward slash (which also impacts prose text).
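The fix in entry 854 comes down to anchoring the comment-marker pattern at the start of the line. The snippet below is a sketch of that idea and not the documentation script's actual code; the regular expression and the function name are assumptions.

```python
import re

# Only a doc-comment marker at the very start of the line is removed.
LEADING_MARKER = re.compile(r"^\s*/// ?")


def strip_doc_marker(line):
    """Remove a leading '///' marker, leaving later slashes untouched."""
    return LEADING_MARKER.sub("", line, count=1)


print(strip_doc_marker("///   int x; // counter"))
print(strip_doc_marker("/// Matches either/or expressions"))
```

Because the pattern is anchored with `^`, a `//` comment inside example code and a forward slash in prose both survive.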
  855. Emit -Wformat properly for bit-field promotions. Only explicitly look through integer and floating-point promotion where the result type is actually a promotion, which is not always the case for bit-fields in C.
  856. [Sanitizer] Expand FSEEK interception to FreeBSD Reviewers: krytarowski Reviewed By: krytarowski Differential Revision:
  857. [NewPM] fixing asserts on deleted loop in -print-after-all IR-printing AfterPass instrumentation might be called on a loop that has just been invalidated. We should skip printing it to avoid spurious asserts. Reviewed By: chandlerc, philip.pfaffe Differential Revision:
  858. [analyzer][CStringChecker] evaluate explicit_bzero - explicit_bzero has limited scope/usage only for security/crypto purposes but is non-optimisable version of memset/0 and bzero. - explicit_memset has similar signature and semantics as memset but is also a non-optimisable version. Reviewers: NoQ Reviewed By: NoQ Differential Revision:
  859. [COFF, ARM64] Emit COFF function header Summary: Emit COFF header when printing out the function. This is important as the header contains two important pieces of information: the storage class for the symbol and the symbol type information. This bit of information is required for the linker to correctly identify the type of symbol that it is dealing with. This patch mimics X86 and ARM COFF behavior for function header emission. Reviewers: rnk, mstorsjo, compnerd, TomTan, ssijaric Reviewed By: mstorsjo Subscribers: dmajor, javed.absar, kristof.beyls, llvm-commits Differential Revision:
  860. [test] Permit NetBSD in
  861. [libcxx] Only enable the availability LIT feature when we're testing libc++ Other standard libraries don't implement availability markup, so it doesn't make sense to e.g. XFAIL tests based on availability markup outside of libc++.
  862. [HotColdSplitting] Disable outlining landingpad instructions (PR39917) It's currently not safe to outline landingpad instructions (see Like, the order and content of previous landingpad instructions in a function alters the lowering of subsequent landingpads by renumbering type info ID's. Outlining a landingpad therefore breaks exception handling & unwinding.
  863. [XRay] Add a helper function sortByKey to simplify code Reviewers: dberris, mboerger Reviewed By: dberris Subscribers: mgrang, llvm-commits Differential Revision:
  864. [libcxx] Remove the no_default_flags LIT configuration This is part of an ongoing cleanup of the LIT test suite, where I'm trying to reduce the number of configuration options. In this case, the original intent seemed to be running the test suite with libstdc++, but this is now supported by specifying cxx_stdlib_under_test=libstdc++.
  865. [NFC] Fix incorrect (but unreachable) LIT error message It is unreachable because we test that the cxx_stdlib_under_test is in the supported set of libraries elsewhere. Furthermore, this code relied on the `use_stdlib_type`, which is never defined.
  866. Remove CGDebugInfo::getOrCreateFile() and use TheCU->getFile() directly.
  867. Reuse code from CGDebugInfo::getOrCreateFile() when creating the file for the DICompileUnit. This addresses post-commit feedback for D55085. Without this patch, a main source file with an absolute path may appear in different DIFiles, once with the absolute path and once with the common prefix between the absolute path and the current working directory. Differential Revision:
  868. Pass PartialOverloading argument to the correct corresponding parameter
  869. [ASan] Minor documentation fix: clarify static linking limitation. Summary: ASan does not support statically linked binaries, but ASan runtime itself can be statically linked into a target binary executable. Reviewers: eugenis, kcc Reviewed By: eugenis Subscribers: cfe-commits, llvm-commits Differential Revision:
  870. [InstCombine] try to convert x86 movmsk intrinsic to generic IR (PR39927) call iM movmsk(sext <N x i1> X) --> zext (bitcast <N x i1> X to iN) to iM This has the potential to create less-than-8-bit scalar types as shown in some of the test diffs, but it looks like the backend knows how to deal with that in these patterns. This is the simple part of the fix suggested in: Differential Revision:
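The equivalence behind entry 870 can be checked with a small simulation. This is a model of the semantics only, not InstCombine code: lanes are plain integers, lane 0 is assumed to map to the least significant mask bit (as on little-endian x86), and all function names are hypothetical.

```python
def sext_i1_lanes(bits, lane_width):
    """Sign-extend each i1 lane to lane_width bits (all-ones or zero)."""
    all_ones = (1 << lane_width) - 1
    return [all_ones if b else 0 for b in bits]


def movmsk(lanes, lane_width):
    """Collect the sign bit of every lane into an integer mask."""
    mask = 0
    for i, lane in enumerate(lanes):
        sign = (lane >> (lane_width - 1)) & 1
        mask |= sign << i
    return mask


def bitcast_i1_vector(bits):
    """Reinterpret <N x i1> as an N-bit integer, lane 0 in bit 0."""
    value = 0
    for i, b in enumerate(bits):
        value |= (1 if b else 0) << i
    return value


x = [1, 0, 1, 1]
# zext to the wider result type is a no-op for Python integers.
print(movmsk(sext_i1_lanes(x, 32), 32), bitcast_i1_vector(x))
```

Sign-extending an i1 makes the sign bit of each lane equal to the original bit, so gathering sign bits yields the same integer as the bitcast.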
  871. [BDCE] Add tests for PR39771; NFC These involve cases where certain uses are dead by means of having no demanded bits, even though the used instruction still has demanded bits when other uses are taken into account. BDCE currently does not simplify such cases.
  872. Adding tests for -ast-dump; NFC. This adds tests for expressions in C++.
  873. [llvm-readelf] Add -e/--headers support to readobj/elf Differential Revision:
  874. Revert "[PowerPC] Make no-PIC default to match GCC - CLANG" This reverts commit rL348299.
  875. Fix incorrect imm operand assertion for SUB32ri in X86CondBrFolding::analyzeCompare Summary: X86CondBrFolding::analyzeCompare can encounter a SUB32ri instruction that uses a global address as its operand, as below: %733:gr32 = SUB32ri %62:gr32(tied-def 0), @img2buf_normal, implicit-def $eflags JNE_1 %bb.41, implicit $eflags so the assertion "assert(MI.getOperand(ValueIndex).isImm() && "Expecting Imm operand")" is not correct. Change the assert to an if, so that X86CondBrFolding::analyzeCompare returns false (no compare found) in this case. Patch by Jianping Chen Reviewers: smaslov, LuoYuanke, liutianle, Jianping Reviewed By: Jianping Subscribers: lebedev.ri, llvm-commits Differential Revision:
  876. [x86] clean up code for converting 16-bit ops to LEA; NFC As discussed in D55494, we want to extend this to handle 8-bit ops too, but that could be extended further to enable this on 32-bit systems too.
  877. [libcxx] Fix test failure on GCC 4.9 GCC 4.9 seems to think that a constexpr default constructor implies the constructor to be noexcept.
  878. [analyzer] Fix a minor typo.
  879. [pair] Mark constructors as conditionally noexcept Summary: std::tuple marks its constructors as noexcept when the corresponding memberwise constructors are noexcept too -- this commit improves std::pair so that it behaves the same. This is a re-application of r348824, which broke the build in C++03 mode because a test was marked as supported in C++03 when it shouldn't be. Note: I did not add support in the explicit and non-explicit `pair(_Tuple&& __p)` constructors because those are non-standard extensions, and supporting them properly is tedious (we have to copy the rvalue-referenceness of the deduced _Tuple&& onto the result of tuple_element). <rdar://problem/29537079> Reviewers: mclow.lists, EricWF Subscribers: christof, llvm-commits Differential Revision:
  880. [libcxx] Fix test on compilers that do not support char8_t yet
  881. [x86] remove dead code for 16-bit LEA formation; NFC As discussed in: D55494 ...this code has been disabled/dead for a long time (the code references Athlon and Pentium 4), and there's almost no chance that it will be used given the last decade of uarch evolution. Also, in SDAG we promote 16-bit ops to 32-bit, so there's almost no way to test this code any more.
  882. Revert r348843 "[CodeGen] Allow memcpy/memset to generate small overlapping stores." Breaks ARM/memcpy-inline.ll
  883. [CodeGen] Allow memcpy/memset to generate small overlapping stores. Summary: All targets either just return false here or properly model `Fast`, so I don't think there is any reason to prevent CodeGen from doing the right thing here. Subscribers: nemanjai, javed.absar, eraman, jsji, llvm-commits Differential Revision:
  884. Use the standard Duration factory matcher Summary: A new check came in over the weekend; it should use our existing infrastructure for matching `absl::Duration` factories. Patch by hwright. Reviewers: JonasToth Reviewed By: JonasToth Subscribers: astrelni Tags: #clang-tools-extra Differential Revision:
  885. Fix bug where we'd try to symbolize a second time with the same arguments. Summary: Fix bug where we'd try to symbolize a second time with the same arguments even though symbolization failed the first time. This looks like a long standing typo given that the guard for trying symbolization again is to only try it if symbolization failed using `binary` and `original_binary != binary`. Reviewers: kubamracek, glider, samsonov Subscribers: #sanitizers, llvm-commits Differential Revision:
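The intended retry logic from entry 885 can be sketched as below. This is not the symbolizer script's code: `symbolize_with_fallback` and the `symbolize` callback are hypothetical, and a `None` return is assumed to mean that symbolization failed.

```python
def symbolize_with_fallback(symbolize, addr, binary, original_binary, offset):
    """Symbolize with `binary`, retrying once with `original_binary`.

    The retry happens only if the first attempt failed and the two
    binaries differ, and it must pass `original_binary`, not `binary`
    again (repeating the same arguments was the bug described above).
    """
    result = symbolize(addr, binary, offset)
    if result is None and original_binary != binary:
        result = symbolize(addr, original_binary, offset)
    return result


calls = []


def fake_symbolize(addr, binary, offset):
    """Stand-in symbolizer that only succeeds for the 'app' binary."""
    calls.append(binary)
    return "main+0x10" if binary == "app" else None


print(symbolize_with_fallback(fake_symbolize, 0x10, "app.dSYM", "app", 0))
```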
  886. [clang-tidy] NFC Consolidate test absl::Time implementation Summary: Several tests re-implement these same prototypes (differently), so we can put them in a common location. Patch by hwright. Reviewers: JonasToth Reviewed By: JonasToth Tags: #clang-tools-extra Differential Revision:
  887. [TargetLowering] Add ISD::EXTRACT_VECTOR_ELT support to SimplifyDemandedBits Let SimplifyDemandedBits attempt to simplify all elements of a vector extraction. Part of PR39689.
  888. Fix "not all control paths return a value" MSVC warnings. NFCI.
  889. [DeadArgElim] Fixes for dbg.values using dead arg/return values Summary: When eliminating a dead argument or return value in a function with local linkage, all uses, including in dbg.value intrinsics, would be replaced with null constants. This would mean that, for example for an integer argument, the debug info would incorrectly express that the value is 0. Instead, replace all uses with undef to indicate that the argument/return value is optimized out. Also, make sure that metadata uses of return values are rewritten even if there are no non-metadata uses of the value. As a bit of historical curiosity, the code that emitted null constants was introduced in the initial check-in of the pass in 2003, before 'undef' values even existed in LLVM. This fixes PR23260. Reviewers: dblaikie, aprantl, vsk, djtodoro Reviewed By: aprantl Subscribers: llvm-commits Tags: #debug-info Differential Revision:
  890. Cleanup test case by removing unused attribute dso_local Attribute 'dso_local' generated in bitcode from compiling original C file but isn't needed. Differential Revision:
  891. Reland r348741 "[Sema] Further improvements to static_assert diagnostics." Fix a dangling reference to temporary, never return nullptr.
  892. [X86] Switch the 64-bit mulx schedule test to use inline assembly. I'm not sure we should always prefer MULX over MUL. So making the MULX guaranteed with inline assembly.
  893. Revert r348830 "[Sema]improve static_assert(!expr)" Submitted the wrong change.
  894. [Sema]improve static_assert(!expr)
  895. Fix problems with char8_t stuff on compilers that don't support char8_t yet
  896. Second part of P0482 - char8_t. Reviewed as
  897. Move CodeGenOptions from Frontend to Basic Basic uses CodeGenOptions and should not depend on Frontend.
  898. [PPC][NFC] store operands are dst not src Differential Revision:
  899. Revert "[pair] Mark constructors as conditionally noexcept" This broke the tests on Linux. Reverting until I find out why the tests are broken (tomorrow).
  900. [pair] Mark constructors as conditionally noexcept Summary: std::tuple marks its constructors as noexcept when the corresponding memberwise constructors are noexcept too -- this commit improves std::pair so that it behaves the same. Note: I did not add support in the explicit and non-explicit `pair(_Tuple&& __p)` constructors because those are non-standard extensions, and supporting them properly is tedious (we have to copy the rvalue-referenceness of the deduced _Tuple&& onto the result of tuple_element). <rdar://problem/29537079> Reviewers: mclow.lists, EricWF Subscribers: christof, llvm-commits Differential Revision:
  901. [gn build] Add build files for AsmParser, MIRParser, IRReader, MCDisassembler, Vectorize These are all remaining build dependencies of llc, except for Target/X86 which is in a separate patch at Differential Revision:
  902. [analyzer] Remove memoization from RunLoopAutoreleaseLeakChecker Memoization does not seem to be necessary, as other statement visitors run just fine without it, and in fact seems to be causing memory corruptions. Just removing it instead of investigating the root cause. rdar://45945002 Differential Revision:
  903. [analyzer] Hack for backwards compatibility for options for RetainCountChecker. To be removed once the clients update.
  904. [analyzer] Display a diagnostic when an inlined function violates its os_consumed summary This is currently a diagnostic, but might be upgraded to an error in the future, especially if we introduce os_return_on_success attributes. rdar://46359592 Differential Revision:
  905. [analyzer] Resolve another bug where the name of the leaked object was not printed properly Differential Revision:
  906. [WebAssembly] Add '.eventtype' directive support Summary: This patch supports `.eventtype` directive printing and parsing in the same syntax with `.functype`. Reviewers: aardappel, sbc100 Subscribers: dschuff, sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision:
  907. [TextAPI][elfabi] Make SoName optional This change makes DT_SONAME treated as an optional trait for ELF TextAPI stubs. This change accounts for the fact that shared objects aren't guaranteed to have a DT_SONAME entry. Tests have been updated to check for correct behavior of an optional soname. Differential Revision:
  908. [WebAssembly] TargetStreamer cleanup (NFC) Summary: - Unify mixed argument names (`Symbol` and `Sym`) to `Sym` - Changed `MCSymbolWasm*` argument of `emit***` functions to `const MCSymbolWasm*`. It seems not very intuitive that emit function in the streamer modifies symbol contents. - Moved empty function bodies to the header - clang-format Reviewers: aardappel, dschuff, sbc100 Subscribers: jgravelle-google, sunfish, llvm-commits Differential Revision:
  909. [GISel]: Refactor MachineIRBuilder to allow passing additional parameters to build Instrs Previously MachineIRBuilder::buildInstr used to accept variadic arguments for sources (which were either unsigned or MachineInstrBuilder). While this worked well in common cases, it doesn't allow us to build instructions that have multiple destinations. Additionally passing in other optional parameters in the end (such as flags) is not possible trivially. Also a trivial call such as B.buildInstr(Opc, Reg1, Reg2, Reg3) can be interpreted differently based on the opcode (2defs + 1 src for unmerge vs 1 def + 2srcs). This patch refactors the buildInstr to buildInstr(Opc, ArrayRef<DstOps>, ArrayRef<SrcOps>) where DstOps and SrcOps are typed unions that know how to add themselves to MachineInstrBuilder. After this patch, most invocations would look like B.buildInstr(Opc, {s32, DstReg}, {SrcRegs..., SrcMIBs..}); Now all the other calls (such as buildAdd, buildSub etc) forward to buildInstr. It also makes it possible to build instructions with multiple defs. Additionally in a subsequent patch, we should make it possible to add flags directly while building instructions. Additionally, the main buildInstr method is now virtual, so other builders only have to override buildInstr (say, for constant folding/CSEing), which is straightforward. Also attached here ( is a clang-tidy patch that should upgrade the API calls if necessary.
  910. Follow-up fix to r348811 for null Errors (which is the case for end iterators) Not sure how I missed that in my testing, but obvious enough - this causes segfaults when attempting to dereference the Error in end iterators.
  911. Add a version of std::function that includes a few optimizations in ABI V2. Patch by Jordan Soyke ( Reviewed as D55045 The result of running the benchmarks and comparing them can be found here:
  912. llvm-objcopy: Improve/simplify llvm::Error handling during notes iteration Using an Error as an out parameter from an indirect operation like iteration as described in the documentation ( ) seems to be a little fussy - so here's /one/ possible solution, though I'm not sure it's the right one. Alternatively such APIs may be better off being switched to a standard algorithm style, where they take a lambda to do the iteration work that is then called back into (eg: "Error e = obj.for_each_note([](const Note& N) { ... });"). This would be safer than having an unwritten assumption that the user of such an iteration cannot return early from the inside of the function - and must always exit through the gift shop... I mean error checking. (even though it's guaranteed that if you're mid-way through processing an iteration, it's not in an error state). Alternatively we'd need some other (the super untrustworthy/thing we've generally tried to avoid) error handling primitive that actually clears the error state entirely so it's safe to ignore. Fleshed this solution out a bit further during review - it now relies on op==/op!= comparison as the equivalent to "if (Err)" testing the Error. So just like an Error must be checked (even if it's in a success state), the Error hiding in the iterator must be checked after each increment (including by comparison with another iterator - perhaps this could be constrained to only checking if the iterator is compared to the end iterator? Not sure it's too important). So now even just creating the iterator and not incrementing it at all should still assert because the Error has not been checked. Reviewers: lhames, jakehehrlich Differential Revision:
  913. Update test for instcombine change
  914. [builtins] Remove trailing whitespaces, NFC Remove trailing whitespaces so that it is easier to diff the code between div{s,d,t}f3.c
  915. debuginfo: Use symbol difference for CU length to simplify assembly reading/editing Mucking about simplifying a test case ( ) I stumbled across something I've hit before - that LLVM's (GCC's does too, FWIW) assembly output includes a hardcoded length for a DWARF unit in its header. Instead we could emit a label difference - making the assembly easier to read/edit (though potentially at a slight (I haven't tried to observe it) performance cost of delaying/sinking the length computation into the MC layer). Reviewers: JDevlieghere, probinson, ABataev Differential Revision:
  916. [Local] Promote a utility that could be used elsewhere. NFCI.
  917. Fix LLVM_LINK_LLVM_DYLIB build of TapiTests A dependency on TestingSupport was introduced in rL348735 but the library was not included in the LLVM_LINK_LLVM_DYLIB build. Differential Revision:
  918. [Hexagon] Couple of fixes in optimize addressing mode - Check if an operand is an immediate before calling getImm. Some operands that take constant values can actually have global symbols or other constant expressions. - When a load-constant instruction can be folded into users, make sure to only delete it when all users have been successfully converted.
  919. InstCombine: Scalarize single use icmp/fcmp
  920. [InstCombine] add tests for movmsk (PR39927) NFC
  921. Revert "Change InitListExpr dump to label and pointer" This reverts commit r348794.
  922. Fix nits
  923. Re-order content of template parameter dumps Reviewers: aaron.ballman Subscribers: cfe-commits Differential Revision:
  924. [Targets] Fixup incorrect targets in codemodel tests
  925. Re-order content in OMPDeclareReductionDecl dump Reviewers: aaron.ballman Subscribers: cfe-commits Differential Revision:
  926. Change InitListExpr dump to label and pointer Summary: Don't add a child just for the label. Reviewers: aaron.ballman Subscribers: cfe-commits Differential Revision:
  927. [clang-tidy] insert release notes for new checkers alphabetically Summary: Almost all code review comments on new checkers {D55433} {D48866} {D54349} seem to ask for the release notes to be added alphabetically, plus I've seen commits by @Eugene.Zelenko reordering the lists. Add those release notes alphabetically based on checker name. If the include-fixer section is seen, add it at the end. Minor change in the message format to prevent double newlines being added before the checker. Do the tools themselves have unit tests? (sorry new to this game) - Tested adding new checker at the beginning - Tested on adding new checker in the middle - Tested on empty ReleaseNotes.rst (as we would see after RC) Patch by MyDeveloperDay. Reviewers: alexfh, JonasToth, curdeius, aaron.ballman, benhamilton, hokein Reviewed By: JonasToth Subscribers: cfe-commits, xazax.hun, Eugene.Zelenko Tags: #clang-tools-extra Differential Revision:
  928. Revert "[Hexagon] Check if operand is an immediate before getImm" This reverts r348787. The patch wasn't quite correct.
  929. APFloat: allow 64-bit of payload Summary: The APFloat and Constant APIs taking an APInt allow arbitrary payloads, and that's great. There's a convenience API which takes an unsigned, and that's silly because it then directly creates a 64-bit APInt. Just change it to 64-bits directly. At the same time, add ConstantFP NaN getters which match the APFloat ones (with getQNaN / getSNaN and APInt parameters). Improve the APFloat testing to set more payload bits. Reviewers: scanon, rjmccall Subscribers: jkorous, dexonsmith, kristina, llvm-commits Differential Revision:
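To see why a 32-bit `unsigned` is too narrow for a NaN payload (entry 929), here is a small standard-library sketch that packs a payload into a quiet NaN `double` by hand. It assumes `double` is IEEE-754 binary64; `makeQuietNaN` and `nanPayload` are hypothetical helpers, not the APFloat or ConstantFP API.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

static_assert(sizeof(double) == sizeof(std::uint64_t),
              "sketch assumes a 64-bit IEEE-754 double");

// binary64 layout: 1 sign bit, 11 exponent bits, 52 fraction bits.
// The top fraction bit is the quiet bit; the low 51 bits carry the payload.
constexpr std::uint64_t ExponentMask = 0x7FF0000000000000ULL;
constexpr std::uint64_t QuietBit = 0x0008000000000000ULL;
constexpr std::uint64_t PayloadMask = 0x0007FFFFFFFFFFFFULL;

// Builds a quiet NaN whose fraction field carries (the low 51 bits of) Payload.
inline double makeQuietNaN(std::uint64_t Payload) {
  std::uint64_t Bits = ExponentMask | QuietBit | (Payload & PayloadMask);
  double Result;
  std::memcpy(&Result, &Bits, sizeof(Result));
  return Result;
}

// Extracts the 51-bit payload from a double's bit pattern.
inline std::uint64_t nanPayload(double Value) {
  std::uint64_t Bits;
  std::memcpy(&Bits, &Value, sizeof(Bits));
  return Bits & PayloadMask;
}
```

A payload such as `0x123456789ABC` needs 45 bits, so it survives the round trip here but could not be passed through a 32-bit parameter.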
  930. Add an explicit triple to this test to fix failing test bots.
  931. [constexpr][c++2a] Try-catch blocks in constexpr functions Implement support for try-catch blocks in constexpr functions, as proposed in and voted in San Diego for c++20. The idea is that we can still never throw inside constexpr, so the catch block is never entered. A try-catch block like this: try { f(); } catch (...) { } is then morally equivalent to just { f(); } Same idea should apply for function/constructor try blocks. rdar://problem/45530773 Differential Revision:
  932. [GlobalISel] Restrict G_MERGE_VALUES capability and replace with new opcodes. This patch restricts the capability of G_MERGE_VALUES, and uses the new G_BUILD_VECTOR and G_CONCAT_VECTORS opcodes instead in the appropriate places. This patch also includes AArch64 support for selecting G_BUILD_VECTOR of <4 x s32> and <2 x s64> vectors. Differential Revisions:
  933. [Hexagon] Check if operand is an immediate before getImm
  934. Adding tests for -ast-dump; NFC. This adds tests for expressions in C.
  935. [Hexagon] Add patterns for any_extend from i1 and short vectors of i1
  936. [TargetLowering] Add UNDEF folding to SimplifyDemandedVectorElts If all the demanded elements of the SimplifyDemandedVectorElts are known to be UNDEF, we can simplify to an ISD::UNDEF node. Zero constant folding will be handled in a future patch - it's a little trickier as we often have bitcasted zero values. Differential Revision:
  937. [docs] Add the new Objective-C ARC intrinsics to the LangRef. These were added in r348441. This mostly just points to the clang documentation to describe the intended semantics of each intrinsic.
  938. [DAGCombiner] Remove unnecessary recursive DAGCombiner::visitINSERT_SUBVECTOR call. As discussed on D55511, this caused an issue if the inner node deletes a node that the outer node depends upon. As it doesn't affect any lit-tests and I've only been able to expose this with the D55511 change I'm committing this now.
  939. Refactor std::function to more easily support alternative implementations. Patch from Jordan Soyke. Reviewed as D55520. This change adds a new internal class, called __value_func, that adds a minimal subset of value-type semantics to the internal __func interface. The change is NFC, and is cleanup for the upcoming ABI v2 function implementation (D55045).
  940. ComputeLineNumbers: delete SSE2 vectorization Summary: SSE2 vectorization was added in 2012, but it is 2018 now and I can't observe any performance boost (testing clang -E [all Sema/* CodeGen/* with proper -I options]) with the existing _mm_movemask_epi8+countTrailingZeros or the following SSE4.2 (compiling with -msse4.2): __m128i C = _mm_setr_epi8('\r','\n',0,0,0,0,0,0,0,0,0,0,0,0,0,0); _mm_cmpestri(C, 2, Chunk, 16, _SIDD_UBYTE_OPS | _SIDD_CMP_EQUAL_ANY | _SIDD_POSITIVE_POLARITY | _SIDD_LEAST_SIGNIFICANT) Delete the vectorization to simplify the code. Also simplify the code a bit and don't check the line ending sequence \n\r Reviewers: bkramer, #clang Reviewed By: bkramer Subscribers: cfe-commits Differential Revision:
  941. [x86] fix formatting; NFC This should really be generalized to allow increment and/or we should replace it by using ISD::matchUnaryPredicate(). See D55515 for context.
  942. [AArch64] Refactor the Exynos scheduling predicates Refactor the scheduling predicates based on `MCInstPredicate`. In this case, for the Exynos processors. Differential revision:
  943. [AMDGPU] Change the l1 flush instruction for AMDPAL/MESA3D. This commit changes which l1 flush instruction is used for AMDPAL and MESA3d workloads to flush the entire l1 cache instead of just the volatile lines. Differential Revision:
  944. [Sanitizer] expand sysctl/getmntinfo/nl_langinfo to Darwin Reviewers: vitalybuka, krytarowski, kubamracek Reviewed By: vitalybuka, krytarowski Differential Revision:
  945. [x86] add tests for LowerVSETCC with min/max; NFC
  946. [AArch64] Refactor the scheduling predicates Refactor the scheduling predicates based on `MCInstPredicate`. Augment the number of helper predicates used by processor specific predicates. Differential revision:
  947. [AMDGPU] Add new Mode Register pass - minor fix Trivial change to add parentheses to an expression to avoid a sanitizer error in SIModeRegister.cpp, which was committed earlier.
  948. [llvm-mca] Add new tests for Exynos (NFC)
  949. [DAGCombiner] Simplify test case from r348759 Thanks Simon for pointing that out.
  950. [libclang] Revert removal of tidy plugin support from libclang introduced in r347496 Differential Revision:
  951. [AVX512] Update typo in comment Should be "Sae" for "Suppress All Exceptions". NFC
  952. Use zip_longest for iterator range comparisons. NFC. Use zip_longest in two locations that compare iterator ranges. zip_longest allows the iteration to use a range-based for-loop and to be symmetric over both ranges instead of prioritizing one over the other. In the latter case, code has to handle the cases where the first is longer than the second, the second is longer than the first, and both are of the same length, which must partially be checked after the loop. With zip_longest, this becomes an element comparison within the loop like the comparison of the elements themselves. The symmetry makes it clearer that the first and second iterators are not handled differently. The iterators are not even used directly anymore, just the ranges. Differential Revision:
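The symmetric comparison described in entry 952 can be illustrated with a standard-library approximation. `zipLongest` below is a hypothetical, eager helper written for this sketch (it copies elements into a vector of optional pairs); it is not `llvm::zip_longest`, which works lazily over arbitrary ranges.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical helper: pairs up the elements of two vectors, padding the
// shorter one with empty optionals once it runs out.
template <typename T>
std::vector<std::pair<std::optional<T>, std::optional<T>>>
zipLongest(const std::vector<T> &A, const std::vector<T> &B) {
  std::vector<std::pair<std::optional<T>, std::optional<T>>> Result;
  const std::size_t N = A.size() > B.size() ? A.size() : B.size();
  for (std::size_t I = 0; I < N; ++I) {
    std::optional<T> L, R;
    if (I < A.size())
      L = A[I];
    if (I < B.size())
      R = B[I];
    Result.emplace_back(L, R);
  }
  return Result;
}

// A length mismatch shows up as "engaged vs. empty" inside the loop, so no
// separate length check is needed after it.
template <typename T>
bool rangesEqual(const std::vector<T> &A, const std::vector<T> &B) {
  for (const auto &[L, R] : zipLongest(A, B))
    if (L != R)
      return false;
  return true;
}
```

Because an empty optional never compares equal to an engaged one, "first is longer" and "second is longer" are handled by the same comparison as a differing element.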
  953. [GlobalISel] Set stack protector index when translating Intrinsic::stackprotector Record the stack protector index in MachineFrameInfo when translating Intrinsic::stackprotector similarly as is done by SelectionDAG when processing the same intrinsic. Setting this index allows the Prologue/Epilogue Insertion to recognize that the stack protection is enabled. The pass can then make sure that the stack protector comes before local variables on the stack and assigns potentially vulnerable objects first so they are close to the stack protector slot. Differential Revision:
  954. [mips][mc] Emit R_{MICRO}MIPS_JALR when expanding jal to jalr When replacing jal with jalr, also emit '.reloc R_MIPS_JALR' (R_MICROMIPS_JALR for micromips). The linker might then be able to turn jalr into a direct call. Add '-mips-jalr-reloc' to enable/disable this feature (default is true). Differential revision:
  955. [DAGCombiner] Use the result value type in visitCONCAT_VECTORS This triggers an assert when combining concat_vectors of a bitcast of merge_values. With asserts disabled, it fails to select: fatal error: error in backend: Cannot select: 0x7ff19d000e90: i32 = any_extend 0x7ff19d000ae8 0x7ff19d000ae8: f64,ch = CopyFromReg 0x7ff19d000c20:1, Register:f64 %1 0x7ff19d000b50: f64 = Register %1 In function: d Differential Revision:
  956. [NFC][AArch64] Remove duplicate Arch list in target parser tests The list generated in the target parser tests is the same as the one in the AArch64 target parser. Use that one instead. Differential Revision:
  957. Misc typo fixes in ./lib folder Summary: Found via `codespell -q 3 -I ../clang-whitelist.txt -L uint,importd,crasher,gonna,cant,ue,ons,orign,ned` Reviewers: teemperor Reviewed By: teemperor Subscribers: teemperor, jholewinski, jvesely, nhaehnle, whisperity, jfb, cfe-commits Differential Revision:
  958. [AMDGPU] Add new Mode Register pass A new pass to manage the Mode register. Currently this just manages the floating point double precision rounding requirements, but is intended to be easily extended to encompass all Mode register settings. The immediate motivation comes from the requirement to use the round-to-zero rounding mode for the 16 bit interpolation instructions, where the rounding mode setting is shared between 16 and 64 bit operations.
  959. [DebugInfo] Don't drop dbg.value's of nullptr Currently, dbg.value's of "nullptr" are dropped when entering a SelectionDAG -- apparently just because of an oversight when recognising Values that are constant (see PR39787). This patch adds ConstantPointerNull to the list of constants that can be turned into DBG_VALUEs. The matter of what bit-value a null pointer constant in LLVM has was raised in this mailing list thread: Where it transpires LLVM relies on (IR) null pointers being zero valued, thus I've baked this assumption into the patch. Differential Revision:
  960. [OpenCL][CodeGen] Fix replacing memcpy with addrspacecast Summary: If a function argument is byval and RV is located in default or alloca address space an optimization of creating addrspacecast instead of memcpy is performed. That is not correct for OpenCL, where that can lead to a situation of address space casting from __private * to __global *. See an example below: ``` typedef struct { int x; } MyStruct; void foo(MyStruct val) {} kernel void KernelOneMember(__global MyStruct* x) { foo (*x); } ``` For this code, clang generated the following IR: ... %0 = load %struct.MyStruct addrspace(1)*, %struct.MyStruct addrspace(1)** %x.addr, align 4 %1 = addrspacecast %struct.MyStruct addrspace(1)* %0 to %struct.MyStruct* ... So the optimization was disallowed for OpenCL if RV is located in an address space different than that of the argument (0). Reviewers: yaxunl, Anastasia Reviewed By: Anastasia Subscribers: cfe-commits, asavonic Differential Revision:
  961. [DebugInfo] Emit undef DBG_VALUEs when SDNodes are optimised out This is a fix for PR39896, where dbg.value's of SDNodes that have been optimised out do not lead to "DBG_VALUE undef" instructions being created. Such undef instructions are necessary to terminate earlier variable ranges, otherwise variable values leak past the point where they're valid. The "invalidated" flag of SDDbgValue is currently being abused to mean two things: * The corresponding SDNode is now invalid * This SDDbgValue should not be emitted Of which there are several legitimate combinations of meaning: * The SDNode has been invalidated and we should emit "DBG_VALUE undef" * The SDNode has been invalidated but the debug data was salvaged, don't emit anything for this SDDbgValue * This SDDbgValue has been emitted This patch introduces distinct "Emitted" and "Invalidated" fields to the SDDbgValue class, updates users accordingly, and generates "undef" DBG_VALUEs for invalidated records. Awkwardly, there are circumstances where we emit SDDbgValue's twice, specifically DebugInfo/X86/dbg-addr-dse.ll which I've preserved. Differential Revision:
  962. [X86] Fix AvoidStoreForwardingBlocks pass for negative displacements Fixes The size of the first copy was computed as std::abs(std::abs(LdDisp2) - std::abs(LdDisp1)), which results in skipped bytes if the signs of LdDisp2 and LdDisp1 differ. As far as I can see, this should just be LdDisp2 - LdDisp1. The case where LdDisp1 > LdDisp2 is already handled in the code above, in which case LdDisp2 is set to LdDisp1 and this subtraction will evaluate to Size1 = 0, which is the correct value to skip an overlapping copy. Differential Revision:
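The arithmetic in entry 962 is easy to check by hand. The sketch below reproduces only the two size formulas quoted in the commit message; the function names are made up for illustration and nothing else from the pass is modelled.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Old formula quoted in the commit message. Taking absolute values first
// discards the signs, so displacements of -1 and 1 look identical.
inline std::int64_t firstCopySizeOld(std::int64_t LdDisp1,
                                     std::int64_t LdDisp2) {
  return std::abs(std::abs(LdDisp2) - std::abs(LdDisp1));
}

// Fixed formula. Per the commit message, the surrounding code has already
// raised LdDisp2 to LdDisp1 when LdDisp1 > LdDisp2, so the difference is
// never negative here.
inline std::int64_t firstCopySizeNew(std::int64_t LdDisp1,
                                     std::int64_t LdDisp2) {
  assert(LdDisp1 <= LdDisp2 && "caller is expected to clamp LdDisp2");
  return LdDisp2 - LdDisp1;
}
```

For `LdDisp1 = -1` and `LdDisp2 = 1` the old formula yields 0 and the two bytes in between are skipped, while the fixed formula yields 2. When both displacements have the same sign and are ordered, the two formulas agree.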
  963. Add data types needed for md2(3)/NetBSD interceptors Missing part of D55469.
  964. Add interceptors for the sha2(3) from NetBSD Summary: SHA224_Init, SHA224_Update, SHA224_Final, SHA224_End, SHA224_File, SHA224_FileChunk, SHA224_Data, SHA256_Init, SHA256_Update, SHA256_Final, SHA256_End, SHA256_File, SHA256_FileChunk, SHA256_Data, SHA384_Init, SHA384_Update, SHA384_Final, SHA384_End, SHA384_File, SHA384_FileChunk, SHA384_Data, SHA512_Init, SHA512_Update, SHA512_Final, SHA512_End, SHA512_File, SHA512_FileChunk, SHA512_Data – calculates the NIST Secure Hash Standard (version 2) Add tests for new interceptors. Reviewers: vitalybuka, joerg Reviewed By: vitalybuka Subscribers: kubamracek, llvm-commits, mgorny, #sanitizers Tags: #sanitizers Differential Revision:
  965. Add interceptors for md2(3) from NetBSD Summary: MD2Init, MD2Update, MD2Final, MD2End, MD2File, MD2Data - calculates the RSA Data Security, Inc., "MD2" message digest. Add a dedicated test. Reviewers: vitalybuka, joerg Reviewed By: vitalybuka Subscribers: kubamracek, llvm-commits, mgorny, #sanitizers Tags: #sanitizers Differential Revision:
  966. Add new interceptors for FILE repositioning stream Summary: Add new interceptors for a set of functions to reposition a stream: fgetpos, fseek, fseeko, fsetpos, ftell, ftello, rewind . Add a dedicated test. Enable this interface on NetBSD. Reviewers: joerg, vitalybuka Reviewed By: vitalybuka Subscribers: kubamracek, llvm-commits, mgorny, #sanitizers Tags: #sanitizers Differential Revision:
  967. Revert r348741 "[Sema] Further improvements to static_assert diagnostics." Seems to break build bots.
  968. [Sema] Further improvements to static_assert diagnostics. Summary: We're now handling cases like `static_assert(!expr)` and `static_assert(!(expr))`. Reviewers: aaron.ballman, Quuxplusone Subscribers: cfe-commits Differential Revision:
  969. [llvm-exegesis] Also check latency mode in local lit. Summary: This should avoid failing on old CPUs that do not have a cycle counter. Subscribers: tschuett, llvm-commits Differential Revision:
  970. [CostModel][X86][AArch64] Adjust cost of the scalarization part of min/max reduction. Summary: The comment says we need 3 extracts and a select at the end. But didn't we just account for the select in the vector cost above? Aren't we just extracting the single element after taking the min/max in the vector register? Reviewers: RKSimon, spatel, ABataev Reviewed By: RKSimon Subscribers: javed.absar, kristof.beyls, llvm-commits Differential Revision:
  971. [X86] Remove the addcarry builtins. Leaving only the addcarryx builtins since that matches gcc. The addcarry and addcarryx builtins do the same thing. The only difference is that addcarryx previously required adx feature. This commit removes the adx feature check from addcarryx and removes the addcarry builtin. This matches the builtins that gcc has. We don't guarantee compatibility in builtins, but we generally try to be consistent if it's not a burden.
  972. [X86] Merge addcarryx/addcarry intrinsic into a single addcarry intrinsic. Both intrinsics do the exact same thing so we really only need one. Earlier in the 8.0 cycle we changed the signature of this intrinsic without renaming it. But it looks difficult to get the autoupgrade code to allow me to merge the intrinsics and change the signature at the same time. So I've renamed the intrinsic slightly for the new merged intrinsic. I'm skipping autoupgrading from the previous new to 8.0 signature. I've also renamed the subborrow for consistency.
  973. [TextAPI][elfabi] Fix build by adding std::move() to r348735
  974. [TextAPI][elfabi] Make TBE handlers functions that return Errors Since TBEHandler doesn't maintain state or otherwise have any need to be a class right now, the read and write functions have been moved out and turned into standalone functions. Additionally, the TBE read function has been updated to return an Expected value for better error handling. Tests have been updated to reflect these changes. Differential Revision:
  975. [bugpoint] Find 'opt', etc., in bugpoint directory Summary: When bugpoint attempts to find the other executables it needs to run, such as `opt` or `clang`, it tries searching the user's PATH. However, in many cases, the 'bugpoint' executable is part of an LLVM build, and the 'opt' executable it's looking for is in that same directory. Many LLVM tools handle this case by using the `Paths` parameter of `llvm::sys::findProgramByName`, passing the parent path of the currently running executable. Do this same thing for bugpoint. However, to preserve the current behavior exactly, first search the user's PATH, and then search for 'opt' in the directory containing 'bugpoint'. Test Plan: `check-llvm`. Many of the existing bugpoint tests no longer need to use the `--opt-command` option as a result of these changes. Reviewers: MatzeB, silvas, davide Reviewed By: MatzeB, davide Subscribers: davide, llvm-commits Differential Revision:
  976. Re-commit "[IR] Add NODISCARD to attribute functions" Now that is committed, can be committed once again -- all warnings are now fixed.
  977. [AMDGPU] Fix discarded result of addAttribute Summary: `llvm::AttributeList` and `llvm::AttributeSet` are immutable, and so methods defined on these classes, such as `addAttribute`, return a new immutable object with the attribute added. In I attempted to annotate methods such as `addAttribute` with `LLVM_NODISCARD`, since calling these methods has no side-effects, and so ignoring the result that is returned is almost certainly a programmer error. However, committing the change resulted in new warnings in the AMDGPU target. The AMDGPU simplify libcalls pass added in attempts to add the readonly and nounwind attributes to simplified library functions, but instead calls the `addAttribute` methods and ignores the result. Modify the simplify libcalls pass to actually add the nounwind and readonly attributes. Also update the simplify libcalls test to assert that these attributes are actually being set. Reviewers: rampitec, vpykhtin, rnk Reviewed By: rampitec Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision:
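The bug pattern in entry 977 can be reproduced in miniature. `AttrSet` and `Function` below are hypothetical toy types, not `llvm::AttributeList` or `llvm::Function`, and the standard `[[nodiscard]]` attribute is used here in place of the `LLVM_NODISCARD` macro.

```cpp
#include <cassert>
#include <set>
#include <string>

// Toy immutable attribute set: addAttribute never modifies *this, it
// returns a modified copy, so ignoring the result silently does nothing.
class AttrSet {
  std::set<std::string> Attrs;

public:
  [[nodiscard]] AttrSet addAttribute(const std::string &Name) const {
    AttrSet Copy = *this;
    Copy.Attrs.insert(Name);
    return Copy;
  }

  bool hasAttribute(const std::string &Name) const {
    return Attrs.count(Name) != 0;
  }
};

// Toy holder for the attribute set.
struct Function {
  AttrSet Attributes;
};

// The buggy pattern would be a bare call such as
//   F.Attributes.addAttribute("nounwind");
// which compiles (with a nodiscard warning) but leaves F unchanged.
// The fix is to store the returned set back.
inline void markNoUnwindReadOnly(Function &F) {
  F.Attributes =
      F.Attributes.addAttribute("nounwind").addAttribute("readonly");
}
```

The `[[nodiscard]]` annotation is what turns the silent no-op into a compiler diagnostic, which is how the AMDGPU issue surfaced.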
  978. Speculatively fixing the build; it seems add_pointer_t and add_const_t are not implemented everywhere.
  979. Move the make_const_ptr trait into STLExtras; use add_pointer where possible; NFC.
  980. Adding an STL-like type trait that is duplicated in multiple places in Clang. This trait is used by several AST visitor classes to control whether the AST is visiting const nodes or non-const nodes. These uses cannot be easily replaced with the STL traits directly due to use of an unspecialized templated when a type is expected (due to the template template parameter involved).
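A sketch of how such a trait is used may help with entry 980. The `make_const_ptr` definition follows the description in entries 979 and 980 (add const, then add pointer); `Node` and `VisitorBase` are hypothetical names for illustration and far simpler than Clang's visitor classes. The code assumes C++17.

```cpp
#include <cassert>
#include <type_traits>

// Trait with a nested ::type, in the style of std::add_pointer, so it can
// be passed as an unspecialized template (a template template argument).
template <typename T> struct make_const_ptr {
  using type = std::add_pointer_t<std::add_const_t<T>>;
};

// Hypothetical AST node.
struct Node {
  int Value = 0;
};

// Hypothetical visitor skeleton: Ptr selects whether nodes are handled as
// `Node *` or as `const Node *`.
template <template <typename> class Ptr> struct VisitorBase {
  using NodePtr = typename Ptr<Node>::type;
  int read(NodePtr N) const { return N->Value; }
};

using MutableVisitor = VisitorBase<std::add_pointer>;
using ConstVisitor = VisitorBase<make_const_ptr>;

static_assert(std::is_same_v<MutableVisitor::NodePtr, Node *>);
static_assert(std::is_same_v<ConstVisitor::NodePtr, const Node *>);
```

The standard library has no single trait that yields `const T *` from `T`, which is why a composed trait like this is needed when only one template name can be passed.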
  981. [X86] Add some comments about when some X86 intrinsic autoupgrade code was added. Someday we'd like to remove old autoupgrade code so it helps to annotate how long it's been there so we don't have to go digging through commit history.
  982. [X86] If the carry input to an addcarry/subborrow intrinsic is known to be 0, emit a flag setting ADD/SUB instead of ADC/SBB. Previously we had to take the carry in and add -1 to it to set the carry flag so we could use it with ADC/SBB. But if we know it's 0 then we don't need to bother. This should go a long way towards fixing PR24545.
  983. Remove unneeded dependency from lib/Target/X86/Utils/ to lib/IR (aka Core). The dependency was added in r213995 in response to r213986 which did make X86/Utils depend on IR, but r256680 later removed that dependency again.
  984. [x86] regenerate test checks; NFC
  985. [x86] don't try to convert add with undef operands to LEA The existing code tries to handle an undef operand while transforming an add to an LEA, but it's incomplete because we will crash on the i16 test with the debug output shown below. It's better to just give up instead. Really, GlobalISel should have folded these before we could get into trouble. # Machine code for function add_undef_i16: NoPHIs, TracksLiveness, Legalized, RegBankSelected, Selected bb.0 (%ir-block.0): liveins: $edi %1:gr32 = COPY killed $edi %0:gr16 = COPY %1.sub_16bit:gr32 %5:gr64_nosp = IMPLICIT_DEF %5.sub_16bit:gr64_nosp = COPY %0:gr16 %6:gr64_nosp = IMPLICIT_DEF %6.sub_16bit:gr64_nosp = COPY %2:gr16 %4:gr32 = LEA64_32r killed %5:gr64_nosp, 1, killed %6:gr64_nosp, 0, $noreg %3:gr16 = COPY killed %4.sub_16bit:gr32 $ax = COPY killed %3:gr16 RET 0, implicit killed $ax # End machine code for function add_undef_i16. *** Bad machine code: Reading virtual register without a def *** - function: add_undef_i16 - basic block: %bb.0 (0x7fe6cd83d940) - instruction: %6.sub_16bit:gr64_nosp = COPY %2:gr16 - operand 1: %2:gr16 LLVM ERROR: Found 1 machine code errors. Differential Revision:
  986. [X86] Extend pfm counter coverage for llvm-exegesis Extension to rL348617, turns out llvm-exegesis doesn't need to match the perf counter name against a scheduler model resource name - so I've added a few more counters that I could find in the libpfm4 source code (and fix a typo in the knl/knm retired_uops counter - which uses 'all' instead of 'any').
  987. NFC: Rename TemplateDecl dump utilities There is a clang::TemplateDecl AST type, so a method called VisitTemplateDecl looks like it should 'override' the method from the base visitor, but it does not because of the extra parameters it takes. In reality, these methods are utilities, so name them like utilities.
  988. NFC: Move dump of individual comment nodes to NodeDumper Reviewers: aaron.ballman Subscribers: cfe-commits Differential Revision:
  989. Revert "Introduce optional labels to dumpStmt" This reverts commit 933402caa09963792058198578522a95f013c69c.
  990. Introduce optional labels to dumpStmt If the label is present, it is added as a child, with the statement a child of the label. This preserves behavior of the InitListExpr dump output.
  991. Inline hasNodes into only caller It is easier to refactor with fewer utility methods.
  992. Inline dumpFullComment into callers It causes confusion over whether it or dumpComment is the more important. It is easier to refactor with fewer utility methods.
  993. Re-order content from InitListExpr Summary: This causes no change in the output of ast-dump-stmt.cpp due to the way child nodes are printed with a delay. Reviewers: aaron.ballman Subscribers: cfe-commits Differential Revision:
  994. Fix InitListExpr test Wrong case of Check meant this has no effect.
  995. [X86] Add test for PR39926; NFC The test file shows a case where the avoid store forwarding block pass fails to copy a range (-1..1) when the load displacement changes sign. Baseline test for D55485.
  996. SourceManager: insert(make_pair(..)) -> try_emplace. NFC
  997. [COFF] Map truncated .eh_frame section name PE/COFF sections can have section names truncated to 8 chars, in order to have the name available at runtime. (The string table, where long untruncated names are stored, isn't loaded at runtime.) This allows various llvm tools to dump the .eh_frame section from such executables. Patch by Peiyuan Song! Differential Revision:
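The name truncation from entry 997 can be shown with a few lines of standard C++. Both helper names are hypothetical, and the matching rule (accept the full name or its 8-character prefix) is an assumption about what "map" means here, not a copy of the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Per the commit message, PE/COFF section names may be truncated to
// 8 characters so that they fit in the section header itself.
constexpr std::size_t COFFNameSize = 8;

// Returns the at-most-8-character form of a section name.
inline std::string truncateSectionName(const std::string &Name) {
  return Name.substr(0, COFFNameSize);
}

// Hypothetical matcher: accepts the full name and its truncated form.
inline bool isEHFrameSection(const std::string &Name) {
  return Name == ".eh_frame" || Name == truncateSectionName(".eh_frame");
}
```

Since ".eh_frame" is nine characters long, its truncated form is ".eh_fram", and a tool that only looks for the full name would miss the section.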
  998. Fix conflict types for this FreeBSD test.
  999. [DAGCombiner] re-enable truncation of binops This is effectively re-committing the changes from: rL347917 (D54640) rL348195 (D55126) ...which were effectively reverted here: rL348604 ...because the code had a bug that could induce infinite looping or eventual out-of-memory compilation. The bug was that this code did not guard against transforming opaque constants. More details are in the post-commit mailing list thread for r347917. A reduced test for that is included in the x86 bool-math.ll file. (I wasn't able to reduce a PPC backend test for this, but it was almost the same pattern.) Original commit message for r347917: The motivating case for this is shown in: and the corresponding rot16.ll regression tests. Because x86 scalar shift amounts are i8 values, we can end up with trunc-binop-trunc sequences that don't get folded in IR. As the TODO comments suggest, there will be regressions if we extend this (for x86, we mostly seem to be missing LEA opportunities, but there are likely vector folds missing too). I think those should be considered existing bugs because this is the same transform that we do as an IR canonicalization in instcombine. We just need more tests to make those visible independent of this patch.
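Narrowing a binop as in entry 999 is sound for wrapping operations because the low bits of the result depend only on the low bits of the operands. The sketch below checks this for a 32-bit add truncated to 8 bits. It illustrates the arithmetic identity only, under the assumption that the combine moves the truncate from the result onto the operands; it does not model the DAG transform, its cost heuristics, or the opaque-constant guard.

```cpp
#include <cassert>
#include <cstdint>

// trunc(add(X, Y)): do the add in 32 bits, then keep the low 8 bits.
inline std::uint8_t truncAfterAdd(std::uint32_t X, std::uint32_t Y) {
  return static_cast<std::uint8_t>(X + Y);
}

// add(trunc(X), trunc(Y)): narrow the operands first, then add modulo 2^8.
inline std::uint8_t addAfterTrunc(std::uint32_t X, std::uint32_t Y) {
  const std::uint8_t NarrowX = static_cast<std::uint8_t>(X);
  const std::uint8_t NarrowY = static_cast<std::uint8_t>(Y);
  return static_cast<std::uint8_t>(NarrowX + NarrowY);
}
```

The same reasoning applies to sub, mul, and, or, and xor, but not to operations such as right shifts or division, where high bits of the inputs influence low bits of the result.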
  1000. [x86] add 32-bit RUN for tests and test with opaque constants; NFC The opaque constant test is reduced from a Chrome file that infinite-looped with rL347917.
  1001. [gn build] Add build files for CodeGen subfolders AsmPrinter, GlobalISel, SelectionDAG. Differential Revision:
  1002. [WebAssembly] Make WasmSymbol's signature usable for events (NFC) Summary: WasmSymbol used to use its `WasmSignature` member variable only for function types, but now it can be used for events as well. Reviewers: sbc100 Subscribers: dschuff, jgravelle-google, sunfish, llvm-commits Differential Revision:
  1003. [llvm-readobj] Little clean up inside `parseDynamicTable` Summary: This anonymous function actually has the same logic as `Obj->toMappedAddr`. Besides, I have a question on resolving illegal values. `gnu-readelf`, `gnu-objdump` and `llvm-objdump` can parse the test file 'test/tools/llvm-objdump/Inputs/private-headers-x86_64.elf', but `llvm-readobj` fails when parsing the `DT_RELR` segment, because the value is 0x87654321, which is illegal. So, shall we do this clean up rather than remove the checking statements inside the anonymous function? ``` if (Delta >= Phdr.p_filesz) return createError("Virtual address is not in any segment"); ``` Reviewers: rupprecht, jhenderson Reviewed By: jhenderson Subscribers: llvm-commits Differential Revision:
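The range check quoted in entry 1003 belongs to a virtual-address-to-file-offset lookup. A simplified standard-library sketch of such a lookup follows. `LoadSegment` and `toFileOffset` are hypothetical, the linear search is an assumption made for brevity, and an empty `std::optional` stands in for the `createError(...)` result.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical, simplified stand-in for an ELF PT_LOAD program header.
struct LoadSegment {
  std::uint64_t VAddr;    // corresponds to p_vaddr
  std::uint64_t Offset;   // corresponds to p_offset
  std::uint64_t FileSize; // corresponds to p_filesz
};

// Maps a virtual address to a file offset, or returns an empty optional
// when the address is not backed by file contents in any segment.
inline std::optional<std::uint64_t>
toFileOffset(const std::vector<LoadSegment> &Segments, std::uint64_t VAddr) {
  for (const LoadSegment &Phdr : Segments) {
    if (VAddr < Phdr.VAddr)
      continue;
    const std::uint64_t Delta = VAddr - Phdr.VAddr;
    // Same condition as the quoted check, inverted: Delta >= p_filesz
    // means the address lies past the bytes this segment has in the file.
    if (Delta < Phdr.FileSize)
      return Phdr.Offset + Delta;
  }
  return std::nullopt;
}
```

With segments covering only low addresses, a value such as 0x87654321 falls outside every segment and the lookup reports failure instead of producing a bogus offset.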
  1004. Convert some ObjC msgSends to runtime calls. It is faster to directly call the ObjC runtime for methods such as alloc/allocWithZone instead of sending a message to those functions. This patch adds support for converting messages to alloc/allocWithZone to their equivalent runtime calls. Tests included for the positive case of applying this transformation, negative tests that we ensure we only convert "alloc" to objc_alloc, not "alloc2", and also a driver test to ensure we enable this only for supported runtime versions. Reviewed By: rjmccall
  1005. Move diagnostic enums into Basic. Move enums from */*Diagnostic.h to Basic/Diagnostic*.h. Basic/AllDiagnostics.h needs all the enums and moving the sources to Basic prevents a Basic->*->Basic dependency loop. This also allows each Basic/Diagnostics* to have a header at Basic/Diagnostic*.h (except for Common). The old headers are kept in place since other packages are still using them.
  1006. Fix a typo in the strtoi test
  1007. Revert a chunk of previous change in sanitizer_platform_limits_netbsd.h Undefining INLINE breaks the build. The invalid change in this file has been overlooked in D55386.
  1008. Add interceptors for md5(3) from NetBSD Summary: MD5Init, MD5Update, MD5Final, MD5End, MD5File, MD5Data - calculates the RSA Data Security, Inc., "MD5" message digest. Add a dedicated test. Reviewers: vitalybuka, joerg Reviewed By: vitalybuka Subscribers: kubamracek, llvm-commits, mgorny, #sanitizers Tags: #sanitizers Differential Revision:
  1009. Add interceptors for the rmd160(3) from NetBSD Summary: RMD160Init, RMD160Update, RMD160Final, RMD160Transform, RMD160End, RMD160File, RMD160Data - calculates the ``RIPEMD-160'' message digest. Add a dedicated test for this API. Reviewers: vitalybuka, joerg Reviewed By: vitalybuka Subscribers: kubamracek, llvm-commits, mgorny, #sanitizers Tags: #sanitizers Differential Revision:
  1010. Add interceptors for the md4(3) from NetBSD Summary: MD4Init, MD4Update, MD4Final, MD4End, MD4File, MD4Data - calculates the RSA Data Security, Inc., "MD4" message digest. Add dedicated test. Reviewers: vitalybuka, joerg Reviewed By: vitalybuka Subscribers: kubamracek, llvm-commits, mgorny, #sanitizers Tags: #sanitizers Differential Revision:
  1011. Add interceptors for the sha1(3) from NetBSD Summary: Add interceptors for: - SHA1Init - SHA1Update - SHA1Final - SHA1Transform - SHA1End - SHA1File - SHA1FileChunk - SHA1Data Add a dedicated regression test for this API. Reviewers: vitalybuka, joerg Reviewed By: vitalybuka Subscribers: mgorny, llvm-commits, kubamracek, #sanitizers Tags: #sanitizers Differential Revision:
  1012. Stop tracking retain count of OSObject after escape to void * / other primitive types Escaping to void * / uint64_t / other non-OSObject * types should stop tracking, as such functions can have heterogeneous semantics depending on context, and cannot always be annotated. rdar://46439133 Differential Revision:
  1013. [sanitizer] Add lit.local.cfg for FreeBSD
  1014. [sanitizer] Suppress lint warning conflicting with clang-format
  1015. Fix style.
  1016. [gn build] Merge r348593
  1017. [SelectionDAG] Remove ISD::ADDC/ADDE from some undef handling code in getNode. NFCI These nodes should have two results. A real VT and a Glue. But this code would have returned Undef which would only be a single result. But we're in the single result version of getNode so these opcodes should never be seen by this function anyway.
  1018. Conflict fixes from previous commits.
  1019. [Sanitizer] capsicum api subset interception - For the moment a subset of this api dealing with file descriptors permissions and ioctls. Reviewers: vitalybuka, krytarowski Reviewed By: vitalybuka Differential Revision:
  1020. [gn build] Add build files for lib/CodeGen, lib/Transforms/..., and lib/Bitcode/Writer Differential Revision:
  1021. [Documentation] Alphabetical order in new checks list.
  1022. [tests] Fix the FileManagerTest getVirtualFile test on Windows Summary: The test passes on Windows only when it is executed on the C: drive. If the build and tests run on a different drive, the test is currently failing. Reviewers: kadircet, asmith Subscribers: cfe-commits Differential Revision:
  1023. Add interceptors for the strtoi(3)/strtou(3) from NetBSD Summary: strtoi/strtou converts string value to an intmax_t/uintmax_t integer. Add a dedicated test. Enable this API for NetBSD. It's a reworked version of the original work by Yang Zheng. Reviewers: joerg, vitalybuka Reviewed By: vitalybuka Subscribers: kubamracek, tomsun.0.7, mgorny, llvm-commits, #sanitizers Tags: #sanitizers Differential Revision:
  1024. [CUDA] Added missing 'inline' for functions defined in a header.
  1025. [X86] Remove the XFAILed test added in r348620 It seems to be unexpectedly passing on some bots probably because it requires asserts to fail, but doesn't say that. But we already have a patch in review to make it not xfail so I'd rather just focus on getting it passing rather than trying to figure out an unexpected pass.
  1026. Update a couple of vector<bool> tests that were testing libc++-specific behavior. Thanks to Andrey Maksimov for the catch.
  1027. Fix IOError exception being raised in ``crash when using `atos` symbolizer on Darwin when the binaries don't exist. For now we just produce an unsymbolicated stackframe when the binary doesn't exist.
  1028. AMDGPU: Fix offsets for < 4-byte aggregate kernel arguments We were still using the rounded down offset and alignment even though they aren't handled because you can't trivially bitcast the loaded value.
  1029. [GlobalISel] Add IR translation support for the @llvm.log10 intrinsic This adds IR translation support for @llvm.log10 and updates relevant tests.
  1030. Add new interceptors for statvfs1(2) and fstatvfs1(2) from NetBSD Summary: statvfs1, fstatvfs1 - get file system statistics. While there, use file descriptor related macros in the fstatvfs interceptor. Add a dedicated test. Reviewers: vitalybuka, joerg Reviewed By: vitalybuka Subscribers: dvyukov, kubamracek, mgorny, llvm-commits, #sanitizers Tags: #sanitizers Differential Revision:
  1031. [Hexagon] Fix post-ra expansion of PS_wselect
  1032. Add a new interceptor for fparseln(3) from NetBSD Summary: fparseln - returns the next logical line from a stream. Add a dedicated test for this API. Reviewers: vitalybuka, joerg Reviewed By: vitalybuka Subscribers: kubamracek, mgorny, llvm-commits, #sanitizers Tags: #sanitizers Differential Revision:
  1033. [libcxx] Remove the availability_markup LIT feature It is now equivalent to the 'availability' LIT feature, so there's no reason to keep both.
  1034. Add new interceptor for strtonum(3) Summary: strtonum(3) reliably converts string value to an integer. This function is used in OpenBSD compat namespace and is located inside NetBSD's libc. Add a dedicated test for this interface. It's a reworked version of the original code by Yang Zheng. Reviewers: joerg, vitalybuka Reviewed By: vitalybuka Subscribers: tomsun.0.7, kubamracek, llvm-commits, mgorny, #sanitizers Tags: #sanitizers Differential Revision:
  1035. [ModuleSummary] use StringRefs to avoid a redundant copy; NFC `Saver` is a StringSaver, which has a few overloads of `save` that all ultimately just call `StringRef save(StringRef)`. Just take a StringRef here instead of building up a std::string to convert it to a StringRef.
  1036. Fix unused variable warning. NFCI.
  1037. [WebAssembly] clang-format/clang-tidy AsmParser (NFC) Summary: - LLVM clang-format style doesn't allow one-line ifs. - LLVM clang-tidy style says method names should start with a lowercase letter. But currently WebAssemblyAsmParser's parent class MCTargetAsmParser is mixing lowercase and uppercase method names itself so overridden methods cannot be renamed now. - Changed else ifs after returns to ifs. - Added some newlines for readability. Reviewers: aardappel, sbc100 Subscribers: dschuff, jgravelle-google, sunfish, llvm-commits Differential Revision:
  1038. Delete registerScope function `unregisterScope()` is not currently used, so removing it.
  1039. Follow-up from r348441 to add the rest of the objc ARC intrinsics. This adds the other intrinsics used by ARC and codegen's them to their respective runtime methods.
  1040. [MemCpyOpt] memset->memcpy forwarding with undef tail Currently memcpyopt optimizes cases like memset(a, byte, N); memcpy(b, a, M); to memset(a, byte, N); memset(b, byte, M); if M <= N. Often this allows further simplifications down the line, which drop the first memset entirely. This patch extends this optimization for the case where M > N, but we know that the bytes a[N..M] are undef due to alloca/lifetime.start. This situation arises relatively often for Rust code, because Rust does not initialize trailing structure padding and loves to insert redundant memcpys. This also fixes For the implementation, I'm reusing a bit of code for a similar existing optimization (direct memcpy of undef). I've also added memset support to MemDepAnalysis GetLocation -- Instead, getPointerDependencyFrom could be used, but it seems to make more sense to add this to GetLocation and thus make the computation cachable. Differential Revision:
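The reasoning in entry 1040 can be illustrated with a small model. This is a minimal sketch, not the pass itself: buffers are Python lists, `None` stands in for an undef byte, and the helpers `memset`, `memcpy`, and `refines` are hypothetical names introduced only for this illustration.

```python
UNDEF = None  # stand-in for an undef byte (an assumption of this sketch)

def memset(buf, byte, n):
    # Fill the first n bytes of buf with byte.
    buf[:n] = [byte] * n

def memcpy(dst, src, m):
    # Copy the first m bytes of src into dst.
    dst[:m] = src[:m]

def refines(actual, expected):
    # A result is acceptable if every defined byte matches;
    # a byte that was undef may take any value.
    return all(e is UNDEF or a == e for a, e in zip(actual, expected))

N, M, BYTE = 4, 6, 0xAB  # M > N, the case the patch adds

# Original sequence: memset(a, byte, N); memcpy(b, a, M), with a[N..M] undef.
a = [UNDEF] * 8  # models a fresh alloca: all bytes undef
b = [0] * 8
memset(a, BYTE, N)
memcpy(b, a, M)
before = list(b)

# Transformed sequence: memset(a, byte, N); memset(b, byte, M).
a2 = [UNDEF] * 8
b2 = [0] * 8
memset(a2, BYTE, N)
memset(b2, BYTE, M)
after = list(b2)

forwarding_is_legal = refines(after, before)
```

In this model the transformed sequence only differs from the original in positions that were undef, which is why the forwarding is acceptable when the tail of `a` is known to be uninitialized.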
  1041. [MemCpyOpt] Add tests for memset->memcpy forwarding with undef tail; NFC These are baseline tests for D55120.
  1042. AMDGPU: Use gfx9 instead of gfx8 in a test They are the same for the purposes of the tests, but it's much easier to write check lines for the memory instructions with offsets.
  1043. [Preprocessor] Don't avoid entering included files after hitting a fatal error. Change in r337953 violated the contract for `CXTranslationUnit_KeepGoing`: > Do not stop processing when fatal errors are encountered. Use different approach to fix long processing times with multiple inclusion cycles. Instead of stopping preprocessing for fatal errors, do this after reaching the max allowed include depth and only for the files that were processed already. It is likely but not guaranteed those files cause a cycle. rdar://problem/46108547 Reviewers: erik.pilkington, arphaman Reviewed By: erik.pilkington Subscribers: jkorous, dexonsmith, ilya-biryukov, Dmitry.Kozhevnikov Differential Revision:
  1044. [HotColdSplitting] Refine definition of unlikelyExecuted The splitting pass uses its 'unlikelyExecuted' predicate to statically decide which blocks are cold. - Do not treat noreturn calls as if they are cold unless they are actually marked cold. This is motivated by functions like exit() and longjmp(), which are not beneficial to outline. - Do not treat inline asm as an outlining barrier. In practice asm("") is frequently used to inhibit basic block merging; enabling outlining in this case results in substantial memory savings. - Treat invokes of cold functions as cold. As a drive-by, remove the 'exceptionHandlingFunctions' predicate, because it's no longer needed. The pass can identify & outline blocks dominated by EH pads, so there's no need to special-case __cxa_begin_catch etc. Differential Revision:
  1045. [HotColdSplitting] Outline more than once per function Algorithm: Identify maximal cold regions and put them in a worklist. If a candidate region overlaps with another, discard it. While the worklist is non-empty, remove a single-entry sub-region from the worklist and attempt to outline it. By the non-overlap property, this should not invalidate parts of the domtree pertaining to other outlining regions. Testing: LNT results on X86 are clean. With test-suite + externals, llvm outlines 134KB pre-patch, and 352KB post-patch (+ ~2.6x). The file 483.xalancbmk/src/Constants.cpp stands out as an extreme case where llvm outlines over 100 times in some functions (mostly EH paths). There was not a significant performance impact pre vs. post-patch. Differential Revision:
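The worklist scheme in entry 1045 can be sketched roughly as follows. This is an illustrative model only: regions are plain sets of block names, and `select_non_overlapping`, `outline_all`, and the `try_outline` callback are hypothetical names, not the pass's real interfaces.

```python
def select_non_overlapping(candidates):
    # Keep candidate regions in order, discarding any region that
    # shares a block with one that was already accepted.
    worklist, claimed = [], set()
    for region in candidates:
        if claimed.isdisjoint(region):
            worklist.append(region)
            claimed |= region
    return worklist

def outline_all(candidates, try_outline):
    # Drain the worklist, attempting to outline one region at a time.
    worklist = select_non_overlapping(candidates)
    outlined = []
    while worklist:
        region = worklist.pop()
        if try_outline(region):
            outlined.append(region)
    return outlined

# Three cold candidates; the second overlaps the first on "bb4".
candidates = [{"bb3", "bb4"}, {"bb4", "bb5"}, {"bb7"}]
result = outline_all(candidates, lambda region: True)
```

Because accepted regions never share blocks, outlining one of them leaves the blocks of the others untouched, which is the non-overlap property the commit message relies on.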
  1046. [analyzer] Move out tracking retain count for OSObjects into a separate checker Allow enabling and disabling tracking of ObjC/CF objects separately from tracking of OS objects. Differential Revision:
  1047. [analyzer] RetainCountChecker: remove untested, unused, incorrect option IncludeAllocationLine The option has no tests, is not used anywhere, and is actually incorrect: it prints the line number without the reference to a file, which can be outright incorrect. Differential Revision:
  1048. Missing freebsd files. A lib/sanitizer_common/ A lib/sanitizer_common/sanitizer_platform_limits_freebsd.h
  1049. [Sanitizer] Separate FreeBSD interception data structures Reviewers: vitalybuka, krytarowski Reviewed By: krytarowski Differential Revision:
  1050. [clang-tidy]: Abseil: new check 'abseil-upgrade-duration-conversions' Patch by Alex Strelnikov. Reviewed as D53830 Introduce a new check to upgrade user code based on upcoming API breaking changes to absl::Duration. The check finds calls to arithmetic operators and factory functions for absl::Duration that rely on an implicit user defined conversion to int64_t. These cases will no longer compile after proposed changes are released. Suggested fixes explicitly cast the argument to int64_t.
  1051. Update the Swift version numbers reported by objdump Summary: Add Swift 4.1, Swift 4.2, and Swift 5 version numbers to objdump's MachODump's print_image_info routines. rdar://46548425 Reviewers: pete, lhames, bob.wilson Reviewed By: pete, bob.wilson Subscribers: bob.wilson, llvm-commits Differential Revision:
  1052. [NativePDB] Reconstruct function declarations from debug info. Previously we would create an lldb::Function object for each function parsed, but we would not add these to the clang AST. This is a first step towards getting local variable support working, as we first need an AST decl so that when we create local variable entries, they have the proper DeclContext. Differential Revision:
  1053. [llvm-tapi] Don't try to override SequenceTraits for std::string For some reason this doesn't seem to work with LLVM_LINK_LLVM_DYLIB build. See What is more it seems that overriding these traits for core types (including std::string) is not supported/recommended by YAMLTraits.h. See line 1918 which has the assertion: "only use LLVM_YAML_IS_SEQUENCE_VECTOR for types you control" Differential Revision:
  1054. [DAGCombiner] split trunc from extend in hoistLogicOpWithSameOpcodeHands; NFC This duplicates several shared checks, but we need to split this up to fix underlying bugs in smaller steps.
  1055. [X86] Replace instregex with instrs list. NFCI.
  1056. AMDGPU: Allow f32 types for llvm.amdgcn.s.buffer.load
  1057. [llvm-mca][x86] Add RDSEED instruction resource tests for GLM
  1058. [llvm-mca][x86] Add missing AES instruction resource tests Add missing non-VEX instructions
  1059. [llvm-mca][x86] Add RDRAND/RDSEED instruction resource tests
  1060. [CostModel][X86] Fix overcounting arithmetic cost in illegal types in getArithmeticReductionCost/getMinMaxReductionCost We were overcounting the number of arithmetic operations needed at each level before we reach a legal type. We were using the full vector type for that level, but we are going to split the input vector at that level in half. So the effective arithmetic operation cost at that level is half the width. For example, take 8i32 on an SSE target. We were calculating the cost of an 8i32 op, which is likely 2 for basic integer. Then after the loop we count 2 more v4i32 ops, for a total arith cost of 4. But if you look at the assembly there would only be 3 arithmetic ops. There are still more bugs in this code that I'm going to work on next. The non pairwise code shouldn't count extract subvectors in the loop. There are no extracts, the types are split in registers. For pairwise we need to use 2 two-src permute shuffles. Differential Revision:
  1061. [X86] Initialize and Register X86CondBrFoldingPass This allows X86CondBrFoldingPass to be run with the --run-pass option, which makes it possible to test a wrong assertion in the analyzeCompare function for SUB32ri when its operand is not an immediate. Patch by Jianping Chen Differential Revision:
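The 8i32 arithmetic in entry 1060 can be reproduced with a toy count. This is only a sketch under stated assumptions: each legal-width op costs 1, lane counts are powers of two, and `reduction_arith_ops` is a hypothetical helper, not the cost-model function named in the commit.

```python
import math

def reduction_arith_ops(num_elts, legal_elts, count_full_width):
    # Count legal-width arithmetic ops for a tree reduction of num_elts lanes.
    # count_full_width=True mimics the overcounting described in the entry.
    ops = 0
    width = num_elts
    while width > legal_elts:
        # After splitting, the op at this level works on half the lanes.
        op_width = width if count_full_width else width // 2
        ops += math.ceil(op_width / legal_elts)
        width //= 2
    # Once the type is legal, one op per halving step remains.
    ops += int(math.log2(legal_elts))
    return ops

old_cost = reduction_arith_ops(8, 4, count_full_width=True)
new_cost = reduction_arith_ops(8, 4, count_full_width=False)
```

With 8 lanes and a legal width of 4, the full-width count gives 2 + 2 = 4 while the half-width count gives 1 + 2 = 3, matching the numbers quoted in the commit message.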
  1062. AMDGPU: Remove
  1063. Make testcase more robust for bots actually building in /var
  1064. [X86] Improve pfm counter coverage for llvm-exegesis This patch attempts to improve pfm perf counter coverage for all the x86 CPUs that libpfm4 supports. Intel/AMD CPU families tend to share names for cycle/uops counters so even if they don't have a scheduler model yet they can at least use the default values (checked against the libpfm4 source code). The remaining CPUs (where their port/pipe resource counters are known) I've tried to add to the existing model mappings. These are untested but don't represent a regression to current llvm-exegesis behaviour for these CPUs. Differential Revision:
  1065. AMDGPU: Remove llvm.SI.buffer.load.dword
  1066. AMDGPU: Remove llvm.AMDGPU.kill This is the last of the old AMDGPU intrinsics.
  1067. [CTU] test/Analysis/ctu-main.cpp Attempt to fix failing windows bot
  1068. Adding an AST dump test for statement expressions; NFC.
  1069. Make testcase more robust for completely-out-of-tree builds. Thanks to Dave Zarzycki for reporting this!
  1070. [libcxx] Add paranoid cast-to-void in comma operator
  1071. [CTU] Add triple/lang mismatch handling Summary: We introduce a strict policy for C++ CTU. It can work across TUs only if the C++ dialects are the same. Nor do we allow C vs C++ CTU. We do this because the same constructs might be represented with different properties in the corresponding AST nodes or even the nodes might be completely different (a struct will be RecordDecl in C, but it will be a CXXRecordDecl in C++, thus it may cause certain assertions during cast operations). Reviewers: xazax.hun, a_sidorin Subscribers: rnkovacs, dkrupp, Szelethus, gamesh411, cfe-commits Differential Revision:
  1072. [CTU] test/Analysis/ctu-main.cpp Attempt to fix failing windows bot
  1073. [CTU] Add more lit tests and better error handling Summary: Adding some more CTU lit tests. E.g. to check if a construct is unsupported. We also slightly modify the handling of the return value of the `Import` function from ASTImporter. Reviewers: xazax.hun, balazske, a_sidorin Subscribers: rnkovacs, dkrupp, Szelethus, gamesh411, cfe-commits Differential Revision:
  1074. [DAGCombiner] disable truncation of binops by default As discussed in the post-commit thread of r347917, this transform is fighting with an existing transform causing an infinite loop or out-of-memory, so this is effectively reverting r347917 and its follow-up r348195 while we investigate the bug.
  1075. [unittests] Add C++17 and C++2a support to the tooling tests
  1076. Reapply "[DemandedBits][BDCE] Support vectors of integers" DemandedBits and BDCE currently only support scalar integers. This patch extends them to also handle vector integer operations. In this case bits are not tracked for individual vector elements, instead a bit is demanded if it is demanded for any of the elements. This matches the behavior of computeKnownBits in ValueTracking and SimplifyDemandedBits in InstCombine. Unlike the previous iteration of this patch, getDemandedBits() can now again be called on arbitrary (sized) instructions, even if they don't have integer or vector of integer type. (For vector types the size of the returned mask will now be the scalar size in bits though.) The added LoopVectorize test case shows a case which triggered an assertion failure with the previous attempt, because getDemandedBits() was called on a pointer-typed instruction. Differential Revision:
  1077. [AMDGPU] Shrink scalar AND, OR, XOR instructions This change attempts to shrink scalar AND, OR and XOR instructions which take an immediate that isn't inlineable. It performs: AND s0, s0, ~(1 << n) -> BITSET0 s0, n; OR s0, s0, (1 << n) -> BITSET1 s0, n; AND s0, s1, x -> ANDN2 s0, s1, ~x; OR s0, s1, x -> ORN2 s0, s1, ~x; XOR s0, s1, x -> XNOR s0, s1, ~x. In particular, this catches setting and clearing the sign bit for fabs (and x, 0x7fffffff -> bitset0 x, 31; or x, 0x80000000 -> bitset1 x, 31).
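The per-vector rule in entry 1076 ("a bit is demanded if it is demanded for any of the elements") amounts to OR-ing the per-element masks. A minimal sketch, with `demanded_bits_for_vector` as a hypothetical helper name and integers standing in for bit masks:

```python
def demanded_bits_for_vector(per_element_masks):
    # One scalar-sized mask for the whole vector: a bit is set if any
    # element demands it.
    mask = 0
    for element_mask in per_element_masks:
        mask |= element_mask
    return mask

# Four lanes with different demanded bits; the third demands nothing.
masks = [0x00FF, 0x0F00, 0x0000, 0x0001]
combined = demanded_bits_for_vector(masks)
```

The result is conservative: a bit that only one lane needs is treated as demanded for every lane, which keeps the tracked state at scalar size.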
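The rewrites in entry 1077 rest on simple 32-bit identities, which can be checked with a small model. This is a sketch under assumptions: the helpers below model the operand semantics as "second source is inverted" for ANDN2/ORN2 and "result is inverted" for XNOR, and are not taken from the backend.

```python
MASK32 = 0xFFFFFFFF

def not32(x):
    return ~x & MASK32

def bitset0(x, n):
    # Clear bit n by flipping it only when it is currently set.
    return x ^ (1 << n) if x & (1 << n) else x

def bitset1(x, n):
    # Set bit n by flipping it only when it is currently clear.
    return x if x & (1 << n) else x ^ (1 << n)

def andn2(a, b):
    return a & not32(b)

def orn2(a, b):
    return (a | not32(b)) & MASK32

def xnor(a, b):
    return not32(a ^ b)

samples = [0x0, 0x1, 0x80000000, 0x7FFFFFFF, 0xDEADBEEF, 0xFFFFFFFF]

# AND/OR/XOR with x equal the N2/XNOR forms applied to ~x.
inverted_forms_hold = all(
    (s & x) == andn2(s, not32(x))
    and (s | x) == orn2(s, not32(x))
    and (s ^ x) == xnor(s, not32(x))
    for s in samples for x in samples
)

# AND with ~(1 << n) and OR with (1 << n) equal the single-bit forms.
bitset_forms_hold = all(
    (s & not32(1 << n)) == bitset0(s, n) and (s | (1 << n)) == bitset1(s, n)
    for s in samples for n in range(32)
)

# fabs on the bit pattern of -1.0f clears the sign bit, giving 1.0f.
fabs_bits = bitset0(0xBF800000, 31)
```

The payoff of each rewrite is that the replacement operand (`n` or `~x`) can be an inline constant where the original immediate could not.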
  1078. Make CPUDispatch resolver emit dependent functions. Inline cpu_specific versions referenced before the cpu_dispatch function weren't properly emitted, since they hadn't been referred to. This patch ensures that during resolver generation that all appropriate versions are emitted. Change-Id: I94c3766aaf9c75ca07a0ad8258efdbb834654ff8
  1079. Add an explicit triple to this test to prevent failures due to size_t differences.
  1080. Fix spelling of WINDOWS in a test Change-Id: I232515655359f14308e1c5509c4b7db96d1fafcb
  1081. [DAGCombiner] remove explicit calls to AddToWorkList; NFCI As noted in the post-commit thread for rL347917: ...we don't need to repeat these calls because the combiner does it automatically.
  1082. Adding tests for -ast-dump; NFC. This adds tests for various statements in C++ that are not covered by C.
  1083. Revert "Multiversioning- Ensure all MV functions are emitted." This reverts commit 65df29f9318ac13a633c0ce13b2b0bccf06e79ca. As suggested by @rsmith here: I'm reverting this and solving the initial problem in a different way.
  1084. [CTU] Add DisplayCTUProgress analyzer switch Summary: With a new switch we may be able to print to stderr if a new TU is being loaded during CTU. This is very important for higher level scripts (like CodeChecker) to be able to parse this output so they can create e.g. a zip file in case of a Clang crash which contains all the related TU files. Reviewers: xazax.hun, Szelethus, a_sidorin, george.karpenkov Subscribers: whisperity, baloghadamsoftware, szepet, rnkovacs, a.sidorin, mikhail.ramalho, donat.nagy, dkrupp, Differential Revision:
  1085. Introduce llvm.experimental.widenable_condition intrinsic This patch introduces a new intrinsic `@llvm.experimental.widenable.condition` that allows explicit representation for guards. It is an alternative to using `@llvm.experimental.guard` intrinsic that does not contain implicit control flow. We keep finding places where `@llvm.experimental.guard` is not supported or treated too conservatively, and there are two reasons for that: - `@llvm.experimental.guard` has memory write side effect to model implicit control flow, and this sometimes confuses passes and analyses that work with memory; - Not all passes and analyses are aware of the semantics of guards. These passes treat them as regular throwing call and have no idea that the condition of guard may be used to prove something. One well-known place which had caused us troubles in the past is explicit loop iteration count calculation in SCEV. Another example is new loop unswitching which is not aware of guards. Whenever a new pass appears, we potentially have this problem there. Rather than go and fix all these places (and commit to keep track of them and add support in future), it seems more reasonable to leverage the existing optimizer's logic as much as possible. The only significant difference between guards and regular explicit branches is that guard's condition can be widened. It means that a guard contains (explicitly or implicitly) a `deopt` block successor, and it is always legal to go there no matter what the guard condition is. The other successor is a guarded block, and it is only legal to go there if the condition is true. This patch introduces a new explicit form of guards alternative to `@llvm.experimental.guard` intrinsic. 
Now a widenable guard can be represented in the CFG explicitly like this: %widenable_condition = call i1 @llvm.experimental.widenable.condition() %new_condition = and i1 %cond, %widenable_condition br i1 %new_condition, label %guarded, label %deopt guarded: ; Guarded instructions deopt: call type @llvm.experimental.deoptimize(<args...>) [ "deopt"(<deopt_args...>) ] The new intrinsic `@llvm.experimental.widenable.condition` has semantics of an `undef`, but the intrinsic prevents the optimizer from folding it early. This form should exploit all optimization boons provided to `br` instruction, and it still can be widened by replacing the result of `@llvm.experimental.widenable.condition()` with `and` with any arbitrary boolean value (as long as the branch that is taken when it is `false` has a deopt and has no side-effects). For more motivation, please check llvm-dev discussion "[llvm-dev] Giving up using implicit control flow in guards". This patch introduces this new intrinsic with respective LangRef changes and a pass that converts old-style guards (expressed as intrinsics) into the new form. The naming discussion is still ongoing. Merging this to unblock further items. We can later change the name of this intrinsic. Reviewed By: reames, fedor.sergeev, sanjoy Differential Revision:
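The safety argument for the explicit guard form in entry 1085 can be modelled with plain booleans. This is a toy sketch, not LLVM semantics in full: `guarded_branch` is a hypothetical function mirroring the `and` plus `br` pattern shown in the entry, and the widenable condition is treated as an arbitrary boolean.

```python
def guarded_branch(cond, widenable_condition):
    # Mirrors: %new_condition = and i1 %cond, %widenable_condition
    #          br i1 %new_condition, label %guarded, label %deopt
    new_condition = cond and widenable_condition
    return "guarded" if new_condition else "deopt"

# Enumerate every combination of the real condition and the widened value.
all_cases = [(c, w) for c in (False, True) for w in (False, True)]

# The guarded block is only ever reached when the original condition holds.
guarded_implies_cond = all(
    c for c, w in all_cases if guarded_branch(c, w) == "guarded"
)

# Widening can only redirect executions to deopt, never the other way.
widening_only_adds_deopts = all(
    guarded_branch(c, False) == "deopt" for c in (False, True)
)
```

Whatever value replaces the widenable condition, the guarded block stays protected by `%cond`; the only observable effect of widening is that more executions take the always-legal deopt path.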
  1086. ARM: use correct offset from base pointer (r6) in call frame regions. When we had dynamic call frames (i.e. sp adjustment around each call) we were including that adjustment into offsets calculated based on r6, even though it's only sp that changes. This led to incorrect stack slot accesses.
  1087. [CodeComplete] Fix assertion failure Summary: ...that fires when running completion inside an argument of UnresolvedMemberExpr (see the added test). The assertion that fires is from Sema::TryObjectArgumentInitialization: assert(FromClassification.isLValue()); This happens because Sema::AddFunctionCandidates does not account for object types which are pointers. It ends up classifying them incorrectly. All usages of the function outside code completion are used to run overload resolution for operators. In those cases the object type being passed is always a non-pointer type, so it's not surprising the function did not expect a pointer in the object argument. However, code completion reuses the same function and calls it with the object argument coming from UnresolvedMemberExpr, which can be a pointer if the member expr is an arrow ('->') access. Extending AddFunctionCandidates to allow pointer object types does not seem too crazy since all the functions down the call chain can properly handle pointer object types if we properly classify the object argument as an l-value, i.e. the classification of the implicitly dereferenced pointer. Reviewers: kadircet Reviewed By: kadircet Subscribers: cfe-commits Differential Revision:
  1088. [unittests] Merge the PrintedStmtCXX..Matches functions (NFC) This was reviewed as part of
  1089. Adding tests for -ast-dump; NFC. This adds tests for various statements in C.
  1090. [CTU] Eliminate race condition in CTU lit tests Summary: We plan to introduce additional CTU related lit test. Since lit may run the tests in parallel, it is not safe to use the same directory (%T) for these tests. It is safe to use however test case specific directories (%t). Reviewers: xazax.hun, a_sidorin Subscribers: rnkovacs, dkrupp, Szelethus, gamesh411, cfe-commits Differential Revision:
  1091. [CTU] Add asserts to protect invariants Reviewers: xazax.hun, a_sidorin Subscribers: rnkovacs, dkrupp, Szelethus, gamesh411, cfe-commits Differential Revision:
  1092. [Targets] Add errors for tiny and kernel codemodel on targets that don't support them Adds fatal errors for any target that does not support the Tiny or Kernel codemodels by rejigging the getEffectiveCodeModel calls. Differential Revision:
  1093. [CTU] Add statistics Reviewers: xazax.hun, a_sidorin Subscribers: rnkovacs, dkrupp, Szelethus, gamesh411, cfe-commits Differential Revision:
  1094. [clang-tidy] Remove duplicated getText implementation, NFC
  1095. Add a AArch64 triple to tiny codemodel test. Most other targets do not support the tiny code model.
  1096. Fix gcc7.3 -Wparentheses warning. NFCI.
  1097. [yaml2obj] format some codes NFC. Summary: This line is longer than 80 characters. Subscribers: llvm-commits, jakehehrlich Differential Revision:
  1098. [yaml2obj] revert bad change
  1099. [yaml2obj] format some codes NFC. Summary: This line is longer than 80 characters. Subscribers: llvm-commits, jakehehrlich Differential Revision:
  1100. Fix test/tools/llvm-mca/AArch64/Exynos/direct-branch.s on Mac It was failing as below. Adding a triple seems to help. -- : 'RUN: at line 2';   /work/llvm.combined/build.release/bin/llvm-mca -march=aarch64 -mcpu=exynos-m1 -resource-pressure=false < /work/llvm.combined/llvm/test/tools/llvm-mca/AArch64/Exynos/direct-branch.s | /work/llvm.combined/build.release/bin/FileCheck /work/llvm.combined/llvm/test/tools/llvm-mca/AArch64/Exynos/direct-branch.s -check-prefixes=ALL,M1 : 'RUN: at line 3';   /work/llvm.combined/build.release/bin/llvm-mca -march=aarch64 -mcpu=exynos-m3 -resource-pressure=false < /work/llvm.combined/llvm/test/tools/llvm-mca/AArch64/Exynos/direct-branch.s | /work/llvm.combined/build.release/bin/FileCheck /work/llvm.combined/llvm/test/tools/llvm-mca/AArch64/Exynos/direct-branch.s -check-prefixes=ALL,M3 -- Exit Code: 1 Command Output (stderr): -- /work/llvm.combined/llvm/test/tools/llvm-mca/AArch64/Exynos/direct-branch.s:36:12: error: M1-NEXT: expected string not found in input            ^ <stdin>:21:2: note: scanning from here  1 0 0.25 b Ltmp0  ^ --
  1101. [utils] Use operator "in" instead of bound function "has_key" has_key has been removed in Python 3. The in comparison operator can be used instead. Differential Revision:
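The replacement described in entry 1101 looks like this in practice. The dictionary and variable names are made up for illustration.

```python
counts = {"llvm": 3, "clang": 1}

# Python 2 only (the method no longer exists in Python 3):
#     counts.has_key("llvm")
# Portable form that works in both Python 2 and Python 3:
has_llvm = "llvm" in counts
has_lld = "lld" in counts

# The negated test is spelled with "not in".
missing_lld = "lld" not in counts
```

The `in` operator tests keys for dictionaries, so it is a drop-in replacement for `has_key` on both Python versions.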
  1102. [X86] Add ivybridge to llvm-exegesis PFM counter mappings
  1103. [SelectionDAG] Don't pass on DemandedElts when handling SCALAR_TO_VECTOR Fixes an assertion: llc: lib/CodeGen/SelectionDAG/SelectionDAG.cpp:2200: llvm::KnownBits llvm::SelectionDAG::computeKnownBits(llvm::SDValue, const llvm::APInt&, unsigned int) const: Assertion `(!Op.getValueType().isVector() || NumElts == Op.getValueType().getVectorNumElements()) && "Unexpected vector size"' failed. Committed on behalf of: @pendingchaos (Rhys Perry) Differential Revision:
  1104. [CMake] Add support for NO_INSTALL_RPATH argument in llvm_add_library() Summary: Allow clients to suppress setup of default RPATHs in designated library targets. This is used in LLDB when emitting liblldb as a framework bundle, which itself doesn't load further RPATH-dependent libraries. This follows the approach in add_llvm_executable(). Reviewers: aprantl, JDevlieghere, davide, friss Reviewed By: aprantl Subscribers: mgorny, lldb-commits, llvm-commits, #lldb Differential Revision:
  1105. [PowerPC] VSX register support for inline assembly Summary: The patch is to add the VSX register support for inline assembly. After this patch, we can use VSX register in inline assembly clobber list without error. Reviewed By: jsji, nemanjai Differential Revision:
  1106. [IR] Don't assume all functions are 4 byte aligned In some cases different alignments for function might be used to save space e.g. thumb mode with -Oz will try to use 2 byte function alignment. Similar patch that fixed this in other areas exists here This was approved previously (r348215) but when committed it caused failures on the sanitizer buildbots when building llvm with clang (containing this patch). This is now fixed because I've added a check to see if getting the parent module returns null; if it does, the alignment is set to 0. Differential Revision:
  1107. [PM] Port LoadStoreVectorizer to the new pass manager. Differential Revision:
  1108. Fix thunks returning memptrs via sret by emitting also scalar return values directly in sret slot (PR39901) Thunks that return member pointers via sret are broken due to using temporary storage for the return value on the stack and then passing that pointer to a tail call, violating the rule that a tail call can't access allocas in the caller (see bug). Since r90526, we put aggregate return values directly in the sret slot, but this doesn't apply to member pointers which are considered scalar. Unless I'm missing something subtle, we should be able to always use the sret slot directly for indirect return values. Differential revision:
  1109. [XRay] Use preallocated memory for XRay profiling Summary: This change builds upon D54989, which removes memory allocation from the critical path of the profiling implementation. This also changes the API for the profile collection service, to take ownership of the memory and associated data structures per-thread. The consolidation of the memory allocation allows us to do two things: - Limits the amount of memory used by the profiling implementation, associating preallocated buffers instead of allocating memory on-demand. - Consolidate the memory initialisation and cleanup by relying on the buffer queue's reference counting implementation. We find a number of places which also display some problematic behaviour, including: - Off-by-factor bug in the allocator implementation. - Unrolling semantics in cases of "memory exhausted" situations, when managing the state of the function call trie. We also add a few test cases which verify our understanding of the behaviour of the system, with important edge-cases (especially for memory-exhausted cases) in the segmented array and profile collector unit tests. Depends on D54989. Reviewers: mboerger Subscribers: dschuff, mgorny, dmgreen, jfb, llvm-commits Differential Revision:
  1110. [LoopSimplifyCFG] Do not deal with loops with irreducible CFG inside The current algorithm that collects live/dead/inloop blocks relies on some invariants related to RPO and PO traversals. In particular, the important fact it requires is that the only loop's latch is the first block in PO traversal. It also relies on the fact that during RPO we visit all predecessors of a block before we visit this block (backedges ignored). If a loop has irreducible non-loop cycle inside, both these assumptions may break. This patch adds detection for this situation and prohibits the terminator folding for loops with irreducible CFG. We can in theory support this later, for this some algorithmic changes are needed. Besides, irreducible CFG is not a frequent situation, so we can simply not bother. Thanks @uabelho for finding this! Differential Revision: Reviewed By: skatkov
  1111. [PowerPC] Fix assert from machine verify pass about missing undef register flag Fix assert about using an undefined physical register in machine instruction verify pass. The reason is that register flag undef is missing when doing transformation from If Conversion Pass. ``` Bad machine code: Using an undefined physical register - function: func_65 - basic block: %bb.0 entry (0x10024740738) - instruction: BCLR killed $cr5lt, implicit $lr8, implicit $rm, implicit undef $x3 - operand 0: killed $cr5lt LLVM ERROR: Found 1 machine code errors. ``` There are also other existing testcases with the same issue, so I added the -verify-machineinstrs option to enable verification. Differential Revision:
  1112. [llvm-mca] Improve test (NFC) Add more instructions to the test for Cortex.
  1113. [llvm-mca] Improve test (NFC) Add a label to make explicit that the branch is short for Exynos.
  1114. Re-land "[XRay] Move-only Allocator, FunctionCallTrie, and Array" This reverts commit r348455, with some additional changes: - Work-around deficiency of gcc-4.8 by duplicating the implementation of `AppendEmplace` in `Append`, but instead of using brace-init for the copy construction, use a placement new explicitly calling the copy constructor.
  1115. [CodeExtractor] Store outputs at the first valid insertion point When CodeExtractor outlines values which are used by the original function, it must store those values in some in-out parameter. This store instruction must not be inserted in between a PHI and an EH pad instruction, as that results in invalid IR. This fixes the following verifier failure seen while outlining within ObjC methods with live exit values: The unwind destination does not have an exception handling instruction! %call35 = invoke i8* bitcast (i8* (i8*, i8*, ...)* @objc_msgSend to i8* (i8*, i8*)*)(i8* %exn.adjusted, i8* %1) to label %invoke.cont34 unwind label %lpad33, !dbg !4183 The unwind destination does not have an exception handling instruction! invoke void @objc_exception_throw(i8* %call35) #12 to label %invoke.cont36 unwind label %lpad33, !dbg !4184 LandingPadInst not the first non-PHI instruction in the block. %3 = landingpad { i8*, i32 } catch i8* null, !dbg !1411 rdar://46540815
  1116. Revert "[llvm-tapi] Don't override SequenceTraits for std::string" Revert r348551 since it triggered some warnings that don't appear to have a quick fix.
  1117. Revert "[DemandedBits][BDCE] Support vectors of integers" This reverts commit r348549. Causing assertion failures during clang build.
  1118. Add test for InitListExpr
  1119. [DAGCombiner] use root SDLoc for all nodes created by logic fold If this is not a valid way to assign an SDLoc, then we get this wrong all over SDAG. I don't know enough about the SDAG to explain this. IIUC, theoretically, debug info is not supposed to affect codegen. But here it has clearly affected 3 different targets, and the x86 change is an actual improvement.
  1120. [llvm-tapi] Don't override SequenceTraits for std::string Change the ELF YAML implementation of TextAPI so NeededLibs uses flow sequence vector correctly instead of overriding the YAML implementation for std::vector<std::string>. This should fix the test failure with the LLVM_LINK_LLVM_DYLIB build mentioned in D55381. Still passes existing tests that cover this. Differential Revision:
  1121. [DAGCombiner] don't bother saving a SDLoc for a node that's dead; NFCI We shouldn't care about the debug location for a node that we're creating, but attaching the root of the pattern should be the best effort. (If this is not true, then we are doing it wrong all over the SDAG). This is no-functional-change-intended, and there are no regression test diffs...and that's what I expected. But there's a similar line above this diff, where those assumptions apparently do not hold.
  1122. [DemandedBits][BDCE] Support vectors of integers DemandedBits and BDCE currently only support scalar integers. This patch extends them to also handle vector integer operations. In this case bits are not tracked for individual vector elements, instead a bit is demanded if it is demanded for any of the elements. This matches the behavior of computeKnownBits in ValueTracking and SimplifyDemandedBits in InstCombine. The getDemandedBits() method can now only be called on instructions that have integer or vector of integer type. Previously it could be called on any sized instruction (even if it was not particularly useful). The size of the return value is now always the scalar size in bits (while previously it was the type size in bits). Differential Revision:
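The vector rule in entry 1122 (a bit is demanded if any element demands it, and the result always has the scalar width) can be modelled outside LLVM as a bitwise OR over per-element masks. The Python sketch below is only an illustration under that assumption: `vector_demanded_bits` and its parameters are hypothetical names, not LLVM's API.

```python
from functools import reduce


def vector_demanded_bits(per_element_masks, scalar_bits):
    """Merge per-element demanded-bit masks into one scalar-width mask.

    Hypothetical model: a bit is demanded in the result if it is demanded
    for at least one vector element. Elements are not tracked individually.
    """
    full = (1 << scalar_bits) - 1  # mask covering the scalar width
    return reduce(lambda acc, mask: acc | (mask & full), per_element_masks, 0)


# Example: a <3 x i8> value whose elements demand different bits.
merged = vector_demanded_bits([0x0F, 0xF0, 0x01], scalar_bits=8)
```

The returned mask never exceeds `scalar_bits` bits, which mirrors the statement that the size of the return value is the scalar size rather than the full type size.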
  1123. [BDCE] Add tests for BDCE applied to vector instructions; NFC These are baseline tests for D55297.
  1124. [DAGCombiner] more clean up in hoistLogicOpWithSameOpcodeHands(); NFC This code can still misbehave.
  1125. NFC: Move VisitExpr code to dumpStmt Summary: The call is duplicated in the handlers of all Expr subclasses. This change makes it easy to split statement handling out to TextNodeDumper. Reviewers: aaron.ballman Subscribers: cfe-commits Differential Revision:
  1126. NFC: Move VisitStmt code to dumpStmt Summary: This call is duplicated in Visits of all direct subclasses of Stmt. Reviewers: aaron.ballman Subscribers: cfe-commits Differential Revision:
  1127. Add more expected content to match in test
  1128. Use relative line offsets in test
  1129. [frontend][darwin] warn_stdlibcxx_not_found: suppress warning for preprocessed input Addresses second post-commit feedback for r335081 from Nico
  1130. Run `git ls-files '*.gn' '*.gni' | xargs -n 1 gn format`.
  1131. [gn build] merge r348505.
  1132. [X86] Directly create ADC/SBB nodes instead of using ADD/SUB with (and SETCC_CARRY, 1) This addresses a FIXME and avoids depending on an isel pattern match I think. I've removed the isel patterns too since we have no lit tests left that cover them. Hopefully that really means they are unused. I'm trying to decide if we need SETCC_CARRY. This removes one of its usages. Differential Revision:
  1133. [DAGCombiner] don't group bswap with casts in logic hoisting fold This was probably organized as it was because bswap is a unary op. But that's where the similarity to the other opcodes ends. We should not limit this transform to scalars, and we should not try it if either input has other uses. This is another step towards trying to clean this whole function up to prevent it from causing infinite loops and memory explosions. Earlier commits in this series: rL348501 rL348508 rL348518
  1134. [analyzer] Rely on os_consumes_this attribute to signify that the method call consumes a reference for "this" Differential Revision:
  1135. [attributes] Add an attribute os_consumes_this, with similar semantics to ns_consumes_self The attribute specifies that the call of the C++ method consumes a reference to "this". Differential Revision:
  1136. [analyzer] Fix an infinite recursion bug while checking parent methods in RetainCountChecker Differential Revision:
  1137. [x86] add test for vector bitwise-logic-of-bswaps; NFC
  1138. [libc++] Improve diagnostics for non-const comparators and hashers in associative containers Summary: When providing a non-const-callable comparator in a map or set, the warning diagnostic does not include the point of instantiation of the container that triggered the warning, which makes it difficult to track down the problem. This commit improves the diagnostic by placing it directly in the body of the associative container. The same change is applied to unordered associative containers, which had a similar problem. Finally, this commit cleans up the forward declarations of several map and unordered_map helpers, which are not needed anymore. <rdar://problem/41370747> Reviewers: EricWF, mclow.lists Subscribers: christof, dexonsmith, llvm-commits Differential Revision:
  1139. [libcxx] Always convert 'use_system_cxx_lib' to an absolute path Otherwise, some tests would fail when a relative path was passed, because they'd use the relative path from a different directory than the current working directory.
  1140. [test] Add missing cmake include for building libFuzzer alone Include CompilerRTCompile in fuzzer tests explicitly. Otherwise, when building only libFuzzer, CMake fails due to: CMake Error at cmake/Modules/AddCompilerRT.cmake:395 (sanitizer_test_compile): Unknown CMake command "sanitizer_test_compile". Call Stack (most recent call first): lib/fuzzer/tests/CMakeLists.txt:53 (generate_compiler_rt_tests) Differential Revision:
  1141. [DAGCombiner] reduce indent; NFC Unlike some of the folds in hoistLogicOpWithSameOpcodeHands() above this shuffle transform, this has the expected hasOneUse() checks in place.
  1142. [DagCombiner][X86] Simplify a ConcatVectors of a scalar_to_vector with undef. This patch introduces a new DAGCombiner rule to simplify concat_vectors nodes: concat_vectors( bitcast (scalar_to_vector %A), UNDEF) --> bitcast (scalar_to_vector %A) This patch only partially addresses PR39257. In particular, it is enough to fix one of the two problematic cases mentioned in PR39257. However, it is not enough to fix the original test case posted by Craig; that particular case would probably require a more complicated approach (and knowledge about used bits). Before this patch, we used to generate the following code for function PR39257 (-mtriple=x86_64, -mattr=+avx): vmovsd (%rdi), %xmm0 # xmm0 = mem[0],zero vxorps %xmm1, %xmm1, %xmm1 vblendps $3, %xmm0, %xmm1, %xmm0 # xmm0 = xmm0[0,1],xmm1[2,3] vmovaps %ymm0, (%rsi) vzeroupper retq Now we generate this: vmovsd (%rdi), %xmm0 # xmm0 = mem[0],zero vmovaps %ymm0, (%rsi) vzeroupper retq As a side note: that VZEROUPPER is completely redundant... I guess the vzeroupper insertion pass doesn't realize that the definition of %xmm0 from vmovsd is already zeroing the upper half of %ymm0. Note that on -mcpu=btver2, we don't get that vzeroupper because the vzeroupper insertion pass is disabled. Differential Revision:
  1143. [libcxx] Fix incorrect XFAILs for chrono tests on old macos deployment targets The tests were marked to fail based on the 'availability' LIT feature. However, those tests should really only be failing when we run them against the dylibs that were deployed on macosx10.7 and macosx10.8, which the deployment target has nothing to do with. This caused the tests to unexpectedly pass when running the tests with deployment target macosx10.{7,8} but running with a recent dylib.
  1144. [DAGCombiner] don't hoist logic op if operands have other uses, part 2 The PPC test with 2 extra uses seems clearly better by avoiding this transform. With 1 extra use, we also prevent an extra register move (although that might be an RA problem). The general rule should be to only make a change here if it is always profitable. The x86 diffs are all neutral.
  1145. Fix Wdocumentation warning. NFCI.
  1146. [PowerPC] add tests for hoisting bitwise logic; NFC
  1147. Allow forwarding -fdebug-compilation-dir to cc1as The flag -fdebug-compilation-dir is useful to make generated .o files independent of the path of the build directory, without making the compile command-line dependent on the path of the build directory, like -fdebug-prefix-map requires. This change makes it so that the driver can forward the flag to -cc1as, like it already can for -cc1. We might want to consider making -fdebug-compilation-dir a driver flag in a follow-up. (Since -fdebug-compilation-dir defaults to PWD, it's already possible to get this effect by setting PWD, but explicit compiler flags are better than env vars, because e.g. ninja tracks command lines and reruns commands that change.) Somewhat related to PR14625. Differential Revision:
  1148. Reapply "Avoid emitting redundant or unusable directories in DIFile metadata entries." This reverts commit r348280 and reapplies D55085 without modifications. Original commit message: Avoid emitting redundant or unusable directories in DIFile metadata entries. As discussed on llvm-dev recently, Clang currently emits redundant directories in DIFile entries, such as .file 1 "/Volumes/Data/llvm" "/Volumes/Data/llvm/tools/clang/test/CodeGen/debug-info-abspath.c" This patch looks at any common prefix between the compilation directory and the (absolute) file path and strips the redundant part. More importantly it leaves the compilation directory empty if the two paths have no common prefix. After this patch the above entry is (assuming a compilation dir of "/Volumes/Data/llvm/_build"): .file 1 "/Volumes/Data/llvm" "tools/clang/test/CodeGen/debug-info-abspath.c" When building the FileCheck binary with debug info, this patch makes the build artifacts ~1kb smaller. Differential Revision:
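The prefix-stripping rule in entry 1148 can be sketched in a few lines of Python. This is a rough model, not Clang's implementation: `split_difile` is a hypothetical helper, it assumes POSIX-style paths and an absolute compilation directory, and it treats a shared prefix of just the root `/` as "no common prefix".

```python
import posixpath


def split_difile(comp_dir, file_path):
    """Return a (directory, filename) pair for a DIFile-like entry.

    Hypothetical model of the rule in the commit message: move the common
    prefix of the compilation directory and an absolute file path into the
    directory field, and leave the directory empty when nothing is shared.
    """
    if not posixpath.isabs(file_path):
        # Relative paths are left alone in this sketch.
        return comp_dir, file_path
    common = posixpath.commonpath([comp_dir, file_path])
    if common == posixpath.sep:
        # Only the root is shared: treated here as no common prefix.
        return "", file_path
    return common, posixpath.relpath(file_path, common)


directory, filename = split_difile(
    "/Volumes/Data/llvm/_build",
    "/Volumes/Data/llvm/tools/clang/test/CodeGen/debug-info-abspath.c",
)
```

With the paths from the commit message, `directory` and `filename` match the two strings of the "after" `.file` entry quoted above.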
  1149. Reapply "Adapt gcov to changes in CFE." This reverts commit r348203 and reapplies D55085 with an additional GCOV bugfix to make the change NFC for relative file paths in .gcno files. Thanks to Ilya Biryukov for additional testing! Original commit message: Update Diagnostic handling for changes in CFE. The clang frontend no longer emits the current working directory for DIFiles containing an absolute path in the filename: and will move the common prefix between current working directory and the file into the directory: component.
  1150. [AArch64] Fix Exynos predicate Fix predicate for arithmetic instructions with shift and/or extend.
  1151. [libcxx] Add checks for unique value of array<T, 0>.begin() and array<T, 0>.end() The standard section [] requires the return value of begin() and end() methods of a zero-sized array to be unique. Eric Fiselier clarifies: "That unique value cannot be null, and must be properly aligned". This patch adds checks for the first part of this clarification: unique value returned by these methods cannot be null. Reviewed as Thanks to Andrey Maksimov for the patch.
  1152. [DAGCombiner] don't hoist logic op if operands have other uses The AVX512 diffs are neutral, but the bswap test shows a clear overreach in hoistLogicOpWithSameOpcodeHands(). If we don't check for other uses, we can increase the instruction count. This could also fight with transforms trying to go in the opposite direction and possibly blow up/infinite loop. This might be enough to solve the bug noted here: I did not add the hasOneUse() checks to all opcodes because I see a perf regression for at least one opcode. We may decide that's irrelevant in the face of potential compiler crashing, but I'll see if I can salvage that first.
  1153. [libcxx] Add XFAILs for aligned allocation tests on AppleClang 9 Some people are still running the test suite using AppleClang 9.
  1154. [x86] add test for hoistLogicOpWithSameOpcodeHands with extra uses; NFC
  1155. [PDB] Move some code around. NFC.
  1156. [CUDA] Fix nvidia-cuda-toolkit detection on Ubuntu This just extends D40453 (r319317) to Ubuntu. Reviewed By: Hahnfeld, tra Differential Revision:
  1157. [gn build] Process files in llvm/Config and add lib/Target/ Tweak to also handle variable references that look like @FOO@ (matching CMake's configure_file() function), and make it replace '\' 'n' in values with a newline literal since there's no good portable way of passing a real newline literal on a command line. Use that to process all the files in llvm/include/Config and add llvm/lib/Target/, which (indirectly, through llvm-c/Target.h) includes them. Differential Revision:
  1158. [DAGCombiner] refactor function that hoists bitwise logic; NFCI Added FIXME and TODO comments for lack of safety checks. This function is a suspect in out-of-memory errors as discussed in the follow-up thread to r347917:
  1159. [Sanitizer] getmntinfo support in FreeBSD Reviewers: krytarowski Reviewed By: krytarowski Differential Revision:
  1160. Support skewed stream arrays. VarStreamArray was built on the assumption that it is backed by a StreamRef, and offset 0 of that StreamRef is the first byte of the first record in the array. This is a logical and intuitive assumption, but unfortunately we have use cases where it doesn't hold. Specifically, a PDB module's symbol stream is prefixed by 4 bytes containing a magic value, and the first byte of record data in the array is actually at offset 4 of this byte sequence. Previously, we would just truncate the first 4 bytes and then construct the VarStreamArray with the resulting StreamRef, so that offset 0 of the underlying stream did correspond to the first byte of the first record, but this is problematic, because symbol records reference other symbol records by the absolute offset including that initial magic 4 bytes. So if another record wants to refer to the first record in the array, it would say "the record at offset 4". This led to extremely confusing hacks and semantics in loading code, and after spending 30 minutes trying to get some math right and failing, I decided to fix this in the underlying implementation of VarStreamArray. Now, we can say that a stream is skewed by a particular amount. This way, when we access a record by absolute offset, we can use the same values that the records themselves contain, instead of having to do fixups. Differential Revision:
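The skew idea in entry 1160 can be illustrated with a small Python model. Everything here is an assumption made for the example: the class name `SkewedRecordArray`, the record layout (a 2-byte little-endian length followed by the payload), and the sample bytes. It is not the VarStreamArray API; it only shows why keeping the 4-byte prefix and recording a skew lets callers use the absolute offsets that the records themselves contain.

```python
import struct


class SkewedRecordArray:
    """Variable-length records stored after a prefix of `skew` bytes.

    Hypothetical layout: each record is a 2-byte little-endian length
    followed by that many payload bytes. Offsets are absolute, so the
    first record lives at offset `skew`, not at offset 0.
    """

    def __init__(self, stream, skew=0):
        self.stream = stream
        self.skew = skew

    def at(self, offset):
        """Return the payload of the record at an absolute stream offset."""
        if offset < self.skew:
            raise ValueError("offset points into the skipped prefix")
        (length,) = struct.unpack_from("<H", self.stream, offset)
        start = offset + 2
        return self.stream[start:start + length]

    def __iter__(self):
        """Yield (absolute offset, payload) for every record in order."""
        offset = self.skew
        while offset < len(self.stream):
            payload = self.at(offset)
            yield offset, payload
            offset += 2 + len(payload)


# A 4-byte magic value followed by two records.
stream = (b"\x04\x00\x00\x00"
          + struct.pack("<H", 3) + b"abc"
          + struct.pack("<H", 2) + b"de")
records = SkewedRecordArray(stream, skew=4)
```

A record that refers to "the record at offset 4" can be resolved with `records.at(4)` directly, with no fixup for the magic prefix.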
  1161. [X86] Refactored IsSplatVector to use switch. NFCI. Initial step towards making the function more generic (and probably move into SelectionDAG). This is necessary to avoid massive codegen bloat for PR38243 (Add modulo rotate support to LowerRotate).
  1162. [DEBUGINFO, NVPTX] Disable emission of ',debug' option if only debug directives are allowed. Summary: If only debug directives are requested in the output, we should drop emission of the ',debug' option from the target directive. Required to support the nvprof profiler. Reviewers: echristo Subscribers: llvm-commits Differential Revision:
  1163. [GVN] Don't perform scalar PRE on GEPs Partial Redundancy Elimination of GEPs prevents CodeGenPrepare from sinking the addressing mode computation of memory instructions back to its uses. The problem comes from the insertion of PHIs, which confuse CGP and make it bail. I've autogenerated the check lines of an existing test and added a store instruction to demonstrate the motivation behind this change. The store is now using the gep instead of a phi. Differential Revision:
  1164. [DEBUGINFO, NVPTX] Emit last debugging directives. Summary: We may end up with debug directives that have not been emitted by the end of the module emission. This patch fixes the problem by emitting those last directives at the end of the module emission. Reviewers: echristo Subscribers: jholewinski, llvm-commits Differential Revision:
  1165. DAGCombiner::visitINSERT_VECTOR_ELT - pull out repeated VT.getVectorNumElements(). NFCI.
  1166. [NFC][AArch64] Split out backend features This patch splits backend features currently hidden behind architecture versions. For example, currently the only way to activate the complex numbers extension is to target a v8.3 architecture, whereas after the patch this extension can be added separately. This refactoring is required by the new command lines proposal: Reviewers: DavidSpickett, olista01, t.p.northover Subscribers: kristof.beyls, bryanpkc, javed.absar, pbarrio Differential revision: -- It was reverted in rL348249 due to a build bot failure in one of the regression tests: The problem seems to be that FileCheck behaves differently on Windows and Linux. This new patch splits the test file into multiple files and does more exact pattern matching in an attempt to circumvent the issue.
  1167. [OPENMP][NVPTX] Fix globalization of the mapped array sections. If the array section is based on a pointer and this section is mapped in a target region and then used in the inner parallel region, it must also be globalized, as the pointer itself is passed by value, not by reference.
  1168. [clangd] Remove the test that sometimes deadlocks Will figure out how to properly rewrite it and recommit.
  1169. [ARM][NFC] Adding another test for armcgp
  1170. AMDGPU: Generate VALU ThreeOp Integer instructions Summary: Original patch by: Fabian Wahlster <> Change-Id: I148f692a88432541fad468963f58da9ddf79fac5 Reviewers: arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, b-sumner, llvm-commits Differential Revision:
  1171. [AMDGPU] Partial revert of rL348371: Turn on the DPP combiner by default Turn the combiner back off as there're failures until the issue is fixed. Differential revision:
  1172. Fix -Wcovered-switch-default warning. NFCI.
  1173. [libcxx] Make return value of array<T, 0>.data() checked only for libc++ The section says: "The return value of data() is unspecified". This patch marks all checks of the array<T, 0>.data() return value as libc++ specific. Reviewed as Thanks to Andrey Maksimov for the patch.
  1174. Revert "[LoopSimplifyCFG] Delete dead in-loop blocks" This reverts commit r348457. The original commit causes clang to crash when doing an instrumented build with a new pass manager. Reverting to unbreak our integrate.
  1175. Test commit: Removed trailing space in .txt file.
  1176. [ARM][NFC] Added extra arm-cgp test
  1177. Add new `__sanitizer_mz_default_zone()` API which returns the address of the ASan malloc zone. This API will be used for testing in future patches. Summary: The name of the function is based on `malloc_default_zone()` found in Darwin's `malloc/malloc.h` header file. Reviewers: kubamracek, george.karpenkov Subscribers: #sanitizers, llvm-commits Differential Revision:
  1178. [clangd] Update the test code I forgot to update it in the last round of code review.
  1179. [X86][NFC] Convert memcpy/memset tests to update_llc_test_checks.
  1180. [clangd] C++ API for emitting file status. Introduce clangd C++ API to emit the current status of file.
  1181. Diagnose friend function template redefinitions. Friend function template defined in a class template becomes available if the enclosing class template is instantiated. Until the function template is used, it does not have a body, but still is considered a definition for the purpose of redeclaration checks. This change modifies redefinition check so that it can find the friend function template definitions in instantiated classes. Differential Revision:
  1182. [ARM GlobalISel] Nothing is legal for Thumb ...yet! A lot of the current code should be shared for arm and thumb mode, but until we add tests and work out some of the details (e.g. checking the correct subtarget feature for G_SDIV) it's safer to bail out as early as possible for thumb targets. This should have arguably been part of r348347, which allowed Thumb functions to be handled by the IR Translator.
  1183. Add test for ObjC generics
  1184. Extend OMP test
  1185. Make test resistant to line numbers changing
  1186. [clangd] Fix a typo in TUSchedulerTests Reviewers: ilya-biryukov Reviewed By: ilya-biryukov Subscribers: javed.absar, ioeric, MaskRay, jkorous, arphaman, kadircet, cfe-commits Differential Revision:
  1187. [X86][NFC] Add more tests for memset.
  1188. [llvm-dwarfdump] - Simplify the test case. The test was fully rewritten for simplification. New test code was suggested by David Blaikie. Differential revision:
  1189. [InstCombine] foldICmpWithLowBitMaskedVal(): don't miscompile -1 vector elts I was finally able to quantify what I thought was missing in the fix: it was vector constants. If we have a scalar (and %x, -1), it will be instsimplified before we reach this code, but if it is a vector, we may still have a -1 element. Thus, we want to avoid the fold if *at least one* element is -1. Or in other words, ignoring the undef elements, no sign bits should be set. Thus, m_NonNegative(). A follow-up for rL348181
  1190. [NFC][InstCombine] Add more miscompile tests for foldICmpWithLowBitMaskedVal() We also have to be aware of vector constants. If at least one element is -1, we can't transform.
  1191. [X86] Remove some leftover code for handling an i1 setcc type. NFC We should only need to handle i8 now.
  1192. Remove unnecessary include.
  1193. Remove CodeGen dependencies on Sema. Move diagnostics from Sema to Frontend (or Common) so that CodeGen no longer needs to include the Sema diagnostic IDs.
  1194. [LoopSimplifyCFG] Delete dead in-loop blocks This patch teaches LoopSimplifyCFG to delete loop blocks that have become unreachable after terminator folding has been done. Differential Revision: Reviewed By: anna
  1195. InstCombine: Add some missing tests for scalarization
  1196. Revert "[XRay] Move-only Allocator, FunctionCallTrie, and Array" This reverts commits r348438, r348445, and r348449 due to breakages with gcc-4.8 builds.
  1197. ARM, AArch64: support `__attribute__((__swiftcall__))` Support the Swift calling convention on Windows ARM and AArch64. Both of these conform to the AAPCS, AAPCS64 calling convention, and LLVM has been adjusted to account for the register usage. Ensure that the frontend passes this into the backend. This allows the swift runtime to be built for Windows.
  1198. [XRay] Use a local lvalue as arg to AppendEmplace(...) This is a follow-up to D54989. Further work-around gcc-4.8 failing to handle brace-init with temporaries.
  1199. [darwin] remove version number check when enabling -fobjc-subscripting-legacy-runtime This subscripting feature actually works on older OS versions anyway. rdar://36287065
  1200. Reapply fix from r348062 to fix test on Windows.
  1201. [llvm-objcopy] Change --only-keep to --only-section I just hard core goofed when I wrote this and created a different name for no good reason. I'm fairly aware of most "fresh" users of llvm-objcopy (that is, users which are not using it as a drop in replacement for GNU objcopy) and can say that only "-j" is being used by such people so this patch should strictly increase compatibility and not remove it. Differential Revision:
  1202. [XRay] Use default-constructed struct as argument to Append(...) This is a follow-up to D54989. Work-around gcc-4.8 failing to handle brace-init for structs to imply default-construction of an aggregate, and treats it as an initialiser list instead.
  1203. AArch64: Fix invalid CCMP emission The code emitting AND-subtrees used to check whether any of the operands was an OR in order to figure out if the result needs to be negated. However the OR could be hidden in further subtrees and not immediately visible. Change the code so that canEmitConjunction() determines whether the result of the generated subtree needs to be negated. Cleanup emission logic to use this. I also changed the code a bit to make all negation decisions early before we actually emit the subtrees. This fixes Differential Revision:
  1204. [attributes] Add more tests for os_returns_retained
  1205. [Sema/Attribute] Check for noderef attribute This patch adds the noderef attribute in clang and checks for dereferences of types that have this attribute. This attribute is currently used by sparse and would like to be ported to clang. Differential Revision:
  1206. Add objc.* ARC intrinsics and codegen them to their runtime methods. Reviewers: erik.pilkington, ahatanak Differential Revision:
  1207. [MachineOutliner][NFC] Move yet another std::vector out of a loop Once again, following the wisdom of the LLVM Programmer's Manual. I think that's enough refactoring for today. :)
  1208. Re-land r348335 "[XRay] Move-only Allocator, FunctionCallTrie, and Array" Continuation of D54989. Additional changes: - Use `.AppendEmplace(...)` instead of `.Append(Type{...})` to appease GCC 4.8 with confusion on when an initializer_list is used as opposed to a temporary aggregate initialized object.
  1209. [libcxx] Mark some tests as failing on macosx 10.14
  1210. [libcxx] Don't depend on availability markup to provide the streams in the dylib Whether an explicit instantiation declaration should be provided is not a matter of availability markup. This problem is exemplified by the fact that some tests were incorrectly marked as XFAIL when they should instead have been using the definition of streams from the headers, and hence passing, and that, regardless of whether visibility annotations are enabled.
  1211. [Sema] Push and Pop Expression Evaluation Context Records at the start and end of function definitions This patch creates a new context for every function definition we enter. Currently we do not push and pop on these, usually working off of the global context record added in the Sema constructor, which never gets popped. Differential Revision:
  1212. [MachineOutliner][NFC] Move std::vector out of loop See
  1213. [MachineOutliner][NFC] Remove IntegerInstructionMap from InstructionMapper Refactoring. This map was only used when we used a string of integers to output the outlined sequence. Since it's no longer used for anything, there's no reason to keep it around.
  1214. Fix title underlines being too short after r348429
  1215. [GlobalISel] Introduce G_BUILD_VECTOR, G_BUILD_VECTOR_TRUNC and G_CONCAT_VECTOR opcodes. These opcodes are intended to subsume some of the capability of G_MERGE_VALUES, as it was too powerful and thus complex to deal with throughout the GISel pipeline. G_BUILD_VECTOR creates a vector value from a sequence of uniformly typed scalar values. G_BUILD_VECTOR_TRUNC is a special opcode for handling scalar operands which are larger than the destination vector element type, and therefore does an implicit truncate. G_CONCAT_VECTOR creates a vector by concatenating smaller, uniformly typed, vectors together. These will be used in a subsequent commit. This commit just adds the initial infrastructure. Differential Revision:
  1216. Update ARC docs as objc_storeStrong returns void not id
  1217. [MachineOutliner][NFC] Remove buildCandidateList and replace with findCandidates More refactoring. Since the pruning logic has changed, and the candidate list is gone, everything can be sunk into findCandidates. We no longer need to keep track of the length of the longest substring, so we can drop all of that logic as well. After this, we just find all of the candidates and move to outlining.
  1218. [MachineOutliner][NFC] Candidates don't need to be shared_ptrs anymore More refactoring. After the changes to the pruning logic, and removing CandidateList, there's no reason for Candidates to be shared_ptrs (or pointers at all). std::shared_ptr<Candidate> -> Candidate.
  1219. Revert r347934 "[SCEV] Guard movement of insertion point for loop-invariants" This change caused SEGVs in instcombine. (The r347934 change seems to me to be a precipitating cause, not a root cause. Details are on the llvm-commits thread for r347934.)
  1220. Fix test change from r348365 to deal with Windows paths correctly.
  1221. [WebAssembly] Change event section code to 13 Summary: We decided to change the event section code from 12 to 13 as new `DataCount` section in the bulk memory operations proposal will take the code 12 instead. Reviewers: sbc100 Subscribers: dschuff, jgravelle-google, sunfish, llvm-commits Differential Revision:
  1222. [InstCombine] remove dead code from visitExtractElement Extracting from a splat constant is always handled by InstSimplify. Move the test for this from InstCombine to InstSimplify to make sure that stays true.
  1223. [MachineOutliner][NFC] Remove CandidateList, since it's now unused. After removing the pruning logic, there's no reason to populate a list of Candidates. Remove CandidateList and update comments.
  1224. Fix buildbot capture warning A bot didn't like my lambda. This ought to fix it. Example: error C3493: 'AlreadyRemoved' cannot be implicitly captured because no default capture mode has been specified
  1225. [MachineOutliner][NFC] Simplify and unify pruning/outlining logic Since we're now performing outlining per OutlinedFunction rather than per Candidate, we can simply outline each candidate as it shows up. Instead of having a pruning phase, we'll outline entire functions. Then we'll update the UnsignedVec we mapped to reflect the deletion. If any candidate is in a space that's marked dirty, then we'll drop it. This lets us remove the pruning logic entirely, and greatly simplifies the code.
  1226. [Hexagon] Add intrinsics for Hexagon V66
  1227. [InstCombine] reduce duplication in visitExtractElementInst; NFC
  1228. [InstCombine] add/move tests for extractelement; NFC
  1229. ThinLTO: Do not import debug info for imported global constants It looks like this isn't necessary (in any tests I've done, it results in the global being described with no location or value in the imported side - while it's still fully described in the place it's imported from) & results in significant/pathological debug info growth to home these location-less global variable descriptions on the import side. This is a rather pressing/important issue to address - this regressed executable size for one example I'm looking at by 15%, object size is probably similar though I haven't measured it, and a 22x increase in the number of CUs in the cu_index in split DWARF DWP files, creating a similarly large regression in the time it takes llvm-symbolizer to run on such binaries. Reviewers: tejohnson, evgeny777 Differential Revision:
  1230. [Hexagon] Add support for Hexagon V66
  1231. [MachineOutliner] Outline functions by order of benefit Mostly NFC, only change is the order of outlined function names. Loop over the outlined functions instead of walking the candidate list. This is a bit easier to understand. It's far more natural to create a function, then replace all of its occurrences with calls than the other way around. The functions outlined after this do not change, but their names will be decided by their benefit. E.g, OUTLINED_FUNCTION_0 will now always be the most beneficial function, rather than the first one seen. This makes it easier to enforce an ordering on the outlined functions. So, this also adds a test to make sure that the ordering works as expected.
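The naming change in entry 1231 amounts to ranking outlined functions by benefit before assigning their numeric suffix. A minimal Python sketch of that ordering follows. It assumes each outlined function has a single numeric benefit score and that ties keep discovery order; `name_outlined_functions` is a hypothetical helper, not part of the MachineOutliner.

```python
def name_outlined_functions(benefits):
    """Map discovery index -> name, ranking by descending benefit.

    `benefits` lists one benefit score per outlined function in the order
    the functions were first seen (hypothetical input format). Python's
    sort is stable, so equal scores keep their discovery order.
    """
    order = sorted(range(len(benefits)),
                   key=lambda index: benefits[index],
                   reverse=True)
    return {index: f"OUTLINED_FUNCTION_{rank}"
            for rank, index in enumerate(order)}


# The second function seen has the highest benefit, so it gets suffix 0.
names = name_outlined_functions([3, 10, 7])
```

Under this scheme the set of outlined functions is unchanged; only the suffix each one receives depends on its rank.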
  1232. [Hexagon] Add intrinsics for Hexagon V66
  1233. NFC: Extract TextNodeDumper class Summary: Start by moving some utilities to it. It will eventually house dumping of individual nodes (after indentation etc has already been accounted for). Reviewers: aaron.ballman Subscribers: cfe-commits Differential Revision:
  1234. [Hexagon] Add instruction definitions for Hexagon V66
  1235. NFC: Extract TextTreeStructure class Reviewers: aaron.ballman Subscribers: cfe-commits Differential Revision:
  1236. NFC: Inline handling of DependentSizedArrayType Summary: Re-order handling of getElementType and getBracketsRange. It is necessary to perform all printing before any traversal to child nodes. This causes no change in the output of ast-dump-array.cpp due to the way child nodes are printed with a delay. This new order of the code is also the order that produces the expected output anyway. Subscribers: cfe-commits Differential Revision:
  1237. Add dump tests for inherited default template parameters
  1238. [Hexagon] Foundation of support for Hexagon V66
  1239. [GISel]: Provide standard interface to observe changes in GISel passes This provides a standard API across GISel passes to observe and notify passes about changes (insertions/deletions/mutations) to MachineInstrs. This patch also removes the recordInsertion method in MachineIRBuilder and instead provides a setObserver method. Reviewed by: vkeles.
  1240. [CodeExtractor] Do not mark outlined calls which may resume EH as noreturn Treat terminators which resume exception propagation as returning instructions (at least, for the purposes of marking outlined functions `noreturn`). This is to avoid inserting traps after calls to outlined functions which unwind. rdar://46129950
  1241. [X86][SSE] Fix a copy+paste typo that was folding the sext/zext of partial vectors
  1242. Revert "[RISCV] Mark unit tests as "requires: riscv-registered-target"" This reverts commit 8908dd12e7bbfc74e264233e900206ad31e285f0.
  1243. Do not check for parameters shadowing fields in function declarations. We would issue a false-positive diagnostic for parameters in function declarations shadowing fields; we now only issue the diagnostic on a function definition instead.
  1244. Adding tests for -ast-dump; NFC. This adds tests for various function and class template declarations.
  1245. [AArch64] Reword description of feature (NFC) Reword the description of the feature that enables custom handling of cheap instructions.
  1246. Honor -fdebug-prefix-map when creating function names for the debug info. This adds a callback to PrintingPolicy to allow CGDebugInfo to remap file paths according to -fdebug-prefix-map. Otherwise the debug info (particularly function names for C++ lambdas) may contain paths that should have been remapped in the debug info. <rdar://problem/46128056> Differential Revision:
  1247. [analyzer] Attribute for RetainCountChecker for OSObject should propagate with inheritance rdar://46388388 Differential Revision:
  1248. [llvm-mca] Simplify test (NFC)
  1249. Mention changes to libc++ include dir lookup in release notes. Summary: The change itself landed as r348365, see the comment for more details. Reviewers: arphaman, EricWF Reviewed By: arphaman Subscribers: cfe-commits Differential Revision:
  1250. [llvm-mca] Sort test run lines (NFC)
  1251. [MachineOutliner][NFC] Use getOccurrenceCount() in getNotOutlinedCost() Some more gardening.
  1252. [MachineOutliner][NFC] Make getters in MachineOutliner.h const Just some refactoring. A few of the getters in OutlinedFunction weren't const.
  1253. [MachineOutliner][NFC] Don't create outlined sequence from integer mapping Some gardening/refactoring. It's cleaner to copy the instructions into the MachineFunction using the first candidate instead of going to the mapper. Also, by doing this we can remove the Seq member from OutlinedFunction entirely.
  1254. [gold-plugin] allow function/data sections to be toggleable Summary: r336838 allowed these to be toggleable. r336858 reverted r336838. r336943 made the generation of these sections conditional on LDPO_REL. This commit brings back the toggle-ability. You can specify: -plugin-opt=-function-sections -plugin-opt=-data-sections in your linker flags to disable the changes made in r336943. Without toggling r336943 off, arm64 linux kernels linked with gold-plugin see significant boot time regressions, but with r336943 outright reverted, x86_64 linux kernels linked with gold-plugin fail to boot. Reviewers: pcc, void Reviewed By: pcc Subscribers: javed.absar, kristof.beyls, llvm-commits, srhines Differential Revision:
  1255. Address a post-commit review comment on r348325.
  1256. [CodeComplete] Fix a crash in access checks of inner classes Summary: The crash was introduced in r348135. Reviewers: kadircet Reviewed By: kadircet Subscribers: cfe-commits Differential Revision:
  1257. AMDGPU: Fix using old address spaces in some tests
  1258. [Basic] Cleanups in IdentifierInfo following the removal of PTH The Entry pointer in IdentifierInfo was only null for IdentifierInfo created from a PTH. Now that PTH support has been removed we can remove some PTH specific code in IdentifierInfo::getLength and IdentifierInfo::getNameStart. Also make the constructor of IdentifierInfo private to make sure that they are only created by IdentifierTable, and move it to the header so that it can be inlined in IdentifierTable::get and IdentifierTable::getOwn. Differential Revision: Reviewed By: erichkeane
  1259. [DAGCombiner] don't try to extract a fraction of a vector binop and crash (PR39893) Because we're potentially peeking through a bitcast in this transform, we need to use overall bitwidths rather than number of elements to determine when it's safe to proceed. Should fix:
  1260. [OpenCL] Diagnose conflicting address spaces in templates. Added new diagnostic when templates are instantiated with different address space from the one provided in its definition. This also prevents deducing generic address space in pointer type of templates to allow giving them concrete address space during instantiation. Differential Revision:
  1261. Allow norecurse attribute on functions that have debug infos. Summary: debug intrinsics might be marked norecurse to enable the caller function to be norecurse and optimized if needed. This avoids code gen optimisation differences when -g is used, as in globalOpt.cpp:processInternalGlobal checks. Reviewers: chandlerc, jmolloy, aprantl Reviewed By: aprantl Subscribers: aprantl, llvm-commits Differential Revision:
  1262. [X86] Add test case to show missed opportunity to combine a concat_vector into a scalar_to_vector. NFC This is a test for D55274.
  1263. [NFC] Use clang-format on PrintingPolicy::PrintingPolicy() after fd5c386f743 The white-space change was causing conflicts downstream. rdar://problem/46486841
  1264. Remove XFAIL in for NetBSD-MSan After updating GET_LINK_MAP_BY_DLOPEN_HANDLE() for recent NetBSD this test no longer fails.
  1265. [Sanitizer] nl_langinfo forgotten bit. M lib/sanitizer_common/sanitizer_platform_interceptors.h
  1266. [Sanitizer] expand nl_langinfo interception to FreeBSD Reviewers: krytarowski Reviewed By: krytarowski Differential Revision:
  1267. Revert "[IR] Add NODISCARD to attribute functions" Revert due to warnings-turned-into-errors in AMDGPU targets. I'll fix the warnings first, then re-commit this patch.
  1268. [SLH] Fix a nasty bug in SLH. Whenever we effectively take the address of a basic block we need to manually update that basic block to reflect that fact or later passes such as tail duplication and tail merging can break the invariants of the code. =/ Sadly, there doesn't appear to be any good way of automating this or even writing a reasonable assert to catch it early. The change seems trivially and obviously correct, but sadly the only really good test case I have is 1000s of basic blocks. I've tried directly writing a test case that happens to make tail duplication do something that crashes later on, but this appears to require an *amazingly* complex set of conditions that I've not yet reproduced. The change is technically covered by the tests because we mark the blocks as having their address taken, but that doesn't really count as properly testing the functionality.
  1269. [SLH] Regenerate tests with --no_x86_scrub_rip to restore the higher fidelity checking of RIP-based references to basic blocks and other labels. These labels are super important for SLH tests so we should keep them readable in the test cases.
  1270. [IR] Add NODISCARD to attribute functions Summary: Many functions on `llvm::AttributeList` and `llvm::AttributeSet` are documented with "returns a new {list,set} because attribute {lists,sets} are immutable." This documentation can be aided by the addition of an attribute, `LLVM_NODISCARD`. Adding this prevents unsuspecting users of the API from expecting `AttributeList::setAttributes` from modifying the underlying list. At the very least, it would have saved me a few hours of debugging, since I had been doing just that! I had a bug in my program where I was calling `setAttributes` but then passing in the unmutated `AttributeList`. I tried adding LLVM_NODISCARD and confirmed that it would have made my bug immediately obvious. Reviewers: rnk, javed.absar Reviewed By: rnk Subscribers: llvm-commits Differential Revision:
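A minimal sketch of the pitfall described above, using the standard C++17 `[[nodiscard]]` attribute in place of the `LLVM_NODISCARD` macro. `AttrSet` is a hypothetical class for illustration, not `llvm::AttributeSet`.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical immutable attribute set: "mutators" return a new object
// instead of changing *this, as the documentation quoted above describes.
class AttrSet {
  std::set<std::string> Attrs;

public:
  // [[nodiscard]] asks the compiler to warn when the returned set is
  // dropped, which is exactly the mistake the commit message describes.
  [[nodiscard]] AttrSet addAttribute(const std::string &Name) const {
    AttrSet Copy = *this;
    Copy.Attrs.insert(Name);
    return Copy;
  }

  bool hasAttribute(const std::string &Name) const {
    return Attrs.count(Name) != 0;
  }
};
```

Writing `A.addAttribute("noreturn");` as a statement on its own leaves `A` unchanged; with the attribute in place, compilers that honour `[[nodiscard]]` typically flag that line.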
  1271. [AMDGPU]: Turn on the DPP combiner by default Differential revision:
  1272. Add a new interceptor for modctl(2) from NetBSD Summary: modctl - controls loadable kernel modules. Skip tests as this call uses privileged operations. Reviewers: vitalybuka, joerg Reviewed By: vitalybuka Subscribers: kubamracek, llvm-commits, mgorny, #sanitizers Tags: #sanitizers Differential Revision:
  1273. Add a new interceptor for nl_langinfo(3) from NetBSD Summary: nl_langinfo - gets locale information. Add a dedicated test. Reviewers: vitalybuka, joerg Reviewed By: vitalybuka Subscribers: kubamracek, llvm-commits, mgorny, #sanitizers Tags: #sanitizers Differential Revision:
  1274. [Haiku] Support __float128 for x86 and x86_64 This patch addresses a compilation error with clang when running in Haiku being unable to compile code using float128 (throws compilation error such as 'float128 is not supported on this target'). Patch by kallisti5 (Alexander von Gluck IV) Differential Revision:
  1275. [InstCombine] simplify icmps with same operands based on dominating cmp The tests here are based on the motivating cases from D54827. More background: 1. We don't get these cases in general with SimplifyCFG because the root of the pattern match is an icmp, not a branch. I'm not sure how often we encounter this pattern vs. the seemingly more likely case with branches, but I don't see evidence to leave the minimal pattern unoptimized. 2. This has a chance of increasing compile-time because we're using a ValueTracking call to handle the match. The motivating cases could be handled with a simpler pair of calls to isImpliedTrueByMatchingCmp/ isImpliedFalseByMatchingCmp, but I saw that we have a more comprehensive wrapper around those, so we might as well use it here unless there's evidence that it's significantly slower. 3. Ideally, we'd handle the fold to constants in InstSimplify, but as with the existing code here, we could extend this to handle cases where the result is not a constant, but a new combined predicate. That would mean splitting the logic across the 2 passes and possibly duplicating the pattern-matching cost. 4. As mentioned in D54827, this seems like the kind of thing that should be handled in Correlated Value Propagation, but that pass is currently limited to dealing with instructions with constant operands, so extending this bit of InstCombine is the smallest/easiest way to get these patterns optimized.
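The idea of folding an icmp from a dominating compare on the same operands can be sketched as below. This is an illustrative model restricted to signed predicates, not the ValueTracking implementation; `Pred`, `Implied`, and `impliedByDominatingCmp` are hypothetical names.

```cpp
#include <cassert>

// Each signed predicate is modelled as the set of orderings of (A, B) for
// which it holds: bit 0 = A < B, bit 1 = A == B, bit 2 = A > B.
enum Pred : unsigned { SLT = 1, EQ = 2, SLE = 3, SGT = 4, NE = 5, SGE = 6 };

enum class Implied { True, False, Unknown };

// Given that `Dom A, B` is known to be true, decide what that says about
// `P A, B` (same operands, same order).
inline Implied impliedByDominatingCmp(Pred Dom, Pred P) {
  assert(Dom != 0 && P != 0 && "a predicate must allow at least one ordering");
  if ((Dom & ~P) == 0)
    return Implied::True;  // every ordering allowed by Dom also satisfies P
  if ((Dom & P) == 0)
    return Implied::False; // no ordering allowed by Dom satisfies P
  return Implied::Unknown; // Dom allows orderings on both sides of P
}
```

For example, a dominating `slt` makes a later `sle` on the same operands true and a later `sge` false, while a dominating `sle` says nothing definite about `eq`.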
  1276. [X86][SSE] Begun adding modulo rotate support to LowerRotate Prep work for PR38243 - mainly adding comments on where we need to add modulo support (doing so at the moment causes massive codegen regressions). I've also consistently added support for modulo folding for uniform constants (although at the moment we have no way to trigger this) and removed the old assertions.
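For reference, "modulo" here means the rotate amount is interpreted modulo the bit width. A scalar sketch of that semantics, assuming a 32-bit element (`rotl32` is a hypothetical helper, unrelated to the LowerRotate code itself):

```cpp
#include <cassert>
#include <cstdint>

// Rotate left with the amount taken modulo the bit width, so any amount
// (including values >= 32) is well defined.
inline std::uint32_t rotl32(std::uint32_t X, std::uint32_t Amt) {
  Amt &= 31u; // Amt mod 32; valid because 32 is a power of two
  if (Amt == 0)
    return X; // avoids the undefined shift by 32 below
  return (X << Amt) | (X >> (32u - Amt));
}
```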
  1277. Move detection of libc++ include dirs to Driver on MacOS Summary: The intention is to make the tools replaying compilations from 'compile_commands.json' (clang-tidy, clangd, etc.) find the same standard library as the original compiler specified in 'compile_commands.json'. Previously, the library detection logic was in the frontend (InitHeaderSearch.cpp) and relied on the value of resource dir as an approximation of the compiler install dir. The new logic uses the actual compiler install dir and is performed in the driver. This is consistent with the C++ standard library detection on other platforms and allows overriding the resource dir in the tools using the compile_commands.json without altering the standard library detection mechanism. The tools have to override the resource dir to make sure they use a consistent version of the builtin headers. There is still logic in InitHeaderSearch that attempts to add the absolute includes for the C++ standard library, so we keep passing the -stdlib=libc++ from the driver to the frontend via cc1 args to avoid breaking that. In the long run, we should move this logic to the driver too, but it could potentially break the library detection on other systems, so we don't tackle it in this patch to keep its scope manageable. This is a second attempt to fix the issue; the first one was committed in r346652 and reverted in r346675. The original fix relied on an ad-hoc propagation (bypassing the cc1 flags) of the install dir from the driver to the frontend's HeaderSearchOptions. Unsurprisingly, the propagation was incomplete: it broke the libc++ detection in clang itself, which caused LLDB tests to break. The LLDB tests pass with the new fix. Reviewers: JDevlieghere, arphaman, EricWF Reviewed By: arphaman Subscribers: mclow.lists, ldionne, dexonsmith, ioeric, christof, kadircet, cfe-commits Differential Revision:
  1278. Revert: Honor -fdebug-prefix-map when creating function names for the debug info. This commit reverts r348060 and r348062 due to it breaking the AArch64 Full buildbot:
  1279. [llvm-rc] Support not expressions. Patch by Jacek Caban! Differential Revision:
  1280. [TargetLowering] Remove ISD::ANY_EXTEND/ANY_EXTEND_VECTOR_INREG opcodes from SimplifyDemandedVectorElts These have no test coverage and the KnownZero flags can't be guaranteed unlike SIGN/ZERO_EXTEND cases.
  1281. [clangd] Don't provide locations for non-existent files. Summary: We were getting assertion errors when we had bad file names; instead we should skip those. Reviewers: hokein Subscribers: ilya-biryukov, ioeric, MaskRay, jkorous, arphaman, cfe-commits Differential Revision:
  1282. [DAG] Add fshl/fshr tblgen opcodes Missed off from
  1283. Fix compilation error when using clang 3.6.0
  1284. [test] Disable Modules/prune.m on NetBSD as it requires 'touch -a'
  1285. [test] Skip ThinLTO cache tests requiring atime setting on NetBSD Skip the ThinLTO cache tests on NetBSD. They require 'touch' being able to alter atime of files, while NetBSD inhibits atime updates when filesystem is mounted noatime. Differential Revision:
  1286. [test] Split strip-preserve-time.test, and skip atime test on NetBSD Split timestamp preservation tests into atime and mtime test, and skip the former on NetBSD. When the filesystem is mounted noatime, NetBSD not only inhibits implicit atime updates but also prevents setting atime via utime(), causing the test to fail. Differential Revision:
  1287. [SelectionDAG] Initial support for FSHL/FSHR funnel shift opcodes (PR39467) This is an initial patch to add a minimum level of support for funnel shifts to the SelectionDAG and to begin wiring it up to the X86 SHLD/SHRD instructions. Some partial legalization code has been added to handle the case for 'SlowSHLD' where we want to expand instead and I've added a few DAG combines so we don't get regressions from the existing DAG builder expansion code. Differential Revision:
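As background for the FSHL/FSHR opcodes, here is a scalar sketch of funnel-shift semantics for 32-bit values, following the usual definition (concatenate the two operands, shift by the amount modulo the bit width, keep one half). The helper names `fshl32` and `fshr32` are hypothetical and this is not the SelectionDAG lowering.

```cpp
#include <cassert>
#include <cstdint>

// fshl: concatenate Hi:Lo into 64 bits, shift left by Amt mod 32, and keep
// the upper 32 bits of the result.
inline std::uint32_t fshl32(std::uint32_t Hi, std::uint32_t Lo,
                            std::uint32_t Amt) {
  std::uint64_t Concat = (static_cast<std::uint64_t>(Hi) << 32) | Lo;
  return static_cast<std::uint32_t>((Concat << (Amt & 31u)) >> 32);
}

// fshr: same concatenation, shift right by Amt mod 32, and keep the lower
// 32 bits of the result.
inline std::uint32_t fshr32(std::uint32_t Hi, std::uint32_t Lo,
                            std::uint32_t Amt) {
  std::uint64_t Concat = (static_cast<std::uint64_t>(Hi) << 32) | Lo;
  return static_cast<std::uint32_t>(Concat >> (Amt & 31u));
}
```

Passing the same value for both operands yields a rotate, which is why rotates and the x86 SHLD/SHRD double shifts fit the same model.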
  1288. [clang] - Simplify tools::SplitDebugName. This is an updated version of the D54576, which was reverted. Problem was that SplitDebugName calls the InputInfo::getFilename which asserts if InputInfo given is not of type Filename: const char *getFilename() const { assert(isFilename() && "Invalid accessor."); return Data.Filename; } At the same time at that point, it can be of type Nothing and we need to use getBaseInput(), like original code did. Differential revision:
  1289. [MC] - Fix build bot. Error was: /home/buildslave/slave_as-bldslv8/lld-perf-testsuite/llvm/lib/MC/MCFragment.cpp:241:22: error: field 'Offset' will be initialized after field 'LayoutOrder' [-Werror,-Wreorder] Atom(nullptr), Offset(~UINT64_C(0)), LayoutOrder(0) {
  1290. Remove superfluous comments. NFCI. As requested in D54698.
  1291. Recommit r348243 - "[llvm-mc] - Do not crash when referencing undefined debug sections." The patch triggered an unrelated msan issue: LayoutOrder variable was not initialized. ( It was fixed. Original commit message: MC has code that pre-creates a few debug sections: If user code has a reference to such a section but does not redefine it, MC code currently asserts, because it still thinks they are normally defined. The patch fixes the issue. Differential revision: ---- Modified : /llvm/trunk/lib/MC/ELFObjectWriter.cpp Added : /llvm/trunk/test/MC/ELF/undefined-debug.s
  1292. [TargetLowering] SimplifyDemandedVectorElts - don't alter DemandedElts mask Fix potential issue with the ISD::INSERT_VECTOR_ELT case tweaking the DemandedElts mask instead of using a local copy - so later uses of the mask use the tweaked version..... Noticed while investigating adding zero/undef folding to SimplifyDemandedVectorElts and the altered DemandedElts mask was causing mismatches.
  1293. [ARM GlobalISel] Implement call lowering for Thumb2 The only things that are different from arm are: * different opcodes for calls and returns * Thumb calls take predicate operands
  1294. Revert r348335 "[XRay] Move-only Allocator, FunctionCallTrie, and Array" .. and also the follow-ups r348336 r348338. It broke stand-alone compiler-rt builds with GCC 4.8: In file included from /work/llvm/projects/compiler-rt/lib/xray/xray_function_call_trie.h:20:0,                  from /work/llvm/projects/compiler-rt/lib/xray/xray_profile_collector.h:21,                  from /work/llvm/projects/compiler-rt/lib/xray/ /work/llvm/projects/compiler-rt/lib/xray/xray_segmented_array.h: In instantiation of ‘T* __xray::Array<T>::AppendEmplace(Args&& ...) [with Args = {const __xray::FunctionCallTrie::mergeInto(__xray::FunctionCallTrie&) const::NodeAndTarget&}; T = __xray::FunctionCallTrie::mergeInto(__xray::FunctionCallTrie&) const::NodeAndTarget]’: /work/llvm/projects/compiler-rt/lib/xray/xray_segmented_array.h:383:71:   required from ‘T* __xray::Array<T>::Append(const T&) [with T = __xray::FunctionCallTrie::mergeInto(__xray::FunctionCallTrie&) const::NodeAndTarget]’ /work/llvm/projects/compiler-rt/lib/xray/xray_function_call_trie.h:517:54:   required from here /work/llvm/projects/compiler-rt/lib/xray/xray_segmented_array.h:378:5: error: could not convert ‘{std::forward<const __xray::FunctionCallTrie::mergeInto(__xray::FunctionCallTrie&) const::NodeAndTarget&>((* & args#0))}’ from ‘<brace-enclosed initializer list>’ to ‘__xray::FunctionCallTrie::mergeInto(__xray::FunctionCallTrie&) const::NodeAndTarget’      new (AlignedOffset) T{std::forward<Args>(args)...};      ^ /work/llvm/projects/compiler-rt/lib/xray/xray_segmented_array.h: In instantiation of ‘T* __xray::Array<T>::AppendEmplace(Args&& ...) 
[with Args = {const __xray::profileCollectorService::{anonymous}::ThreadTrie&}; T = __xray::profileCollectorService::{anonymous}::ThreadTrie]’: /work/llvm/projects/compiler-rt/lib/xray/xray_segmented_array.h:383:71:   required from ‘T* __xray::Array<T>::Append(const T&) [with T = __xray::profileCollectorService::{anonymous}::ThreadTrie]’ /work/llvm/projects/compiler-rt/lib/xray/   required from here /work/llvm/projects/compiler-rt/lib/xray/xray_segmented_array.h:378:5: error: could not convert ‘{std::forward<const __xray::profileCollectorService::{anonymous}::ThreadTrie&>((* & args#0))}’ from ‘<brace-enclosed initializer list>’ to ‘__xray::profileCollectorService::{anonymous}::ThreadTrie’ /work/llvm/projects/compiler-rt/lib/xray/xray_segmented_array.h: In instantiation of ‘T* __xray::Array<T>::AppendEmplace(Args&& ...) [with Args = {const __xray::profileCollectorService::{anonymous}::ProfileBuffer&}; T = __xray::profileCollectorService::{anonymous}::ProfileBuffer]’: /work/llvm/projects/compiler-rt/lib/xray/xray_segmented_array.h:383:71:   required from ‘T* __xray::Array<T>::Append(const T&) [with T = __xray::profileCollectorService::{anonymous}::ProfileBuffer] ’ /work/llvm/projects/compiler-rt/lib/xray/   required from here /work/llvm/projects/compiler-rt/lib/xray/xray_segmented_array.h:378:5: error: could not convert ‘{std::forward<const __xray::profileCollectorService::{anonymous}::ProfileBuffer&>((* & args#0))}’ from ‘<brace-enclosed initializer list>’ to ‘__xray::profileCollectorService::{anonymous}::ProfileBuffer’ > Summary: > This change makes the allocator and function call trie implementations > move-aware and remove the FunctionCallTrie's reliance on a > heap-allocated set of allocators. > > The change makes it possible to always have storage associated with > Allocator instances, not necessarily having heap-allocated memory > obtainable from these allocator instances. We also use thread-local > uninitialised storage. 
> > We've also re-worked the segmented array implementation to have more > precondition and post-condition checks when built in debug mode. This > enables us to better implement some of the operations with surrounding > documentation as well. The `trim` algorithm now has more documentation > on the implementation, reducing the requirement to handle special > conditions, and being more rigorous on the computations involved. > > In this change we also introduce an initialisation guard, through which > we prevent an initialisation operation from racing with a cleanup > operation. > > We also ensure that the ThreadTries array is not destroyed while copies > into the elements are still being performed by other threads submitting > profiles. > > Note that this change still has an issue with accessing thread-local > storage from signal handlers that are instrumented with XRay. We also > learn that with the testing of this patch, that there will be cases > where calls to mmap(...) (through internal_mmap(...)) might be called in > signal handlers, but are not async-signal-safe. Subsequent patches will > address this, by re-using the `BufferQueue` type used in the FDR mode > implementation for pre-allocated memory segments per active, tracing > thread. > > We still want to land this change despite the known issues, with fixes > forthcoming. > > Reviewers: mboerger, jfb > > Subscribers: jfb, llvm-commits > > Differential Revision:
  1295. [LICM] *Actually* disable ControlFlowHoisting. Summary: The remaining code paths that ControlFlowHoisting introduced that were not disabled, increased compile time by 3x for some benchmarks. The time is spent in DominatorTree updates. Reviewers: john.brawn, mkazantsev Subscribers: sanjoy, jlebar, llvm-commits Differential Revision:
  1296. Revert "[clang-tidy] new check: bugprone-branch-clone" The patch broke on buildbot with assertion-failure. Revert until this is figured out.
  1297. [clang-tidy] new check: bugprone-branch-clone Summary: Implement a check for detecting if/else if/else chains where two or more branches are Type I clones of each other (that is, they contain identical code) and for detecting switch statements where two or more consecutive branches are Type I clones of each other. Patch by donat.nagy. Reviewers: alexfh, hokein, aaron.ballman, JonasToth Reviewed By: JonasToth Subscribers: MTC, lebedev.ri, whisperity, xazax.hun, Eugene.Zelenko, mgorny, rnkovacs, dkrupp, Szelethus, gamesh411, cfe-commits Tags: #clang-tools-extra Differential Revision:
  1298. HowToBuildWithPGO.rst: Fix a few details in the manual steps Differential revision:
  1299. Fix a false positive in misplaced-widening-cast Summary: The bugprone-misplaced-widening-cast check used to give a false warning for the following example. enum DaysEnum{ MON = 0, TUE = 1 }; day = (DaysEnum)(day + 1); //warning: either cast from 'int' to 'DaysEnum' is ineffective... But I think an int to enum cast is neither widening nor ineffective. Patch by dkrupp. Reviewers: JonasToth, alexfh Reviewed By: alexfh Subscribers: rnkovacs, Szelethus, gamesh411, cfe-commits Tags: #clang-tools-extra Differential Revision:
  1300. [X86] Remove -costmodel-reduxcost=true from the experimental vector reduction intrinsic tests as it appears to be unnecessary. NFC I think this has something to do with matching reductions from extractelement, binops, and shuffles. But we're not matching here.
  1301. [X86] Add more cost model tests for vector reductions with narrow vector types. NFC
  1302. [XRay] Use uptr instead of uintptr_t Follow-up to D54989.
  1303. AArch64: support funclets in fastcall and swift_call Functions annotated with `__fastcall` or `__attribute__((__fastcall__))` or `__attribute__((__swiftcall__))` may contain SEH handlers even on Win64. This matches the behaviour of cl which allows for `__try`/`__except` inside a `__fastcall` function. This was detected while trying to self-host clang on Windows ARM64.
  1304. [XRay] Use deallocateBuffer instead of deallocate Follow-up to D54989.
  1305. [XRay] Move-only Allocator, FunctionCallTrie, and Array Summary: This change makes the allocator and function call trie implementations move-aware and remove the FunctionCallTrie's reliance on a heap-allocated set of allocators. The change makes it possible to always have storage associated with Allocator instances, not necessarily having heap-allocated memory obtainable from these allocator instances. We also use thread-local uninitialised storage. We've also re-worked the segmented array implementation to have more precondition and post-condition checks when built in debug mode. This enables us to better implement some of the operations with surrounding documentation as well. The `trim` algorithm now has more documentation on the implementation, reducing the requirement to handle special conditions, and being more rigorous on the computations involved. In this change we also introduce an initialisation guard, through which we prevent an initialisation operation from racing with a cleanup operation. We also ensure that the ThreadTries array is not destroyed while copies into the elements are still being performed by other threads submitting profiles. Note that this change still has an issue with accessing thread-local storage from signal handlers that are instrumented with XRay. We also learn that with the testing of this patch, that there will be cases where calls to mmap(...) (through internal_mmap(...)) might be called in signal handlers, but are not async-signal-safe. Subsequent patches will address this, by re-using the `BufferQueue` type used in the FDR mode implementation for pre-allocated memory segments per active, tracing thread. We still want to land this change despite the known issues, with fixes forthcoming. Reviewers: mboerger, jfb Subscribers: jfb, llvm-commits Differential Revision:
  1306. [X86] Add narrow vector test cases to vector-reduce* tests. Add copies of the tests with -x86-experimental-vector-widening-legalization
  1307. [NFC] Verify memoryssa in test for PR39783
  1308. [clang-tidy/checks] Update objc-property-declaration check to allow arbitrary acronyms and initialisms 🔧 Summary: §1 Description This changes the objc-property-declaration check to allow arbitrary acronyms and initialisms instead of using whitelisted acronyms. In Objective-C it is relatively common to use project prefixes in property names for the purposes of disambiguation. For example, the CIColor¹ and CGColor² properties on UIColor both represent symbol prefixes being used in property names outside of Apple's accepted acronyms³. The union of Apple's accepted acronyms and all symbol prefixes that might be used for disambiguation in property declarations effectively allows for any arbitrary sequence of capital alphanumeric characters to be acceptable in property declarations. This change updates the check accordingly. The test variants with custom configurations are deleted as part of this change because their configurations no longer impact behavior. The acronym configurations are currently preserved for backwards compatibility of check configuration. [1] [2] [3] §2 Test Notes Changes verified by: • Running clang-tidy unit tests. • Used to verify expected output of processing objc-property-declaration.m Reviewers: benhamilton, Wizard Reviewed By: benhamilton Subscribers: jfb, cfe-commits Differential Revision:
  1309. [MachineLICM][X86][AMDGPU] Fix subtle bug in the updating of PhysRegClobbers in post-RA LICM It looks like MCRegAliasIterator can visit the same physical register twice. When this happens in this code in LICM we end up setting the PhysRegDef and then later in the same loop visit the register again. Now we see that PhysRegDef is set from the earlier iteration, so we now set PhysRegClobber. This patch splits the loop so we have one that uses the previous value of PhysRegDef to update PhysRegClobber and a second loop that updates PhysRegDef. The X86 atomic test is an improvement. I had to add sideeffect to the two shrink wrapping tests to prevent hoisting from occurring. I'm not sure about the AMDGPU tests. It looks like the branch instruction changed at the end of the loops. And in the branch-relaxation test I think there is now an "and vcc, exec, -1" instruction that wasn't there before. Differential Revision:
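The loop split described above can be illustrated with a small model. Everything here is a hypothetical simplification (registers as small integers, `RegState`, and both function names); it only shows why a repeated alias trips the single-loop version.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of the two bit sets: Def means "defined once so far",
// Clobber means "defined more than once".
struct RegState {
  std::vector<bool> Def;
  std::vector<bool> Clobber;
  explicit RegState(unsigned NumRegs) : Def(NumRegs), Clobber(NumRegs) {}
};

// Buggy shape: if Aliases lists a register twice, the second visit sees the
// Def bit set by the first visit and wrongly marks the register clobbered.
inline void recordDefSingleLoop(RegState &S,
                                const std::vector<unsigned> &Aliases) {
  for (unsigned R : Aliases) {
    assert(R < S.Def.size() && "register index out of range");
    if (S.Def[R])
      S.Clobber[R] = true;
    S.Def[R] = true;
  }
}

// Fixed shape: first update Clobber from the previous Def values, then
// update Def in a second loop.
inline void recordDefSplitLoops(RegState &S,
                                const std::vector<unsigned> &Aliases) {
  for (unsigned R : Aliases) {
    assert(R < S.Def.size() && "register index out of range");
    if (S.Def[R])
      S.Clobber[R] = true;
  }
  for (unsigned R : Aliases)
    S.Def[R] = true;
}
```

In the split version a register only becomes a clobber when a genuinely earlier definition set its Def bit, not when the alias list happens to repeat it.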
  1310. Update GET_LINK_MAP_BY_DLOPEN_HANDLE() for NetBSD x86 NetBSD 8.99.26 changed the layout of internal structure returned by dlopen(3), switch to it. Set new values for amd64 and i386 based on the results of &((struct Struct_Obj_Entry*)0)->linkmap.
  1311. [clang-query] Continue if compilation command not found for some files When searching for a code pattern in an entire project with a compilation database it's tempting to run ``` clang-query **.cpp ``` And yet, that often breaks because some files are just not in the compilation database: tests, sample code, etc.. clang-query should not stop when encountering such cases. Differential Revision:
  1312. [asan] Add clang flag -fsanitize-address-use-odr-indicator Reviewers: eugenis, m.ostapenko, ygribov Subscribers: hiraditya, llvm-commits Differential Revision:
  1313. [TableGen] Preserve order of output operands in DAGISelMatcherGen Summary: This fixes support in DAGISelMatcher backend for DAG nodes with multiple result values. Previously the order of results in selected DAG nodes always matched the order of results in ISel patterns. After the change the order of results matches the order of operands in OutOperandList instead. For example, given this definition from the attached test case: def INSTR : Instruction { let OutOperandList = (outs GPR:$r1, GPR:$r0); let InOperandList = (ins GPR:$t0, GPR:$t1); let Pattern = [(set i32:$r0, i32:$r1, (udivrem i32:$t0, i32:$t1))]; } the DAGISelMatcher backend currently produces a matcher that creates INSTR nodes with the first result `$r0` and the second result `$r1`, contrary to the order in the OutOperandList. The order of operands in OutOperandList does not matter at all, which is unexpected (and unfortunate) because the order of results of a DAG node does matter, perhaps a lot. With this change, if the order in OutOperandList does not match the order in Pattern, DAGISelMatcherGen emits CompleteMatch opcodes with the order of results taken from OutOperandList. Backend writers can use it to express result reorderings in TableGen. If the order in OutOperandList matches the order in Pattern, the result of DAGISelMatcherGen is unaffected. Patch by Eugene Sharygin Reviewers: andreadb, bjope, hfinkel, RKSimon, craig.topper Reviewed By: craig.topper Subscribers: nhaehnle, craig.topper, llvm-commits Differential Revision:
  1314. [Sema] Remove some conditions of a failing assert We should have been checking that this state is consistent, but its possible for it to be filled later, so it isn't really sound to check it here anyways. Fixes
  1315. [SelectionDAG] Split very large token factors for loads into 64k chunks. There's a 64k limit on the number of SDNode operands, and some very large functions with 64k or more loads can cause crashes due to this limit being hit when a TokenFactor with this many operands is created. To fix this, create sub-tokenfactors if we've exceeded the limit. No test case as it requires a very large function. rdar://45196621 Differential Revision:
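The chunking idea in this entry can be illustrated outside of SelectionDAG. The following is only a minimal sketch: `splitIntoChunks` and the use of plain `int` operands are assumptions made for illustration, not the code from the commit, which builds sub-TokenFactor nodes rather than vectors.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not LLVM API): splits a flat operand list into groups
// of at most Limit entries, so that no single group exceeds the operand limit.
std::vector<std::vector<int>> splitIntoChunks(const std::vector<int> &Ops,
                                              std::size_t Limit) {
  assert(Limit > 0 && "a chunk must hold at least one operand");
  std::vector<std::vector<int>> Chunks;
  for (std::size_t I = 0; I < Ops.size(); I += Limit) {
    const std::size_t End = std::min(Ops.size(), I + Limit);
    // Copy the half-open range [I, End) into its own chunk.
    Chunks.emplace_back(Ops.begin() + static_cast<std::ptrdiff_t>(I),
                        Ops.begin() + static_cast<std::ptrdiff_t>(End));
  }
  return Chunks;
}
```

Each chunk would then stand in for one sub-node, and the chunks themselves become the operands of the parent node. With ten operands and a limit of four, the sketch yields groups of four, four, and two.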
  1316. [ADT] Add zip_longest iterators. Like the already existing zip_shortest/zip_first iterators, zip_longest iterates over multiple iterators at once, but has as many iterations as the longest sequence. This means some iterators may reach the end before others do. zip_longest uses llvm::Optional's None value to mark a past-the-end value. zip_longest is not reverse-iteratable because the tuples iterated over would be different for different length sequences (IMHO for the same reason neither zip_shortest nor zip_first should be reverse-iteratable; one can still reverse the ranges individually if that's the expected behavior). In contrast to zip_shortest/zip_first, zip_longest tuples contain rvalues instead of references. This is because llvm::Optional cannot contain reference types and the value-initialized default does not have a memory location a reference could point to. The motivation for these iterators is to use C++ foreach to compare two lists of ordered attributes in D48100 (SemaOverload.cpp and ASTReaderDecl.cpp). Idea by @hfinkel. This re-commits r348301 which was reverted by r348303. The compilation error by gcc 5.4 was resolved using make_tuple in the initializer_list. The compilation error by msvc14 was resolved by splitting ZipLongestValueType (which already was a workaround for msvc15) into ZipLongestItemType and ZipLongestTupleType. Differential Revision:
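To make the described semantics concrete, here is a small standalone sketch. It is not the LLVM implementation: `zipLongestSketch` is a hypothetical eager helper over two `std::vector`s, and `std::optional` (C++17) stands in for `llvm::Optional`. It only shows the two properties the message calls out, namely padding with an empty optional and holding values rather than references.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical helper, not the LLVM iterator: pairs up two vectors eagerly
// and pads the shorter one with std::nullopt.
template <typename A, typename B>
std::vector<std::pair<std::optional<A>, std::optional<B>>>
zipLongestSketch(const std::vector<A> &Xs, const std::vector<B> &Ys) {
  std::vector<std::pair<std::optional<A>, std::optional<B>>> Out;
  const std::size_t N = Xs.size() > Ys.size() ? Xs.size() : Ys.size();
  Out.reserve(N);
  for (std::size_t I = 0; I < N; ++I) {
    // Elements are copied, so each pair holds values, not references.
    std::optional<A> L = I < Xs.size() ? std::optional<A>(Xs[I]) : std::nullopt;
    std::optional<B> R = I < Ys.size() ? std::optional<B>(Ys[I]) : std::nullopt;
    Out.emplace_back(std::move(L), std::move(R));
  }
  return Out;
}
```

Zipping `{1, 2, 3}` with `{'x'}` gives three pairs; the second component of the last two pairs is empty, which is how a caller can tell that the shorter sequence has ended.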
  1317. LTO: Don't internalize available_externally globals. This breaks C and C++ semantics because it can cause the address of the global inside the module to differ from the address outside of the module. Differential Revision:
  1318. [AArch64][GlobalISel] Re-enable selection of volatile loads. We previously disabled this in r323371 because of a bug where we selected an extending load, but didn't delete the old G_LOAD, resulting in two loads being generated for volatile loads. Since we now have dedicated G_SEXTLOAD/G_ZEXTLOAD operations, and that the tablegen patterns should no longer be able to select (ext(load x)) patterns, it should be safe to re-enable it. The old test case should still work as expected.
  1319. Remove the hash code from CVRecord. This is no longer used and is just taking up space in the structure. Heap allocation of this structure is on the critical path, so space actually matters.
  1320. [clang-tidy] Ignore namespaced and C++ member functions in google-objc-function-naming check 🙈 Summary: The google-objc-function-naming check applies to functions that are not namespaced and should not be applied to C++ member functions. Such function declarations should be ignored by the check to avoid false positives in Objective-C++ sources. Reviewers: benhamilton, aaron.ballman Reviewed By: aaron.ballman Subscribers: xazax.hun, cfe-commits Differential Revision:
  1321. [asan] Split -asan-use-private-alias to -asan-use-odr-indicator Reviewers: eugenis, m.ostapenko, ygribov Subscribers: mehdi_amini, kubamracek, hiraditya, steven_wu, dexonsmith, llvm-commits Differential Revision:
  1322. [asan] Remove use_odr_indicator runtime flag Summary: Flag was added for testing 3 years ago. Probably it's time to simplify code and usage by removing it. Reviewers: eugenis, m.ostapenko Subscribers: mehdi_amini, kubamracek, steven_wu, dexonsmith, llvm-commits Differential Revision:
  1323. Fix crash if an in-class explicit function specialization has explicit template arguments referring to template parameters.
  1324. [InstCombine] add tests for implied simplifications; NFC Ideally, we would fold all of these in InstSimplify in a similar way to rL347896, but this is a bit awkward when we're trying to simplify a compare directly because the ValueTracking API expects the compare as an input, but in InstSimplify, we just have the operands of the compare. Given that we can do transforms besides just simplifications, we might as well just extend the code in InstCombine (which already does simplifications with constant operands).
  1325. AArch64: clean up some whitespace in Windows CC (NFC) Drive by clean up for Windows ARM64 variadic CC (NFC).
  1326. Adding tests for -ast-dump; NFC. This adds tests for the definition data of C++ record objects as well as special member functions.
  1327. Add tests for dumping base classes; NFC.
  1328. [llvm-pdbutil] Remove the analyze subcommand. Nobody has used this since it was introduced, and it doesn't have test coverage.
  1329. [PDB] Emit S_UDT records in LLD. Previously these were dropped. We now understand them sufficiently well to start emitting them. From the debugger's perspective, this now enables us to have debug info about typedefs (both global and function-locally scoped) Differential Revision:
  1330. [AVR] Silence fallthrough warning. NFC.
  1331. Revert "[ADT] Add zip_longest iterators" This reverts commit r348301. Compilation fails on buildbots with older versions of gcc and msvc.
  1332. [Documentation] Make options section in Clang-tidy readability-uppercase-literal-suffix consistent with other checks.
  1333. [ADT] Add zip_longest iterators Like the already existing zip_shortest/zip_first iterators, zip_longest iterates over multiple iterators at once, but has as many iterations as the longest sequence. This means some iterators may reach the end before others do. zip_longest uses llvm::Optional's None value to mark a past-the-end value. zip_longest is not reverse-iteratable because the tuples iterated over would be different for different length sequences (IMHO for the same reason neither zip_shortest nor zip_first should be reverse-iteratable; one can still reverse the ranges individually if that's the expected behavior). In contrast to zip_shortest/zip_first, zip_longest tuples contain rvalues instead of references. This is because llvm::Optional cannot contain reference types and the value-initialized default does not have a memory location a reference could point to. The motivation for these iterators is to use C++ foreach to compare two lists of ordered attributes in D48100 (SemaOverload.cpp and ASTReaderDecl.cpp). Idea by @hfinkel. Differential Revision:
  1334. [PowerPC] Make no-PIC default to match GCC - CLANG Make -fno-PIC default on PowerPC LE. Differential Revision:
  1335. [PowerPC] Make no-PIC default to match GCC - LLVM Change the default for PowerPC LE to -fno-PIC. Differential Revision:
  1336. Fix sanitizer unit test
  1337. [libcxx] Always enable availability in the lit test suite. Summary: Running the tests without availability enabled doesn't really make sense: availability annotations allow catching errors at compile-time instead of link-time. Running the tests without availability enabled allows confirming that a test breaks at link-time under some configuration, but it is more useful to instead check that it should fail at compile-time. Always enabling availability in the lit test suite will greatly simplify XFAILs and troubleshooting of failing tests, which is currently a giant pain because we have these two levels of possible failure: link-time and compile-time. Reviewers: EricWF, mclow.lists Subscribers: christof, jkorous, dexonsmith, libcxx-commits Differential Revision:
  1338. Unbreak build due to style.
  1339. [Sanitizer] intercept part of sysctl Api - Distinguish what FreeBSD/NetBSD share from the NetBSD specifics. - Fixing page size value collection. Reviewers: krytarowski, vitalybuka Reviewed By: krytarowski Differential Revision:
  1340. [CmpInstAnalysis] fix function signature for ICmp code to predicate; NFC The old function underspecified the return type, took an unused parameter, and had a misleading name.
  1341. Move llc-start-stop-instance to x86 Avoid bot failures where the host pass setup might not have 2 dead-mi-elimination runs
  1342. [SelectionDAG] Redefine isGAPlusOffset in terms of unwrapAddress. NFCI.
  1343. AMDGPU: Add f32 vectors to SGPR register classes
  1344. MIR: Add method to stop after specific runs of passes Currently if you use -{start,stop}-{before,after}, it picks the first instance with the matching pass name. If you run the same pass multiple times, there's no way to distinguish them. Allow specifying a run index with ,N to specify which you mean.
  1345. [InstCombine] rearrange foldICmpWithDominatingICmp; NFC Move it out from under the constant check, reorder predicates, add comments. This makes it easier to extend to handle the non-constant case.
  1346. [dsymutil] Ensure we're comparing time stamps with the same precision. After TimePoint's precision was increased in LLVM we started seeing failures because the modification times didn't match. This adds a time cast to ensure that we're comparing TimePoints with the same amount of precision.
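The precision mismatch described here can be reproduced with `std::chrono` alone. The sketch below is an illustration, not the dsymutil change: `sameSecond` is a hypothetical helper, and truncating to whole seconds is an assumption chosen for the example, since the commit message does not state the target precision.

```cpp
#include <cassert>
#include <chrono>

using Clock = std::chrono::system_clock;

// Hypothetical helper: casts both time points to whole seconds before
// comparing, so a finer-grained stamp can match a coarser one.
template <typename D1, typename D2>
bool sameSecond(std::chrono::time_point<Clock, D1> A,
                std::chrono::time_point<Clock, D2> B) {
  using std::chrono::seconds;
  using std::chrono::time_point_cast;
  // time_point_cast truncates toward zero to the target duration.
  return time_point_cast<seconds>(A) == time_point_cast<seconds>(B);
}
```

A nanosecond-precision stamp of 100 s + 250 ns compares unequal to a second-precision stamp of 100 s with a plain `==`, but the two match once both are cast to the same precision.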
  1347. [X86][SSE] Add SimplifyDemandedBitsForTargetNode handling for MOVMSK Moves existing SimplifyDemandedBits call out of combineMOVMSK and add SimplifyDemandedVectorElts call based on the sign bits we need.
  1348. [AST] Assert that no type class is polymorphic Add a static_assert checking that no type class is polymorphic. People should use LLVM style RTTI instead. Differential Revision: Reviewed By: aaron.ballman
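A check of this kind can be written with the standard `std::is_polymorphic` trait. The snippet is a sketch under assumptions: `PlainType`, `VirtualType`, and `isAllowedNodeClass` are hypothetical names, and the actual commit applies its `static_assert` to Clang's real type classes.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-ins for AST classes.
struct PlainType {
  int Kind; // LLVM-style RTTI typically dispatches on a kind field like this
};
struct VirtualType {
  virtual ~VirtualType() = default; // a virtual member makes it polymorphic
};

// Hypothetical predicate: true when T has no virtual functions.
template <typename T> constexpr bool isAllowedNodeClass() {
  return !std::is_polymorphic<T>::value;
}

// Compiles only while PlainType stays non-polymorphic.
static_assert(isAllowedNodeClass<PlainType>(),
              "type classes must not be polymorphic");
```

Adding a virtual function to `PlainType` would turn the `static_assert` into a compile-time error, which is the point of the check.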
  1349. Revert "Avoid emitting redundant or unusable directories in DIFile metadata entries." This reverts commit r348154 and follow-up commits r348211 and r3248213. Reason: the original commit broke compiler-rt tests and a follow-up fix (r348203) broke our integrate and was reverted.
  1350. Revert "Adapt gcov to changes in CFE." This reverts commit r348203. Reason: this produces absolute paths in .gcno files, breaking us internally as we rely on them being consistent with the filenames passed in the command line. Also reverts r348157 and r348155 to account for revert of r348154 in clang repository.
  1351. [AST] Assert that no statement/expression class is polymorphic Add a static_assert checking that no statement/expression class is polymorphic. People should use LLVM style RTTI instead. Differential Revision: Reviewed By: aaron.ballman
  1352. [X86][SSE] Add MOVMSK demandedbits/elts tests
  1353. [AST][NFC] Make ArrayTypeTraitExpr non polymorphic ArrayTypeTraitExpr is the only expression class which is polymorphic. As far as I can tell this is completely pointless. Differential Revision: Reviewed By: aaron.ballman
  1354. [Hexagon] Update builtin definitions
  1355. [InstCombine] auto-generate full checks for icmp overflow tests; NFC
  1356. [InstCombine] add helper for icmp with dominator; NFC There's a potential small enhancement to this code that could solve the cases currently under proposal in D54827 via SimplifyCFG. Whether instcombine should be doing this kind of semi-non-local analysis in the first place is an open question, but separating the logic out can only help if/when we decide to move it to a different pass. AFAICT, any proposal to do this in SimplifyCFG could also be seen as an overreach + it would be incomplete to start the fold from a branch rather than an icmp. There's another question here about the code for processUGT_ADDCST_ADD(). That part may be completely dead after rL234638 ?
  1357. [OPENMP][NVPTX]Fixed emission of the critical region. Critical regions are constructs that, generally speaking, are not supported by the NVPTX target. Instead we use a special technique to handle them. Currently they are supported only within the loop, and all the threads in the loop must execute the same critical region. Inside these special regions the regions still must be emitted as critical, to avoid possible data races between the teams, and synchronization must use __kmpc_barrier functions.
  1358. [OPENMP][NVPTX]Mark __kmpc_barrier functions as convergent. __kmpc_barrier runtime functions must be marked as convergent to prevent some dangerous optimizations. Also, for NVPTX target all barriers must be emitted as simple barriers.
  1359. [InstCombine] auto-generate full checks for icmp dominator tests; NFC
  1360. [Hexagon] Remove unused checker functions from asm parser
  1361. Remove reference to recently removed PTH Documentation. Removed in r348266 Change-Id: Icff0212f57c42ca84ec174ddd4366ae63a7923fa
  1362. [SimpleLoopUnswitch] Remove debug dump.
  1363. PTH-- Remove feature entirely- When debugging a boost build with a modified version of Clang, I discovered that the PTH implementation stores TokenKind in 8 bits. However, we currently have 368 TokenKinds. The result is that the value gets truncated and the wrong token gets picked up when including PTH files. It seems that this will go wrong every time someone uses a token that uses the 9th bit. Upon asking on IRC, it was brought up that this was a highly experimental feature that was considered a failure. I discovered via googling that BoostBuild (mostly Boost.Math) is the only user of this feature, using the CC1 flag directly. I believe that this can be transferred over to normal PCH with minimal effort: Based on advice on IRC and research showing that this is a nearly completely unused feature, this patch removes it entirely. Note: I considered leaving the build-flags in place and making them emit an error/warning, however since I've basically identified and warned the only user, it seemed better to just remove them. Differential Revision: Change-Id: If32744275ef1f585357bd6c1c813d96973c4d8d9
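The truncation described above follows directly from the arithmetic: 8 bits hold values 0 through 255, so any of the 368 kinds numbered 256 or higher loses its 9th bit. A minimal sketch, with `storeKindInByte` and `survivesRoundTrip` as hypothetical helpers rather than PTH code:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical: stores a token kind in one byte, as the commit describes.
constexpr std::uint8_t storeKindInByte(unsigned Kind) {
  return static_cast<std::uint8_t>(Kind); // keeps only the low 8 bits
}

// True when the kind read back equals the kind that was written.
constexpr bool survivesRoundTrip(unsigned Kind) {
  return storeKindInByte(Kind) == Kind;
}
```

For example, kind 300 is read back as 44 (300 minus 256), so a different token is picked up, while every kind up to 255 round-trips unchanged.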
  1364. Add common check prefix. NFCI.
  1365. [yaml2obj] Move redundant statements into a separate static function Reviewers: jhenderson, grimar Reviewed By: jhenderson Subscribers: jakehehrlich, llvm-commits Differential Revision:
  1366. Update MemorySSA in SimpleLoopUnswitch. Summary: Teach SimpleLoopUnswitch to preserve MemorySSA. Subscribers: sanjoy, jlebar, Prazek, george.burgess.iv, llvm-commits Differential Revision:
  1367. Fix "array must be initialized with a brace-enclosed initializer" build error. Try to fix clang-bpf-build buildbot.
  1368. [SanitizerCommon] Test `CombinedAllocator::ForEachChunk()` in unit tests. Summary: Previously we weren't testing this function in the unit tests. Reviewers: kcc, cryptoad, dvyukov, eugenis, kubamracek Subscribers: #sanitizers, llvm-commits Differential Revision:
  1369. [GN][NFC] Update readme example to functional command `ninja -C out/gn check-lld` is not a valid command yet Differential revision:
  1370. [X86][NFC] Add more constant-size memcmp tests.
  1371. Fix MSVC "unknown pragma" warning. NFCI.
  1372. Fix -Wparentheses warning. NFCI.
  1373. [X86] Remove unnecessary peekThroughEXTRACT_SUBVECTORs call. The GetSplatValue/IsSplatVector call will call this anyhow and the later code is just for a v2i64 type so doesn't need it.
  1374. [clangd] Partition include graph on auto-index. Summary: Partitions include graphs in auto-index so that each shards contains only part of the include graph related to itself. Reviewers: ilya-biryukov Subscribers: ioeric, MaskRay, jkorous, arphaman, cfe-commits Differential Revision:
  1375. [TargetLowering] expandFP_TO_UINT - avoid FPE due to out of range conversion (PR17686) PR17686 demonstrates that for some targets FP exceptions can fire in cases where the FP_TO_UINT is expanded using a FP_TO_SINT instruction. The existing code converts both the inrange and outofrange cases using FP_TO_SINT and then selects the result, this patch changes this for 'strict' cases to pre-select the FP_TO_SINT input and the offset adjustment. The X87 cases don't need the strict flag but generates much nicer code with it.... Differential Revision:
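The "pre-select" variant can be illustrated in scalar C++. This is a sketch of the idea only, not the DAG expansion itself: `fpToUintSketch` is a hypothetical function, it assumes a `double` input in the range [0, 2^64), and it uses a C++ cast in place of the FP_TO_SINT node. The signed conversion only ever sees an input that is already in range, so it has no reason to raise an FP exception.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar model of the 'strict' expansion. Precondition
// (assumed): 0 <= X < 2^64.
std::uint64_t fpToUintSketch(double X) {
  const double Two63 = 9223372036854775808.0; // 2^63, exactly representable
  const bool InRange = X < Two63;
  // Pre-select the input and the offset before the signed conversion,
  // instead of converting both candidates and selecting afterwards.
  const double Input = InRange ? X : X - Two63;
  const std::uint64_t Offset = InRange ? 0 : 0x8000000000000000ULL;
  const std::int64_t Signed = static_cast<std::int64_t>(Input);
  // Input < 2^63, so the top bit is clear and XOR adds the offset back.
  return static_cast<std::uint64_t>(Signed) ^ Offset;
}
```

For inputs at or above 2^63 the subtraction is exact, because such doubles are multiples of 2048, so the result matches a direct unsigned conversion.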
  1376. Revert rL348121 from llvm/trunk: [NFC][AArch64] Split out backend features This patch splits backend features currently hidden behind architecture versions. For example, currently the only way to activate the complex numbers extension is to target a v8.3 architecture, whereas after the patch this extension can be added separately. This refactoring is required by the new command lines proposal: Reviewers: DavidSpickett, olista01, t.p.northover Subscribers: kristof.beyls, bryanpkc, javed.absar, pbarrio Differential revision: ........ This has been causing buildbots failures for the past 24 hours:
  1377. Revert r348243 "[llvm-mc] - Do not crash when referencing undefined debug sections." It broke msan and asan bots it seems:
  1378. [SystemZ] Do not support __float128 As of rev. 268898, clang supports __float128 on SystemZ. This seems to have been in error. GCC has never supported __float128 on SystemZ, since the "long double" type on the platform is already IEEE-128. (GCC only supports __float128 on platforms where "long double" is some other data type.) For compatibility reasons this patch removes __float128 on SystemZ again. The test case is updated accordingly.
  1379. [TargetLowering] Add SimplifyDemandedVectorElts support to EXTEND opcodes Add support for ISD::*_EXTEND and ISD::*_EXTEND_VECTOR_INREG opcodes. The extra broadcast in trunc-subvector.ll will be fixed in an upcoming patch.
  1380. [Analyzer] Iterator Checker - Forbid decrements past the begin() and increments past the end() of containers Previously, the iterator range checker only warned upon dereferencing of iterators outside their valid range as well as increments and decrements of out-of-range iterators where the result remains out-of-range. However, the C++ standard is more strict than this: decrementing begin() or incrementing end() results in undefined behaviour even if the iterator is not dereferenced afterwards. Coming back to the range once out-of-range is also undefined. This patch corrects the behaviour of the iterator range checker: warnings are given for any operation whose result is ahead of begin() or past the end() (which is the past-end iterator itself, thus now we are speaking of past past-the-end). Differential Revision:
  1381. [Analyzer] Iterator Checkers - Use the region of the topmost base class for iterators stored in a region If an iterator is represented by a derived C++ class but its comparison operator is for its base the iterator checkers cannot recognize the iterators compared. This results in false positives in very straightforward cases (range error when dereferencing an iterator after disclosing that it is equal to the past-the-end iterator). To overcome this problem we always use the region of the topmost base class for iterators stored in a region. A new method called getMostDerivedObjectRegion() was added to the MemRegion class to get this region. Differential Revision:
  1382. [llvm-mc] - Do not crash when referencing undefined debug sections. MC has code that pre-creates a few debug sections: If user code has a reference to such a section but does not redefine it, MC code currently asserts, because it still thinks they are normally defined. The patch fixes the issue. Differential revision:
  1383. [llvm-dwarfdump] - Dump the older versions of .eh_frame/.debug_frame correctly. The issue is the following. DWARF 2 used version 1 for .debug_frame. (Appendix G, p. 416 lib/MC now always sets version 1 for .eh_frame (and sets 1-4 versions for .debug_frame correctly): In version 1, return_address_register was defined as ubyte, while other versions switched to uleb128. (p 62, Patch teaches llvm-dwarfdump about this difference. Differential revision:
  1384. Extend test for DependentSizedArrayType Use a using declaration to force the type to appear in the -ast-dump output.
  1385. [WIP][Sema] Improve static_assert diagnostics for type traits. Summary: In our codebase, `static_assert(std::some_type_trait<Ts...>::value, "msg")` (where `some_type_trait` is an std type_trait and `Ts...` is the appropriate template parameters) account for 11.2% of the `static_assert`s. In these cases, the `Ts` are typically not spelled out explicitly, e.g. `static_assert(std::is_same<SomeT::TypeT, typename SomeDependentT::value_type>::value, "message");` The diagnostic when the assert fails is typically not very useful, e.g. `static_assert failed due to requirement 'std::is_same<SomeT::TypeT, typename SomeDependentT::value_type>::value' "message"` This change makes the diagnostic spell out the types explicitly, e.g. `static_assert failed due to requirement 'std::is_same<int, float>::value' "message"` See tests for more examples. After this is submitted, I intend to handle `static_assert(!std::some_type_trait<Ts...>::value, "msg")`, which is another 6.6% of static_asserts. Subscribers: cfe-commits Differential Revision:
  1386. Remove unnecessary include.
  1387. [X86] Remove custom DAG combine for SIGN_EXTEND_VECTOR_INREG/ZERO_EXTEND_VECTOR_INREG. We only needed this because it provided really aggressive constant folding even through constant pool entries created from build_vectors. The main case was for vXi8 MULH legalization which was happening as part of legalize DAG instead of as part of legalize vector ops. Now it's part of vector op legalization and we've added special handling for build vectors of all constants there. This has removed the need for this code on the list tests we have.
  1388. [compiler-rt] Use the new zx_futex_wait for Fuchsia sanitizer runtime This finishes the soft-transition to the new primitive that implements priority inheritance. Differential Revision:
  1389. [analyzer] MoveChecker: Add more common state resetting methods. Includes "resize" and "shrink" because they can reset the object to a known state in certain circumstances. Differential Revision:
  1390. [Sema] Provide -fvisibility-global-new-delete-hidden option When the global new and delete operators aren't declared, Clang provides an implicit declaration, but this declaration currently always uses the default visibility. This is a problem when the C++ library itself is being built with non-default visibility because the implicit declaration will force the new and delete operators to have the default visibility unlike the rest of the library. The existing workaround is to use assembly to enforce the visibility: but that solution is not always available, e.g. in the case of libFuzzer which is using an internal version of libc++ that's also built with -fvisibility=hidden where the existing behavior is causing issues. This change introduces a new option -fvisibility-global-new-delete-hidden which makes the implicit declaration of the global new and delete operators hidden. Differential Revision:
  1391. Fix -Wmismatched-tags to not warn on redeclarations of structs in system headers. Previously, we would only check whether the new declaration is in a system header, but that requires the user to be able to correctly guess whether a declaration in a system header is declared as a struct or a class when specializing standard library traits templates. We now entirely ignore declarations for which the warning was disabled when determining whether to warn on a tag mismatch. Also extend the diagnostic message to clarify that a) code containing such a tag mismatch is in fact valid and correct, and b) the (non-coding-style) reason to emit such a warning is that the Microsoft C++ ABI is broken and includes the tag kind in decorated names, as it seems a lot of users are confused by our diagnostic here (either not understanding why we produce it, or believing that it represents an actual language rule).
  1392. Improve the regerror(3) interceptor The res returned value might differ with REAL(strlen)(errbuf) + 1, as the buffer's value is limited with errbuf_size. Hot fix for D54584.
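The discrepancy this entry refers to can be observed with the POSIX `regerror` function itself: its return value is the buffer size needed for the whole message, while the text written to `errbuf` is cut to fit `errbuf_size`. The sketch below assumes a POSIX system that provides `<regex.h>`; `describeRegexError` is a hypothetical wrapper, not the interceptor code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <regex.h>

// Hypothetical wrapper: compiles Pattern, writes the (possibly truncated)
// status message into Buf, and returns the size regerror reports as needed
// for the full message, including the terminating null byte.
std::size_t describeRegexError(const char *Pattern, char *Buf,
                               std::size_t BufSize) {
  regex_t Re;
  const int Code = regcomp(&Re, Pattern, REG_EXTENDED);
  const std::size_t Needed = regerror(Code, &Re, Buf, BufSize);
  if (Code == 0)
    regfree(&Re); // only a successfully compiled regex is released here
  return Needed;
}
```

With a 4-byte buffer and an unterminated bracket expression such as `"["`, the buffer holds at most three characters plus the terminator, while the returned size describes the full message, so the two quantities need not agree.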
  1393. Reverting r348215 Causing failures on ubsan buildbot boxes.
  1394. [analyzer] MoveChecker: Improve warning and note messages. The warning piece traditionally describes the bug itself, i.e. "The bug is a _____", eg. "Attempt to delete released memory", "Resource leak", "Method call on a moved-from object". Event pieces produced by the visitor are usually in a present tense, i.e. "At this moment _____": "Memory is released", "File is closed", "Object is moved". Additionally, type information is added into the event pieces for STL objects (in order to highlight that it is in fact an STL object), and the respective event piece now mentions that the object is left in an unspecified state after it was moved, which is a vital piece of information to understand the bug. Differential Revision:
  1395. Add interceptors for the sysctl(3) API family from NetBSD Summary: Add new interceptors for: - sysctl - sysctlbyname - sysctlgetmibinfo - sysctlnametomib - asysctl - asysctlbyname Cover the API with a new test file TestCases/NetBSD/ Reviewers: joerg, vitalybuka Reviewed By: vitalybuka Subscribers: devnexen, kubamracek, llvm-commits, mgorny, #sanitizers Tags: #sanitizers Differential Revision:
  1396. Add interceptors for the fts(3) API family from NetBSD Summary: fts(3) is API to traverse a file hierarchy. Cover this interface with interceptors. Add a test to validate the interface reading the number of regular files in /etc. Based on original work by Yang Zheng. Reviewers: joerg, vitalybuka Reviewed By: vitalybuka Subscribers: tomsun.0.7, kubamracek, llvm-commits, mgorny, #sanitizers Tags: #sanitizers Differential Revision:
  1397. [TableGen] Improve the formatting of the emitted predicates (NFC)
  1398. [TableGen] Fix typo in emitted comment (NFC)
  1399. Add new interceptor for regex(3) in NetBSD Summary: Add interceptors for the NetBSD style of regex(3) present inside libc: - regcomp - regexec - regerror - regfree - regnsub - regasub Add a dedicated test verifying the installed interceptors. Reviewers: vitalybuka, joerg Reviewed By: vitalybuka Subscribers: kubamracek, llvm-commits, mgorny, #sanitizers Tags: #sanitizers Differential Revision:
  1400. [ExecutionEngine] Change NotifyObjectEmitted/NotifyObjectFreed API. This patch renames both methods (NotifyObjectEmitted -> notifyObjectLoaded, and NotifyObjectFreed -> notifyObjectFreed), adds an abstract "ObjectKey" (uint64_t) parameter to notifyObjectLoaded, and replaces the ObjectFile parameter for notifyObjectFreed with an ObjectKey. Using an ObjectKey to identify events, rather than a reference to the ObjectFile, allows us to free the ObjectFile after notifyObjectLoaded is called, saving memory.
  1401. [ARM64][Windows] Fix local stack size for funclets The comment was misplaced, and the code didn't do what the comment indicated, namely ignoring the varargs portion when computing the local stack size of a funclet in emitEpilogue. This results in incorrect offset computations within funclets that are contained in vararg functions. Differential Revision:
  1402. [asan] Reduce binary size by using unnamed private aliases Summary: --asan-use-private-alias increases binary sizes by 10% or more. Most of this space was long names of aliases and new symbols. These symbols are not needed for the ODR check at all. Reviewers: eugenis Subscribers: hiraditya, llvm-commits Differential Revision:
  1403. [MachineOutliner] Move stack instr check logic to getOutliningCandidateInfo This moves the stack check logic into a lambda within getOutliningCandidateInfo. This allows us to be less conservative with stack checks. Whether or not a stack instruction is safe to outline is dependent on the frame variant and call variant of the outlined function; only in cases where we modify the stack can these be unsafe. So, if we move that logic later, when we're looking at an individual candidate, we can make better decisions here. This gives some code size savings as a result.
  1404. [MachineOutliner][AArch64][NFC] Add early exit to candidate discarding logic If we dropped too many candidates to be beneficial when dropping candidates that modify the stack, there's no reason to check for other cost model qualities.
  1405. NFC: Make this test kinder on downstream forks Downstream forks that have their own attributes often run into this test failing when a new attribute is added to clang because the number of supported attributes no longer match. This is redundant information for this test, so we can get by without it. rdar://46288577
  1406. [projects] Use directory name for add_llvm_external_projects add_llvm_external_projects expects the directory name instead of the full path, otherwise the check for an in-tree subproject will fail and the project won't be configured.
  1407. [ThinLTO] Look through aliases when computing hash keys Without this, we don't consider types used by aliasees in our cache key. This caused issues when using the same cache for thin-linking the same TU with different sets of virtual call candidates for a virtual call inside of a constructor. That's sort of a mouthful. :) Differential Revision:
  1408. [IR] Don't assume all functions are 4 byte aligned In some cases different alignments for function might be used to save space e.g. thumb mode with -Oz will try to use 2 byte function alignment. Similar patch that fixed this in other areas exists here Differential Revision:
  1409. [Hexagon] Fix intrinsic test
  1410. Relax test even more for Windows
  1411. Remove unused empty arm64 directory
  1412. Relax tests to also work on Windows
  1413. [analyzer] MoveChecker: Restrict to locals and std:: objects. In the general case, use-after-move is not a bug. It depends on how the move-constructor or move-assignment is implemented. In STL, the convention that applies to most classes is that the move-constructor (-assignment) leaves an object in a "valid but unspecified" state. Using such an object without resetting it to a known state first is likely a bug. Local value-type variables are special because, due to their automatic lifetime, there is no intention to reuse space. If you want a fresh object, you might as well make a new variable; there is no need to move from a variable and then re-use it. Therefore, it is not always a bug, but it is obviously easy to suppress when it isn't, and in most cases it indeed is, as there's no valid intention behind the intentional use of a local after move. This applies not only to local variables but also to parameter variables, not only of value type but also of rvalue reference type (but not to lvalue references). Differential Revision:
  1414. NFC: Add .vscode to .gitignore
  1415. [analyzer] MoveChecker: NFC: Remove the workaround for the "zombie symbols" bug. The checker had extra code to clean up memory regions that were sticking around in the checker without ever being cleaned up due to the bug that was fixed in r347953. Because of that, if a region was moved from, then became dead, and then reincarnated, there were false positives. Why regions are even allowed to reincarnate is a separate story. Luckily, this only happens for local regions that don't produce symbols when loaded from. No functional change intended. The newly added test demonstrates that even though no cleanup is necessary upon destructor calls, the early return cannot be removed. It was not failing before the patch. Differential Revision:
  1416. [Hexagon] Switch to auto-generated intrinsic definitions and patterns
  1417. [CodeExtractor] Split PHI nodes with incoming values from outlined region (PR39433) If a PHI node outside the extracted region has multiple incoming values from it, split this PHI into two parts. The first PHI has incoming values only from the region and is extracted with it (they are placed in a separate basic block that is added to the list of outlined blocks), and the incoming values in the original PHI are replaced by the first PHI. A similar solution is already used in CodeExtractor for PHIs in the entry block (severSplitPHINodes method). It covers the PR39433 bug. Patch by Sergei Kachkov! Differential Revision:
  1418. Adapt gcov to changes in CFE. The clang frontend no longer emits the current working directory for DIFiles containing an absolute path in the filename: and will move the common prefix between current working directory and the file into the directory: component. This fixes the GCOV tests in compiler-rt that were broken by the Clang change.
  1419. [Documentation] Fix formatting and wrap up to 80 characters in Clang-tidy readability-uppercase-literal-suffix documentation.
  1420. [analyzer] Rename MisusedMovedObjectChecker to MoveChecker This follows the Static Analyzer's tradition to name checkers after things in which they find bugs, not after bugs they find. Differential Revision:
  1421. [analyzer] Dump stable identifiers for objects under construction. This continues the work that was started in r342313, which now gets applied to object-under-construction tracking in C++. Makes it possible to debug temporaries by dumping exploded graphs again. Differential Revision:
  1422. [AST] [analyzer] NFC: Reuse code in stable ID dumping methods. Use the new fancy method introduced in r348197 to simplify some code. Differential Revision:
  1423. [AST] Generate unique identifiers for CXXCtorInitializer objects. This continues the work started in r342309 and r342315 to provide identifiers to AST objects that are shorter and easier to read and remember than pointers. Differential Revision:
  1424. BumpPtrAllocator: Add a couple of convenient wrappers around identifyObject(). This allows obtaining smaller, more readable identifiers in a more comfortable way. Differential Revision:
  1425. [Hexagon] Extract operand decoders into a separate file, NFC These decoders are automatically generated. Keeping them separated makes updating architectures easier.
  1426. [DAGCombiner] narrow truncated vector binops when legal This is the smallest vector enhancement I could find to D54640. Here, we're allowing narrowing to only legal vector ops because we'll see regressions without that. All of the test diffs are wins from what I can tell. With AVX/AVX512, we can shrink ymm/zmm ops to xmm. x86 vector multiplies are the problem case that we're avoiding due to the patchwork ISA, and it's not clear to me if we can dance around those regressions using TLI hooks or if we need preliminary patches to plug those holes. Differential Revision:
  1427. [mips] Fix TestDWARF32Version5Addr8AllForms test failure on MIPS hosts The `DIEExpr` is used in debug information entries for either TLS variables or call sites. For now the latter case is unsupported for targets with delay slots, MIPS in particular. The `DIEExpr::EmitValue` method calls a virtual `EmitDebugThreadLocal` routine which, in the case of MIPS, always emits either `.dtprelword` or `.dtpreldword` directives. That is okay for "main" code, but in unit tests `DIEExpr` instances can be created for things other than TLS variables, even on MIPS hosts. That is the reason for the `TestDWARF32Version5Addr8AllForms` failure, because handling of the `R_MIPS_TLS_DTPREL` relocation writes an incorrect value into DWARF structures. In any case, unconditionally emitting `.dtprelword` directives will be incorrect when/if debug information entries for call sites become supported on MIPS. The patch solves the problem by wrapping the expression created in the `MipsTargetObjectFile::getDebugThreadLocalSymbol` method into a `MipsMCExpr` expression with a new `MEK_DTPREL` tag. This tag is recognized in the `MipsAsmPrinter::EmitDebugThreadLocal` method, and `.dtprelword` directives are created in this case only. In other cases the expression is saved as regular data. Differential Revision:
  1428. [Hexagon] Remove unused encodings, NFC
  1429. Typo correction; NFC.
  1430. [InstCombine] fix undef propagation bug with shuffle+binop When we have a shuffle that extends a source vector with undefs and then do some binop on that, we must make sure that the extra elements remain undef with that binop if we reverse the order of the binop and shuffle. 'or' is probably the easiest example to show the bug because 'or C, undef --> -1' (not undef). But there are other opcode/constant combinations where this is true as shown by the 'shl' test.
  1431. [gn build] Use print_function in No behavior change, just makes the script match the other scripts in llvm/utils/gn/build. Differential Revision:
  1432. NFC: Simplify dumpStmt child handling Reviewers: aaron.ballman Subscribers: cfe-commits Differential Revision:
  1433. Re-apply r347954 "[analyzer] Nullability: Don't detect post factum violation..." Buildbot failures were caused by an unrelated UB that was introduced in r347943 and fixed in r347970. Also the revision was incorrectly specified as r344580 during revert. Differential Revision:
  1434. [gcov/Darwin] Ensure external symbols are exported when using an export list Make sure that symbols needed to implement runtime support for gcov are exported when using an export list on Darwin. Without the clang driver exporting these symbols, the linker hides them, resulting in tapi verification failures. rdar://45944768 Differential Revision:
  1435. [WebAssembly] Enforce assembler emits to streamer in order. Summary: The assembler processes directives and instructions in whatever order they are in the file, then directly emits them to the streamer. This could cause badly written (or generated) .s files to produce incorrect binaries. It now has state that tracks what it has most recently seen, to enforce they are emitted in a given order that always produces correct wasm binaries. Also added a new test that compares obj2yaml output from llc (the backend) to that going via .s and the assembler to ensure both paths generate the same binaries. The features this test covers could be extended. Passes all wasm Lit tests. Fixes: Reviewers: sbc100, dschuff, aheejin Subscribers: jgravelle-google, sunfish, llvm-commits Differential Revision:
  1436. Portable Python script across Python version Workaround naming and hierarchy changes in BaseHTTPServer and SimpleHTTPServer module. Differential Revision:
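A minimal sketch of the kind of workaround entry 1436 describes, assuming the script only needs a server class and a simple request handler. The exact names the patched script imports are not stated in the commit message, so the two names below are illustrative.

```python
# Python 3 merged BaseHTTPServer and SimpleHTTPServer into http.server.
# Try the Python 3 location first and fall back to the Python 2 modules.
try:
    from http.server import HTTPServer, SimpleHTTPRequestHandler
except ImportError:
    from BaseHTTPServer import HTTPServer
    from SimpleHTTPServer import SimpleHTTPRequestHandler
```

The rest of the script can then refer to `HTTPServer` and `SimpleHTTPRequestHandler` without caring which interpreter is running it.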
  1437. [Hexagon] Update timing classes
  1438. Portable Python script across Python version Python2 supports both backticks and `repr` to access the __repr__ slot. Python3 only supports `repr`. Differential Revision:
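To illustrate entry 1438, here is a small sketch of replacing backticks with `repr`. The `Point` class and `describe` helper are hypothetical and exist only for this example.

```python
class Point(object):
    """Hypothetical value type with a custom __repr__."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        return "Point(%r, %r)" % (self.x, self.y)


def describe(obj):
    # Python 2 only (a syntax error in Python 3):  return `obj`
    # Portable spelling that reaches the same __repr__ slot:
    return repr(obj)
```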
  1439. [InstCombine] foldICmpWithLowBitMaskedVal(): disable 2 faulty folds. These two folds are invalid for this non-constant pattern when the mask ends up being all-ones: Fixes
  1440. [cmake] Clean up add_llvm_subdirectory I found the pattern of setting the project_BUILD variable to OFF after processing the project to be pretty confusing. Using global properties to explicitly keep track of whether a project has been processed or not seems much more straightforward, and it also allows us to convert the macro into a function (which is required for the early return). Factor the project+type+name combination out into a variable while I'm here, since it's used a whole bunch of times. I don't believe this should result in any functional changes. Differential Revision:
  1441. [TextAPI] Remove a superfluous semicolon, fixing GCC warnings. NFC.
  1442. [COFF] Remove an outdated/incorrect comment. NFC. Making the section writable doesn't affect how windows does base relocs in case a DLL can't be loaded at the intended base address. This comment dates back to SVN r79346. Differential Revision:
  1443. [COFF] Don't mark mingw .eh_frame sections writable This improves compatibility with GCC produced object files, where the .eh_frame sections are read only. With mixed flags for the involved .eh_frame sections, LLD creates two separate .eh_frame sections in the output binary, one for each flag combination, while ld.bfd probably merges them. The previous setup of flags can be traced back to SVN r79346. Differential Revision:
  1444. Fix compilation failure on Windows. This was introduced earlier but apparently used an incorrect class name so it doesn't compile on Windows.
  1445. [InstCombine] rearrange shuffle+binop fold; NFC This code has a bug dealing with undefs, so we need to add another escape hatch, so doing some cleanup ahead of that.
  1446. [llvm-objcopy] Add --build-id-link-dir flag This flag does not exist in GNU objcopy but has a major use case. Debugging tools support the .build-id directory structure to find debug binaries. There is no easy way to build this structure up however. One way to do it is by using llvm-readelf and some crazy shell magic. This implements the feature directly. It is most often the case that you'll want to strip a file and send the original to the .build-id directory but if you just want to send a file to the .build-id directory you can copy to /dev/null instead. Differential Revision:
  1447. [InstCombine] add tests for shuffle+binop fold; NFC
  1448. [clang-tidy] Fix unordered_map failure with specializing std::hash<> and remove previous wrong attempt at doing so
  1449. [Hexagon] Change instruction type field in TSFlags to 7 bits
  1450. [llvm-tapi] initial commit, supports ELF text stubs TextAPI is a library and accompanying tool that allows conversion between binary shared object stubs and textual counterparts. The motivations and use cases for this are explained thoroughly in the llvm-dev proposal [1]. This initial commit proposes a potential structure for the TAPI library, also including support for reading/writing text-based ELF stubs (.tbe) in addition to preliminary support for reading binary ELF files. The goal for this patch is to ensure the project architecture appropriately welcomes integration of Mach-O stubbing from Apple's TAPI [2]. Added: - TextAPI library - .tbe read support - .tbe write (to raw_ostream) support [1] [2] Differential Revision:
  1451. [clang-tidy] Recommit: Add the abseil-duration-comparison check Summary: This check finds instances where Duration values are being converted to a numeric value in a comparison expression, and suggests that the conversion happen on the other side of the expression to a Duration. See documentation for examples. This also shuffles some code around so that the new check may perform in one step simplifications also caught by other checks. Compilation is unbroken, because the hash-function is now directly specified for std::unordered_map, as 'enum class' does not compile as key (seemingly only on some compilers). Patch by hwright. Reviewers: aaron.ballman, JonasToth, alexfh, hokein Reviewed By: JonasToth Subscribers: sammccall, Eugene.Zelenko, xazax.hun, cfe-commits, mgorny Tags: #clang-tools-extra Differential Revision:
  1452. [MachineOutliner] Drop candidates that require fixups if it's beneficial If it's a bigger code size win to drop candidates that require stack fixups than to demote every candidate to that variant, the outliner should do that. This happens if the number of bytes taken by calls to functions that don't require fixups, plus the number of bytes that'd be left is less than the number of bytes that it'd take to emit a save + restore for all candidates. Also add tests for each possible new behaviour. - machine-outliner-compatible-candidates shows that when we have candidates that don't use the stack, we can use the default call variant along with the no save/regsave variant. - machine-outliner-all-stack shows that when it's better to fix up the stack, we still will demote all candidates to that case - machine-outliner-drop-stack shows that we can discard candidates that require stack fixups when it would be beneficial to do so.
  1453. [Hexagon] Add HasV5 predicate for compatibility with auto-generated files
  1454. Fix issue with Tpi Stream hash map. Part of the patch to not build the hash map eagerly was omitted due to a merge conflict. Add it back, which should fix the failing tests.
  1455. Revert "[clang-tidy] Add the abseil-duration-comparison check" This commit broke buildbots and needs adjustments.
  1456. [X86] Fix bad formatting. NFC
  1457. [Hexagon] Remove unused operand definitions, NFC
  1458. [Hexagon] Some formatting changes, NFC
  1459. [clang-tidy] Add the abseil-duration-comparison check Summary: This check finds instances where Duration values are being converted to a numeric value in a comparison expression, and suggests that the conversion happen on the other side of the expression to a Duration. See documentation for examples. This also shuffles some code around so that the new check may perform in one step simplifications also caught by other checks. Patch by hwright. Reviewers: aaron.ballman, JonasToth, alexfh, hokein Reviewed By: JonasToth Subscribers: sammccall, Eugene.Zelenko, xazax.hun, cfe-commits, mgorny Tags: #clang-tools-extra Differential Revision:
  1460. Don't build the Tpi Hash map by default. This is very slow and should be done for specific cases where lookups will need to happen.
  1461. [X86] Teach LowerMUL/LowerMULH for vXi8 to unpack constant RHS. Summary: We need to unpackl and unpackh the operands to use two vXi16 multiplies. Previously it looks like the low unpack would get constant folded at least in the 128-bit case after shuffle lowering turned the unpackl into ZERO_EXTEND_VECTOR_INREG and X86 custom DAG combined it. The same doesn't happen for the high half. So we'd load a constant and then shuffle it. But the low half would just be loaded and used by the multiply directly. After this patch we now end up with a constant pool entry for the low and high unpacks separately with no shuffle operations. This is a step towards removing custom constant folding for ZERO_EXTEND_VECTOR_INREG/SIGN_EXTEND_VECTOR_INREG in the X86 backend. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision:
  1462. [X86] Add DAG combine to combine a v8i32->v8i16 truncate with a packuswb that truncates v8i16->v8i8. Summary: Under -x86-experimental-vector-widening-legalization, fp_to_uint/fp_to_sint with a smaller than 128 bit vector type results are custom type legalized by promoting the result to a 128 bit vector by promoting the elements, inserting an assertzext/assertsext, then truncating back to original type. The truncate will be further legalized to a pack shuffle. In the case of a v8i8 result type, we'll end up with a v8i16 fp_to_sint. This will need to be further legalized during vector op legalization by promoting to v8i32 and then truncating again. Under avx2 this produces good code with two pack instructions, but under avx512 this will result in a truncate instruction and a packuswb instruction. But we should be able to get away with a single truncate instruction. The other option is to promote all the way to vXi32 result type during the first type legalization. But in some experimentation that seemed to require more work to produce good code for other configurations. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision:
  1463. Fix non-modular build.
  1464. Adding tests for -ast-dump; NFC. This adds tests for struct and union declarations in C++.
  1465. Update Diagnostic handling for changes in CFE. The clang frontend no longer emits the current working directory for DIFiles containing an absolute path in the filename: and will move the common prefix between current working directory and the file into the directory: component.
  1466. Avoid emitting redundant or unusable directories in DIFile metadata entries. As discussed on llvm-dev recently, Clang currently emits redundant directories in DIFile entries, such as .file 1 "/Volumes/Data/llvm" "/Volumes/Data/llvm/tools/clang/test/CodeGen/debug-info-abspath.c" This patch looks at any common prefix between the compilation directory and the (absolute) file path and strips the redundant part. More importantly it leaves the compilation directory empty if the two paths have no common prefix. After this patch the above entry is (assuming a compilation dir of "/Volumes/Data/llvm/_build"): .file 1 "/Volumes/Data/llvm" "tools/clang/test/CodeGen/debug-info-abspath.c" When building the FileCheck binary with debug info, this patch makes the build artifacts ~1kb smaller. Differential Revision:
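The prefix stripping described in entry 1466 can be sketched as follows. This is an illustration in Python, not Clang's implementation: the function name `split_difile` is hypothetical, both inputs are assumed to be absolute POSIX paths, and treating a bare "/" as "no common prefix" is an assumption of this sketch.

```python
import posixpath


def split_difile(comp_dir, abs_path):
    """Return a (directory, filename) pair for a DIFile-like entry.

    The directory becomes the common prefix of the compilation directory
    and the absolute file path; the filename becomes the remainder.
    """
    common = posixpath.commonpath([comp_dir, abs_path])
    if common == "/":
        # Assumption: sharing only the root counts as no common prefix,
        # so the directory is left empty and the path stays absolute.
        return "", abs_path
    return common, posixpath.relpath(abs_path, common)
```

With the compilation directory and file from the entry, `split_difile("/Volumes/Data/llvm/_build", "/Volumes/Data/llvm/tools/clang/test/CodeGen/debug-info-abspath.c")` yields the directory `/Volumes/Data/llvm` and the filename `tools/clang/test/CodeGen/debug-info-abspath.c`.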
  1467. [SimplifyCFG] add tests for cross block compare folding; NFC These are the baseline tests for D54827. Patch based on code originally written by: @yinyuefengyi (luo xionghu) Differential Revision:
  1468. [Serialization][NFC] Remove pointless "+ 0" in ASTReader Remove the pointless "+ 0" which I added for some reason when modifying these statement/expression classes since it looks like this is a typo. Following the suggestion of aaron.ballman in D54902. NFC.
  1469. [CmpInstAnalysis] fix formatting; NFC There are potential improvements to the structure of this API raised by D54994, but remove some cosmetic blemishes before making any functional changes.
  1470. [clangd] Avoid memory-mapping files on Windows Summary: Memory-mapping files on Windows leads to them being locked and prevents editors from saving changes to those files on disk. This is fine for the compiler, but not acceptable for an interactive tool like clangd. Therefore, we choose to avoid using memory-mapped files on Windows. Reviewers: hokein, kadircet Reviewed By: kadircet Subscribers: yvvan, zturner, nik, malaperle, mgorny, ioeric, MaskRay, jkorous, arphaman, cfe-commits Differential Revision:
  1471. Fix line endings. NFCI.
  1472. [AST][Sema] Remove CallExpr::setNumArgs CallExpr::setNumArgs is the only thing that prevents storing the arguments in a trailing array. There are only 3 places in Sema where setNumArgs is called. D54900 dealt with one of them. This patch removes the other two calls to setNumArgs in ConvertArgumentsForCall. To do this we make the following changes: 1.) Replace the first call to setNumArgs by an assertion since we are moving the responsibility to allocate enough space for the arguments from Sema::ConvertArgumentsForCall to its callers (which are Sema::BuildCallToMemberFunction, and Sema::BuildResolvedCallExpr). 2.) Add a new member function CallExpr::shrinkNumArgs, which can only be used to drop arguments and then replace the second call to setNumArgs by shrinkNumArgs. 3.) Add a new defaulted parameter MinNumArgs to CallExpr and its derived classes which specifies a minimum number of argument slots to allocate. The actual number of argument slots allocated will be max(number of args, MinNumArgs) with the extra args nulled. Note that after the creation of the call expression all of the arguments will be non-null. It is just during the creation of the call expression that some of the last arguments can be temporarily null, until filled by default arguments. 4.) Update Sema::BuildCallToMemberFunction by passing the number of parameters in the function prototype to the constructor of CXXMemberCallExpr. Here the change is pretty straightforward. 5.) Update Sema::BuildResolvedCallExpr. Here the change is more complicated since the type-checking for the function type was done after the creation of the call expression. We need to move this before the creation of the call expression, and then pass the number of parameters in the function prototype (if any) to the constructor of the call expression. 6.) Update the deserialization of CallExpr and its derived classes. Differential Revision: Reviewed By: aaron.ballman
  1473. Fixing -print-module-scope for legacy SCC passes It appears that print-module-scope was not implemented for legacy SCC passes. Fixed to print a whole module instead of just current SCC. Reviewed By: mkazantsev Differential Revision:
  1474. [AArch64] Add command-line option for SSBS Summary: SSBS (Speculative Store Bypass Safe) is only mandatory from 8.5 onwards but is optional from Armv8.0-A. This patch adds testing for the ssbs command line option, added to allow enabling the feature in previous Armv8-A architectures to 8.5. Reviewers: olista01, samparker, aemerson Reviewed By: samparker Subscribers: javed.absar, kristof.beyls, cfe-commits Differential Revision:
  1475. [SystemZ::TTI] Return zero cost for ICmp that becomes Load And Test. A loaded value with multiple users compared with 0 will become a load and test single instruction. The load is not folded in this case (multiple users), but the compare instruction is eliminated. This patch returns 0 cost for the icmp in these cases. Review: Ulrich Weigand
  1476. [SanitizerCommon] Remove RenameFile This function seems to be no longer used by compiler-rt libraries Differential revision:
  1477. [OpenCL][Sema] Improving formatting Reformat comment added in r348120 following review
  1478. [libcxx] Implement P0318: unwrap_ref_decay and unwrap_reference Summary: This was voted into C++20 in San Diego. Note that there was a revision D0318R2 which did include unwrap_reference_t, but we mistakenly voted P0318R1 into the C++20 Working Draft (which does not include unwrap_reference_t). This patch implements D0318R2, which is what we'll end up with in the Working Draft once this mistake has been fixed. Reviewers: EricWF, mclow.lists Subscribers: christof, dexonsmith, libcxx-commits Differential Revision:
  1479. [AArch64] Add command-line option for SSBS Summary: SSBS (Speculative Store Bypass Safe) is only mandatory from 8.5 onwards but is optional from Armv8.0-A. This patch adds a command line option to enable SSBS, as it was previously only possible to enable by selecting -march=armv8.5-a. Similar patch upstream in GNU binutils: Reviewers: olista01, samparker, aemerson Reviewed By: samparker Subscribers: javed.absar, kristof.beyls, kristina, llvm-commits Differential Revision:
  1480. [CodeComplete] Cleanup access checking in code completion Summary: Also fixes a crash (see the added 'accessibility-crash.cpp' test). Reviewers: ioeric, kadircet Reviewed By: kadircet Subscribers: cfe-commits Differential Revision:
  1481. [Sema] Avoid CallExpr::setNumArgs in Sema::BuildCallToObjectOfClassType CallExpr::setNumArgs is the only thing that prevents storing the arguments of a call expression in a trailing array since it might resize the argument array. setNumArgs is only called in 3 places in Sema, and for all of them it is possible to avoid it. This deals with the call to setNumArgs in BuildCallToObjectOfClassType. Instead of constructing the CXXOperatorCallExpr first and later calling setNumArgs if we have default arguments, we first construct a large enough SmallVector, do the promotion/check of the arguments, and then construct the CXXOperatorCallExpr. Incidentally this also avoids reallocating the arguments when the call operator has default arguments but this is not the primary goal. Differential Revision: Reviewed By: aaron.ballman
  1482. [clangd] Fix a stale comment, NFC.
  1483. [AMDGPU] Add sdwa support for ADD|SUB U64 decomposed Pseudos The introduction of S_{ADD|SUB}_U64_PSEUDO instructions which are decomposed into VOP3 instruction pairs for S_ADD_U64_PSEUDO: V_ADD_I32_e64 V_ADDC_U32_e64 and for S_SUB_U64_PSEUDO V_SUB_I32_e64 V_SUBB_U32_e64 preclude the use of SDWA to encode a constant. SDWA: Sub-Dword addressing is supported on VOP1 and VOP2 instructions, but not on VOP3 instructions. We desire to fold the bit-and operand into the instruction encoding for the V_ADD_I32 instruction. This requires that we transform the VOP3 into a VOP2 form of the instruction (_e32). %19:vgpr_32 = V_AND_B32_e32 255, killed %16:vgpr_32, implicit $exec %47:vgpr_32, %49:sreg_64_xexec = V_ADD_I32_e64 %26.sub0:vreg_64, %19:vgpr_32, implicit $exec %48:vgpr_32, dead %50:sreg_64_xexec = V_ADDC_U32_e64 %26.sub1:vreg_64, %54:vgpr_32, killed %49:sreg_64_xexec, implicit $exec which then allows the SDWA encoding and becomes %47:vgpr_32 = V_ADD_I32_sdwa 0, %26.sub0:vreg_64, 0, killed %16:vgpr_32, 0, 6, 0, 6, 0, implicit-def $vcc, implicit $exec %48:vgpr_32 = V_ADDC_U32_e32 0, %26.sub1:vreg_64, implicit-def $vcc, implicit $vcc, implicit $exec Differential Revision:
  1484. [AST] Fix an uninitialized bug in the bits of FunctionDecl FunctionDeclBits.IsCopyDeductionCandidate was not initialized. This caused a warning with valgrind.
  1485. [clangd] Get rid of AST matchers in CodeComplete, NFC Summary: The isIndexedForCodeCompletion is called in the code path of SymbolCollector. Reviewers: kadircet Subscribers: ilya-biryukov, ioeric, MaskRay, jkorous, arphaman, cfe-commits Differential Revision:
  1486. Portable Python script across Python version Python3 does not support type destructuring in function parameters. Differential Revision:
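As an illustration of entry 1486: Python 3 removed tuple unpacking in parameter lists (PEP 3113), so the unpacking moves into the function body. The `span` function is a hypothetical example, not code from the patched script.

```python
# Python 2 only (a syntax error in Python 3):
#     def span((start, end)):
#         return end - start

def span(interval):
    # Portable form: take one parameter and unpack it explicitly.
    start, end = interval
    return end - start
```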
  1487. [AST][NFC] Pack CXXDeleteExpr Use the newly available space in the bit-fields of Stmt. This saves 8 bytes per CXXDeleteExpr. NFC.
  1488. Portable Python script across version Have all classes derive from object: that's implicitly the default in Python3, it needs to be done explicitly in Python2. Differential Revision:
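A short sketch of why entry 1488 matters, using a hypothetical `Counter` class. In Python 2, a class that does not derive from `object` is an old-style class, and features such as property setters do not work on it. Deriving from `object` gives the same behaviour on both versions.

```python
class Counter(object):  # explicit base: new-style class on Python 2 as well
    def __init__(self):
        self._count = 0

    @property
    def count(self):
        return self._count

    @count.setter
    def count(self, value):
        # On a Python 2 old-style class this setter would be bypassed.
        if value < 0:
            raise ValueError("count must be non-negative")
        self._count = value
```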
  1489. Portable Python script across Python version Python2 supports the two following equivalent constructs raise ExceptionType, exception_value and raise ExceptionType(exception_value) Only the latter is supported by Python3. Differential Revision:
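The two spellings from entry 1489 side by side, in a hypothetical `check_positive` helper. Only the call form parses on Python 3.

```python
def check_positive(n):
    if n <= 0:
        # Python 2 only (a syntax error in Python 3):
        #     raise ValueError, "expected a positive value"
        # Portable form, valid on both Python 2 and Python 3:
        raise ValueError("expected a positive value, got %d" % n)
    return n
```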
  1490. [Analyzer] Actually check for -model-path being a directory The original patch (r348038) clearly contained a typo and checked for '-ctu-dir' twice.
  1491. [Analysis] Properly prepare test env in test/Analysis/undef-call.c The test expects the '%T/ctudir' to be present, but does not create it.
  1492. [clang] Do not read from 'test/SemaCXX/Inputs' inside 'test/AST' Our integrate relies on test inputs being taken from the same directory as the test itself.
  1493. ARM: use target-specific SUBS node when combining cmp with cmov. This has two positive effects. First, using a custom node prevents recombination leading to an infinite loop since the output DAG is notionally a little more complex than the input one. Using a flag-setting instruction also allows the subtraction to be folded with the related comparison more easily.
  1494. [NFC][AArch64] Split out backend features This patch splits out backend features currently hidden behind architecture versions. For example, currently the only way to activate the complex numbers extension is to target a v8.3 architecture, whereas after the patch this extension can be added separately. This refactoring is required by the new command lines proposal: Reviewers: DavidSpickett, olista01, t.p.northover Subscribers: kristof.beyls, bryanpkc, javed.absar, pbarrio Differential revision:
  1495. [OpenCL][Sema] Improve BuildResolvedCallExpr handling of builtins Summary: This is a follow-up on, addressing a few issues. This: - adds a FIXME for later improvement for specific builtins: I previously have only checked OpenCL ones and ensured tests cover those. - fixed the CallExpr type. Reviewers: riccibruno Reviewed By: riccibruno Subscribers: yaxunl, Anastasia, kristina, svenvh, cfe-commits Differential Revision:
  1496. [CMake] Add LLVM_EXTERNALIZE_DEBUGINFO_OUTPUT_DIR for custom dSYM target directory on Darwin Summary: When using `LLVM_EXTERNALIZE_DEBUGINFO` in LLDB, the default dSYM location for the shared library in LLDB.framework is inside the framework bundle. With `LLVM_EXTERNALIZE_DEBUGINFO_OUTPUT_DIR` we can easily fix that. I consider it a useful feature to be able to set a global output directory for external debug info (rather than having a target-specific one). Only implemented for Darwin so far. Reviewers: beanz, aprantl Reviewed By: aprantl Subscribers: mgorny, aprantl, #lldb, lldb-commits, llvm-commits Differential Revision:
  1497. [RISCV] Fix test/MC/Disassembler/RISCV/invalid-instruction.txt after rL347988 The test for [0x00 0x00] failed due to the introduction of c.unimp. This particular test is unnecessary now that c.unimp was defined (and is tested in test/MC/RISCV/rv32c-valid.s).
  1498. [CMake] Store path to vendor-specific headers in clang-headers target property Summary: LLDB.framework wants a copy of these headers. With this change LLDB can easily glob for the list of files: ``` get_target_property(clang_include_dir clang-headers RUNTIME_OUTPUT_DIRECTORY) file(GLOB_RECURSE clang_vendor_headers RELATIVE ${clang_include_dir} "${clang_include_dir}/*") ``` By default `RUNTIME_OUTPUT_DIRECTORY` is unset for custom targets like `clang-headers`. Reviewers: aprantl, JDevlieghere, davide, friss, dexonsmith Reviewed By: JDevlieghere Subscribers: mgorny, #lldb, cfe-commits, llvm-commits Differential Revision:
  1499. [llvm-dwarfdump] - Stop printing the bogus empty section name on invalid dwarf. When there is no .debug_addr section for some reason, llvm-dwarfdump would print the bogus empty section name when dumping ranges in .debug_info: DW_AT_ranges [DW_FORM_rnglistx] (indexed (0x0) rangelist = 0x00000004 [0x0000000000000000, 0x0000000000000001) "" [0x0000000000000000, 0x0000000000000002) "") That happens because of the code which uses 0 (zero) as a section index as a default value. The code should use -1ULL instead because technically 0 is a valid zero section index in ELF and -1ULL is a special constant used that means "no section available". This is mostly a fix for the overall correctness/safety of the code, but a test case is provided too. Differential revision:
  1500. [ARM][MC] Move information about variadic register defs into tablegen Currently, variadic operands on an MCInst are assumed to be uses, because they come after the defs. However, this is not always the case, for example the Arm/Thumb LDM instructions write to a variable number of registers. This adds a property of instruction definitions which can be used to mark variadic operands as defs. This only affects MCInst, because MachineInstruction already tracks use/def per operand in each instance of the instruction, so can already represent this. This property can then be checked in MCInstrDesc, allowing us to remove some special cases in ARMAsmParser::isITBlockTerminator. Differential revision:
  1501. [ARM][Asm] Debug trace for the processInstruction loop In the Arm assembly parser, we first match an instruction, then call processInstruction to possibly change it to a different encoding, to match rules in the architecture manual which can't be expressed by the table-generated matcher. This adds debug printing so that this process is visible when using the -debug option. To support this, I've added a new overload of MCInst::dump_pretty which takes the opcode name as a StringRef, since we don't have an InstPrinter instance in the assembly parser. Instead, we can get the same information directly from the MCInstrInfo. Differential revision:
  1502. [KMSAN] Enable -msan-handle-asm-conservative by default This change enables conservative assembly instrumentation in KMSAN builds by default. It's still possible to disable it with -msan-handle-asm-conservative=0 if something breaks. It's now impossible to enable conservative instrumentation for userspace builds, but it's not used anyway.
  1503. [GlobalISel] Fix test irtranslator-stackprotect-check.ll Fix for commit r347862. Use correct AArch64 triple in test CodeGen/AArch64/GlobalISel/irtranslator-stackprotect-check.ll.
  1504. [ARM] FP16: support vld1.16 for vector loads with post-increment Differential Revision:
  1505. [PowerPC] Fix inconsistent ImmMustBeMultipleOf for same instruction Summary: There are 4 instructions which have Inconsistent ImmMustBeMultipleOf in the function PPCInstrInfo::instrHasImmForm, they are LFS, LFD, STFS, STFD. These four instructions should set the ImmMustBeMultipleOf to 1 instead of 4. Reviewed By: steven.zhang Differential Revision:
  1506. [NFC] [PowerPC] add a routine in PPCTargetLowering to determine if a global is accessed as got-indirect or not. In theory, we should let the PPC target determine how to lower the TOC Entry for globals. And the PPCTargetLowering requires this query to do some optimization for TOC_Entry. Differential Revision:
  1507. [gn build] Fix cosmetic bug in Before, #cmakedefine FOO resulted in #define FOO with a trailing space if FOO was set to something truthy. Make it so that it's just #define FOO without a trailing space. No functional difference. Differential Revision:
  1508. [gn build] Slightly simplify write_cmake_config. Before, the script had a bunch of special cases for #cmakedefine and #cmakedefine01 and then did general variable substitution. Now, the script always does general variable substitution for all lines and handles the special cases afterwards. This has no observable effect for the inputs we use, but is easier to explain and slightly easier to implement. Also mention to link to CMake's configure_file() in the docstring. (The new behavior doesn't quite match CMake on lines like #cmakedefine ${FOO}, but nobody does that.) Differential Revision:
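The two-phase processing described in the two gn build entries above (general variable substitution on every line first, special cases afterwards) can be sketched as follows. This is a simplified illustration, not the actual script: the names `configure_line` and `is_truthy` are hypothetical, and the truthiness rule is only an approximation of CMake's.

```python
import re

# Approximation of CMake's false constants (assumption: simplified on purpose).
_FALSE_VALUES = {"", "0", "OFF", "NO", "FALSE", "N", "IGNORE", "NOTFOUND"}


def is_truthy(value):
    """Return True if a variable value would count as set for #cmakedefine."""
    if value is None:
        return False
    upper = value.upper()
    return upper not in _FALSE_VALUES and not upper.endswith("-NOTFOUND")


def configure_line(line, values):
    """Process one template line; `values` maps variable names to strings."""
    # Phase 1: general substitution of @VAR@ and ${VAR} on every line.
    def substitute(match):
        key = match.group(1) or match.group(2)
        return values.get(key, "")

    line = re.sub(r"@(\w+)@|\$\{(\w+)\}", substitute, line)

    # Phase 2: special cases, handled after substitution.
    match = re.match(r"#cmakedefine01\s+(\w+)\s*$", line)
    if match:
        name = match.group(1)
        return "#define %s %d" % (name, 1 if is_truthy(values.get(name)) else 0)

    match = re.match(r"#cmakedefine\s+(\w+)(.*)$", line)
    if match:
        name, rest = match.group(1), match.group(2).rstrip()
        if is_truthy(values.get(name)):
            # No trailing space when nothing follows the name.
            return "#define %s%s" % (name, rest)
        return "/* #undef %s */" % name

    return line
```

Because substitution runs first, a line such as `#cmakedefine FOO ${FOO}` needs no dedicated code path: it has already been expanded by the time the `#cmakedefine` case looks at it.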
  1509. [gn build] Add build files for llvm/lib/Analysis and llvm/lib/ProfileData Differential Revision:
  1510. [X86] Add a DAG combine to turn stores of vXi1 on pre-avx512 targets into a bitcast and a store of a iX scalar.
  1511. [X86] Fix bad comment. NFC
  1512. Add myself to netbsd buildbot mails
  1513. Replace FullComment member being visited with parameter Reviewers: aaron.ballman Subscribers: cfe-commits Differential Revision:
  1514. Extend the CommentVisitor with parameter types Summary: This has precedent in the StmtVisitor. This change will make it possible to clean up the comment handling in ASTDumper. Reviewers: aaron.ballman Subscribers: cfe-commits Differential Revision:
  1515. Remove unnecessary methods The base class calls VisitExpr
  1516. lldb-amd64-ninja-netbsd8: Enable running tests
  1517. [test] Fix use of 'sort -b' in SimpleLoopUnswitch on NetBSD Add '-k 1' to 'sort -b' calls in SimpleLoopUnswitch tests, as required for sort implementation on NetBSD. The '-b' modifier is ineffective if specified without any key. Per the manpage: Note that the -b option has no effect unless key fields are specified. Differential Revision:
  1518. [test] Fix ScalarEvolution test to allow __func__ with prototype Fix ScalarEvolution/solve-quadratic.ll test to account for __func__ output listing the complete function prototype rather than just its name, as it does on NetBSD. Example Linux output: GetQuadraticEquation: addrec coeff bw: 4 GetQuadraticEquation: equation -2x^2 + -2x + -4, coeff bw: 5, multiplied by 2 Example NetBSD output: llvm::Optional<std::tuple<llvm::APInt, llvm::APInt, llvm::APInt, llvm::APInt, unsigned int> > GetQuadraticEquation(const llvm::SCEVAddRecExpr*): addrec coeff bw: 4 llvm::Optional<std::tuple<llvm::APInt, llvm::APInt, llvm::APInt, llvm::APInt, unsigned int> > GetQuadraticEquation(const llvm::SCEVAddRecExpr*): equation -2x^2 + -2x + -4, coeff bw: 5, multiplied by 2 Differential Revision:
  1519. [test] Fix BugPoint/compile-custom.ll to use detected python exec Spawn the custom compile command in BugPoint/compile-custom.ll via %python rather than relying on implicit 'env python' shebang, in order to fix it on systems that don't have 'python' executable such as NetBSD. Differential Revision:
  1520. Fix whitespace
  1521. Add dump tests for ArrayInitLoopExpr and ArrayInitIndexExpr
  1522. [ValueTracking] Support funnel shifts in computeKnownBits() If the shift amount is known, we can determine the known bits of the output based on the known bits of two inputs. This is essentially the same functionality as implemented in D54869, but for ValueTracking rather than InstCombine SimplifyDemandedBits. Differential Revision:
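The known-bits reasoning for a funnel shift with a constant shift amount can be illustrated with a small model. This is a sketch and not the ValueTracking code: known bits are modelled as a hypothetical `(known_zero, known_one)` pair of bit masks, and only the left funnel shift is shown.

```python
def fshl(x, y, s, bw):
    """Funnel shift left: top `bw` bits of the concatenation x:y shifted left by s mod bw."""
    mask = (1 << bw) - 1
    s %= bw
    if s == 0:
        return x & mask
    return ((x << s) | (y >> (bw - s))) & mask


def known_bits_fshl(known_x, known_y, s, bw):
    """Known bits of fshl(x, y, s) for a constant shift amount s.

    Each known_* argument is a (known_zero, known_one) pair of masks.
    The masks are combined exactly like the values themselves.
    """
    mask = (1 << bw) - 1
    s %= bw
    if s == 0:
        return known_x
    zero_x, one_x = known_x
    zero_y, one_y = known_y
    zero = ((zero_x << s) | (zero_y >> (bw - s))) & mask
    one = ((one_x << s) | (one_y >> (bw - s))) & mask
    return zero, one
```

The masks can be shifted like ordinary values because every result bit comes from exactly one input bit, so a result bit is known precisely when its source bit is known.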
  1523. [SelectionDAG] fold constant with undef vector per element This makes the SDAG behavior consistent with the way we do this in IR. It's possible that we were getting the wrong answer before. For example, 'xor undef, undef --> 0' but 'xor undef, C' --> undef. But the most practical improvement is likely as shown in the tests here - for FP, we were overconstraining undef lanes to NaN, and that can prevent vector simplifications/narrowing (see D51553).
  1524. [DAGCombiner] guard against an oversized shift crash This change prevents the crash noted in the post-commit comments for rL347478 : We can't guarantee that an oversized shift amount is folded away, so we have to check for it. Note that I committed an incomplete fix for that crash with: rL347502 But as discussed here: ...we have to try harder. So I'm not sure how to expose the bug now (and apparently no fuzzers have found a way yet either). On the plus side, we have discovered that we're missing real optimizations by not simplifying nodes sooner, so the earlier fix still has value, and there's likely more value in extending that so we can simplify more opcodes and simplify when doing RAUW and/or putting nodes on the combiner worklist. Differential Revision:
  1525. [ValueTracking] add helper function for testing implied condition; NFCI We were duplicating code around the existing isImpliedCondition() that checks for a predecessor block/dominating condition, so make that a wrapper call.
  1526. [X86] Simplify LowerBITCAST code for v2i32/v4i16/v8i8/i64->mmx/i64/f64 bitcast. Previously this code generated its own extracts and build_vector. But we can use a simpler concat_vectors or scalar_to_vector operation and let type legalization do additional legalization of those operations.
  1527. [X86] Add custom type legalization for v2i32/v4i16/v8i8->mmx bitcasts to avoid a store/load to/from the stack. Widen the input to a 128 bit vector by padding with undef elements. Then use a movdq2q to convert from xmm register to mmx register.
  1528. [X86] Custom type legalize v2i32/v4i16/v8i8->i64 bitcasts in 64-bit mode similar to what's done when the destination is f64. The generic legalizer will fall back to a stack spill that uses a truncating store. That store will get expanded into a shuffle and non-truncating store on pre-avx512 targets. Once that happens the stack store/load pair will be combined away leaving behind the shuffle and bitcasts. On avx512 targets the truncating store is legal so doesn't get folded away. By custom legalizing it we can avoid this churn and maybe produce better code.
  1529. OpenCL: Improve vector printf warnings The vector modifier is considered separate, so don't treat it as a conversion specifier. This is still not warning on some cases, like using a type that isn't a valid vector element. Fixes bug 39652
  1530. OpenCL: Extend argument promotion rules to vector types The spec is ambiguous on whether vector types are allowed to be implicitly converted. The only legal context I think this can be used for OpenCL is printf, where it seems necessary.
  1531. [X86] Add vXi8 division/remainder by non-splat constant test cases to prepare for an upcoming patch.
  1532. [MachineOutliner][AArch64] Improve checks for stack instructions If we know that we'll definitely save LR to a register, there's no reason to pre-check whether or not a stack instruction is unsafe to fix up. This makes it so that we check for that condition before mapping instructions. This allows us to outline more, since we don't pessimise as many instructions. Also update some tests, since we outline more.
  1533. Replace w16/w17 in machine-outliner.mir with w11/w12 These registers should not be used here, since they are interprocedural scratch registers in AArch64.
  1534. [X86] Don't use zero_extend_vector_inreg for mulhu lowering with sse 4.1 Summary: With sse4.1 we use two zero_extend_vector_inreg and a pshufd to expand the v16i8 input into two v8i16 vectors for the multiply. That's 3 shuffles to extend one operand. The other operand is usually constant as this is mostly used by division by constant optimization. Pre sse4.1 we use a punpckhbw and a punpcklbw with a zero vector. That's two shuffles and an xor and a copy due to tied register constraints. That seems maybe better than the 3 shuffles. With AVX we avoid the copy so that's obviously better. Reviewers: spatel, RKSimon Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision:
  1535. Introduce a way to allow the ASan dylib on Darwin platforms to be loaded via `dlopen()`. Summary: The purpose of this option is to provide a way for the ASan dylib to be loaded via `dlopen()` without triggering most initialization steps (e.g. shadow memory set up) that normally occur when the ASan dylib is loaded. This new functionality is exposed by - A `SANITIZER_SUPPORTS_INIT_FOR_DLOPEN` macro which indicates if the feature is supported. This is only true for Darwin currently. - A `HandleDlopenInit()` function which should return true if the library is being loaded via `dlopen()` and `SANITIZER_SUPPORTS_INIT_FOR_DLOPEN` is supported. Platforms that support this may perform any initialization they wish inside this function. Although disabling initialization is something that could potentially apply to other sanitizers, it appears to be unnecessary for them, so this patch only makes the change for ASan. rdar://problem/45284065 Reviewers: kubamracek, george.karpenkov, kcc, eugenis, krytarowski Subscribers: #sanitizers, llvm-commits Differential Revision:
  1536. [TTI] Reduction costs only need to include a single extract element cost (REAPPLIED) We were adding the entire scalarization extraction cost for reductions, which returns the total cost of extracting every element of a vector type. For reductions we don't need to do this - we just need to extract the 0'th element after the reduction pattern has completed. Fixes PR37731 Rebased and reapplied after being reverted in rL347541 due to PR39774 - which was fixed by D54955/rL347759 and D55017/rL347997 Differential Revision:
  1537. [AMDGPU] Split 64-Bit XNOR to 64-Bit NOT/XOR The identity ~(x ^ y) == (~x ^ y) == (x ^ ~y) allows XNOR (XOR/NOT) to turn into NOT/XOR. Handling this case with its own split means we can make the NOT remain in the scalar unit. Previously, we split 64-bit XNOR into two 32-bit XNOR, then lowered. Now, we get three instructions (s_not, v_xor, v_xor) rather than four in the case where either of the sources is a scalar 64-bit. Add test cases to xnor.ll to attempt XNOR Vx, Sy and XNOR Sx, Vy. Also adding test that uses the opposite identity such that (~x ^ y) on the scalar unit (or vector for gfx906) can generate XNOR. This already worked, but I didn't see a test for it. Differential:
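The identity that the XNOR split relies on can be checked directly on 64-bit values. The sketch below is illustrative only; the helper names are hypothetical and Python integers are masked to 64 bits to model the registers.

```python
MASK64 = (1 << 64) - 1


def not64(value):
    """Bitwise NOT restricted to 64 bits."""
    return ~value & MASK64


def xnor_direct(x, y):
    """~(x ^ y): the XNOR as originally written."""
    return not64(x ^ y)


def xnor_not_first(x, y):
    """(~x ^ y): NOT applied to the first operand, then XOR."""
    return not64(x) ^ (y & MASK64)


def xnor_not_second(x, y):
    """(x ^ ~y): NOT applied to the second operand, then XOR."""
    return (x & MASK64) ^ not64(y)
```

Since all three forms agree, the NOT can be attached to whichever operand is scalar, which matches the commit's goal of keeping the NOT on the scalar unit.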
  1538. [llvm-readobj] Improve dynamic section iteration NFC.
  1539. [SelectionDAG] Improve SimplifyDemandedBits to SimplifyDemandedVectorElts simplification D52935 introduced the ability for SimplifyDemandedBits to call SimplifyDemandedVectorElts through BITCASTs if the demanded bit mask entirely covered the sub element. This patch relaxes this to demanding an element if we need any bit from it. Differential Revision:
  1540. [InstCombine] Support ssub.sat canonicalization for non-splats Extend ssub.sat(X, C) -> sadd.sat(X, -C) canonicalization to also support non-splat vector constants. This is done by generalizing the implementation of the isNotMinSignedValue() helper to return true for constants that are non-splat, but don't contain any signed min elements. Differential Revision:
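Why the canonicalization must exclude the signed minimum can be seen in a small i8 model. This is a sketch under stated assumptions: `is_not_min_signed_value` is a Python stand-in that only borrows the name of the helper mentioned above, and `canonicalize_ssub_sat` is a hypothetical per-element model, not InstCombine code.

```python
INT8_MIN, INT8_MAX = -128, 127


def clamp(value):
    """Saturate a mathematical integer to the i8 range."""
    return max(INT8_MIN, min(INT8_MAX, value))


def sadd_sat(a, b):
    return clamp(a + b)


def ssub_sat(a, b):
    return clamp(a - b)


def wrapping_neg(c):
    """Two's complement negation in i8; the minimum value negates to itself."""
    return ((-c + 128) % 256) - 128


def is_not_min_signed_value(elements):
    """True if no element of a (possibly non-splat) constant is the signed minimum."""
    return all(c != INT8_MIN for c in elements)


def canonicalize_ssub_sat(xs, cs):
    """Rewrite ssub.sat(X, C) as sadd.sat(X, -C) per element, or return None if unsafe."""
    if not is_not_min_signed_value(cs):
        return None
    return [sadd_sat(x, wrapping_neg(c)) for x, c in zip(xs, cs)]
```

For the minimum value, the negation wraps back to the minimum, so the rewritten add would saturate in the opposite direction from the original subtract.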
  1541. Correct indentation.
  1542. Specify constant context in constant emitter The constant emitter may need to evaluate the expression in a constant context. For example, global initializer lists.
  1543. [X86] Remove stale FIXME from test case. NFC This was fixed in r346581. I just forgot to remove it.
  1544. [ThinLTO] Allow importing of functions with var args Summary: Follow up to D54270, which allowed importing of var args functions unless they called va_start. As pointed out in the post-commit comments on that patch, the inliner can handle functions that call va_start in certain situations as well. Go ahead and enable importing of all var args functions. Measurements on a large binary show that this increases imports and binary size by an insignificant amount. Reviewers: davidxl Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, llvm-commits Differential Revision:
  1545. [RISCV] Remove RV64I SLLW/SRLW/SRAW patterns and add new test cases As noted by Eli Friedman <>, the RV64I shift patterns for SLLW/SRLW/SRAW make some incorrect assumptions. SRAW assumed that (sext_inreg foo, i32) could only be produced when sign-extending an i32. However, it can be produced by input such as: define i64 @tricky_ashr(i64 %a, i64 %b) { %1 = shl i64 %a, 32 %2 = ashr i64 %1, 32 %3 = ashr i64 %2, %b ret i64 %3 } It's important not to select sraw in the above case, because sraw only uses the lower 5 bits of the shift amount, while a shift of 32-63 would be valid. Similarly, the patterns for srlw assumed (and foo, 0xffffffff) would only be produced when zero-extending a value that was originally i32 in LLVM IR. This is obviously incorrect. This patch removes the SLLW/SRLW/SRAW shift patterns for the time being and adds test cases that would demonstrate a miscompile if the incorrect patterns were re-added.
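The miscompile described above can be reproduced with a small integer model. This is a sketch, not generated code: `tricky_ashr` models the IR function from the commit message, `sraw` models the RV64I instruction as described there (low 32 bits of the source, shift amount taken from the low 5 bits, result sign-extended), and results are represented as unsigned 64-bit values.

```python
MASK64 = (1 << 64) - 1


def sext(value, bits):
    """Interpret the low `bits` bits of value as a signed integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def tricky_ashr(a, b):
    """Model of the IR: shl 32, ashr 32 (a sext_inreg from i32), then ashr by b.

    Assumes 0 <= b <= 63, since larger shifts are not defined for i64.
    """
    sign_extended = sext(a, 32)
    return (sign_extended >> b) & MASK64


def sraw(rs1, rs2):
    """Model of sraw: arithmetic shift of the low 32 bits by rs2[4:0]."""
    return (sext(rs1, 32) >> (rs2 & 31)) & MASK64
```

For shift amounts below 32 the two agree, but `tricky_ashr(1, 32)` is 0 while `sraw(1, 32)` shifts by 0 and yields 1, which is the case the removed pattern got wrong.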
  1546. [clangd] Recommit the "AnyScope" changes in requests.json by rCTE347753 (reverted by rCTE347792) This fixes IndexBenchmark tests.
  1547. [Basic] Move DiagnosticsEngine::dump from .h to .cpp The two LLVM_DUMP_METHOD methods have an undefined reference on clang::DiagnosticsEngine::DiagStateMap::dump. tools/clang/tools/extra/clangd/benchmarks/IndexBenchmark links in clangDaemon but does not link in clangBasic explicitly, which causes a linker error "undefined symbol" in !NDEBUG + -DBUILD_SHARED_LIBS=on builds. Move LLVM_DUMP_METHOD methods to .cpp to fix IndexBenchmark. They should be unconditionally defined as they are also used by non-dump-method #pragma clang __debug diag_mapping
  1548. [projects] Use add_llvm_external_project for implicit projects This allows disabling implicit projects via the LLVM_TOOL_*_BUILD variables, similar to how implicit tools can be disabled. They'll still be enabled by default, since add_llvm_external_project defaults the LLVM_TOOL_*_BUILD variables to ON for in-tree implicit projects. Differential Revision:
  1549. [X86][LoopVectorize] Replace -mcpu=skylake-avx512 with -mattr=avx512f in some tests that failed when experimenting with defaulting to -mprefer-vector-width=256 for skylake-avx512.
  1550. Relax test to also work on Windows.
  1551. [compiler-rt] Use "ColumnLimit: 0" instead of "clang-format off" in tests Reviewers: eugenis, jfb Subscribers: kubamracek, dberris, llvm-commits Differential Revision:
  1552. Honor -fdebug-prefix-map when creating function names for the debug info. This adds a callback to PrintingPolicy to allow CGDebugInfo to remap file paths according to -fdebug-prefix-map. Otherwise the debug info (particularly function names for C++ lambdas) may contain paths that should have been remapped in the debug info. <rdar://problem/46128056> Differential Revision:
  1553. Use RequireNullTerminator=false in identify_magic. identify_magic does not need the file to be null terminated. Passing true here causes the file reading code to decide not to use mmap in some rare cases (which happen to be true 100% of the time in PDB files) which can lead to very large files failing to load. Since it was probably just an accident that we were passing true here (since it is the default function parameter), this should be strictly an improvement.
  1554. [lit] Add a generic build script with a lit substitution. This adds a script called as well as a lit substitution called %build that we can use to invoke it. The idea is that this allows a lit test to build test inferiors without having to worry about architecture / platform specific differences, command line syntax, finding / configuring a proper toolchain, and other issues. They can simply write something like: %build --arch=32 -o %t.exe %p/Inputs/foo.cpp and it will just work. This paves the way for being able to run lit tests with multiple configurations, platforms, and compilers with a single test. Differential Revision:
  1555. [NVPTX] Add lowering of i128 numbers as struct fields Addition to D34555 - override VTs computation with ComputePTXValueVTs for struct fields. Author: Denys Zariaiev<> Differential Revision:
  1556. [X86] Replace '-mcpu=skx' with -mattr=avx512f or -mattr=avx512bw in interleave/strided load/store cost model tests.
  1557. [gn build] Add action to generate VCSRevision.h and use it to add llvm/lib/Object/ Differential Revision:
  1558. Revert "Revert r347417 "Re-Reinstate 347294 with a fix for the failures."" It seems the two failing tests can be simply fixed after r348037 Fix 3 cases in Analysis/builtin-functions.cpp Delete the bad CodeGen/builtin-constant-p.c for now
  1559. [codeview] Remove dead macros for codeview record serialization, NFC These weren't needed when we went to the yaml IO style of serialization, which has "mapOptional".
  1560. LegacyDivergenceAnalysis: fix uninitialized value Change-Id: I014502e431a68f7beddf169f6a3d19dac5dd2c26
  1561. AMDGPU: Divergence-driven selection of scalar buffer load intrinsics Summary: Moving SMRD to VMEM in SIFixSGPRCopies is rather bad for performance if the load is really uniform. So select the scalar load intrinsics directly to either VMEM or SMRD buffer loads based on divergence analysis. If an offset happens to end up in a VGPR -- either because a floating point calculation was involved, or due to other remaining deficiencies in SIFixSGPRCopies -- we use v_readfirstlane. There is some unrelated churn in tests since we now select MUBUF offsets in a unified way with non-scalar buffer loads. Change-Id: I170e6816323beb1348677b358c9d380865cd1a19 Reviewers: arsenm, alex-t, rampitec, tpr Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision:
  1562. AMDGPU: Fix various issues around the VirtReg2Value mapping Summary: The VirtReg2Value mapping is crucial for getting consistently reliable divergence information into the SelectionDAG. This patch fixes a bunch of issues that lead to incorrect divergence info and introduces tight assertions to ensure we don't regress: 1. VirtReg2Value is generated lazily; there were some cases where a lookup was performed before all relevant virtual registers were created, leading to an out-of-sync mapping. Those cases were: - Complex code to lower formal arguments that generated CopyFromReg nodes from live-in registers (fixed by never querying the mapping for live-in registers). - Code that generates CopyToReg for formal arguments that are used outside the entry basic block (fixed by never querying the mapping for Register nodes, which don't need the divergence info anyway). 2. For complex values that are lowered to a sequence of registers, all registers must be reflected in the VirtReg2Value mapping. I am not adding any new tests, since I'm not actually aware of any bugs that these problems are causing with trunk as-is. However, I recently added a test case (in r346423) which fails when D53283 is applied without this change. Also, the new assertions should provide most of the effective test coverage. There is one test change in sdwa-peephole.ll. The underlying issue is that since the divergence info is now correct, the DAGISel will select V_OR_B32 directly instead of S_OR_B32. This leads to an extra COPY which affects the behavior of MachineLICM in a way that ends up with the S_MOV_B32 with the constant in a different basic block than the V_OR_B32, which is presumably what defeats the peephole. Reviewers: alex-t, arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision:
  1563. [DA] GPUDivergenceAnalysis for unstructured GPU kernels Summary: This is patch #3 of the new DivergenceAnalysis <> The GPUDivergenceAnalysis is intended to eventually supersede the existing LegacyDivergenceAnalysis. The existing LegacyDivergenceAnalysis produces incorrect results on unstructured Control-Flow Graphs: <> This patch adds the option -use-gpu-divergence-analysis to the LegacyDivergenceAnalysis to turn it into a transparent wrapper for the GPUDivergenceAnalysis. Reviewers: nhaehnle Reviewed By: nhaehnle Subscribers: jholewinski, jvesely, jfb, llvm-commits, alex-t, sameerds, arsenm, nhaehnle Differential Revision:
  1564. [x86] add tests for undef + partial undef constant folding; NFC Keep this file sync'd with the instsimplify version (rL348045).
  1565. [X86] Split skylake-avx512 run lines in SLP vectorizer tests to cover -mprefer-vector-width=256 and -mprefer-vector-width=512. This will make these tests immune if we ever change the default behavior of -march=skylake-avx512 to prefer 256 bit vectors.
  1566. [InstSimplify] add tests for undef + partial undef constant folding; NFC These tests should probably go under a separate test file because they should fold with just -constprop, but they're similar to the scalar tests already in here.
  1567. [analyzer] Deleting unnecessary test file That I really should've done in rC348031.
  1568. [ValueTracking] Make unit tests easier to write; NFC Generalize the existing MatchSelectPatternTest class to also work with other types of tests. This reduces the amount of boilerplate necessary to write ValueTracking tests in general, and computeKnownBits tests in particular. The inherited convention is that the function must be @test and the tested instruction %A. Differential Revision:
  1569. Support: use std::is_trivially_copyable on MSVC MSVC 2015 and newer have std::is_trivially_copyable available for use. We should prefer that over std::is_class to get this check to be correct.
  1570. Add myself as code owner for OpenBSD driver
  1571. Revert r347417 "Re-Reinstate 347294 with a fix for the failures." Kept the "indirect_builtin_constant_p" test case in test/SemaCXX/constant-expression-cxx1y.cpp while we are investigating why the following snippet fails: extern char extern_var; struct { int a; } a = {__builtin_constant_p(extern_var)};
  1572. [analyzer] Emit an error for invalid -analyzer-config inputs Differential Revision:
  1573. [ExprConstant] Try fixing __builtin_constant_p after D54355 (rC347417) Summary: Reinstate the original behavior (Success(false, E)) before D54355 when this branch is taken. This fixes spurious error of the following snippet: extern char extern_var; struct { int a; } a = {__builtin_constant_p(extern_var)};
  1574. [MachineOutliner] Outline both register save calls + no LR save calls together Instead of treating the outlined functions for these as distinct frames, they should be combined into one case. Neither allows for stack fixups, and both generate the same frame. Thus, they ought to be considered one case. This makes the code far easier to understand, for one thing. It also offers some small code size improvements. It's fairly rare to see a class of outlined functions that doesn't fall entirely into one variant (on CTMark anyway). It does happen from time to time though. This mostly offers some serious simplification. Also update the test to show the added functionality.
  1575. AArch64: Don't emit CFI for SCS register in nounwind functions. All that you can legitimately do with the CFI for a nounwind function is get a backtrace, and adjusting the SCS register is not (currently) required for this purpose. Differential Revision:
  1576. [TableGen] Fix negation of simple predicates Simple predicates, such as those defined by `CheckRegOperandSimple` or `CheckImmOperandSimple`, were not being negated when used with `CheckNot`. This change fixes this issue by defining the previously declared methods to handle simple predicates. Differential revision:
  1577. Adding tests for -ast-dump; NFC. This adds tests for struct and union declarations in C. It also points out a bug when dumping anonymous record types -- they are sometimes reported as being contained by something of the wrong tag type. e.g., an anonymous struct inside of a union named X reports the anonymous struct as being inside of 'struct X' rather than 'union X'.
  1578. Revert r348029. I was git-ing and jumped the gun.
  1579. [analyzer] Evaluate all non-checker config options before analysis In earlier patches regarding AnalyzerOptions, a lot of effort went into gathering all config options, and changing the interface so that potential misuse can be eliminated. Up until this point, AnalyzerOptions only evaluated an option when it was queried. For example, if we had a "-no-false-positives" flag, AnalyzerOptions would store an Optional field for it that would be None until the flag's getter function was called. However, now that we're confident that we've gathered all configs, we can evaluate all of them before analysis, so we can emit an error on invalid input even if that particular flag will not matter in that particular run of the analyzer. Another very big benefit of this is that debug.ConfigDumper will now show the value of all configs every single time. Also, almost all option-related classes have a similar interface, so uniformity is also a benefit. The implementation for errors on invalid input will be committed shortly. Differential Revision:
  1580. Revert "Reverting r347949-r347951 because they broke the test bots." This reverts commit 5bad6129c012fbf186eb055be49344e790448ecc. Hopefully fixing the issue which was breaking the bots.
  1581. We're in a constant context in the ConstantEmitter.
  1582. Expect mixed path separators in FileManagerTest when resolving paths on Win32
  1583. Add a new interceptor for getvfsstat(2) from NetBSD Summary: getvfsstat - gets list of all mounted file systems. Add a dedicated test. Reviewers: vitalybuka, joerg Reviewed By: vitalybuka Subscribers: kubamracek, llvm-commits, mgorny, #sanitizers Tags: #sanitizers Differential Revision:
  1584. Revert an inadvertent change from r348020.
  1585. [analyzer][PlistMacroExpansion] Part 5.: Support for # and ## From what I can see, this should be the last patch needed to replicate macro argument expansions. Differential Revision:
  1586. [Mem2Reg] Fix nondeterministic corner case Summary: When mem2reg inserts phi nodes in blocks with unreachable predecessors, it adds undef operands for those incoming edges. When there are multiple such predecessors, the order is currently based on the address of the BasicBlocks. This change fixes that by using the BBNumbers in the sort/search predicates, as is done elsewhere in mem2reg to ensure determinism. Also adds a testcase with a bunch of unreachable preds, which (nondeterministically) fails without the fix. Reviewers: majnemer Reviewed By: majnemer Subscribers: mgrang, llvm-commits Differential Revision:
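The difference between ordering by address and ordering by a stable block number can be sketched in a few lines. This is an analogy only: `Block`, `number_blocks`, and `order_predecessors` are hypothetical names, and Python object identity stands in for the pointer values that the original code sorted by.

```python
class Block:
    """Stand-in for a basic block; compared and hashed by identity."""

    def __init__(self, name):
        self.name = name


def number_blocks(function_blocks):
    """Assign each block its position in the function, which is stable across runs."""
    return {block: index for index, block in enumerate(function_blocks)}


def order_predecessors(preds, bb_numbers):
    """Deterministic order: sort by the block number, never by object address."""
    return sorted(preds, key=lambda block: bb_numbers[block])
```

Sorting with `key=id` would instead depend on where the allocator happened to place each object, which is the run-to-run variation the commit removes.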
  1587. Updating this test, which changed after the reverts from r348020.
  1588. [DWARFv5] Verify all-or-nothing constraint on DIFile source Update IR verifier to check the constraint that DIFile source is present on all files or no files. Differential Revision:
  1589. [dsymutil] Gather global and local symbol addresses in the main executable. Usually local symbols will have their address described in the debug map. Global symbols have to have their address looked up in the symbol table of the main executable. By playing with 'ld -r' and export lists, you can get a symbol described as global by the debug map while actually being a local symbol as far as the link is concerned. By gathering the address of local symbols, we fix this issue. Also, we prefer a global symbol in case of a name collision to preserve the previous behavior. Note that using the 'ld -r' tricks, people can actually cause symbol name collisions that dsymutil has no way to figure out. This fixes the simple case where there is only one symbol of a given name. rdar://problem/32826621 Differential revision:
  1590. Reverting r347949-r347951 because they broke the test bots.
  1591. [X86] Change vXi8 MULHU lowering to unpack high and low half of lanes instead of extracting and concating low and high half registers. This reduces the number of shuffle operations that need to be done. The splitting strategy requires the shuffle unit for the extraction and the extension. With the unpack strategy the unpacks accomplish a splitting and extending in one operation.
  1592. [X86] Prefer lowerVectorShuffleAsBitMask over using an avx512 masked operation when avx512bw/avx512vl is enabled. This does require a constant pool load instead of loading an immediate into a gpr, moving to a k register and masking. But it's fewer instructions and more consistent with previous ISAs. It probably opens up more combine opportunities as one of the test cases demonstrates.
  1593. Move AST tests into their own test directory; NFC. This moves everything primarily testing the functionality of -ast-dump and -ast-print into their own directory, rather than leaving the tests spread around the testing directory.
  1594. [SelectionDAG] fold FP binops with 2 undef operands to undef
  1595. [clang] Fix rL348006 for windows
  1596. [AMDGPU] Disable SReg Global LD/ST, perf regression Differential Revision:
  1597. Adding tests for -ast-dump; NFC. This adds tests for GenericSelectionExpr; note that it points out a minor whitespace bug for selection expression cases.
  1598. [llvm-mca] Speedup the default resource selection strategy. This patch removes a (potentially) slow while loop in DefaultResourceStrategy::select(). A better (and faster) approach is to do some bit manipulation in order to shrink the range of candidate resources. On a release build, this change gives an average speedup of ~10%.
  1599. [clang] Fill RealPathName for virtual files. Summary: Absolute path information for virtual files was missing even if we had already stat'd the files. This patch fills in that information for virtual files that can successfully be stat'd. Reviewers: ilya-biryukov Subscribers: cfe-commits Differential Revision:
  1600. [clangd] Populate include graph during static indexing action. Summary: This is the second part for introducing include hierarchy into index files produced by clangd. You can see the base patch that introduces structures and discusses the future of the patches in D54817 Reviewers: ilya-biryukov Subscribers: mgorny, ioeric, MaskRay, jkorous, arphaman, cfe-commits Differential Revision:
  1601. Revert "[BTF] Add BTF DebugInfo" This reverts commit 9c6b970db8bc63b28ce58a129bb1580a6a3c6caf.
  1602. [x86] add tests for fake vector FP ops; NFC
  1603. [BTF] Add BTF DebugInfo This patch adds BPF Debug Format (BTF) as a standalone LLVM debuginfo. The BTF related sections are directly generated from IR. The BTF debuginfo is generated only when the compilation target is BPF. What is BTF? ============ First, the BPF is a linux kernel virtual machine and widely used for tracing, networking and security. BTF is the debug info format for BPF, introduced in the below linux patch in the patch set mentioned in the below lwn article. The BTF format is specified in the above github commit. In summary, its layout looks like struct btf_header type subsection (a list of types) string subsection (a list of strings) With such information, the kernel and the user space are able to pretty print a particular bpf map key/value. One possible example below: Without BTF: key: [ 0x01, 0x01, 0x00, 0x00 ] With BTF: key: struct t { a : 1; b : 1; c : 0} where struct is defined as struct t { char a; char b; short c; }; How BTF is generated? ===================== Currently, the BTF is generated through pahole. and available in pahole v1.12 Basically, the bpf program needs to be compiled with -g with dwarf sections generated. The pahole is enhanced such that a .BTF section can be generated based on dwarf. This format of the .BTF section matches the format expected by the kernel, so a bpf loader can just take the .BTF section and load it into the kernel. The .BTF section layout is also specified in this patch: with file include/llvm/BinaryFormat/BTF.h. What use cases this patch tries to address? =========================================== Currently, only the bpf instruction stream is required to pass to the kernel. The kernel verifies it, jits it if configured to do so, attaches it to a particular kernel attachment point, and later executes when a particular event happens. This patch tries to expand BTF to support two more use cases below: (1). BPF supports subroutine calls. 
During performance analysis, it would be good to differentiate which call is hot instead of just providing a virtual address. This would require passing a unique identifier for each subroutine to the kernel; the subroutine name is a natural choice. (2). If a particular jitted instruction is hot, we want the user to know which source line this jitted instruction belongs to. This would require that the source information be available to various profiling tools. Note that in a single ELF file, . there may be multiple loadable bpf programs, . for a particular to-be-loaded bpf instruction stream, its instructions may come from multiple PROGBITS sections, the bpf loader needs to merge them together to a single consecutive insn stream before loading to the kernel. For example: section .text: subroutines funcFoo section _progA: calling funcFoo section _progB: calling funcFoo The bpf loader could construct two loadable bpf instruction streams and load them into the kernel: . _progA funcFoo . _progB funcFoo So per ELF section function offset and instruction offset will need to be adjusted before passing to the kernel, and the kernel essentially expects only one code section regardless of how many there are in the ELF file. What do we propose and Why? =========================== To support the above two use cases, we propose to add an additional section, .BTF.ext, to the ELF file which is the input of the bpf loader. A different section is preferred since the loader may need to manipulate it before loading part of its data to the kernel. The .BTF.ext section has a similar header to the .BTF section and it contains two subsections for func_info and line_info. . the func_info maps the func insn byte offset to a func type in the .BTF type subsection. . the line_info maps the insn byte offset to a line info. . both func_info and line_info subsections are organized by ELF PROGBITS AX sections. 
pahole is not a good place to implement .BTF.ext as pahole is mostly for structure hole information and more importantly, we want to pass the actual code to the kernel. . bpf program typically is small so storage overhead should be small. . in bpf land, it is totally possible that an application loads the bpf program into the kernel and then that application quits, so holding debug info by the user space application is not practical as you may not even know who loads this bpf program. . having source codes directly kept by kernel would ease deployment since the original source code does not need to ship on every host and the kernel-devel package does not need to be deployed even if kernel headers are used. LLVM is a good place to implement. . The only reliable time to get the source code is during compilation time. This will result in both more accurate information and easier deployment as stated in the above. . Another consideration is for JIT. The project like bcc ( use MCJIT to compile a C program into bpf insns and load them to the kernel. The llvm generated BTF sections will be readily available for such cases as well. Design and implementation of emitting .BTF/.BTF.ext sections =========================================================== The BTF debuginfo format is defined. Both .BTF and .BTF.ext sections are generated directly from IR when both "-target bpf" and "-g" are specified. Note that dwarf sections are still generated as dwarf is used by user space tools like llvm-objdump etc. for BPF target. This patch also contains tests to verify generated .BTF and .BTF.ext sections for all supported types, func_info and line_info subsections. The patch is also tested against linux kernel bpf sample tests and selftests. Signed-off-by: Yonghong Song <> Differential Revision:
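The offset adjustment that the BTF commit message describes (merging several PROGBITS sections into one instruction stream, then rebasing per-section function offsets) can be sketched as follows. This is only an illustration, not the bpf loader's or the patch's code: the `merge_streams` helper, the dictionary layout and the section sizes are all hypothetical.

```python
def merge_streams(sections, order):
    """Concatenate sections in `order` and rebase each function offset.

    `sections` maps a section name to its byte size and a list of
    (offset_within_section, function_name) pairs. All values here are
    made up for illustration.
    """
    bases = {}
    cursor = 0
    for name in order:
        bases[name] = cursor          # where this section starts in the merged stream
        cursor += sections[name]["size"]

    merged = []
    for name in order:
        for offset, func in sections[name]["funcs"]:
            merged.append((bases[name] + offset, func))
    return merged


# Hypothetical ELF layout matching the example in the commit message.
sections = {
    ".text": {"size": 64, "funcs": [(0, "funcFoo")]},
    "_progA": {"size": 40, "funcs": [(0, "progA")]},
    "_progB": {"size": 24, "funcs": [(0, "progB")]},
}

stream_a = merge_streams(sections, ["_progA", ".text"])
stream_b = merge_streams(sections, ["_progB", ".text"])
```

In both merged streams funcFoo starts at a different offset than it had inside .text, which is why the per-section offsets cannot be handed to the kernel unchanged.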
  1604. [CodeGen] Prefer static frame index for STATEPOINT liveness args Summary: If a given liveness arg of STATEPOINT is at a fixed frame index (e.g. a function argument passed on stack), prefer to use this fixed location even if the address is also in a register. If we use the register it will generate a spill, which is not necessary since the fixed frame index can be directly recorded in the stack map. Patch by Cherry Zhang <>. Reviewers: thanm, niravd, reames Reviewed By: reames Subscribers: cherryyz, reames, anna, arphaman, llvm-commits Differential Revision:
  1605. [SLP]PR39774: Update references of the replaced external instructions. Summary: An additional fix for PR39774. Need to update the references for the ReductionRoot instruction when it is replaced during the vectorization phase to avoid a compiler crash on reduction vectorization. Reviewers: RKSimon, spatel Subscribers: llvm-commits Differential Revision:
  1606. Adding tests for -ast-dump; NFC. This adds tests for DeclStmt and demonstrates that we don't create such an AST node for global declarations currently.
  1607. [gn build] Add build files for llvm/lib/Bitcode/Reader and llvm/lib/MC/MCParser. Differential Revision:
  1608. Adding tests for -ast-dump; NFC. This adds tests for the majority of the functionality around FunctionDecl and CXXMethodDecl.
  1609. [AMDGPU] Combine DPP mov with use instructions (VOP1/2/3) Introduces DPP pseudo instructions and the pass that combines DPP mov with subsequent uses. Differential revision:
  1610. TableGen/ISel: Allow PatFrag predicate code to access captured operands Summary: This simplifies writing predicates for pattern fragments that are automatically re-associated or commuted. For example, a followup patch adds patterns for fragments of the form (add (shl $x, $y), $z) to the AMDGPU backend. Such patterns are automatically commuted to (add $z, (shl $x, $y)), which makes it basically impossible to refer to $x, $y, and $z generically in the PredicateCode. With this change, the PredicateCode can refer to $x, $y, and $z simply as `Operands[i]`. Test confirmed that there are no changes to any of the generated files when building all (non-experimental) targets. Change-Id: I61c00ace7eed42c1d4edc4c5351174b56b77a79c Reviewers: arsenm, rampitec, RKSimon, craig.topper, hfinkel, uweigand Subscribers: wdng, tpr, llvm-commits Differential Revision:
  1611. [RISCV] Add additional CSR instruction aliases (imm. operands) This patch adds CSR instructions aliases for the cases where the instruction takes an immediate operand but the alias doesn't have the i suffix. This is necessary for gas/gcc compatibility. gas doesn't do a similar conversion for fsflags or fsrm, so this should be complete. Differential Revision: Patch by Luís Marques.
  1612. Fix parenthesis warning in IVDescriptors
  1613. Add a new reduction pattern match Adding a new reduction pattern match for vectorizing code similar to TSVC s3111: for (int i = 0; i < N; i++) if (a[i] > b) sum += a[i]; This patch adds support for fadd, fsub and fmul, as well as multiple branches and different (but compatible) instructions (ex. add+sub) in different branches. The difference from the previous patch( is as follows: - Added check of fast-math property of fp-instruction to the previous patch - Fix/add some patterns for if-reduction.ll Differential Revision: Patch by Takahiro Miyoshi <> and Masakazu Ueno <>
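The conditional reduction in the s3111-style loop can be illustrated with a small model. This is a sketch of the general idea of vectorizing an if-reduction with a per-lane select, not the vectorizer's actual output; the function names and the lane width of 4 are assumptions. Integers are used so the two results match exactly; with floating point the lanes reassociate the additions, which is why the commit checks the fast-math property.

```python
def scalar_sum(a, b):
    """Reference loop: sum only the elements greater than b."""
    total = 0
    for x in a:
        if x > b:
            total += x
    return total


def vector_sum(a, b, width=4):
    """Model of a vectorized if-reduction with `width` lanes."""
    lanes = [0] * width
    main = len(a) - len(a) % width
    for i in range(0, main, width):
        for lane in range(width):
            x = a[i + lane]
            # A select replaces the branch: add x or add 0.
            lanes[lane] += x if x > b else 0
    total = sum(lanes)                # horizontal reduction of the lanes
    for x in a[main:]:                # scalar remainder loop
        if x > b:
            total += x
    return total


data = [3, -1, 7, 2, 9, 0, 5, 8, 4, 6, 1]
```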
  1614. [RISCV] Add UNIMP instruction (32- and 16-bit forms) This patch adds support for UNIMP in both 32- and 16-bit forms. The 32-bit form can be seen as a variant of the ECALL/EBREAK/etc. family of instructions. The 16-bit form is just all zeroes, which isn't a valid RISC-V instruction, but still follows the 16-bit instruction form (i.e. bits 0-1 != 11). Until recently unimp was undocumented and supported just by binutils, which printed unimp for either the 16 or 32-bit form. Both forms are now documented <> and binutils now supports c.unimp <>. Differential Revision: Patch by Luís Marques.
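The remark that the all-zero halfword still follows the 16-bit instruction form can be checked with a short sketch. It models only the length rule quoted in the commit message (bits 0-1 equal to 11 mean a 32-bit instruction); the helper name is hypothetical, and longer encodings are ignored.

```python
def insn_length(low_halfword):
    """Return the instruction length in bytes from its first 16 bits.

    Only the 16-bit and 32-bit forms are modelled here.
    """
    return 4 if (low_halfword & 0b11) == 0b11 else 2


C_UNIMP = 0x0000          # 16-bit form: all zeroes
ECALL = 0x00000073        # a 32-bit instruction, for comparison
```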
  1615. Fix warning about unused variable [NFC]
  1616. [SelectionDAG] Support result type promotion for FLT_ROUNDS_ For targets where i32 is not a legal type (e.g. 64-bit RISC-V), LegalizeIntegerTypes must promote the result of ISD::FLT_ROUNDS_. Differential Revision:
  1617. [llvm-mca] Simplify code in class Scheduler. NFCI
  1618. [clangd] Penalize destructor and overloaded operators in code completion. Reviewers: hokein Subscribers: ilya-biryukov, MaskRay, jkorous, arphaman, kadircet, cfe-commits Differential Revision:
  1619. [clangd] Drop injected class name when class scope is not explicitly specified. Summary: E.g. allow injected "A::A" in `using A::A^` but not in "A^". Reviewers: kadircet Subscribers: ilya-biryukov, MaskRay, jkorous, arphaman, cfe-commits Differential Revision:
  1620. [analyzer] [HOTFIX!] SValBuilder crash when `aggressive-binary-operation-simplification` enabled During the review of D41938 a condition check with an early exit accidentally slipped into a branch, leaving the other branch unprotected. This may result in an assertion later on. This hotfix moves this condition check outside of the branch. Differential Revision:
  1621. [SelectionDAG] Support promotion of PREFETCH operands For targets where i32 is not a legal type (e.g. 64-bit RISC-V), LegalizeIntegerTypes must promote the operands of ISD::PREFETCH. Differential Revision:
  1622. [LoopSimplifyCFG] Update MemorySSA in terminator folding. PR39783 Terminator folding transform lacks MemorySSA update for memory Phis, while they exist within MemorySSA analysis. They need exactly the same type of updates as regular Phis. Failing to update them properly ends up with inconsistent MemorySSA and manifests in various assertion failures. This patch adds Memory Phi updates to this transform. Thanks to @jonpa for finding this! Differential Revision: Reviewed By: asbirlea
  1623. [SelectionDAG] Support promotion of FRAMEADDR/RETURNADDR operands For targets where i32 is not a legal type (e.g. 64-bit RISC-V), LegalizeIntegerTypes must promote the operand. Differential Revision:
  1624. [TargetLowering][RISCV] Introduce isSExtCheaperThanZExt hook and implement for RISC-V DAGTypeLegalizer::PromoteSetCCOperands currently prefers to zero-extend operands when it is able to do so. For some targets this is more expensive than a sign-extension, which is also a valid choice. Introduce the isSExtCheaperThanZExt hook and use it in the new SExtOrZExtPromotedInteger helper. On RISC-V, we prefer sign-extension for FromTy == MVT::i32 and ToTy == MVT::i64, as it can be performed using a single instruction. Differential Revision:
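Why sign-extension is "also a valid choice" when promoting setcc operands can be seen in a small model. For equality and unsigned comparisons of i32 values, extending both operands the same way preserves the result. The helpers below are hypothetical and model 64-bit registers with Python integers; this is not the DAGTypeLegalizer code.

```python
def zext32(x):
    """Zero-extend the low 32 bits to a 64-bit value."""
    return x & 0xFFFFFFFF


def sext32(x):
    """Sign-extend the low 32 bits to a 64-bit (unsigned-encoded) value."""
    x &= 0xFFFFFFFF
    return x | 0xFFFFFFFF00000000 if x & 0x80000000 else x


def sltu(a, b):
    """Unsigned less-than on 64-bit register values."""
    return int(a < b)


samples = [0, 1, 0x7FFFFFFF, 0x80000000, 0xFFFFFFFE, 0xFFFFFFFF]
same_ult = all(
    sltu(zext32(a), zext32(b)) == sltu(sext32(a), sext32(b))
    for a in samples for b in samples
)
same_eq = all(
    (zext32(a) == zext32(b)) == (sext32(a) == sext32(b))
    for a in samples for b in samples
)
```

Sign-extension maps the upper half of the unsigned 32-bit range to the top of the 64-bit range without reordering it, so the comparison outcome is unchanged.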
  1625. [NFC] Simplify and reduce tests for PR39783
  1626. [RISCV] Introduce codegen patterns for instructions introduced in RV64I As discussed in the RFC <>, 64-bit RISC-V has i64 as the only legal integer type. This patch introduces patterns to support codegen of the new instructions introduced in RV64I: addiw, addw, subw, sllw, slliw, srlw, srliw, sraw, sraiw, ld, sd. Custom selection code is needed for srliw as SimplifyDemandedBits will remove lower bits from the mask, meaning the obvious pattern won't work: def : Pat<(sext_inreg (srl (and GPR:$rs1, 0xffffffff), uimm5:$shamt), i32), (SRLIW GPR:$rs1, uimm5:$shamt)>; This is sufficient to compile and execute all of the GCC torture suite for RV64I other than those files using frameaddr or returnaddr intrinsics (LegalizeDAG doesn't know how to promote the operands - a future patch addresses this). When promoting i32 sltu/sltiu operands, it would be more efficient to use sign-extension rather than zero-extension for RV64. A future patch adds a hook to allow this. Differential Revision:
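The srliw problem can be modelled numerically. The sketch below assumes that the simplification clears the low `shamt` bits of the 0xffffffff mask, since the shift discards them anyway; that exact form is an assumption, and the helper names are hypothetical. The value computed is unchanged, but the mask constant no longer equals 0xffffffff, so a pattern that matches that literal stops applying.

```python
def sext32(x):
    """Model of sext_inreg ..., i32 on a 64-bit register value."""
    x &= 0xFFFFFFFF
    return x | 0xFFFFFFFF00000000 if x & 0x80000000 else x


def srliw_via_mask(rs1, shamt, mask):
    """sext_inreg(srl(and(rs1, mask), shamt), i32)."""
    return sext32((rs1 & mask) >> shamt)


def narrowed_mask(shamt):
    """Mask with the low `shamt` bits cleared (assumed simplification)."""
    return 0xFFFFFFFF & ~((1 << shamt) - 1)


values = [0, 1, 0x80000000, 0xDEADBEEF, 0x123456789ABCDEF0, 0xFFFFFFFFFFFFFFFF]
equivalent = all(
    srliw_via_mask(v, s, 0xFFFFFFFF) == srliw_via_mask(v, s, narrowed_mask(s))
    for v in values for s in range(32)
)
```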
  1627. [docs][AtomicExpandPass] Document the alternate lowering strategy for part-word atomicrmw/cmpxchg D47882, D48130 and D48131 introduce a new lowering strategy for part-word atomicrmw/cmpxchg and use it to lower these operations for the RISC-V target. Rather than having AtomicExpandPass produce the LL/SC loop at the IR level, it instead calculates the necessary mask values and inserts a target-specific intrinsic, which is lowered at a much later stage (after register allocation). This ensures that architecture-specific restrictions for forward-progress in LL/SC loops can be guaranteed. This patch documents this new AtomicExpandPass functionality. See the previous llvm-dev RFC for more info <>. Differential Revision:
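The "necessary mask values" for a part-word atomic can be sketched as below. This assumes a little-endian target with 32-bit words and is only an illustration of the arithmetic, not the pass's IR output; the function names are hypothetical.

```python
def partword_masks(addr, width_bits, word_bytes=4):
    """Return (aligned_addr, shift, mask) for an i8/i16 access at `addr`.

    Assumes little-endian byte order within the containing word.
    """
    aligned = addr & ~(word_bytes - 1)
    shift = (addr & (word_bytes - 1)) * 8
    mask = ((1 << width_bits) - 1) << shift
    return aligned, shift, mask


def masked_update(word, value, shift, mask):
    """Replace only the masked field of `word` with `value`."""
    return (word & ~mask) | ((value << shift) & mask)


aligned, shift, mask = partword_masks(0x1002, 16)
updated = masked_update(0x11223344, 0xABCD, shift, mask)
```

The LL/SC loop then operates on the aligned word and touches only the bits selected by the mask.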
  1628. Fix a use-after-scope bug.
  1629. [clangd] Bump vscode-clangd v0.0.8
  1630. [clangd] Fix junk output in clangd vscode plugin Summary: When using the vscode clangd plugin, lots and lots of junk output is printed to the output window, which constantly reopens itself. Example output: I[11:13:17.733] <-- textDocument/codeAction(4) I[11:13:17.733] --> reply:textDocument/codeAction(4) 0 ms I[11:13:17.937] <-- textDocument/codeAction(5) I[11:13:17.937] --> reply:textDocument/codeAction(5) 0 ms I[11:13:18.557] <-- textDocument/hover(6) I[11:13:18.606] --> reply:textDocument/hover(6) 48 ms This should prevent that from happening. Patch by James Findley! Reviewers: ioeric, ilya-biryukov, hokein Reviewed By: ioeric Subscribers: ilya-biryukov, MaskRay, jkorous, arphaman, kadircet, cfe-commits Tags: #clang-tools-extra Differential Revision:
  1631. [X86] Emit PACKUS directly from the v16i8 LowerMULH code instead of using a shuffle.
  1632. [X86] Change the pre-sse4.1 code in the v16i8 MULHU lowering to be what we get after DAG combine cleans it up. Previously we emitted a punpcklbw/punpckhbw to move the byte elements into the upper half of 16 bit elements then shifted right by 8 to zero the upper bits. After DAG combine we end up with punpcklbw/punpckhbw into the lower bits with zeros in the upper bits and no shifts. So just emit that directly.
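The arithmetic behind a v16i8 MULHU built from 16-bit lanes can be modelled per element: zero-extend each byte into a 16-bit lane, multiply, and keep the high byte of the product. This is a sketch of the arithmetic only, with a hypothetical function name; it does not model the actual unpack and pack instructions.

```python
def mulhu_v16i8(a, b):
    """High byte of the unsigned product of each pair of bytes."""
    out = []
    for x, y in zip(a, b):
        wide = (x & 0xFF) * (y & 0xFF)   # fits in 16 bits: at most 0xFE01
        out.append((wide >> 8) & 0xFF)   # high half of the 16-bit product
    return out


all_max = mulhu_v16i8([255] * 16, [255] * 16)
small = mulhu_v16i8([16] * 16, [16] * 16)
```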
  1633. [ARM] Don't expand sdiv when optimising for minsize Don't expand SDIV with an immediate that is a power of 2 if we optimise for minimum code size. For example: sdiv %1, i32 4 gets expanded to a sequence of 3 instructions, but this is suboptimal for minimum code size so instead we just generate a MOV and a SDIV if integer division is supported. Differential Revision:
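The kind of three-step expansion that the minsize change avoids can be modelled as below. This is the usual bias-and-shift sequence for signed division by a power of two; whether it matches the exact instructions the backend emits is an assumption, and the function names are hypothetical.

```python
def sdiv_pow2(x, k):
    """Divide a 32-bit signed x by 2**k, rounding toward zero."""
    sign = x >> 31                            # arithmetic shift: 0 or -1
    bias = (sign & 0xFFFFFFFF) >> (32 - k)    # logical shift: 2**k - 1 if x < 0
    return (x + bias) >> k                    # arithmetic shift


def trunc_div(x, d):
    """Reference: C-style division that truncates toward zero."""
    q = abs(x) // d
    return -q if x < 0 else q


cases = [0, 1, 3, 4, 7, -1, -3, -4, -7, -8, 2**31 - 1, -(2**31)]
matches = all(sdiv_pow2(x, 2) == trunc_div(x, 4) for x in cases)
```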
  1634. [CodeGen] Fix bugs in BranchFolderPass when debug labels are generated. Skip DBG_VALUE and DBG_LABEL in branch folding algorithms. The bug is reported in Differential Revision:
  1635. [NFC] Refine doxygen format. Differential Revision:
  1636. [SystemZ::TTI] i8/i16 operands extension costs revisited Three minor changes to these extra costs: * For ICmp instructions, instead of adding 2 all the time for extending each operand, this is only done if that operand is neither a load nor an immediate. * The operands extension costs for divides are removed, because we now use a high cost already for the divide (20). * The extra costs for lshr/ashr are removed as this did not seem useful. Review: Ulrich Weigand
  1637. [X86] Fix a couple types in SimplifyDemandedVectorEltsForTargetNode. NFCI We had a EVT variable capturing the result of getSimpleValueType which returns an MVT. Another place using EVT that could have been MVT. And an 'int' that should be 'unsigned'.
  1638. [llvm-objcopy] Move elf-specific tests into subfolder In this diff the elf-specific tests are moved into the subfolder llvm-objcopy/ELF (the change was discussed in the comments on A separate code review wasn't sent for this change since Phabricator is failing to create such a large diff. Test plan: make check-all make check-llvm-tools make check-llvm-tools-llvm-objcopy
  1639. Revert r344580 "[analyzer] Nullability: Don't detect post factum violation..." Fails under ASan!
  1640. [analyzer] MallocChecker: Avoid redundant transitions. Don't generate a checker-tagged node unconditionally on the first checkDeadSymbols callback when no pointers are tracked. This is a tiny performance optimization; it may change the behavior slightly by making Static Analyzer bail out on max-nodes one node later (which is good) but any test would either break for no good reason or become useless every time someone sneezes. Differential Revision:
  1641. [analyzer] Nullability: Don't detect post factum violation on concrete values. The checker suppresses warnings on paths on which a nonnull value is assumed to be nullable. This probably deserves a warning, but it's a separate story. Now, because dead symbol collection fires in pretty random moments, there sometimes was a situation when dead symbol collection fired after computing a parameter but before actually evaluating call enter into the function, which triggered the suppression when the argument was null in the first place earlier than the obvious warning for null-to-nonnull was emitted, causing false negatives. Only trigger the suppression for symbols, not for concrete values. It is impossible to constrain a concrete value post-factum because it is impossible to constrain a concrete value at all. This covers all the necessary cases because by the time we reach the call, symbolic values should be either not constrained to null, or already collapsed into concrete null values. Which in turn happens because they are passed through the Store, and the respective collapse is implemented as part of getSVal(), which is also weird. Differential Revision:
  1642. [analyzer] Fix the "Zombie Symbols" bug. It's an old bug that consists in stale references to symbols remaining in the GDM if they disappear from other program state sections as a result of any operation that isn't the actual dead symbol collection. The most common example here is: FILE *fp = fopen("myfile.txt", "w"); fp = 0; // leak of file descriptor In this example the leak was not detected previously because the symbol disappears from the public part of the program state due to evaluating the assignment. For that reason the checker never receives a notification that the symbol is dead, and never reports a leak. This bug not only causes leak false negatives, but also a number of other problems, including false positives on some checkers. What's worse, even though the program state contains a finite number of symbols, the set of symbols that dies is potentially infinite. This means that it is impossible to compute the set of all dead symbols to pass off to the checkers for cleaning up their part of the GDM. No longer compute the dead set at all. Disallow iterating over dead symbols. Disallow querying if any symbols are dead. Remove the API for marking symbols as dead, as it is no longer necessary. Update checkers accordingly. Differential Revision:
  1643. [analyzer] Fixes after rebase.
  1644. [analyzer] RetainCountChecker for OSObject model the "free" call The "free" call frees the object immediately, ignoring the reference count. Sadly, it is actually used in a few places, so we need to model it. Differential Revision:
  1645. [analyzer] RetainCountChecker: recognize that OSObject can be created directly using an operator "new" Differential Revision:
  1646. [analyzer] Switch retain count checker for OSObject to use OS_* attributes Instead of generalized reference counting annotations. Differential Revision:
  1647. [attributes] Add a family of OS_CONSUMED, OS_RETURNS and OS_RETURNS_RETAINED attributes The addition adds three attributes for communicating ownership, analogous to existing NS_ and CF_ attributes. The attributes are meant to be used for communicating ownership of all objects in XNU (Darwin kernel) and all of the kernel modules. The ownership model there is very similar, but still different from the Foundation model, so we think that introducing a new family of attributes is appropriate. The addition required a sizeable refactoring of the existing code for CF_ and NS_ ownership attributes, due to tight coupling and the fact that differentiating between the types was previously done using a boolean. Differential Revision:
  1648. [analyzer] [NFC] Minor refactoring of RetainCountDiagnostics Move visitors to the implementation file, move a complicated logic into a function. Differential Revision:
  1649. [analyzer] For OSObject, trust that functions starting with Get (uppercase) are also getters. Differential Revision:
  1650. [analyzer] Print a fully qualified name for functions in RetainCountChecker diagnostics Attempt to get a fully qualified name from AST if an SVal corresponding to the object is not available. Differential Revision:
  1651. [analyzer] Add the type of the leaked object to the diagnostic message If the object is a temporary, and there is no variable it binds to, let's at least print out the object name in order to help differentiate it from other temporaries. rdar://45175098 Differential Revision:
  1652. [analyzer] Reference leaked object by name, even if it was created in an inlined function. rdar://45532181 Differential Revision:
  1653. [analyzer] [NFC] Test dumping trimmed graph Differential Revision:
  1654. [analyzer] [NFC] Some miscellaneous clean ups and documentation fixes. Differential Revision:
  1655. Fix build warnings introduced in rL347938 Summary: Suppressed warnings in release builds due to variable used only in assert statement. Subscribers: llvm-commits, eraman, mgorny Differential Revision:
  1656. Revert "Revert r347596 "Support for inserting profile-directed cache prefetches"" Summary: This reverts commit d8517b96dfbd42e6a8db33c50d1fa1e58e63fbb9. Fix: correct the use of DenseMap. Reviewers: davidxl, hans, wmi Reviewed By: wmi Subscribers: mgorny, eraman, llvm-commits Differential Revision:
  1657. [CMake] build correctly if build path contains whitespace The add_llvm_symbol_exports function in AddLLVM.cmake creates command line link flags with paths containing CMAKE_CURRENT_BINARY_DIR, but that will break if CMAKE_CURRENT_BINARY_DIR contains whitespace. This patch adds quotes to those paths. Fixes PR39843. Patch by John Garvin. Differential Revision:
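The failure mode with whitespace in a build path can be illustrated with Python's `shlex`, used here only as a stand-in for how a command line is split into arguments. The flag name and the path are hypothetical and are not taken from AddLLVM.cmake.

```python
import shlex

build_dir = "/home/user/my build"                   # hypothetical path with a space
unquoted = f"--exports={build_dir}/foo.exports"     # hypothetical flag
quoted = f'--exports="{build_dir}/foo.exports"'

unquoted_args = shlex.split(unquoted)   # the path is split at the space
quoted_args = shlex.split(quoted)       # the quotes keep it as one argument
```

Without the quotes the single flag becomes two arguments, and the second one is treated as an unrelated input.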
  1658. [SCEV] Guard movement of insertion point for loop-invariants r320789 suppressed moving the insertion point of SCEV expressions with div/rem operations to the loop header in non-loop-invariant situations. This, and similar, hoisting is also unsafe in the loop-invariant case, since there may be a guard against a zero denominator. This is an adjustment to the fix of r320789 to suppress the movement even in the loop-invariant case. This fixes PR30806. Differential Revision:
  1659. Revert r346560 "[winasan] Unpoison the stack in NtTerminateThread" This reverts r343606 again. The NtTerminateThread interceptor is causing problems in NaCl: I reproduced the problem locally and tried my best to debug them, but it's beyond me.
  1660. First part of P0482 - Establish that char8_t is an integral type, and that numeric_limits<char8_t> is valid and sane. (second try)
  1661. [gn build] merge r346978 and r347741.
  1662. [gn build] Set +x bit on .py files in llvm/utils/gn/build. Also add a shebang line to
  1663. [gn build] Add template for running llvm-tblgen and use it to add build file for llvm/lib/IR. Also adds a boring build file for llvm/lib/BinaryFormat (needed by llvm/lib/IR). lib/IR marks Attributes and IntrinsicsEnum as public_deps (because IR's public headers include the generated .inc files), so projects depending on lib/IR will implicitly depend on them being generated. As a consequence, most targets won't have to explicitly list a dependency on these tablegen steps (contrast with intrinsics_gen in the cmake build). This doesn't yet have the optimization where tablegen's output is only updated if it's changed. Differential Revision:
  1664. [-gmodules] Honor -fdebug-prefix-map in the debug info inside PCMs. This patch passes -fdebug-prefix-map (a feature for renaming source paths in the debug info) through to the per-module codegen options and adds the debug prefix map to the module hash. <rdar://problem/46045865> Differential Revision:
  1665. [gn build] Add a script checking if sources in and CMakeLists.txt files match. Also fix a missing file in lib/Support/ found by the script. The script is very stupid and assumes that CMakeLists.txt follow the standard LLVM CMakeLists.txt formatting with one cpp source file per line. Despite its simplicity, it works well in practice. It would be nice if it also checked deps and maybe automatically applied its suggestions. Differential Revision:
  1666. [WebAssembly] Expand unavailable integer operations for vectors Summary: Expands for vector types all of the integer operations that are expanded for scalars because they are not supported at all by WebAssembly. This CL has no tests because such tests would really be testing the target-independent expansion, but I'm happy to add tests if reviewers think it would be helpful. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision:
  1667. Produce an error on non-encodable offsets for darwin ARM scattered relocations. Scattered ARM relocations for Mach-O's only have 24 bits available to encode the offset. This is not checked but just truncated and can result in corrupt binaries after linking because the relocations are applied to the wrong offset. This patch will check and error out in those situations instead of emitting a wrong relocation. Patch by: Sander Bogaert (dzn) Differential revision:
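The difference between silently truncating an offset and rejecting it can be sketched as follows. Only the 24-bit width comes from the commit message; the function names and the error type are hypothetical, and this is not the MC layer's code.

```python
SCATTERED_OFFSET_BITS = 24   # width stated in the commit message


def truncate_offset(offset):
    """Old behaviour as described: keep only the low 24 bits."""
    return offset & ((1 << SCATTERED_OFFSET_BITS) - 1)


def checked_offset(offset):
    """New behaviour as described: refuse offsets that do not fit."""
    if offset < 0 or offset >= (1 << SCATTERED_OFFSET_BITS):
        raise ValueError(f"offset {offset:#x} does not fit in 24 bits")
    return offset


wrong = truncate_offset(0x1000004)   # relocation would land at offset 4
```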
  1668. [libcxx] Make UNSUPPORTED for std::async test more fine grained The test was previously marked as unsupported on all Apple platforms, when we really just want to mark it as unsupported for previously shipped dylibs on macosx.
  1669. [OPENMP][NVPTX]Call get __kmpc_global_thread_num in worker after initialization. Function __kmpc_global_thread_num should be called only after initialization, not earlier.
  1670. Comment tweak requested in code review. NFC I forgot to do this before committing D54755.
  1671. [DAGCombiner] narrow truncated binops The motivating case for this is shown in: and the corresponding rot16.ll regression tests. Because x86 scalar shift amounts are i8 values, we can end up with trunc-binop-trunc sequences that don't get folded in IR. As the TODO comments suggest, there will be regressions if we extend this (for x86, we mostly seem to be missing LEA opportunities, but there are likely vector folds missing too). I think those should be considered existing bugs because this is the same transform that we do as an IR canonicalization in instcombine. We just need more tests to make those visible independent of this patch. Differential Revision:
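The narrowing transform relies on the fact that, for operations whose low result bits depend only on the low operand bits, truncating before or after the operation gives the same value. The sketch below checks this for a few such operations; the set of opcodes the patch actually handles is not stated in the commit message, so the list here is an assumption.

```python
import operator


def trunc(x, bits):
    """Keep the low `bits` bits (two's complement wrap-around)."""
    return x & ((1 << bits) - 1)


OPS = {
    "add": operator.add,
    "sub": operator.sub,
    "mul": operator.mul,
    "and": operator.and_,
    "or": operator.or_,
    "xor": operator.xor,
}


def wide_then_trunc(op, x, y, bits=8):
    return trunc(OPS[op](x, y), bits)


def trunc_then_narrow(op, x, y, bits=8):
    return trunc(OPS[op](trunc(x, bits), trunc(y, bits)), bits)


pairs = [(0x12345678, 0x9ABCDEF0), (5, 300), (0xFFFFFFFF, 1), (0, 0x80)]
narrowing_ok = all(
    wide_then_trunc(op, x, y) == trunc_then_narrow(op, x, y)
    for op in OPS for x, y in pairs
)
```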
  1672. [obj2yaml] [COFF] Write RVA instead of VA for sections, fix roundtripping executables yaml2obj writes the yaml value as is to the output file. Differential Revision:
  1673. [OpenMP] Add a new version of the SPMD deinit kernel function Summary: This patch adds a new runtime version of the SPMD deinit kernel function which replaces the previous function. The new function takes as an argument a flag which signals whether the runtime is required or not. This enables the compiler to optimize out the parts of the deinit function which are not needed. Reviewers: ABataev, caomhin Reviewed By: ABataev Subscribers: jholewinski, guansong, cfe-commits Differential Revision:
  1674. [RISCV] Implement codegen for cmpxchg on RV32IA Utilise a similar ('late') lowering strategy to D47882. The changes to AtomicExpandPass allow this strategy to be utilised by other targets which implement shouldExpandAtomicCmpXchgInIR. All cmpxchg are lowered as 'strong' currently and failure ordering is ignored. This is conservative but correct. Differential Revision:
  1675. Adding .vscode to svn:ignore
  1676. [X86] Change the pre-type legalization DAG combine added in r347898 into a custom type legalization operation instead. This seems to produce the same results on the tests we have.
  1677. Revert r347871 "Fix: Add support for TFE/LWE in image intrinsic" Also revert fix r347876 One of the buildbots was reporting a failure in some relevant tests that I can't repro or explain at present, so reverting until I can isolate.
  1678. Introduce MaxUsesToExplore argument to capture tracking Currently CaptureTracker gives up if it encounters a value with more than 20 uses. The motivation for this cap is to keep it relatively cheap for the BasicAliasAnalysis use case, where the results can't be cached. However, other clients of CaptureTracker might be ok with a higher cost. This patch introduces an argument for PointerMayBeCaptured functions to specify the max number of uses to explore. The motivation for this change is a downstream user of CaptureTracker, but I believe upstream clients of CaptureTracker might also benefit from a more fine grained cap. Reviewed By: hfinkel Differential Revision:
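The effect of a configurable use cap can be shown with a toy model. This is not the CaptureTracking API: uses are reduced to a list of booleans saying whether each use captures the pointer, and the function and constant names are hypothetical. Only the default of 20 comes from the commit message.

```python
DEFAULT_MAX_USES_TO_EXPLORE = 20


def pointer_may_be_captured(uses, max_uses_to_explore=DEFAULT_MAX_USES_TO_EXPLORE):
    """Return True if the pointer may be captured.

    `uses` is a list of booleans, True meaning that use captures the
    pointer. Running out of budget is answered conservatively with True.
    """
    explored = 0
    for captures in uses:
        if explored >= max_uses_to_explore:
            return True          # gave up: too many uses to explore
        explored += 1
        if captures:
            return True
    return False


harmless_uses = [False] * 21
```

A client that can afford the extra work passes a larger cap and gets a precise answer where the default gives up.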
  1679. Revert commit r347904 because it broke older compilers
  1680. [MachineScheduler] Order FI-based memops based on stack direction It makes more sense to order FI-based memops in descending order when the stack goes down. This allows offsets to stay "consecutive" and allow easier pattern matching.
  1681. Revert "NFC: Fix case of CommentVisitor::Visit methods" This reverts commit 0859c80137ac5fb3c86e7802cb8c5ef56f921cce.
  1682. First part of P0482 - Establish that char8_t is an integral type, and that numeric_limits<char8_t> is valid and sane.
  1683. [libcxx] Remove bad_array_length Summary: std::bad_array_length was added by n3467, but this never made it into C++. This commit removes the definition of std::bad_array_length from the headers AND from the shared library. See the comments in the ABI changelog for details about the ABI implications of this change. Reviewers: mclow.lists, dexonsmith, howard.hinnant, EricWF Subscribers: christof, jkorous, libcxx-commits Differential Revision:
  1684. [SelectionDAG][AArch64][X86] Move legalization of vector MULHS/MULHU from LegalizeDAG to LegalizeVectorOps I believe we should be legalizing these with the rest of vector binary operations. If any custom lowering is required for these nodes, this will give the DAG combine between LegalizeVectorOps and LegalizeDAG a chance to run on the custom code before constant build_vectors are lowered in LegalizeDAG. I've moved MULHU/MULHS handling in AArch64 from Lowering to isel. Moving the lowering earlier caused build_vector+extract_subvector simplifications to kick in which made the generated code worse. Differential Revision:
  1685. NFC: Fix case of CommentVisitor::Visit methods This difference is very visible because it is used with other Visitor classes together.
  1686. NFC: Move ColorScope to global scope
  1687. NFC: Constify ShowColors
  1688. [X86] Add a DAG combine pre type legalization to widen division by constant splat on narrow vectors to avoid scalarization This is another patch for -x86-experimental-vector-widening. This pre widens narrow division by constants so that we can get past the legal type check in the generic DAG combiner. Otherwise we end up scalarizing. I've restricted this to splats for now because it was easy to just call DAG.getConstant. Not sure what we should do for non-splat? Increase the element size? Widen the constant vector by padding with 1? Differential Revision:
  1689. set default max-page-size to 4KB in lld for Android Aarch64 Summary: This patch passes an option '-z max-page-size=4096' to lld through clang driver. This is for Android on Aarch64 target. The lld default page size is too large for Aarch64, which produces larger .so files and images for arm64 device targets. In this patch we set default page size to 4KB for Android Aarch64 targets instead. Reviewers: srhines, danalbert, ruiu, chh, peter.smith Reviewed By: srhines Subscribers: javed.absar, kristof.beyls, cfe-commits, george.burgess.iv, llozano Differential Revision:
  1690. [InstSimplify] fold select with implied condition This is an almost direct move of the functionality from InstCombine to InstSimplify. There's no reason not to do this in InstSimplify because we never create a new value with this transform. (There's a question of whether any dominance-based transform belongs in either of these passes, but that's a separate issue.) I've changed 1 of the conditions for the fold (1 of the blocks for the branch must be the block we started with) into an assert because I'm not sure how that could ever be false. We need 1 extra check to make sure that the instruction itself is in a basic block because passes other than InstCombine may be using InstSimplify as an analysis on values that are not wired up yet. The 3-way compare changes show that InstCombine has some kind of phase-ordering hole. Otherwise, we would have already gotten the intended final result that we now show here.
  1691. Simplify the __builtin_constant_p test that was used to catch rC347417 failure Reviewers: rsmith, void, shafik Reviewed By: void Subscribers: kristina, cfe-commits Differential Revision:
  1692. [TableGen] Examine entire subreg compositions to detect ambiguity When tablegen detects that there exist two subregister compositions that result in the same value for some register, it will emit a warning. This kind of an overlap in compositions should only happen when it is caused by a user-defined composition. It can happen, however, that the user- defined composition is not identically equal to another one, but it does produce the same value for one or more registers. In such cases suppress the warning. This patch is to silence the warning when building the System Z backend after D50725. Differential Revision:
  1693. [GlobalISel] LegalizationArtifactCombiner: Combine aext([asz]ext x) -> [asz]ext x Summary: Replace `aext([asz]ext x)` with `aext/sext/zext x` in order to reduce the number of instructions generated to clean up some legalization artifacts. Reviewers: aditya_nandakumar, dsanders, aemerson, bogner Reviewed By: aemerson Subscribers: rovka, kristof.beyls, javed.absar, llvm-commits Differential Revision:
  1694. Add missing REQUIRES to new test Test added in r347887 requires an x86 target.
  1695. [llvm-objcopy] Delete redundant !Config.xx.empty() when followed by positive is_contained() check Summary: The original intention of !Config.xx.empty() was probably to emphasize the thing that is currently considered, but I feel the simplified form is actually easier to understand and it is also consistent with the call sites in other llvm components. Reviewers: alexshap, rupprecht, jakehehrlich, jhenderson, espindola Reviewed By: alexshap, rupprecht Subscribers: emaste, arichardson, llvm-commits Differential Revision:
  1696. Avoid redundant reference to isPodLike in SmallVect/Optional implementation NFC, preparatory work for isPodLike cleaning. Differential Revision:
  1697. [LICM] Reapply r347776 "Make LICM able to hoist phis" with fix This commit caused a large compile-time slowdown in some cases when NDEBUG is off due to the dominator tree verification it added. Fix this by only doing dominator tree and loop info verification when something has been hoisted. Differential Revision:
  1698. [analyzer][PlistMacroExpansion] Part 4.: Support for __VA_ARGS__ Differential Revision:
  1699. [ThinLTO] Allow importing of multiple symbols with same GUID Summary: This is the clang side of the fix in D55047, to handle the case where two different modules have local variables with the same GUID because they had the same source file name at compilation time. Allow multiple symbols with the same GUID to be imported, and test that this case works with the distributed backend path. Depends on D55047. Reviewers: evgeny777 Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, cfe-commits Differential Revision:
  1700. [ThinLTO] Import local variables from the same module as caller Summary: We can sometimes end up with multiple copies of a local variable that have the same GUID in the index. This happens when there are local variables with the same name that are in different source files having the same name/path at compile time (but compiled into different bitcode objects). In this case make sure we import the copy in the caller's module. This enables importing both of the variables having the same GUID (but which will have different promoted names since the module paths, and therefore the module hashes, will be distinct). Importing the wrong copy is particularly problematic for read only variables, since we must import them as a local copy whenever referenced. Otherwise we get undefs at link time. Note that the llvm-lto.cpp and ThinLTOCodeGenerator changes are needed for testing the distributed index case via clang, which will be sent as a separate clang-side patch shortly. We were previously not doing the dead code/read only computation before computing imports when testing distributed index generation (like it was for testing importing and other ThinLTO mechanisms alone). Reviewers: evgeny777 Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, dang, llvm-commits Differential Revision:
  1701. git-llvm: Fix incremental population of svn tree. "svn update --depth=..." is, annoyingly, not a specification of the desired depth, but rather a _limit_ added on top of the "sticky" depth in the working-directory. However, if the directory doesn't exist yet, then it sets the sticky depth of the new directory entries. Unfortunately, the svn command-line has no way of expanding the depth of a directory from "empty" to "files", without also removing any already-expanded subdirectories. The way you're supposed to increase the depth of an existing directory is via --set-depth, but --set-depth=files will also remove any subdirs which were already requested. This change avoids getting into the state of ever needing to increase the depth of an existing directory from "empty" to "files" in the first place, by: 1. Use svn update --depth=files, not --depth=immediates. The latter has the effect of checking out the subdirectories and marking them as depth=empty. The former excludes sub-directories from the list of entries, which avoids the problem. 2. Explicitly populate missing parent directories. Using --parents seemed nice and easy, but it marks the parent dirs as depth=empty. Instead, check out parents explicitly if they're missing.
  1702. [SimplifyCFG] auto-generate complete checks; NFC
  1703. [InstCombine] auto-generate complete checks; NFC
  1704. [AMDGPU] Add and update scalar instructions This patch adds support for S_ANDN2, S_ORN2 32-bit and 64-bit instructions and adds splits to move them to the vector unit (for which there is no equivalent instruction). It modifies the way that the more complex scalar instructions are lowered to vector instructions by first breaking them down to sequences of simpler scalar instructions which are then lowered through the existing code paths. The pattern for S_XNOR has also been updated to apply inversion to one input rather than the output of the XOR as the result is equivalent and may allow leaving the NOT instruction on the scalar unit. New tests for NAND, NOR, ANDN2 and ORN2 have been added, and existing tests now hit the new instructions (and have been modified accordingly). Differential:
  1705. Fix: Add support for TFE/LWE in image intrinsic My change svn-id: 347871 caused a buildbot failure due to an unused variable def (used in an assert). Change-Id: Ia882d18bb6fa79b4d7bbfda422b9ea5d23eab336
  1706. [libcxx] More fixes to XFAILs for aligned allocation tests for macosx 10.13 Those tests are a real pain to tweak.
  1707. Revert r347823 "[TextAPI] Switch back to a custom Platform enum." It broke the Windows buildbots, e.g. This also reverts the follow-ups: r347824, r347827, and r347836.
  1708. Mark __builtin_shufflevector as using custom type checking The custom handling seems to all be implemented already. This avoids regressions in a future patch when float vectors are ordinarily promoted to double vectors in variadic calls.
  1709. [CallSiteSplitting] Report edge deletion to DomTreeUpdater Summary: When splitting musttail calls, the split blocks' original terminators get removed; inform the DTU when this happens. Also add a testcase that fails an assertion in the DTU without this fix. Reviewers: fhahn, junbuml Reviewed By: fhahn Subscribers: llvm-commits Differential Revision:
  1710. Add support for TFE/LWE in image intrinsics TFE and LWE support requires extra result registers that are written in the event of a failure in order to detect that failure case. The specific use-case that initiated these changes is sparse texture support. This means that if image intrinsics are used with either option turned on, the programmer must ensure that the return type can contain all of the expected results. This can result in redundant registers since the vector size must be a power-of-2. This change takes roughly 6 parts: 1. Modify the instruction defs in tablegen to add new instruction variants that can accommodate the extra return values. 2. Updates to lowerImage in SIISelLowering.cpp to accommodate setting TFE or LWE (where the bulk of the work for these instruction types is now done) 3. Extra verification code to catch cases where intrinsics have been used but insufficient return registers are used. 4. Modification to the adjustWritemask optimisation to account for TFE/LWE being enabled (requires extra registers to be maintained for error return value). 5. An extra pass to zero initialize the error value return - this is because if the error does not occur, the register is not written and thus must be zeroed before use. Also added a new (on by default) option to ensure ALL return values are zero-initialized that is required for sparse texture support. 6. Disable the inst_combine optimization in the presence of tfe/lwe (later TODO for this to re-enable and handle correctly). There's an additional fix now to avoid a dmask=0. For an image intrinsic with tfe where all result channels except tfe were unused, I was getting an image instruction with dmask=0 and only a single vgpr result for tfe. That is incorrect because the hardware assumes there is at least one vgpr result, plus the one for tfe. Fixed by forcing dmask to 1, which gives the desired two vgpr result with tfe in the second one.
The TFE or LWE result is returned from the intrinsics using an aggregate type. Look in the test code provided to see how this works, but in essence IR code to invoke the intrinsic looks as follows: %v = call {<4 x float>,i32} @llvm.amdgcn.image.load.1d.v4f32i32.i32(i32 15, i32 %s, <8 x i32> %rsrc, i32 1, i32 0) %v.vec = extractvalue {<4 x float>, i32} %v, 0 %v.err = extractvalue {<4 x float>, i32} %v, 1 Differential revision: Change-Id: If222bc03642e76cf98059a6bef5d5bffeda38dda
  1711. inherit LLVM_ENABLE_LIBXML2 Summary: When building in an LLVM context, we should respect its LLVM_ENABLE_LIBXML2 option. Reviewers: vitalybuka, mspertus, modocache Reviewed By: modocache Subscribers: mgorny, cfe-commits Differential Revision:
  1712. [CVP] tidy processCmp(); NFC 1. The variables were confusing: 'C' typically refers to a constant, but here it was the Cmp. 2. Formatting violations. 3. Simplify code to return true/false constant.
  1713. Revert "[LICM] Enable control flow hoisting by default" and "[LICM] Reapply r347190 "Make LICM able to hoist phis" with fix" This reverts commits r347776 and r347778. The first one, r347776, caused significant compile time regressions for certain input files, see PR39836 for details.
  1714. [CVP] auto-generate complete test checks; NFC
  1715. [OpenCL] Improve diags for addr spaces in templates Fix ICEs on template instantiations that were leading to the creation of invalid code patterns with address spaces. Incorrect cases are now diagnosed properly. Differential Revision:
  1716. Revert r347596 "Support for inserting profile-directed cache prefetches" It causes asserts building BoringSSL. See for repro. This also reverts the follow-ups: Revert r347724 "Do not insert prefetches with unsupported memory operands." Revert r347606 "[X86] Add dependency from X86 to ProfileData after rL347596" Revert r347607 "Add new passes to X86 pipeline tests"
  1717. Set MustBuildLookupTable on PrimaryContext in ExternalASTMerger Summary: `MustBuildLookupTable` must always be called on a primary context as we otherwise trigger an assert, but we don't ensure that this will always happen in our code right now. This patch explicitly requests the primary context when doing this call as this shouldn't break anything (as calling `getPrimaryContext` on a context which is its own primary context is a no-op) but will catch these rare cases where we somehow operate on a declaration context that is not its own primary context. See also D54863. Reviewers: martong, a.sidorin, shafik Reviewed By: martong Subscribers: davide, rnkovacs, cfe-commits Tags: #lldb Differential Revision:
  1718. [GlobalISel] Fix insertion of stack-protector epilogue * Tell the StackProtector pass to generate the epilogue instrumentation when GlobalISel is enabled because GISel currently does not implement the same deferred epilogue insertion as SelectionDAG. * Update StackProtector::InsertStackProtectors() to find a stack guard slot by searching for the llvm.stackprotector intrinsic when the prologue was not created by StackProtector itself but the pass still needs to generate the epilogue instrumentation. This fixes a problem when the pass would abort because the stack guard AllocInst pointer was null when generating the epilogue -- test CodeGen/AArch64/GlobalISel/arm64-irtranslator-stackprotect.ll. Differential Revision:
  1719. [GlobalISel] Make EnableGlobalISel always set when GISel is enabled Change meaning of TargetOptions::EnableGlobalISel. The flag was previously set only when a target switched on GlobalISel but it is now always set when the GlobalISel pipeline is enabled. This makes the flag consistent with TargetOptions::EnableFastISel and allows its use in other parts of the compiler to determine when GlobalISel is enabled. The EnableGlobalISel flag had previously only one use in TargetPassConfig::isGlobalISelAbortEnabled(). The method used its value to determine if GlobalISel was enabled by a target and returned false in such a case. To preserve the current behaviour, a new flag TargetOptions::GlobalISelAbort is introduced to separately record the abort behaviour. Differential Revision:
  1720. Adding a FIXME test to document an area for improvement with the cert-err58-cpp check; NFC.
  1721. [llvm-rc] Support EXSTYLE statement. Patch by Jacek Caban! Differential Revision:
  1722. [llvm-mca][MC] Add the ability to declare which processor resources model load/store queues (PR36666). This patch adds the ability to specify via tablegen which processor resources are load/store queue resources. A new tablegen class named MemoryQueue can be optionally used to mark resources that model load/store queues. Information about the load/store queue is collected at 'CodeGenSchedule' stage, and analyzed by the 'SubtargetEmitter' to initialize two new fields in struct MCExtraProcessorInfo named `LoadQueueID` and `StoreQueueID`. Those two fields are identifiers for buffered resources used to describe the load queue and the store queue. Field `BufferSize` is interpreted as the number of entries in the queue, while the number of units is a throughput indicator (i.e. number of available pickers for loads/stores). At construction time, LSUnit in llvm-mca checks for the presence of extra processor information (i.e. MCExtraProcessorInfo) in the scheduling model. If that information is available, and fields LoadQueueID and StoreQueueID are set to a value different than zero (i.e. the invalid processor resource index), then LSUnit initializes its LoadQueue/StoreQueue based on the BufferSize value declared by the two processor resources. With this patch, we more accurately track dynamic dispatch stalls caused by the lack of LS tokens (i.e. load/store queue full). This is also shown by the differences in two BdVer2 tests. Stalls that were previously classified as generic SCHEDULER FULL stalls are now correctly classified either as "load queue full" or "store queue full". About the differences in the -scheduler-stats view: those differences are expected, because entries in the load/store queue are not released at instruction issue stage. Instead, those are released at instruction executed stage. This is the main reason why, for the modified tests, the load/store queues get full before PdEx is full. Differential Revision:
  1723. AMDGPU/InsertWaitcnts: Remove the dependence on MachineLoopInfo Summary: MachineLoopInfo cannot be relied on for correctness, because it cannot properly recognize loops in irreducible control flow which can be introduced by late machine basic block optimization passes. See the new test case for the reduced form of an example that occurred in practice. Use a simple fixpoint iteration instead. In order to facilitate this change, refactor WaitcntBrackets so that it only tracks pending events and registers, rather than also maintaining state that is relevant for the high-level algorithm. Various accessor methods can be removed or made private as a consequence. Affects (in radv): - dEQP-VK.glsl.loops.special.{for,while}_uniform_iterations.select_iteration_count_{fragment,vertex} Fixes: r345719 ("AMDGPU: Rewrite SILowerI1Copies to always stay on SALU") Reviewers: msearles, rampitec, scott.linder, kanarayan Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits, hakzsam Differential Revision:
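The fixpoint iteration mentioned in entry 1723 can be sketched as a generic worklist dataflow. This is only an illustration under simplifying assumptions, not the pass's actual code: `Block`, `Gen`, `Succs`, and `computePendingOnEntry` are hypothetical names, and the real pass merges richer bracket state than a single bitmask. The point is that iterating until nothing changes needs no loop information, so it also converges on irreducible control flow.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <vector>

// Hypothetical basic block: Gen is the set of events the block itself
// starts (one bit per event kind); Succs holds successor block indices.
struct Block {
  uint32_t Gen = 0;
  std::vector<int> Succs;
};

// For each block, compute the set of events that may still be pending on
// entry. Blocks are revisited until no entry state changes, so no loop
// structure has to be recognized up front.
std::vector<uint32_t> computePendingOnEntry(const std::vector<Block> &Blocks) {
  std::vector<uint32_t> In(Blocks.size(), 0);
  std::deque<int> Worklist;
  for (int I = 0, E = static_cast<int>(Blocks.size()); I != E; ++I)
    Worklist.push_back(I);

  while (!Worklist.empty()) {
    int B = Worklist.front();
    Worklist.pop_front();
    // State on exit: everything pending on entry plus what the block adds.
    uint32_t Out = In[B] | Blocks[B].Gen;
    for (int S : Blocks[B].Succs) {
      assert(S >= 0 && S < static_cast<int>(Blocks.size()));
      uint32_t Merged = In[S] | Out;
      if (Merged != In[S]) {
        // The successor's entry state grew, so it must be reprocessed.
        In[S] = Merged;
        Worklist.push_back(S);
      }
    }
  }
  return In;
}
```

Termination follows because each entry state only ever gains bits and the number of bits is finite. A cycle with two entry points (block 0 branching to both 1 and 2, with 1 and 2 branching to each other) is handled the same way as a natural loop.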
  1724. AMDGPU/InsertWaitcnt: Consistently use uint32_t for scores / time points Summary: There is one obsolete reference to using -1 as an indication of "unknown", but this isn't actually used anywhere. Using unsigned makes robust wrapping checks easier. Reviewers: msearles, rampitec, scott.linder, kanarayan Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, llvm-commits, tpr, t-tye, hakzsam Differential Revision:
  1725. AMDGPU/InsertWaitcnt: Remove unused WaitAtBeginning Reviewers: msearles, rampitec, scott.linder, kanarayan Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits, hakzsam Differential Revision:
  1726. AMDGPU/InsertWaitcnts: Simplify pending events tracking Summary: Instead of storing the "score" (last time point) of the various relevant events, only store whether an event is pending or not. This is sufficient, because whenever only one event of a count type is pending, its last time point is naturally the upper bound of all time points of this count type, and when multiple event types are pending, the count type has gone out of order and an s_waitcnt to 0 is required to clear any pending event type (and will then clear all pending event types for that count type). This also removes the special handling of GDS_GPR_LOCK and EXP_GPR_LOCK. I do not understand what this special handling ever attempted to achieve. It has existed ever since the original port from an internal code base, so my best guess is that it solved a problem related to EXEC handling in that internal code base. Reviewers: msearles, rampitec, scott.linder, kanarayan Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits, hakzsam Differential Revision:
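The idea in entry 1726 of storing only whether an event is pending can be sketched with one bit per event kind. This is a simplified illustration, not the pass's data structure: the enum values and the `PendingEvents` type are hypothetical, and the event-to-counter mapping is reduced to a few representative cases.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical counter and event kinds, reduced for illustration.
enum CounterKind { VM_CNT, LGKM_CNT, EXP_CNT };
enum EventKind { VMEM_ACCESS, LDS_ACCESS, SMEM_ACCESS, EXP_EXPORT, NUM_EVENTS };

// Which hardware counter a given event kind increments.
inline CounterKind counterFor(EventKind E) {
  switch (E) {
  case VMEM_ACCESS:
    return VM_CNT;
  case LDS_ACCESS:
  case SMEM_ACCESS:
    return LGKM_CNT;
  default:
    return EXP_CNT;
  }
}

struct PendingEvents {
  uint32_t Mask = 0; // bit E is set while an event of kind E is pending

  void issue(EventKind E) {
    assert(E < NUM_EVENTS);
    Mask |= 1u << E;
  }

  // Bits of all pending event kinds that share counter C.
  uint32_t pendingFor(CounterKind C) const {
    uint32_t M = 0;
    for (int E = 0; E != NUM_EVENTS; ++E)
      if (counterFor(static_cast<EventKind>(E)) == C)
        M |= Mask & (1u << E);
    return M;
  }

  bool hasPending(CounterKind C) const { return pendingFor(C) != 0; }

  // More than one event kind pending on the same counter: completion order
  // is no longer known, so only a wait to 0 is safe.
  bool isOutOfOrder(CounterKind C) const {
    uint32_t M = pendingFor(C);
    return (M & (M - 1)) != 0;
  }

  // A wait to 0 on C clears every pending event kind of that counter.
  void waitToZero(CounterKind C) { Mask &= ~pendingFor(C); }
};
```

With this representation, issuing an LDS access followed by an SMEM access marks `LGKM_CNT` as out of order, and a single `waitToZero(LGKM_CNT)` clears both, matching the reasoning in the commit message.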
  1727. AMDGPU/InsertWaitcnts: Use foreach loops for inst and wait event types Summary: It hides the type casting ugliness, and I happened to have to add a new such loop (in a later patch). Reviewers: msearles, rampitec, scott.linder, kanarayan Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits, hakzsam Differential Revision:
  1728. AMDGPU/InsertWaitcnts: Untangle some semi-global state Summary: Reduce the statefulness of the algorithm in two ways: 1. More clearly split generateWaitcntInstBefore into two phases: the first one which determines the required wait, if any, without changing the ScoreBrackets, and the second one which actually inserts the wait and updates the brackets. 2. Communicate pre-existing s_waitcnt instructions using an argument to generateWaitcntInstBefore instead of through the ScoreBrackets. To simplify these changes, a Waitcnt structure is introduced which carries the counts of an s_waitcnt instruction in decoded form. There are some functional changes: 1. The FIXME for the VCCZ bug workaround was implemented: we only wait for SMEM instructions as required instead of waiting on all counters. 2. We now properly track pre-existing waitcnt's in all cases, which leads to less conservative waitcnts being emitted in some cases. s_load_dword ... s_waitcnt lgkmcnt(0) <-- pre-existing wait count ds_read_b32 v0, ... ds_read_b32 v1, ... s_waitcnt lgkmcnt(0) <-- this is too conservative use(v0) more code use(v1) This increases code size a bit, but the reduced latency should still be a win in basically all cases. The worst code size regressions in my shader-db are: WORST REGRESSIONS - Code Size Before After Delta Percentage 1724 1736 12 0.70 % shaders/private/f1-2015/1334.shader_test [0] 2276 2284 8 0.35 % shaders/private/f1-2015/1306.shader_test [0] 4632 4640 8 0.17 % shaders/private/ue4_elemental/62.shader_test [0] 2376 2384 8 0.34 % shaders/private/f1-2015/1308.shader_test [0] 3284 3292 8 0.24 % shaders/private/talos_principle/1955.shader_test [0] Reviewers: msearles, rampitec, scott.linder, kanarayan Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits, hakzsam Differential Revision:
  1729. [CODE_OWNERS] Add myself as code owner for MinGW
  1730. [NFC] Add two XFAIL tests from PR39783
  1731. Disable TermFolding in LoopSimplifyCFG until PR39783 is fixed
  1732. [LoopStrengthReduce] ComplexityLimit as an option Convert ComplexityLimit into a command line value. Differential Revision:
  1733. [Inliner] Modify the merging of min-legal-vector-width attribute to better handle when the caller or callee don't have the attribute. Lack of an attribute means that the function hasn't been checked for what vector width it requires. So if the caller or the callee doesn't have the attribute we should make sure the combined function after inlining does not have the attribute. If the caller already doesn't have the attribute we can just avoid adding it. Otherwise if the callee doesn't have the attribute just remove the caller's attribute.
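The merge rule described in entry 1733 can be sketched with an optional value standing in for the attribute. This is an illustration only: `VectorWidthAttr` and `mergeMinLegalVectorWidth` are hypothetical names, and taking the maximum when both functions carry the attribute is an assumption, since the message above only specifies the cases where one side lacks it.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>

// An absent value models "attribute not present", i.e. the function has
// not been checked for the vector width it requires.
using VectorWidthAttr = std::optional<unsigned>;

// Returns the attribute the caller should carry after inlining the callee.
VectorWidthAttr mergeMinLegalVectorWidth(VectorWidthAttr Caller,
                                         VectorWidthAttr Callee) {
  // Caller already lacks the attribute: do not add one.
  if (!Caller)
    return std::nullopt;
  // Callee lacks the attribute: the combined body is unchecked, so the
  // caller's attribute is removed.
  if (!Callee)
    return std::nullopt;
  // Both present. ASSUMPTION: keep the larger requirement.
  VectorWidthAttr Merged = std::max(*Caller, *Callee);
  assert(*Merged >= *Caller && "merged width must not shrink");
  return Merged;
}
```

The sketch needs C++17 for `std::optional`. The key property is that the result is present only when both inputs are present.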
  1734. [Inliner] Add test for merging of min-legal-vector-width function attribute. This should have been added in r337844, but apparently I failed to 'git add' the file.
  1735. [CGP] Improve compile time for complex addressing mode This is a fix for PR39625 that improves compile time by reducing the number of intermediate Phi nodes created. Reviewers: john.brawn, reames Reviewed By: john.brawn Subscribers: llvm-commits Differential Revision:
  1736. Revert "[TextAPI] Fix a memory leak in the TBD reader."
  1737. [TextAPI] Fix a memory leak in the TBD reader. This fixes an issue where we were leaking the YAML document if there was a parsing error.
  1738. [TextAPI] Switch back to a custom Platform enum. Moving to PlatformType from BinaryFormat had some UB fallout when handing unknown platforms or malformed input files. This should fix the sanitizer bots.
  1739. [X86] Correct comment. NFC
  1740. Add Hurd target to Clang driver (2/2) This adds Hurd toolchain support to Clang's driver in addition to handling translating the triple from Hurd-compatible form to the actual triple registered in LLVM. (Phabricator was stripping the empty files from the patch so I manually created them) Patch by sthibaul (Samuel Thibault) Differential Revision:
  1741. Add Hurd target to LLVMSupport (1/2) Add the required target triples to LLVMSupport to support Hurd in LLVM (formally `pc-hurd-gnu`). Patch by sthibaul (Samuel Thibault) Differential Revision:
  1742. [PowerPC] Fix a conversion is not considered when the ISD::BR_CC node making the instruction selection Summary: A signed comparison of i1 values produces the opposite result to an unsigned one if the condition code includes less-than or greater-than. This is so because 1 is the most negative signed i1 number and the most positive unsigned i1 number. The CR-logical operations used for such comparisons are non-commutative so for signed comparisons vs. unsigned ones, the input operands just need to be swapped. Reviewed By: steven.zhang Differential Revision:
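The claim in entry 1742 about signed versus unsigned comparisons of i1 values can be checked exhaustively, since there are only four operand pairs. The sketch below models i1 with `bool`; the helper names are hypothetical and this is not the PowerPC backend code.

```cpp
#include <cassert>

// Interpret an i1 as a signed 1-bit integer: the set bit is the sign bit,
// so "1" means -1, the most negative value.
inline int asSignedI1(bool B) { return B ? -1 : 0; }

// Interpret an i1 as an unsigned 1-bit integer: "1" is the largest value.
inline unsigned asUnsignedI1(bool B) { return B ? 1u : 0u; }

inline bool sltI1(bool A, bool B) { return asSignedI1(A) < asSignedI1(B); }
inline bool ultI1(bool A, bool B) { return asUnsignedI1(A) < asUnsignedI1(B); }

// Checks, over all four operand pairs, that a signed less-than on i1 is
// the unsigned less-than with its operands swapped.
inline bool swappedOperandsAgree() {
  const bool Values[] = {false, true};
  for (bool A : Values)
    for (bool B : Values)
      if (sltI1(A, B) != ultI1(B, A))
        return false;
  return true;
}
```

For example, `sltI1(true, false)` holds because -1 < 0, while `ultI1(true, false)` does not because 1 is not below 0. Swapping the operands of the unsigned form restores agreement, which is why swapping the inputs suffices for the signed case.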
  1743. [PowerPC] [NFC] Add test cases to the ISD::BR_CC node in the instruction selection Add the following test case for the ISD::BR_CC node in the instruction selection define i64 @testi64slt(i64 %c1, i64 %c2, i64 %c3, i64 %c4, i64 %a1, i64 %a2) #0 { entry: %cmp1 = icmp eq i64 %c3, %c4 %cmp3tmp = icmp eq i64 %c1, %c2 %cmp3 = icmp slt i1 %cmp3tmp, %cmp1 br i1 %cmp3, label %iftrue, label %iffalse iftrue: ret i64 %a1 iffalse: ret i64 %a2 } The data type i64 can be replaced by i32, i64, float, double
 And condition codes can be replaced by: SETEQ, SETNE, SETLT, SETLE, SETGT, SETGE, SETULT, SETULE, SETUGT, and SETUGE Reviewed By: steven.zhang Differential Revision:
  1744. [TextAPI] TBD Reader/Writer (bot fixes: take 2) Replace the tuple with a struct to work around an explicit constructor bug.
  1745. NFC. Use unsigned type for uses counter in CaptureTracking
  1746. [Documentation] Try to fix build failure in cppcoreguidelines-narrowing-conversions documentation
  1747. [TextAPI] TBD Reader/Writer (bot fixes) Checking whether switching from a vector to an array will appease the bots.
  1748. [TextAPI] TBD Reader/Writer Add basic infrastructure for reading and writing TBD files (version 1 - 3). The TextAPI library is not used by anything yet (besides the unit tests). Tool support will be added in a separate commit. The TBD format is currently documented in the implementation file (TextStub.cpp). Update: This contains changes to fix issues discovered by the bots: - add parentheses to silence warnings. - rename variables - use PlatformType from BinaryFormat
  1749. [x86] try select simplification for target-specific nodes This failed to select (which might be a separate bug) in X86ISelDAGToDAG because we try to create a select node that can be simplified away after rL347227. This change avoids the problem by simplifying the SHRUNKBLEND node sooner. In the test case, we manage to realize that the true/false values of the select (SHRUNKBLEND) are the same thing, so it simplifies away completely.
  1750. Revert "Move internal usages of `alignof`/`__alignof` to use `_LIBCPP_ALIGNOF`. " This reverts commit 087f065cb0c7463f521a62599884493aaee2ea12. The tests were failing on 32 bit builds, and I don't have time to clean them up right now. I'll recommit tomorrow with fixed tests.
  1751. Ensure that test clang-tidy/export-relpath.cpp works with Windows path separators.
  1752. Allow cpu-dispatch forward declarations. As a followup to r347805, allow forward declarations of cpu-dispatch and cpu-specific for the same reasons. Change-Id: Ic1bde9be369b1f8f1d47d58e6fbdc2f9dfcdd785
  1753. Ensure sanitizer check function calls have a !dbg location Function calls without a !dbg location inside a function that has a DISubprogram make it impossible to construct inline information and are rejected by the verifier. This patch ensures that sanitizer check function calls have a !dbg location, by carrying forward the location of the preceding instruction or by inserting an artificial location if necessary. This fixes a crash when compiling the attached testcase with -Os. rdar://problem/45311226 Differential Revision: Note: This reapplies r344915, modified to reuse the IRBuilder's DebugLoc if one exists instead of picking the one from CGDebugInfo since the latter may get reset when emitting thunks such as block helpers in the middle of emitting another function.
  1754. Revert "[TextAPI] TBD Reader/Writer" Reverting to unbreak bots.
  1755. [TextAPI] TBD Reader/Writer Add basic infrastructure for reading and writing TBD files (version 1 - 3). The TextAPI library is not used by anything yet (besides the unit tests). Tool support will be added in a separate commit. The TBD format is currently documented in the implementation file (TextStub.cpp).
  1756. [DebugInfo] NFC Clang test changes for: IR/Bitcode changes for DISubprogram flags. Differential Revision:
  1757. [DebugInfo] IR/Bitcode changes for DISubprogram flags. Packing the flags into one bitcode word will save effort in adding new flags in the future. Differential Revision:
  1758. Correct 'target' default behavior on redecl, allow forward declaration. Declarations without the attribute were disallowed because it would be ambiguous which 'target' it was supposed to be on. For example: void __attribute__((target("v1"))) foo(); void foo(); // Redecl of above, or fwd decl of below? void __attribute__((target("v2"))) foo(); However, a first declaration doesn't have that problem, and erroring prevents it from working in cases where the forward declaration is useful. Additionally, a forward declaration of target==default wouldn't properly cause multiversioning, so this patch fixes that. The patch was not split since the 'default' fix would require implementing the same check for that case, followed by undoing the same change for the fwd-decl implementation. Change-Id: I66f2c5bc2477bcd3f7544b9c16c83ece257077b0
  1759. [Coverage] Specify the Itanium ABI triple for a C++ test
  1760. [Coverage] Do not visit artificial stmts in defaulted methods (PR39822) There is no reason to emit coverage mappings for artificial statements contained within defaulted methods, as these statements are not visible to users. Only emit a mapping for the body of the defaulted method (clang treats the text of the "default" keyword as the body when reporting locations). This allows users to see how often the default method is called, but trims down the coverage mapping by skipping visitation of the children of the method. The immediate motivation for this change is that the lexer's getPreciseTokenLocEnd API cannot return the correct location when given an artificial statement (with a somewhat made-up location) as an input. Test by Orivej Desh! Fixes
  1761. Reapply "[llvm-mca] Return the total number of cycles from method Pipeline::run()." This reapplies r347767 (originally reviewed at: with a fix for the missing std::move of the Error returned by the call to Pipeline::runCycle(). Below is the original commit message from r347767. If a user only cares about the overall latency, then the best/quickest way is to change method Pipeline::run() so that it returns the total number of cycles to the caller. When the simulation pipeline is run, the number of cycles (or an error) is returned from method Pipeline::run(). The advantage is that no hardware event listener is needed for computing that latency. So, the whole process should be faster (and simpler - at least for that particular use case).
  1762. Revert "[ASTImporter] Changed use of Import to Import_New in ASTImporter." This broke the lldb bots.
  1763. [OPENMP] Fix emission of the target regions in virtual functions. Fixed emission of the target regions found in the virtual functions. Previously, those regions could end up being skipped.
  1764. Revert "[clang-tools-extra] r347753 - [clangd] Build and test IndexBenchmark in check-clangd" This revision was causing failures on the buildbots, and our internal CI. See:
  1765. [NFC] Move MultiVersioning::Type into Decl so that it can be used in CodeGen Change-Id: I32b14edca3501277e0e65672eafe3eea38c6f9ae
  1766. Fix bad _LIBCPP_ALIGNOF test
  1767. Implement P0966 - string::reserve should not shrink
  1768. Move internal usages of `alignof`/`__alignof` to use `_LIBCPP_ALIGNOF`. Summary: Starting in Clang 8.0 and GCC 8.0, `alignof` and `__alignof` return different values in some cases. Specifically `alignof` and `_Alignof` return the minimum alignment for a type, whereas `__alignof` returns the preferred alignment. libc++ currently uses `__alignof` but means to use `alignof`. See This patch introduces the macro `_LIBCPP_ALIGNOF` so we can control which spelling gets used. This patch does not introduce any ABI guard to provide the old behavior with newer compilers. However, if we decide that is needed, this patch makes it trivial to implement. I think we should commit this change immediately, and decide what we want to do about the ABI afterwards. Reviewers: ldionne, EricWF Reviewed By: EricWF Subscribers: christof, libcxx-commits Differential Revision:
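The approach in entry 1768 of routing every alignment query through one macro can be sketched as follows. The `EXAMPLE_` macros and the helper templates are hypothetical stand-ins, not the libc++ definition of `_LIBCPP_ALIGNOF`; the sketch only shows how a single macro makes the chosen spelling a one-line decision.

```cpp
#include <cassert>
#include <cstddef>

// Standard spelling: the minimum alignment required for the type.
#define EXAMPLE_ALIGNOF(T) alignof(T)

// Compiler-extension spelling: the preferred alignment. ASSUMPTION: on
// compilers without the GNU extension, fall back to the standard operator.
#if defined(__GNUC__) || defined(__clang__)
#define EXAMPLE_PREFERRED_ALIGNOF(T) __alignof__(T)
#else
#define EXAMPLE_PREFERRED_ALIGNOF(T) alignof(T)
#endif

// True when both spellings report the same alignment for T on the
// current target.
template <class T> constexpr bool alignmentSpellingsAgree() {
  return EXAMPLE_ALIGNOF(T) == EXAMPLE_PREFERRED_ALIGNOF(T);
}

// The preferred alignment is expected to be at least the minimum one.
template <class T> void checkPreferredNotSmaller() {
  assert(EXAMPLE_PREFERRED_ALIGNOF(T) >= EXAMPLE_ALIGNOF(T));
}
```

Whether the two spellings differ for a given type depends on the target ABI and compiler version, so code that bakes an alignment into a layout should use one spelling consistently, which is what a single macro makes easy.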
  1769. [X86] Make X86TTIImpl::getCastInstrCost properly handle the case where AVX512 is enabled, but 512-bit vectors aren't legal. Unlike most cost model functions this code makes a lot of table lookups without using the results from getTypeLegalizationCost. This means 512-bit vectors can be looked up even when the type isn't legal. This patch adds a check around the two tables that contain 512-bit types to make sure that neither of the types would be split by type legalization, which would mean that 512-bit types are illegal. I wanted to write this in a somewhat generic way that uses type legalization query hooks. But if preferred, I can switch to just using is512BitVector and the subtarget feature. Differential Revision:
  1770. [X86] Add some cost model entries for sext/zext for avx512bw This fixes some of scalarization costs reported for sext/zext using avx512bw. This does not fix all scalarization costs being reported. Just the worst. I've restricted this only to combinations of types that are legal with avx512bw like v32i1/v64i1/v32i16/v64i8 and conversions between vXi1 and vXi8/vXi16 with legal vXi8/vXi16 result types. Differential Revision:
  1771. [X86] Add a combine for back to back VSRAI instructions Expansion of SIGN_EXTEND_INREG can create a VSRAI instruction. If there is already a VSRAI after it, we should combine them into a larger VSRAI Differential Revision:
  1772. [libcxx] Remove dynarray Summary: std::dynarray had been proposed for C++14, but it was pulled out from C++14 and there are no plans to standardize it anymore. Reviewers: mclow.lists, EricWF Subscribers: mgorny, christof, jkorous, dexonsmith, arphaman, libcxx-commits Differential Revision:
  1773. [DebugInfo] Give inlinable calls DILocs (PR39807) In PR39807 we incorrectly handle circumstances where calls are common'd from conditional blocks into the parent BB. Calls that can be inlined must always have DebugLocs, however we strip them during commoning, which the IR verifier asserts on. Fix this by using applyMergedLocation: it will perform the same DebugLoc stripping of conditional Locs, but will also generate an unknown location DebugLoc that satisfies the requirement for inlinable calls to always have locations. Some of the prior logic for selecting a DebugLoc is now likely redundant; I'll generate a follow-up to remove it (involves editing more regression tests). Differential Revision:
  1774. [libcxx] Use clang-verify in the lit test suite even when availability is enabled
  1775. [gcov] Disable instrprof-gcov-fork.test. Test has been flaky for over a week and author hasn't fixed.
  1776. [LICM] Enable control flow hoisting by default Differential Revision:
  1777. [analyzer] Cleanup constructors in the Z3 backend Summary: Left only the constructors that are actually required, and marked the move constructors as deleted. They are not used anymore and we were never sure they've actually worked correctly. Reviewers: george.karpenkov, NoQ Reviewed By: george.karpenkov Subscribers: xazax.hun, baloghadamsoftware, szepet, a.sidorin, Szelethus, donat.nagy, dkrupp Differential Revision:
  1778. [LICM] Reapply r347190 "Make LICM able to hoist phis" with fix This commit caused failures because it failed to correctly handle cases where we hoist a phi, then hoist a use of that phi, then have to rehoist that use. We need to make sure that we rehoist the use to _after_ the hoisted phi, which we do by always rehoisting to the immediate dominator instead of just rehoisting everything to the original preheader. An option is also added to control whether control flow is hoisted, which is off in this commit but will be turned on in a subsequent commit. Differential Revision:
  1779. Revert [llvm-mca] Return the total number of cycles from method Pipeline::run(). This reverts commit 347767.
  1780. [RISCV] Support .option push and .option pop This adds support in the RISCVAsmParser for storing Subtarget feature bits on a stack so that they can be pushed/popped to enable/disable multiple features at once. Differential Revision: Patch by Lewis Revill.
  1781. [InstCombine] Combine saturating add/sub with constant operands Combine sat(sat(X + C1) + C2) -> sat(X + (C1+C2)) and sat(sat(X - C1) - C2) -> sat(X - (C1+C2)) if the sign of C1 and C2 matches. In the unsigned case we can compute C1+C2 with saturating arithmetic, and InstSimplify will reduce this just to the saturation value. For the signed case, we cannot perform the simplification if the result of the addition overflows. This change is part of
  1782. [InstCombine] Canonicalize ssub.sat to sadd.sat Canonicalize ssub.sat(X, C) to sadd.sat(X, -C) if C is constant and not signed minimum. This will help further optimizations to apply. This change is part of
  1783. [ValueTracking] Determine always-overflow condition for unsigned sub Always-overflow was already determined for unsigned addition, but not subtraction. This patch establishes parity. This allows us to perform some additional simplifications for signed saturating subtractions. This change is part of
  1784. [InstCombine] Use known overflow information for saturating add/sub If ValueTracking can determine that the add/sub can never overflow, replace it with the corresponding nuw/nsw add/sub. Additionally, for the unsigned case, if ValueTracking determines that the add/sub always overflows, replace the result with the saturation value. This change is part of
  1785. [InstCombine] Canonicalize const arg for saturating adds If a saturating add intrinsic has one constant argument, make sure it is on the RHS. This will simplify further transformations. This change is part of
  1786. [Hexagon] Add missing flags to ELF YAMLIO
  1787. [llvm-mca] Return the total number of cycles from method Pipeline::run(). If a user only cares about the overall latency, then the best/quickest way is to change method Pipeline::run() so that it returns the total number of cycles to the caller. When the simulation pipeline is run, the number of cycles (or an error) is returned from method Pipeline::run(). The advantage is that no hardware event listener is needed for computing that latency. So, the whole process should be faster (and simpler - at least for that particular use case).
  1788. llvm-git: More tweaks. On python3, use bytes for reading and applying the patch file, rather than str. This fixes encoding issues when applying patches with python3.X (reported by zturner). Also, simplify and speed up "svn update" via svn's "--parents" argument, instead of manually computing and supplying the list of parent directories to update.
  1789. [libcxx] Apply _LIBCPP_INLINE_VISIBILITY for std::hash for string_view
  1790. Fix DynamicLibraryTests build on Windows when LLVM_EXPORT_SYMBOLS_FOR_PLUGINS is ON (introduced in D18826) expects all of its library arguments to be in the same directory - typically <config>/lib. DynamicLibraryLib.lib is instead to be found in lib/<config>. This patch intended to make DynamicLibraryLib.lib be created in <config>/lib alongside most of the other libraries. I previously tried passing absolute paths to but this generated command lines that were too long for Visual Studio 2015: D54587 Differential Revision:
  1791. [ThinLTO] Correct linkonce_any function import linkage. NFC. Summary: This is NFC, as we do not import non-odr vague linkage when computing the import list for a module. Reviewers: tejohnson, pcc Subscribers: inglorion, dexonsmith, llvm-commits Differential Revision:
  1792. Fix build error due to missing cctype include in ARMTargetParser.cpp.
  1793. Fix false positive with lambda assignments in cert-err58-cpp. This check is about preventing exceptions from being thrown before main() executes, and assigning a lambda (rather than calling it) to a global object cannot throw any exceptions.
  1794. [clang-tidy] Added a test -export-fixes with relative paths. Summary: A test for D51864. Reviewers: ioeric, steveire Reviewed By: steveire Subscribers: xazax.hun, cfe-commits Differential Revision:
  1795. [SLP] Fix PR39774: Set ReductionRoot if the original instruction is vectorized. Summary: If the original reduction root instruction was vectorized, it might be removed from the tree. This means that the insertion point may be invalidated and the whole vectorization of the reduction produces an incorrect result. The ReductionRoot instruction must be marked as externally used so that it cannot be removed. Otherwise it might cause inconsistency with the cost model and we may end up with an overly optimistic optimization. Reviewers: RKSimon, spatel, hfinkel, mkuper Subscribers: llvm-commits Differential Revision:
  1796. Re-commit r347419 "Update call to EvaluateAsInt() to the new syntax."
  1797. Re-commit r347417 "Re-Reinstate 347294 with a fix for the failures." This was reverted in r347656 due to me thinking it caused a miscompile of Chromium. Turns out it was the Chromium code that was broken.
  1798. [clangd] Fix test broken in r347754.
  1799. [clangd] Less penalty for cross-namespace completions.
  1800. [clangd] Build and test IndexBenchmark in check-clangd Summary: Include IndexBenchmark in check-clangd to make sure we won't forget to update it when doing breaking changes; also fix an out-of-date test input. Reviewers: ilya-biryukov Subscribers: mgorny, ioeric, MaskRay, jkorous, arphaman, kadircet, cfe-commits Differential Revision:
  1801. [ASTImporter] Changed use of Import to Import_New in ASTImporter. Reviewers: a.sidorin, shafik, a_sidorin Reviewed By: a_sidorin Subscribers: gamesh411, a_sidorin, dkrupp, martong, Szelethus, cfe-commits Differential Revision:
  1802. Fix -Winfinite-recursion compile error.
  1803. Fix build of r347741 by adding missing vector include to ARMTargetParser.h.
  1804. [MachineScheduler] Add support for clustering mem ops with FI base operands Before this patch, the following stores in `merge_fail` would fail to be merged, while they would get merged in `merge_ok`:
        ```
        void use(unsigned long long *);
        void merge_fail(unsigned key, unsigned index) {
          unsigned long long args[8];
          args[0] = key;
          args[1] = index;
          use(args);
        }
        void merge_ok(unsigned long long *dst, unsigned a, unsigned b) {
          dst[0] = a;
          dst[1] = b;
        }
        ```
        The reason is that `getMemOpBaseImmOfs` would return false for FI base operands. This adds support for this. Differential Revision:
  1805. [CodeGen][NFC] Make `TII::getMemOpBaseImmOfs` return a base operand Currently, instructions doing memory accesses through a base operand that is not a register can not be analyzed using `TII::getMemOpBaseRegImmOfs`. This means that functions such as `TII::shouldClusterMemOps` will bail out on instructions using an FI as a base instead of a register. The goal of this patch is to refactor all this to return a base operand instead of a base register. Then in a separate patch, I will add FI support to the mem op clustering in the MachineScheduler. Differential Revision:
  1806. Fix a false-positive with cert-err58-cpp. If a variable is declared constexpr then its initializer needs to be a constant expression, and thus, cannot throw. This check is about not throwing exceptions before main() runs, and so it doesn't apply if the initializer cannot throw. This silences the diagnostic when initializing a constexpr variable and fixes PR35457.
  1807. [DebugInfo] Rename EmitDebugThreadLocal back to EmitDebugValue. NFC This reverts r294500. DwarfCompileUnit::addAddressExpr uses DIEExpr for PCOffset. In that case the expression is unrelated to thread locals and so emitting a value of the DIEExpr does not have to always mean emit-debug-thread-local.
  1808. [TableGen] Better error checking for TIED_TO constraints. There are quite strong constraints on how you can use the TIED_TO constraint between MC operands, many of which are currently not checked until compiler run time. MachineVerifier enforces that operands can only be tied together in pairs (no three-way ties), and MachineInstr::tieOperands enforces that one of the tied operands must be an output operand (def) and the other must be an input operand (use). Now we check these at TableGen time, so that if you violate any of them in a new instruction definition, you find out immediately, instead of having to wait until you compile something that makes code generation hit one of those assertions. Also in this commit, all the error reports in ParseConstraint now include the name and source location of the def where the problem happened, so that if you do trigger any of these errors, it's easier to find the part of your TableGen input where you made the mistake. The trunk sources already build successfully with this additional error check, so I think no in-tree target has any of these problems. Reviewers: fhahn, lhames, nhaehnle, MatzeB Reviewed By: MatzeB Subscribers: llvm-commits Differential Revision:
  1809. [ARM, AArch64] Move ARM/AArch64 target parsers into separate files to enable future changes. This moves ARM and AArch64 target parsing into their own files. They are still accessible through TargetParser.h as before. Several functions in AArch64 which were just forwarders to ARM have been removed. All except AArch64::getFPUName were unused, and that one was only used in a test. That test itself overlapped one in ARM, so it has also been removed. Differential revision:
  1810. [clangd] Canonicalize file path in URIForFile. Summary: File paths in URIForFile can come from index or local AST. Path from index goes through URI transformation and the final path is resolved by URI scheme and could be potentially different from the original path. Hence, we should do the same transformation for all paths. We do this in URIForFile, which now converts a path to URI and back to a canonicalized path. Reviewers: sammccall Reviewed By: sammccall Subscribers: ilya-biryukov, MaskRay, jkorous, arphaman, kadircet, cfe-commits Differential Revision:
  1811. [clangd] Fix backward-compatibility - follow-up to textDocument/SymbolInfo Apparently clang 3.6 couldn't build the preceding patch.
  1812. [clangd] Bump vscode-clangd v0.0.7
  1813. [SystemZ::TTI] Improve cost for compare of i64 with extended i32 load CGF/CLGF compares an i64 register with a sign/zero extended loaded i32 value in memory. This patch makes such a load considered foldable and so gets a 0 cost. Review: Ulrich Weigand
  1814. [SystemZ::TTI] Improve costs for i16 add, sub and mul against memory. AH, SH and MH costs are already covered in the cases where LHS is 32 bits and RHS is 16 bits of memory sign-extended to i32. As these instructions are also used when LHS is i16, this patch recognizes that the loads will get folded then as well. Review: Ulrich Weigand
  1815. [SystemZ::TTI] Improved cost values for comparison against memory. Single instructions exist for i8 and i16 comparisons of memory against a small immediate. This patch makes sure that if the load in these cases has a single user (the ICmp), it gets a 0 cost (folded), and also that the ICmp gets a cost of 1. Review: Ulrich Weigand
  1816. [SystemZ::TTI] Return zero cost for scalar load/store connected with a bswap. Since byte-swapping loads and stores are supported, a 'load -> bswap' or 'bswap -> store' sequence should have the cost of one. Review: Ulrich Weigand
  1817. [llvm-objcopy] Hook up the -V alias to --version, output "GNU strip" This allows libtool to detect the presence of llvm-strip and use it with the options --strip-debug and --strip-unneeded. Also hook up the -V alias for objcopy. Differential Revision:
  1818. PR39809: (const void*)0 is not a null pointer constant in C.
  1819. PR12884: Add test (bug is already fixed).
  1820. Move LoopHint.h from Sema to Parse struct LoopHint was only used within Parse and not in any of the Sema or Codegen files. In the non-Parse files where it was included, it either wasn't used or LoopHintAttr was used, so its inclusion did nothing.
  1821. [CodeGen] Fix included headers. Remove the included Parse header because CodeGen should not depend on Parse. Instead, include the Lex headers that it needs.
  1822. [diagtool] Remove unneeded header includes.
  1823. Do not insert prefetches with unsupported memory operands. Summary: Ignore advices where the memory operand of the 'anchor' instruction uses unsupported register types. Reviewers: davidxl Subscribers: llvm-commits Differential Revision:
  1824. [OPENMP] remove redundant ColonExpected flag in ParseOpenMP.cpp (NFC) The flag ColonExpected is not changed after being initialized to false at declaration. Patch by Ahsan Saghir Differential Revision:
  1825. [X86] Add test cases to show that we don't properly take -mprefer-vector-width=256 and -min-legal-vector-width=256 into account when costing sext/zext. The check lines marked AVX256 in the zext256/sext256 functions should be closer to the AVX values which would take into account a splitting cost.
  1826. [RISCV] Mark unit tests as "requires: riscv-registered-target" Some of these tests break if the RISCV backend has not been built. Reland D54816.
  1827. [X86] Add exhaustive cost model testing for sext/zext for all vector types we reasonably support. Add cost model tests for truncating to vXi1. Our sext/zext cost modeling was somewhat incomplete. And had no coverage for the fact that avx512bw v32i16/v64i8 types return a scalarization cost. Truncates are a whole different mess because isTruncateFree is returning true for vectors when it shouldn't and that's the fall back for anything not in the tables.
  1828. Fix typo in "[clang][ARC] Fix test for commit r347699"
  1829. [OPENMP][NVPTX]Basic support for reductions across the teams. Added basic codegen support for the reductions across the teams.
  1830. [MS] Push outermost class DeclContexts only in -fdelayed-template-parsing This is more or less a complete rewrite of r347627, and it fixes PR38460. I added a reduced test case to DelayedTemplateParsing.cpp.
  1831. [libcxx] Make sure the re-export logic works when paths contain spaces
  1832. [libcxx] Fix libc++ re-exporting logic when Command Line Tools are not installed Summary: When the Xcode Command Line tools are not installed but CMAKE_OSX_SYSROOT is set, we would try to re-export symbols from the libc++abi.dylib shipped in the sysroot, which does not exist. This commit changes the build on OS X to always re-export symbols from the explicit re-export lists, which doesn't change depending on what system you're building on, and is therefore much less flaky. Reviewers: EricWF, mclow.lists Subscribers: mgorny, christof, jkorous, dexonsmith, libcxx-commits Differential Revision:
  1833. [TableGen] Improve readability of generated code (NFC) Improve the readability of the generated code for `MCOpcodeSwitchStatement`.
  1834. [TableGen] Refactor macro names (NFC) Make the names for the macros for `TargetInstrInfo` uniform.
  1835. [clang][ARC] Fix test for commit r347699
  1836. [yaml2obj] Treat COFF/ARM64 as a 64 bit architecture Differential Revision:
  1837. [gn build] Add enough build files to be able to build llvm-tblgen. Adds build files for: - llvm/lib/DebugInfo/CodeView - llvm/lib/DebugInfo/MSF - llvm/lib/MC - llvm/lib/TableGen - llvm/utils/TableGen All the build files just list sources and deps and are uninteresting. Differential Revision:
  1838. [clang][slh] add attribute for speculative load hardening Summary: Resubmit this with no changes because I think the build was broken by a different diff. ----- The prior diff had to be reverted because there were two tests that failed. I updated the two tests in this diff clang/test/Misc/pragma-attribute-supported-attributes-list.test clang/test/SemaCXX/attr-speculative-load-hardening.cpp ----- Summary from Previous Diff (Still Accurate) ----- LLVM IR already has an attribute for speculative_load_hardening. Before this commit, when a user passed the -mspeculative-load-hardening flag to Clang, every function would have this attribute added to it. This Clang attribute will allow users to opt into SLH on a function by function basis. This can be applied to functions and Objective C methods. Reviewers: chandlerc, echristo, kristof.beyls, aaron.ballman Subscribers: llvm-commits Differential Revision:
  1839. [InstCombine] Add tests for saturating add/sub; NFC These are baseline tests for D54534.
  1840. [clang][ARC] Add ARCTargetInfo Based-on-patch-by: Pete Couperus <> Differential Revision:
  1841. [X86] Add cost model tests for experimental.vector.reduce.* with -x86-experimental-vector-widening-legalization
  1842. [X86] Add cost model test for masked load and store with -x86-experimental-vector-widening-legalization
  1843. [X86] Add cost model tests for fp_to_int/int_to_fp with -x86-experimental-vector-widening-legalization
  1844. [X86] Add cost model tests for shifts with -x86-experimental-vector-widening-legalization.
  1845. Don't speculatively emit VTTs for classes unless we are able to correctly emit references to all the functions they will (directly or indirectly) reference. Summary: This fixes a miscompile where we'd emit a VTT for a class that ends up referencing an inline virtual member function that we can't actually emit a body for (because we never instantiated it in the current TU), which in a corner case of a corner case can lead to link errors. Reviewers: rjmccall Subscribers: cfe-commits Differential Revision:
  1846. [lit] Pass more environment variables through to child processes. This arose when I was trying to have a substitution which invoked a python script P, and that python script tried to invoke clang-cl (or even cl). Since we invoke P with a custom environment, it doesn't inherit the environment of the parent, and then when we go to invoke clang-cl, it's unable to find the MSVC installation directory. There were many more I could have passed through which are set by vcvarsall, but I tried to keep it simple and only pass through the important ones. Differential Revision:
  1847. Add missing error checking code intended for r347687
  1848. Revert "[RISCV] Mark unit tests as "requires: riscv-registered-target"" This reverts commit 1a6a0c9ea2716378d55858c11adf5941608531f8.
  1849. [RISCV] Mark unit tests as "requires: riscv-registered-target" Summary: Some of these tests break if the RISCV backend has not been built. Reviewers: asb, apazos, sabuasal Reviewed By: sabuasal Subscribers: rbar, johnrusso, simoncook, sabuasal, niosHD, kito-cheng, shiva0217, jrtc27, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, cfe-commits Differential Revision:
  1850. [PDB] Add symbol records in bulk Summary: This speeds up linking clang.exe/pdb with /DEBUG:GHASH by 31%, from 12.9s to 9.8s. Symbol records are typically small (16.7 bytes on average), but we processed them one at a time. CVSymbol is a relatively "large" type. It wraps an ArrayRef<uint8_t> with a kind and an optional 32-bit hash, which we don't need. Before this change, each DbiModuleDescriptorBuilder would maintain an array of CVSymbols, and would write them individually with a BinaryItemStream. With this change, we now add symbols that happen to appear contiguously in bulk. For each .debug$S section (roughly one per function), we allocate two copies, one for relocation, and one for realignment purposes. For runs of symbols that go in the module stream, which is most symbols, we now add them as a single ArrayRef<uint8_t>, so the vector in DbiModuleDescriptorBuilder is roughly linear in the number of .debug$S sections (O(# funcs)) instead of the number of symbol records (very large). Some stats on symbol sizes for the curious: PDB size: 507M; sym bytes: 316,508,016; sym count: 18,954,971; sym byte avg: 16.7. As future work, we may be able to skip copying symbol records in the linker for realignment purposes if we make LLVM write them aligned into the object file. We need to double check that such symbol records are still compatible with link.exe, but if so, it's definitely worth doing, since my profile shows we spend 500ms in memcpy in the symbol merging code. We could potentially cut that in half by saving a copy. Alternatively, we could apply the relocations *after* we iterate the symbols. This would require some careful re-engineering of the relocation processing code, though. Reviewers: zturner, aganea, ruiu Subscribers: hiraditya, llvm-commits Differential Revision:
  1851. [TableGen] Preprocessing support Differential Revision:
  1852. [ASTImporter] Added Import functions for transition to new API. Summary: These Import_New functions should be used in the ASTImporter, and the old Import functions should not be used. Later the Import_New should be renamed to Import again and the old Import functions must be removed. But this can happen only after LLDB was updated to use the new Import interface. This commit is only about introducing the new Import_New functions. These are not implemented now, only calling the old Import ones. Reviewers: shafik, rsmith, a_sidorin, a.sidorin Reviewed By: a_sidorin Subscribers: spyffe, a_sidorin, gamesh411, shafik, rsmith, dkrupp, martong, Szelethus, cfe-commits Differential Revision:
  1853. [X86] Replace an APInt that is guaranteed to be 8-bits with just an 'unsigned' We're already mixing this APInt with other 'unsigned' variables. This allows us to use regular comparison operators instead of needing to use APInt::ult or APInt::uge. And it removes a later conversion from APInt to unsigned. I might be adding another combine to this function and this will probably simplify the logic required for that.
  1854. [PartialInliner] Make PHIs free in cost computation. InlineCost also treats them as free and the current implementation can cause assertion failures if PHI nodes are moved outside the region from entry BBs to the region. It also updates the code to use the instructionsWithoutDebug iterator. Reviewers: davidxl, davide, vsk, graham-yiu-huawei Reviewed By: davidxl Differential Revision:
  1855. [X86] Add -march=cascadelake support in clang. This is skylake-avx512 with the addition of avx512vnni ISA. Patch by Jianping Chen Differential Revision:
  1856. [X86] Add cascade lake arch in X86 target. This is skylake-avx512 with the addition of avx512vnni ISA. Patch by Jianping Chen Differential Revision:
  1857. Documentation: add \file markup as needed. This makes Doxygen correctly associate the doc comment with the current file rather than adding to the documentation for namespace llvm.
  1858. Fix linker option for -fprofile-arcs -ftest-coverage Summary: Linux toolchain accidentally added "-u__llvm_runtime_variable" when "-fprofile-arcs -ftest-coverage", this is not added when "--coverage" option is used. Using "-u__llvm_runtime_variable" generates an empty default.profraw file while an application built with "-fprofile-arcs -ftest-coverage" is running. Reviewers: calixte, marco-c, sylvestre.ledru Reviewed By: marco-c Subscribers: vsk, cfe-commits Differential Revision:
  1859. Revert "[clang] - Simplify tools::SplitDebugName." This reverts commit r347035 as it introduced assertion failures under certain conditions. More information can be found here:
  1860. [clangd] textDocument/SymbolInfo extension New method returning symbol info for given source position. Differential Revision: rdar://problem/46050281
  1861. [clangd][NFC] Move SymbolID to a separate file Prerequisite for textDocument/SymbolInfo Differential Revision:
  1862. Implement P1085R2 - Should Span be Regular? This consists entirely of deletions
  1863. [clang-tidy] Ignore bool -> single bit bitfield conversion in readability-implicit-bool-conversion Summary: There is no ambiguity / information loss in this conversion Reviewers: alexfh, aaron.ballman, hokein Reviewed By: alexfh Subscribers: xazax.hun, cfe-commits Differential Revision:
  1864. [Demangle] remove itaniumFindTypesInMangledName Summary: This (very specialized) function was added to enable an LLDB use case. Now that a more generic interface (overriding of parser functions - D52992) is available, and LLDB has been converted to use that (D54074), the function is unused and can be removed. Reviewers: erik.pilkington, sgraenitz, rsmith Subscribers: mgorny, hiraditya, christof, libcxx-commits, llvm-commits Differential Revision:
  1865. [clangd] Put direct headers into srcs section. Summary: Currently, there's no way of knowing about header files using compilation database, since it doesn't contain header files as entries. Using this information, restoring from cache using compile commands becomes possible instead of doing directory traversal. Also, we can issue indexing actions for out-of-date headers even if source files depending on them haven't changed. Reviewers: sammccall Subscribers: ilya-biryukov, ioeric, MaskRay, jkorous, arphaman, cfe-commits Differential Revision:
  1866. [llvm-mca] pass -dispatch-stats flag to a couple of tests. NFC This change is in preparation for a patch that fixes PR36666. llvm-mca currently doesn't know if a buffered processor resource describes a load or store queue. So, any dynamic dispatch stall caused by the lack of load/store queue entries is normally reported as a generic SCHEDULER stall. See for example the -dispatch-stats output from the two tests modified by this patch. In future, processor models will be able to tag processor resources that are used to describe load/store queues. That information would then be used by llvm-mca to correctly classify dynamic dispatch stalls caused by the lack of tokens in the LS.
  1867. [x86] regenerate checks; NFC
  1868. [AMDGPU] Disable DAG combine at -O0 Differential Revision:
  1869. Derive builtin return type from its definition Summary: Prior to this patch, OpenCL code such as the following would attempt to create a BranchInst with a non-bool argument: if (enqueue_kernel(get_default_queue(), 0, nd, ^(void){})) /* ... */ This patch is a follow up on a similar issue with pipe builtin operations. See commit r280800 and This change, while being conservative on non-builtin functions, should set the type of expressions invoking builtins to the proper type, instead of defaulting to `bool` and requiring manual overrides in Sema::CheckBuiltinFunctionCall. In addition to tests for enqueue_kernel, the tests are extended to check other OpenCL builtins. Reviewers: Anastasia, spatel, rsmith Reviewed By: Anastasia Subscribers: kristina, cfe-commits, svenvh Differential Revision:
  1870. Revert r347419 "Update call to EvaluateAsInt() to the new syntax." Its prerequisite was reverted in r347656.
  1871. Revert r347417 "Re-Reinstate 347294 with a fix for the failures." This caused a miscompile in Chrome (see that's illustrated by this small reduction: static bool f(int *a, int *b) { return !__builtin_constant_p(b - a) || (!(b - a)); } int arr[] = {1,2,3}; bool g() { return f(arr, arr + 3); } $ clang -O2 -S -emit-llvm -o - g() should return true, but after r347417 it became false for some reason. This also reverts the follow-up commits. r347417: > Re-Reinstate 347294 with a fix for the failures. > > Don't try to emit a scalar expression for a non-scalar argument to > __builtin_constant_p(). > > Third time's a charm! r347446: > The result of is.constant() is unsigned. r347480: > A __builtin_constant_p() returns 0 with a function type. r347512: > isEvaluatable() implies a constant context. > > Assume that we're in a constant context if we're asking if the expression can > be compiled into a constant initializer. This fixes the issue where a > __builtin_constant_p() in a compound literal was diagnosed as not being > constant, even though it's always possible to convert the builtin into a > constant. r347531: > A "constexpr" is evaluated in a constant context. Make sure this is reflected > if a __builtin_constant_p() is a part of a constexpr.
  1872. [clangd] Prevent thread starvation in tests on loaded systems. Summary: Background index deliberately runs low-priority, but for tests this may stop them making progress. Reviewers: kadircet Subscribers: ilya-biryukov, ioeric, MaskRay, jkorous, arphaman, jfb, cfe-commits Differential Revision:
  1873. [libclang] Fix clang_Cursor_getNumArguments and clang_Cursor_getArgument for CXXConstructExpr Constructors have the same methods for arguments as call expressions. Let's provide a way to get their arguments the same way. Differential Revision:
  1874. InstCombine: add comment explaining malloc deletion. NFC. I tried to change this, not quite realising the logic behind what we were doing. Hopefully this comment will help the next person to come along.
  1875. [clang-tidy] Avoid inconsistent notes in readability-container-size-empty When a warning is issued in a template instantiation, the check would previously use template arguments in a note, which would result in inconsistent or duplicate warnings (depending on how deduplication was done). This patch removes template arguments from the note.
  1876. [clang-tidy] Minor fixes in a test Use CHECK-FIXES where it was intended instead of CHECK-MESSAGES. Fixed compiler warnings to pacify YouCompleteMe.
  1877. [ASTImporter] Typedef import brings in the complete type Summary: When we already have an incomplete underlying type of a typedef in the "To" context, and the "From" context has the same typedef, but the underlying type is complete, then the imported type should be complete. Fixes an assertion in CTU analysis of Xerces: Assertion `DD && "queried property of class with no definition"' failed. This assert is happening in the analyzer engine, because that attempts to query an underlying type of a typedef, which happens to be incomplete. Reviewers: a_sidorin, a.sidorin Subscribers: rnkovacs, dkrupp, Szelethus, cfe-commits Differential Revision:
  1878. [CMake] Add a missing case of TO_CMAKE_PATH This fixes building sanitizers for mingw natively.
  1879. Add missing REQUIRES: asserts
  1880. [X86] Add test cases for vector shifts of v2i32/v2i16/v4i16/v2i8/v4i8/v8i8 with promotion legalization and widening legalization. NFC
  1881. [X86] Use getUnpackl/getUnpackh instead of directly creating UNPCKL/UNPCKH nodes.
  1882. [LoopSimplifyCFG] Turn on term folding after underlying bug fixed
  1883. [LoopSimplifyCFG] Fix corner case with duplicating successors Fixes a bug where the Phi inputs of the only live successor were not updated when that successor appears in the block's successor list more than once. Thanks @uabelho for finding this. Differential Revision: Reviewed By: anna
  1884. [gn build] Merge r347530 to gn.
  1885. Move a file I forgot to move in r347636.
  1886. [gn build] Create abi-breaking.h, config.h, llvm-config.h, and add a build file for llvm/lib/Support. The comments at the top of llvm/utils/gn/secondary/llvm/include/llvm/Config/ and llvm/utils/gn/build/ should explain the main bits happening in this patch. The main parts here are that these headers are generated at build time, not gn time, and that currently they don't do any actual feature checks but just hardcode most things based on the current OS, which seems to work well enough. If this stops being enough, the feature checks should each be their own action writing the result to somewhere, and the config write step should depend on those checks (so that they can run in parallel and as part of the build) -- utils/llvm/gn/README.rst already has some more words on that in "Philosophy". ( is also going to be used to write clang's clang/include/clang/Config/config.h) This also adds a few files for linking to system libraries in a consistent way if needed in llvm/utils/gn/build/libs (and moves pthread to that model). I'm also adding llvm/utils/gn/secondary/llvm/lib/Target/targets.gni in this patch because $native_arch is needed for writing llvm-config.h -- the rest of it will be used later, when the build files for llvm/lib/Target get added. That file describes how to select which archs to build. As a demo, also add a build file for llvm-undname and make it the default build target (it depends on everything that can currently be built). Differential Revision:
  1887. [clangd] NFC: Prefer `isa<>` to `dyn_cast<>` to do the checking. Summary: Prefer `isa<>` to `dyn_cast<>` when only a type check is needed. Reviewers: ilya-biryukov, MaskRay Reviewed By: ilya-biryukov, MaskRay Subscribers: ioeric, MaskRay, jkorous, arphaman, kadircet, cfe-commits, MTC Differential Revision:
  1888. [docs] UBSan and ASan are supported on Windows Also fix a bullet list. Fixes PR39775
  1889. [X86] Prevent DAG combine from folding a bitcast from vXi1 to iX with a store on pre-AVX512 targets. If we fold the bitcast into the store we'll end up creating a truncating store to vXi1 that will get scalarized. Instead allow the bitcast to be turned into a movmsk. We probably need to do something if the store itself is a vXi1 type, but I'll leave that til a testcase appears.
  1890. [X86] Add a bunch of test cases for storing a scalar bitcasted from a vXi1 type. Currently a store combine will absorb the bitcast before our combine that turns bitcasts into movmsk gets a chance to run. This results in a store being created with a vXi1 type. Type legalization then promotes the input type and makes this a truncating store. Then we badly scalarize this store. Currently we avoid this on v8i1->i8 bitcasts due to an incompletely qualified(per the original intention) check in isLoadBitCastBeneficial. An easy fix is to disable this for all vXi1->iX bitcasts on pre-avx512 targets. We'll still generate terrible code if the IR explicitly contains a store of vXi1 without a bitcast. We could probably solve that by just turning all stores of vXi1 into (store (iX (bitcast))) as an early DAG combine.
  1891. Revert r347627 "[MS] Push fewer DeclContexts for delayed template parsing" It broke the Windows self-host: I can build lib/CodeGen/CMakeFiles/LLVMCodeGen.dir/MachinePostDominators.cpp.obj to repro.
  1892. [analyzer][PlistMacroExpansion] Part 3.: Macro arguments are expanded This part focuses on expanding macro arguments. Differential Revision:
  1893. Revert "[clang][slh] add attribute for speculative load hardening" until I figure out why the build is failing or timing out *************************** Summary: The prior diff had to be reverted because there were two tests that failed. I updated the two tests in this diff clang/test/Misc/pragma-attribute-supported-attributes-list.test clang/test/SemaCXX/attr-speculative-load-hardening.cpp LLVM IR already has an attribute for speculative_load_hardening. Before this commit, when a user passed the -mspeculative-load-hardening flag to Clang, every function would have this attribute added to it. This Clang attribute will allow users to opt into SLH on a function by function basis. This can be applied to functions and Objective C methods. Reviewers: chandlerc, echristo, kristof.beyls, aaron.ballman Subscribers: llvm-commits Differential Revision: This reverts commit a5b3c232d1e3613f23efbc3960f8e23ea70f2a79. (r347617)
  1894. [MS] Push fewer DeclContexts for delayed template parsing Only push the outermost record as a DeclContext when parsing a function body. See the comments in Sema::getContainingDC about the way the parser pushes contexts. This is intended to match the behavior the parser normally displays where it parses all method bodies from all nested classes at the end of the outermost class, when all nested classes are complete. Fixes PR38460.
  1895. [stack-safety] Update comment
  1896. [stack-safety] Fix and uncomment assert
  1897. [stack-safety] Fix build on gcc 5.4
  1898. Fix filtering of sanitizer_common unittest architectures on Darwin.
  1899. [InstCombine] add tests for rotate/bswap equality; NFC
  1900. [clang][slh] add attribute for speculative load hardening Summary: The prior diff had to be reverted because there were two tests that failed. I updated the two tests in this diff clang/test/Misc/pragma-attribute-supported-attributes-list.test clang/test/SemaCXX/attr-speculative-load-hardening.cpp ----- Summary from Previous Diff (Still Accurate) ----- LLVM IR already has an attribute for speculative_load_hardening. Before this commit, when a user passed the -mspeculative-load-hardening flag to Clang, every function would have this attribute added to it. This Clang attribute will allow users to opt into SLH on a function by function basis. This can be applied to functions and Objective C methods. Reviewers: chandlerc, echristo, kristof.beyls, aaron.ballman Subscribers: llvm-commits Differential Revision:
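The per-function opt-in described in entry 1900 might look roughly like the sketch below. It assumes the GNU-style spelling `speculative_load_hardening`. The macro `SLH_OPT_IN` and the function `load_element` are hypothetical names introduced only for illustration. The `__has_attribute` guard is there so the sketch still compiles on compilers that lack the attribute.

```cpp
#include <cstddef>

// Guard so the sketch still compiles where the attribute is unavailable.
#if defined(__has_attribute)
#  if __has_attribute(speculative_load_hardening)
#    define SLH_OPT_IN __attribute__((speculative_load_hardening))
#  endif
#endif
#ifndef SLH_OPT_IN
#  define SLH_OPT_IN
#endif

// Hypothetical function: only this function opts into SLH. Other functions
// in the translation unit are unaffected unless
// -mspeculative-load-hardening is passed for the whole compilation.
SLH_OPT_IN int load_element(const int *table, std::size_t size, std::size_t i) {
  return i < size ? table[i] : 0;
}
```

The attribute changes code generation only; the observable result of `load_element` is the same with or without it.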
  1901. Fix debug build break Comment out an assertion from D54543 which failed with error: no member named 'Range' in '(anonymous namespace)::PassAsArgInfo'.
  1902. Notify the linker when a TU compiled with split-stack has a function without a prologue. More context here:
  1903. Remove trailing empty line
  1904. [stack-safety] Analysis documentation Summary: Basic documentation of the Stack Safety Analysis. It will be improved during review and upstream of an implementation. Reviewers: kcc, eugenis, vlad.tsyrklevich, glider Reviewed By: vlad.tsyrklevich Subscribers: arphaman, llvm-commits Differential Revision:
  1905. [stack-safety] Inter-Procedural Analysis implementation Summary: IPA is implemented as a module pass which produces a map from Function or Alias to StackSafetyInfo for a single function. From prototype by Evgenii Stepanov and Vlad Tsyrklevich. Reviewers: eugenis, vlad.tsyrklevich, pcc, glider Subscribers: hiraditya, mgrang, llvm-commits Differential Revision:
  1906. [stack-safety] Empty local passes for Stack Safety Global Analysis Reviewers: eugenis, vlad.tsyrklevich Subscribers: hiraditya, llvm-commits Differential Revision:
  1907. AArch64ISelLowering: Remove a return-of-assignment to allow NRVO Patch by Arthur O'Dwyer!
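To illustrate why a return-of-assignment blocks NRVO, here is a small generic sketch. It is not the AArch64ISelLowering code itself; `Tracked`, `returnOfAssignment` and `returnNamedLocal` are hypothetical names, and the copy counter exists only to make the difference observable.

```cpp
// Hypothetical copy-counting type, used only to observe what 'return' does.
struct Tracked {
  static int copies;
  int value = 0;
  Tracked() = default;
  explicit Tracked(int v) : value(v) {}
  Tracked(const Tracked &o) : value(o.value) { ++copies; }
  Tracked(Tracked &&o) noexcept : value(o.value) {}
  Tracked &operator=(const Tracked &o) {
    value = o.value;
    ++copies;
    return *this;
  }
  Tracked &operator=(Tracked &&o) noexcept {
    value = o.value;
    return *this;
  }
};
int Tracked::copies = 0;

// The assignment expression is an lvalue (Tracked &), so the return value
// has to be copy-constructed from it; neither NRVO nor an implicit move
// applies.
Tracked returnOfAssignment(int v) {
  Tracked result;
  return result = Tracked(v);
}

// Returning the named local lets the compiler elide the copy (NRVO) or, at
// worst, move from the local.
Tracked returnNamedLocal(int v) {
  Tracked result;
  result = Tracked(v);
  return result;
}
```

With this type, `returnOfAssignment` performs one copy construction while `returnNamedLocal` performs none.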
  1908. Remove duplicate _LIBCPP_INLINE_VISIBILITY attributes. This attribute should appear only on the first declaration. This patch cleans up <string> by removing the attribute on redeclarations.
  1909. Add new passes to X86 pipeline tests Summary: Fixes test failures introduced by rL347596. Reviewers: davidxl Reviewed By: davidxl Subscribers: llvm-commits Differential Revision:
  1910. [X86] Add dependency from X86 to ProfileData after rL347596
  1911. [ICP] Remove incompatible attributes at indirect-call promoted callsites. Summary: Remove incompatible attributes at indirect-call promoted callsites; not removing them results in at least an IR verification error. Reviewers: davidxl, xur, mssimpso Subscribers: llvm-commits Differential Revision:
  1912. [InstCombine] add helper function to reduce code duplication; NFC
  1913. [stack-safety] Local analysis implementation Summary: Analysis produces StackSafetyInfo which contains information about how allocas and parameters were used in functions. From prototype by Evgenii Stepanov and Vlad Tsyrklevich. Reviewers: eugenis, vlad.tsyrklevich, pcc, glider Subscribers: hiraditya, llvm-commits Differential Revision:
  1914. [stack-safety] Empty local passes for Stack Safety Local Analysis Reviewers: eugenis, vlad.tsyrklevich Subscribers: mgorny, hiraditya, llvm-commits Differential Revision:
  1915. [cfi] Help sanstats to find binaries if they are not at the original location Summary: By default sanstats searches for binaries at the same location where they were when the stats were collected. Sometimes you cannot print the report immediately, or you need to move post-processing to another workstation. To support this use case, when the original binary is missing sanstats will fall back to the directory containing the sanstats file. Reviewers: pcc Subscribers: llvm-commits Differential Revision:
  1916. [cfi] Make sanstats print address of the check Summary: Helps with off-line symbolization or other types of debugging. Reviewers: pcc Subscribers: llvm-commits Differential Revision:
  1917. [AArch64] Refactor the scheduling predicates (3/3) (NFC) Refactor the scheduling predicates based on `MCInstPredicate`. In this case, `AArch64InstrInfo::hasExtendedReg()`. Differential revision:
  1918. [AArch64] Refactor the scheduling predicates (2/3) (NFC) Refactor the scheduling predicates based on `MCInstPredicate`. In this case, `AArch64InstrInfo::hasShiftedReg()`. Differential revision:
  1919. [AArch64] Refactor the scheduling predicates (1/3) (NFC) Refactor the scheduling predicates based on `MCInstPredicate`. In this case, `AArch64InstrInfo::isScaledAddr()` Differential revision:
  1920. Support for inserting profile-directed cache prefetches Summary: Support for profile-driven cache prefetching (X86) This change is part of a larger system, consisting of a cache prefetches recommender, create_llvm_prof (, and LLVM. A proof of concept recommender is DynamoRIO's cache miss analyzer. It processes memory access traces obtained from a running binary and identifies patterns in cache misses. Based on them, it produces a csv file with recommendations. The expectation is that, by leveraging such recommendations, we can reduce the number of clock cycles spent waiting for data from memory. A microbenchmark based on the DynamoRIO analyzer is available as a proof of concept: The recommender makes prefetch recommendations in terms of: * the binary offset of an instruction with a memory operand; * a delta; * and a type (nta, t0, t1, t2) meaning: a prefetch of that type should be inserted right before the instruction at that binary offset, and the prefetch should be for an address delta away from the memory address the instruction will access. For example: 0x400ab2,64,nta and assuming the instruction at 0x400ab2 is: movzbl (%rbx,%rdx,1),%edx means that the recommender determined it would be beneficial for a prefetchnta instruction to be inserted right before this instruction, as such: prefetchnta 0x40(%rbx,%rdx,1) movzbl (%rbx, %rdx, 1), %edx The workflow for prefetch cache instrumentation is as follows (the proof of concept script details these steps as well): 1. build binary, making sure -gmlt -fdebug-info-for-profiling is passed. The latter option will enable the X86DiscriminateMemOps pass, which ensures instructions with memory operands are uniquely identifiable (this causes ~2% size increase in total binary size due to the additional debug information). 2. collect memory traces, run analysis to obtain recommendations (see above-referenced DynamoRIO demo as a proof of concept). 3. use create_llvm_prof to convert recommendations to reference insertion locations in terms of debug info locations. 4. rebuild binary, using the exact same set of arguments used initially, to which -mllvm -prefetch-hints-file=<file> needs to be added, using the afdo file obtained at step 3. Note that if sample profiling feedback-driven optimization is also desired, that happens before step 1 above. In this case, the sample profile afdo file that was used to produce the binary at step 1 must also be included in step 4. The data needed by the compiler in order to identify prefetch insertion points is very similar to what is needed for sample profiles. For this reason, and given that the overall approach (memory tracing-based cache recommendation mechanisms) is under active development, we use the afdo format as a syntax for capturing this information. We avoid confusing semantics with sample profile afdo data by feeding the two types of information to the compiler through separate files and compiler flags. Should the approach prove successful, we can investigate improvements to this encoding mechanism. Reviewers: davidxl, wmi, craig.topper Reviewed By: davidxl, wmi, craig.topper Subscribers: davide, danielcdh, mgorny, aprantl, eraman, JDevlieghere, llvm-commits Differential Revision:
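The recommendation line format from entry 1920 (`0x400ab2,64,nta`) can be read with a few lines of standard C++. This is only a sketch under stated assumptions: the struct `PrefetchHint` and the function `parsePrefetchHint` are hypothetical, the offset is assumed to be hexadecimal and the delta decimal (as in the commit's example), and none of this is taken from the recommender or from create_llvm_prof.

```cpp
#include <cstdint>
#include <exception>
#include <sstream>
#include <string>

// Hypothetical in-memory form of one recommendation line.
struct PrefetchHint {
  std::uint64_t offset = 0; // binary offset of the instruction with a memory operand
  std::int64_t delta = 0;   // distance from the accessed address to prefetch
  std::string type;         // one of: nta, t0, t1, t2
};

// Parses "offset,delta,type" into 'out'. Returns false on malformed input.
bool parsePrefetchHint(const std::string &line, PrefetchHint &out) {
  std::istringstream in(line);
  std::string offsetText, deltaText, type;
  if (!std::getline(in, offsetText, ',') || !std::getline(in, deltaText, ',') ||
      !std::getline(in, type))
    return false;
  if (type != "nta" && type != "t0" && type != "t1" && type != "t2")
    return false;
  try {
    out.offset = std::stoull(offsetText, nullptr, 16); // accepts a "0x" prefix
    out.delta = std::stoll(deltaText, nullptr, 10);    // assumed decimal
  } catch (const std::exception &) {
    return false; // not a number, or out of range
  }
  out.type = type;
  return true;
}
```

For the example line, the parsed delta of 64 corresponds to the `0x40` displacement in the `prefetchnta 0x40(%rbx,%rdx,1)` instruction shown in the commit message.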
  1921. AMDGPU: Record SGPR spills when restoring too It's possible in some cases to have a restore present without a corresponding spill. Due to an apparent bug in D54366 <>, only the restore for a register was emitted. It's probably always a bug for this to happen, but due to how SGPR spilling is implemented, this makes the issue appear worse than it is.
  1922. [LegalizeVectorTypes][X86][ARM][AArch64][PowerPC] Don't use SplitVecOp_TruncateHelper for FP_TO_SINT/UINT. SplitVecOp_TruncateHelper tries to promote the result type while splitting FP_TO_SINT/UINT. It then concatenates the result and introduces a truncate to the original result type. But it does this without inserting the AssertZExt/AssertSExt that the regular result type promotion would insert. Nor does it turn FP_TO_UINT into FP_TO_SINT the way normal result type promotion for these operations does. This is bad on X86 which doesn't support FP_TO_UINT until AVX512. This patch disables the use of SplitVecOp_TruncateHelper for these operations and just lets normal promotion handle it. I've tweaked a couple things in X86ISelLowering to avoid a few obvious regressions there. I believe all the changes on X86 are improvements. The other targets look neutral. Differential Revision:
  1923. [ThinLTO] Consolidate cache key computation between new/old LTO APIs Summary: The old legacy LTO API had a separate cache key computation, which was a subset of the cache key computation in the new LTO API (from what I can tell this is largely just because certain features such as CFI, dsoLocal, etc are only utilized via the new LTO API). However, having separate computations is unnecessary (much of the code is duplicated), and can lead to bugs when adding new optimizations if both cache computation algorithms aren't updated properly - it's much easier to maintain if we have a single facility. This patch refactors the old LTO API code to use the cache key computation from the new LTO API. To do this, we set up an lto::Config object and fill in the fields that the old LTO was hashing (the others will just use the defaults). There are two notable changes: - I added a Freestanding flag to the LTO Config. Currently this is only used by the legacy LTO API. In the patch that added it (D30791) I had asked about adding it to the new LTO API, but it looks like that was not addressed. This should probably be discussed as a follow up to this change, as it is orthogonal. - The legacy LTO API had some code that was hashing the GUID of all preserved symbols defined in the module. I looked back at the history of this (which was added with the original hashing in the legacy LTO API in D18494), and there is a comment in the review thread that it was added in preparation for future internalization. We now do the internalization of course, and that is handled in the new LTO API cache key computation by hashing the recorded linkage type of all defined globals. Therefore I didn't try to move over and keep the preserved symbols handling. Reviewers: steven_wu, pcc Subscribers: mehdi_amini, inglorion, eraman, dexonsmith, dang, llvm-commits Differential Revision:
  1924. [SelectionDAG] Teach BaseIndexOffset::match to unwrap the base after looking through an add/or We might find a target specific node that needs to be unwrapped after we look through an add/or. Otherwise we get inconsistent results if one pointer is just X86WrapperRIP and the other is (add X86WrapperRIP, C) Differential Revision:
  1925. [X86] Add test case for D54818
  1926. Add basic_string::__resize_default_init (from P1072) This patch adds an implementation of __resize_default_init as described in P1072R2. Additionally, it uses it in filesystem to demonstrate its intended utility. Once P1072 lands, or if it changes its interface, I will adjust the internal libc++ implementation to match.
  1927. Revert "[clang][slh] add attribute for speculative load hardening" This reverts commit 801eaf91221ba6dd6996b29ff82659ad6359e885.
  1928. [clang][slh] add attribute for speculative load hardening Summary: LLVM IR already has an attribute for speculative_load_hardening. Before this commit, when a user passed the -mspeculative-load-hardening flag to Clang, every function would have this attribute added to it. This Clang attribute will allow users to opt into SLH on a function by function basis. This can be applied to functions and Objective C methods. Reviewers: chandlerc, echristo Subscribers: llvm-commits Differential Revision:
  1929. [libcxx] Fix XFAILs for aligned allocation tests In r339743, I marked several aligned allocation tests as downright unsupported on macosx in an attempt to unbreak the build. It turns out that marking them as unsupported whenever we're on OS X is way too coarse-grained. This commit marks the tests as XFAIL with more granularity.
  1930. [CodeGen] Support custom format of stack maps Summary: Add a hook to the GCMetadataPrinter for emitting stack maps in custom format. The hook will be called at stack map generation time. The default stack map format is used if there is no hook. For this to be useful a few data structures and accessors are exposed from the StackMaps class, so the custom printer can access the stack map data. This patch authored by Cherry Zhang <>. Reviewers: thanm, apilipenko, reames Reviewed By: reames Subscribers: reames, apilipenko, nemanjai, javed.absar, kbarton, jsji, llvm-commits Differential Revision:
  1931. [OPENMP][NVPTX]Emit default locations with the correct Exec|Runtime modes. If the region is inside target|teams|distribute region, we can emit the locations with the correct info for execution mode and runtime mode. Patch adds this ability to the NVPTX codegen to help the optimizer to produce better code.
  1932. [clang][slh] Forward mSLH only to Clang CC1 Summary: -mno-speculative-load-hardening isn't a cc1 option, therefore, before this change: clang -mno-speculative-load-hardening hello.cpp would have the following error: error: unknown argument: '-mno-speculative-load-hardening' This change will only ever forward -mspeculative-load-hardening which is a CC1 option based on which flag was passed to clang. Also added a test that uses this option that fails if an error like the above is ever thrown. Thank you ericwf for help debugging and fixing this error. Reviewers: chandlerc, EricWF Subscribers: llvm-commits Differential Revision:
  1933. Delete dead code introduced in r347354. ParentTy is never used other than an assignment, and since it is a pointer, there is no side effect. Some versions of GCC notice and warn on this. Change-Id: I37dc1a18c7b58040419afb803621de13d8904a8f
  1934. [libcxx] Fix XFAIL for aligned deallocation test with trunk Clang The test was marked as failing whenever the deployment target was 10.12 or older, but in reality the test passes when the deployment target is 10.12 on recent Clangs. This happens because only older clangs do not honor the -faligned-allocation flag, which disables any availability error related to aligned allocation support, regardless of the deployment target.
  1935. [NFC] Replace magic numbers with CodeGenOpt enums Use enum values from llvm/Support/CodeGen.h for the optimisation levels in CompilerInvocation.
  1936. AMDGPU: Cleanup / relax tests for future changes
  1937. [clangd] Do not drop diagnostics from macros if they still end up being in the main file.
  1938. AMDGPU: Don't optimize exec masks at -O0
  1939. AMDGPU: Only add implicit super-reg def for first subreg
  1940. [AArch64] Add aarch64_vector_pcs function attribute to Clang This is the Clang patch to complement the following LLVM patches: More information describing the vector ABI and procedure call standard can be found here:\ hpc/arm-compiler-for-hpc/vector-function-abi Patch by Kerry McLaughlin. Reviewed By: rjmccall Differential Revision:
  1941. [clang-tidy] Improving narrowing conversions Summary: Newly flagged narrowing conversions: - integer to narrower signed integer (this is compiler implementation defined), - integer - floating point narrowing conversions, - floating point - integer narrowing conversions, - constants with narrowing conversions (even in ternary operator). Reviewers: hokein, alexfh, aaron.ballman, JonasToth Reviewed By: aaron.ballman, JonasToth Subscribers: lebedev.ri, courbet, nemanjai, xazax.hun, kbarton, cfe-commits Tags: #clang-tools-extra Differential Revision:
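The conversion categories listed in entry 1941 can be illustrated with a few implicit conversions. This is a sketch of the kinds of code the summary describes; whether the check reports each exact line is an assumption, and the function names are hypothetical.

```cpp
#include <cstdint>

// integer to narrower signed integer: values outside [-128, 127] do not fit.
std::int8_t intToInt8(int i) {
  std::int8_t n = i;
  return n;
}

// integer to floating point: float has a 24-bit significand, so large
// integers are rounded.
float intToFloat(std::int32_t i) {
  float f = i;
  return f;
}

// floating point to integer: the fractional part is discarded.
int doubleToInt(double d) {
  int i = d;
  return i;
}
```

For instance, 16777217 (2^24 + 1) has no exact `float` representation, and converting 3.9 to `int` yields 3.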
  1942. [CodeGen] Take SPAdj into account for STATEPOINT liveness args Summary: STATEPOINT records its args' locations on stack relative to SP. If the SP is changed, take that into account. This patch authored by Cherry Zhang <>. Reviewers: thanm, reames Reviewed By: reames Subscribers: reames, llvm-commits Differential Revision:
  1943. [libcxx] Use a type that is always an aggregate in variant's tests Summary: In PR39232, we noticed that some variant tests started failing in C++2a mode with recent Clangs, because the rules for literal types changed in C++2a. As a result, a temporary fix was checked in (enabling the test only in C++17). This commit is what I believe should be the long term fix: I removed the tests that checked constexpr default-constructibility with a weird type from the tests for index() and valueless_by_exception(), and instead I added tests for those using an obviously literal type in the test for the default constructor. Reviewers: EricWF, mclow.lists Subscribers: christof, jkorous, dexonsmith, arphaman, libcxx-commits, rsmith Differential Revision:
  1944. [clangd] Enable auto-index behind a flag. Summary: Ownership and configuration: The auto-index (background index) is maintained by ClangdServer, like Dynamic. (This means ClangdServer will be able to enqueue preamble indexing in future). For now it's enabled by a simple boolean flag in ClangdServer::Options, but we probably want to eventually allow injecting the storage strategy. New 'sync' command: In order to meaningfully test the integration (not just unit-test components) we need a way for tests to ensure the asynchronous index reads/writes occur before a certain point. Because these tests and assertions are few, I think exposing an explicit "sync" command for use in tests is simpler than allowing threading to be completely disabled in the background index (as we do for TUScheduler). Bugs: I fixed a couple of trivial bugs I found while testing, but there's one I can't. JSONCompilationDatabase::getAllFiles() may return relative paths, and currently we trigger an assertion that assumes they are absolute. There's no efficient way to resolve them (you have to retrieve the corresponding command and then resolve against its directory property). In general I think this behavior is broken and we should fix it in JSONCompilationDatabase and require CompilationDatabase::getAllFiles() to be absolute. Reviewers: kadircet Subscribers: ilya-biryukov, ioeric, MaskRay, jkorous, arphaman, cfe-commits Differential Revision:
  1945. [clangd] Fix compilation of IndexBenchmark
  1946. Remove an unnecessary file; NFC. This source file has not been needed since r346522 and was triggering diagnostics in MSVC about an object file which exports no public symbols (LNK4221).
  1947. [ASTImporter][Structural Eq] Check for isBeingDefined Summary: If one definition is currently being defined, we do not compare for equality and we assume that the decls are equal. Reviewers: a_sidorin, a.sidorin, shafik Reviewed By: a_sidorin Subscribers: gamesh411, shafik, rnkovacs, dkrupp, Szelethus, cfe-commits Differential Revision:
  1948. [clangd] Fix use-after-free with expected types in indexing
  1949. [clangd] Add type boosting in code completion Reviewers: sammccall, ioeric Reviewed By: sammccall Subscribers: MaskRay, jkorous, arphaman, kadircet, cfe-commits Differential Revision:
  1950. [DemandedBits] Add support for funnel shifts Add support for funnel shifts to the DemandedBits analysis. The demanded bits of the first two operands can be determined if the shift amount is constant. The demanded bits of the third operand (shift amount) can be determined if the bitwidth is a power of two. This is basically the same functionality as implemented in D54869 and D54478, but for DemandedBits rather than InstCombine. Differential Revision:
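The rule in entry 1950 for a constant shift amount can be sketched on plain 32-bit integers. This is a simplified model rather than the DemandedBits code: it covers only funnel shift left at a fixed width of 32, and the function names are hypothetical.

```cpp
#include <cstdint>

// Reference semantics of a 32-bit funnel shift left: concatenate a:b, shift
// left by (amount mod 32), keep the high 32 bits.
std::uint32_t fshl32(std::uint32_t a, std::uint32_t b, std::uint32_t amount) {
  std::uint32_t s = amount % 32;
  return s == 0 ? a : (a << s) | (b >> (32 - s));
}

// Demanded bits of the first operand, given the demanded bits of the result
// and a constant shift amount: result bit i comes from bit (i - s) of 'a'.
std::uint32_t demandedOfFirst(std::uint32_t demandedOut, std::uint32_t amount) {
  std::uint32_t s = amount % 32;
  return demandedOut >> s;
}

// Demanded bits of the second operand: the low s result bits come from the
// top s bits of 'b'. A zero shift is spelled out because shifting a 32-bit
// value by 32 is undefined in C++.
std::uint32_t demandedOfSecond(std::uint32_t demandedOut, std::uint32_t amount) {
  std::uint32_t s = amount % 32;
  return s == 0 ? 0 : demandedOut << (32 - s);
}
```

With a shift of 8, demanding only the low byte of the result demands nothing from the first operand and only the top byte of the second. A funnel shift right by `s` can be handled the same way after rewriting it as a funnel shift left by `32 - s`.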
  1951. [clangd] Collect and store expected types in the index Summary: And add a hidden option to control whether the types are collected. For experiments, will be removed when expected types implementation is stabilized. The index size is almost unchanged, e.g. the YAML index for all clangd sources increased from 53MB to 54MB. Reviewers: ioeric, sammccall Reviewed By: sammccall Subscribers: MaskRay, jkorous, arphaman, kadircet, cfe-commits Differential Revision:
  1952. [clangd] Initial implementation of expected types Summary: Provides facilities to model the C++ conversion rules without the AST. The introduced representation can be stored in the index and used to implement type-based ranking improvements for index-based completions. Reviewers: sammccall, ioeric Reviewed By: sammccall Subscribers: malaperle, mgorny, MaskRay, jkorous, arphaman, kadircet, cfe-commits Differential Revision:
  1953. [Index] Expose USR generation for types Summary: Used in clangd. Reviewers: sammccall, ioeric Reviewed By: sammccall Subscribers: kadircet, cfe-commits Differential Revision:
  1954. [x86] promote all multiply i8 by constant to i32 We have these 2 "isDesirable" promotion hooks (I'm not sure why we need both of them, but that's independent of this patch), and we can adjust them to promote "mul i8 X, C" to i32. Then, all of our existing LEA and other multiply expansion magic happens as it would for i32 ops. Some of the test diffs show that we could end up with an actual 32-bit mul instruction here because we choose not to expand to simpler ops. That instruction could be slower depending on the subtarget. On the plus side, this means we don't need a separate instruction to load the constant operand and possibly an extra instruction to move the result. If we need to tune mul i32 further, we could add a later transform that tries to shrink it back to i8 based on subtarget timing. I did not bother to duplicate all of the 32-bit test file RUNs and target settings that exist to test whether LEA expansion is cheap or not. The diffs here assume a default target, so that means LEA is generally cheap. Differential Revision:
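The promotion in entry 1954 is safe because the low 8 bits of a product depend only on the low 8 bits of the operands. The sketch below checks that exhaustively in plain C++; the function names are hypothetical and the code models only the arithmetic, not the X86 lowering.

```cpp
#include <cstdint>

// Reference 8-bit multiply that never leaves 8 bits: repeated wrapping add.
std::uint8_t mul8Reference(std::uint8_t x, std::uint8_t c) {
  std::uint8_t acc = 0;
  for (unsigned i = 0; i < c; ++i)
    acc = static_cast<std::uint8_t>(acc + x);
  return acc;
}

// Promoted form: widen to 32 bits, multiply, truncate back to 8 bits.
std::uint8_t mul8ViaI32(std::uint8_t x, std::uint8_t c) {
  std::uint32_t wide =
      static_cast<std::uint32_t>(x) * static_cast<std::uint32_t>(c);
  return static_cast<std::uint8_t>(wide);
}

// Checks every (x, c) pair; true if the promoted multiply always agrees.
bool promotionIsExact() {
  for (unsigned x = 0; x < 256; ++x)
    for (unsigned c = 0; c < 256; ++c)
      if (mul8Reference(static_cast<std::uint8_t>(x),
                        static_cast<std::uint8_t>(c)) !=
          mul8ViaI32(static_cast<std::uint8_t>(x),
                     static_cast<std::uint8_t>(c)))
        return false;
  return true;
}
```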
  1955. [PowerPC] Vector load/store builtins overstate alignment of pointers A number of builtins in altivec.h load/store vectors from pointers to scalar types. Currently they just cast the pointer to a vector pointer, but expressions like that have the alignment of the target type. Of course, the input pointer did not have that alignment so this triggers UBSan (and rightly so). This resolves Differential revision:
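The alignment problem in entry 1955 is not specific to AltiVec. The generic sketch below shows one way to load a vector-sized value from a scalar pointer without claiming vector alignment. `Vec4i` and `loadUnaligned` are hypothetical stand-ins, and this is not necessarily how altivec.h was changed.

```cpp
#include <cstring>

// Hypothetical 16-byte vector type standing in for an AltiVec vector.
struct alignas(16) Vec4i {
  int lane[4];
};

// Casting 'p' to 'const Vec4i *' and dereferencing it would claim 16-byte
// alignment that an 'int *' does not guarantee. Copying the bytes makes no
// alignment claim about the source.
Vec4i loadUnaligned(const int *p) {
  Vec4i v;
  std::memcpy(&v, p, sizeof(v));
  return v;
}
```

Here `loadUnaligned(data + 1)` is well defined even when `data + 1` is only 4-byte aligned.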
  1956. Create a diagnostic group for warn_call_to_pure_virtual_member_function_from_ctor_dtor, so it can be turned into an error using Werror Summary: Patch by Arnaud Bienner Reviewers: davide, rsmith, jkorous Reviewed By: jkorous Subscribers: jkorous, sylvestre.ledru, cfe-commits Differential Revision:
  1957. [clangd] Fix missing include from r347538 - fix windows buildbots
  1958. [clang-tidy] No warning for auto new expression in smart check Summary: The fix for `auto` new expression is illegal. Reviewers: aaron.ballman Subscribers: xazax.hun, cfe-commits Differential Revision:
  1959. [clangd] Tune down scope boost for global scope Summary: This improves cross-namespace completions and has ignorable impact on other completion types. Metrics
```
==================================================================================================
                        OVERALL (excl. CROSS_NAMESPACE)
==================================================================================================
  Total measurements: 109367 (-6)
  All measurements:
    MRR: 68.11 (+0.04)  Top-1: 58.59% (+0.03%)  Top-5: 80.00% (+0.01%)  Top-100: 95.92% (-0.02%)
  Full identifiers:
    MRR: 98.35 (+0.09)  Top-1: 97.87% (+0.17%)  Top-5: 98.96% (+0.01%)  Top-100: 99.03% (+0.00%)
  Filter length 0-5:
    MRR:     23.20 (+0.05)    58.72 (+0.01)    70.16 (-0.03)    73.44 (+0.03)    76.24 (+0.00)    80.79 (+0.14)
    Top-1:   11.90% (+0.03%)  45.07% (+0.03%)  58.49% (-0.05%)  62.44% (-0.02%)  66.31% (-0.05%)  72.10% (+0.07%)
    Top-5:   35.51% (+0.08%)  76.94% (-0.01%)  85.10% (-0.13%)  87.40% (-0.02%)  88.65% (+0.01%)  91.84% (+0.17%)
    Top-100: 83.25% (-0.02%)  96.61% (-0.15%)  98.15% (-0.02%)  98.43% (-0.01%)  98.53% (+0.01%)  98.66% (+0.02%)
==================================================================================================
                        CROSS_NAMESPACE
==================================================================================================
  Total measurements: 17702 (+27)
  All measurements:
    MRR: 28.12 (+3.26)  Top-1: 21.07% (+2.70%)  Top-5: 35.11% (+4.48%)  Top-100: 74.31% (+1.02%)
  Full identifiers:
    MRR: 79.20 (+3.72)  Top-1: 71.78% (+4.86%)  Top-5: 88.39% (+2.84%)  Top-100: 98.99% (+0.00%)
  Filter length 0-5:
    MRR:     0.92 (-0.10)   5.51 (+0.57)    18.30 (+2.34)   21.62 (+3.76)   32.00 (+6.00)   41.55 (+7.61)
    Top-1:   0.56% (-0.08%)  2.44% (+0.15%)   9.82% (+1.47%)  12.59% (+2.16%)  21.17% (+4.47%)  30.05% (+6.72%)
    Top-5:   1.20% (-0.15%)  7.14% (+1.04%)  25.17% (+3.91%)  29.74% (+5.90%)  43.29% (+9.59%)  54.75% (+9.79%)
    Top-100: 5.49% (-0.01%)  56.22% (+2.59%)  86.69% (+1.08%)  89.03% (+2.04%)  93.74% (+0.78%)  96.99% (+0.59%)
```
Reviewers: sammccall Reviewed By: sammccall Subscribers: ilya-biryukov, MaskRay, jkorous, arphaman, kadircet, cfe-commits Differential Revision:
  1960. [clangd] Use testPath in the test.
  1961. [clang-tidy] PrintStackTraceOnErrorSignal
  1962. [ARM GlobalISel] Support G_CTLZ and G_CTLZ_ZERO_UNDEF We can now select CLZ via the TableGen'erated code, so support G_CTLZ and G_CTLZ_ZERO_UNDEF throughout the pipeline for types <= s32. Legalizer: If the CLZ instruction is available, use it for both G_CTLZ and G_CTLZ_ZERO_UNDEF. Otherwise, use a libcall for G_CTLZ_ZERO_UNDEF and lower G_CTLZ in terms of it. In order to achieve this we need to add support to the LegalizerHelper for the legalization of G_CTLZ_ZERO_UNDEF for s32 as a libcall (__clzsi2). We also need to allow lowering of G_CTLZ in terms of G_CTLZ_ZERO_UNDEF if that is supported as a libcall, as opposed to just if it is Legal or Custom. Due to a minor refactoring of the helper function in charge of this, we will also allow the same behaviour for G_CTTZ and G_CTPOP. This is not going to be a problem in practice since we don't yet have support for treating G_CTTZ and G_CTPOP as libcalls (not even in DAGISel). Reg bank select: Map G_CTLZ to GPR. G_CTLZ_ZERO_UNDEF should not make it to this point. Instruction select: Nothing to do.
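The relationship between the two opcodes that the legalizer relies on can be sketched in Python. This is only a model of the semantics on 32-bit values, not the LegalizerHelper code; the function names are hypothetical and chosen for illustration.

```python
BW = 32  # bit width modelled here (the commit covers types <= s32)

def ctlz_zero_undef(x: int) -> int:
    """Leading-zero count of a non-zero value; the result for 0 is undefined."""
    assert 0 < x < (1 << BW), "a zero input is undefined for the ZERO_UNDEF variant"
    return BW - x.bit_length()

def ctlz(x: int) -> int:
    """G_CTLZ expressed via the ZERO_UNDEF variant plus an explicit zero check."""
    return BW if x == 0 else ctlz_zero_undef(x)
```

The zero check is the only extra work, which is why a target that can provide G_CTLZ_ZERO_UNDEF in any form (instruction or libcall) can also provide G_CTLZ.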
  1963. Fix typo in comment. NFC
  1964. [ARM] Prevent parallel macs for unsigned values Both zext and sext are currently allowed during the search for narrow sequences, and sext operands are later added to the mac candidates. But operands of muls are also added without checking whether they're sext or zext, which means we can generate a signed smlad when we shouldn't. Differential Revision:
  1965. Revert "[TTI] Reduction costs only need to include a single extract element cost" This reverts commit r346970. It was causing PR39774, a crash in slp-vectorizer on a rather simple loop with just a bunch of 'and's in the body.
  1966. [clangd] Cleanup after landing documentSymbol. NFC - fix compile error on older gcc in Protocol.cpp, - remove redundant 'llvm::' qualifiers from Protocol.cpp, - remove unused variables in AST.cpp
  1967. [clangd] Auto-index watches global CDB for changes. Summary: Instead of receiving compilation commands, auto-index is triggered by just filenames to reindex, and gets commands from the global comp DB internally. This has advantages: - more of the work can be done asynchronously (fetching compilation commands upfront can be slow for large CDBs) - we get access to the CDB which can be used to retrieve interpolated commands for headers (useful in some cases where the original TU goes away) - fits nicely with the filename-only change observation from r347297 The interface to GlobalCompilationDatabase gets extended: when retrieving a compile command, the GCDB can optionally report the project the file belongs to. This naturally fits together with getCompileCommand: it's hard to implement one without the other. But because most callers don't care, I've ended up with an awkward optional-out-param-in-virtual method pattern - maybe there's a better one. This is the main missing integration point between ClangdServer and BackgroundIndex, after this we should be able to add an auto-index flag. Reviewers: ioeric, kadircet Subscribers: MaskRay, jkorous, arphaman, cfe-commits, ilya-biryukov Differential Revision:
  1968. [clang-tidy] Don't generate incorrect fixes for class with deleted copy constructor in smart_ptr check. Summary: The fix for aggregate initialization (`std::make_unique<Foo>(Foo {1, 2})`) needs to see Foo's copy constructor, otherwise we will have a compiler error. So we only emit the check warning. Reviewers: JonasToth, aaron.ballman Subscribers: xazax.hun, cfe-commits Differential Revision:
  1969. Revert "[PowerPC] Fix inconsistent ImmMustBeMultipleOf for same instruction" This reverts commit r347532. I forgot to add the option -mtriple powerpc64-unknown-linux-gnu, so platforms other than PowerPC report errors.
  1970. [X86] Add test cases to show bad type legalization of fptosi/fptoui v16f32->v16i8 and v8f64->v8i16 on pre-AVX512 targets. When splitting the v16f32/v8f64 result type, type legalization will try to promote the integer result type before a concat and an explicit truncate. But for the fptoui test case this is particularly bad since fptoui isn't supported on X86 until AVX512. We could use an fptosi since the result range would fit in a signed 32-bit value, but the generic type legalization doesn't do that transformation when splitting. It does do this when promoting.
  1971. [PowerPC] Fix inconsistent ImmMustBeMultipleOf for same instruction Summary: There are 4 instructions that have an inconsistent ImmMustBeMultipleOf in the function PPCInstrInfo::instrHasImmForm: LFS, LFD, STFS, and STFD. These four instructions should set ImmMustBeMultipleOf to 1 instead of 4. Reviewed By: nemanjai Differential Revision:
  1972. A "constexpr" is evaluated in a constant context. Make sure this is reflected if a __builtin_constant_p() is a part of a constexpr.
  1973. [Support/FileSystem] Add sub-second precision for atime/mtime of sys::fs::file_status on unix platforms Summary: getLastAccessedTime() and getLastModificationTime() provided times in nanoseconds but with only 1 second resolution, even when the underlying file system could provide more precise times than that. These changes add sub-second precision for unix platforms that support improved precision. Also add some comments to make sure people are aware that the resolution of times can vary across different file systems. Reviewers: labath, zturner, aaron.ballman, kristina Reviewed By: aaron.ballman, kristina Subscribers: lebedev.ri, mgorny, kristina, llvm-commits Differential Revision:
  1974. [CodeComplete] Simplify CodeCompleteConsumer.cpp, NFC Use range-based for loops; use < 0; format misaligned code.
  1975. [MetadataTest] Fix off-by-one strncpy warning reported by gcc8. (NFC)
  1976. [CodeGen] translate MS rotate builtins to LLVM funnel-shift intrinsics This was originally part of: D50924 and should resolve PR37387: ...but it was reverted because some bots using a gcc host compiler would crash for unknown reasons with this included in the patch. Trying again now to see if that's still a problem.
  1977. [x86] limit transform for select-of-fp-constants This should likely be adjusted to limit this transform further, but these diffs should be clear wins. If we have blendv/conditional move, then we should assume those are cheap ops. The loads become independent of the compare, so those can be speculated before we need to use the values in the blend/mov.
  1978. [x86] add tests for select-of-fp-constants; NFC There are many options here depending on subtarget, but we are uniformly relying on a transform that was driven by performance for a 32-bit SSE2 target in 2009. Note: The same motivation was apparently used to do this transform for *all* targets, so non-x86 may want to look at this too.
  1979. [IPSCCP] Use input operand instead of OriginalOp for ssa_copy. OriginalOp of a Predicate refers to the original IR value, before renaming. While solving in IPSCCP, we have to use the operand of the ssa_copy instead, to avoid missing updates for nested conditions on the same IR value. Fixes PR39772.
  1980. [SelectionDAG] move constant or splat functions to common location rL347502 moved the null sibling, so we should group all of these together. I'm not sure why these aren't methods of the SDValue class itself, but that's another patch if that's possible.
  1981. [llvm-mca] Add support for instructions with a variadic number of operands. By default, llvm-mca conservatively assumes that a register operand from the variadic sequence is both a register read and a register write. That is because MCInstrDesc doesn't describe extra variadic operands; we don't have enough dataflow information to tell which register operands from the variadic sequence are definitions and which are uses. However, if a variadic instruction is flagged 'mayStore' (but not 'mayLoad'), and it has no 'unmodeledSideEffects', then llvm-mca (very) optimistically assumes that any register operand in the variadic sequence is a register read only. Conversely, if a variadic instruction is marked as 'mayLoad' (but not 'mayStore'), and it has no 'unmodeledSideEffects', then llvm-mca optimistically assumes that any extra register operand is a register definition only. These assumptions work quite well for variadic load/store multiple instructions defined by the ARM backend.
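The decision rule described in this message can be written down as a small Python function. It is a sketch of the stated heuristic only; the function name and the string results are hypothetical and do not correspond to anything in the llvm-mca sources.

```python
def classify_variadic_register_operand(may_load: bool,
                                       may_store: bool,
                                       unmodeled_side_effects: bool) -> str:
    """Return how an extra variadic register operand is treated."""
    if not unmodeled_side_effects:
        if may_store and not may_load:
            return "read"        # store-multiple style: registers are only read
        if may_load and not may_store:
            return "write"       # load-multiple style: registers are only defined
    # Conservative default: treat the operand as both a use and a definition.
    return "read+write"
```

Anything that both loads and stores, or that has unmodeled side effects, falls through to the conservative default.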
  1982. Add Kang Zhang to CREDITS.TXT
  1983. A bit of AST matcher cleanup, NFC. Removed the uses of the allOf() matcher inside node matchers that are implicit allOf(). Replaced uses of allOf() with the explicit node matcher where it makes matchers more readable. Replace anyOf(hasName(), hasName(), ...) with the more efficient and readable hasAnyName().
  1984. [X86][compiler-rt] Add missing semicolon
  1985. [X86] Synchronize a macro in getAvailableFeatures in Host.cpp with the same macro in compiler-rt to fix a negative shift amount warning.
  1986. [X86] Make conversion of feature bits into a mask explicitly unsigned by using 1U instead of 1.
  1987. [X86][compiler-rt] Attempt to fix a warning about a shift amount being negative in a macro expansion.
  1988. [InstCombine] Determine demanded and known bits for funnel shifts Support funnel shifts in InstCombine demanded bits simplification. If the shift amount is constant, we can determine both the demanded bits of the operands, as well as the known bits of the result. If one of the operands has no demanded bits, it will be replaced by undef and the funnel shift will be simplified into a simple shift due to the simplifications added in D54778. Differential Revision:
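For a constant shift amount, the demanded bits of the two operands follow directly from the demanded bits of the result. The Python sketch below models this for `fshl` on 8-bit values; the helper names are hypothetical and the code is an illustration of the idea, not the InstCombine implementation.

```python
BW = 8
MASK = (1 << BW) - 1

def fshl(a: int, b: int, c: int) -> int:
    """Funnel shift left: high BW bits of (a:b) shifted left by c % BW."""
    s = c % BW
    if s == 0:
        return a
    return ((a << s) | (b >> (BW - s))) & MASK

def demanded_operand_masks(demanded: int, c: int):
    """Bits of a and b that can influence the demanded bits of fshl(a, b, c)."""
    s = c % BW
    if s == 0:
        return demanded, 0              # result is just a; b is never read
    demanded_a = demanded >> s                   # result bit i (i >= s) is a's bit i - s
    demanded_b = (demanded << (BW - s)) & MASK   # result bit i (i < s) is b's bit i + BW - s
    return demanded_a, demanded_b
```

For example, if only the top four result bits are demanded and the shift amount is 4, `demanded_operand_masks(0xF0, 4)` gives `(0x0F, 0x00)`: the second operand has no demanded bits, which is the situation in which it can be replaced by undef.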
  1989. [llvm-mca] InstrBuilder: warnings for call/ret instructions are only reported once.
  1990. [analyzer] INT50-CPP. Do not cast to an out-of-range enumeration checker This checker implements a solution to the "INT50-CPP. Do not cast to an out-of-range enumeration value" rule [1]. It lands in alpha for now, and a number of followup patches are planned in order to enable it by default. [1] Patch by: Endre Fülöp and Alexander Zaitsev! Differential Revision:
  1991. isEvaluatable() implies a constant context. Assume that we're in a constant context if we're asking if the expression can be compiled into a constant initializer. This fixes the issue where a __builtin_constant_p() in a compound literal was diagnosed as not being constant, even though it's always possible to convert the builtin into a constant.
  1992. Revert unapproved commit
  1993. [AArch64] Enable libm vectorized functions via SLEEF This changeset is modeled after Intel's submission for SVML. It enables trigonometry functions vectorization via SLEEF: * A new vectorization library enum is added to TargetLibraryInfo.h: SLEEF. * A new option is added to TargetLibraryInfoImpl - ClVectorLibrary: SLEEF. * A comprehensive test case is included in this changeset. * In a separate changeset (for clang), a new vectorization library argument is added to -fveclib: -fveclib=SLEEF. Trigonometry functions that are vectorized by sleef: acos asin atan atanh cos cosh exp exp2 exp10 lgamma log10 log2 log sin sinh sqrt tan tanh tgamma Patch by Stefan Teleman Differential Revision:
  1994. [clangd] Add 'Switch header/source' command in clangd-vscode Summary: Alt+o is used on Windows/Linux and Option+Cmd+o on macOS. Signed-off-by: Marc-Andre Laperle Reviewers: hokein, ilya-biryukov, ioeric Reviewed By: ioeric Subscribers: sammccall, ilya-biryukov, ioeric, MaskRay, jkorous, arphaman, kadircet, cfe-commits Differential Revision:
  1995. [CodeComplete] Delete unused variable in rC342449
  1996. [CodeComplete] Format SemaCodeComplete.cpp and improve code consistency There are some mis-indented places and missing spaces here and there. Just format the whole file. Also, newer code (from 2014 onwards) in this file prefers const auto *X = dyn_cast to not repeat the Decl type name. Make other occurrences consistent. Remove two anonymous namespaces that are not very necessary: 1) a typedef 2) a local function (should use static)
  1997. [ARM] Add dependency from ARMAsmParser to ARMAsmPrinter after r347494 This fixes -DBUILD_SHARED_LIBS=on
  1998. [InstCombine] Simplify funnel shift with zero/undef operand to shift The following simplifications are implemented:
        * `fshl(X, 0, C) -> shl X, C%BW`
        * `fshl(X, undef, C) -> shl X, C%BW` (assuming undef = 0)
        * `fshl(0, X, C) -> lshr X, BW-C%BW`
        * `fshl(undef, X, C) -> lshr X, BW-C%BW` (assuming undef = 0)
        * `fshr(X, 0, C) -> shl X, BW-C%BW`
        * `fshr(X, undef, C) -> shl X, BW-C%BW` (assuming undef = 0)
        * `fshr(0, X, C) -> lshr X, C%BW`
        * `fshr(undef, X, C) -> lshr X, C%BW` (assuming undef = 0)
        The simplification is only performed if the shift amount C is constant, because we can explicitly compute C%BW and BW-C%BW in this case. Differential Revision:
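The zero-operand identities listed in this message can be checked exhaustively on 8-bit values with a short Python model. The helpers below are hypothetical reference implementations of the intrinsic semantics, written for this sketch. The case C%BW == 0 is skipped for the two rules that shift by BW-C%BW, because a shift by the full bit width is not a defined shift in LLVM IR; how the patch treats that case is not stated here.

```python
BW = 8
MASK = (1 << BW) - 1

def fshl(a: int, b: int, c: int) -> int:
    s = c % BW
    return a if s == 0 else ((a << s) | (b >> (BW - s))) & MASK

def fshr(a: int, b: int, c: int) -> int:
    s = c % BW
    return b if s == 0 else ((a << (BW - s)) | (b >> s)) & MASK

def shl(x: int, n: int) -> int:
    return (x << n) & MASK

def lshr(x: int, n: int) -> int:
    return x >> n

def check_identities() -> bool:
    """Verify the four zero-operand rewrites for every i8 value and shift amount."""
    for x in range(1 << BW):
        for c in range(2 * BW):
            s = c % BW
            assert fshl(x, 0, c) == shl(x, s)
            assert fshr(0, x, c) == lshr(x, s)
            if s != 0:  # BW - s would be an out-of-range shift when s == 0
                assert fshl(0, x, c) == lshr(x, BW - s)
                assert fshr(x, 0, c) == shl(x, BW - s)
    return True
```

The undef variants follow from the same identities once undef is chosen to be 0, as the message notes.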
  1999. [TableGen] Emit more variant transitions `llvm-mca` relies on the predicates to be based on `MCSchedPredicate` in order to resolve the scheduling for variant instructions. Otherwise, it aborts the building of the instruction model early. However, the scheduling model emitter in `TableGen` gives up too soon, unless all processors use only such predicates. In order to allow more processors to be used with `llvm-mca`, this patch emits scheduling transitions if any processor uses these predicates. The transition emitted for the processors using legacy predicates is the one specified with `NoSchedPred`, which is based on `MCSchedPredicate`. Preferably, `llvm-mca` should instead assume a reasonable default when a variant transition is not based on `MCSchedPredicate` for a given processor. This issue should be revisited in the future. Differential revision:
  2000. [llvm-mca] Refactor some of the logic in InstrBuilder, and add a verifyOperands method. With this change, InstrBuilder emits an error if the MCInst sequence contains an instruction with a variadic opcode, and a non-zero number of variadic operands. Currently we don't know how to correctly analyze variadic opcodes. The problem with variadic operands is that there is no information for them in the opcode descriptor (i.e. MCInstrDesc). That means we don't know which variadic operands are defs, and which are uses. In future, we could try to conservatively assume that any extra register operand is both a register use and a register definition. This patch fixes a subtle bug in the evaluation of read/write operands for ARM VLD1 with implicit index update. Added test vld1-index-update.s
  2001. [DAG] consolidate shift simplifications ...and use them to avoid creating obviously undef values as discussed in the post-commit thread for r347478. The diffs in vector div/rem show that we were missing real optimizations by creating bogus shift nodes.
  2002. [x86] make test immune to oversized shift simplification I'm not sure if this actually preserves the original intent of this test, but if we leave it as-is, the -1 (oversized) shift should be folded to undef and allow deleting half of the output.
  2003. Revert r347490 as it breaks address sanitizer builds
  2004. [clangd] Add support for hierarchical documentSymbol Reviewers: ioeric, sammccall, simark Reviewed By: sammccall Subscribers: MaskRay, jkorous, arphaman, kadircet, cfe-commits Differential Revision:
  2005. Remove the optional dependency from libclang to clang-tidy/include-fixer clangd does a better job on both of these, so don't slow down everyone's build for a poorly working libclang feature.
  2006. [clang-tidy] Ignore matches in template instantiations (cert-dcl21-cpp) The test fails with a local modification to clang-tidy/ClangTidyDiagnosticConsumer.cpp to include fixes into the key when deduplicating the warnings.
  2007. [ARM][AsmParser] Improve debug printing of parsed asm operands In ARMOperand::print: - Print human-readable register names, instead of numbers. - Print the correct names for IT condition masks (these were in the wrong order before). - Print all parts of memory operands, not just the base register. This makes the output of llvm-mc -show-inst-operands more readable. Differential revision:
  2008. [llvm-mca][View] Improved Retire Control Unit Statistics. RetireControlUnitStatistics now reports extra information about the ROB and the avg/maximum number of entries consumed over the entire simulation. Example:
        Retire Control Unit - number of cycles where we saw N instructions retired:
        [# retired], [# cycles]
         0,           109  (17.9%)
         1,           102  (16.7%)
         2,           399  (65.4%)
        Total ROB Entries:                64
        Max Used ROB Entries:             35  ( 54.7% )
        Average Used ROB Entries per cy:  32  ( 50.0% )
        Documentation in llvm/docs/CommandGuide/llvm-mca.rst has been updated to reflect this change.
  2009. Attempt to fix buildbot after r347489
  2010. Revert r343341 - Cannot reproduce the build failure locally and the build logs have been deleted.
  2011. [ThinLTO] Assembly representation of ReadOnly attribute Differential revision:
  2012. [NFC] Add test that demonstrates buggy behavior on term folding of LoopSimplifyCFG
  2013. [ARM][NFC] codegen tests cleanup: remove dangling check prefixes I am working on making FileCheck stricter (in D54769 and D53710) so that it issues diagnostics when there's something wrong with tests. This is a cleanup for dangling prefixes in the ARM codegen tests, e.g.: --check-prefixes=A,B where A occurs in the check file, but B doesn't. This can be innocent if A does all the required checking, but can also be a bug in that test if it results in the test actually not checking anything (if A for example only checks a common label). Test CodeGen/ARM/smml.ll is such an example. Differential Revision:
  2014. Disable LoopSimplifyCFG terminator folding by default
  2015. [LoopSimplifyCFG] Don't delete LCSSA Phis When removing edges, we also update Phi inputs and may end up removing a Phi if it has only one input. We should not do it for edges that leave the current loop because these Phis are LCSSA Phis and need to be preserved. Thanks @dmgreen for finding this! Differential Revision:
  2016. [NFC] Add verification flags to tests
  2017. [LegalizeVectorTypes] Don't use SplitVecOp_TruncateHelper if we're heading towards scalarizing the type. This code takes a truncate, fp_to_int, or int_to_fp with a legal result type and an input type that needs to be split and enlarges the elements in the result type before doing the split. It then inserts a follow-up truncate or fp_round after concatenating the two halves back together. But if the input type of the original op is being split on its way to ultimately being scalarized, we're just going to end up building a vector from scalars and then truncating or rounding it in the vector register. It seems kind of silly to enlarge the result element type of the operation only to end up with scalar code and then build a vector with large elements only to make the elements smaller again in the vector register. It seems better to just try to get away with producing smaller result types in the scalarized code. The X86 test case that changes is a pretty contrived test case that exists because of a bug we used to have in our AVG matching code. I think the code is better now, but it's not realistic anyway.
  2018. [Object] Also treat STB_GNU_UNIQUE symbols as exported to other DSO All of STB_GLOBAL/STB_WEAK/STB_GNU_UNIQUE are treated as export symbols, see: glibc/elf/dl-lookup.c:do_lookup_x musl/ldso/dynlink.c OK_BINDS Though does not read binding, the currently used STV_DEFAULT or STV_PROTECTED is a good emulation of linker behavior.
  2019. A __builtin_constant_p() returns 0 with a function type.
  2020. [LegalizeVectorTypes] Have SplitVecOp_TruncateHelper fall back to SplitVecOp_UnaryOp if splitting the output type would be a legal type. SplitVecOp_TruncateHelper tries to introduce a multilevel truncate to avoid scalarization. But if splitting the result type would still be a legal type we don't need to do that. The comment block at the top of the function implied that this was already implemented. I looked back through the history and it doesn't look to have ever been checked.
  2021. [DAGCombiner] form 'not' ops ahead of shifts (PR39657) We fail to canonicalize IR this way (prefer 'not' ops to arbitrary 'xor'), but that would not matter without this patch because DAGCombiner was reversing that transform. I think we need this transform in the backend regardless of what happens in IR to catch cases where the shift-xor is formed late from GEP or other ops.
        Name: shl
        Pre: (-1 << C2) == C1
        %shl = shl i8 %x, C2
        %r = xor i8 %shl, C1
        =>
        %not = xor i8 %x, -1
        %r = shl i8 %not, C2

        Name: shr
        Pre: (-1 u>> C2) == C1
        %sh = lshr i8 %x, C2
        %r = xor i8 %sh, C1
        =>
        %not = xor i8 %x, -1
        %r = lshr i8 %not, C2
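The two rewrites quoted in this message can be checked by brute force over all i8 values. The Python below is a sketch written for that purpose; the function names are hypothetical, and C1 is derived from C2 exactly as the preconditions require.

```python
BW = 8
MASK = (1 << BW) - 1  # the i8 value -1

def shl_rule_holds(c2: int) -> bool:
    """xor (shl x, C2), C1  ==  shl (xor x, -1), C2   when C1 == (-1 << C2)."""
    c1 = (MASK << c2) & MASK
    return all((((x << c2) & MASK) ^ c1) == (((x ^ MASK) << c2) & MASK)
               for x in range(1 << BW))

def lshr_rule_holds(c2: int) -> bool:
    """xor (lshr x, C2), C1  ==  lshr (xor x, -1), C2   when C1 == (-1 u>> C2)."""
    c1 = MASK >> c2
    return all(((x >> c2) ^ c1) == ((x ^ MASK) >> c2)
               for x in range(1 << BW))
```

Both rules hold because a shift distributes over xor, so xor-ing with the shifted all-ones constant after the shift is the same as inverting every bit before it.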
  2022. [NFC] Fix typo in comment
  2023. Reland test/MC/Mips/reloc-directive-label-offset.s The test was reverted because it failed on llvm-clang-x86_64-expensive-checks-win builder, and that was because -DEXPENSIVE_CHECKS adds randomness to llvm::sort(), affecting the order of relocation table entries. Modified the test to not have two relocations at the same offset.
  2024. [libcxx] Reintroduce UNSUPPORTED annotation for strstreambuf overflow test This is a revert of r347421, except I'm using the with_system_cxx_lib lit feature instead of availability to mark the test as unsupported (because the problem is a bug in the dylib itself). In r347421, I said I wasn't able to reproduce the issue and that's why I was removing it: this was because I ran lit slightly wrong. The problem mentioned really exists.
  2025. [clangd] Cleanup: make the diags callback global in TUScheduler Reviewers: sammccall Reviewed By: sammccall Subscribers: javed.absar, ioeric, MaskRay, jkorous, arphaman, kadircet, cfe-commits Differential Revision:
  2026. [libcxx] Add XFAIL for test on OS X 10.12 to 10.14
  2027. [clangd] Cleanup error consumption code. NFC - Remove reimplementations of llvm::consumeError. - Simplify test code by using EXPECT_ERROR where it fits.
  2028. [NFC][libcxx] Print human-friendly command line when lit test fails We used to print a Python list corresponding to the command. It is more useful to print the joined string so it can be copy/pasted directly when a test fails.
  2029. [clang-tidy] Ignore template instantiations in modernize-use-using The test I'm adding passes without the change due to the deduplication logic in ClangTidyDiagnosticConsumer::take(). However this bug manifests in our internal integration with clang-tidy. I've verified the fix by locally changing LessClangTidyError to consider replacements.
  2030. [llvm-mca] LSUnit: use a SmallSet to model load/store queues. NFCI Also, try to minimize the number of queries to the memory queues to speedup the analysis. On average, this change gives a small 2% speedup. For memcpy-like kernels, the speedup is up to 5.5%.
  2031. [clangd] Cleanup: make diagnostics callbacks from TUScheduler non-racy Summary: Previously, removeDoc followed by an addDoc to TUScheduler resulted in racy diagnostic responses, i.e. the old diagnostics could be delivered to the client after the new ones by TUScheduler. To work around this, we tracked a version number in ClangdServer and discarded stale diagnostics. After this commit, the TUScheduler will stop delivering diagnostics for removed files and the workaround in ClangdServer is not required anymore. Reviewers: sammccall Reviewed By: sammccall Subscribers: javed.absar, ioeric, MaskRay, jkorous, arphaman, jfb, kadircet, cfe-commits Differential Revision:
  2032. [clangd] Cleanup: stop passing around list of supported URI schemes. Summary: Instead of passing around a list of supported URI schemes in clangd, we expose an interface to convert a path to URI using any compatible scheme that has been registered. It favors customized schemes and falls back to "file" when no other scheme works. Changes in this patch are: - URI::create(AbsPath, URISchemes) -> URI::create(AbsPath). The new API finds a compatible scheme from the registry. - Remove URISchemes option everywhere (ClangdServer, SymbolCollector, FileIndex etc). - Unit tests will use "unittest" by default. - Move "test" scheme from ClangdLSPServer to ClangdMain.cpp, and only register the test scheme when lit-test or enable-lit-scheme is set. (The new flag is added to make lit protocol.test work; I wonder if there is an alternative here.) Reviewers: sammccall Reviewed By: sammccall Subscribers: ilya-biryukov, MaskRay, jkorous, arphaman, kadircet, cfe-commits Differential Revision:
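The lookup policy described here (try registered custom schemes first, fall back to "file") can be illustrated with a toy registry in Python. Everything in this sketch is hypothetical: the registry, the function names, the `/clangd-test/` root, and the URI shapes are assumptions made for illustration and are not clangd's API.

```python
from typing import Callable, Dict, Optional

# Hypothetical registry: scheme name -> function that maps an absolute path to
# a URI body, or returns None when the scheme cannot represent that path.
_REGISTRY: Dict[str, Callable[[str], Optional[str]]] = {}

def register_scheme(name: str, to_body: Callable[[str], Optional[str]]) -> None:
    _REGISTRY[name] = to_body

def create_uri(abs_path: str) -> str:
    """Prefer any registered custom scheme; fall back to 'file' otherwise."""
    for name, to_body in _REGISTRY.items():
        body = to_body(abs_path)
        if body is not None:
            return f"{name}:{body}"
    return f"file://{abs_path}"

def _unittest_body(abs_path: str) -> Optional[str]:
    # Assumed test root; only paths under it are expressible in this scheme.
    root = "/clangd-test/"
    return "///" + abs_path[len(root):] if abs_path.startswith(root) else None

register_scheme("unittest", _unittest_body)
```

With this registry, a path under the assumed test root maps to the custom scheme, and any other absolute path falls back to a `file` URI, so callers no longer need to pass a scheme list around.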
  2033. [clangd] Cleanup: use index file instead of header in workspace symbols lit test. Summary: The full path of the input header depends on the execution environment and may result in different behavior (e.g. when different URI schemes are used). Reviewers: sammccall Subscribers: ilya-biryukov, MaskRay, jkorous, arphaman, kadircet, cfe-commits Differential Revision:
  2034. [clang-format] Do not treat asm clobber [ as ObjCExpr, refined Summary: r346756 refined clang-format to not treat the `[` in `asm (...: [] ..)` as an ObjCExpr. However that's not enough, as we might have a comma-separated list of such clobbers as in the newly added test. This updates the detection to instead look at the Line's first token being `asm` and not mark `[`-s as ObjCExprs in this case. Reviewers: djasper, benhamilton Reviewed By: djasper, benhamilton Subscribers: benhamilton, cfe-commits Differential Revision:
  2035. [llvm-mca] Use a SmallVector instead of std::vector to track register reads/writes. NFCI This avoids a heap allocation most of the time. This patch gives a small but consistent 3% speedup on a release build (up to ~5% on a debug build).
  2036. Revert rL347462 "[ASTMatchers] Add hasSideEffect() matcher." Breaks some buildbots.
  2037. [ASTMatchers] Add hasSideEffect() matcher. Summary: Exposes Expr::HasSideEffects. Reviewers: aaron.ballman Subscribers: cfe-commits Differential Revision:
  2038. [libcxx] Remove incorrect XFAIL on macos 10.12
  2039. [clangd] Fix use-after-scope in unit test The scheduler holds a reference to `Proceed`, so it has to be destroyed after the scheduler. Found by asan.
  2040. [llvm-mca] Fix an invalid memory read introduced by r346487. This patch fixes an invalid memory read introduced by r346487. Before this patch, a partial register write had to query the latency of the dependent full register write by calling a method on the full write descriptor. However, if the full write is from an already retired instruction, chances are that the EntryStage already reclaimed its memory. In some partial register write tests, valgrind was reporting an invalid memory read. This change fixes the invalid memory access problem. Writes are now responsible for tracking dependent partial register writes, and notify them when the instruction is issued. That means partial register writes no longer need to query their associated full write to check when they are ready to execute. Added test X86/BtVer2/partial-reg-update-7.s
  2041. [NFC] Assert that all blocks staying in loop are live
  2042. [NFC] Ensure deterministic order of dead exit blocks
  2043. [AArch64] Fix SelectionDAG infinite loop for v1i64 SCALAR_TO_VECTOR A consequence of r347274 is that SCALAR_TO_VECTOR can be converted into BUILD_VECTOR by SimplifyDemandedBits, but LowerBUILD_VECTOR can turn BUILD_VECTOR into SCALAR_TO_VECTOR so we get an infinite loop. Fix this by making LowerBUILD_VECTOR not do this transformation for those vectors that would get transformed back, i.e. BUILD_VECTOR of a single-element constant vector. Doing that means we get a DUP, which we then need to recognise in ISel as a copy.
  2044. [NFC] Simplify code by using standard exit blocks collection
  2045. [ASTMatchers] Re-generate ast matchers doc after rL346455.
  2046. [TI removal] Leverage the fact that TerminatorInst is gone to create a normal base class that provides all common "call" functionality. This merges two complex CRTP mixins for the common "call" logic and common operand bundle logic into a single, normal base class of `CallInst` and `InvokeInst`. Going forward, users can typically `dyn_cast<CallBase>` and use the resulting API. No more need for the `CallSite` wrapper. I'm planning to migrate current usage of the wrapper to directly use the base class and then it can be removed, but those are simpler and much more incremental steps. The big change is to introduce this abstraction into the type system. I've tried to do some basic simplifications of the APIs that I couldn't really help but touch as part of this: - I've tried to organize the attribute API and bundle API into groups to make understanding the API of `CallBase` easier. Without this, I wasn't able to navigate the API sanely for all of the ways I needed to modify it. - I've added what seem like more clear and consistent APIs for getting at the called operand. These ended up being especially useful to consolidate the *numerous* duplicated code paths trying to do this. - I've largely reworked the organization and implementation of the APIs for computing the argument operands as they needed to change to work with the new subclass approach. To minimize any cost associated with this abstraction, I've moved the operand layout in memory to store the called operand last. This makes its position relative to the end of the operand array the same, regardless of the subclass. It should make it much cheaper to reference from the `CallBase` abstraction, and this is likely one of the most frequent things to query. We do still pay one abstraction penalty here: we have to branch to determine whether there are 0 or 2 extra operands when computing the end of the argument operand sequence. However, that seems both rare and should optimize well. 
I've implemented this in a way specifically designed to allow it to optimize fairly well. If this shows up in profiles, we can add overrides of the relevant methods to the subclasses that bypass this penalty. It seems very unlikely that this will be an issue as the code was *already* dealing with an ever present abstraction of whether or not there are operand bundles, so this isn't the first branch to go into the computation. I've tried to remove as much of the obvious vestigial API surface of the old CRTP implementation as I could, but I suspect there is further cleanup that should now be possible, especially around the operand bundle APIs. I'm leaving all of that for future work in this patch as enough things are changing here as-is. One thing that made this harder for me to reason about and debug was the pervasive use of unsigned values in subtraction and other arithmetic computations. I had to debug more than one unintentional wrap. I've switched a few of these to use `int` which seems substantially simpler, but I've held back from doing this more broadly to avoid creating confusing divergence within a single class's API. I also worked to remove all of the magic numbers used to index into operands, putting them behind named constants or putting them into a single method with a comment and strictly using the method elsewhere. This was necessary to be able to re-layout the operands as discussed above. Thanks to Ben for reviewing this (somewhat large and awkward) patch! Differential Revision:
  2047. Unbreak FreeBSD build. M lib/sanitizer_common/
  2048. [clangd] Respect task cancellation in TUScheduler. Summary: - Reads are never executed if cancelled before they are ready to run. In practice, we finalize cancelled reads eagerly and out-of-order. - Cancelled reads don't prevent prior updates from being elided, as they don't actually depend on the result of the update. - Updates are downgraded from WantDiagnostics::Yes to WantDiagnostics::Auto when cancelled, which allows them to be elided when all dependent reads are cancelled and there are subsequent writes (e.g. when the queue is backed up with cancelled requests). The queue operations aren't optimal (we scan the whole queue for cancelled tasks every time the scheduler runs, and check cancellation twice in the end). However I believe these costs are still trivial in practice (compared to any AST operation) and the logic can be cleanly separated from the rest of the scheduler. Reviewers: ilya-biryukov Subscribers: javed.absar, ioeric, MaskRay, jkorous, arphaman, kadircet, cfe-commits Differential Revision:
  2049. Move the llvm lit test dependencies to clang-tools-extra. Summary: Part of revert r343473 Reviewers: mgorny Subscribers: cfe-commits Differential Revision:
  2050. Revert r343473 "Move llvm util dependencies from clang-tools-extra to add_lit_target." Summary: It causes the test tools `FileCheck`, `count`, and `not` to be built unconditionally; these dependencies should move back to clang-tools-extra. Reviewers: mgorny Subscribers: llvm-commits Differential Revision:
  2051. [ARM GlobalISel] Add test for BFC. NFCI r334871 has made it possible for TableGen'erated code to select BFC, but it has not added a test for it on the ARM side. Add it now to make sure we don't introduce regressions if we ever change anything about that rule.
  2052. The result of is.constant() is unsigned.
  2053. [SystemZTTIImpl] Give correct cost values for vector bswap intrinsics. Implement getIntrinsicInstrCost() and return costs reflecting that bswap can be done with a vperm per vector register. Review: Ulrich Weigand
  2054. [Driver] Support XRay on Fuchsia This enables support for XRay in Fuchsia Clang driver. Differential Revision:
  2055. [XRay] Support for Fuchsia This extends XRay to support Fuchsia. Differential Revision:
  2056. tsan: Update measurements in These changed as a result of r347379. Unfortunately there was a regression; filed PR39748 to track it. Differential Revision:
  2057. [llvm-size] Use empty() and range-based for loop. NFC
  2058. [llvm-mca] Add test case (NFC) Add test case that will serve as the base for D54820.
  2059. tsan: Correct the name of an executable.
  2060. [x86] use FileCheck to verify output; NFC
  2061. [llvm-mca] Add test case (NFC) Fix previous commit r347434.
  2062. Add a ubsan blacklist entry for libstdc++ 8.0.1.
  2063. [libcxx] Improve error message when an invalid directory is provided as use_system_cxx_lib
  2064. [llvm-mca] Add test case (NFC) Add test case that will serve as the base for D54777.
  2065. Removing test/MC/Mips/reloc-directive-label-offset.s temporarily This test is failing on llvm-clang-x86_64-expensive-checks-win builder. Removing it until I get it fixed.
  2066. [PM] correcting return value for new-pass-manager version of Scalarizer Obvious mistake missed during D54695 review.
  2067. [mingw] Use unmangled name after the $ in the section name GCC does it this way, and we have to be consistent. This includes stdcall and fastcall functions with suffixes. I confirmed that a fastcall function named "foo" ends up in ".text$foo", not ".text$@foo@8". Based on a patch by Andrew Yohn! Fixes PR39218. Differential Revision:
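As a rough illustration of the naming rule above, the sketch below derives a section name from a decorated symbol. The stripping rules (a leading `_` or `@`, a trailing `@<bytes>` suffix) are assumptions about 32-bit x86 decoration, and `unmangled_name` and `section_for` are hypothetical helpers, not functions from the patch.

```python
import re

def unmangled_name(symbol: str) -> str:
    """Strip assumed 32-bit x86 decoration: trailing '@<bytes>' and a leading '_' or '@'."""
    name = re.sub(r"@\d+$", "", symbol)
    if name[:1] in ("_", "@"):
        name = name[1:]
    return name

def section_for(symbol: str, prefix: str = ".text") -> str:
    """Build the per-function section name using the unmangled name after '$'."""
    return f"{prefix}${unmangled_name(symbol)}"

fastcall_section = section_for("@foo@8")   # fastcall-decorated "foo"
stdcall_section = section_for("_bar@4")    # stdcall-decorated "bar"
```

The fastcall case matches the behavior confirmed in the commit message: the function lands in `.text$foo` rather than `.text$@foo@8`.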
  2068. Revert "[Driver] Use --push/pop-state with Sanitizer link deps" This reverts commit r347413: older versions of that are used by Android don't support --push/pop-state which broke sanitizer bots.
  2069. [PowerPC][NFC] Split PPCMCCodeEmitter into header and cpp file. This is further cleanup for PPCMCCodeEmitter. The class had been contained within the cpp file alone. Now it has been split up between a header file and a cpp file which allows other classes to make use of the functions in this class if required.
  2070. [libcxx] Remove unused definition of aligned allocation macro on old OS X We don't support mac OS 10.6 and older anymore, so this macro can never be defined. This bit of code had been added in D28931 as a fix for PR31448, but it doesn't seem necessary anymore.
  2071. [Sanitizer] Adding setvbuf in supported platforms and other stream buffer functions - Enabling setvbuf interceptions for non NetBSD platforms. - setbuf, setbuffer, setlinebuf as well. Reviewers: vitalybuka, krytarowski Reviewed By: vitalybuka Differential Revision:
  2072. [OPENMP][NVPTX]Emit default locations as constant with undefined mode. For the NVPTX target default locations should be emitted as constants + additional info must be emitted in the reserved_2 field of the ident_t structure. The 1st bit controls the execution mode and the 2nd bit controls use of the lightweight runtime. The combination of the bits for Non-SPMD mode + lightweight runtime represents special undefined mode, used outside of the target regions for orphaned directives or functions. Should allow an additional optimization inside of the target regions.
  2073. [DAGCombiner] refactor select-of-FP-constants transform This transform needs to be limited. We are converting to a constant pool load very early, and we are turning loads that are independent of the select condition (and therefore speculatable) into a dependent non-speculatable load. We may also be transferring a condition code from an FP register to integer to create that dependent load.
  2074. [libcxx] Fix incorrect iterator type in vector container test The iterator types for different specializations of containers with the same element type but different allocators are not required to be convertible. This patch makes the test take the iterator type from the same container specialization as the created container. Reviewed as Thanks to Andrey Maksimov for the patch.
  2075. [PowerPC][NFC] Minor Code Cleanup for PPCMCCodeEmitter.
  2076. [libcxx] Mark strstreams tests as being supported on all OS X versions I wasn't able to reproduce the issue referred to by the comment using the libc++'s shipped with mac OS X 10.7 and 10.8, so I assume this may have been fixed in a function that is now shipped in the headers. In that case, the tests will pass no matter what dylib we're using. In the worst case, some test bots will start failing and I'll understand why I was wrong, and I can create an actual lit feature for it. Note that I could just leave this test alone, but this change is on the path towards eradicating vendor-specific availability markup from the test suite.
  2077. [LLVM] Allow modulemap installation Summary: Currently we can't install the modulemaps provided by LLVM, since they are not structured to support headers generated as part of the build (ex. `llvm/IR/Attributes.gen`). This patch restructures the module maps in order to support installation. Modules containing generated headers are defined in the new `module.extern.modulemap` file, and are referenced from the main `module.modulemap` using `extern module`. There are two versions of the `module.extern.modulemap` file; one used when building and another, `module.install.modulemap`, which is re-named during installation. Users can opt-into module map installation using `-DLLVM_INSTALL_MODULEMAPS=ON`. The default value is `OFF` due to Reviewers: rsmith, mehdi_amini, bruno, EricWF Reviewed By: EricWF Subscribers: tschuett, chapuni, mgorny, llvm-commits Differential Revision:
  2078. Update call to EvaluateAsInt() to the new syntax.
  2079. Re-Reinstate 347294 with a fix for the failures. Don't try to emit a scalar expression for a non-scalar argument to __builtin_constant_p(). Third time's a charm!
  2080. Fix missing includes in test header
  2081. [compiler-rt][UBSan] silence_unsigned_overflow: do *NOT* ignore *fatal* unsigned overflows Summary: D48660 / rL335762 added a `silence_unsigned_overflow` env flag for oss-fuzz needs, which allows silencing the reports from unsigned overflows. It makes sense: it is there because the `-fsanitize=integer` sanitizer is not enabled on oss-fuzz, so this allows still using it as an interestingness signal without getting the actual reports. However, there is a slight problem here. All types of unsigned overflows are ignored. Even if `-fno-sanitize-recover=unsigned` was used (which means the program will die after the report), there will still be no report; the program will just silently die. At the moment there are just two projects on oss-fuzz that care: * libc++ * RawSpeed (me) I suppose this could be overridden there ^, but I really don't think this is intended behavior in any case. Reviewers: kcc, Dor1s, #sanitizers, filcab, vsk, kubamracek Reviewed By: Dor1s Subscribers: dberris, mclow.lists, llvm-commits Tags: #sanitizers Differential Revision:
  2082. [InstCombine] Add tests for funnel shift with zero operand; NFC These are additional baseline tests for D54778.
  2083. [Driver] Use --push/pop-state with Sanitizer link deps Sanitizer runtime link deps handling passes --no-as-needed because of PR15823, but it never undoes it and this flag may affect other libraries that come later on the link line. To avoid this, wrap Sanitizer link deps in --push/pop-state. Differential Revision:
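The wrapping described above can be sketched as a small argument-list transformation. The helper name `wrap_link_deps` and the specific library list are assumptions for illustration; only the `--push-state`, `--no-as-needed`, and `--pop-state` linker flags come from the commit message.

```python
from typing import List

def wrap_link_deps(deps: List[str]) -> List[str]:
    """Scope --no-as-needed to the given libraries by saving and restoring linker state."""
    return ["--push-state", "--no-as-needed", *deps, "--pop-state"]

# Hypothetical link line: the sanitizer deps sit between user objects and later libraries.
sanitizer_deps = ["-lpthread", "-lrt", "-lm", "-ldl"]
link_args = ["main.o", *wrap_link_deps(sanitizer_deps), "-lfoo"]
```

Because the state is popped right after the runtime dependencies, `-lfoo` is linked with whatever `--as-needed` setting was in effect before, which is the point of the change.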
  2084. [OPENMP] Refactor code for parsing omp declare target directive and its clauses (NFC) This patch refactors the code for parsing omp declare target directive and its clauses. Patch by pjeeva01 (Jeeva P.) Differential Revision:
  2085. [DAGCombiner] reduce code duplication; NFC
  2086. [OPENMP]Fix handling of the LCVs in loop-based directives. Loop-control variables with the default data-sharing attributes should not be captured in the OpenMP region as they are private by default. Also, default attributes should be emitted for such variables in the inner OpenMP regions for the correct data sharing during codegen.
  2087. [OPENMP] remove redundant MapTypeModifierSpecified flag in ParseOpenMP.cpp (NFC) Whether the map type modifier is specified or not, the flag MapTypeModifierSpecified is always set to true. Patch by Ahsan Saghir Differential Revision:
  2088. [MergeFuncs] Generate alias instead of thunk if possible The MergeFunctions pass was originally intended to emit aliases instead of thunks where possible (unnamed_addr). However, for a long time this functionality was behind a flag hardcoded to false, bitrotted and was eventually removed in r309313. Originally the functionality was first disabled in r108417 due to lack of support for aliases in Mach-O. I believe that this is no longer the case nowadays, but I am not really familiar with this area. In the interest of being conservative, this patch reintroduces the aliasing functionality behind a default disabled -mergefunc-use-aliases flag. Differential Revision:
  2089. [x86] add tests for select-of-FP-constants; NFC
  2090. [OPENMP] Support relational-op != (not-equal) as one of the canonical forms of random access iterator In OpenMP 4.5, only 4 relational operators are supported: <, <=, >, and >=. This work is to enable support for relational operator != (not-equal) as one of the canonical forms. Patch by Anh Tuyen Tran Differential Revision:
  2091. [x86] fix predicate for avoiding vblendv It only makes sense to produce the logic ops when 1 of the constants is +0.0. Otherwise, go with vblendv to reduce code.
  2092. Mark lambda decl as invalid if a captured variable has an invalid type. This causes the compiler to crash when trying to compute a layout for the lambda closure type (see included test).
  2093. [x86] add test for FP select with constant; NFC
  2094. [libcxx] Make sure operator+ is declared with the right visibility attribute Otherwise, Clang complains about internal_linkage not being applied to the first declaration of the operator (and rightfully so).
  2095. [libcxx] Mark stray symbols as hidden to try and fix the build r347395 changed the ABI list on Linux, but two of those symbols are still being exported from the shared object: _ZSt18make_exception_ptrINSt3__112future_errorEESt13exception_ptrT_ _ZNSt3__1plIcNS_11char_traitsIcEENS_9allocatorIcEEEENS_12basic_stringIT_T0_T1_EERKS9_PKS6_ This commit makes sure those symbols are not exported, as they should be.
  2096. [mips][mc] Add basic support for R_MIPS_JALR/R_MICROMIPS_JALR R_MIPS_JALR/R_MICROMIPS_JALR can now be parsed in .s files and emitted to .o. They are still not generated with JALR. Differential revision:
  2097. [MC] Support labels as offsets in .reloc directive Currently, expressions like `.reloc 1f, R_MIPS_JALR, foo` followed by `1: nop` are not allowed, i.e. an offset in .reloc can only be an absolute value. This patch adds support for labels as offsets. If the offset is a forward-declared label, MCObjectStreamer keeps the fixup locally and adds it to the fixups vector after the label (and its offset) is defined. label+number is not supported yet. Differential revision:
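The deferred-fixup idea above can be modelled with a small class. This is a sketch of the general technique only: `Streamer` and its methods are hypothetical and do not mirror the actual MCObjectStreamer interface.

```python
from collections import defaultdict

class Streamer:
    """Toy object streamer that defers .reloc fixups on not-yet-defined labels."""

    def __init__(self):
        self.offset = 0
        self.labels = {}                    # label -> offset
        self.fixups = []                    # (offset, reloc_type, target)
        self.pending = defaultdict(list)    # label -> [(reloc_type, target)]

    def emit_bytes(self, count: int) -> None:
        self.offset += count

    def reloc(self, label: str, reloc_type: str, target: str) -> None:
        if label in self.labels:
            # Label already known: the fixup offset is available immediately.
            self.fixups.append((self.labels[label], reloc_type, target))
        else:
            # Forward reference: keep the fixup locally until the label appears.
            self.pending[label].append((reloc_type, target))

    def define_label(self, label: str) -> None:
        self.labels[label] = self.offset
        for reloc_type, target in self.pending.pop(label, []):
            self.fixups.append((self.offset, reloc_type, target))

s = Streamer()
s.emit_bytes(4)
s.reloc("1", "R_MIPS_JALR", "foo")   # like `.reloc 1f, R_MIPS_JALR, foo`
s.emit_bytes(8)
s.define_label("1")                  # like `1:`
s.emit_bytes(4)                      # like `nop`
```

The fixup is recorded only once the label's offset is known, so the order of entries in `fixups` follows label definition rather than directive order.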
  2098. [NFC][libcxx] Add revision number to ABI changelog
  2099. [libcxx] Make sure we can build with -fvisibility=hidden on Linux Summary: This commit marks a few functions as hidden and removes them from the ABI list on Linux such that libc++ can be built with -fvisibility=hidden. The functions marked as hidden by this patch were exported from the shared object only because they were implicitly instantiated function templates. It is safe to stop exporting those symbols from the shared object because nobody could actually depend on them: implicit instantiations are not taken from shared objects. The symbols removed in this commit are basically the same that had been removed in, but that patch had to be reverted because it broke the build (because the functions were not marked as hidden like this patch does). Reviewers: EricWF, mclow.lists Subscribers: christof, jkorous, dexonsmith, libcxx-commits Differential Revision:
  2100. [x86] add checks for asm to test; NFC
  2101. [TargetLowering] SimplifyDemandedBits - only reduce known bits for integer constants Avoids fuzzing crash found by Mikael Holmén.
  2102. [PM] Port Scalarizer to the new pass manager. Patch by: markus (Markus Lavin) Reviewers: chandlerc, fedor.sergeev Reviewed By: fedor.sergeev Subscribers: llvm-commits, Ka-Ka, bjope Differential Revision:
  2103. Revert 347366, its prerequisite 347364 got reverted.
  2104. Revert r347364 again, the fix was incomplete.
  2105. [nios2] Add missing Nios2CodeGen -> Nios2AsmPrinter linkage Add missing linkage from Nios2CodeGen library to Nios2AsmPrinter library. The missing dependency causes shared-lib build to fail with the following reason: lib/Target/Nios2/CMakeFiles/LLVMNios2CodeGen.dir/Nios2AsmPrinter.cpp.o: In function `(anonymous namespace)::Nios2AsmPrinter::PrintAsmMemoryOperand(llvm::MachineInstr const*, unsigned int, unsigned int, char const*, llvm::raw_ostream&)': Nios2AsmPrinter.cpp:(.text._ZN12_GLOBAL__N_115Nios2AsmPrinter21PrintAsmMemoryOperandEPKN4llvm12MachineInstrEjjPKcRNS1_11raw_ostreamE+0x2b): undefined reference to `llvm::Nios2InstPrinter::getRegisterName(unsigned int)' lib/Target/Nios2/CMakeFiles/LLVMNios2CodeGen.dir/Nios2AsmPrinter.cpp.o: In function `(anonymous namespace)::Nios2AsmPrinter::PrintAsmOperand(llvm::MachineInstr const*, unsigned int, unsigned int, char const*, llvm::raw_ostream&)': Nios2AsmPrinter.cpp:(.text._ZN12_GLOBAL__N_115Nios2AsmPrinter15PrintAsmOperandEPKN4llvm12MachineInstrEjjPKcRNS1_11raw_ostreamE+0x97): undefined reference to `llvm::Nios2InstPrinter::getRegisterName(unsigned int)' collect2: error: ld returned 1 exit status Differential Revision:
  2106. [X86][AVX] Remove BROADCAST if we only need the 0'th element We don't catch this with target shuffle simplification if the src/dst types are different.
  2107. Test commit: Delete trailing space in comment
  2108. [NFC] More complex tests for LoopSimplifyCFG
  2109. tsan: add pthread_tryjoin_np and pthread_timedjoin_np interceptors Add pthread_tryjoin_np() and pthread_timedjoin_np() interceptors on Linux, so that ThreadSanitizer can handle programs using these functions. Author: Yuri Per (yuri) Reviewed in:
  2110. Add header <atomic> which is needed to compile with some older library versions.
  2111. [NFC] Add some sophisticated tests on LoopSimplifyCFG
  2112. [X86] In getScalarMaskingNode, replace scalar_to_vector with a bitcast to v8i1 and an extract_subvector to convert i8 to v1i1. The bitcast can be nicely merged with any i8 loads that exist for argument passing in 32 mode for example.
  2113. [LVI] run transfer function for binary operator even when the RHS isn't a constant LVI was symbolically executing binary operators only when the RHS was constant, missing the case where we have a ConstantRange for the RHS, but not an actual constant. Tested using check-all and by bootstrapping. Compile time is not impacted measurably. Differential Revision:
  2114. [Driver] Link sanitizer runtime deps on Fuchsia when needed Even though these deps weren't needed, this makes Fuchsia driver better match other drivers, and it may be necessary when trying to use different C libraries on Fuchsia. Differential Revision:
  2115. [libc++] Implement P0487R1 - Fixing operator>>(basic_istream&, CharT*) Summary: Avoid buffer overflow by replacing the pointer interface with an array reference interface in C++2a. Tentatively ready on Batavia2018. Reviewers: mclow.lists, ldionne, EricWF Reviewed By: ldionne Subscribers: libcxx-commits, cfe-commits, christof Differential Revision:
  2116. [PowerPC] Do not use vectors to codegen bswap with Altivec turned off We have efficient codegen on P9 for lowering bswap that involves moving the value into a vector reg and moving it back. However, the check under which we custom lowered it did not adequately reflect the actual requirements. It required only that the subtarget be an implementation of ISA 3.0 since all compliant implementations have to provide the vector instructions. However, the kernel builds have a valid use case for -mno-altivec -mcpu=pwr9 (i.e. don't emit vector code, don't have to save vector regs for context switch). So we should require the correct features for this lowering. Fixes
  2117. [X86] Correct 256 vpmovzx/vpmovsx isel patterns to check HasAVX2 instead of HasAVX to prevent fast-isel from using them incorrectly. These are AVX2 instructions, but have been incorrectly marked in tablegen for a while. This wasn't a problem until r346784 switched the patterns to use target independent ISD opcodes. This made the patterns visible to fast isel. Fixes PR39733
  2118. [X86] Add a copy of avx512-trunc.ll with -x86-experimental-vector-widening-legalization enabled.
  2119. [clang-tidy] Add a test for proper handling of locations in scratch space. This test examines the behavior change of clang::tooling::Diagnostic in r347372.
  2120. clang::tooling::Diagnostic: Don't store offset in the scratch space. These offsets are useless (and even harmful in certain cases) in exported diagnostics. The test will be added to clang-tidy, since it's the main user of the clang::tooling::Diagnostic class.
  2121. Implement YAML serialization of notes in clang::tooling::Diagnostic.
  2122. [docs] Add C++ Performance Benchmark to test-suite proposals.
  2123. [XRay] Add a test for re-initialising FDR mode (NFC) This change adds an end-to-end test that ensures FDR mode can be re-initialised safely in the face of multiple threads being traced.
  2124. [NFC] Rename lit feature to '-fsized-deallocation' for consistency The '-faligned-allocation' flag uses a feature with the same name (with a leading dash).
  2125. Update EvaluateAsInt to the new syntax.
  2126. Reinstate 347294 with a fix for the failures. EvaluateAsInt() is sometimes called in a constant context. When that's the case, we need to specify it as so.
  2127. [NFC] Reformat availability #defines in __config Aligning everything makes what we're doing more obvious.
  2128. [NFC] Fix formatting in availability documentation
  2129. [X86] Emit a PACKUS instead of a VECTOR_SHUFFLE from LowerTRUNCATE for v16i16->v16i8. We can't guarantee that demanded bits passing through the vector shuffle won't cause the AND in front of this to be removed. This would prevent the PACKUS from being matched during shuffle lowering. Unfortunately, this adds a packuswb to one of the vector-reduce-mul.ll tests since we were removing the shuffle via SimplifyDemandedVectorElts. We appear to have similar issues with vpmovwb on the same test case on other targets.
  2130. A couple of tests were broken when clang implemented the compiler parts of P0482 (support for char8_t). Comment out those bits until we implement the corresponding bits in libc++
  2131. Fix pointer options mask. It was off by 1 bit.
  2132. Revert "[Sanitizer] intercept setvbuf on other platforms where it is supported"
  2133. [Sanitizer] Unbreak non NetBSD builds.
  2134. [DAGCombiner] look through bitcasts when trying to narrow vector binops This is another step in vector narrowing - a follow-up to D53784 (and hoping to eventually squash potential regressions seen in D51553). The x86 test diffs are wins, but the AArch64 diff is probably not. That problem already exists independent of this patch (see PR39722), but it went unnoticed in the previous patch because there were no regression tests that showed the possibility. The x86 diff in i64-mem-copy.ll is close. Given the frequency throttling concerns with using wider vector ops, an extra extract to reduce vector width is the right trade-off at this level of codegen. Differential Revision:
  2135. [Sanitizer] intercept setvbuf on other platforms where it is supported Unit tests enabled only in platform tested. Reviewers: krytarowski, vitalybuka Reviewed By: krytarowski, vitalybuka Differential Revision:
  2136. [CodeView] Add support for ref-qualified member functions. When you have a member function with a ref-qualifier, for example: struct Foo { void Func() &; void Func2() &&; }; clang-cl was not emitting this information. Doing so is a bit awkward, because it's not a property of the LF_MFUNCTION type, which is what you'd expect. Instead, it's a property of the this pointer which is actually an LF_POINTER. This record has an attributes bitmask on it, and our handling of this bitmask was all wrong. We had some parts of the bitmask defined incorrectly, but importantly for this bug, we didn't know about these extra 2 bits that represent the ref qualifier at all. Differential Revision:
  2137. [CodeView] Mark this pointers as const. This is for compatibility with MSVC, which also marks this pointers as being const-qualified. Fixes Differential Revision:
  2138. [CodeComplete] Penalize inherited ObjC properties for auto-completion Summary: Similar to auto-completion for ObjC methods, inherited properties should be penalized / direct class and category properties should be prioritized. Note that currently, the penalty for using a result from a base class (CCD_InBaseClass) is equal to the penalty for using a method as a property (CCD_MethodAsProperty). Reviewers: jkorous, sammccall, akyrtzi, arphaman, benlangmuir Reviewed By: sammccall, akyrtzi Subscribers: arphaman, cfe-commits Differential Revision:
  2139. [OpenMP] Update CHECK-DAG usage in target_parallel_codegen.cpp This patch adjusts a test not to depend on deprecated FileCheck behavior that permits overlapping matches within a block of CHECK-DAG directives. Thus, this patch also removes uses of FileCheck's -allow-deprecated-dag-overlap command-line option. There were two issues in this test: 1. There were sets of patterns for store instructions in which a pattern X could match a superset of a pattern Y. While X appeared before Y, Y's intended match appeared before X's intended match. The result was that X matched Y's intended match. Under the old overlapping behavior, Y also matched Y's intended match. Under the new non-overlapping behavior, Y had nothing left to match. This patch fixes this by gathering these sets in one place and putting the most specific patterns (Y) before the more general patterns (X). 2. The CHECK-DAG patterns involving the variables CBPADDR3 and CBPADDR4 were the same, but there was only one match in the text, so CBPADDR4 patterns had nothing to match under the new non-overlapping behavior. Moreover, a preceding related series of directives had variables (SADDR0, BPADDR0, etc.) numbered only 0 through 4, but this series had variables numbered 0 through 5. Assuming CBPADDR4's directives were not intended, this patch removes them. Reviewed By: ABataev Differential Revision:
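Issue 1 above can be reproduced with a toy matcher. This is a deliberately simplified model of CHECK-DAG, assuming each pattern takes its first match and, in non-overlapping mode, skips text already claimed by an earlier pattern; `match_dag` and the sample store lines are hypothetical and are not FileCheck's implementation.

```python
import re
from typing import List, Optional, Tuple

Span = Tuple[int, int]

def match_dag(patterns: List[str], text: str, allow_overlap: bool) -> List[Optional[Span]]:
    """Match each pattern in directive order; return its span or None."""
    taken: List[Span] = []
    results: List[Optional[Span]] = []
    for pattern in patterns:
        found = None
        for m in re.finditer(pattern, text):
            start, end = m.span()
            overlaps = any(start < e and s < end for s, e in taken)
            if allow_overlap or not overlaps:
                found = (start, end)
                break
        results.append(found)
        if found is not None:
            taken.append(found)
    return results

text = "store 1 to %y\nstore 2 to %x\n"
X = r"store \d+ to %\w+"   # general pattern, also matches Y's intended line
Y = r"store 1 to %y"       # specific pattern

old_behavior = match_dag([X, Y], text, allow_overlap=True)
new_behavior = match_dag([X, Y], text, allow_overlap=False)
reordered = match_dag([Y, X], text, allow_overlap=False)
```

In the non-overlapping run, the general pattern consumes the line meant for the specific one, leaving the specific pattern unmatched; listing the specific pattern first lets both match, which is the reordering the patch applies.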
  2140. [OpenMP] Update CHECK-DAG usage in for_codegen.cpp This patch adjusts a test not to depend on deprecated FileCheck behavior that permits overlapping matches within a block of CHECK-DAG directives. Thus, this patch also removes uses of FileCheck's -allow-deprecated-dag-overlap command-line option. Specifically, the FileCheck variables DBG_LOC_START, DBG_LOC_END, and DBG_LOC_CANCEL were all set to the same value. As a result, three TERM_DEBUG-DAG patterns, one for each variable, all matched the same text under the old overlapping behavior. Under the new non-overlapping behavior, that's not permitted. This patch's solution is to replace these variables with one variable and replace these patterns with one pattern. Reviewed By: ABataev Differential Revision:
  2141. [CodeView] RelocPtr points to little endian data. Don't use a uint32_t*, use a ulittle32_t* to make this correct on big endian systems. Patch by James Clarke Differential Revision:
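The difference between reading the relocation bytes with an explicit byte order and reading them in host order can be shown with plain integers. This is a small sketch: `read_ulittle32` is a hypothetical helper standing in for the `ulittle32_t` access, and the byte values are made up.

```python
def read_ulittle32(buf: bytes, offset: int = 0) -> int:
    """Decode 4 bytes as little-endian regardless of the host's byte order."""
    return int.from_bytes(buf[offset:offset + 4], "little")

data = bytes([0x78, 0x56, 0x34, 0x12])

value = read_ulittle32(data)
# What a native 32-bit load would produce on a big-endian host.
misread_on_big_endian = int.from_bytes(data, "big")
```

On a little-endian host both reads agree, which is why a plain `uint32_t*` appears to work until the code runs on a big-endian system.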
  2142. [X86] Emit a single shuffle for the v16i8->v4i32 step of a SIGN_EXTEND_VECTOR_INREG lowering on pre-sse4.1 targets. Previously we emitted two separate shuffles, one for unpcklbw and one for unpcklwd. Instead emit a single shuffle equivalent to both of the original shuffles. Shuffle lowering seems able to handle it. This avoids a bitcast between the two shuffles which seems helpful to DAG combine. Remove the custom type legalization for v8i8->v8i32. I had put that in to avoid some almost duplicate punpcklbw instructions I was seeing, but this lowering change seems to fix that. It also fixes some duplicate shuffles seen in vector-sext.ll
  2143. [libcxx] Fix threads detection on GNU/Hurd GNU/Hurd provides standard Posix threads Reviewed as Thanks to Samuel Thibault for the patch.
  2144. [unittests] Fix ExpandTilde test to match handling home dirs with trailing slash The `expandTildeExpr` routine just replaces a tilde by a home dir path. If the home dir has a trailing slash, the result of substitution will contain double slashes. For example, `HOME=/foo/ ~/bar` gives `/foo//bar`. That corresponds to (at least) Bash behaviour because the following command `$HOME=/foo/ echo ~/bar` prints `/foo//bar`. The `ExpandTilde` test constructs the expected result of the `fs::expand_tilde` call by calling `path::append`, so the expected path has a single slash. This patch fixes that and allows the unittest to pass on hosts where `HOME` is `/`. Differential Revision:
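The plain-substitution behavior described above is easy to model. This sketch assumes only what the entry states (the tilde is replaced by the home directory verbatim); `expand_tilde` here is a hypothetical Python stand-in, not LLVM's `fs::expand_tilde`.

```python
def expand_tilde(path: str, home: str) -> str:
    """Replace a leading '~' with the home directory, without normalizing slashes."""
    if path == "~" or path.startswith("~/"):
        return home + path[1:]
    return path

with_trailing_slash = expand_tilde("~/bar", "/foo/")
without_trailing_slash = expand_tilde("~/bar", "/foo")
root_home = expand_tilde("~/bar", "/")
```

A test that builds its expected value with a slash-normalizing join would disagree with the first and third results, which is the mismatch the patch addresses.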
  2145. Silence C4709 in MSVC because it is buggy. The diagnostic will trigger on code that does not have any comma operator, but instead default-constructs an object with an explicitly defaulted constructor as the array index argument.
  2146. Note that P0899R1 requires no work.
  2147. Mark P0771 as complete; we already did this - I just added tests to be sure
  2148. [x86] add tests for 8-bit multiply with constant; NFC This is based on the existing file for 16-bit. We also already have 32-bit and 64-bit variants.
  2149. [WebAssembly] WebAssemblyLowerEmscriptenEHSjLj: use getter/setter for accessing tempRet0 Rather than assuming that `tempRet0` exists in linear memory, only assume the getter/setter functions exist. This avoids conflicting with binaryen, which declares a wasm global for this purpose and defines its own getter and setter for that. The other advantage of doing things this way is that it leaves it up to the linker/finalizer to decide how to actually store this temporary. As it happens, binaryen uses a wasm global, which is more appropriate since it is thread safe. This also allows us to change the way this is stored in the future (memory, TLS memory, wasm global) without modifying LLVM. This is part of a 4 part change: LLVM: fastcomp: emscripten: binaryen: Differential Revision:
  2150. [clang][Parse] Diagnose useless null statements / empty init-statements Summary: clang has `-Wextra-semi` (D43162), which is not dictated by the currently selected standard. While that is great, there is at least one more source of needless semis - 'null statements'. Sometimes, they are needed: ``` for(int x = 0; continueToDoWork(x); x++) ; // Ugly code, but the semi is needed here. ``` But sometimes they are just there for no reason: ``` switch(X) { case 0: return -2345; case 5: return 0; default: return 42; }; // <- oops ;;;;;;;;;;; <- OOOOPS, still not diagnosed. Clearly this is junk. ``` Additionally: ``` if(; // <- empty init-statement true) ; switch (; // empty init-statement x) { ... } for (; // <- empty init-statement int y : S()) ; } ``` As usual, things may or may not go sideways in the presence of macros. While evaluating this diag on my codebase of interest, it was unsurprisingly discovered that Google Test macros are *very* prone to this. And it seems many issues are deep within the GTest itself, not in the snippets passed from the codebase that uses GTest. So after some thought, I decided not to issue a diagnostic if the semi is within *any* macro, be it either from the normal header, or system header. Fixes PR39111 Reviewers: rsmith, aaron.ballman, efriedma Reviewed By: aaron.ballman Subscribers: cfe-commits Differential Revision:
  2151. [cmake] Fix detecting terminfo library Copy the fix for determining the correct terminfo library from LLVM -- use distinct variables for check_library_exists() calls. Otherwise, the first check (for -ltinfo) populates the variable and no other checks are performed. Effectively, systems with other libraries than the first one listed are presumed not to have terminfo routines at all. Also sync the check order to include the NetBSD fix from r347156. This partially fixes undefined symbols when linking XRay tests. It's probably not the best solution to the problem there but as long as the terminfo check stays in config-ix, I think it's worth fixing. Differential Revision:
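The caching pitfall above can be modelled outside CMake. This is a sketch under stated assumptions: `check_library_exists` below only imitates the skip-if-already-set behavior the entry describes, and the variable names, candidate list, and `available` set are hypothetical.

```python
from typing import Dict, List, Optional, Set

def check_library_exists(lib: str, available: Set[str],
                         cache: Dict[str, bool], var: str) -> bool:
    """Imitate a cached check: if `var` is already set, the probe is skipped."""
    if var not in cache:
        cache[var] = lib in available
    return cache[var]

def find_terminfo(candidates: List[str], available: Set[str],
                  distinct_vars: bool) -> Optional[str]:
    cache: Dict[str, bool] = {}
    for lib in candidates:
        # Hypothetical variable names; the fix is to use one per candidate.
        var = f"HAVE_TERMINFO_{lib.upper()}" if distinct_vars else "HAVE_TERMINFO"
        if check_library_exists(lib, available, cache, var):
            return lib
    return None

candidates = ["tinfo", "curses", "ncurses"]
available = {"ncurses"}   # a system without libtinfo

shared_var_result = find_terminfo(candidates, available, distinct_vars=False)
distinct_var_result = find_terminfo(candidates, available, distinct_vars=True)
```

With a shared variable, the failed `tinfo` probe is cached and every later candidate inherits that failure; with distinct variables, each candidate is actually probed.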
  2152. [unittest] Skip W+X MappedMemoryTests when MPROTECT is enabled Skip all MappedMemoryTest variants that rely on memory pages being mapped for MF_WRITE|MF_EXEC when MPROTECT is enabled on NetBSD. W^X protection causes all those mmap() calls to fail, causing the tests to fail. Differential Revision:
  2153. [tsan] Add __cxa_guard_acquire hooks to support cooperative scheduling Reviewers: dvyukov Subscribers: krytarowski, kubamracek, llvm-commits Differential Revision:
  2154. [X86] Remove -verify-machineinstrs=0 now that PR38391 is fixed.
  2155. [Docs] Documentation for the saturation addition and subtraction intrinsics Differential Revision:
  2156. [InstCombine] add tests for funnel shifts; NFC These are included in D54666, so adding them first with baseline results. Patch by: @nikic (Nikita Popov)
  2157. [InstSimplify] fold funnel shifts with undef operands Splitting these off from the D54666. Patch by: nikic (Nikita Popov)
  2158. [InstSimplify] add tests for funnel shift with undef operands; NFC These are part of D54666, so adding them here before the patch to show the baseline (currently unoptimized) results. Patch by: @nikic (Nikita Popov)
  2159. [InstructionSimplify] Add support for saturating add/sub Add support for saturating add/sub in InstructionSimplify. In particular, the following simplifications are supported: sat(X + 0) -> X sat(X + undef) -> -1 sat(X uadd MAX) -> MAX (and commutative variants) sat(X - 0) -> X sat(X - X) -> 0 sat(X - undef) -> 0 sat(undef - X) -> 0 sat(0 usub X) -> 0 sat(X usub MAX) -> 0 Patch by: @nikic (Nikita Popov) Differential Revision:
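The folds listed above can be written down as a small table-driven simplifier. This is a sketch of the listed rules only, not InstructionSimplify's code: `simplify_sat`, the `UNDEF` sentinel, string operands standing for arbitrary values, and the default 8-bit width are all assumptions made for illustration.

```python
UNDEF = object()   # hypothetical sentinel for an undef operand

def simplify_sat(op, lhs, rhs, bits=8):
    """Fold a saturating add/sub if one of the listed rules applies.

    op is "uadd", "sadd", "usub" or "ssub". Operands are int constants,
    UNDEF, or strings naming unknown values. Returns the folded operand
    or constant, or None when no rule applies.
    """
    umax = (1 << bits) - 1
    if op in ("uadd", "sadd"):
        if lhs is UNDEF or rhs is UNDEF:
            return -1                       # sat(X + undef) -> -1 (all ones)
        if rhs == 0:
            return lhs                      # sat(X + 0) -> X
        if lhs == 0:
            return rhs                      # commuted form
        if op == "uadd" and umax in (lhs, rhs):
            return umax                     # sat(X uadd MAX) -> MAX
        return None
    if lhs is UNDEF or rhs is UNDEF:
        return 0                            # sat(X - undef), sat(undef - X) -> 0
    if rhs == 0:
        return lhs                          # sat(X - 0) -> X
    if lhs == rhs:
        return 0                            # sat(X - X) -> 0
    if op == "usub" and (lhs == 0 or rhs == umax):
        return 0                            # sat(0 usub X), sat(X usub MAX) -> 0
    return None
```

The signed and unsigned variants share most rules; only the `MAX` and `0 usub X` folds depend on unsigned semantics, which is why they are guarded by the opcode.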
  2160. Add benchmarks for sorting and heap functions. Summary: Benchmarks for std::sort, std::stable_sort, std::make_heap, std::sort_heap, std::pop_heap and std::push_heap. The benchmarks are run with integers and strings, and with different sorted input. Reviewers: EricWF Subscribers: christof, mgrang, ldionne, libcxx-commits Differential Revision:
  2161. [ConstantFolding] Add support for saturating add/sub Support saturating add/sub in constant folding, based on the APInt methods introduced in D54332. Patch by: @nikic (Nikita Popov) Differential Revision:
  2162. [AMDGPU] Regenerate weird stores tests. Makes an upcoming SimplifyDemandedBits optimization much easier to understand.
  2163. [LoopSink] Add preheader to alias set This patch fixes PR39695. The original LoopSink only considers memory aliases in the loop body. But PR39695 shows that instructions following the sink candidate in the preheader should also be checked. This is a conservative patch: it simply adds the whole preheader block to the alias set. It may lose some optimization opportunities, but I think that is very rare because: (1) in the most common case, a store/load to the same address, the load should already be optimized away; (2) the preheader is usually not very large. Differential Revision:
  2164. [APInt] Add methods for saturated add and sub This adds the sadd_sat, uadd_sat, ssub_sat, usub_sat methods for performing saturating additions and subtractions to APInt. Split out from D54237. Patch by: nikic (Nikita Popov) Differential Revision:
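For the signed variants, saturation clamps at both ends of the range. Below is a minimal sketch on an 8-bit model, assuming the usual meaning of signed saturating add/sub; `clamp_to_i8`, `sadd_sat8` and `ssub_sat8` are hypothetical helpers, not the APInt methods themselves.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: clamp a wide intermediate result into the int8_t range.
inline std::int8_t clamp_to_i8(int wide) {
  constexpr int lo = std::numeric_limits<std::int8_t>::min();
  constexpr int hi = std::numeric_limits<std::int8_t>::max();
  if (wide < lo) return static_cast<std::int8_t>(lo);
  if (wide > hi) return static_cast<std::int8_t>(hi);
  return static_cast<std::int8_t>(wide);
}

// Hypothetical 8-bit models of signed saturating add/sub. The arithmetic is
// done in int, so the intermediate value cannot overflow.
inline std::int8_t sadd_sat8(std::int8_t a, std::int8_t b) {
  return clamp_to_i8(static_cast<int>(a) + static_cast<int>(b));
}

inline std::int8_t ssub_sat8(std::int8_t a, std::int8_t b) {
  return clamp_to_i8(static_cast<int>(a) - static_cast<int>(b));
}
```

Computing in a wider type and clamping is the simplest formulation; an arbitrary-width implementation cannot rely on a wider type and would need to detect overflow instead.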
  2165. [NFC] Remove MS line endings in diagnostics file. Change-Id: I74704acf052e2e8fe707f18230bc5655c2bf2a91
  2166. [AST] Store the expressions in ParenListExpr in a trailing array Use the newly available space in the bit-fields of Stmt and store the expressions in a trailing array. This saves 2 pointers per ParenListExpr. Differential Revision: Reviewed By: rjmccall
  2167. [AST][NFC] Factor out some repeated code in ArraySubscriptExpr. Factor out the test for whether the LHS is the base of the array subscript expression into a private method lhsIsBase. NFC.
  2168. [PatternMatch] Handle undef vectors consistently This patch fixes the issue noticed in D54532. The problem is that cst_pred_ty-based matchers like m_Zero() currently do not match scalar undefs (as expected), but *do* match vector undefs. This may lead to optimization inconsistencies in rare cases. There is only one existing test for which output changes, reverting the change from D53205. The reason here is that vector fsub undef, %x is no longer matched as an m_FNeg(). While I think that the new output is technically worse than the previous one, it is consistent with scalar, and I don't think it's really important either way (generally that undef should have been folded away prior to reassociation.) I've also added another test case for this issue based on InstructionSimplify. It took some effort to find that one, as in most cases undef folds are either checked first -- and in the cases where they aren't it usually happens to not make a difference in the end. This is the only case I was able to come up with. Prior to this patch the test case simplified to undef in the scalar case, but zeroinitializer in the vector case. Patch by: @nikic (Nikita Popov) Differential Revision:
  2169. [AST][NFC] Pack ArraySubscriptExpr Use the newly available space in the bit-fields of Stmt. This saves one pointer per ArraySubscriptExpr.
  2170. [AArch64, x86] add tests for shift-not (PR39657); NFC
  2171. [clang-tidy] Don't generate incorrect fixes for class constructed from list-initialized arguments Summary: Currently the smart_ptr check (modernize-make-unique) generates the fixes that cannot compile for cases like below -- because brace list can not be deduced in `make_unique`. ``` struct Bar { int a, b; }; struct Foo { Foo(Bar); }; auto foo = std::unique_ptr<Foo>(new Foo({1, 2})); ``` Reviewers: aaron.ballman Subscribers: xazax.hun, cfe-commits Differential Revision:
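The reason the naive rewrite fails is that `make_unique` forwards its arguments through deduced template parameters, and a braced list has no type to deduce. The sketch below reuses `Bar` and `Foo` from the entry (with a constructor body and a `sum` member added as assumptions so the result can be observed) and shows one hand-written alternative; it is not what the check emits.

```cpp
#include <cassert>
#include <memory>

struct Bar { int a, b; };

struct Foo {
  // Constructor body and `sum` are illustrative additions.
  Foo(Bar bar) : sum(bar.a + bar.b) {}
  int sum;
};

// Compiles: the braced list directly initializes the Bar parameter.
inline std::unique_ptr<Foo> make_with_new() {
  return std::unique_ptr<Foo>(new Foo({1, 2}));
}

// Would NOT compile: nothing to deduce the forwarded argument type from.
//   auto p = std::make_unique<Foo>({1, 2});

// Compiles: naming the type gives make_unique a deducible argument.
inline std::unique_ptr<Foo> make_with_make_unique() {
  return std::make_unique<Foo>(Bar{1, 2});
}
```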
  2172. Revert 347294, it turned many bots on red.
  2173. [DAGCombine] Add calls to SimplifyDemandedVectorElts from visitINSERT_SUBVECTOR (PR37989) This uncovered an off-by-one typo in SimplifyDemandedVectorElts's INSERT_SUBVECTOR handling as its bounds check was bailing on safe indices.
  2174. Update the documentation for attribute feature tests. This clarifies that __has_cpp_attribute is no longer always an extension since it's now available in C++2a. Also, both __has_cpp_attribute and __has_c_attribute can accept attribute scope tokens with alternative spelling (clang vs _Clang and gnu vs __gnu__).
  2175. [PowerPC] Add Itineraries for STWU/STWUX etc When doing some instruction scheduling work, we noticed some missing itineraries. Before we switch to the machine scheduler, those missing itineraries might not affect actual scheduling, because we still get the same latency from the default values. With the machine scheduler, however, itineraries do affect scheduling. E.g., NumMicroOps defaults to 0 if there are no itineraries for a specific instruction class, while most instruction classes with itineraries have NumMicroOps default to 1. This affects the count of RetiredMOps and the Pending/Available queues, which in turn causes different or suboptimal scheduling. This patch is for STWU/STWUX (IIC_LdStStoreUpd) for P8. Since there are already multiple IICs for store update, this patch also merges IIC_LdStSTDU/IIC_LdStStoreUpd into IIC_LdStSTU and IIC_LdStSTDUX into IIC_LdStSTUX, and adds a new testcase in to show the difference. Differential Revision:
  2176. [PowerPC][NFC]Add testcase for STWU scheduling check This patch adds a STWU testcase for scheduling check. Currently P7/P8, which use itineraries, are missing IIC_LdStStoreUpd. We use the CHECK-ITIN prefix to check P7/P8, and the default prefix for P9 (and future). We will fix the missing itineraries of IIC_LdStStoreUpd in a following patch, and update this testcase to show only the scheduling difference there. Differential Revision:
  2177. [llvm-exegesis][NFC] Some code style cleanup Apply review comments of to other target as well, specifically: 1. make anonymous namespaces as small as possible, avoid using static inside anonymous namespaces 2. Add missing header to some files 3. GetLoadImmediateOpcodem-> getLoadImmediateOpcode 4. Fix typo Differential Revision:
  2178. Fix MSVC 'truncation of constant value' warning. NFCI.
  2179. [clang-format] JS: don't treat is: as a type matcher Summary: Clang-format is treating all occurrences of `is` in js as type matchers. In some cases this is wrong, as it might be a dict key. Reviewers: mprobst Reviewed By: mprobst Subscribers: cfe-commits Differential Revision:
  2180. [ASTImporter] Set redecl chain of functions before any other import Summary: FunctionDecl import starts with a lookup and then we create a new Decl. Then in case of CXXConstructorDecl we further import other Decls (base classes, members through CXXConstructorDecl::inits()) before connecting the redecl chain. During those in-between imports structural eq fails because the canonical decl is different. This commit fixes this. Synthesizing a test seemed extremely hard, however, Xerces analysis reproduces the problem. Reviewers: a_sidorin, a.sidorin Subscribers: rnkovacs, dkrupp, Szelethus, cfe-commits Differential Revision:
  2181. Allow force updating the NumCreatedFIDsForFileID. Our internal clients implement parsing cache based on FileID. In order for the Preprocessor to reenter the cached FileID it needs to reset its NumCreatedFIDsForFileID. Differential Revision:
  2182. [X86][SSE] Add computeKnownBits/ComputeNumSignBits support for PACKSS/PACKUS instructions. Pull out getPackDemandedElts demanded elts remapping helper from computeKnownBitsForTargetNode and use in computeKnownBits/ComputeNumSignBits.
  2183. [X86][SSE] XFormVExtractWithShuffleIntoLoad - getVectorShuffle won't accept SM_SentinelZero Noticed while working on improving demanded elts target shuffle combining
  2184. [TargetLowering] Improve SimplifyDemandedVectorElts/SimplifyDemandedBits support For bitcast nodes from larger element types, add the ability for SimplifyDemandedVectorElts to call SimplifyDemandedBits by merging the elts mask to a bits mask. I've raised to deal with the few places where SimplifyDemandedBits's lack of vector handling is a problem. Differential Revision:
  2185. [X86][SSE] Lower immediately to PACKUS instead of VECTOR_SHUFFLE. As discussed on rL347240, this avoids some regressions on D54679 and also helps some combines to kick in a bit earlier.
  2186. [X86][SSE] Add SimplifyDemandedVectorElts support for PACKSS/PACKUS instructions. As discussed on rL347240.
  2187. [clangd] Replay preamble #includes to clang-tidy checks. Summary: This is needed to correctly handle checks that use IncludeInserter, which is very common. I couldn't find a totally safe example of a check to enable for testing, I picked modernize-deprecated-headers which some will probably hate. We should get configuration working... This depends on D54691 which ensures our calls to getFile(open=false) don't break subsequent accesses via the FileManager. Reviewers: ilya-biryukov Subscribers: ioeric, MaskRay, jkorous, arphaman, kadircet, cfe-commits Differential Revision:
  2188. [clangd] Allow observation of changes to global CDBs. Summary: Currently, changes *within* CDBs are not tracked (CDB has no facility to do so). However, discovery of new CDBs are tracked (all files are marked as modified). Also, files whose compilation commands are explicitly set are marked modified. The intent is to use this for auto-index. Newly discovered files will be indexed with low priority. Reviewers: ilya-biryukov Subscribers: ioeric, MaskRay, jkorous, arphaman, kadircet, cfe-commits Differential Revision:
  2189. [X86] Preserve undef information when creating a punpckl/hbw from a v16i8 where all the even or odd elements are undef. Previously if V2 was unused we ended up using V1 for both inputs as part of the code that follows the new code. By using lowerVectorShuffleWithUNPCK we keep the undef nature of V2 in the output. As near as I can tell this makes v16i8 behavior consistent with every other VT now. This does mean that we give the register allocator freedom to fill in random registers now and create false dependencies. But like I said we're already doing that for other types.
  2190. [X86] Add custom type legalization for v8i8->v8i32 sign extend pre-SSE4.1 This helps with a future patch and makes us less reliant on DAG combine merging shuffles.
  2191. Use is.constant intrinsic for __builtin_constant_p Summary: A __builtin_constant_p may end up with a constant after inlining. Use the is.constant intrinsic if it's a variable that's in a context where it may resolve to a constant, e.g., an argument to a function after inlining. Reviewers: rsmith, shafik Subscribers: jfb, kristina, cfe-commits, nickdesaulniers, jyknight Differential Revision:
  2192. [libclang] Unify getCursorDecl and getCursorParentDecl They do the same thing, thus the latter (which has only 2 call sites) can be deleted.
  2193. [X86] Replace more calls to getZeroVector with regular getConstant. getZeroVector produces a specifically canonicalized zero vector, but we can just let DAG legalization take care of it. The test changes are because MULH lowering happens later than it should and this change gave us the opportunity to constant fold away a multiply during a DAG combine before the build_vector got legalized with a bitcast.
  2194. Recommit "[LoopSimplifyCFG] Teach LoopSimplifyCFG to constant-fold branches and switches" The initial version of patch lacked Phi nodes updates in destinations of removed edges. This version contains this update and tests on this situation. Differential Revision:
  2195. [PowerPC] Don't combine to bswap store on 1-byte truncating store Turns out that there was no check for a store that truncates down to a single byte when combining a (store (bswap...)) into a byte-swapping store. This patch just adds that check. Fixes
  2196. [SelectionDAG] Compute known bits and num sign bits for live out vector registers. Use it to add AssertZExt/AssertSExt in the live in basic blocks Summary: We already support this for scalars, but it was explicitly disabled for vectors. In the updated test cases this allows us to see the upper bits are zero to use fewer multiply instructions to emulate a 64 bit multiply. This should help with this ispc issue that a coworker pointed me to Reviewers: spatel, efriedma, RKSimon, arsenm Reviewed By: spatel Subscribers: wdng, llvm-commits Differential Revision:
  2197. [XRay] Add a test for allocator exhaustion Use a more representative test of allocating small chunks for oddly-sized (small) objects from an allocator that has a page's worth of memory.
  2198. Ensure FileManagerTest expects "\\" as path separator on Windows platforms
  2199. Driver: SCS is compatible with every other sanitizer. Because SCS relies on system-provided runtime support, we can use it together with any other sanitizer simply by linking the runtime for the other sanitizer. Differential Revision:
  2200. [ExecutionEngine][Interpreter] Fix out-of-bounds array access. If args is empty then accessing element 0 is illegal. Patch by Eugene Sharygin. Thanks Eugene!
  2201. [XRay] Move buffer extents back to the heap Summary: This change addresses an issue which shows up with the synchronised race between threads writing into a buffer, and another thread reading the buffer. In a lot of cases, we cannot guarantee that threads will always see the signal to finalise their buffers in time despite the grace periods and state machine maintained through atomic variables. This change addresses it by ensuring that the instance being updated to indicate how much of the buffer is "used" by the writing thread is the same instance being read by the thread processing the buffer to be written out to disk or handled through the iterators. To do this, we ensure that all the "extents" instances live in their own backing store, in a different contiguous page from the buffer-specific backing store. We also take precautions to ensure that the atomic variables are cache-line-sized to prevent false-sharing from unnecessarily causing cache contention on unrelated writes/reads. It's feasible that we may in the future be able to move the storage of the extents objects into the single backing store, slightly changing the way to compute the size(s) of the buffers, but in the meantime we'll settle for the isolation afforded by having a different backing store for the extents instances. Reviewers: mboerger Subscribers: jfb, llvm-commits Differential Revision:
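The cache-line sizing mentioned in this entry can be illustrated in isolation. This is a minimal sketch, assuming a 64-byte cache line; `BufferExtents`, `record_write` and `bytes_used` are hypothetical names and do not reflect the actual XRay data structures.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: 64-byte cache lines, common on x86-64 but not universal.
constexpr std::size_t kAssumedCacheLine = 64;

// Hypothetical per-buffer extents record. alignas pads each instance to a
// full cache line, so adjacent records never share a line.
struct alignas(kAssumedCacheLine) BufferExtents {
  std::atomic<std::uint64_t> used{0};
};

static_assert(sizeof(BufferExtents) == kAssumedCacheLine,
              "one record per cache line");
static_assert(alignof(BufferExtents) == kAssumedCacheLine,
              "records start on a cache-line boundary");

// Writer side: publish how many more bytes of the buffer are in use.
inline void record_write(BufferExtents &extents, std::uint64_t bytes) {
  extents.used.fetch_add(bytes, std::memory_order_acq_rel);
}

// Reader side: observe the same instance the writer updates.
inline std::uint64_t bytes_used(const BufferExtents &extents) {
  return extents.used.load(std::memory_order_acquire);
}
```

With this layout, a write to one record's counter does not invalidate the cache line holding a neighbouring record, which is the false-sharing concern the entry describes.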
  2202. [compiler-rt] Use zx_futex_wait_deprecated for Fuchsia sanitizer runtime This change is part of the soft-transition to the new synchronization primitives which implement priority inheritance. Differential Revision:
  2203. [DAGCombiner] reduce code duplication in visitXOR; NFC
  2204. [WebAssembly] Remove unused function return types (NFC) Reviewers: sbc100 Subscribers: dschuff, jgravelle-google, sunfish, llvm-commits Differential Revision:
  2205. [CodeView] Don't print PointerAttributes when dumping. PointerAttributes is a bitwise-or of several other fields, each of which is already printed on its own line with a better explanation. So this doesn't really help much.
  2206. Implement computeKnownBits for scalar_to_vector Differential Revision:
  2207. It's its
  2208. Add interceptor for the setvbuf(3) from NetBSD Summary: setvbuf(3) is a routine to setup stream buffering. Enable the interceptor for NetBSD. Add dedicated tests for setvbuf(3) and functions on top of this interface: setbuf, setbuffer, setlinebuf. Based on original work by Yang Zheng. Reviewers: joerg, vitalybuka Reviewed By: vitalybuka Subscribers: devnexen, tomsun.0.7, kubamracek, llvm-commits, mgorny, #sanitizers Tags: #sanitizers Differential Revision:
  2209. [Transforms] Prefer static and avoid namespaces, NFC Put 'static' on three functions in an anonymous namespace as per our coding style. Remove the 'namespace llvm {}' around the .cpp file and explicitly declare the free function 'llvm::optimizeGlobalCtorsList' in 'llvm::'. I prefer this style for free functions because the compiler will error out if the .h and .cpp files don't agree on the function name or prototype.
  2210. [X86] Rename combineVSZext->combineExtendVectorInreg. NFC Now that we no longer have target specific vector extend nodes let's make the function name match the nodes we do use.
  2211. [NFC][libcxx] Fix incorrect comments
  2212. [X86] Add test case to show missed opportunity to use a single pmuludq to implement a multiply when a zext lives in another basic block. This can occur when one of the inputs to the multiply is loop invariant. Though my test cases just use two basic blocks with an unconditional jump which we won't merge until after isel in the codegen pipeline. For scalars, I believe SelectionDAGBuilder can add an AssertZExt to pass knowledge across basic blocks but it's explicitly disabled for vectors.
  2213. AMDGPU: Fix V_FMA_F16 selection on GFX9 GFX9 should select opsel version. Differential Revision:
  2214. [libcxx] Fix XFAIL for GCC 4.9 The XFAIL started passing since we're only testing for trivial-copyability of reference_wrapper in C++14 and above. This commit constrains the XFAIL to gcc-4.9 with C++14 (it would also fail on C++17 and above, but those standards are not available with GCC 4.9).
  2215. [libcxx] Update test of trivial copyability of reference_wrapper N4151 is not an extension anymore, it was standardized in C++14.
  2216. [Coverage] Fix PR39258: support coverage regions that start deeper than they end popRegions used to assume that the start location of a region can't be nested deeper than the end location, which is not always true. Patch by Orivej Desh! Differential Revision:
  2217. [Sema] Fix PR38987: keep end location of a direct initializer list If PerformConstructorInitialization of a direct initializer list constructor is called while instantiating a template, it has brace locations in its BraceLoc arguments but not in the Kind argument. This reverts the hunk Patch by Orivej Desh! Differential Revision:
  2218. Revert "[LoopSimplifyCFG] Teach LoopSimplifyCFG to constant-fold branches and switches" This reverts commits r347183 & r347184. Crashes while building libxml.
  2219. [AMDGPU] Restored selection of scalar_to_vector (v2x16) This works if DAG combiner is enabled, but without combining we cannot select scalar_to_vector of <2 x half> and <2 x i16>. Differential Revision:
  2220. [clang][CodeGen] Implicit Conversion Sanitizer: discover the world of CompoundAssign operators Summary: As reported by @regehr (thanks!) on twitter (, we (me) have completely forgotten about the binary assignment operator. In the AST, it isn't represented as separate `ImplicitCastExpr`'s, but as a single `CompoundAssignOperator`, which does all the casts internally. Which means, out of these two, only the first one is diagnosed: ``` auto foo() { unsigned char c = 255; c = c + 1; return c; } auto bar() { unsigned char c = 255; c += 1; return c; } ``` This patch does handle the `CompoundAssignOperator`: ``` int main() { unsigned char c = 255; c += 1; return c; } ``` ``` $ ./bin/clang -g -fsanitize=integer /tmp/test.c && ./a.out /tmp/test.c:3:5: runtime error: implicit conversion from type 'int' of value 256 (32-bit, signed) to type 'unsigned char' changed the value to 0 (8-bit, unsigned) #0 0x2392b8 in main /tmp/test.c:3:5 #1 0x7fec4a612b16 in __libc_start_main (/lib/x86_64-linux-gnu/ #2 0x214029 in _start (/build/llvm-build-GCC-release/a.out+0x214029) ``` However, the pre/post increment/decrement is still not handled. Reviewers: rsmith, regehr, vsk, rjmccall, #sanitizers Reviewed By: rjmccall Subscribers: mclow.lists, cfe-commits, regehr Tags: #clang, #sanitizers Differential Revision:
  2221. [InstCombine] Set debug loc on `mergeStoreIntoSuccessor` phi Assigning a merged debug location to the `mergeStoreIntoSuccessor` phi improves backtrace quality. Fixes
  2222. [IR] Add hasNPredecessors, hasNPredecessorsOrMore to BasicBlock Add methods to BasicBlock which make it easier to efficiently check whether a block has N (or more) predecessors. This can be more efficient than using pred_size(), which is a linear time operation. We might consider adding similar methods for successors. I haven't done so in this patch because succ_size() is already O(1). With this patch applied, I measured a 0.065% compile-time reduction in user time for running `opt -O3` on the sqlite3 amalgamation (30 trials). The change in mergeStoreIntoSuccessor alone saves 45 million linked list iterations in a stage2 Release build of llc. See for a harder but more general way of achieving similar results. Differential Revision:
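The efficiency argument is that a "has N items" query can stop after at most N + 1 steps, while a full size query on a linked list walks every node. A minimal sketch of that idea on plain iterator ranges follows; `has_n_items` and `has_n_items_or_more` are hypothetical helpers written for this illustration, not the BasicBlock methods.

```cpp
#include <cassert>
#include <cstddef>
#include <forward_list>

// Hypothetical helper: true if [first, last) holds exactly n elements.
// Visits at most n + 1 positions, regardless of the range length.
template <typename It>
bool has_n_items(It first, It last, std::size_t n) {
  for (; n != 0; --n, ++first)
    if (first == last) return false;  // ran out before reaching n
  return first == last;               // exactly n only if nothing is left
}

// Hypothetical helper: true if [first, last) holds at least n elements.
template <typename It>
bool has_n_items_or_more(It first, It last, std::size_t n) {
  for (; n != 0; --n, ++first)
    if (first == last) return false;
  return true;
}
```

For a check such as "does this block have exactly one predecessor", the early exit means the cost no longer depends on how many predecessors the block actually has.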
  2223. [DAGCombine] SimplifyNodeWithTwoResults - ensure same legalization for LO/HI operands (PR21207) Consistently use (!LegalOperations || isOperationLegalOrCustom) for all node pairs. Differential Revision:
  2224. Fix clang test suite on Windows by reverting part of r347216 Otherwise, the clang analyzer tests fail on Windows when attempting to unpickle AnalyzerTest objects in the worker processes. The pattern of, add to path, import, remove from path, serialize, deserialize, doesn't work. Once something gets added to the path, if we want to move it across the wire for multiprocessing, we need to keep the module on sys.path.
  2225. Fix Wdocumentation warning. NFCI.
  2226. Fix unused function warning.
  2227. [TargetLowering] expandFP_TO_UINT - improve fp16 support As discussed on D53794, for float types with ranges smaller than the destination integer type, we should be able to just use a regular FP_TO_SINT opcode. I thought we'd need to provide MSA test cases for very small integer types as well (fp16 -> i8 etc.), but it turns out that promotion will kick in so they're unnecessary. Differential Revision:
  2228. [IR] DISubprogram::toSPFlags(): fix "enumeral and non-enumeral type in conditional expression" /build/llvm/include/llvm/IR/DebugInfoMetadata.h: In static member function ‘static llvm::DISubprogram::DISPFlags llvm::DISubprogram::toSPFlags(bool, bool, bool, unsigned int)’: /build/llvm/include/llvm/IR/DebugInfoMetadata.h:1636:50: warning: enumeral and non-enumeral type in conditional expression [-Wextra] (IsLocalToUnit ? SPFlagLocalToUnit : 0) | ~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~ /build/llvm/include/llvm/IR/DebugInfoMetadata.h:1637:49: warning: enumeral and non-enumeral type in conditional expression [-Wextra] (IsDefinition ? SPFlagDefinition : 0) | ~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~ /build/llvm/include/llvm/IR/DebugInfoMetadata.h:1638:48: warning: enumeral and non-enumeral type in conditional expression [-Wextra] (IsOptimized ? SPFlagOptimized : 0)); ~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~
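The warning fires because one arm of each conditional is an enumerator and the other is the plain integer `0`. One common way to avoid it is to give both arms the enumeration type by using a zero-valued enumerator. The sketch below uses a hypothetical `Flags` enum with made-up values; it is not the DISubprogram definition and may not match the fix that was actually committed.

```cpp
#include <cassert>

// Hypothetical flag enum; names and values are illustrative only.
enum Flags : unsigned {
  FlagZero = 0,
  FlagLocal = 1u << 0,
  FlagDefinition = 1u << 1,
  FlagOptimized = 1u << 2,
};

// Both arms of every conditional are enumerators of the same type, so there
// is no enumeral/non-enumeral mix for -Wextra to report.
inline Flags to_flags(bool is_local, bool is_definition, bool is_optimized) {
  return static_cast<Flags>((is_local ? FlagLocal : FlagZero) |
                            (is_definition ? FlagDefinition : FlagZero) |
                            (is_optimized ? FlagOptimized : FlagZero));
}
```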
  2229. Add missing stream operator for Polynomial class to fix debug builds.
  2230. [X86][CostModel] Don't lookup intrinsic cost tables if the intrinsic isn't one we care about We're seeing some issues internally where we sent some intrinsics into the cost model that the getTypeLegalizationCost call fails on, but X86 specific tables don't care about. Our base class implementation takes care of them. We'd just like X86 backend to ignore them. This patch makes sure the switch returned something X86 cares about and skips the table lookups and type legalization call if not. Probably more efficient too since we don't go scanning the tables for every intrinsic we could possibly see. Differential Revision:
  2231. Add missing closing bracket.
  2232. Fix build break from r347239
  2233. Fix Wdocumentation warning. NFCI.
  2234. Add docker configurations used by the buildbots. These are the scripts I use to create the docker images for the build bots and run them.
  2235. [X86][SSE] Remove unnecessary bit-and in pshufb vector ctlz (PR39703) SSE PSHUFB vector ctlz lowering works at the i4 nibble level. As detailed in PR39703, we were masking the lower nibble off but we only actually use it in the case where the upper nibble is known to be zero, making it safe to remove the mask and save an instruction. Differential Revision:
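The argument for dropping the mask can be checked on a scalar model of a single byte lane. The sketch below is an illustration under stated assumptions, not the actual lowering: `pshufb_lane` models only the documented PSHUFB behavior for one byte (a set top index bit yields zero, otherwise the low four index bits select a table entry), and `kNibbleCtlz`, `ctlz8_nibble_lut` and `ctlz8_reference` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Leading-zero count of each 4-bit value (entry 0 is 4).
constexpr std::uint8_t kNibbleCtlz[16] = {4, 3, 2, 2, 1, 1, 1, 1,
                                          0, 0, 0, 0, 0, 0, 0, 0};

// Scalar model of one PSHUFB byte lane.
inline std::uint8_t pshufb_lane(const std::uint8_t (&table)[16],
                                std::uint8_t index) {
  return (index & 0x80u) ? std::uint8_t{0} : table[index & 0x0Fu];
}

// ctlz of a byte from two nibble lookups, with no (x & 0x0F) mask on the
// low lookup: its result is only used when the high nibble is zero, and in
// that case x already equals its own low nibble.
inline std::uint8_t ctlz8_nibble_lut(std::uint8_t x) {
  const auto hi = static_cast<std::uint8_t>(x >> 4);
  const std::uint8_t hi_count = pshufb_lane(kNibbleCtlz, hi);
  const std::uint8_t lo_count = pshufb_lane(kNibbleCtlz, x);  // unmasked
  return static_cast<std::uint8_t>(hi_count + (hi == 0 ? lo_count : 0));
}

// Straightforward bit-by-bit reference for comparison.
inline std::uint8_t ctlz8_reference(std::uint8_t x) {
  std::uint8_t count = 0;
  for (std::uint8_t bit = 0x80; bit != 0 && !(x & bit); bit >>= 1) ++count;
  return count;
}
```

Comparing the two functions over all 256 byte values confirms that the unmasked lookup never changes the result in this model.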
  2236. [InterleavedLoadCombine] Fix warnings * remove unused function * fix compare
  2237. [X86] Attempt to improve v32i8/v64i8 multiply lowering by applying the v16i8 non-avx2 algorithm to each 128-bit lane. Previously we split the vectors in half to allow the two halves to be any-extended, then concatenated the results back together. This patch instead extends the v16i8 SSE algorithm to extend half of each 128-bit lane using punpcklbw/punpckhbw, multiplies all the low half lanes and all the high half lanes together in separate operations, and then merges the half-lane results back together using packuswb. Unfortunately, some of the cases in vector-reduce-mul.ll regress because we aren't narrowing the vector width of the multiplies as we reduce. The splitting was somewhat making up for that before by causing halves to be discarded after the split. Differential Revision:
  2238. [DebugInfo] DISubprogram flags get their own flags word. NFC. This will hold flags specific to subprograms. In the future we could potentially free up scarce bits in DIFlags by moving subprogram-specific flags from there to the new flags word. This patch does not change IR/bitcode formats, that will be done in a follow-up. Differential Revision:
  2239. [ARM] Attempt to fix arm selfhost bots after rL347191
  2240. Address comments.
  2241. Use digest size instead of hardcoding it.
  2242. [clangd] Store source file hash in IndexFile{In,Out} Summary: Puts the digest of the source file that generated the index into the serialized index and restores it on load, if it exists. Reviewers: sammccall Subscribers: ilya-biryukov, ioeric, MaskRay, jkorous, arphaman, cfe-commits Differential Revision:
  2243. [AMDGPU] Fix -Wunused-variable
  2244. [libcxx] Fix incorrect #include for std::hash Reviewed as Thanks to Andrey Maksimov for the patch.
  2245. [libcxx] Add missing <cstddef> includes in tests Some tests use type std::max_align_t, but don't include <cstddef> header directly. As a result, these tests won't compile against some conformant libraries. Reviewed as Thanks to Andrey Maksimov for the patch.
  2246. [AMDGPU] Convert insert_vector_elt into set of selects This avoids scratch use or indirect VGPR addressing for small vectors. Differential Revision:
  2247. [llvm-nm] Fix use-after-free for MachOUniversalBinaries MachOObjectFile::getHostArch() returns a temporary, and getArchName returns a StringRef pointing to a temporary std::string. No tests since it doesn't trigger any errors except with the sanitizers.
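The underlying hazard is a non-owning string view bound to a temporary `std::string`. A minimal sketch of the dangling pattern and a safe alternative follows, using `std::string_view` (C++17) as a stand-in for StringRef; `make_arch_name` and `arch_name_length` are hypothetical and unrelated to the real MachO APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical stand-in for a function that returns a string by value.
inline std::string make_arch_name() {
  return std::string("x86_64") + "-host";
}

// Dangling pattern (left commented out on purpose): the temporary is
// destroyed at the end of the full expression, so the view would point at
// freed memory.
//   std::string_view name = make_arch_name();

// Safe pattern: keep an owning copy alive for as long as the view is used.
inline std::size_t arch_name_length() {
  const std::string owner = make_arch_name();
  const std::string_view name = owner;  // view into `owner`, which outlives it
  return name.size();
}
```

As the entry notes, the dangling form tends to appear to work in ordinary builds, which is why only the sanitizers flag it.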
  2248. [InterleavedLoadCombine] Fix warning unused variable Differential Revision:
  2249. [WebAssembly] replaced .param/.result by .functype Summary: This makes it easier/cleaner to generate a single signature from this directive. Also: - Adds the symbol name, such that we don't depend on the location of this directive anymore. - Actually constructs the signature in the assembler, and makes the assembler own it. - Refactors the use of MVT vs ValType in the streamer and assembler to require fewer conversions overall. - Changes 700 or so tests to use it. Reviewers: sbc100, dschuff Subscribers: jgravelle-google, eraman, aheejin, sunfish, jfb, llvm-commits Differential Revision:
  2250. [SelectionDAG] simplify vector select with undef operand(s)
  2251. [InterleavedLoadCombine] Remove unused include. NFC.
  2252. Revert "[LICM] Make LICM able to hoist phis" This reverts commit r347190.
  2253. Enable armv7/aarch64 build cache locations for clang builds
  2254. [AMDGPU] Derive GCNSubtarget from MF to get overridden target features Summary: AMDGPUAsmPrinter has a getSTI function that derives a GCNSubtarget from the TM. However, this means that overridden target features are not detected and can result in incorrect behaviour. Switch to using STM which is a GCNSubtarget derived from the MF (used elsewhere in the same function). Change-Id: Ib6328ad667b7fcdc87e9c06344e59859207db9b0 Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, llvm-commits Differential Revision:
  2255. [LV] Avoid vectorizing unsafe dependencies in uniform address Summary: Currently, when vectorizing stores to uniform addresses, the only case in which we prevent vectorization is when there are multiple stores to the same uniform address causing an unsafe dependency. This patch teaches LAA to avoid vectorizing loops that have an unsafe cross-iteration dependency between a load and a store to the same uniform address. Fixes PR39653. Reviewers: Ayal, efriedma Subscribers: rkruppe, llvm-commits Differential Revision:
  2256. [libcxx] Add availability markup for bad_optional_access, bad_variant_access and bad_any_cast Reviewers: dexonsmith, EricWF Subscribers: christof, arphaman, libcxx-commits Differential Revision:
  2257. [Hexagon] make test immune to improvements in undef simplification
  2258. [x86] add/make tests immune to improvements in undef simplification
  2259. Fix some issues with LLDB's lit configuration files. Recently I tried to port LLDB's lit configuration files over to use a on the surface, but broke some cases that weren't broken before and also exposed some additional problems with the old approach that we were just getting lucky with. When we set up a lit environment, the goal is to make it as hermetic as possible. We should not be relying on PATH and enabling the use of arbitrary shell commands. Instead, only whitelisted commands should be allowed. These are, generally speaking, the lit builtins such as echo, cd, etc, as well as anything for which substitutions have been explicitly set up for. These substitutions should map to the build output directory, but in some cases it's useful to be able to override this (for example to point to an installed tools directory). This is, of course, how it's supposed to work. What was actually happening is that we were bringing in PATH and LD_LIBRARY_PATH and then just running the given run line as a shell command. This led to problems such as finding the wrong version of clang-cl on PATH since it wasn't even a substitution, and flakiness / non-determinism since the environment the tests were running in would change per-machine. On the other hand, it also made other things possible. For example, we had some tests that were explicitly running cl.exe and link.exe instead of clang-cl and lld-link and the only reason it worked at all is because it was finding them on PATH. Unfortunately we can't entirely get rid of these tests, because they support a few things in debug info that clang-cl and lld-link don't (notably, the LF_UDT_MOD_SRC_LINE record which makes some of the tests fail. The high level changes introduced in this patch are: 1. Removal of functionality - The lit test suite no longer respects LLDB_TEST_C_COMPILER and LLDB_TEST_CXX_COMPILER. 
This means there is no more support for gcc, but nobody was using this anyway (note: The functionality is still there for the dotest suite, just not the lit test suite). There is no longer a single substitution %cxx and %cc which maps to <arbitrary-compiler>, you now explicitly specify the compiler with a substitution like %clang or %clangxx or %clang_cl. We can revisit this in the future when someone needs gcc. 2. Introduction of the LLDB_LIT_TOOLS_DIR directory. This does in spirit what LLDB_TEST_C_COMPILER and LLDB_TEST_CXX_COMPILER used to do, but now more friendly. If this is not specified, all tools are expected to be the just-built tools. If it is specified, the tools which are not themselves being tested but are being used to construct and run checks (e.g. clang, FileCheck, llvm-mc, etc) will be searched for in this directory first, then the build output directory. 3. Changes to core llvm lit files. The use_lld() and use_clang() functions were introduced long ago in anticipation of using them in lldb, but since they were never actually used anywhere but their respective problems, there were some issues to be resolved regarding generality and ability to use them outside their project. 4. Changes to .test files - These are all just replacing things like clang-cl with %clang_cl and %cxx with %clangxx, etc. 5. Changes to - Previously we would load up some system environment variables and then add some new things to them. Then do a bunch of work building out our own substitutions. First, we delete the system environment variable code, making the environment hermetic. Then, we refactor the substitution logic into two separate helper functions, one which sets up substitutions for the tools we want to test (which must come from the build output directory), and another which sets up substitutions for support tools (like compilers, etc). 6. 
New substitutions for MSVC -- Previously we located MSVC by bringing in the entire parent's PATH and letting subprocess.Popen just run the command line. Now we set up real substitutions that should have the same effect. We use PATH to find them, and then look for INCLUDE and LIB to construct a substitution command line with appropriate /I and /LIBPATH: arguments. The nice thing about this is that it opens the door to having separate %msvc-cl32 and %msvc-cl64 substitutions, rather than requiring the user to run vcvars first, because we can deduce the path to 32-bit libraries from 64-bit library directories, and vice versa. Without these substitutions this would have been impossible. Differential Revision:
  2260. [LoopPass] fixing 'Modification' messages in -debug-pass=Executions for loop passes Legacy loop pass manager is issuing "Made Modification" message after each Loop Pass run, however condition for issuing it is accumulated among all the runs. That leads to confusing 'modification' messages as soon as the first modification is done. Changing condition to be "current pass made modifications", similar to how it is being done in all other pass managers.
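The difference between the accumulated condition and the per-pass condition can be illustrated with a small standalone model. This is only a sketch: `FakePass` and `reportModifications` are hypothetical names invented for illustration and are not part of the legacy pass manager.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a loop pass: it only records whether a run
// of the pass modified the loop.
struct FakePass {
  std::string Name;
  bool Modifies;
};

// Returns the names of the passes after which "Made Modification" would
// be printed. With UseAccumulated == true the message is keyed on the flag
// accumulated over all runs so far (the confusing behaviour described
// above); otherwise it is keyed on the current pass only.
std::vector<std::string>
reportModifications(const std::vector<FakePass> &Passes, bool UseAccumulated) {
  std::vector<std::string> Reported;
  bool Changed = false; // accumulated over all runs
  for (const FakePass &P : Passes) {
    bool LocalChanged = P.Modifies; // result of this run only
    Changed |= LocalChanged;
    if (UseAccumulated ? Changed : LocalChanged)
      Reported.push_back(P.Name);
  }
  return Reported;
}
```

With passes A (no change), B (change), C (no change), the accumulated variant reports B and C, while the per-pass variant reports only B.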
  2261. [OpenMP] Check target architecture supports unified shared memory for requires directive. Differential Review:
  2262. [SelectionDAG] simplify select FP with undef condition
  2263. [x86] add test for select FP with undef condition; NFC
  2264. [SelectionDAG] add simplifySelect() to reduce code duplication; NFC This should be extended to handle FP and vectors in follow-up patches.
  2265. [llvm-exegesis][NFC] More tests for ExegesisTarget::fillMemoryOperands(). Reviewers: gchatelet Subscribers: tschuett, llvm-commits Differential Revision:
  2266. Subject: [PATCH] [CodeGen] Add pass to combine interleaved loads. This patch defines an interleaved-load-combine pass. The pass searches for ShuffleVector instructions that represent interleaved loads. Matches are converted such that they will be captured by the InterleavedAccessPass. The pass extends LLVM's capabilities to use target specific instruction selection of interleaved load patterns (e.g.: ld4 on AArch64 architectures). Differential Revision:
  2267. [ThinLTO] Fix comment. NFC
  2268. [SelectionDAG] fix formatting; NFC
  2269. [FileManager] getFile(open=true) after getFile(open=false) should open the file. Summary: Old behavior is to just return the cached entry regardless of opened-ness. That feels buggy (though I guess nobody ever actually needed this). This came up in the context of clangd+clang-tidy integration: we're going to getFile(open=false) to replay preprocessor actions obscured by the preamble, but the compilation may subsequently getFile(open=true) for non-preamble includes. Reviewers: ilya-biryukov Subscribers: ioeric, kadircet, cfe-commits Differential Revision:
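The intended cache behaviour can be sketched with a toy model. The names `CachedFile` and `TinyFileCache` are hypothetical and do not mirror the real FileManager interface; the sketch only shows that a later request with open=true must upgrade an entry that was first cached unopened.

```cpp
#include <map>
#include <string>

// Hypothetical cache entry: tracks whether the file has been opened and
// how many times an open was actually performed.
struct CachedFile {
  bool Opened = false;
  int OpenCount = 0;
};

// Toy model of a file cache; not the real FileManager.
class TinyFileCache {
  std::map<std::string, CachedFile> Entries;

public:
  const CachedFile &getFile(const std::string &Name, bool Open) {
    CachedFile &Entry = Entries[Name]; // creates the entry on first use
    // Upgrade a cached-but-unopened entry instead of returning it as-is.
    if (Open && !Entry.Opened) {
      Entry.Opened = true;
      ++Entry.OpenCount;
    }
    return Entry;
  }
};
```

A repeated open=true request finds the entry already opened and does not open it a second time.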
  2270. [llvm-exegesis] (+final perf overview) InstructionBenchmarkClustering::rangeQuery(): reserve for the upper bound of Neighbors Summary: As was pointed out in D54388+D54390, the maximal size of `Neighbors` is known: it will contain at most Points_.size() minus one (the center of the cluster). While that is the upper bound, meaning in most cases the actual count will be much smaller, since D54390 made the allocation persistent, we no longer have to worry about overly-optimistically `reserve()`ing. Old: (D54393) ``` Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (16 runs): 6553.167456 task-clock (msec) # 1.000 CPUs utilized ( +- 0.21% ) ... 6.5547 +- 0.0134 seconds time elapsed ( +- 0.20% ) ``` New: ``` Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (16 runs): 6315.057872 task-clock (msec) # 0.999 CPUs utilized ( +- 0.24% ) ... 6.3187 +- 0.0160 seconds time elapsed ( +- 0.25% ) ``` And that is another -~4%. Since this is the last (as of this moment) patch in this patch series, it is a good time to summarize: Old: (svn trunk, as stated in D54381) ``` $ time ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html &> /dev/null real 0m24.884s user 0m24.099s sys 0m0.785s ``` So these patches, on a given benchmark, have decreased llvm-exegesis analysis time by 74.62%. There surely is more room for further improvements. D54514 may improve things by -11.5% more (relative to this patch). Parallelization may improve things further significantly, too.
Reviewers: courbet, MaskRay, RKSimon, gchatelet, john.brawn Reviewed By: courbet, MaskRay Subscribers: tschuett, llvm-commits Differential Revision:
  2271. [llvm-exegesis] Move InstructionBenchmarkClustering::isNeighbour() into header Summary: Old: (D54390) ``` Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (10 runs): 7432.421721 task-clock (msec) # 1.000 CPUs utilized ( +- 0.15% ) ... 7.4336 +- 0.0115 seconds time elapsed ( +- 0.15% ) ``` New: ``` Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (10 runs): 6569.936144 task-clock (msec) # 1.000 CPUs utilized ( +- 0.22% ) ... 6.5711 +- 0.0143 seconds time elapsed ( +- 0.22% ) ``` And another -12%. You'd think it would be `inline`d anyway, but no! :) Reviewers: courbet, MaskRay, RKSimon, gchatelet, john.brawn Reviewed By: courbet, MaskRay Subscribers: tschuett, llvm-commits Differential Revision:
  2272. [llvm-exegesis] InstructionBenchmarkClustering::rangeQuery(): write into llvm::SmallVectorImpl& output parameter Summary: I do believe this is the correct fix. We call `rangeQuery()` *very* often. And many times its output vector is large (tens of thousands of entries), so small-size-opt won't help. Old: (D54389) ``` Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (10 runs): 7934.528363 task-clock (msec) # 1.000 CPUs utilized ( +- 0.19% ) ... 7.9354 +- 0.0148 seconds time elapsed ( +- 0.19% ) ``` New: ``` Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (10 runs): 7383.793440 task-clock (msec) # 1.000 CPUs utilized ( +- 0.47% ) ... 7.3868 +- 0.0340 seconds time elapsed ( +- 0.46% ) ``` And another -7%. And that isn't even the good bit yet. Old: * calls to allocation functions: 2081419 * temporary allocations: 219658 (10.55%) * bytes allocated in total (ignoring deallocations): 4.31 GB New: * calls to allocation functions: 1880295 (-10%) * temporary allocations: 18758 (1%) (-91% *sic*) * bytes allocated in total (ignoring deallocations): 545.15 MB (-88% *sic*) Reviewers: courbet, MaskRay, RKSimon, gchatelet, john.brawn Reviewed By: courbet, MaskRay Subscribers: tschuett, llvm-commits Differential Revision:
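Why an output parameter reduces allocations can be shown with a simplified, standard-library-only model. This is a sketch under stated assumptions: the one-dimensional `rangeQuery` below is a hypothetical stand-in, and `std::vector` is used in place of `llvm::SmallVectorImpl` so the example stays self-contained.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical 1-D range query: collects the indices of all points within
// Epsilon of point Q (excluding Q itself) into a caller-owned vector.
// Because the caller owns Neighbors, its heap buffer survives between
// calls; clear() resets the size but leaves the capacity in place.
void rangeQuery(const std::vector<double> &Points, std::size_t Q,
                double Epsilon, std::vector<std::size_t> &Neighbors) {
  Neighbors.clear();
  for (std::size_t P = 0; P < Points.size(); ++P) {
    if (P == Q)
      continue; // the query point is not its own neighbour
    if (std::abs(Points[P] - Points[Q]) <= Epsilon)
      Neighbors.push_back(P);
  }
}
```

A caller that declares one `Neighbors` vector outside its loop pays for growth only until the buffer is large enough for the biggest result, whereas returning a fresh vector by value allocates on nearly every call.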
  2273. [llvm-exegesis] InstructionBenchmarkClustering::dbScan(): replace std::vector<> with std::deque<> in llvm::SetVector<> Summary: Old: (D54388) ``` Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (10 runs): 8606.323981 task-clock (msec) # 1.000 CPUs utilized ( +- 0.11% ) ... 8.60773 +- 0.00978 seconds time elapsed ( +- 0.11% ) ``` New: ``` Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (10 runs): 7971.403653 task-clock (msec) # 1.000 CPUs utilized ( +- 0.14% ) ... 7.9728 +- 0.0113 seconds time elapsed ( +- 0.14% ) ``` Another -~7%. Reviewers: courbet, MaskRay, RKSimon, gchatelet, john.brawn Reviewed By: courbet, RKSimon Subscribers: tschuett, llvm-commits Differential Revision:
  2274. [llvm-exegesis] InstructionBenchmarkClustering::rangeQuery(): use llvm::SmallVector<size_t, 0> for storage. Summary: Old: (D54383) ``` Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (10 runs): 9098.781978 task-clock (msec) # 1.000 CPUs utilized ( +- 0.16% ) ... 9.1015 +- 0.0148 seconds time elapsed ( +- 0.16% ) ``` New: ``` Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (10 runs): 8553.352480 task-clock (msec) # 1.000 CPUs utilized ( +- 0.12% ) ... 8.5539 +- 0.0105 seconds time elapsed ( +- 0.12% ) ``` So another -6%. That is because the `SmallVector` **doubles** its size when reallocating, which is great here, since we can't `reserve()` since we can't know how many `Neighbors` we will have. Reviewers: courbet, MaskRay, RKSimon, gchatelet, john.brawn Subscribers: tschuett, llvm-commits Differential Revision:
  2275. [llvm-exegesis] Analysis: writeMeasurementValue(): don't alloc string for double each time. Summary: Test data: 500kLOC of benchmark.yaml, 23Mb. (that is a subset of the actual uops benchmark i was trying to analyze!) Old time: (D54382) ``` Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (16 runs): 9024.354355 task-clock (msec) # 1.000 CPUs utilized ( +- 0.18% ) ... 9.0262 +- 0.0161 seconds time elapsed ( +- 0.18% ) ``` New time: ``` Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (16 runs): 8996.541057 task-clock (msec) # 0.999 CPUs utilized ( +- 0.19% ) ... 9.0045 +- 0.0172 seconds time elapsed ( +- 0.19% ) ``` -~0.3%, not that much. But this isn't the important part. Old: * calls to allocation functions: 2109712 * temporary allocations: 33112 * bytes allocated in total (ignoring deallocations): 4.43 GB New: * calls to allocation functions: 2095345 (-0.68%) * temporary allocations: 18745 (-43.39% !!!) * bytes allocated in total (ignoring deallocations): 4.31 GB (-2.71%) Reviewers: courbet, MaskRay, RKSimon, gchatelet, john.brawn Reviewed By: courbet Subscribers: tschuett, llvm-commits Differential Revision:
  2276. [llvm-exegesis] Analysis::writeSnippet(): be smarter about memory allocations. Summary: Test data: 500kLOC of benchmark.yaml, 23Mb. (that is a subset of the actual uops benchmark i was trying to analyze!) Old time: (D54381) ``` $ time ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html &> /dev/null real 0m10.487s user 0m9.745s sys 0m0.740s ``` New time: ``` $ time ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html &> /dev/null real 0m9.599s user 0m8.824s sys 0m0.772s ``` Not that much, around -9%. But that is not the good part yet, again. Old: * calls to allocation functions: 3347676 * temporary allocations: 277818 * bytes allocated in total (ignoring deallocations): 10.52 GB New: * calls to allocation functions: 2109712 (-36%) * temporary allocations: 33112 (-88%) * bytes allocated in total (ignoring deallocations): 4.43 GB (-58% *sic*) Reviewers: courbet, MaskRay, RKSimon, gchatelet, john.brawn Reviewed By: courbet, MaskRay Subscribers: tschuett, llvm-commits Differential Revision:
  2277. [llvm-exegesis] InstructionBenchmarkClustering::dbScan(): use llvm::SetVector<> instead of ILLEGAL std::unordered_set<> Summary: Test data: 500kLOC of benchmark.yaml, 23Mb. (that is a subset of the actual uops benchmark i was trying to analyze!) Old time: ``` $ time ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html &> /dev/null real 0m24.884s user 0m24.099s sys 0m0.785s ``` New time: ``` $ time ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html &> /dev/null real 0m10.469s user 0m9.797s sys 0m0.672s ``` So -60%. And that isn't the good bit yet. Old: * calls to allocation functions: 106560180 (yes, 107 *million* allocations.) * bytes allocated in total (ignoring deallocations): 12.17 GB New: * calls to allocation functions: 3347676 (-96.86%) (just 3 mil) * bytes allocated in total (ignoring deallocations): 10.52 GB (~2GB less) --- Two points i want to raise: * `std::unordered_set<>` should not have been used there in the first place. It is banned by the * There are no tests, so i'm not fully sure this is correct. Since it was unordered set, i guess there are zero restrictions on the order, and anything will be ok? * I tried other containers suggested in, this `llvm::SetVector<>` seems to be best here. Reviewers: courbet, MaskRay, RKSimon, gchatelet, john.brawn Reviewed By: courbet Subscribers: kristina, bobsayshilol, tschuett, llvm-commits Differential Revision:
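The property that makes a set-vector attractive for a worklist of point indices (deduplication plus contiguous, insertion-ordered storage) can be sketched without LLVM headers. `IndexSetVector` below is a hypothetical illustration for dense indices only; it is not how `llvm::SetVector<>` is implemented.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical insertion-ordered set of indices drawn from [0, Universe).
// Membership is a bitmap lookup and the elements live in one contiguous
// vector, so there is no per-element node allocation as there is with
// std::unordered_set.
class IndexSetVector {
  std::vector<std::size_t> Order; // elements in insertion order
  std::vector<bool> Present;      // membership bitmap

public:
  explicit IndexSetVector(std::size_t Universe) : Present(Universe, false) {}

  // Returns true if Id was newly inserted, false if it was already present.
  // Id must be smaller than the Universe passed to the constructor.
  bool insert(std::size_t Id) {
    if (Present[Id])
      return false;
    Present[Id] = true;
    Order.push_back(Id);
    return true;
  }

  std::size_t size() const { return Order.size(); }
  std::size_t operator[](std::size_t I) const { return Order[I]; }
};
```

Because elements are appended, a loop of the form `for (I = 0; I < S.size(); ++I)` can keep inserting into the set while it is being walked, which is the usual worklist pattern in a DBSCAN-style expansion.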
  2278. Fixed uninitialized variable issue. This commit should fix failing bots.
  2279. [X86] Add codegen tests for slow-shld scalar funnel shifts
  2280. Test commit - delete trailing space.
  2281. Test commit - delete a trailing space.
  2282. AMDGPU/InsertWaitcnts: Some more const-correctness Reviewers: msearles, rampitec, scott.linder, kanarayan Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits, hakzsam Differential Revision:
  2283. [ARM] Remove trunc sinks in ARM CGP Truncs are treated as sources if they produce a value of the same type as the one we are currently trying to promote. Truncs used to be considered as a sink if their operand was the same value type. We now allow smaller types in the search, so we should search through truncs that produce a smaller value. These truncs can then be converted to an AND mask. This leaves sinks as being: - points where the value in the register is being observed, such as an icmp, switch or store. - points where value types have to match, such as calls and returns. - zext are included to ease the transformation and are generally removed later on. During this change, it also became apparent that truncating sinks was broken: if a sink used a source, its type information had already been lost by the time the truncation happens. So I've changed the method of caching the type information. Differential Revision:
  2284. [LICM] Make LICM able to hoist phis The general approach taken is to make note of loop invariant branches, then when we see something conditional on that branch, such as a phi, we create a copy of the branch and (empty versions of) its successors and hoist using that. This has no impact by itself that I've been able to see, as LICM typically doesn't see such phis as they will have been converted into selects by the time LICM is run, but once we start doing phi-to-select conversion later it will be important. Differential Revision:
  2285. [OpenCL] Fix address space deduction in template args. Don't deduce address spaces for non-pointer-like types in template args. Fixes PR38603! Differential Revision:
  2286. [MSP430] Optimize srl/sra in case of A >> (8 + N) There are no variable-length shifts on MSP430. Therefore "eat" 8 bits of shift via bswap & ext. Patch by Kristina Bessonova! Differential Revision:
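The identity behind "eating" 8 bits of the shift can be checked with a scalar model of a 16-bit value. This is a sketch of the arithmetic only, not the actual lowering code; the helper names are hypothetical, and the signed variant assumes two's-complement narrowing and arithmetic right shift of negative values (both guaranteed since C++20).

```cpp
#include <cstdint>

// Byte swap of a 16-bit value.
std::uint16_t bswap16(std::uint16_t A) {
  return static_cast<std::uint16_t>((A << 8) | (A >> 8));
}

// Logical A >> (8 + N) for N in [0, 7]: swap bytes, zero-extend the low
// byte (the former high byte), then shift by the remaining N bits.
std::uint16_t srlViaSwap(std::uint16_t A, unsigned N) {
  std::uint16_t High = bswap16(A) & 0x00FF;
  return static_cast<std::uint16_t>(High >> N);
}

// Arithmetic A >> (8 + N) for N in [0, 7]: swap bytes, sign-extend the
// low byte, then shift by the remaining N bits.
std::int16_t sraViaSwap(std::int16_t A, unsigned N) {
  std::uint16_t Swapped = bswap16(static_cast<std::uint16_t>(A));
  std::int16_t High = static_cast<std::int8_t>(Swapped & 0x00FF);
  return static_cast<std::int16_t>(High >> N);
}
```

Only the remaining N single-bit shifts are left after the swap and extension, instead of 8 + N of them.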
  2287. Fix disturbing warning - NFCI
  2288. [X86] Use a pcmpgt with 0 instead of psrad 31, to fill elements with the sign bit in v4i32 MULH lowering. The shift requires a copy to avoid clobbering a register. Comparing with 0 uses an xor to produce 0 that will be overwritten with the compare results. So still requires 2 instructions, but should be one byte shorter since it doesn't need to encode an immediate.
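The equivalence the lowering relies on holds per 32-bit lane and can be checked with a scalar model. The function names are hypothetical, and the shift variant assumes an arithmetic right shift of negative values (guaranteed since C++20, and the behaviour of common compilers before that).

```cpp
#include <cstdint>

// Lane-wise model of psrad by 31: every bit becomes a copy of the sign bit.
std::int32_t signFillByShift(std::int32_t X) { return X >> 31; }

// Lane-wise model of pcmpgt(0, X): all-ones when 0 > X, all-zeros otherwise.
std::int32_t signFillByCompare(std::int32_t X) { return 0 > X ? -1 : 0; }
```

Both functions return -1 for negative inputs and 0 otherwise, so either can be used to produce the sign-bit fill.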
  2289. [LoopSimplifyCFG] Add requires: asserts after rL347183
  2290. [LoopSimplifyCFG] Teach LoopSimplifyCFG to constant-fold branches and switches This patch introduces infrastructure and the simplest case for constant-folding of branch and switch instructions within a loop into unconditional branches. It is useful as a cleanup for such passes as loop unswitching that sometimes produce such branches. Only the simplest case is supported in this patch: after the folding, no block should become dead or stop being part of the loop. Support for more sophisticated cases will go separately in follow-up patches. Differential Revision: Reviewed By: anna
  2291. [ProfileSummary] Standardize methods and fix comment Every Analysis pass has a get method that returns a reference to the Result of the Analysis, for example, BlockFrequencyInfo &BlockFrequencyInfoWrapperPass::getBFI(). I believe that ProfileSummaryInfo::getPSI() is the only exception to that, as it was returning a pointer. Another change is renaming isHotBB and isColdBB to isHotBlock and isColdBlock, respectively. Most methods use BB as the argument or variable name, while method names usually refer to Basic Blocks as Blocks, instead of BB. For example, Function::getEntryBlock, Loop::getExitBlock, etc. I also fixed one of the comments. Patch by Rodrigo Caetano Rocha! Differential Revision:
  2292. [X86] Use compare with 0 to fill an element with sign bits when sign extending to v2i64 pre-sse4.1 Previously we used an arithmetic shift right by 31, but that requires a copy to preserve the input. So we might as well materialize a zero and compare to it since the comparison will overwrite the register that contains the zeros. This should be one byte shorter.
  2293. [X86] Remove most of the SEXTLOAD Custom setOperationAction calls under -x86-experimental-vector-widening-legalization. Leave just the v4i8->v4i64 and v8i8->v8i64, but only enable them on pre-sse4.1 targets when 64-bit mode is enabled. In those cases we end up creating sext loads that get scalarized to code that looks better than what we get from loading into a vector register and doing a multiple step sign extend using unpacks and shifts.
  2294. [PowerPC] Set the default PLT mode on OpenBSD/powerpc to Secure PLT. OpenBSD/powerpc only supports Secure PLT.
  2295. Replace the UTF-8 characters in the error message.
  2296. [X86][SSE] Add SimplifyDemandedVectorElts support for SSE packed i2fp conversions.
  2297. [X86] Add custom type legalization for extending v4i8/v4i16->v4i64. Pre-SSE4.1 sext_invec for v2i64 is complicated because we don't have a v2i64 sra instruction. So instead we sign extend to i32 using unpack and sra, then copy the elements and do a v4i32 sra to fill with sign bits, then interleave the i32 sign extend and the sign bits. So really we're doing two sign extends but only using half of the v4i32 intermediate result. When the result is more than 128 bits, default type legalization would prefer to split the destination type all the way down to v2i64 with shuffles followed by v16i8/v8i16->v2i64 sext_inreg operations. This results in more instructions than necessary because we are only utilizing the lower 2 elements of the v4i32 intermediate result. Instead we can custom split a v4i8/v4i16->v4i64 sign_extend. Then we can sign extend v4i8/v4i16->v4i32 invec producing a full v4i32 result. Create the sign bit vector as a v4i32 then split and interleave with the sign bits using a punpckldq and punpckhdq.
  2298. [X86] Add a 32-bit command line with only sse2 to vector-sext.ll and vector-sext.ll to show some of the scalarized load sequences without 64-bit scalar support. Some of these sequences look pretty bad since we have to copy the sign bit from a 32 bit register to a 64 bit register to finish a sign extend.
  2299. [X86][SSE] Add SimplifyDemandedVectorElts support for SSE splat-vector-shifts. SSE vector shifts only use the bottom 64-bits of the shift amount vector.
  2300. [X86] Disable combineToExtendVectorInReg under -x86-experimental-vector-widening-legalization. Add custom type legalization for extends. If we widen illegal types instead of promoting, we should be able to rely on the type legalizer to create the vector_inreg operations for us with some caveats. This patch disables combineToExtendVectorInReg when we are using widening. I've enabled custom legalization for v8i8->v8i64 extends under avx512f since the type legalizer would want to create a vector_inreg with a v64i8 input type which isn't legal without avx512bw. So we go to v16i8 with custom code using the relaxation of rules we get from D54346. I've also enabled custom legalization of v8i64 and v16i32 operations with AVX. When the input type is 128 bits, the default splitting legalization would extend first 128->256, then do a split to two 128 pieces. Extend each half to 256 and then concat the result. The custom legalization I've added instead uses a 128->256 bit vector_inreg extend that only reads the lower 64-bits for the low half of the split. Then shuffles the high 64-bits to the low 64-bits and does another vector_inreg extend.
  2301. [X86] Lower v16i16->v8i16 truncate using an 'and' with 255, an extract_subvector, and a packuswb instruction. Summary: This is an improvement over the two pshufbs and punpcklqdq we'd get otherwise. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision:
  2302. [DAG] add undef simplifications for select nodes Sadly, this duplicates (twice) the logic from InstSimplify. There might be some way to at least share the DAG versions of the code, but copying the folds seems to be the standard method to ensure that we don't miss these folds. Unlike in IR, we don't run DAGCombiner to fixpoint, so there's no way to ensure that we do these kinds of simplifications unless the code is repeated at node creation time and during combines. There were other tests that would become worthless with this improvement that I changed as pre-commits: rL347161 rL347164 rL347165 rL347166 rL347167 I'm not sure how to salvage the remaining tests (diffs in this patch). So the x86 tests verify that the new code is working as intended. The AMDGPU test is actually similar to my motivating case: we have some undef value that has survived to machine IR in an x86 test, and then it gets folded in some weird way, or we crash if we don't transfer the undef flag. But we would have been better off never getting to that point by doing these simplifications. This will lead back to PR32023 someday...
  2303. Remove unused variable. NFCI.
  2304. [X86][SSE] Split IsSplatValue into GetSplatValue and IsSplatVector Refactor towards making this recursive (necessary for PR38243 rotation splat detection). IsSplatVector returns the original vector source of the splat and the splat index. GetSplatValue returns the scalar splatted value as an extraction from IsSplatVector.
  2305. [x86] regenerate full checks; NFC
  2306. [SystemZ] make test immune to improvements in undef simplification
  2307. [Hexagon] make tests immune to improvements in undef simplification
  2308. [ARM] make test immune to improvements in undef simplification
  2309. Add the abseil-duration-factory-scale check. This check removes unneeded scaling of arguments when calling Abseil Time factory functions. Patch by Hyrum Wright.
  2310. [X86][SSE] Relax IsSplatValue - remove the 'variable shift' limit on subtracts. Means we don't use the per-lane-shifts as much when we can cheaply use the older splat-variable-shifts.
  2311. [x86] make tests immune to improvements in undef handling
  2312. [SelectionDAG] simplify code; NFC
  2313. [X86][SSE] Add some generic masked gather codegen tests
  2314. [X86][SSE] Use raw shuffle mask decode in SimplifyDemandedVectorEltsForTargetNode (PR39549) We were using the 'normalized' shuffle mask from resolveTargetShuffleInputs, which replaces zero/undef inputs with sentinel values. For SimplifyDemandedVectorElts we need the raw mask so we can correctly demand those 'zero' inputs that got normalized away, this requires an extra bit of logic to locally normalize undef inputs.
  2315. [analyzer][NFC] Move CheckerOptInfo to CheckerRegistry.cpp, and make it local CheckerOptInfo feels very much out of place in CheckerRegistration.cpp, so I moved it to CheckerRegistry.h. Differential Revision:
  2316. Swap order of discovering of -ltinfo and -lterminfo Summary: NetBSD ships with native curses(3) and -ltinfo is a part of ncurses. Set -lterminfo before -ltinfo, as it allows to prioritize native curses libraries. Mixing curses and ncurses does not work well, especially in software built on top of llvm. Original patch by Ryo Onodera (NetBSD) in pkgsrc. Reviewers: labath, dim, mgorny Reviewed By: dim, mgorny Subscribers: llvm-commits Differential Revision:
  2317. [WebAssembly] Add null streamer support Summary: Now `llc -filetype=null` works. Reviewers: eush Subscribers: dschuff, jgravelle-google, sbc100, sunfish, llvm-commits Differential Revision:
  2318. [WebAssembly] Add equality comparison operators for WasmEventType Summary: This was missing in D54096. Independent tests for this are not available here, because these are used in lld. Reviewers: sbc100 Subscribers: dschuff, jgravelle-google, sunfish, llvm-commits Differential Revision:
  2319. [analyzer][UninitializedObjectChecker] Uninit regions are only reported once Especially with pointees, a lot of meaningless reports came from uninitialized regions that were already reported. This is fixed by storing all reported fields to the GDM. Differential Revision:
  2320. cmake: z3: Remove EXACT from 4.7.1 after being compatible with 4.8.1 After check-in of D54391 a comment there by @mikhail.ramalho says: Since we're supporting version 4.8.1 now, the cmake file should be changed to "minimum" instead of "exact". Differential Revision:
  2321. [X86] Add -x86-experimental-vector-widening-legalization check to combineSelect and combineSetCC to cover vXi16/vXi8 promotion without BWI. I don't yet have any test cases for this, but it's the right thing to do based on log file inspection.
  2322. [X86] Rename WidenMaskArithmetic->PromoteMaskArithmetic since we usually use widen to refer to adding elements not making elements larger. NFC
  2323. [X86] Don't use a pmaddwd for vXi32 multiply if the inputs are zero extends from i8 or smaller without SSE4.1. Prefer to shrink the mul instead. The zero extend will require two stages of unpacks to implement. So it's better to shrink the multiply using pmullw and then extend that result back to v4i32 using a single unpack.
  2324. tighten up a couple of assertions. hitting the BitPosition == BitWidth case that was previously not caught resulted in nasty corruption of APInts that (on my system at least) could not be detected using UBSan, ASan, or Valgrind. this patch does not cause any extra failures in a check-all nor does it interfere with bootstrapping. David Blaikie informally approved this change.
  2325. [CorrelatedValuePropagation] Preserve debug locations (PR38178) Fix all of the missing debug location errors in CVP found by debugify. This includes the missing-location-after-udiv-truncation case described in
  2326. Fix bot failure from r347145 The #if check around the statistics computation gave an error about the statistic being an unused variable. Instead, guard with AreStatisticsEnabled().
  2327. [ThinLTO] Add some stats for read only variable internalization Summary: Follow up to D49362 ([ThinLTO] Internalize read only globals). Add a statistic on the number of read only variables (only counting live variables since dead variables will be dropped anyway). Reviewers: evgeny777 Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, arphaman, llvm-commits Differential Revision:
  2328. [Clang] Add options -fprofile-filter-files and -fprofile-exclude-files to filter the files to instrument with gcov (after revert Summary: the previous patch ( has been reverted because of a test failure on Windows. So this patch fixes the test cfe/trunk/test/CodeGen/code-coverage-filter.c. Reviewers: marco-c Reviewed By: marco-c Subscribers: cfe-commits, sylvestre.ledru Differential Revision:
  2329. [X86] Add support for matching PACKUSWB from a v64i8 shuffle.
  2330. [X86] Add test case to show missed opportunity to use PACKUSWB in v64i8 shuffle lowering.
  2331. Sink BuryPointer from Clang into LLVM for reuse there
  2332. Move BuryPointer from Clang to LLVM for use in other LLVM tools Specifically planning to use this in llvm-symbolizer to remove the cost of cleanup there.
  2333. [X86][SSE] Add shuffle demanded elts test case for PR39549
  2334. [AST][NFC] Pack CXXDefaultInitExpr Use the newly available space in the bit-fields of Stmt. This saves one pointer per CXXDefaultInitExpr.
  2335. [AST][NFC] Pack CXXDefaultArgExpr Use the newly available space in the bit-fields of Stmt. This saves one pointer per CXXDefaultArgExpr.
  2336. [AST][NFC] Pack CXXThrowExpr Use the newly available space in the bit-fields of Stmt. This saves 8 bytes per CXXThrowExpr.
  2337. [llvm-objdump] Print a blank row at the end of sections Summary: When using option `-x` (--all-headers), it will print `Sections`, `Symbol Table`, `Program Header` ... `Sections` and `Symbol Table` will be connected together. Before: ``` Sections: Idx Name Size Address Type 0 00000000 0000000000000000 ... 29 .shstrtab 0000011a 0000000000000000 SYMBOL TABLE: ... ``` After: ``` Sections: Idx Name Size Address Type 0 00000000 0000000000000000 ... 29 .shstrtab 0000011a 0000000000000000 SYMBOL TABLE: ... ``` Reviewers: Higuoxing Reviewed By: Higuoxing Subscribers: llvm-commits, jhenderson Differential Revision:
  2338. llvm-symbolizer: Avoid calling getFromOffset when the index entry is already available Especially for symbolizer it can be inefficient to have to search through the entire index when it isn't needed - llvm-symbolizer looks up only a few CUs & already has an index available in getUnitForEntry, once it's passed down to DWARFUnitHeader::extract then there's no need for it to call getFromOffset.
  2339. Fix unused variable warning.
  2340. [clang-tidy/checks] Implement a clang-tidy check to verify Google Objective-C function naming conventions 📜 Summary: §1 Description This check finds function names in function declarations in Objective-C files that do not follow the naming pattern described in the Google Objective-C Style Guide. Function names should be in UpperCamelCase and functions that are not of static storage class should have an appropriate prefix as described in the Google Objective-C Style Guide. The function `main` is a notable exception. Function declarations in expansions in system headers are ignored. Example conforming function definitions: ``` static bool IsPositive(int i) { return i > 0; } static bool ABIsPositive(int i) { return i > 0; } bool ABIsNegative(int i) { return i < 0; } ``` A fixit hint is generated for functions of static storage class but otherwise the check does not generate a fixit hint because an appropriate prefix for the function cannot be determined. §2 Test Notes * Verified clang-tidy tests pass successfully. * Used to verify expected output of processing google-objc-function-naming.m Reviewers: benhamilton, hokein, Wizard, aaron.ballman Reviewed By: benhamilton Subscribers: Eugene.Zelenko, mgorny, xazax.hun, cfe-commits Tags: #clang-tools-extra Differential Revision:
  2341. [X86] Don't extend v32i8 multiplies to v32i16 with avx512bw and prefer-vector-width=256.
  2342. [X86] Add test cases to show incorrect use of a 512 bit vector in v32i8 multiply lowering with prefer-vector-width=256. On the min-legal-vector-width test this actually causes some of the v32i16 operations we emitted to be scalarized.
  2343. Reverted r347092 due to the following build fails:
  2344. Add initial scaffolding for the GN build. See "GN build roundtable summary; adding GN build files to the repo" on llvm-dev and cfe-dev for discussion. In particular, this build is completely unsupported. People adding new files to LLVM are not expected to update the GN build files, and reviewers are not supposed to request the gn build files to be updated. This adds just enough to be able to build llvm/lib/Demangle. It requires using a monorepo. This adds a few build config options you can set in (`gn args out/foo --list` for all): - is_debug = true to enable debug builds (defaults to release) - llvm_enable_assertions to toggle assertions (defaults to true) - clang_base_path, if set an absolute path to a locally-built clang to be used as host compiler Differential Revision:
  2345. [X86] Use getUnpackl/getUnpackh instead of hardcoding a shuffle mask.
  2346. Use llvm::copy. NFC
  2347. [llvm-objcopy] Use llvm::all_of and rename the variables "Segment" to avoid confusion with the type of the same name
  2348. [hwasan] don't check tail magic when in right_align mode (should fix the bot)
  2349. [clangd] Fix crash hovering on non-decltype trailing return Summary: More specifically, hovering on "auto" in auto main() -> int { return 0; } Signed-off-by: Marc-Andre Laperle <> Reviewers: ilya-biryukov Reviewed By: ilya-biryukov Subscribers: ilya-biryukov, ioeric, MaskRay, jkorous, arphaman, kadircet, cfe-commits Differential Revision:
  2350. [hwasan] make the heap-buffer-overflow.c test more robust and re-enable it. With malloc_align_right the relative offsets of heap chunks are less predictable, so simply don't test for them.
  2351. [hwasan] implement free_checks_tail_magic=1 Summary: With free_checks_tail_magic=1 (default) HWASAN writes magic bytes to the tail of every heap allocation (last bytes of the last granule, if the last granule is not fully used) and checks these bytes on free(). This feature will detect buffer overwrites within the last granule at the time of free(). This is an alternative to malloc_align_right=[1289] that should have fewer compatibility issues. It is also weaker since it doesn't detect read overflows and reports bugs at free() instead of at access. Reviewers: eugenis Subscribers: kubamracek, delcypher, #sanitizers, llvm-commits Differential Revision:
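
The tail-magic idea described in the entry above can be sketched as a toy model. This is not HWASAN's implementation: the 16-byte granule size, the magic value, and every name (`ToyChunk`, `toyAllocate`, `tailIsIntact`) are assumptions made for illustration only.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed granule size and magic value; both are illustrative, not HWASAN's.
constexpr std::size_t kGranuleSize = 16;
constexpr std::uint8_t kTailMagic = 0xA5;

// A toy "allocation": user bytes followed by the unused tail of the last granule.
struct ToyChunk {
  std::vector<std::uint8_t> storage; // rounded up to a whole number of granules
  std::size_t userSize;              // bytes the caller asked for
};

constexpr std::size_t roundUpToGranule(std::size_t n) {
  return (n + kGranuleSize - 1) / kGranuleSize * kGranuleSize;
}

// On allocation, fill the unused tail of the last granule with magic bytes.
inline ToyChunk toyAllocate(std::size_t size) {
  ToyChunk c{std::vector<std::uint8_t>(roundUpToGranule(size), 0), size};
  for (std::size_t i = size; i < c.storage.size(); ++i)
    c.storage[i] = kTailMagic;
  return c;
}

// At free time, report whether the tail still holds the magic bytes.
inline bool tailIsIntact(const ToyChunk &c) {
  for (std::size_t i = c.userSize; i < c.storage.size(); ++i)
    if (c.storage[i] != kTailMagic)
      return false;
  return true;
}
```

In this model a write one byte past a 20-byte allocation changes a tail byte, so `tailIsIntact` returns false when the chunk is freed. A read past the end leaves the tail untouched, which matches the entry's note that read overflows are not detected.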
  2352. Moved dag-combine-select-undef.ll into amdgpu. NFC. The test really needs a target arch to be specified.
  2353. Make git-llvm python3 compatible again. Hopefully. :)
  2354. Fixed test after r347110 Comments in llc outputs are printed differently on different platforms, some with '#', some with '##'. Removed non-essential part of the checks.
  2355. Add missing test for r347072 -gcodeview-ghash
  2356. DAG combiner: fold (select, C, X, undef) -> X Differential Revision:
  2357. [CMake] Use lld and llvm-objcopy for first stage compiler in Fuchsia When cross-compiling the second stage to a different target, we need to make sure that the first-stage compiler can produce binaries for that target. Using lld and llvm-objcopy as the default linker and objcopy tool eliminates some of the dependencies on the host toolchain. Differential Revision:
  2358. [hwasan] use reads instead of writes in a test
  2359. Revert "Cast the 2nd argument of _Unwind_SetIP() to _Unwind_Ptr" _Unwind_Ptr is unknown on some targets. Detected on green-dragon-21 (MacPro Late 2013 | OS X 10.14(18A391) | Xcode 10.1(10B61)).
  2360. [X86] Add custom promotion of narrow fp_to_uint/fp_to_sint operations under -x86-experimental-vector-widening-legalization. This tries to force the result type to vXi32 followed by a truncate. This can help avoid scalarization that would otherwise occur. There are some annoying examples of an avx512 truncate instruction followed by a packus where we should really be able to just use one truncate. But overall this is still a net improvement.
  2361. Speed up git-llvm script by only svn up'ing affected directories. Also, support modifications to toplevel files in git (which need to be committed to "monorepo-root" in svn). Differential Revision:
  2362. Cast the 2nd argument of _Unwind_SetIP() to _Unwind_Ptr This modification is required for NetBSD with GCC, as there is a custom unwind.h header implementation with different types. No functional change intended for others. Cherry-picked chunk from D33878.
  2363. Cast _Unwind_GetIP() and _Unwind_GetRegionStart() to uintptr_t This modification is required for NetBSD with GCC, as there is a custom unwind.h header implementation with different types. No functional change intended for others. Cherry-picked chunk from D33878.
  2364. [X86] Qualify part of the masked gather handling in ReplaceNodeResults with a getTypeAction call to know if we can use default legalization. If we managed to switch to -x86-experimental-vector-widening-legalization this block can be removed.
  2365. [sanitizer] Update global_symbols.txt
  2366. [WebAssembly] Cleanup unused declares in test code. NFC. In one case we should probably have been using it; in the other it looks like it was redundant. Differential Revision:
  2367. [SimpleLoopUnswitch] adding cost multiplier to cap exponential unswitch with We need to control exponential behavior of loop-unswitch so we do not get run-away compilation. The suggested solution is to introduce a multiplier for an unswitch cost that makes the cost prohibitive as soon as there are too many candidates and too many sibling loops (meaning we have already started duplicating loops by unswitching). It does solve the currently known problem with compile-time degradation (PR 39544). Tests are built on top of the recently implemented CHECK-COUNT-<num> FileCheck directive. Reviewed By: chandlerc, mkazantsev Differential Revision:
  2368. [OPENMP]Fix PR39694: do not capture `this` in non-`this` region. If lambda is used inside of the OpenMP region and captures `this`, we should recapture it in the OpenMP region also. But we should do this only if the OpenMP region is used in the context of the same class, just like the lambda.
  2369. [X86] Remove a branch on SSE4.1 from LowerLoad We should be able to use getExtendInVec with or without sse4.1 to produce a SIGN_EXTEND_VECTOR_INREG.
  2370. [LegalizeVectorOps] After custom legalizing an extending load or a truncating store, make sure the custom code is also legal. For example, on X86 we emit a sign_extend_vector_inreg from LowerLoad and without sse4.1 this node will need further legalization. Previously this sign_extend_vector_inreg was being custom lowered during DAG legalization instead of vector op legalization. Unfortunately, this doesn't seem to matter for the output of any existing lit tests.
  2371. [X86] In LowerLoad, fix assert messages and rename a variable that used Zize instead of Size. NFC
  2372. Preprocessing support in tablegen. Differential Revision:
  2373. [hwasan] disable one test line while investigating a bot failure
  2374. [PowerPC][NFC] Add tests for vector fp <-> int conversions This NFC patch just adds test cases for conversions that currently require scalarization of vectors. An upcoming patch will change the legalization for these and it is more suitable on the review to show the differences in code gen rather than just the new code gen.
  2375. AArch64: Emit a call frame instruction for the shadow call stack register. When unwinding past a function that uses shadow call stack, we must subtract 8 from the value of the x18 register. This patch causes us to emit a call frame instruction that causes that to happen. Differential Revision:
  2376. Add new interceptor for mi_vector_hash(3) Summary: mi_vector_hash(3) provides fast 32bit hash functions. Add a test for this interface. Enable the API for NetBSD. Based on original work by Yang Zheng. Reviewers: joerg, vitalybuka Reviewed By: vitalybuka Subscribers: tomsun.0.7, kubamracek, llvm-commits, mgorny, #sanitizers Tags: #sanitizers Differential Revision:
  2377. [FNeg] Add FNeg Instruction to LangRef document The FNeg IR Instruction code was added with D53877. Differential Revision:
  2378. [libcxx] Add missing includes in tests A bunch of unordered containers tests call library functions but don't directly include the corresponding header files: - fabs() (defined in <cmath> which is not included); - is_permutation() (defined in <algorithm> which is not included); - next() (defined in <iterator> which is not included). As a result, these tests won't compile against some conformant libraries. Reviewed as Thanks to Andrey Maksimov for the patch.
  2379. Add new interceptor for getmntinfo(3) from NetBSD Summary: getmntinfo gets information about mounted file systems. Add a dedicated test for new interceptor. Based on original work by Yang Zheng. Reviewers: joerg, vitalybuka Reviewed By: vitalybuka Subscribers: tomsun.0.7, kubamracek, llvm-commits, mgorny, #sanitizers Tags: #sanitizers Differential Revision:
  2380. [hwasan] optionally right-align heap allocations Summary: ... so that we can find intra-granule buffer overflows. The default is still to always align left. It remains to be seen whether we can enable this mode at scale. Reviewers: eugenis Reviewed By: eugenis Subscribers: jfb, dvyukov, kubamracek, delcypher, #sanitizers, llvm-commits Differential Revision:
  2381. [OPENMP][NVPTX]Emit correct reduction code for teams/parallel reductions. Fixed previously committed code for the reduction support in teams/parallel constructs taking into account new design of the NVPTX support in the compiler. Teams reductions are not fully functional yet; this will be fixed in the following patches.
  2382. [MSP430] Add RTLIB::[SRL/SRA/SHL]_I32 lowering to EABI lib calls Patch by Kristina Bessonova! Differential Revision:
  2383. [X86] Disable Condbr_merge pass Disable Condbr_merge pass for now due to PR39658. Will reenable the pass once the bug is fixed.
  2384. Removed off-line builders clang-cmake-mips and clang-cmake-mipsel, slaves mips-kl-m001, mips-kl-m002, mips-kl-erpro001.
  2385. Revert "[PowerPC] Make no-PIC default to match GCC - LLVM" This reverts commit r347069
  2386. Revert "[PowerPC] Make no-PIC default to match GCC - CLANG" This reverts commit r347070
  2387. [MSP430] Use R_MSP430_16_BYTE type for FK_Data_2 fixup Linker fails to link example like this (simplified case from newlib sources): $ cat test.c extern const char _ctype_b[]; struct _t { char *ptr; }; struct _t T = { ((char *) _ctype_b + 3) }; $ cat ctype.c char _ctype_b[4] = { 0, 0, 0, 0 }; LD: test.o:(.data+0x0): warning: internal error: unsupported relocation error We also follow gnu toolchain here, where 2-byte relocation mapped to R_MSP430_16_BYTE, instead of R_MSP430_16. Patch by Kristina Bessonova! Differential Revision:
  2388. [WebAssembly] Default to static reloc model Differential Revision:
  2389. [codeview] Expose -gcodeview-ghash for global type hashing Summary: Experience has shown that the functionality is useful. It makes linking optimized clang with debug info for me a lot faster, 20s to 13s. The type merging phase of PDB writing goes from 10s to 3s. This removes the LLVM cl::opt and replaces it with a metadata flag. After this change, users can do the following to use ghash: - add -gcodeview-ghash to compiler flags - replace /DEBUG with /DEBUG:GHASH in linker flags Reviewers: zturner, hans, thakis, takuto.ikuta Subscribers: aprantl, hiraditya, JDevlieghere, llvm-commits Differential Revision:
  2390. [PowerPC] Make no-PIC default to match GCC - CLANG Make the default -fno-PIC on Power PC. Differential Revision:
  2391. [PowerPC] Make no-PIC default to match GCC - LLVM Set -fno-PIC as the default option. Differential Revision:
  2392. [CMake] Accept ENTITLEMENTS in add_llvm_executable and llvm_codesign Summary: Allow code-signing with entitlements. FORCE may be used to avoid an error when replacing existing signatures. Reviewers: beanz, bogner Reviewed By: beanz Subscribers: mgorny, llvm-commits, lldb-commits Differential Revision:
  2393. [SelectionDAG] Move (repeated) SDTIntShiftDOp double shift node def to common code. NFCI. Prep work for PR39467.
  2394. [X86] Add codegen tests for scalar funnel shifts
  2395. GlobalDCE: Teach isEmptyFunction() to ignore debug intrinsics. This fixes PR39669.
  2396. [AST][NFC] Pack CXXThisExpr Use the newly available space in the bit-fields of Stmt. This saves 8 bytes per CXXThisExpr.
  2397. [AST][NFC] Pack CXXNullPtrLiteralExpr Use the newly available space in the bit-fields of Stmt. This saves one pointer per CXXNullPtrLiteralExpr.
  2398. [AST][NFC] Pack CXXBoolLiteralExpr Use the newly available space in Stmt. This saves 8 bytes per CXXBoolLiteralExpr.
  2399. [CodeGen] Expose some data types and accessors from StackMaps Summary: This is for supporting custom stack map formats, where the custom printer can access the stack map data. Patch by Cherry Zhang <>. Related: Reviewers: thanm, apilipenko Reviewed By: apilipenko Subscribers: llvm-commits Differential Revision:
  2400. [InstSimplify] add tests for saturating add/sub; NFC These are baseline tests for D54532. Patch based on the original tests by: @nikic (Nikita Popov)
  2401. [OpenCL] Enable address spaces for references in C++ Added references to the addr spaces deduction and enabled CL2.0 features (program scope variables and storage class qualifiers) to work in C++ mode too. Fixed several address space conversion issues in CodeGen for references. Differential Revision:
  2402. [InstSimplify] add test to demonstrate undef matching differences; NFC This is a baseline test for D54631. Patch by: @nikic (Nikita Popov)
  2403. [X86][SSE] Move number of input limit out of resolveTargetShuffleInputs. Only combineX86ShufflesRecursively needs this limit.
  2404. [clang-tidy] Expanded a test NFC Expanded the readability-inconsistent-declaration-parameter-name-macros.cpp to check notes and added a test with pasted tokens.
  2405. [libcxx] Mention restriction on inline namespaces in LIBCXX_ABI_NAMESPACE docs I also kept the original "vague" documentation saying that users are responsible for not breaking us. This doesn't mean anything because there's no way they can actually enforce that unless we restrict ourselves to a specific naming scheme, but I left the documentation because it acts as a good warning and gives us more leeway.
  2406. [x86] regenerate complete checks for test; NFC
  2407. [IRVerifier] Allow StructRet in statepoint Summary: StructRet attribute is not allowed in vararg calls. The statepoint intrinsic is vararg, but the wrapped function may be not. Allow calls of statepoint with StructRet arg, as long as the wrapped function is not vararg. Reviewers: thanm, anna Reviewed By: anna Subscribers: anna, llvm-commits Differential Revision:
  2408. [DWARF] Use PRIx64 instead of 'x' to format 64-bit values This is a follow-up to r346715. Use PRIx64 to format the 64-bit value printed in `DWARFDebugLoclists::LocationList::dump`, which avoids problems on big-endian hosts.
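
A minimal sketch of the formatting pattern named in the entry above: the `PRIx64` macro from `<cinttypes>` expands to the conversion specifier that matches `uint64_t` on the host, so the argument width and the format always agree. The helper name `formatHex64` is hypothetical and is not part of the DWARF dumper.

```cpp
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Format a 64-bit value as lowercase hex. A plain "%x" would expect an
// unsigned int, which mismatches a 64-bit argument.
inline std::string formatHex64(std::uint64_t value) {
  char buf[17]; // up to 16 hex digits plus the terminating NUL
  std::snprintf(buf, sizeof(buf), "%" PRIx64, value);
  return std::string(buf);
}
```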
  2409. [X86] X86DAGToDAGISel::matchBitExtract(): extract 'lshr' from `X` Summary: As discussed in previous review, and noted in the FIXME, if `X` is actually an `lshr Y, Z` (logical!), we can fold the `Z` into `control`, and let the `BEXTR` do this too. We could just insert those 8 bits of shift amount into control, but it is better to instead zero-extend them, and 'or' them in place. We can only do this for `lshr`, not `ashr`, because we do not know that the mask covers only the bits of `Y`, and not any of the sign-extended bits. The obvious question is, is this actually legal to do? I believe it is. Relevant quotes, from `Intel® 64 and IA-32 Architectures Software Developer’s Manual`, `BEXTR — Bit Field Extract`: * `Bit 7:0 of the second source operand specifies the starting bit position of bit extraction.` * `A START value exceeding the operand size will not extract any bits from the second source operand.` * `Only bit positions up to (OperandSize -1) of the first source operand are extracted.` * `All higher order bits in the destination operand (starting at bit position LENGTH) are zeroed.` * `The destination register is cleared if no bits are extracted.` FIXME: if we can do this, I wonder if we should prefer `BEXTR` over `BZHI` in such cases. Reviewers: RKSimon, craig.topper, spatel, andreadb Reviewed By: RKSimon, craig.topper, andreadb Subscribers: llvm-commits Differential Revision:
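
The fold described in the entry above can be checked with a small software model of `BEXTR`. This is a sketch written from the manual quotes in the commit message, with the length taken from bits 15:8 of the control operand. The function names `bextr64` and `makeBextrControl` are hypothetical and do not come from the X86 backend.

```cpp
#include <cstdint>

// Build a control word: start in bits 7:0, length in bits 15:8.
constexpr std::uint32_t makeBextrControl(std::uint32_t start, std::uint32_t len) {
  return (start & 0xFF) | ((len & 0xFF) << 8);
}

// Software model of a 64-bit BEXTR, following the quoted manual text.
constexpr std::uint64_t bextr64(std::uint64_t src, std::uint32_t control) {
  const std::uint32_t start = control & 0xFF;
  const std::uint32_t len = (control >> 8) & 0xFF;
  if (start >= 64 || len == 0)
    return 0; // no bits extracted, destination is cleared
  const std::uint64_t shifted = src >> start; // logical shift, zeros come in
  if (len >= 64)
    return shifted;
  return shifted & ((std::uint64_t{1} << len) - 1);
}

// Extracting 8 bits from (Y lshr 4) equals extracting 8 bits of Y from bit 4.
static_assert(bextr64(0xABCDu >> 4, makeBextrControl(0, 8)) ==
                  bextr64(0xABCDu, makeBextrControl(4, 8)),
              "shift amount folds into the start field");
static_assert(bextr64(0xABCDu, makeBextrControl(4, 8)) == 0xBC,
              "bits 11:4 of 0xABCD");
```

The model only works for a logical shift because `src >> start` shifts in zeros. An arithmetic shift would bring in sign bits that the start field cannot reproduce, which is the restriction the entry states.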
  2410. Remove BUILD file from google-benchmark This was removed in r336666, but accidentally re-added in r346984.
  2411. [TargetLowering] Cleanup more of the EXTEND demanded bits cases so that they match. NFCI. Use the same variable names etc.
  2412. [clangd] Truncate SymbolID to 8 bytes. Summary: This is our goal. It has a non-zero risk, but so far we haven't seen any collision (externally and internally). Reviewers: sammccall Subscribers: ilya-biryukov, ioeric, MaskRay, jkorous, arphaman, kadircet, cfe-commits Differential Revision:
  2413. [RISCV][NFC] Define and use the new CA instruction format The RISC-V ISA manual was updated on 2018-11-07 (commit 00557c3) to define a new compressed instruction format, RVC format CA (no actual instruction encodings were changed). This patch updates the RISC-V backend to define the new format, and to use it in the relevant instructions. Differential Revision: Patch by Luís Marques.
  2414. [RISCV] Constant materialisation for RV64I This commit introduces support for materialising 64-bit constants for RV64I, making use of the RISCVMatInt::generateInstSeq helper in order to share logic for immediate materialisation with the MC layer (where it's used for the li pseudoinstruction). test/CodeGen/RISCV/imm.ll is updated to test RV64, and gains new 64-bit constant tests. It would be preferable if anyext constant returns were sign rather than zero extended (see PR39092). This patch simply adds an explicit signext to the returns in imm.ll. Further optimisations for constant materialisation are possible, most notably for mask-like values which can be generated by loading -1 and shifting right. A future patch will standardise on the C++ codepath for immediate selection on RV32 as well as RV64, and then add further such optimisations to RISCVMatInt::generateInstSeq in order to benefit both RV32 and RV64 for codegen and li expansion. Differential Revision:
  2415. [MSP430] Add support for .refsym directive Introduces support for '.refsym' assembler directive. From GCC docs (for MSP430): '.refsym' - This directive instructs assembler to add an undefined reference to the symbol following the directive. No relocation is created for this symbol; it will exist purely for pulling in object files from archives. Patch by Kristina Bessonova! Differential Revision:
  2416. [MSP430] Add more tests for ABI and calling convention Patch by Kristina Bessonova! Differential Revision:
  2417. [clangd] Fix a compiler warning and test crashes caused in rL347038.
  2418. Introduce shard storage to auto-index. Reviewers: sammccall, ioeric Reviewed By: sammccall Subscribers: llvm-commits, mgorny, Eugene.Zelenko, ilya-biryukov, jkorous, arphaman, cfe-commits Differential Revision:
  2419. [DAGCombine] Fix non-deterministic debug output PR37970 reported non-deterministic debug output; this was caused by iterating through a set rather than a vector. bugzilla: Differential Revision:
  2420. [clangd] Initial clang-tidy diagnostics support. Summary: This runs checks over a restricted subset of the TU: - preprocessor callbacks just receive the truncated PP events that occur when a preamble is used. - ASTMatchers run only over the top-level decls in the main-file This patch just turns on one simple check (bugprone-sizeof-expression) with no configuration. Configuration is complex enough to warrant a separate patch. This depends on a patch allowing traversal to be restricted to a scope. Reviewers: hokein Subscribers: srhines, mgorny, ilya-biryukov, ioeric, MaskRay, jkorous, arphaman, kadircet, cfe-commits Differential Revision:
  2421. [clang] - Simplify tools::SplitDebugName. This should be NFC change. SplitDebugName recently started to accept the `Output` that can be used to simplify the logic a bit, also it seems that code in SplitDebugName that uses OPT_fdebug_compilation_dir is simply dead. Differential revision:
  2422. [LegalizeVectorTypes] Teach WidenVecRes_Convert to turn ANY_EXTEND into ANY_EXTEND_VECTOR_INREG when the input and output types need to be widened to the same width. If we don't do it here, DAGCombine will just end up creating it from the scalar any_extend+build_vector so might as well save a step.
  2423. [ThinLTO] Internalize readonly globals An attempt to recommit r346584 after failure on OSX build bot. Fixed cache key computation in ThinLTOCodeGenerator and added test case
  2424. [X86] Add custom type legalization for v2i8/v4i8/v8i8 mul under -x86-experimental-vector-widening. By promoting the multiply early to use an i16 element type, we avoid having op legalization emit a second multiply for the 8 upper elements of the v16i8 type we would otherwise get.
  2425. [X86] Add some test cases for vector multiplies on vectors shorter than 128 bits with -x86-experimental-vector-widening-legalization.
  2426. AMDGPU: Fix analyzeBranch failing with pseudoterminators If a block had one of the _term instructions used for gluing exec modifying instructions to the end of the block, analyzeBranch would fail, preventing the verifier from catching a broken successor list.
  2427. [CMake] Support cross-compiling with Fuchsia toolchain build When second stage is being cross-compiled for a different platform we need to build enough of first stage runtimes to get a working compiler. Differential Revision:
  2428. [CMake] Support cross-compiling with multi-stage builds When using multi-stage builds, we would like support cross-compilation. Example is 2-stage build when the first stage is compiled for host while the second stage is compiled for the target. Normally, the second stage would be also used for compiling runtimes, but that's not possible when cross-compiling, so we use the first stage compiler instead. However, we still want to use the second stage paths. To do so, we set the -resource-dir of the first stage compiler to point to the resource directory of the second stage. We also need compiler tools that support the target architecture. These tools are not guaranteed to be present on the host, but in case of multi-stage build, we can build these tools in the first stage. Differential Revision:
  2429. [compiler-rt] Use exact spelling when building for default target When building for default target only, use exact target spelling when deriving the name for the per-target runtime directory. This is necessary for AArch32 where the CMake build by default rewrites the architecture which leads to unexpected results. Differential Revision:
  2430. [CMake] Use the correct spelling for armv7 in Fuchsia's toolchain We need to explicitly specify the architecture version. Differential Revision:
  2431. [Clang][Sema]Choose a better candidate in overload function call if there is a compatible vector conversion instead of ambiguous call error There are 2 function variations with vector type parameter. When we call them with argument of different vector type we would prefer to choose the variation with implicit argument conversion of compatible vector type instead of incompatible vector type. For example, typedef float __v4sf __attribute__((__vector_size__(16))); void f(vector float); void f(vector signed int); int main() { __v4sf a; f(a); } Here, we'd like to choose f(vector float) but not report an ambiguous call error. Differential revision:
  2432. [NativePDB] Rewrite the PdbSymUid to use our own custom namespacing scheme. Originally we created our 64-bit UID scheme by using the first byte as sort of a "tag" to represent what kind of symbol this was, and we re-used the PDB_SymType enumeration for this. For native pdb support, this is not really the right abstraction layer, because what we really want is something that tells us *how* to find the symbol. This means, specifically, is it in the globals stream / public stream / module stream / TPI stream / etc, and for whichever one it is in, where is it within that stream? A good example of why the old namespacing scheme was insufficient is that it is more or less impossible to create a uid for a field list member of a class/struct/union/enum that tells you how to locate the original record. With this new scheme, the first byte is no longer a PDB_SymType enum but a new enum created specifically to identify where in the PDB this record lives. This gives us much better flexibility in what kinds of symbols the uids can identify.
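
A rough sketch of a tagged 64-bit uid of the kind the entry above describes. The enum values, the choice of the most significant byte as the "first byte", and all names (`ToyUidKind`, `makeUid`, `uidKind`, `uidPayload`) are assumptions for illustration and are not taken from the LLDB sources.

```cpp
#include <cstdint>

// Hypothetical "where does this record live" tag, one byte wide.
enum class ToyUidKind : std::uint8_t {
  Compiland = 1,
  GlobalSym = 2,
  Type = 3,
  FieldListMember = 4,
};

// The remaining 56 bits carry the location within the chosen stream.
constexpr std::uint64_t kPayloadMask = (std::uint64_t{1} << 56) - 1;

constexpr std::uint64_t makeUid(ToyUidKind kind, std::uint64_t payload) {
  return (static_cast<std::uint64_t>(kind) << 56) | (payload & kPayloadMask);
}

constexpr ToyUidKind uidKind(std::uint64_t uid) {
  return static_cast<ToyUidKind>(uid >> 56);
}

constexpr std::uint64_t uidPayload(std::uint64_t uid) {
  return uid & kPayloadMask;
}

static_assert(uidKind(makeUid(ToyUidKind::Type, 0x1234)) == ToyUidKind::Type,
              "tag round-trips");
static_assert(uidPayload(makeUid(ToyUidKind::Type, 0x1234)) == 0x1234,
              "payload round-trips");
```

Because the tag says where to look instead of what the symbol is, the payload can be interpreted per stream, for example as an offset into a module stream or as a type index.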
  2433. [VFS] Update unittest to fix Windows buildbot. Buildbot is failing because it doesn't like paths in VFS, make them more Windows-friendly. Follow up to r347009.
  2434. Revert r347014 "[X86] Add some test cases for vector multiplies on vectors shorter than 128 bits with -x86-experimental-vector-widening-legalization." Apparently I failed to update this after turning sign extend to any extend.
  2435. [X86] Add some test cases for vector multiplies on vectors shorter than 128 bits with -x86-experimental-vector-widening-legalization.
  2436. Added missing whitespace in the link.
  2437. [VFS] Implement `RedirectingFileSystem::getRealPath`. It fixes the case when Objective-C framework is added as a subframework through a symlink. When parent framework infers a module map and fails to detect a symlink, it would add a subframework as a submodule. And when we parse module map for the subframework, we would encounter an error like > error: umbrella for module 'WithSubframework.Foo' already covers this directory By implementing `getRealPath` "an egregious but useful hack" in `ModuleMap::inferFrameworkModule` works as expected. LLVM commit is r347009. rdar://problem/45821279 Reviewers: bruno, benlangmuir, erik.pilkington Reviewed By: bruno Subscribers: hiraditya, dexonsmith, JDevlieghere, cfe-commits, llvm-commits Differential Revision:
  2438. [X86] Use ANY_EXTEND instead of SIGN_EXTEND in the AVX2 and later path for legalizing vXi8 multiply. We aren't going to use the upper bits of the multiply result that the extend would affect. So we don't need a specific type of extend. This makes some reduction test cases shorter because we were previously trying to sign_extend a truncate which we can't eliminate.
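
The arithmetic behind the entry above can be checked on scalars: the low 8 bits of a product depend only on the low 8 bits of each operand, so the bits chosen for the upper half of each widened element do not matter once the result is truncated back to i8. The helper name `lowByteOfProduct` is hypothetical.

```cpp
#include <cstdint>

// Multiply two 16-bit lanes and keep only the low byte, as a truncate would.
constexpr std::uint8_t lowByteOfProduct(std::uint16_t a, std::uint16_t b) {
  return static_cast<std::uint8_t>(static_cast<std::uint32_t>(a) *
                                   static_cast<std::uint32_t>(b));
}

// 0x9C widened three ways: sign-extended, zero-extended, arbitrary upper bits.
static_assert(lowByteOfProduct(0xFF9C, 0x0037) == lowByteOfProduct(0x009C, 0x0037),
              "sign vs zero extend agree in the low byte");
static_assert(lowByteOfProduct(0x5A9C, 0xC337) == lowByteOfProduct(0x009C, 0x0037),
              "arbitrary upper bits agree in the low byte");
```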
  2439. [X86] Update a couple comments to remove a mention of a sign extending that no longer happens. NFC
  2440. [VFS] Implement `RedirectingFileSystem::getRealPath`. It fixes the case when Objective-C framework is added as a subframework through a symlink. When parent framework infers a module map and fails to detect a symlink, it would add a subframework as a submodule. And when we parse module map for the subframework, we would encounter an error like > error: umbrella for module 'WithSubframework.Foo' already covers this directory By implementing `getRealPath` "an egregious but useful hack" in `ModuleMap::inferFrameworkModule` works as expected. rdar://problem/45821279 Reviewers: bruno, benlangmuir, erik.pilkington Reviewed By: bruno Subscribers: hiraditya, dexonsmith, JDevlieghere, cfe-commits, llvm-commits Differential Revision:
  2441. [AMDGPU] Add FixupVectorISel pass, currently Supports SREGs in GLOBAL LD/ST Add a pass to fixup various vector ISel issues. Currently we handle converting GLOBAL_{LOAD|STORE}_* and GLOBAL_Atomic_* instructions into their _SADDR variants. This involves feeding the sreg into the saddr field of the new instruction.
  2442. [CUDA] updated CompileCudaWithLLVM.rst Differential Revision:
  2443. [analyzer] ConversionChecker: handle floating point Extend the alpha.core.Conversion checker to handle implicit conversions where a too large integer value is converted to a floating point type. Each floating point type has a range where it can exactly represent all integers; we emit a warning when the integer value is above this range. Although it is possible to exactly represent some integers which are outside of this range (those that are divisible by a large enough power of 2); we still report casts involving those, because their usage may lead to bugs. (For example, if 1<<24 is stored in a float variable x, then x==x+1 holds.) Patch by: Donát Nagy! Differential Revision:
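
The range mentioned in the entry above can be illustrated with a short sketch. It assumes IEEE 754 `float` (24-bit significand) and `double` (53-bit significand). The helper names are hypothetical and this is not the checker's code.

```cpp
#include <cstdint>
#include <limits>

// Largest N such that every integer in [-N, N] is exactly representable.
// Intended for float and double only; wider significands would overflow int64_t.
template <typename FloatT>
constexpr std::int64_t maxContiguousExactInteger() {
  static_assert(std::numeric_limits<FloatT>::digits < 63,
                "significand too wide for this sketch");
  return std::int64_t{1} << std::numeric_limits<FloatT>::digits;
}

// True when the value lies outside the range where all integers are exact.
template <typename FloatT>
constexpr bool isOutsideExactRange(std::int64_t value) {
  return value > maxContiguousExactInteger<FloatT>() ||
         value < -maxContiguousExactInteger<FloatT>();
}

// True when adding 1 to x is lost to rounding, i.e. x == x + 1.
inline bool incrementIsAbsorbed(float x) {
  volatile float next = x + 1.0f; // volatile store forces rounding to float
  return next == x;
}
```

With these assumptions `incrementIsAbsorbed(16777216.0f)` is true, which is the `1<<24` case quoted in the entry, while the same call with `16777215.0f` is false.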
  2444. [WebAssembly] Change type of wake count to unsigned int Summary: We discussed this at the Nov 12th CG meeting, and decided to use the unsigned semantics for the wake count. Corresponding spec change: Reviewers: sbc100 Subscribers: dschuff, jgravelle-google, sunfish, jfb, cfe-commits Differential Revision:
  2445. Re-apply r346985: [ADT] Drop llvm::Optional clang-specific optimization for trivially copyable types Remove a test case that was added with the optimization we are now removing.
  2446. [WebAssembly] Split BBs after throw instructions Summary: `throw` instruction is a terminator in wasm, but BBs were not split after `throw` instructions, causing machine instruction verifier to fail. This patch - Splits BBs after `throw` instructions in WasmEHPrepare and adds an unreachable instruction after `throw`, which will be deleted in LateEHPrepare pass - Refactors WasmEHPrepare into two member functions - Changes the semantics of `eraseBBsAndChildren` in LateEHPrepare pass to match that of WasmEHPrepare pass, which is newly added. Now `eraseBBsAndChildren` does not delete BBs with remaining predecessors. - Fixes style nits, making static function names conform to clang-tidy - Re-enables the test temporarily disabled by rL346840 && rL346845 Reviewers: dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision:
  2447. [AMDGPU] NFC Test commit
  2448. AMDHSA: More code object v3 fixes: - Make sure IsaInfo::hasCodeObjectV3 returns true only for AMDHSA - Update assembler metadata tests to use v2 by default
  2449. [clang-tidy] Fix reference to -[NSError init] in AvoidNSErrorInitCheck.h
  2450. Remove myself as owner of clang-query. I haven't been involved with the project for years, so it's probably best for someone else to be the code owner. Differential Revision:
  2451. [CMake] Explicitly list Linux targets for Fuchsia toolchain Not all Linux targets use the ${arch}-linux-gnu spelling, so instead specify the list of Linux targets explicitly. Differential Revision:
  2452. Fix parens warning in assert in ASTMatchFinder Change-Id: Ie34f9c6846b98fba87449e73299519fc2346bac1
  2453. [X86] Remove ANY_EXTEND special case from canReduceVMulWidth Removing this code doesn't affect any lit tests so it doesn't appear to be tested anymore. I assume it was when it was added, but I guess something else changed? Code coverage report also says it's unused. I mostly didn't like that it seemed to count the sign bits as if it was a sign_extend, but then set isPositive as if it was a zero_extend. It feels like we should have picked one interpretation? Differential Revision:
  2454. [AMDGPU] Update code object metadata format documentation * Add amdhsa prefix to names to allow other tools to use the metadata without collision. * Make names consistent. * Simplify structure. * Change note record ID. * Switch from YAML to MsgPack format. * Document metadata assembler directive. Patch By: t-tye (Tony Tye) Differential Revision:
  2455. Revert "[ADT] Drop llvm::Optional clang-specific optimization for trivially copyable types" This reverts commit r346985. It looks like one of the unittests also needs to be updated, reverting while I investigate.
  2456. Disable filesystem benchmark when libstdc++ doesn't support it
  2457. [ADT] Drop llvm::Optional clang-specific optimization for trivially copyable types Summary: This fixes ABI mismatches between llvm compiled with clang and llvm compiled with gcc (PR39427). Reviewers: bkramer, sylvestre.ledru, mgorny, hans Reviewed By: bkramer, hans Subscribers: dexonsmith, kristina, llvm-commits Differential Revision:
  2458. Upgrade Google Benchmark library to ToT
  2459. [X86] Minor cleanup to getExtendInVec. NFCI Use unsigned to calculate the subvector index to avoid a cast. Remove an unnecessary condition and replace it with a stronger assert. Use the InVT variable we updated when we extracted instead of grabbing it from the In SDValue.
  2460. [InstCombine] adjust rotate direction in tests; NFC Copy/paste errors - all of the changed tests rotated left before.
  2461. [X86] Add -x86-experimental-vector-widening support to reduceVMULWidth and combineMulToPMADDWD In reduceVMULWidth, we no longer need to worry about extending the vector to 128 bits first. Regular widening of extends, muls and shuffles will take care of that for us. In combineMulToPMADDWD, we can handle v2i32 multiplies and allow the VPMADDWD to be widened to v4i32 during type legalization by adding custom widening like we already have for AVG/ADDUS/SUBUS. I had to modify that code a little to allow different input and output VTs. Differential Revision:
  2462. [WebAssembly] Fix return type of nextByte Summary: The old return type did not allow for correct error reporting and was causing a compiler warning. Reviewers: aheejin Subscribers: dschuff, sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision:
  2463. [BinaryFormat] Add MsgPackTypes Add data structure to represent MessagePack "documents" and convert to/from both MessagePack and YAML encodings. Differential Revision:
  2464. [InstCombine] add tests for funnel shift (rotate) canonicalization; NFC
  2465. [X86] Guess that a CPU is Icelake if it reports support for AVX512VBMI2.
  2466. [LTO] Load sample profile in LTO link step. Summary: Load sample profile in LTO link step. ThinLTO calls populateModulePassManager to load the profile Reviewers: tejohnson, davidxl, danielcdh Subscribers: mehdi_amini, inglorion, steven_wu, dexonsmith, llvm-commits Differential Revision:
  2467. [TTI] Reduction costs only need to include a single extract element cost We were adding the entire scalarization extraction cost for reductions, which returns the total cost of extracting every element of a vector type. For reductions we don't need to do this - we just need to extract the 0'th element after the reduction pattern has completed. Fixes PR37731 Differential Revision:
  2468. [AST] Store the string data in StringLiteral in a trailing array of chars Use the newly available space in the bit-fields of Stmt and store the string data in a trailing array of chars after the trailing array of SourceLocation. This cuts the size of StringLiteral by 2 pointers. Also slightly refactor StringLiteral::Create and StringLiteral::CreateEmpty so that StringLiteral::Create is just responsible for the allocation, and the constructor is responsible for doing all the initialization. This matches what is done for the other classes in general. This patch should have no other functional changes apart from this. A concern was raised during review about the interaction between this patch and serialization abbreviations. I believe however that there is currently no abbreviation defined for StringLiteral. The only statements/expressions which have abbreviations are currently DeclRefExpr, IntegerLiteral, CharacterLiteral and ImplicitCastExpr. Differential Revision: Reviewed By: dblaikie, rjmccall
  2469. [InstCombine] fix rotate narrowing bug for non-pow-2 types
  2470. [AST][NFC] Various NFCs in StringLiteral Factored out of D54166 ([AST] Store the string data in StringLiteral in a trailing array of chars): * For-range loops in containsNonAscii and containsNonAsciiOrNull. * Comments and style fixes. * int -> unsigned in mapCharByteWidth since TargetInfo::getCharWidth and friends return an unsigned, and StringLiteral manipulates and stores CharByteWidth as an unsigned.
  2471. [InstCombine] add rotate narrowing tests with odd types; NFC There's a potential miscompile here. It's unlikely in the real world because this transform is guarded with shouldChangeType(), but this test file doesn't include a standard data-layout for some reason (despite including a custom one), so we can see the bug.
  2472. [SLPVectorizer][X86] Regenerate reduction minmax tests and cleanup check prefixes
  2473. [SLPVectorizer][X86] Regenerate reduction tests and add PR37731 test Cleanup check prefixes
  2474. [X86] Fix MCNullStreamer support for modules with a CodeView flag This fixes -filetype=null support when compiling for a Win32 target and the module has a CodeView flag. The only places changed are the uses of getTargetStreamer function - this patch guards both of them with null checks. Committed on behalf of @eush (Eugene Sharygin) Differential Revision:
  2475. [clang-tidy] Update checks to play nicely with limited traversal scope added in r346847 Summary: (See D54204 for original review) Reviewers: hokein Subscribers: xazax.hun, cfe-commits Differential Revision:
  2476. [InstSimplify] delete shift-of-zero guard ops around funnel shifts This is a problem seen in common rotate idioms as noted in: Note that we are not canonicalizing standard IR (shifts and logic) to the intrinsics yet. (Although I've written this before...) I think this is the last step before we enable that transform. Ie, we could regress code by doing that transform without this simplification in place. In PR34924, I questioned whether this is a valid transform for target-independent IR, but I convinced myself this is ok. If we're speculating a funnel shift by turning cmp+br into select, then SimplifyCFG has already determined that the transform is justified. It's possible that SimplifyCFG is not taking into account profile or other metadata, but if that's true, then it's a bug independent of funnel shifts. Also, we do have CGP code to restore a guard like this around an intrinsic if it can't be lowered cheaply. But that isn't necessary for funnel shift because the default expansion in SelectionDAGBuilder includes this same cmp+select. Differential Revision:
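The reasoning in the entry above can be checked with a small model. This is a minimal sketch in Python, not LLVM code: it assumes an 8-bit width for brevity, models `fshl` as "concatenate, shift left by the amount modulo the width, keep the high half", and uses the hypothetical helper name `guarded` for the cmp+select form. Because a funnel shift by zero already returns its first operand, the guard adds nothing.

```python
BITS = 8                      # assumed width, chosen only to keep the check small
MASK = (1 << BITS) - 1

def fshl(a, b, c):
    """Model of a funnel shift left on BITS-wide values."""
    c %= BITS                              # shift amount is taken modulo the width
    wide = (a << BITS) | b                 # concatenate a (high) and b (low)
    return ((wide << c) >> BITS) & MASK    # keep the high BITS of the shifted value

def guarded(a, b, c):
    """The guarded form: select(c == 0, a, fshl(a, b, c))."""
    return a if c == 0 else fshl(a, b, c)

def guard_is_redundant():
    """Compare both forms over a sample of operands and all shift amounts."""
    return all(
        guarded(a, b, c) == fshl(a, b, c)
        for a in range(0, 256, 17)
        for b in range(0, 256, 23)
        for c in range(0, 2 * BITS)
    )
```

A rotate is the special case where both data operands are the same value, for example `fshl(x, x, c)`.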
  2477. [RISCV] Mark C.EBREAK instruction as having side effects C.EBREAK was defined with hasSideEffects = 0, which is incorrect and inconsistent with the non-compressed instruction form. This patch corrects this oversight. This wouldn't cause codegen issues, as compressed instructions are only ever generated by converting the non-compressed form as an MCInst. But having correct flags is still worthwhile. Differential Revision: Patch by Luís Marques.
  2478. [RISCV] Mark FREM as Expand Mark the FREM SelectionDAG node as Expand, which is necessary in order to support the frem IR instruction on RISC-V. This is expanded into a library call. Adds the corresponding test. Previously, this would have triggered an assertion at instruction selection time. Differential Revision: Patch by Luís Marques.
  2479. [AST][NFC] Re-add comment in BinaryOperator which was removed by r346954
  2480. Start adding the supporting code to perform out-of-process allocator enumeration. Summary: This patch introduces the local portion (`LocalAddressSpaceView`) of the `AddressSpaceView` abstraction and modifies the secondary allocator so that the `ForEachChunk()` method (and its callees) would work in the out-of-process case when `AddressSpaceView` is `RemoteAddressSpaceView`. The `AddressSpaceView` abstraction simply maps pointers from a target process to a pointer in the local process (via its `Load()` method). For the local (in-process) case this is a no-op. For the remote (out-of-process) case this is not a no-op. The implementation of the out-of-process `RemoteAddressSpaceView` is not included in this patch and will be introduced later. This patch is considerably simpler than the `ObjectView` abstraction used in previous patches but lacks the type safety and stricter memory management of the `ObjectView` abstraction. This patch does not introduce any tests because with `LocalAddressSpaceView` it should be a non functional change and unit tests already cover the secondary allocator. When `RemoteAddressSpaceView` is landed tests will be added to ensure that it functions as expected. rdar://problem/45284065 Reviewers: kcc, kubamracek, dvyukov, vitalybuka, cryptoad, george.karpenkov, morehouse Subscribers: #sanitizers, llvm-commits Differential Revision:
  2481. [clangd] global-symbol-builder => clangd-indexer
  2482. [AST] Pack BinaryOperator Use the newly available space in the bit-fields of Stmt. This saves 8 bytes per BinaryOperator. Differential Revision: Reviewed By: dblaikie
  2483. [AST] Pack MemberExpr Use the newly available space in the bit-fields of Stmt to store some data from MemberExpr. This saves one pointer per MemberExpr. Differential Revision: Reviewed By: dblaikie
  2484. [AST][NFC] Move the friend decls to the top of MemberExpr The norm is to have them at the top, and having them at the bottom is painful for the reader.
  2485. [AST] Pack UnaryOperator Use the newly available space in the bit-fields of Stmt to store some data from UnaryOperator. This saves 8 bytes per UnaryOperator. Differential Revision: Reviewed By: dblaikie
  2486. Fix warning about unused variable [NFC]
  2487. Add missed files from prev. commit
  2488. [MSP430] Add MC layer Reapply r346374 with the fixes for modules build. Original summary: This change implements assembler parser, code emitter, ELF object writer and disassembler for the MSP430 ISA. Also, more instruction forms are added to the target description. Patch by Michael Skvortsov!
  2489. [clangd] Fix no results returned for global symbols in dexp Summary: For symbols in global namespace (without any scope), we need to add global scope "" to the fuzzy request. Reviewers: ioeric Subscribers: ilya-biryukov, MaskRay, jkorous, arphaman, kadircet, cfe-commits Differential Revision:
  2490. [llvm-objdump] Use `auto` declaration in typecasting Summary: As suggested by `MaskRay`, use `auto` for type inference, in line with the coding standards. Delete some comments, because they can be easily inferred from the code. Reviewers: jhenderson, MaskRay Reviewed By: jhenderson Subscribers: llvm-commits Differential Revision:
  2491. Revert "Introduce shard storage to auto-index." This reverts commit 6dd1f24aead10a8d375d0311001987198d26e900.
  2492. Revert "clang-format" This reverts commit 0a37e9c3d88a2e21863657df2f7735fb7e5f746e.
  2493. Revert "Address comments" This reverts commit 19a39b14eab2b5339325e276262b177357d6b412.
  2494. Revert "Address comments." This reverts commit b43c4d1c731e07172a382567f3146b3c461c5b69.
  2495. Address comments.
  2496. Address comments
  2497. clang-format
  2498. Introduce shard storage to auto-index. Reviewers: sammccall, ioeric Subscribers: ilya-biryukov, jkorous, arphaman, cfe-commits Differential Revision:
  2499. [RISCV] Introduce the RISCVMatInt::generateInstSeq helper Logic to load 32-bit and 64-bit immediates is currently present in RISCVAsmParser::emitLoadImm in order to support the li pseudoinstruction. With the introduction of RV64 codegen, there is a greater benefit of sharing immediate materialisation logic between the MC layer and codegen. The generateInstSeq helper allows this by producing a vector of simple structs representing the chosen instructions. This can then be consumed in the MC layer to produce MCInsts or at instruction selection time to produce the appropriate SelectionDAG nodes. Sharing this logic means that both the li pseudoinstruction and codegen can benefit from future optimisations, and that this logic can be used for materialising constants during RV64 codegen. This patch does contain a behaviour change: addi will now be produced on RV64 when no lui is necessary to materialise the constant. In that case addiw takes x0 as the source register, so is semantically identical to addi. Differential Revision:
  2500. [X86] Add some custom type legalization rules for truncate with -x86-experimental-vector-widening-legalization. This avoids some nasty shuffles when we have avx512. It will also prevent using zmm truncate instructions when a ymm instruction that zeroes part of an xmm register will do. Also avoid using avx512 truncate instructions when the input is 128 bits or less. These instructions are 2 uops on skx so we can probably find a better single uop shuffle like pshufb.
  2501. [X86] Add -x86-experimental-vector-widening-legalization versions of shuffle-vs-trunc tests.
  2502. propagate __config_site includes when building benchmarks
  2503. [WebAssembly] Renumber SIMD bitwise instructions Summary: Changed to match Reviewers: aheejin Subscribers: dschuff, sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision:
  2504. Cosmetic, NFC.
  2505. NFC cleanup: Prefer make_unique over reset(new T())
  2506. Fix combining pragma __debug dump & parser_crash with -E Previously these would be transformed into annotation tokens and the preprocessor would then assume they were real tokens with source locations and assert/UB. Other pragmas that produce annotation tokens aren't a problem because they aren't handled if the parser isn't hooked up - ParsePragma.cpp registers those handlers & isn't run for pure preprocessing. So they're treated as unknown pragmas & printed verbatim by the preprocessor. Perhaps these pragmas should be treated the same way? But they got mixed in with other __debug pragmas that do need to be handled during preprocessing. The third __debug pragma that produces an annotation token is 'captured' - which had its own fix for this issue - by not inserting the annotation token in the first place if it detected that it was in preprocessing mode. I've removed that fix (from Lex/Pragma.cpp) in favor of the more general one in Frontend/PrintPreprocessedOutput.cpp.
  2507. Rewrite-imports on crash: Simplify handling -frewrite-imports already implies -frewrite-includes (it piggy-backs on/extends the implementation) so there's no need to conditionally pass -frewrite-includes when already using -frewrite-imports (& especially I don't think these would want to be different between crash reporting and not crash reporting)
  2508. Stmt bits: Make ExprBits relative to StmtBits Seems like it makes it a bit easier to read/validate/update in the future.
  2509. AMDGPU: Fix check lines in fdot2 test: GCN900 -> GFX900
  2510. [commit-test] Add blank line for test/tools/llvm-objdump/symbol-table-elf.test Summary: Test commit Reviewers: Higuoxing Reviewed By: Higuoxing Subscribers: llvm-commits, Higuoxing Differential Revision:
  2511. AMDGPU: Enable code object v3 for AMDHSA only Differential Revision:
  2512. Work around C++03 decltype limitations
  2513. [X86] Don't mark SEXTLOADS with narrow types as Custom with -x86-experimental-vector-widening-legalization. The narrow types end up requesting widening, but generic legalization will end up scalarizing and using a build_vector to do the widening.
  2514. CGDecl::emitStoresForConstant fix synthesized constant's name Summary: The name of the synthesized constants for constant initialization was using mangling for statics, which isn't generally correct and (in a yet-uncommitted patch) causes the mangler to assert out because the static ends up trying to mangle function parameters and this makes no sense. Instead, mangle to `"__const." + FunctionName + "." + DeclName`. Reviewers: rjmccall Subscribers: dexonsmith, cfe-commits Differential Revision:
  2515. Get tests compiling with -Wunused-local-typedef
  2516. [MachineOutliner][NFC] Check if CandidatesForRepeatedSeq < 2 There's no reason to call getOutliningCandidateInfo with a single candidate.
  2517. [libcxx] [test] Fix Clang -Wunused-local-typedef warnings. C++11's [hash.requirements] never required these typedefs from users.
  2518. [libcxx] [test] Include <cassert> for assert(). This fixes compiler errors with MSVC's STL.
  2519. [libcxx] [test] Fix MSVC warning C4800. This was implicitly converting [1, 3] to bool, which triggers an MSVC warning. The test should just pass `true`, which is simpler, has the same behavior, and avoids the warning. (This is a library test, not a compiler test, and the conversion happens before calling `push_back`, so passing [1, 3] isn't interesting in any way. This resembles a previous change to stop passing `1 == 1` in the `vector<bool>` tests.)
  2520. [X86] Remove unused variable
  2521. [X86] Support v2i32/v4i16/v8i8 load/store using f64 on 32-bit targets under -x86-experimental-vector-widening-legalization. On 64-bit targets the type legalizer will use i64 to legalize these. But when i64 isn't legal, the type legalizer won't try an FP type. So do it manually instead. There are a few regressions in here due to some v2i32 operations like mul and div now being reassembled into a full vector just to store instead of storing the pieces. But this was already occurring in 64-bit mode so it's not a new issue.
  2522. [codeview] Make "clang -g" emit codeview by default when targeting MSVC Summary: If you're using the Microsoft ABI, chances are that you want PDBs and codeview debug info. Currently, everyone has to remember to specify -gcodeview, when it would be nice if the standard -g option did the right thing by default. Also, do some related cleanup of -cc1 options. When targeting the MS C++ ABI, we probably shouldn't pass -debugger-tuning=gdb. We were also passing -gcodeview twice, which is silly. Reviewers: smeenai, zturner Subscribers: aprantl, JDevlieghere, llvm-commits Differential Revision:
  2523. Attempt to show progress bar in benchmark tests
  2524. Exclude check-cxx-benchmarks from the global test target.
  2525. [X86] Update masked expandload/compressstore test names
  2526. [InstSimplify] add more tests for funnel shift with select; NFC The cases are just different enough that we should have complete tests to avoid bugs from typos in the code.
  2527. [MachineOutliner][NFC] Don't compute liveness if X16/X17/NZCV are unused Using the MBB flags, we can tell if X16/X17/NZCV are unused in a block, and also not live out. If this holds for all MBBs, then we can avoid checking for liveness on that candidate. Furthermore, if it holds for an individual candidate's MBB, then we can avoid checking for liveness on that candidate.
  2528. Remove unused getMDNodeFwdRefOrNull interfaces (NFC) Summary: Followup from D53596/r346891. Remove the getMDNodeFwdRefOrNull interface to the MDLoader since it is no longer used. Also improve error messages when the internal implementation is used within the MDLoader. Reviewers: steven_wu Subscribers: llvm-commits Differential Revision:
  2529. [X86][SSE] Add SSE2/SSE42 masked load/store tests Now that the load/store tests are split, the cost of running the tests on multiple (illegal) targets is much lower.
  2530. Bias physical register immediate assignments The machine scheduler currently biases register copies to/from physical registers to be closer to their point of use / def to minimize their live ranges. This change extends this to also cover physical register assignments from immediate values. This causes a reduction in overall register pressure and a minor reduction in spills, and indirectly fixes an out-of-registers assertion (PR39391). Most test changes are from minor instruction reorderings and register name selection changes and direct consequences of that. Reviewers: MatzeB, qcolombet, myatsina, pcc Subscribers: nemanjai, jvesely, nhaehnle, eraman, hiraditya, javed.absar, arphaman, jfb, jsji, llvm-commits Differential Revision:
  2531. [c++20] Implement P0482R6: enable -fchar8_t by default in C++20 mode. This unfortunately results in a substantial breaking change when switching to C++20, but it's not yet clear what / how much we should do about that. We may want to add a compatibility conversion from u8 string literals to const char*, similar to how C++98 provided a compatibility conversion from string literals to non-const char*, but that's not handled by this patch. The feature can be disabled in C++20 mode with -fno-char8_t.
  2532. [ThinLTO] Fix a crash in lazy loading of Metadata This is a revised version of D41474. When the debug location is parsed in BitcodeReader::parseFunction, the scope and inlinedAt MDNodes are obtained via MDLoader->getMDNodeFwdRefOrNull(), which will create a forward ref if they were not yet loaded. Specifically, if one of these MDNodes is in the module level metadata block, and this is during ThinLTO importing, that metadata block is lazily loaded. Most places in that invoke getMDNodeFwdRefOrNull have a corresponding call to resolveForwardRefsAndPlaceholders which will take care of resolving them. E.g. places that call getMetadataFwdRefOrLoad, or at the end of parsing a function-level metadata block, or at the end of the initial lazy load of module level metadata in order to handle invocations of getMDNodeFwdRefOrNull for named metadata and global object attachments. However, the calls for the scope/inlinedAt of debug locations are not backed by any such call to resolveForwardRefsAndPlaceholders. To fix this, change the scope and inlinedAt parsing to instead use getMetadataFwdRefOrLoad, which will ensure the forward refs to lazily loaded metadata are resolved. Fixes PR35472.
  2533. Attempt to add check-cxx-benchmarks rule for libc++ This patch attempts to make certain libc++ builders run the benchmarks using the newly added libc++ LIT based benchmark targets.
  2534. [X86] Split masked load/store test files
  2535. Rename cxx-benchmark-unittests target and convert to LIT. This patch renames the cxx-benchmark-unittests to check-cxx-benchmarks and converts the target to use LIT in order to make the tests run faster and provide better output. In particular this runs each benchmark in a suite one by one, allowing more parallelism while ensuring output isn't garbage with multiple threads. Additionally, it adds the CMake flag '-DLIBCXX_BENCHMARK_TEST_ARGS=<list>' to specify what options are passed when running the benchmarks.
  2536. [X86] Update masked load/store test names
  2537. AMDGPU: Additional pattern for i16 median3 matching min(max(a, b), max(min(a, b), c)) Differential Revision:
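The expression named in the entry above, min(max(a, b), max(min(a, b), c)), evaluates to the median of its three operands. A minimal Python sketch (not the AMDGPU pattern itself; the function names are hypothetical) that checks the identity exhaustively over a small integer range:

```python
from itertools import product

def med3_pattern(a, b, c):
    """The min/max form from the commit title."""
    return min(max(a, b), max(min(a, b), c))

def med3_reference(a, b, c):
    """Reference median: the middle element after sorting."""
    return sorted((a, b, c))[1]

def pattern_matches_median(lo=-4, hi=4):
    """Check the identity for every triple in [lo, hi]."""
    values = range(lo, hi + 1)
    return all(
        med3_pattern(a, b, c) == med3_reference(a, b, c)
        for a, b, c in product(values, repeat=3)
    )
```

The identity holds because c is clamped between min(a, b) and max(a, b): the inner max raises c to at least the smaller of a and b, and the outer min caps it at the larger.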
  2538. Mark @llvm.trap cold A call to @llvm.trap can be expected to be cold (i.e. unlikely to be reached in a normal program execution). Outlining paths which unconditionally trap is an important memory saving. As the hot/cold splitting pass (imho) should not treat all noreturn calls as cold, explicitly mark @llvm.trap cold so that it can be outlined. Split out of Differential Revision:
  2539. [Support] Teach YAMLIO about polymorphic types Add support for "polymorphic" types to YAMLIO. PolymorphicTraits can dynamically switch between other traits (Scalar, Map, or Sequence). When inputting, the PolymorphicTraits type is told which type to become, and when outputting the PolymorphicTraits type is asked which type it currently is. Also add support for TaggedScalarTraits to allow dynamically differentiating between multiple scalar types using YAML tags. Serialize empty maps as "{}" and empty sequences as "[]", so that types are preserved when round-tripping PolymorphicTraits. This change has equivalent semantics, but may break e.g. tests which compare output verbatim. Differential Revision:
  2540. [ThinLTO] Update handling of vararg functions to match inliner Summary: Previously we marked all vararg functions as non-inlinable in the function summary, which prevented their importing. However, the corresponding inliner restriction was loosened in r321940/r342675 to only apply to functions calling va_start. Adjust the summary flag computation to match. Reviewers: davidxl Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, llvm-commits Differential Revision:
  2541. [AST] Fix typo in MicrosoftMangle Correct the spelling from Artifical to Artificial. Differential Revision:
  2542. [InstSimplify] add tests for funnel shift with select; NFC
  2543. [WebAssembly] Add support for dylink section in object format See Differential Revision:
  2544. [X86] Allow pmulh to be formed from narrow vXi16 vectors under -x86-experimental-vector-widening-legalization Narrower vectors will be widened to 128 bits without changing the element size. And generic type legalization can already handle widening mulhu/mulhs. Differential Revision:
  2545. [libcxx] [test] Fix running tests on macOS with python3 Summary: The result of subprocess.check_output() is bytes in python3 which we need to convert to str(). Simplify this by using the executeCommand() helper. Reviewers: ldionne, EricWF Reviewed By: ldionne Subscribers: christof, libcxx-commits Differential Revision:
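The bytes-versus-str issue described in the entry above can be illustrated with a short sketch. It does not reproduce the `executeCommand()` helper the commit switched to; `run_and_capture` is a hypothetical name, and UTF-8 output is an assumption.

```python
import subprocess
import sys

def run_and_capture(args):
    """Run a command and return its stdout as str."""
    out = subprocess.check_output(args)
    # On Python 3, check_output() returns bytes unless text mode is requested,
    # so decode before doing any string comparisons (UTF-8 is assumed here).
    if isinstance(out, bytes):
        out = out.decode("utf-8")
    return out

def example():
    """Run the current interpreter so the sketch has no external dependency."""
    return run_and_capture([sys.executable, "-c", "print('ok')"]).strip()
```

Without the decode step, comparing the result against a str literal on Python 3 would be false, since bytes and str never compare equal.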
  2546. [InstCombine] Remove a couple of asserts based on incorrect assumptions Summary: These asserts are based on the assumption that the order of true/false operands in a select and those in the compare would always be the same. This fixes PR39595. Reviewers: craig.topper, spatel, dmgreen Reviewed By: craig.topper Subscribers: llvm-commits Differential Revision:
  2547. [clangd] Delete unused includes.
  2548. [InstCombine] fix formatting for matchBSwap(); NFC We should have a similar function for matching rotate and/or funnel shift, so tidy up the related existing call.
  2549. [VPlan, SLP] Use SmallPtrSet for Candidates. This slightly improves the candidate handling in getBest().
  2550. [SimplifyCFG] Regenerate preserve-branchweights.ll test. NFC Regenerate this test using in preparation for an upcoming commit, to make it not depend on the names of instructions.
  2551. [TTI] getOperandInfo - a broadcast shuffle means the result is OK_UniformValue
  2552. Reverted D52835 to fix review comments
  2553. [Diagnostics] Check integer to floating point number implicit conversions Summary: GCC already catches these situations so we should handle it too. GCC warns in C++ mode only (does anybody know why?). I think it is useful in C mode too. Reviewers: rsmith, erichkeane, aaron.ballman, efriedma, xbolva00 Reviewed By: xbolva00 Subscribers: efriedma, craig.topper, scanon, cfe-commits Differential Revision:
  2554. [AST][NFC] Order the bit-field classes of Stmt like in Reorder the bit-field classes and the members of the anonymous union so that they both match the order in There is already a fair amount of them, and this is not going to improve. Therefore let's try to keep some order here. Strictly NFC.
  2555. Document how to comment an actual parameter. Differential Revision:
  2556. [VPlan] Remove LLVM_DEBUG from VPlanSlp::dumpBundle. The caller should take care of only calling it with debug enabled.
  2557. [TTI] Pull out repeated 'ConcreteTTI' static_casts. NFCI.
  2558. [VPlan] Update ifdef.
  2559. [VPlan, SLP] Add simple SLP analysis on top of VPlan. This patch adds an initial implementation of the look-ahead SLP tree construction described in 'Look-Ahead SLP: Auto-vectorization in the Presence of Commutative Operations, CGO 2018 by Vasileios Porpodas, Rodrigo C. O. Rocha, Luís F. W. Góes'. It returns an SLP tree represented as VPInstructions, with combined instructions represented as a single, wider VPInstruction. This initial version does not support instructions with multiple different users (either inside or outside the SLP tree) or non-instruction operands; it won't generate any shuffles or insertelement instructions. It also just adds the analysis that builds an SLP tree rooted in a set of stores. It does not include any cost modeling or memory legality checks. The plan is to integrate it with VPlan based cost modeling, once available and to only apply it to operations that can be widened. A follow-up patch will add support for replacing instructions in a VPlan with their SLP counterparts. Reviewers: Ayal, mssimpso, rengolin, mkuper, hfinkel, hsaito, dcaballe, vporpo, RKSimon, ABataev Reviewed By: rengolin Differential Revision:
  2560. Adding myself as the code owner for clang-query as discussed in
  2561. [CostModel] Add generic expansion funnel shift cost support Add support for the expansion of funnelshift/rotates to getIntrinsicInstrCost. This also required us to move the X86 fshl/fshr costs to the same place as the rotates to avoid expansion and get correct scalarization vs vectorization costs.
  2562. [clangd] Replace StringRef in SymbolLocation with a char pointer. Summary: This would save us 8 bytes per ref, and buy us ~40MB in total for llvm index (from ~300MB to ~260 MB). The char pointer must be null-terminated, and llvm::StringSaver guarantees it. Reviewers: sammccall Subscribers: ilya-biryukov, ioeric, MaskRay, jkorous, arphaman, kadircet, cfe-commits Differential Revision:
  2563. [llvm-objdump] Improve ELF file type checking statements (D54509)
  2564. [X86][AVX512] Remove constant pool shuffle decoding from SelectionDAG This patch removes the last use of the constant pool shuffle decode helper and consistently uses the 'getTargetShuffleMaskIndices' versions instead. The constant pool versions are now purely used for assembly comments. The avx512vbmi intrinsic upgrades had to be altered as they were being decoded as broadcasts, similar to what I fixed in rL346032. I don't think the change is critical, although it's annoying that we lose the {k}{z} instruction test coverage as they are tricky to generate. Differential Revision:
  2565. [AST] Allow limiting the scope of common AST traversals (getParents, RAV). Summary: The goal is to allow analyses such as clang-tidy checks to run on a subset of the AST, e.g. "only on main-file decls" for interactive tools. Today, these become "problematically global" by running RecursiveASTVisitors rooted at the TUDecl, or by navigating up via ASTContext::getParent(). The scope is restricted using a set of top-level-decls that RecursiveASTVisitors should be rooted at. This also applies to the visitor that populates the parent map, and so the top-level-decls are considered to have no parents. This patch makes the traversal scope a mutable property of ASTContext. The more obvious way to do this is to pass the top-level decls to relevant functions directly, but this has some problems: - it's error-prone: accidentally mixing restricted and unrestricted scopes is a performance trap. Interleaving multiple analyses is common (many clang-tidy checks run matchers or RAVs from matcher callbacks) - it doesn't map well to the actual use cases, where we really do want *all* traversals to be restricted. - it involves a lot of plumbing in parts of the code that don't care about traversals. This approach was tried out in D54259 and D54261, I wanted to like it but it feels pretty awful in practice. Caveats: to get scope-limiting behavior of RecursiveASTVisitors, callers have to call the new TraverseAST(Ctx) function instead of TraverseDecl(TU). I think this is an improvement to the API regardless. Reviewers: klimek, ioeric Subscribers: mgorny, cfe-commits Differential Revision:
  2566. [WebAssembly] Make sure event-section XFAILs for build options rL346840 temporarily marked event-section.ll as XFAIL because it was failing for builds with LLVM_ENABLE_EXPENSIVE_CHECKS turned on, but to make sure it XFAILs even without LLVM_ENABLE_EXPENSIVE_CHECKS on we need this `-verify-machineinstrs` flag, which was missing in the previous commit.
  2567. Print newline after banner for ModulePass Before this commit, `llc -print-after-all` would print something like: *** IR Dump After Pre-ISel Intrinsic Lowering ***; ModuleID = ... Emit a newline such that ModuleID appears on a line of its own.
  2568. Recommit r346483: [CallSiteSplitting] Only record conditions up to the IDom(call site). The underlying problem causing the expensive-check failure was fixed in rL346769.
  2569. [WebAssembly] Temporarily disable event-section.ll This test is failing in builds with LLVM_ENABLE_EXPENSIVE_CHECKS after rL346825 not because of the patch but due to a pre-existing codegen problem. Marking this as XFAIL temporarily until the bug is fixed.
  2570. [OpenCL] Fix invalid address space generation for clk_event_t Summary: Addrspace(32) was generated when putting 0 in clk_event_t * event_ret parameter for enqueue_kernel function. Patch by Viktoria Maksimova Reviewers: Anastasia, yaxunl, AlexeySotkin Reviewed By: Anastasia, AlexeySotkin Subscribers: cfe-commits Differential Revision:
  2571. [Clang] - Add '-gsplit-dwarf[=split,=single]' version for '-gsplit-dwarf' option. The DWARF5 specification says (Appendix F.1): "The sections that do not require relocation, however, can be written to the relocatable object (.o) file but ignored by the linker or they can be written to a separate DWARF object (.dwo) file that need not be accessed by the linker." The first part describes a single file split DWARF feature and there is no way to trigger this behavior atm. Fortunately, not many changes are required to keep *.dwo sections in a .o; the patch does that. Differential revision:
  2572. [clangd] Improve code completion for ObjC methods Summary: Previously code completion did not work well for Objective-C methods which contained multiple arguments as clangd did not expect to see multiple typed-text chunks when handling code completion. Note that even with this change, we do not consider selector fragments from previous arguments to be part of the signature (although we could in the future). Reviewers: sammccall Reviewed By: sammccall Subscribers: ilya-biryukov, ioeric, MaskRay, jkorous, arphaman, jfb, kadircet, cfe-commits Differential Revision:
  2573. [clang-tidy] Avoid C arrays check Summary: PR39224. As discussed, we can't always do the transform automatically because of the array-to-pointer decay of C arrays. In order to detect whether we can do said transform, we'd need to be able to see all usages of said array, which is, I would say, rather impossible if e.g. it is in a header. Thus right now no fixit exists. Exceptions: `extern "C"` code. References: * CPPCG ES.27: Use std::array or stack_array for arrays on the stack * CPPCG SL.con.1: Prefer using STL array or vector instead of a C array * HICPP `4.1.1 Ensure that a function argument does not undergo an array-to-pointer conversion` * MISRA `5-2-12 An identifier with array type passed as a function argument shall not decay to a pointer` Reviewers: aaron.ballman, JonasToth, alexfh, hokein, xazax.hun Reviewed By: JonasToth Subscribers: Eugene.Zelenko, mgorny, rnkovacs, cfe-commits Tags: #clang-tools-extra Differential Revision:
  2574. [X86] Add -x86-experimental-vector-widening command lines to pmulh.ll I've only added sse2 and sse4.1 variants as I'm only interested in the two v4i16 tests and I don't expect that to differ with AVX other than a v prefix.
  2575. Correctly instantiate `iterator_adaptor_base` when defining `pointer_iterator` The definition of `pointer_iterator` omits what should be a `iterator_traits::<>::iterator_category` parameter from `iterator_adaptor_base`. As a result, iterators based on `pointer_iterator` always have defaulted value types and the wrong iterator category. The definition of `pointee_iterator` just a few lines above does this correctly. This resolves bug 39617. Patch by Dylan MacKenzie! Reviewers: dblaikie Differential Revision:
  2576. Converted _getClangCMakeBuildFactory to use LLVMBuildFactory.
  2577. Generate better names for automatic schedulers.
  2578. Added monitoring changes in lnt and test-suite projects.
  2579. [HIP] Fix device only compilation Fix a bug causing host code to be compiled when --cuda-device-only is set. Differential Revision:
  2580. [CMake] Include clang-apply-replacements in Fuchsia toolchain This is needed for Differential Revision:
  2581. [libcxx] [test] Strip trailing whitespace. NFC.
  2582. [WebAssembly] Add support for the event section Summary: This adds support for the 'event section' specified in the exception handling proposal. (This was named 'exception section' first, but later renamed to 'event section' to take possibilities of other kinds of events into consideration. But currently we only store exception info in this section.) The event section is added between the global section and the export section. This is for ease of validation per request of the V8 team. This patch: - Creates the event symbol type, which is a weak symbol - Makes 'throw' instruction take the event symbol '__cpp_exception' - Adds relocation support for events - Adds WasmObjectWriter / WasmObjectFile (Reader) support - Adds obj2yaml / yaml2obj support - Adds '.eventtype' printing support Reviewers: dschuff, sbc100, aardappel Subscribers: jgravelle-google, sunfish, llvm-commits Differential Revision:
  2583. [PowerPC] Enhance the selection (ISD::VSELECT) of vector types Make ISD::VSELECT available (legal) as long as AltiVec instructions are present; otherwise its default behavior is expansion, which is legalized at the type-legalization phase. Use xxsel to match vselect if VSX is enabled, or use vsel otherwise. Differential Revision:
  2584. Revert r346810 "Preserve loop metadata when splitting exit blocks" It broke the Windows self-host:
  2585. [HeaderSearch] loadSubdirectoryModuleMaps should respect -working-directory Include search paths can be relative paths. The loadSubdirectoryModuleMaps function should account for that and respect the -working-directory parameter given to Clang. rdar://46045849 Differential Revision:
  2586. [CodeGen] Fix forward scan in MachineBasicBlock::computeRegisterLiveness. The scan was incorrectly skipping the first instruction, so a register could appear to be dead when it was actually live. This eventually leads to a machine verifier failure and miscompile in arm-ldst-opt. Differential Revision:
  2587. [CMake] Passthrough CFLAGS when checking the compiler-rt path This is needed when cross-compiling for a different target since CFLAGS may contain additional flags like -resource-dir which change the location in which compiler-rt builtins are found. Differential Revision:
  2588. Complete reverting r346191
  2589. Complete reverting r346191
  2590. [MachineOutliner][NFC] Use flags set in all candidates to check for calls If we keep track of if the ContainsCalls bit is set in the MBB flags for each candidate, then we have a better chance of not checking the candidate for calls at all. This saves quite a few checks in some CTMark tests (~200 in Bullet, for example.)
  2591. Make dsymutil more robust when parsing load commands. rdar://problem/45883463
  2592. [InstCombine] fold funnel shift amount based on demanded bits The shift amount of a funnel shift is modulo the scalar bitwidth: we can use demanded bits analysis on that operand to simplify it when we have a power-of-2 bitwidth. This is another step towards canonicalizing {shift/shift/or} to the intrinsics in IR. Differential Revision:
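A minimal sketch of why only the low bits of a funnel-shift amount are demanded when the bitwidth is a power of two. The helper `fshl` below is a hypothetical Python model of a funnel shift left, not LLVM code: because the amount is taken modulo the bitwidth, any bits above the low log2(bitwidth) bits cannot change the result.

```python
def fshl(x: int, y: int, s: int, bits: int = 32) -> int:
    """Model of funnel shift left: concatenate x:y, shift left by
    s modulo bits, and keep the high `bits` bits of the result."""
    mask = (1 << bits) - 1
    s %= bits                      # the shift amount is modulo the bitwidth
    wide = ((x & mask) << bits) | (y & mask)
    return ((wide << s) >> bits) & mask


def fshl_low_bits_only(x: int, y: int, s: int, bits: int = 32) -> int:
    """Same operation, but the amount is first masked to the low
    log2(bits) bits. Only valid when bits is a power of two."""
    assert bits & (bits - 1) == 0, "bitwidth must be a power of two"
    return fshl(x, y, s & (bits - 1), bits)
```

For a 32-bit funnel shift, an amount of 36 behaves like an amount of 4, so an operand such as `or %s, 32` could in principle be simplified to `%s`.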
  2593. Make the ExpandTilde unit test expect "\" (not "/") on Win32
  2594. Add cxx-benchmark-unittests target This patch adds the cxx-benchmark-unittests target so we can start getting test coverage on the benchmarks, including building with sanitizers. Because we're only looking for test-coverage, the benchmarks run for the shortest time possible, and in parallel. The target is excluded from all by default. It only builds and runs the libcxx configurations of the benchmarks, and not any versions built against the system's native standard library.
  2595. Preserve loop metadata when splitting exit blocks LoopUtils.cpp contains a utility that splits a loop exit block, so that the new block contains only edges coming from the loop. In the case of nested loops, the exit path for the inner loop might also be the back-edge of the outer loop. The new block which is inserted on this path, is now a latch for the outer loop, and it needs to hold the loop metadata for the outer loop. (The test case gives a more concrete view of the situation.) Patch by Chang Lin (clin1) Differential Revision:
  2596. [MachineOutliner][NFC] Use MBB flags to avoid call checks in getOutliningInfo We already determine a bunch of information about an MBB in getMachineOutlinerMBBFlags. We can reuse that information to avoid calculating things that must be false/true. The first thing we can easily check is if an outlined sequence could ever contain calls. There's no reason to walk over the outlined range, checking for calls, if we already know that there are no calls in the block containing the sequence.
  2597. Fix "use of" uninitialized memory in benchmark. An argument to DoNotOptimize was not fully initialized, which caused msan to complain.
  2598. [InstCombine] canonicalize rotate patterns with cmp/select The cmp+branch variant of this pattern is shown in: ...and as discussed there, we probably can't transform that without a rotate intrinsic. We do have that now via funnel shift, but we're not quite ready to canonicalize IR to that form yet. The case with 'select' should already be transformed though, so that's this patch. The sequence with negation followed by masking is what we use in the backend and partly in clang (though that part should be updated). %cmp = icmp eq i32 %shamt, 0 %sub = sub i32 32, %shamt %shr = lshr i32 %x, %shamt %shl = shl i32 %x, %sub %or = or i32 %shr, %shl %r = select i1 %cmp, i32 %x, i32 %or => %neg = sub i32 0, %shamt %masked = and i32 %shamt, 31 %maskedneg = and i32 %neg, 31 %shl2 = lshr i32 %x, %masked %shr2 = shl i32 %x, %maskedneg %r = or i32 %shl2, %shr2
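The equivalence between the select form and the negate-and-mask form can be checked with a small Python model. This is only a sketch: the function names are made up for illustration, values are modeled as unsigned 32-bit integers, and the shift amount is restricted to 0..31, since larger amounts make the original shifts undefined.

```python
MASK32 = 0xFFFFFFFF


def rotr_select(x: int, shamt: int) -> int:
    """Original pattern: guard the shamt == 0 case with a select."""
    if shamt == 0:                       # %cmp / %r = select
        return x
    shr = x >> shamt                     # %shr = lshr i32 %x, %shamt
    shl = (x << (32 - shamt)) & MASK32   # %shl = shl i32 %x, %sub
    return shr | shl                     # %or


def rotr_masked(x: int, shamt: int) -> int:
    """Canonical pattern: negate and mask the amount, no select needed."""
    neg = (-shamt) & MASK32              # %neg = sub i32 0, %shamt
    masked = shamt & 31                  # %masked
    maskedneg = neg & 31                 # %maskedneg
    return (x >> masked) | ((x << maskedneg) & MASK32)
```

When `shamt` is 0 both masked amounts are 0, so the result is `x | x`, which is `x`; that is why the select becomes unnecessary.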
  2599. OpenCL: Don't warn on v printf modifier This avoids spurious warnings, but could use a lot of work. For example the number of vector elements is not verified, and the passed value type is not checked. Fixes bug 39486
  2600. Mark #2184 as complete; the tests are fine. (I thought that they were wrong before)
  2601. [lsan] [FIXUP] Fixup for After the change, the tests started failing, as skipped sections can be equal in size to kMaxSegName. Changing `<` to `<=` to address the off-by-one problem.
  2602. [MachineOutliner][NFC] Exit getOutliningType if there are < 2 candidates Since we never outline anything with fewer than 2 occurrences, there's no reason to compute cost model information if there's less than that.
  2603. [Driver] Support g++ headers in include/g++ ray's gcc installation puts C++ headers in PREFIX/include/g++ without indicating a gcc version at all. Typically this is because the version is encoded somewhere in PREFIX. Differential Revision:
  2604. [AST] Revert r346793 and r346781 This somehow breaks the msan bots. Revert while I figure it out.
  2605. [AMDGPU] combine extractelement into several selects An extractelement with non-constant index will be lowered either to scratch or movrel loop in most cases. This patch converts such instruction into a set of selects if vector size is not too big. Differential Revision:
  2606. [NFC] Mark LWG3128 and LWG3132 as requiring no work Those LWG issues were adopted in San Diego and require no work on our side.
  2607. [MemorySSA] Create query after checking if instruction is a fence. The alternative is checking if I is a fence in the Query constructor, so as to not attempt to get a non-existent MemoryLocation.
  2608. [AsmPrinter] Fix DebugInfo/X86/gnu-public-names.ll after rL346790
  2609. Fixed DAGTypeLegalizer::SplitVecOp_EXTRACT_VECTOR_ELT i1 handling Legalizer used to request an ext load from i8 to i1 when promoting vector element type to i8. Fixed. Differential Revision:
  2610. [AST][NFC] Order the bit-field classes of Stmt like in Reorder the bit-field classes and the members of the anonymous union so that they both match the order in There is already a fair amount of them, and this is not going to improve. Therefore let's try to keep some order here. Strictly NFC.
  2611. [lsan] [NFC] Change ARRAY_SIZE to internal_strnlen Calling ARRAY_SIZE on a char* will not actually compute its size, but just the pointer size. A new Clang warning enabled by default warns about this. Replaced the call with internal_strnlen. Differential Revision:
  2612. [MS Demangler] Print public:, protected:, private: if set in FunctionClass or a variable's StorageClass. undname prints them, and the information is in the decorated name, so we probably shouldn't lose it when undecorating. I spot-checked a few of the funnier-looking outputs, and undname has the same output. Differential Revision:
  2613. [AsmPrinter] Rename a comment of .debug_gnu_pubnames entry Summary: The comment refers to the field as "Kind:". However, in gdb, names it "attributes", gdb/dwarf2read.c:dw2_symtab_iter_next refers to the whole value as "cu_index_and_attrs" Change it to `Attributes:` for consistency. Reviewers: dblaikie Reviewed By: dblaikie Subscribers: aprantl, JDevlieghere, arphaman, llvm-commits Differential Revision:
  2614. DebugInfo: Add a driver flag for DWARF debug_ranges base address specifier use. Summary: This saves a lot of relocations in optimized object files (at the cost of some increase in linked executable bytes), but gold's 32 bit gdb-index support has a bug ( ) so we can't switch to this unconditionally. (& even if it weren't for that bug, one might argue that some users would want to optimize in one direction or the other - prioritizing object size or linked executable size) Differential Revision:
  2615. DebugInfo: Add a CU metadata attribute for use of DWARF ranges base address specifiers Summary: Ranges base address specifiers can save a lot of object size in relocation records especially in optimized builds. For an optimized self-host build of Clang with split DWARF and debug info compression in object files, but uncompressed debug info in the executable, this change produces about 18% smaller object files and 6% larger executable. While it would've been nice to turn this on by default, gold's 32 bit gdb-index support crashes on this input & I don't think there's any perfect heuristic to implement solely in LLVM that would suffice - so we'll need a flag one way or another (also possible people might want to aggressively optimize for executable size that contains debug info (even with compression this would still come at some cost to executable size)) - so let's plumb it through. Differential Revision:
  2616. [NativePDB] Improved support for nested type reconstruction. In a previous patch, we pre-processed the TPI stream in order to build the reverse mapping from nested type -> parent type so that we could accurately reconstruct a DeclContext hierarchy. However, there were some issues. An LF_NESTTYPE record is really just a typedef, so although it happens to be used to indicate the name of the nested type and referring to the global record which defines the type, it is also used for every other kind of nested typedef. When we rebuild the DeclContext hierarchy, we want it to be as accurate as possible, which means that if we have something like: struct A { struct B {}; using C = B; }; We don't want to create two CXXRecordDecls in the AST each with the exact same definition. We just want to create one for B and then define C as an alias to B. Previously, however, it would not be able to distinguish between the two cases and it would treat A::B and A::C as being two classes each with separate definitions. We address the first half of improving the pre-processing logic so that only actual definitions are treated this way. Later, in a followup patch, we can handle the case of nested typedefs since we're already going to be enumerating the field list anyway and this patch introduces the general framework for distinguishing between the two cases. Differential Revision:
  2617. Add fneg instruction to syntax highlighting lists
  2618. [SelectionDAG][X86] Relax restriction on the width of an input to *_EXTEND_VECTOR_INREG. Use them and regular *_EXTEND to replace the X86 specific VSEXT/VZEXT opcodes Previously, the extend_vector_inreg opcodes required their input register to be the same total width as their output. But this doesn't match up with how the X86 instructions are defined. For X86 the input just needs to be a legal type with at least enough elements to cover the output. This patch weakens the check on these nodes and allows them to be used as long as they have more input elements than output elements. I haven't changed type legalization behavior so it will still create them with matching input and output sizes. X86 will custom legalize these nodes by shrinking the input to be a 128 bit vector and once we've done that we treat them as legal operations. We still have one case during type legalization where we must custom handle v64i8 on avx512f targets without avx512bw where v64i8 isn't a legal type. In this case we will custom type legalize to a *extend_vector_inreg with a v16i8 input. After that the input is a legal type so type legalization should ignore the node and doesn't need to know about the relaxed restriction. We are no longer allowed to use the default expansion for these nodes during vector op legalization since the default expansion uses a shuffle which requires the widths to match. Custom legalization for all types will prevent us from reaching the default expansion code. I believe DAG combine works correctly with the relaxed restriction because it doesn't check the number of input elements. The rest of the patch is changing X86 to use either the vector_inreg nodes or the regular zero_extend/sign_extend nodes. I had to add additional isel patterns to handle any_extend during isel since simplifydemandedbits can create them at any time so we can't legalize to zero_extend before isel. We don't yet create any_extend_vector_inreg in simplifydemandedbits. Differential Revision:
  2619. [llvm-objcopy] Rename --keep to --keep-section. Summary: llvm-objcopy/strip support `--keep` (for sections) and `--keep-symbols` (for symbols). For consistency and clarity, rename `--keep` to `--keep-section`. In fact, for GNU compatibility, -K is --keep-symbol, so it's weird that the alias `-K` is not the same as the short-ish `--keep`. Reviewers: jakehehrlich, jhenderson, alexshap, MaskRay, espindola Reviewed By: jakehehrlich, MaskRay Subscribers: emaste, arichardson, llvm-commits Differential Revision:
  2620. [AST][NFC] Style fixes for UnaryOperator In preparation for the patch which will move some data to the bit-fields of Stmt. In particular, rename the private variable "Val" -> "Operand" since the substatement is the operand of the unary operator. Run clang-format on UnaryOperator. NFC otherwise.
  2621. Fix UB in string.bench.cpp. The usage of aligned_storage failed to pass the alignment it wanted, which caused it to have a larger size and alignment than the std::strings it was intended to store. This patch manually specifies the alignment, as well as cleaning up type alias bugs.
  2622. [WebAssembly] Fix broken assumption that all bitcasts are to function types Specifically, we can bitcast to void. Fixes PR39591 Differential Revision:
  2623. [FileSystem] Add expand_tilde function In D54435 there was some discussion about the expand_tilde flag for real_path that I wanted to expose through the VFS. The consensus is that these two things should be separate functions. Since we already have the code for this I went ahead and added a function expand_tilde that does just that. Differential revision:
  2624. [IR] Add a dedicated FNeg IR Instruction The IEEE-754 Standard makes it clear that fneg(x) and fsub(-0.0, x) are two different operations. The former is a bitwise operation, while the latter is an arithmetic operation. This patch creates a dedicated FNeg IR Instruction to model that behavior. Differential Revision:
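A rough illustration of the bitwise-versus-arithmetic distinction, modeled in Python on 32-bit float bit patterns. The helper names are invented for this sketch and it is not how LLVM implements FNeg: the bitwise negation flips exactly the sign bit and leaves every other bit (including a NaN payload) untouched, whereas a subtraction goes through floating-point arithmetic. On ordinary values the two agree.

```python
import struct

SIGN_BIT_F32 = 0x80000000


def f32_to_bits(value: float) -> int:
    """Reinterpret a value as its IEEE-754 single-precision bit pattern."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


def fneg_bits(bits: int) -> int:
    """Bitwise negation: toggle the sign bit, touch nothing else."""
    return bits ^ SIGN_BIT_F32


def fsub_from_negative_zero(value: float) -> float:
    """Arithmetic form: -0.0 - value, evaluated by the host FPU."""
    return -0.0 - value
```

For example, `fneg_bits(0x7FC00000)` (a quiet NaN pattern) gives `0xFFC00000`, with the payload unchanged; what an arithmetic subtraction produces for a NaN input is not something this sketch asserts.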
  2625. [WebAssembly] Mark immediates.ll as XFAILed on MIPS hosts MIPS hosts usually use a legacy (non IEEE 754-2008) encoding for NaNs. Tests like `nan_f32` failed when attempting to compare a hard-coded IEEE 754-2008 NaN value with a legacy NaN value provided by the system.
  2626. Remove duplicate entry for issue 3134
  2627. Update status for issue 3122
  2628. [AST][NFC] Pack DeclRefExpr Move the SourceLocation to the bit-fields of Stmt + clang-format. This saves one pointer per DeclRefExpr but otherwise NFC.
  2629. [CSP, Cloning] Update DuplicateInstructionsInSplitBetween to use DomTreeUpdater. This patch updates DuplicateInstructionsInSplitBetween to update a DTU instead of applying updates to the DT directly. Given that there only are 2 users, also updated them in this patch to avoid churn. I slightly moved the code in CallSiteSplitting around to reduce the places where we have to pass in DTU. If necessary, I could split those changes in a separate patch. This fixes missing DT updates when dealing with musttail calls in CallSiteSplitting, by using DTU->deleteBB. Reviewers: junbuml, kuhar, NutshellySima, indutny, brzycki Reviewed By: NutshellySima
  2630. Revert "[ThinLTO] Internalize readonly globals" This reverts commit 10c84a8f35cae4a9fc421648d9608fccda3925f2.
  2631. [NFC][libcxx] Mark P1006R1 as complete
  2632. Implement P0972R0: <chrono> zero(), min(), and max() should be noexcept. Reviewed as
  2633. [NFC][libcxx] Mark P1006 as implemented in LLVM 8.0 It was implemented in
  2634. [libcxx] Implement, constexpr in pointer_traits Summary: P1006 adds support for constexpr in the specialization of pointer_traits for raw pointers. This is necessary in order to use pointer_traits in the upcoming constexpr containers. We expect P1006 to be voted into the working draft for C++20 at the San Diego meeting. Reviewers: mclow.lists, EricWF Subscribers: christof, dexonsmith, libcxx-commits Differential Revision:
  2635. [libcxx] GNU/Hurd uses BSD-based interfaces, but does not (and won't) provide <sys/sysctl.h> Reviewed as Thanks to sthibaul for the patch.
  2636. [InstCombine] add tests for funnel shift demanded bits; NFC
  2637. Fix uninitialized variable. Flags variable was not initialized and later used (both isMBBSafeToOutlineFrom implementations assume it's initialized), which breaks test/CodeGen/AArch64/machine-outliner.mir. under memory sanitizer: MemorySanitizer: use-of-uninitialized-value #0 in llvm::AArch64InstrInfo::getOutliningType(llvm::MachineInstrBundleIterator<llvm::MachineInstr, false>&, unsigned int) const llvm/lib/Target/AArch64/AArch64InstrInfo.cpp:5494:9 #1 in (anonymous namespace)::InstructionMapper::convertToUnsignedVec(llvm::MachineBasicBlock&, llvm::TargetInstrInfo const&) llvm/lib/CodeGen/MachineOutliner.cpp:772:19 #2 in (anonymous namespace)::MachineOutliner::populateMapper((anonymous namespace)::InstructionMapper&, llvm::Module&, llvm::MachineModuleInfo&) llvm/lib/CodeGen/MachineOutliner.cpp:1543:14 #3 in (anonymous namespace)::MachineOutliner::runOnModule(llvm::Module&) llvm/lib/CodeGen/MachineOutliner.cpp:1645:3 #4 in (anonymous namespace)::MPPassManager::runOnModule(llvm::Module&) llvm/lib/IR/LegacyPassManager.cpp:1744:27 #5 in llvm::legacy::PassManagerImpl::run(llvm::Module&) llvm/lib/IR/LegacyPassManager.cpp:1857:44 #6 in compileModule(char**, llvm::LLVMContext&) llvm/tools/llc/llc.cpp:597:8
  2638. [CostModel][X86] Fix constant vector XOP right shifts We'll constant fold these cases so they are as cheap as vector left shift cases. Noticed while improving funnel shift costs.
  2639. [VectorUtils] Use namespace for InterleaveGroup template specialization.
  2640. [VPlan] VPlan version of InterleavedAccessInfo. This patch turns InterleaveGroup into a template with the instruction type being a template parameter. It also adds a VPInterleavedAccessInfo class, which only contains a mapping from VPInstructions to their respective InterleaveGroup. As we do not have access to scalar evolution in VPlan, we re-use InterleavedAccessInfo by converting it to VPInterleavedAccessInfo. Reviewers: Ayal, mssimpso, hfinkel, dcaballe, rengolin, mkuper, hsaito Reviewed By: rengolin Differential Revision:
  2641. [NFC] Move storage of dispatch-version to GlobalDecl As suggested by Richard Smith, and initially put up for review here:, this patch removes a hack that was used to ensure that proper target-feature lists were used when emitting cpu-dispatch (and eventually, target-clones) implementations. As a part of this, the GlobalDecl object is proliferated to a bunch more locations. Originally, this was put up for review (see above) to get acceptance on the approach, though discussion with Richard in San Diego showed he approved of the approach taken here. Thus, I believe this is acceptable for Review-After-commit Differential Revision: Change-Id: I0a0bd673340d334d93feac789d653e03d9f6b1d5
  2642. [clang-format] Do not treat the asm clobber [ as ObjCExpr Summary: The opening square of an inline asm clobber was being annotated as an ObjCExpr. This caused, amongst other things, the ObjCGuesser to guess header files containing that pattern as ObjC files. Reviewers: benhamilton Reviewed By: benhamilton Subscribers: cfe-commits Differential Revision:
  2643. [TTI] Make TargetTransformInfo::getOperandInfo static. NFCI. It has no member dependencies and this makes it easier to reuse in other cost analysis code.
  2644. [CostModel][X86] Add more cost tests for funnel shifts Added full uniform/constant coverage for funnel shifts + rotates
  2645. Fix comment for XOP rotates. NFCI.
  2646. Add bracket that was lost in rL346727 and has been causing buildbot failures for some time.
  2647. Fix .cfi_restore with register numbers > 64 Summary: DW_CFA_restore can only encode register numbers up to 63 (a 6-bit unsigned field). For register numbers of 64 and above we have to use DW_CFA_restore_extended instead, which uses a ULEB128 value. I discovered this problem in the out-of-tree CHERI target since we use DWARF register number 89 for our return capability register. Reviewers: probinson, dblaikie, aprantl, espindola Reviewed By: dblaikie Subscribers: JohnReagan, emaste, JDevlieghere, llvm-commits Differential Revision:
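The encoding choice can be sketched as follows. This is an illustrative Python model, not the MC code: `encode_cfi_restore` and `uleb128` are names made up here, while the opcode values (0xC0 for DW_CFA_restore with the register in the low 6 bits, 0x06 for DW_CFA_restore_extended followed by a ULEB128 register number) come from the DWARF call frame instruction encoding.

```python
DW_CFA_restore = 0xC0           # high 2 bits; register lives in the low 6 bits
DW_CFA_restore_extended = 0x06  # followed by a ULEB128 register number


def uleb128(value: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)   # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_cfi_restore(reg: int) -> bytes:
    """Pick the compact form when the register fits in 6 bits,
    otherwise fall back to the extended form."""
    if reg < 0:
        raise ValueError("register number must be non-negative")
    if reg < 64:
        return bytes([DW_CFA_restore | reg])
    return bytes([DW_CFA_restore_extended]) + uleb128(reg)
```

Register 89, the number mentioned for the CHERI return capability register, needs the two-byte extended form under this model, while register 5 fits in a single byte.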
  2648. Fix modules build of AVRAsmParser.cpp Summary: Without this change I get the following error: lib/Target/AVR/ error: redundant #include of module 'LLVM_Utils.Support.Format' appears within namespace 'llvm' [-Wmodules-import-nested-redundant] Reviewers: dylanmckay Reviewed By: dylanmckay Subscribers: llvm-commits Differential Revision:
  2649. UserManual: Tweak the /Zc:dllexportInlines- docs some Addressing comments on
  2650. [SystemZ] Increase the number of VLREPs If a loaded value is replicated it is best to combine these two operations into a VLREP (load and replicate), but isel will not produce this if the load has other users as well. This patch handles this by putting the other users of the load to use the REPLICATE 0-element instead of the load. This way the load has only the REPLICATE node as user, and we get a VLREP. Review: Ulrich Weigand
  2651. [X86] Add more tests for -x86-experimental-vector-widening-legalization I'm looking into whether we can make this the default legalization strategy. Adding these tests to help cover the changes that will be necessary. This patch adds copies of some tests with the command line switch enabled. By making copies it's easier to compare the two legalization strategies. I've also removed RUN lines from some of these tests that already had -x86-experimental-vector-widening-legalization
  2652. Attempt to make benchmarks fall back to -std=c++1z when C++17 isn't supported. The benchmarks currently require C++17, however Clang 3.9 doesn't support -std=c++17 while still supporting all the C++17 features needed to compile the benchmarks. This patch makes the benchmark build attempt to fall back to -std=c++1z when -std=c++17 isn't supported. See
  2653. Add emplace tests for multiset/unordered_multiset. This patch adds tests to ensure that multiset/unordered_multiset's emplace method correctly constructs the elements without any intervening constructions.
  2654. [FileCheck] fixing docs buildbot - use proper code-block type
  2655. Fix PR39619 - iterator_traits isn't SFINAE-friendly enough. Thanks to Eric for the report
  2656. [clang-cl] Do not allow using both /Zc:dllexportInlines- and /fallback flag Summary: /Zc:dllexportInlines with /fallback may cause unexpected linker error. It is better to disallow compile rather than warn for this combination. Reviewers: hans, thakis Reviewed By: hans Subscribers: cfe-commits, llvm-commits Differential Revision:
  2657. CMake: Deprecate using llvm-config to detect llvm installation Summary: clang currently uses llvm-config to determine the installation paths for llvm's headers and binaries. clang is also using LLVM's cmake files to determine other information about the LLVM build, like LLVM_LIBDIR_SUFFIX, LLVM_VERSION_*, etc. Since the installation paths are also available via the cmake files, we can simplify the code by only relying on information from cmake about the LLVM install and dropping the use of llvm-config altogether. In addition to simplifying the code, the cmake files have more accurate information about the llvm installation paths. llvm-config assumes that the lib, bin, and cmake directories are always located in the same place relative to the path of the llvm-config executable. This can be wrong if a user decides to install headers, binaries or libraries to a non-standard location: e.g. static libraries installed to /usr/lib/llvm6.0/ This patch takes the first step towards dropping llvm-config by removing the automatic detection of llvm-config (users can still manually supply a path to llvm-config by passing -DLLVM_CONFIG=/usr/bin/llvm-config to cmake) and adding a deprecation warning when users try to use this option. Reviewers: chandlerc, beanz, mgorny, chapuni Subscribers: mehdi_amini, dexonsmith, cfe-commits Differential Revision:
  2658. CMake: Replace open-coded find_package Reviewers: beanz, mgorny Reviewed By: mgorny Subscribers: cfe-commits, chapuni, llvm-commits Differential Revision:
  2659. [BuildingAJIT] Fixing the build by inserting a forgotten paren.
  2660. [commit test] Add blank line to test/tools/llvm-objdump/full-contents.test
  2661. [DAGCombiner] Enable tryToFoldExtendOfConstant to run after legalize vector ops It should be ok to create a new build_vector after legal operations so long as it doesn't cause an infinite loop in DAG combiner. Unfortunately, X86's custom constant folding in combineVSZext is hiding any test changes from this. But I'm trying to get to a point where that X86 specific code isn't necessary at all. Differential Revision:
  2662. [BuildingAJIT] Clang-format chapters 1 and 2.
  2663. [BuildingAJIT] Update chapter 2 to use the ORCv2 APIs.
  2664. [FileCheck] fixing small formatting error in docs
  2665. [libObject] Fix getDesc for Elf_Note_Impl This change fixes a bug in Elf_Note_Impl in which Elf_Word was used where uint8_t should have been used.
  2666. [FileCheck] fixing typo in assert
  2667. [FileCheck] introduce CHECK-COUNT-<num> repetition directive In some cases it is desirable to match the same pattern repeatedly many times. Currently the only way to do it is to copy the same check pattern as many times as needed. And that gets pretty unwieldy when the count is big. Introducing CHECK-COUNT-<num> directive which acts like a plain CHECK directive yet matches the same pattern exactly <num> times. Extended FileCheckType to a struct to add Count there. Changed some parsing routines to handle non-fixed length of directive (all currently existing directives were fixed-length). The code is generic enough to allow future support for COUNT in more than just PlainCheck directives. See motivating example for this feature in Reviewed By: chandlerc, dblaikie Differential Revision:
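One way to picture the directive is as a plain CHECK repeated <num> times, each match starting where the previous one ended. The Python sketch below models only that idea with a hypothetical `check_count` helper; it is not FileCheck's implementation, it uses Python regular expressions rather than FileCheck pattern syntax, and it does not model what happens when further matches follow.

```python
import re
from typing import Optional


def check_count(pattern: str, text: str, count: int, start: int = 0) -> Optional[int]:
    """Search for `pattern` `count` times in sequence.

    Returns the offset just past the last match, or None if fewer
    than `count` sequential matches exist from `start` onward.
    """
    regex = re.compile(pattern)
    pos = start
    for _ in range(count):
        match = regex.search(text, pos)
        if match is None:
            return None          # ran out of matches before reaching count
        pos = match.end()        # the next search resumes after this match
    return pos
```

The returned offset is where a subsequent directive would resume matching in this simplified model.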
  2668. [MachineOutliner][NFC] Simplify isMBBSafeToOutlineFrom check in AArch64 outliner Turns out it's way simpler to do this check with one LRU. Instead of maintaining two, just keep one. Check if each of the registers is available, and then check if it's a live out from the block. If it's a live out, but available in the block, we know we're in an unsafe case.
  2669. Introduce DebugCounter into ConstProp pass Summary: This patch introduces DebugCounter into ConstProp pass at per-transformation level. It will provide an option to skip first n or stop after n transformations for the whole ConstProp pass. This will make debugging easier for the pass, also providing a chance to do transformation level bisecting. Reviewers: davide, fhahn Reviewed By: fhahn Subscribers: llozano, george.burgess.iv, llvm-commits Differential Revision:
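The skip-then-stop behaviour can be modeled with a small counter. The class below is a hypothetical Python sketch of the idea only; the names `skip`, `count`, and `should_execute` are chosen for illustration and are not claimed to match the LLVM DebugCounter interface.

```python
from typing import Optional


class TransformCounter:
    """Decide per transformation whether it should run.

    The first `skip` requests are refused, the next `count` requests
    are allowed, and everything after that is refused again. A count
    of None means no upper limit.
    """

    def __init__(self, skip: int = 0, count: Optional[int] = None) -> None:
        self.skip = skip
        self.count = count
        self.seen = 0

    def should_execute(self) -> bool:
        index = self.seen
        self.seen += 1
        if index < self.skip:
            return False                       # still in the skipped prefix
        if self.count is None:
            return True                        # unlimited after the prefix
        return index < self.skip + self.count  # inside the allowed window
```

Bisecting a miscompile then amounts to narrowing `skip` and `count` until a single transformation is isolated.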
  2670. [InstCombine] add rotate variants that include select; NFC
  2671. [MachineOutliner][NFC] Change getMachineOutlinerMBBFlags to isMBBSafeToOutlineFrom Instead of returning Flags, return true if the MBB is safe to outline from. This lets us check for unsafe situations, like say, in AArch64, X17 is live across a MBB without being defined in that MBB. In that case, there's no point in performing an instruction mapping.
  2672. [llvm-objcopy] Don't copy Config when processing --keep
  2673. [InstCombine] narrow width of rotate patterns, part 3 This is a longer variant for the pattern handled in rL346713 This one includes zexts. Eventually, we should canonicalize all rotate patterns to the funnel shift intrinsics, but we need a bit more infrastructure to make sure the vectorizers handle those intrinsics as well as the shift+logic ops. Name: narrow rotateright %neg = sub i8 0, %shamt %rshamt = and i8 %shamt, 7 %rshamtconv = zext i8 %rshamt to i32 %lshamt = and i8 %neg, 7 %lshamtconv = zext i8 %lshamt to i32 %conv = zext i8 %x to i32 %shr = lshr i32 %conv, %rshamtconv %shl = shl i32 %conv, %lshamtconv %or = or i32 %shl, %shr %r = trunc i32 %or to i8 => %maskedShAmt2 = and i8 %shamt, 7 %negShAmt2 = sub i8 0, %shamt %maskedNegShAmt2 = and i8 %negShAmt2, 7 %shl2 = lshr i8 %x, %maskedShAmt2 %shr2 = shl i8 %x, %maskedNegShAmt2 %r = or i8 %shl2, %shr2
  2674. [DWARF] Do not use PRIx32 for printing uint64_t values The `DWARFDebugAddrTable::dump` routine prints 32/64-bit addresses. These values are stored in a vector of `uint64_t` independently of their original sizes. But the `format` function gets a format string with a PRIx32 suffix when the address size is 32 bits. At least on MIPS 32-bit targets that leads to incorrect output. This patch changes the format strings to always use PRIx64 when printing `uint64_t` values. Differential Revision:
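  The underlying rule is that the conversion specifier has to match the argument type, while the address size should only affect padding. A minimal sketch follows, using plain `snprintf` rather than LLVM's `format`; `formatAddress` is a hypothetical helper, not the code from the patch.

```cpp
#include <cassert>
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Formats an address that is stored in a uint64_t. addrSize is the original
// address size in bytes (4 or 8) and only controls the zero padding; the
// conversion specifier is always PRIx64 because the argument is a uint64_t.
inline std::string formatAddress(uint64_t addr, unsigned addrSize) {
  assert((addrSize == 4 || addrSize == 8) && "unexpected address size");
  char buf[32];
  int width = static_cast<int>(addrSize * 2); // two hex digits per byte
  std::snprintf(buf, sizeof(buf), "0x%0*" PRIx64, width, addr);
  return std::string(buf);
}
```

  Passing a `uint64_t` to a `PRIx32` conversion is undefined behaviour in a varargs call, which is consistent with the breakage only showing up on some 32-bit targets.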
  2675. Convert a condition into an assertion per post-review feedback; NFC intended.
  2676. [InstCombine] narrow width of rotate patterns, part 2 (PR39624) The sub-pattern for the shift amount in a rotate can take on several different forms, and there's apparently no way to canonicalize those without seeing the entire rotate sequence. This is the form noted in: %zx = zext i8 %x to i32 %maskedShAmt = and i32 %shAmt, 7 %shl = shl i32 %zx, %maskedShAmt %negShAmt = sub i32 0, %shAmt %maskedNegShAmt = and i32 %negShAmt, 7 %shr = lshr i32 %zx, %maskedNegShAmt %rot = or i32 %shl, %shr %r = trunc i32 %rot to i8 => %truncShAmt = trunc i32 %shAmt to i8 %maskedShAmt2 = and i8 %truncShAmt, 7 %shl2 = shl i8 %x, %maskedShAmt2 %negShAmt2 = sub i8 0, %truncShAmt %maskedNegShAmt2 = and i8 %negShAmt2, 7 %shr2 = lshr i8 %x, %maskedNegShAmt2 %r = or i8 %shl2, %shr2
  2677. [GC][NFC] Simplify code now that we only have one safepoint kind This is the NFC follow up to exploit the semantic simplification from r346701
  2678. [InstCombine] refactor code for matching shift amount of a rotate; NFC As shown in existing test cases and with: ...we're missing at least 2 more patterns for rotate narrowing.
  2679. Use a data structure better suited for large sets in SimplificationTracker. Summary: D44571 changed SimplificationTracker to use SmallSetVector to keep phi nodes. As a result, when the number of phi nodes is large, the build time performance suffers badly. When building for PowerPC, we have a case where there are more than 600,000 nodes, and it takes too long to compile. In this change, I partially revert D44571 to use SmallPtrSet, which does an acceptable job with any number of elements. In the original patch, having a deterministic iteration order was mentioned as a motivation; however, I think it only applies to the nodes already matched in the MatchPhiSet method, which I did not touch. Reviewers: bjope, skatkov Reviewed By: bjope, skatkov Subscribers: llvm-commits Differential Revision:
  2680. [Sema] Make sure we substitute an instantiation-dependent default template argument Fixes Differential revision:
  2681. [X86][SSE] Add lowerVectorShuffleAsByteRotateAndPermute (PR39387) This patch adds the ability to use a PALIGNR to rotate a pair of inputs to select a range containing all the referenced elements, followed by a single input permute to put them in the right location. Differential Revision:
  2682. Fix the 'fixit' for inline namespace replacement. I'd neglected to add to the fixit for r346677. Richard Smith mentioned this in a review-after-commit, so fixing it here. Change-Id: I77e612be978d4eedda8d5bbd60b812b88f875cda
  2683. AMDGPU: Adding more median3 patterns min(max(a, b), max(min(a, b), c)) -> med3 a, b, c Differential Revision:
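  The identity behind this pattern, that `min(max(a, b), max(min(a, b), c))` is the median of the three values, can be checked with a short sketch. It uses integers only and says nothing about floating-point NaN handling; the function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>

// The min/max expression matched by the pattern above.
inline int med3Pattern(int a, int b, int c) {
  return std::min(std::max(a, b), std::max(std::min(a, b), c));
}

// Reference median: sort the three values and take the middle one.
inline int med3Reference(int a, int b, int c) {
  int v[3] = {a, b, c};
  std::sort(v, v + 3);
  return v[1];
}

// Compares both definitions for every triple with components in [lo, hi].
inline bool med3Agree(int lo, int hi) {
  for (int a = lo; a <= hi; ++a)
    for (int b = lo; b <= hi; ++b)
      for (int c = lo; c <= hi; ++c)
        if (med3Pattern(a, b, c) != med3Reference(a, b, c))
          return false;
  return true;
}
```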
  2684. [InstCombine] add more tests for rotate narrowing; NFC
  2685. [GC docs] Update the gcroot documentation to reflect recent simplifications to GCStrategy configurability
  2686. [GC] Remove so called PreCall safepoints Remove another bit of unused configuration potential from GCStrategy. It's not entirely clear what the intention here was, but from the docs, it sounds like this may have been subsumed by patchable call support. Note: This change is deliberately small to make it clear that while implemented, there's nothing using the option. A following NFC will do most of the simplifications.
  2687. [WebAssembly] Added WasmAsmParser. Summary: This is to replace the ELFAsmParser that WebAssembly was using, which so far was a stub that didn't do anything, and couldn't work correctly with wasm. This new class is there to implement generic directives related to wasm as a binary format. Wasm target specific directives are still parsed in WebAssemblyAsmParser as before. The two classes now cooperate more correctly too. Also implemented .result which was missing. Any unknown directives will now result in errors. Reviewers: dschuff, sbc100 Subscribers: mgorny, jgravelle-google, eraman, aheejin, sunfish, llvm-commits Differential Revision:
  2688. PR39628 Treat all non-zero values as 'true' in bool compound-assignment in constant evaluation, not just odd values.
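  A small illustration of the expected semantics, assuming C++14 or later for the relaxed `constexpr` rules. The function name is hypothetical and `+=` is only one example of a compound assignment on `bool`; the point is that the result converts back to `bool` as "non-zero means true", so an even value such as 2 must not evaluate to `false`.

```cpp
#include <cassert>

// Compound assignment on a bool: the arithmetic happens in int and the
// result is converted back to bool, so any non-zero result yields true.
constexpr bool addToBool(int v) {
  bool b = false;
  b += v; // same as b = static_cast<bool>(static_cast<int>(b) + v)
  return b;
}

// Evaluated at compile time; an even, non-zero value must still give true.
static_assert(addToBool(2), "non-zero even value converts to true");
static_assert(!addToBool(0), "zero converts to false");
```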
  2689. [GC][InstCombine] Fix a potential iteration issue Noticed via inspection. Appears to be largely innocuous in practice, but a slight code change could have resulted in either visit order dependent missed optimizations or infinite loops. May be a minor compile time problem today.
  2690. [X86] In LowerMULH, use generic truncate and vector shuffle nodes instead of directly emitting PACKUS. Truncate and shuffle lowering are already capable of matching to PACKUS using known bits analysis. This features one test change where we now prefer to extend v16i16->v16i32 then trunc v16i32->v16i8 over extract_subvector+packus when avx512f is available, but avx512bw is not.
  2691. [NFC] Fix formatting in inline nested namespace definition. Apparently my invocation of clang-format in VIM didn't get this right, but the patch-version DID. This patch just runs CF on this file. Change-Id: Ied462a2d921cbb813fa427740d3ef6e97959b56d
  2692. NFC: DebugInfo: Reduce scope of DebugOffset to simplify code This was being used as a sort of indirect out parameter from shouldDump - seems simpler to use it as the actual result of the call. (this does mean using a pointer to an Optional & actually using all 3 states (null, None, and present) which is, admittedly, a tad subtle - but given the limited scope, seems OK to me - open to discussion though, if others feel strongly about it)
  2693. [AMDGPU] Optimize S_CBRANCH_VCC[N]Z -> S_CBRANCH_EXEC[N]Z Sometimes after basic block placement we end up with a code like: sreg = s_mov_b64 -1 vcc = s_and_b64 exec, sreg s_cbranch_vccz This happens as a join of a block assigning -1 to a saved mask and another block which consumes that saved mask with s_and_b64 and a branch. This is essentially a single s_cbranch_execz instruction when moved into a single new basic block. Differential Revision:
  2694. [InstCombine] regenerate checks; NFC
  2695. [CostModel][X86] Add funnel shift rotation special case costs When we repeat the 2 shifting operands then this is a bit rotation - annoyingly this has to be done in the other getIntrinsicInstrCost than most intrinsics as we need to check the operands are the same.
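  To make the special case concrete: a funnel shift left concatenates two operands and extracts a shifted window, and when both operands are the same value the result is a plain rotate. A minimal 32-bit sketch with hypothetical helper names (the reference rotate deliberately uses a different formulation):

```cpp
#include <cassert>
#include <cstdint>

// fshl on 32-bit values: treat a:b as a 64-bit value (a in the high half),
// shift left by s modulo 32, and return the high 32 bits.
inline uint32_t fshl32(uint32_t a, uint32_t b, uint32_t s) {
  s %= 32;
  if (s == 0)
    return a; // avoid shifting by 32, which is undefined in C++
  return (a << s) | (b >> (32 - s));
}

// Reference rotate-left, computed via a doubled 64-bit copy of x.
inline uint32_t rotl32(uint32_t x, uint32_t s) {
  uint64_t doubled = (static_cast<uint64_t>(x) << 32) | x;
  return static_cast<uint32_t>((doubled << (s % 32)) >> 32);
}
```

  With repeated operands, `fshl32(x, x, s)` and `rotl32(x, s)` agree, which is why the cost model has to compare the operands before applying the cheaper rotate cost.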
  2696. [clang-format] Support breaking consecutive string literals for TableGen Summary: clang-format can get confused by string literals in TableGen: it knows that strings can be broken up, but doesn't seem to understand how that can be indented across line breaks, and arranges them in a weird triangular pattern. Take this output example from `clang-format tools/llvm-objcopy/` (which has now been formatted in rL345896 with this patch applied): ``` defm keep_global_symbols : Eq< "keep-global-symbols", "Reads a list of symbols from <filename> and " "runs as if " "--keep-global-symbol=<symbol> " "is set for each one. " "<filename> " "contains one " "symbol per line " "and may contain " "comments " "beginning " "with" " '#'" ". " "Lead" "ing " ``` Reviewers: alexshap, MaskRay, djasper Reviewed By: MaskRay Subscribers: krasimir, mgorny, cfe-commits Differential Revision:
  2697. Fix MachineInstr::findRegisterUseOperandIdx subreg checks The function only checks that the instruction reads a super-register containing the requested physical register. If a sub-register is being read, that is also a use of the super-register, so this adds that check. In particular MI->readsRegister() is broken because of the missing check. The resulting check is essentially regsOverlap(). Differential Revision:
  2698. [llvm-readelf] Make llvm-readelf more compatible with GNU readelf. Summary: This change adds a bunch of options that GNU readelf supports. There is one breaking change when invoked as `llvm-readobj`, and three breaking changes when invoked as `llvm-readelf`: - Add --all (implies --file-header, --program-headers, etc.) - [Breaking] -a is --all instead of --arm-attributes - Add --file-header as an alias for --file-headers - Replace --sections with --section-headers, keeping --sections as an alias for it - Add --relocs as an alias for --relocations - Add --dynamic as an alias for --dynamic-table - Add --segments as an alias for --program-headers - Add --section-groups as an alias for --elf-section-groups - Add --dyn-syms as an alias for --dyn-symbols - Add --syms as an alias for --symbols - Add --histogram as an alias for --elf-hash-histogram - [Breaking] When invoked as `llvm-readelf`, -s is --symbols instead of --sections - [Breaking] When invoked as `llvm-readelf`, -t is no longer an alias for --symbols Reviewers: MaskRay, phosek, mcgrathr, jhenderson Reviewed By: MaskRay, jhenderson Subscribers: sbc100, aheejin, edd, jhenderson, silvas, echristo, compnerd, kristina, javed.absar, kristof.beyls, llvm-commits, Bigcheese Differential Revision:
  2699. [CostModel][X86] Add SHLD/SHRD scalar funnel shift costs The costs match the typical reg-reg cases - the RMW case can be a lot slower but we don't model that at this level
  2700. [MachineOutliner][NFC] Early exit pruning when candidates don't share an MBB There's no way they can overlap in this case. This can save a few iterations when the candidate is close to the beginning of a MachineBasicBlock. It's particularly useful when the average length of a MachineBasicBlock in the program is small.
  2701. [MachineOutliner][NFC] Put suffix tree in buildCandidateList It's only used there, so it doesn't make much sense to have it in runOnModule.
  2702. [analyzer] Drastically simplify the tblgen files used for checkers Interestingly, only about a quarter of the emitter file is used, the DescFile entry hasn't ever been touched [1], and the entire concept of groups is a mystery, so I removed them. [1] Differential Revision:
  2703. Revert "Add a test checking clang-tidy can find libc++ on Mac" This reverts commit r346653.
  2704. Implement P1094R2 (nested inline namespaces) As approved for the Working Paper in San Diego, support annotating inline namespaces with 'inline'. Change-Id: I51a654e11ffb475bf27cccb2458768151619e384
  2705. [clang-tidy] fix ARM tests, because int and long have same width
  2706. Revert "Make clang-based tools find libc++ on MacOS" This breaks the LLDB bots.
  2707. [DWARFv5] Emit split type units in .debug_info.dwo. Differential Revision:
  2708. [clangd] Don't show all refs results if -name is ambiguous in dexp. Reviewers: ioeric Subscribers: ilya-biryukov, MaskRay, jkorous, arphaman, kadircet, cfe-commits Differential Revision:
  2709. [CostModel][X86] Add some initial cost tests for funnel shifts Still need to add full uniform/constant coverage but this is enough to check basic fshl/fshr cost handling
  2710. [clangd] Allow symbols from AnyScope in dexp. Summary: We should allow symbols from any scope in dexp results, otherwise `find StringRef` doesn't return any results (llvm::StringRef). Reviewers: ioeric Subscribers: ilya-biryukov, MaskRay, jkorous, arphaman, kadircet, cfe-commits Differential Revision:
  2711. [clang-tidy] new check: bugprone-too-small-loop-variable The new check looks for `for` loops whose loop variable has a "too small" type, meaning the type can't represent all values in the iteration range. For example: ``` int main() { long size = 300000; for( short int i = 0; i < size; ++i) {} } ``` The short type leads to an infinite loop here because it can't store all values in the `[0..size]` interval. In a real use case, size is a container's size, which depends on user input. The algorithm therefore works for a small number of objects, but the software freezes on larger input. The idea of the check comes from the LibreOffice project, where the same check was implemented as a clang compiler plugin called `LoopVarTooSmall` (LLVM licensed). The idea behind this check is the same, but the code is different because of the different framework. Patch by ztamas. Reviewers: alexfh, hokein, aaron.ballman, JonasToth, xazax.hun, whisperity Reviewed By: JonasToth, whisperity Differential Revision:
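  The condition the check is concerned with can be sketched as a simple range test: does the loop variable's type cover the upper bound? The helper below is hypothetical and is not how the clang-tidy check is implemented (the check works on types in the AST, not on runtime values); it only illustrates the "too small" relationship.

```cpp
#include <cassert>
#include <limits>

// True if every value in [0, bound] is representable in LoopVarT, i.e. a
// loop "for (LoopVarT i = 0; i < bound; ++i)" can reach its exit condition.
// The comparison is done in unsigned long long so that it stays correct for
// unsigned loop variable types as well.
template <typename LoopVarT>
constexpr bool loopVarCanReach(long long bound) {
  return bound < 0 ||
         static_cast<unsigned long long>(bound) <=
             static_cast<unsigned long long>(
                 std::numeric_limits<LoopVarT>::max());
}
```

  For the example in the commit message, `loopVarCanReach<short>(300000)` is false on platforms with a 16-bit `short`, while `loopVarCanReach<long>(300000)` is true, so widening the loop variable to the type of `size` removes the problem.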
  2712. [CostModel][X86] SK_ExtractSubvector is cheap if the (legal) subvector is aligned within the source vector
  2713. [SystemZ::TTI] Improve accuracy of costs for vector fp <-> int conversions Improve getCastInstrCost() by respecting the different types of Src and Dst for vector integer <-> fp conversions. This means that extracting from integer becomes more expensive (by the extraction penalty), and the extraction from fp becomes cheaper (no longer has a false extraction penalty). Review: Ulrich Weigand
  2714. [CostModel] Add more realistic SK_InsertSubvector generic costs. Instead of defaulting to a cost = 1, expand to element extract/insert like we do for other shuffles.
  2715. [VectorUtils] add funnel-shifts to the list of vectorizable intrinsics This just identifies the intrinsics as candidates for vectorization. It does not mean we will attempt to vectorize under normal conditions (the test file is forcing vectorization). The cost model must be fixed to show that the transform is profitable in general. Allowing vectorization with these intrinsics is required to avoid potential regressions from canonicalizing to the intrinsics from generic IR:
  2716. [VectorUtils] reorder list of vectorizable intrinsics; NFC We need to add funnel-shifts to this list, so clean up the random order before it gets worse.
  2717. Revert rL346644, rL346642: the added test test/CodeGen/code-coverage-filter.c is failing under windows
  2718. [LoopVectorize] add tests for funnel shifts; NFC
  2719. Fix unused variable warning. NFCI.
  2720. [CostModel] Add more realistic SK_ExtractSubvector generic costs. Instead of defaulting to a cost = 1, expand to element extract/insert like we do for other shuffles. This exposes an issue in LoopVectorize which could call SK_ExtractSubvector with a scalar subvector type.
  2721. [RISCV] Support .option relax and .option norelax This extends the .option support from D45864 to enable/disable the relax feature flag from D44886 During parsing of the relax/norelax directives, the RISCV::FeatureRelax feature bits of the SubtargetInfo stored in the AsmParser are updated appropriately to reflect whether relaxation is currently enabled in the parser. When an instruction is parsed, the parser checks if relaxation is currently enabled and if so, gets a handle to the AsmBackend and sets the ForceRelocs flag. The AsmBackend uses a combination of the original RISCV::FeatureRelax feature bits set by e.g -mattr=+/-relax and the ForceRelocs flag to determine whether to emit relocations for symbol and branch diffs. Diff relocations should therefore only not be emitted if the relax flag was not set on the command line and no instruction was ever parsed in a section with relaxation enabled to ensure correct diffs are emitted. Differential Revision: Patch by Lewis Revill.
  2722. [DAGCombiner] Fix load-store forwarding of indexed loads. Summary: Handle extra output from indexed loads in cases where we wish to forward a load value directly from a preceding store. Fixes PR39571. Reviewers: peter.smith, rengolin Subscribers: javed.absar, hiraditya, arphaman, llvm-commits Differential Revision:
  2723. Add a test checking clang-tidy can find libc++ on Mac Reviewers: sammccall, arphaman, EricWF Reviewed By: sammccall Subscribers: christof, cfe-commits Differential Revision:
  2724. Make clang-based tools find libc++ on MacOS Summary: When they read compiler args from compile_commands.json. This change allows to run clang-based tools, like clang-tidy or clangd, built from head using the compile_commands.json file produced for XCode toolchains. On MacOS clang can find the C++ standard library relative to the compiler installation dir. The logic to do this was based on resource dir as an approximation of where the compiler is installed. This broke the tools that read 'compile_commands.json' and don't ship with the compiler, as they typically change resource dir. To workaround this, we now use compiler install dir detected by the driver to better mimic the behavior of the original compiler when replaying the compilations using other tools. Reviewers: sammccall, arphaman, EricWF Reviewed By: sammccall Subscribers: ioeric, christof, kadircet, cfe-commits Differential Revision:
  2725. [llvm-mca] Correctly update the resource strategy for processor resources with multiple units. When looking at the tests committed by Roman at r346587, I noticed that numbers reported by the resource pressure for PdAGU01 were wrong. In particular, according to the auto-generated CHECK lines in tests memcpy-like-test.s and store-throughput.s, resource pressure for PdAGU01 was not uniformly distributed among the two AGEN pipes. It turns out that pressure was not correctly distributed because the "resource selection strategy" object associated with PdAGU01 was not correctly updated when an AGEN pipe was used. As a result, llvm-mca was not simulating a round-robin pipeline allocation for PdAGU01. Instead, PdAGU1 was always prioritized over PdAGU0. This patch fixes the issue; now processor resource strategy objects for resources declaring multiple units are correctly notified in the event of "resource used".
  2726. [newpm] Fix r346645: Missing consume of the Error return by the pipeline parser
  2727. [clangd] Remember to serialize AnyScope in FuzzyFindRequest json.
  2728. Add an OptimizerLast EP Summary: It turns out that we need an OptimizerLast PassBuilder extension point after all. I missed the relevance of this EP the first time. By legacy PM magic, function passes added at this EP get added to the last _Function_ PM, which is a feature we lost when dropping this EP for the new PM. A key difference between this and the legacy PassManager's OptimizerLast callback is that this extension point is not triggered at O0. Extensions to the O0 pipeline should append their passes to the end of the overall pipeline. Differential Revision:
  2729. [GCOV] fix test after patch rL346642 Summary: Test is failing under windows, so fix it. Should fix: Reviewers: marco-c Reviewed By: marco-c Subscribers: cfe-commits, sylvestre.ledru, marco-c Differential Revision:
  2730. [LICM] Hoist guards from non-header blocks This patch relaxes overconservative checks on whether or not we could write memory before we execute an instruction. This allows us to hoist guards out of loops even if they are not in the header block. Differential Revision: Reviewed By: fedor.sergeev
  2731. [Clang] Add options -fprofile-filter-files and -fprofile-exclude-files to filter the files to instrument with gcov Summary: These options take regexes separated by colons to filter files. - if both are empty then all files are instrumented - if -fprofile-filter-files is empty then all the filenames matching any of the regex from exclude are not instrumented - if -fprofile-exclude-files is empty then all the filenames matching any of the regex from filter are instrumented - if both aren't empty then all the filenames which match any of the regex in filter and which don't match any of the regex in exclude are instrumented - this patch is a follow-up of Reviewers: marco-c, vsk Reviewed By: marco-c, vsk Subscribers: cfe-commits, sylvestre.ledru Differential Revision:
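  The four cases for combining the filter and exclude lists can be written down as a small decision function. This is a sketch under assumptions, not the code from the patch: `shouldInstrument` and `matchesAny` are hypothetical names, the pattern lists are assumed to be already split, `std::regex` with whole-name matching stands in for whatever regex engine and matching mode the real implementation uses, and the last case is read as "matches some filter and no exclude".

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// True if the file name fully matches at least one of the patterns.
inline bool matchesAny(const std::string &file,
                       const std::vector<std::string> &patterns) {
  for (const std::string &p : patterns)
    if (std::regex_match(file, std::regex(p)))
      return true;
  return false;
}

// Decides whether a file is instrumented, one branch per documented case.
inline bool shouldInstrument(const std::string &file,
                             const std::vector<std::string> &filter,
                             const std::vector<std::string> &exclude) {
  if (filter.empty() && exclude.empty())
    return true;                       // no options given: instrument all
  if (filter.empty())
    return !matchesAny(file, exclude); // only exclude: drop matching files
  if (exclude.empty())
    return matchesAny(file, filter);   // only filter: keep matching files
  return matchesAny(file, filter) && !matchesAny(file, exclude);
}
```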
  2732. [GCOV] Add options to filter files which must be instrumented. Summary: When making code coverage, a lot of files (like the ones coming from /usr/include) are removed when post-processing gcno/gcda, so in the end they don't need to be instrumented or to appear in gcno/gcda. The goal of the patch is to be able to filter the files we want to instrument. There are several advantages to doing that: - improve speed (no overhead due to instrumentation on files we don't care about) - reduce gcno/gcda size - it gives the possibility to easily instrument only a few files (e.g. ones modified in a patch) without changing the build system - need to accept this patch to be enabled in clang: Reviewers: marco-c, vsk Reviewed By: marco-c Subscribers: llvm-commits, sylvestre.ledru Differential Revision:
  2733. Release notes: Mention clang-cl's /Zc:dllexportInlines- flag
  2734. clang-cl: Add documentation for /Zc:dllexportInlines- Differential revision:
  2735. [clangd] Fix compile on very old glibc
  2736. [SystemZ] Replicate the load with most uses in buildVector() Iterate over all elements and count the number of uses among them for each used load. Then make sure to REPLICATE the load which has the most uses in order to minimize the number of needed element insertions. Review: Ulrich Weigand
  2737. [llvm-objdump] add more constraints for tests Patch by Higuoxing (Xing) Reviewers: jhenderson Reviewed By: jhenderson Differential Revision:
  2738. Fix compatibility with z3-4.8.1 With z3-4.8.1: ../tools/clang/lib/StaticAnalyzer/Core/Z3ConstraintManager.cpp:49:40: error: 'Z3_get_error_msg_ex' was not declared in this scope ../tools/clang/lib/StaticAnalyzer/Core/Z3ConstraintManager.cpp:49:40: note: suggested alternative: 'Z3_get_error_msg' Formerly used Z3_get_error_msg_ex() as one could find in z3-4.7.1 states: "Retained function name for backwards compatibility within v4.1" And it is implemented only as a forwarding call: return Z3_get_error_msg(c, err); Differential Revision:
  2739. Update to-do list with new work from WG21 meeting in San Diego
  2740. Support Swift in platform availability attribute Summary: This adds support for Swift platform availability attributes. It's largely a port of the changes made to for Swift availability attributes. Specifically, and . The implementation of attribute_availability_swift is a little different and additional tests in test/Index/availability.c were added. Reviewers: manmanren, friss, doug.gregor, arphaman, jfb, erik.pilkington, aaron.ballman Reviewed By: aaron.ballman Subscribers: aaron.ballman, ColinKinloch, jrmuizel, cfe-commits Differential Revision:
  2741. [GC] Remove unused configuration variable The custom root mechanism didn't actually do anything. ShadowStackGC, the only one which used it, just removed the gcroots before they reached the normal lowering in SelectionDAG. As a result, the state flag had no value.
  2742. [GC] Minor style modernization
  2743. [NFC] Reformat std::optional tests
  2744. [NFC] Fix typo in <tuple>
  2745. [CodeGen][CXX]: Fix no_destroy CG bug under specific circumstances Summary: Class with no user-defined destructor that has an inherited member that has a non-trivial destructor and a non-default constructor will attempt to emit a destructor despite being marked as __attribute((no_destroy)) in which case it would trigger an assertion due to an incorrect assumption. In addition this adds missing test coverage for IR generation for no_destroy. (Note that here use of no_destroy is synonymous with its global flag counterpart `-fno-c++-static-destructors` being enabled) Differential Revision:
  2746. [IPSCCP] Delete two forward declarations Summary: Use forward declaration as the reviewer is in favor of #include and delete a redundant declaration of Function. Reviewers: fhahn Reviewed By: fhahn Subscribers: llvm-commits Differential Revision:
  2747. [llvm-nm] Use WithColor for error reporting Use helpers from Support/WithColor.h to print errors.
  2748. [llvm-objdump] Use WithColor for error reporting Use helpers from Support/WithColor.h to print errors.
  2749. [llvm-undname] Use WithColor for error reporting Use helpers from Support/WithColor.h to print errors.
  2750. [GCRoot] Remove some unnecessary complexity The GCStrategy provides three configuration options which are largely redundant. 1) Support for conditionally lowering gcread and gcwrite to loads and stores. This is redundant since any GC which wished to use these abstractions would lower them out of existence before the built-in lowering anyway. As such, there's no need to have the lowering be conditional. 2) Conditional initialization for allocas marked via gcroot. Semantically, roots have to be initialized before first potential use. Arguably, the frontend really should have responsibility for that, but the old API allowed the frontend to ignore this detail. Only one builtin GC used the non-initializing mode. Since no one to my knowledge actually uses the ErlangGC strategy, I decided the slight pessimization was worth the simplicity. If that turns out to be problematic, we can always improve the insertion algorithm to detect more existing initializing stores.
  2751. [IPSCCP] Use forward declaration.
  2752. [IPSCCP,PM] Add missing #include in rL346618
  2753. [IPSCCP,PM] Preserve PDT in the new pass manager. Reviewers: kuhar, chandlerc, NutshellySima, brzycki Reviewed By: NutshellySima, brzycki Differential Revision:
  2754. [MC] Fix 3 objdump tests after rL346610
  2755. [DWARF] Change pubnames to use DWARFSection instead of StringRef Summary: The debug_info_offset values in .debug_{,gnu_}pub{name,types} may be relocated. Change it to DWARFSection so that we can get relocated values. Reviewers: ruiu, dblaikie, grimar, JDevlieghere Reviewed By: JDevlieghere Subscribers: aprantl, JDevlieghere, llvm-commits Differential Revision:
  2756. [llvm][test] Update tests using objdump Update tests using llvm-objdump since check strings don't match anymore due to the extra `O` in place. This is a followup for rL346610.
  2757. [llvm-objdump] Add symbol 'O' for object data Improve compatibility with GNU objdump by showing `O` next to global symbol names, instead of a blank space. Patch by Higuoxing (Xing). Reviewers: MaskRay Differential Revision:
  2758. [x86] auto-generate complete checks; NFC
  2759. [clangd] Make ClangdFuzzer compile again.
  2760. Make initializeOutputStream() return false on error and true on success. As discussed in Differential Revision:
  2761. [X86] Use DAG.getConstant instead of getZeroVector.
  2762. [Support] Make error banner optional in logAllUnhandledErrors In a lot of places an empty string was passed as the ErrorBanner to logAllUnhandledErrors. This patch makes that argument optional to simplify the call sites.
  2763. [X86] Replace calls to getOnesVector/getZeroVector with getConstant. getConstant will create a BUILD_VECTOR for us and use a legal type if necessary. So just create the simple node and let BUILD_VECTOR legalization do the canonicalization.
  2764. [llvm-cxxdump] Use error reporting helpers from support This patch makes llvm-cxxdump use the error reporting helpers from Support/WithColor.h
  2765. Pass the function type instead of the return type to FunctionDecl::Create Fix places where the return type of a FunctionDecl was being used in place of the function type FunctionDecl::Create() takes as its T parameter the type of function that should be created, not the return type. Passing in the return type looks to have been copypasta'd around a bit, but the number of correct usages outweighs the incorrect ones so I've opted for keeping what T is the same and fixing up the call sites instead. This fixes a crash in Clang when attempting to compile the following snippet of code with -fblocks -fsanitize=function -x objective-c++ (my original repro case): void g(void(^)()); void f() { __block int a = 0; g(^(){ a++; }); } as well as the following which only requires -fsanitize=function -x c++: void f(char * buf) { __builtin_os_log_format(buf, ""); } Patch by: Ben (bobsayshilol) Differential revision:
  2766. [DAGCombiner] Make tryToFoldExtendOfConstant return an SDValue instead of an SDNode*. NFC Removes the need to call getNode internally and to recreate an SDValue after the call.
  2767. [InstCombine] simplify code for merging stores; NFCI
  2768. [x86] allow vector load narrowing with multi-use values This is a long-awaited follow-up suggested in D33578. Since then, we've picked up even more opportunities for vector narrowing from changes like D53784, so there are a lot of test diffs. Apart from 2-3 strange cases, these are all wins. I've structured this to be no-functional-change-intended for any target except for x86 because I couldn't tell if AArch64, ARM, and AMDGPU would improve or not. All of those targets have existing regression tests (4, 4, 10 files respectively) that would be affected. Also, Hexagon overrides the shouldReduceLoadWidth() hook, but doesn't show any regression test diffs. The trade-off is deciding if an extra vector load is better than a single wide load + extract_subvector. For x86, this is almost always better (on paper at least) because we often can fold loads into subsequent ops and not increase the official instruction count. There's also some unknown -- but potentially large -- benefit from using narrower vector ops if wide ops are implemented with multiple uops and/or frequency throttling is avoided. Differential Revision:
  2769. [InstCombine] auto-generate full checks; NFC
  2770. Fix DragonFlyBSD linkage issue. environ global failed on LTO linkage step.
  2771. [X86] Remove unused variable
  2772. [cxx_status] Update for San Diego motions.
  2773. [X86] Remove apparently unneeded code from combineVSZext. No lit tests fail with this code removed. This is a pre-commit for D54346.
  2774. [CostModel][X86] SK_ExtractSubvector costs must only be tested for vector types (PR39615)
  2775. [GC] Rename a header for consistency
  2776. [X86][BdVer2] Fix loads/stores throughput for Piledriver (PR39465) There are two AGU units, and per 1cy, there can be either two loads, or a load and a store; but not two stores, or two loads and a store. Additionally, loads shouldn't affect the store scheduler and vice versa. (but *should* affect the PdEX scheduler.) Required rL346545. Fixes
  2777. [python] Support PathLike filenames and directories Python 3.6 introduced a file system path protocol (PEP 519[1]). The standard library APIs accepting file system paths now accept path objects too. It could be useful to add this here as well for convenience. [1] Authored by: jstasiak (Jakub Stasiak) Differential Revision:
  2778. [NFC][MCA][BdVer2] Add bdver2 runline into register-file-statistics.s test Missed this one by accident when adding the initial version in rL345463 / rL345462
  2779. [ThinLTO] Internalize readonly globals This patch allows internalising globals if all accesses to them (from live functions) are from non-volatile load instructions Differential revision:
  2780. [clang]: Fix misapplied patch in 346582.
  2781. Correct naming conventions and 80 col rule violation in CGDeclCXX.cpp. NFC. Differential Revision:
  2782. [X86] Use a MOVSX instruction instead of a MOVZX instruction in isel for an any_extend of the remainder from an 8-bit sdivrem. The sdivrem will emit its own MOVSX to move %ah to the low byte of a register. By using a MOVSX for an any_extend this allows a post-isel peephole to merge them.
  2783. [X86] Add a test case to show scalarized vector srem to demonstrate unnecessary instructions. NFC After the division %ah is being sign extended to move it to lower byte of a register while avoiding a partial register read. We then zero extend the low byte to the full 32 bit register. But we don't use any of the zero extended bits. In the DAG the zero extend was really an any_extend so the sign extend should have been enough.
  2784. Correct atexit(3) support in MSan/NetBSD Summary: The NetBSD specific implementation of cxa_atexit() does not preserve the 2nd argument if dso is equal to NULL. Changes: - Split paths of handling intercepted __cxa_atexit() and atexit(3). This affects all supported Operating Systems. - Add a local stack-like structure to hold the __cxa_atexit() context. atexit(3) is documented in the C standard as calling callback from the earliest to the oldest entry. This path also fixes potential ABI problem of passing an argument to a function from the atexit(3) callback mechanism. - Allow usage of global vars with ctors in interceptors. This allows to use Vector without automatic cleaning up the structures. This code has been modeled after TSan implementation for the same functions. Sponsored by <The NetBSD Foundation> Reviewers: joerg, dvyukov, eugenis, vitalybuka, kcc Reviewed By: vitalybuka Subscribers: delcypher, devnexen, llvm-commits, #sanitizers Tags: #sanitizers Differential Revision:
  2785. Fix DragonFlyBSD build Reviewers: rnk, thakis Reviewed By: krytarowski Differential Revision:
  2786. RegAllocFast: Further cleanups; NFC
  2787. test/CodeGen/X86: Relax test case No need to hardcode register or expecting totally unnecessary spills from the allocator.
  2788. [X86] In LowerHorizontalByteSum, emit vector_shuffle nodes instead of directly using X86ISD::UNPCKL/X86ISD::UNPCKH. This gives shuffle lowering the freedom to use zero_extend_vector_inreg for the unpckl shuffle. Shuffle combining usually makes this swap later, but apparently not when AVX512 is enabled. While there, also use DAG.getConstant to create a 0 vector instead of using the helper that forces a specific BUILD_VECTOR. I don't think that helper is usually needed. We're basically free to create a constant build_vector anytime and it will be legalized on its own.
  2789. [WebAssembly] Update bleeding-edge cpu features Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, jfb, llvm-commits Differential Revision:
  2790. [GC] Simplify linking of GC builtin GC strategies
  2791. [ARM64] [Windows] Handle funclets This patch adds support for funclets in frame lowering and ISel lowering. Together with D50288 and D50166, it enables C++ exception handling. Patch by Sanjin Sijaric, with some fixes by me. Differential Revision:
  2792. [libcxx] Provide thread annotations for shared_mutex shared_mutex was introduced in C++17 but its implementation currently doesn't use Clang's thread annotations like regular mutex. This change adds those. Differential Revision:
  2793. Fix ClangFormat issue of recognizing ObjC subscript as C++ attributes when message target is a result of a C-style method. Summary: The issue is that for array subscript like: ``` arr[[Foo() bar]]; ``` ClangFormat will recognize it as C++11 attribute syntax and put a space between 'arr' and first '[', like: ``` arr [[Foo() bar]]; ``` Now it is fixed. Tested with: ``` ninja FormatTests ``` Reviewers: benhamilton Reviewed By: benhamilton Subscribers: cfe-commits Differential Revision:
  2794. [AVR] Reorder the CHECK lines in directmem.ll to match current trunk In r346432 ("[DAGCombine] Improve alias analysis for chain of independent stores"), the order of ldi/sts blocks changed. The new IR is equivalent to the old IR. This patch updates the test to fix the test suite.
  2795. [SelectionDAG] Fix a -Wparentheses warning from gcc in an assert. NFC gcc wants parentheses around the logical OR since there is a logical AND for the string.
  2796. [ARM] Add MemOperand to LDRcp to enable DCE. LDRcp should be deleted when the dest register is dead in register coalescing. Without MemOp, dead LDRcp will cause dead constant pool value which references to non-existing label. Patch by Yin Ma. Differential Revision:
  2797. [JumpThreading] Fix exponential time algorithm computing known values. ComputeValueKnownInPredecessors has a "visited" set to prevent infinite loops, since a value can be visited more than once. However, the implementation didn't prevent the algorithm from taking exponential time. Instead of removing elements from the RecursionSet one at a time, we should keep around the whole set until ComputeValueKnownInPredecessors finishes, then discard it. The testcase is synthetic because I was having trouble effectively reducing the original. But it's basically the same idea. Instead of failing, we could theoretically cache the result instead. But I don't think it would help substantially in practice. Differential Revision:
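The effect described in entry 2797 can be illustrated with a minimal sketch. This is not the JumpThreading code: the graph shape (a chain of diamonds) and the names `makeDiamondChain`, `countVisits` and `visitsFromLastNode` are hypothetical. It only shows why erasing each node from the visited set on return still prevents infinite loops but lets shared nodes be re-walked once per path, while keeping the whole set until the query finishes bounds the work to one visit per node.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical DAG: Preds[N] lists the predecessors of node N.
using Graph = std::vector<std::vector<int>>;

// Build a chain of diamonds: every join node has two predecessors, and both
// of them lead back to the previous join node.
Graph makeDiamondChain(int Levels) {
  assert(Levels >= 0 && "number of diamonds must be non-negative");
  Graph Preds(1); // node 0 is the root and has no predecessors
  int Join = 0;
  for (int I = 0; I < Levels; ++I) {
    int Left = static_cast<int>(Preds.size());
    Preds.push_back({Join});
    int Right = static_cast<int>(Preds.size());
    Preds.push_back({Join});
    int NewJoin = static_cast<int>(Preds.size());
    Preds.push_back({Left, Right});
    Join = NewJoin;
  }
  return Preds;
}

// Walk predecessors recursively and count how many nodes are processed.
// EraseOnReturn == true:  the node leaves the set when the call returns, so
//                         the set only guards against cycles.
// EraseOnReturn == false: the set is kept until the whole query is done.
std::size_t countVisits(const Graph &Preds, int Node, std::set<int> &Visited,
                        bool EraseOnReturn) {
  if (!Visited.insert(Node).second)
    return 0; // already seen, nothing to do
  std::size_t Count = 1;
  for (int P : Preds[Node])
    Count += countVisits(Preds, P, Visited, EraseOnReturn);
  if (EraseOnReturn)
    Visited.erase(Node);
  return Count;
}

// Run one query starting at the last join node of the chain.
std::size_t visitsFromLastNode(int Levels, bool EraseOnReturn) {
  Graph Preds = makeDiamondChain(Levels);
  std::set<int> Visited;
  return countVisits(Preds, static_cast<int>(Preds.size()) - 1, Visited,
                     EraseOnReturn);
}
```

With 10 diamonds the graph has 31 nodes. The persistent set processes each node once, while the erase-on-return variant roughly doubles its work with every extra diamond.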
  2798. Re-land r343606 "[winasan] Unpoison the stack in NtTerminateThread" This change was reverted because it caused some nacl tests in chromium to fail. I attempted to reproduce those problems locally, but I was unable to. Let's reland this and let Chromium's test infrastructure discover any problems.
  2799. Revert "Exclude wasm target from Windows packaging due to PR39448" Summary: This reverts r346122 now that the failing tests have been disabled. Depends on D54353. Reviewers: aheejin, dschuff Subscribers: fedor.sergeev, sunfish, llvm-commits Differential Revision:
  2800. [WebAssembly] Disable custom NaN payload tests Summary: These tests fail on 32-bit builds because NaN payload bits in floating point immediates are not necessarily preserved through compilation. This is because the MC layer uses native doubles to store these values. The tests will be reenabled once this problem has been fixed or deleted if we decide we don't care about lowering payload bits. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision:
  2801. [hwasan] Add entire report to abort message on Android. Summary: When reporting a fatal error, collect and add the entire report text to android_set_abort_message so that it can be found in the tombstone. Reviewers: kcc, vitalybuka Subscribers: srhines, kubamracek, llvm-commits Differential Revision:
  2802. Revert "Revert rL346454: Fix a use-after-free introduced by r344915." This un-reverts commit 346454 with a relaxed CHECK for Windows.
  2803. [clang-tidy] fix PR39583 - ignoring ParenCast for string-literals in pro-bounds-array-to-pointer-decay Summary: The fix to the issue that `const char* p = ("foo")` is diagnosed as decay is to ignored the ParenCast. Resolves PR39583 Reviewers: aaron.ballman, alexfh, hokein Reviewed By: aaron.ballman Subscribers: nemanjai, xazax.hun, kbarton, cfe-commits Differential Revision:
  2804. [ASTMatchers] overload ignoringParens for Expr Summary: This patch allows fixing PR39583. Reviewers: aaron.ballman, sbenza, klimek Reviewed By: aaron.ballman Subscribers: cfe-commits Tags: #clang Differential Revision:
  2805. [X86] Move the promotion of v16i16->v16i8 for avx512f but not avx512bw from lowering to isel. Change to use vpmovzx instead of vpmovsx. With avx512f but not avx512bw we need to extend to v16i32 then truncate that to v16i8. Previously we emitted both nodes during lowering, but I'm trying to switch to using target independent nodes and with that switched the extend+truncate wou This patch changes the implementation to what will be necessary with that patch which helps minimize test diffs.
  2806. [OPENMP][NVPTX]Extend number of constructs executed in SPMD mode. If the statements between target|teams|distribute directives do not require execution in the master thread, like constant expressions, null statements, simple declarations, etc., such a construct can be executed in SPMD mode.
  2807. Branch/tag all projects with a single commit in release-tagging script. This change updates the release script to use svnmucc to create all the branches with one commit. This will ensure that the git tag won't bounce around if the git migration runs in-between separate commits creating a branch. Additionally, update the list of projects to include all of the projects in the monorepo, plus test-suite. Differential Revision:
  2808. Revert rL346454: Fix a use-after-free introduced by r344915. r344915 added a call to ApplyDebugLocation to the sanitizer check function emitter. Some of the sanitizers are emitted in the function epilogue though and the LexicalScopeStack is emptied out before. By detecting this situation and early-exiting from ApplyDebugLocation the fallback location is used, which is equivalent to the return location. rdar://problem/45859802 ........ Causes EXPENSIVE_CHECKS build bot failures:
  2809. Use the correct address space when bitcasting func pointer to int pointer When we cast a function pointer to an int pointer, at some point later it gets bitcasted back to a function and called. In backends that have a nonzero program memory address space specified in the data layout, the old code would lose the address space data. When LLVM later attempted to generate the bitcast from i8* to i8(..)* addrspace(1), it would fail because the pointers are not in the same address space. With this patch, the address space of the function will carry on to the address space of the i8* pointer. This is because all function pointers in Harvard architectures need to be assigned to the correct address space. This has no effect on any in-tree backends except AVR.
  2810. Allow a double-underscore spelling of Clang attributes using double square bracket syntax. This matches a similar behavior with GCC accepting [[gnu::__attr__]] as an alias for [[gnu::attr]] in that clang attributes can now be spelled with two leading and trailing underscores. I had always intended for this to work, but missed the critical bit. We already had an existing test in test/Preprocessor/has_attribute.cpp for [[clang::__fallthrough__]] but using that spelling would still give an "unknown attribute" diagnostic.
  2811. [AArch64] Support HiSilicon's TSV110 processor Reviewers: t.p.northover, SjoerdMeijer, kristof.beyls Reviewed By: kristof.beyls Subscribers: olista01, javed.absar, kristof.beyls, kristina, llvm-commits Differential Revision:
  2812. [llvm-mca] Account for buffered resources when analyzing "Super" resources. This was noticed when working on PR3946. By construction, a group cannot be used as a "Super" resource. That constraint is enforced by method `SubtargetEmitter::ExpandProcResource()`. A Super resource S can be part of a group G. However, method `SubtargetEmitter::ExpandProcResource()` would not update the number of consumed resource cycles in G based on S. In practice, this is perfectly fine because the resource usage is correctly computed for processor resource units. However, llvm-mca should still check if G is a buffered resource. Before this patch, llvm-mca didn't correctly check if S was part of a group that defines a buffer. So, the instruction descriptor was not correctly set. For now, the semantic change introduced by this patch doesn't affect any of the upstream scheduling models. However, it will allow some progress to be made on PR3946.
  2813. [MS demangler] Use a slightly shorter unmangling for mangled strings. Before: const wchar_t * {L"%"} Now: L"%" See also PR39593. Differential Revision:
  2814. [Hexagon] Fix some -Wunused-function with LLVM_DUMP_METHOD and -Wunused-variable
  2815. Fix a nondeterminism in the debug info for VLA size expressions. The artificial variable describing the array size is supposed to be called "__vla_expr", but this was implemented by retrieving the name of the associated alloca, which isn't a reliable source for the name, since nonassert compilers may drop names from LLVM IR. rdar://problem/45924808
  2816. [SystemZ] Add a couple of missing tests A few fp128 tests were omitted from test/CodeGen/SystemZ/fp-round-01.ll since in early days, LLVM couldn't handle implicitly generated library calls to functions with long double arguments on SystemZ. This deficiency was actually long since fixed, but those tests are still missing. This patch adds the missing tests. NFC.
  2817. [DWARFv5] Emit normal type units in .debug_info comdats. Differential Revision:
  2818. [X86] Turn X86ISD::VSEXT into X86ISD::VZEXT if the upper bits aren't demanded. This makes X86ISD::VSEXT more similar to ISD::SIGN_EXTEND and ISD::ZERO_EXTEND. I'm hoping to replace X86ISD::VSEXT/VZEXT with target independent nodes. Making the target specific nodes similar to the target independent nodes helps minimize test diffs in that patch.
  2819. [CostModel][X86] SK_ExtractSubvector is free if the subvector is at the start of the source vector
  2820. [Hexagon] Fix unused variable warning in release builds
  2821. [HIP] Remove useless sections in linked files clang-offload-bundler creates __CLANG_OFFLOAD_BUNDLE__* sections in the bundles, which get into the linked files. These sections are useless after linking. They waste disk space and cause confusion for clang when directly linked with other object files, therefore should be removed. Differential Revision:
  2822. [WebAssembly] Hotfix of WebAssemblyInstructionTableSize after rL346465
  2823. [TTI] Flip vector types in getShuffleCost SK_ExtractSubvector call For SK_ExtractSubvector, the default 'Ty' type is the source operand type and 'SubTy' is the destination subvector type I got this the wrong way around when I added rL346510
  2824. [AMDGPU] Cleanup optimize-if-exec-masking.mir test. NFC.
  2825. [Hexagon] Implement noreturn optimization Eliminate the stack frame in functions with the noreturn nounwind attributes, and when the noreturn-stack-elim target feature is enabled. This reduces the code and stack space needed for noreturn functions. Differential Revision:
  2826. Add total function byte size and inline function byte size to "llvm-dwarfdump --statistics" Differential Revision:
  2827. [DAGCombiner][X86][Mips] Enable combineShuffleOfScalars to run between vector op legalization and DAG legalization. Fix bad one use check in combineShuffleOfScalars It's possible for vector op legalization to generate a shuffle. If that happens we should give DAG combine a chance to combine that with a build_vector input. I also fixed a bug in combineShuffleOfScalars that was considering the number of uses on an undef input to a shuffle. We don't care how many times undef is used. Differential Revision:
  2828. [llvm-strings] Fix whitespaces to match strings output. Summary: The current implementation prepends a space on every line, making it difficult to compare against GNU strings. The space appears to have come from handling --radix in rL292707. The space is for making sure there's a space between the radix and the value; however the space is still emitted even when there is no radix. This change fixes that so the space is only emitted when there is a radix. Reviewers: jhenderson Reviewed By: jhenderson Subscribers: llvm-commits, compnerd Differential Revision:
  2829. [AMDGPU] Always pass TRI into findRegister[Use/Def]OperandIdx This only covers AMDGPU BE, hopefully all occurrences. Differential Revision:
  2830. Driver: Make -fsanitize=shadow-call-stack compatible with -fsanitize-minimal-runtime. Differential Revision:
  2831. [clangd] Fix clang-tidy warnings.
  2832. [Hexagon] Place globals with explicit .sdata section in small data Both -fPIC and -G0 disable placement of globals in small data section, but if a global has an explicit section assignment placing it in small data, it should go there anyway.
  2833. Type safe version of MachinePassRegistry Previous version used type erasure through a `void* (*)()` pointer, which triggered gcc warning and implied a lot of reinterpret_cast. This version should make it harder to hit ourselves in the foot. Differential revision:
  2834. Introduce the _Clang scoped attribute token. Currently, we only accept clang as the scoped attribute identifier for double square bracket attributes provided by Clang, but this has the potential to conflict with user-defined macros. To help alleviate these concerns, this introduces the _Clang scoped attribute identifier as an alias for clang. It also introduces a warning with a fixit on the off chance someone attempts to use __clang__ as the scoped attribute (which is a predefined compiler identification macro).
  2835. Use the correct address space when emitting the ctor function list This patch modifies clang so that, if compiling for a target that explicitly specifies a nonzero program memory address space, the constructor list global will have the same address space as the functions it contains. AVR is the only in-tree backend which has a nonzero program memory address space. Without this, the IR verifier would always fail if a constructor was used on a Harvard architecture backend. This has no functional change to any in-tree backends except AVR.
  2836. [docs][statepoints] Reformulate open issues list Some have been partially resolved, so update that. And restructure to make it easier to find and search.
  2837. Fix -Wsign-compare warning
  2838. [llvm-cov] Remove "default:" label in the switch covering all enum values. Summary: Fixing the build breakage: Reviewers: vsk, allevato, Dor1s Reviewed By: Dor1s Subscribers: llvm-commits Differential Revision:
  2839. [docs][statepoint] Expand a bit on problems with mixing references and raw pointers since it keeps coming up in discussions
  2840. [Power9] Allow gpr callee saved spills in prologue to vector registers Currently in llvm, CalleeSavedInfo can only assign a callee saved register to a stack frame index to be spilled in the prologue. We would like to enable spilling gprs to vector registers. This patch adds the capability to spill to other registers aside from just the stack. It also adds the changes for power9 to spill gprs to volatile vector registers when they are available. This happens only for leaf functions when using the option -ppc-enable-pe-vector-spills. Differential Revision:
  2841. [CostModel] Add SK_ExtractSubvector handling to getInstructionThroughput (PR39368) Add ShuffleVectorInst::isExtractSubvectorMask helper to match shuffle masks.
  2842. [docs][statepoint] tweak a title
  2843. Revert "[DEBUGINFO, NVPTX]DO not emit ',debug' option if no debug info or only debug directives are requested." This reverts commit r345972. Need to update the description + possibly to update the patch itself after discussion with Eric Christofer.
  2844. [OPENMP][NVPTX]Allow to use shared memory for the target|teams|distribute variables. If the total size of the variables, declared in target|teams|distribute regions, is less than the maximal size of shared memory available, the buffer is allocated in the shared memory.
  2845. [llvm-cov] Add lcov tracefile export format. Summary: lcov tracefiles are used by various coverage reporting tools and build systems (e.g., Bazel). It is a simple text-based format to parse and more convenient to use than the JSON export format, which needs additional processing to map regions/segments back to line numbers. It's a little unfortunate that "text" format is now overloaded to refer specifically to JSON for export, but I wanted to avoid making any breaking changes to the UI of the llvm-cov tool at this time. Patch by Tony Allevato (@allevato). Reviewers: Dor1s, vsk Reviewed By: Dor1s, vsk Subscribers: mgorny, llvm-commits Differential Revision:
  2846. [SystemZ] Avoid inserting same value after replication A minor improvement of buildVector() that skips creating an INSERT_VECTOR_ELT for a Value which has already been used for the REPLICATE. Review: Ulrich Weigand
  2847. [clangd] Don't treat top-level decls as "local" if they are from the preamble. Summary: These get passed to HandleTopLevelDecl() if they happen to have been deserialized for any reason. We don't want to treat them as part of the main file. Reviewers: ilya-biryukov Subscribers: ioeric, MaskRay, jkorous, arphaman, kadircet, cfe-commits Differential Revision:
  2848. AMDGPU: Add testcase to demonstrate a condition with pre-existing waitcnt Relevant for
  2849. Revert "[VFS] Add "expand tilde" argument to getRealPath." This reverts commit r346453. This is a complex change to a widely-used interface, and was not reviewed.
  2850. [ARM] Don't promote i1 types in ARM CGP Now that we have mixed type sizes, i1 values need to be explicitly handled as we want to avoid promoting these values. Differential Revision:
  2851. [x86] try to form broadcast before widening shuffle elements I noticed that we weren't generating broadcasts as much I thought we would with D54271, and this is part of the problem. Widening the shuffle elements means adding bitcasts and hiding the relationship between a splatted scalar and the vector. If we can form a broadcast, do that before going through the rest of the shuffle lowering because broadcasts should be cheap and can often be load-folded. Differential Revision:
  2852. [RISCV] Avoid unnecessary XOR for seteq/setne 0 Differential Revision: Patch by James Clarke.
  2853. [RISCV] Update test/CodeGen/RISCV/calling-conv.ll after rL346432 The DAGCombiner changes led to a different schedule.
  2854. [MIPS GlobalISel] narrowScalar G_CONSTANT Legalize s64 G_CONSTANT using narrowScalar on MIPS 32. Differential Revision:
  2855. [Hexagon] Handle Hexagon's SHF_HEX_GPREL section flag
  2856. [llvm-exegesis] Fix unit tests on PowerPC/AArch64. We were comparing char*s and not contents. Introduced in rL346489.
  2857. Revert r346483: [CallSiteSplitting] Only record conditions up to the IDom(call site). This causes a failure with EXPENSIVE_CHECKS
  2858. [clang-cl] Add warning for /Zc:dllexportInlines- when the flag is used with /fallback Summary: This is followup of Reviewers: hans, thakis Reviewed By: hans Subscribers: cfe-commits, llvm-commits Differential Revision:
  2859. [X86] Add Subtarget to more lowerVectorShuffle functions. NFCI. This will be necessary for an update to D54267
  2860. [llvm-exegesis][NFC] Add a way to declare the default counter binding for unbound CPUs for a target. Summary: This simplifies the code and moves everything to tablegen for consistency. This also prepares the ground for adding issue counters. Reviewers: gchatelet, john.brawn, jsji Subscribers: nemanjai, mgorny, javed.absar, kbarton, tschuett, llvm-commits Differential Revision:
  2861. [clangd] Make TestTU build with preamble, and fix the fallout. Our testing didn't reflect reality: live clangd almost always uses a preamble, and sometimes the preamble behaves differently. This patch fixes a common test helper to be more realistic. Preamble doesn't preserve information about which tokens come from the command-line (this gets inlined into a source file). So remove logic that attempts to treat symbols with such names differently. A SymbolCollectorTest tries to verify that locals in headers are not indexed, with preamble enabled this is only meaningful for locals of auto-typed functions (otherwise the bodies aren't parsed). Tests were relying on the fact that the findAnyDecl helper actually did expose symbols from headers. Resolve by making all these functions consistently able to find symbols in headers/preambles.
  2862. [llvm-mca] Use a small vector for instructions in the EntryStage. Use a simple SmallVector to track the lifetime of simulated instructions. An ordered map was not needed because instructions are already picked in program order. It is also much faster if we avoid searching for already retired instructions at the end of every cycle. The new policy only triggers a "garbage collection" when the number of retired instructions becomes significantly big when compared with the total size of the vector. While working on this, I noticed that instructions were correctly retired, but their internal state was not updated (i.e. there was no transition from the EXECUTED state to the RETIRED state). While this was not a problem for the views, it prevented the EntryStage from correctly garbage collecting already retired instructions. That was a bad oversight, and this patch fixes it. The observed speedup on a debug build of llvm-mca after this patch is ~6%. On a release build of llvm-mca, the observed speedup is ~15%.
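The deferred "garbage collection" policy from entry 2862 might look roughly like the sketch below. It is not the llvm-mca implementation: `SimInst` and `InstructionPool` are hypothetical names, `std::vector` stands in for `SmallVector`, and the "at least half of the entries are retired" threshold is an assumption, since the entry only says the retired count must be significant relative to the vector size.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a simulated instruction.
struct SimInst {
  unsigned Id;
  bool Retired;
};

class InstructionPool {
  std::vector<SimInst> Insts; // kept in program order
  std::size_t NumRetired = 0;

public:
  void add(unsigned Id) { Insts.push_back({Id, false}); }

  // Mark an instruction as retired; nothing is erased here.
  void retire(unsigned Id) {
    for (SimInst &I : Insts) {
      if (I.Id == Id && !I.Retired) {
        I.Retired = true;
        ++NumRetired;
        return;
      }
    }
    assert(false && "retiring an unknown or already retired instruction");
  }

  // Called at the end of every cycle. Compacts the vector only when at least
  // half of the entries are retired (assumed threshold), so most cycles do
  // no searching at all. Returns true if a compaction happened.
  bool cycleEnd() {
    if (NumRetired == 0 || NumRetired * 2 < Insts.size())
      return false;
    Insts.erase(std::remove_if(Insts.begin(), Insts.end(),
                               [](const SimInst &I) { return I.Retired; }),
                Insts.end());
    NumRetired = 0;
    return true;
  }

  std::size_t size() const { return Insts.size(); }
};
```

Note that the compaction relies on the retired flag actually being set, which mirrors the state-transition oversight the entry mentions.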
  2863. [IPSCCP,PM] Preserve DT in the new pass manager. After D45330, Dominators are required for IPSCCP and can be preserved. This patch preserves DominatorTreeAnalysis in the new pass manager. AFAIK the legacy pass manager cannot preserve function analysis required by a module analysis. Reviewers: davide, dberlin, chandlerc, efriedma, kuhar, NutshellySima Reviewed By: chandlerc, kuhar, NutshellySima Differential Revision:
  2864. [Tooling] Avoid diagnosing missing input files in an edge-case where it's incorrect.
  2865. [SelectionDAG] swap select_cc operands to enable folding The DAGCombiner tries to SimplifySelectCC as follows: select_cc(x, y, 16, 0, cc) -> shl(zext(set_cc(x, y, cc)), 4) It can't cope with the situation of reordered operands: select_cc(x, y, 0, 16, cc) In that case we just need to swap the operands and invert the Condition Code: select_cc(x, y, 16, 0, ~cc) Differential Revision:
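The identity behind entry 2865 can be checked on plain integers. The sketch below is not DAGCombiner code; the function names are hypothetical and the condition code is fixed to signed less-than for illustration. It shows that selecting 16 or 0 equals shifting the zero-extended comparison result left by 4, and that the reordered form matches once the condition is inverted.

```cpp
#include <cassert>
#include <cstdint>

// select_cc(x, y, 16, 0, <) written as a plain select.
uint32_t selectDirect(int32_t X, int32_t Y) { return X < Y ? 16u : 0u; }

// Folded form: shl(zext(set_cc(x, y, <)), 4).
uint32_t selectFolded(int32_t X, int32_t Y) {
  return static_cast<uint32_t>(X < Y) << 4;
}

// Reordered operands: select_cc(x, y, 0, 16, <).
uint32_t selectSwappedDirect(int32_t X, int32_t Y) { return X < Y ? 0u : 16u; }

// After swapping the operands and inverting the condition (< becomes >=),
// the same shift-based fold applies.
uint32_t selectSwappedFolded(int32_t X, int32_t Y) {
  return static_cast<uint32_t>(X >= Y) << 4;
}

// Assert that both folds agree with their select forms for one input pair.
void checkFolds(int32_t X, int32_t Y) {
  assert(selectDirect(X, Y) == selectFolded(X, Y));
  assert(selectSwappedDirect(X, Y) == selectSwappedFolded(X, Y));
}
```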
  2866. [CallSiteSplitting] Only record conditions up to the IDom(call site). We can stop recording conditions once we reach the immediate dominator of the block containing the call site. Conditions in predecessors of that node will be the same for all paths to the call site and splitting is not beneficial. This patch makes CallSiteSplitting dependent on the DT analysis, because the immediate dominators seem to be the easiest way of finding the node to stop at. I had to update some existing tests, because they were checking for conditions that were true/false on all paths to the call site. Those should now be handled by instcombine/ipsccp. Reviewers: davide, junbuml Reviewed By: junbuml Differential Revision:
  2867. [X86] Fix VZEROUPPER scheduling info on SNB,HSW,BDW,SXL,SKX. Summary: Starting from SNB, VZEROUPPER is handled by the renamer and uses no proc resources. After HSW, it also has zero latency. This fixes PR35606. To reproduce: Uops: llvm-exegesis -mode=uops -opcode-name=VZEROUPPER Latency: echo -e '#LLVM-EXEGESIS-DEFREG XMM0 1\n#LLVM-EXEGESIS-DEFREG XMM1 1\nvzeroupper' | /tmp/llvm-exegesis -mode=latency -snippets-file=- echo -e '#LLVM-EXEGESIS-DEFREG XMM0 1\n#LLVM-EXEGESIS-DEFREG XMM1 1\nvzeroupper\naddps %xmm0, %xmm1' | /tmp/llvm-exegesis -mode=latency -snippets-file=- Reviewers: RKSimon, craig.topper, andreadb Subscribers: gbedwell, llvm-commits Differential Revision:
  2868. [DebugInfo][Dexter] Unreachable line stepped onto after SimplifyCFG. In SimplifyCFG when given a conditional branch that goes to BB1 and BB2, the hoisted common terminator instruction in the two blocks, caused debug line records associated with subsequent select instructions to become ambiguous. It causes the debugger to display unreachable source lines. Differential Revision:
  2869. [ARM] Enable mixed types in ARM CGP Previously, during the search, all values had to have the same 'TypeSize', which is equal to number of bits of the integer type of the icmp operand. All values in the tree had to match this size; meaning that, if we searched from i16, we wouldn't accept i8s. A change in type size requires zext and truncs to perform the casts so, to allow mixed narrow types, the handling of these instructions is now slightly different: - we allow casts if their result or operand is <= TypeSize. - zexts are sinks if their result > TypeSize. - truncs are still sinks if their operand == TypeSize. - truncs are still sources if their result == TypeSize. The transformation bails on finding an icmp that operates on data smaller than the current TypeSize. Differential Revision:
  2870. [ARM] Small reorganisation in ARMParallelDSP A few code movement things: - AreSymmetrical is now a method of BinOpChain. - Created a lambda in CreateParallelMACPairs to reduce loop nesting. - A Reduction object now gets passed in a couple of places instead, including CreateParallelMACPairs so it doesn't need to return a value. I've also added RecordSequentialLoads, which is run before the transformation begins, and caches the interesting loads. This can then be queried later instead of cross checking many load values. Differential Revision:
  2871. [XRay] Add a test for function id encoding/decoding (NFC) Increase test coverage for function enter/exit encoding/decoding.
  2872. [XRay] Add a static assertion on size of metadata payload (NFC) This change adds a static check to ensure that all data metadata record payloads don't go past the available buffers in Metadata records.
  2873. [XRay] Fix enter function tracing for record unwriting Summary: Before this change, we could run into a situation where we may try to undo tail exit records after writing metadata records before a function enter event. This change rectifies that by resetting the tail exit counter after writing the metadata records. Reviewers: mboerger Subscribers: llvm-commits Differential Revision:
  2874. [XRay] Add atomic fences around non-atomic reads and writes Summary: We need these fences to ensure that other threads attempting to read bytes in the buffer will see the writes committed before the extents are updated. Without these, the writes can be un-committed by the time the buffer extents counter is updated -- the fences should ensure that the records written into the log have completed by the time we observe the buffer extents from different threads. Reviewers: mboerger Subscribers: jfb, llvm-commits Differential Revision:
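The ordering described in entry 2874 can be sketched with standard C++ fences. This is an illustration, not the XRay runtime code: `LogBuffer`, `appendRecord` and `readCommitted` are hypothetical names, the buffer size is arbitrary, and a single writer per buffer is assumed. The writer copies the record with plain stores, issues a release fence, and only then publishes the new extent. The reader loads the extent, issues an acquire fence, and reads no further than that extent.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical log buffer: raw bytes plus an atomic count of committed bytes.
struct LogBuffer {
  char Data[256] = {};
  std::atomic<std::size_t> Extents{0};
};

// Writer side (single writer assumed). Returns false if the record does not
// fit in the remaining space.
bool appendRecord(LogBuffer &B, const void *Record, std::size_t Size) {
  std::size_t Offset = B.Extents.load(std::memory_order_relaxed);
  if (Size > sizeof(B.Data) - Offset)
    return false;
  std::memcpy(B.Data + Offset, Record, Size); // non-atomic writes
  // Make the bytes above visible before the extent update can be observed.
  std::atomic_thread_fence(std::memory_order_release);
  B.Extents.store(Offset + Size, std::memory_order_relaxed);
  return true;
}

// Reader side: copy out at most OutSize committed bytes and return how many
// bytes were copied.
std::size_t readCommitted(const LogBuffer &B, char *Out, std::size_t OutSize) {
  std::size_t Committed = B.Extents.load(std::memory_order_relaxed);
  // Pairs with the release fence in appendRecord.
  std::atomic_thread_fence(std::memory_order_acquire);
  assert(Committed <= sizeof(B.Data) && "extent exceeds the buffer");
  std::size_t N = Committed < OutSize ? Committed : OutSize;
  std::memcpy(Out, B.Data, N); // non-atomic reads
  return N;
}
```

Publishing the extent once, after the whole record has been written, is also what avoids exposing an intermediate extent to a concurrent reader.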
  2875. [XRay] Improve FDR trace handling and error messaging Summary: This change covers a number of things spanning LLVM and compiler-rt, which are related in a non-trivial way. In LLVM, we have a library that handles the FDR mode event log loading, which uses C++'s runtime polymorphism feature to more faithfully represent the events that are written down by the FDR mode runtime. We do this by interpreting a trace that's serialised in a common format agreed upon by both the trace loading library and the FDR mode runtime. This library is under active development, which consists of features allowing us to reconstitute a higher-level event log. This event log is used by the conversion and visualisation tools we have for interpreting XRay traces. One of the tools we have is a diagnostic tool in llvm-xray called `fdr-dump` which we've been using to debug our expectations of what the FDR runtime should be writing and what the logical FDR event log structures are. We use this fairly extensively to reason about why some non-trivial traces we're generating with FDR mode runtimes fail to convert or fail to parse correctly. One of the failures we've found in manual debugging of some of the traces we've seen involves an inconsistency between the buffer extents (a record indicating how many bytes to follow are part of a logical thread's event log) and the record of the bytes written into the log -- sometimes it turns out the data could be garbage, due to buffers being recycled, but sometimes we're seeing the buffer extent indicating a log is "shorter" than the actual records associated with the buffer. This case happens particularly with function entry records with a call argument. This change for now updates the FDR mode runtime to write the bytes for the function call and arg record before updating the buffer extents atomically, allowing multiple threads to see a consistent view of the data in the buffer using the atomic counter associated with a buffer. 
What we're trying to prevent here is partial updates where we see the intermediary updates to the buffer extents (function record size then call argument record size) becoming observable from another thread, for instance, one doing the serialization/flushing. To diagnose this issue properly, we need to be able to honour the extents being set in the `BufferExtents` records marking the beginning of the logical buffers when reading an FDR trace. Since LLVM doesn't use C++'s RTTI mechanism, we instead follow the advice in the documentation for LLVM Style RTTI. We then rely on this RTTI feature to ensure that our file-based record producer (our streaming "deserializer") can honour the extents of individual buffers as we interpret traces. This also sets us up to be able to eventually do smart skipping/continuation of FDR logs, seeking instead to find BufferExtents records in cases where we find potentially recoverable errors. In the meantime, we make this change to operate in a strict mode when reading logical buffers with extent records. Reviewers: mboerger Subscribers: hiraditya, llvm-commits, jfb Differential Revision:
  2876. [NFC] Add utility function for SafetyInfo updates for moveBefore
  2877. [PowerPC] [Clang] [AltiVec] The second parameter of vec_sr function should be modulo the number of bits in the element The second parameter of vec_sr function is representing shift bits and it should be modulo the number of bits in the element like what vec_sl does now. This is actually required by the ABI: Each element of the result vector is the result of logically right shifting the corresponding element of ARG1 by the number of bits specified by the value of the corresponding element of ARG2, modulo the number of bits in the element. The bits that are shifted out are replaced by zeros. Differential Revision:
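The semantics quoted in entry 2877 can be modelled on plain arrays. The sketch below is a scalar model, not the AltiVec header implementation: `V4u32`, `logicalShiftRight` and `shiftRightModulo` are hypothetical names, and four 32-bit unsigned elements are assumed. Each element of the first argument is shifted right logically by the matching element of the second argument, taken modulo the element width.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using V4u32 = std::array<uint32_t, 4>;

// Plain logical right shift. In C++, shifting a 32-bit value by 32 or more
// is undefined, which is why the caller reduces the amount first.
uint32_t logicalShiftRight(uint32_t Value, uint32_t Amount) {
  assert(Amount < 32 && "shift amount must be below the element width");
  return Value >> Amount;
}

// Element-wise model: shift each element of A right by the matching element
// of B, modulo the number of bits in the element. Vacated bits are zero.
V4u32 shiftRightModulo(const V4u32 &A, const V4u32 &B) {
  constexpr uint32_t Bits = 32;
  V4u32 R{};
  for (std::size_t I = 0; I < R.size(); ++I)
    R[I] = logicalShiftRight(A[I], B[I] % Bits);
  return R;
}
```

Under this model a shift amount of 32 leaves the element unchanged and a shift amount of 36 behaves like a shift by 4.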
  2878. [llvm-rc] Support joined or separate spelling for /fo flag CMake invokes rc using the joined spelling which appears to be supported by Microsoft's rc implementation, so we should support it as well. Differential Revision:
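A minimal sketch of accepting both spellings is shown below. It is not the llvm-rc option parser; `parse_fo` is a hypothetical helper, and treating every argument that begins with `/fo` as the joined form is a simplification made for the example.

```python
def parse_fo(args):
    """Return (output_path, remaining_args) for a list of command-line arguments."""
    output = None
    rest = []
    i = 0
    while i < len(args):
        arg = args[i]
        if arg == "/fo":
            # Separate spelling: the value is the next argument.
            if i + 1 >= len(args):
                raise ValueError("/fo requires a value")
            output = args[i + 1]
            i += 2
        elif arg.startswith("/fo"):
            # Joined spelling: the value follows the flag directly.
            output = arg[len("/fo"):]
            i += 1
        else:
            rest.append(arg)
            i += 1
    return output, rest
```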
  2879. [COFF, ARM64] Add support for MSVC buffer security check Reviewers: rnk, mstorsjo, compnerd, efriedma, TomTan Reviewed By: rnk Subscribers: javed.absar, kristof.beyls, chrib, llvm-commits Differential Revision:
  2880. Fix test from r346439 to also work on Windows due to path separator differences.
  2881. Remove unused c'tor.
  2882. [WebAssembly] Read prefixed opcodes as ULEB128s Summary: Depends on D54126. Reviewers: aheejin, dschuff, aardappel Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision:
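For readers unfamiliar with the encoding, ULEB128 decoding can be sketched as below. The helper names are hypothetical and this is not the LLVM implementation; `read_prefixed_opcode` simply assumes one prefix byte followed by a ULEB128-encoded opcode.

```python
def decode_uleb128(data, offset=0):
    """Decode an unsigned LEB128 value; return (value, offset after the value)."""
    result = 0
    shift = 0
    while True:
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift  # low 7 bits carry the payload
        if byte & 0x80 == 0:              # high bit clear marks the last byte
            return result, offset
        shift += 7


def read_prefixed_opcode(data, offset=0):
    """Read one prefix byte, then the opcode as a ULEB128 (assumed layout)."""
    prefix = data[offset]
    opcode, next_offset = decode_uleb128(data, offset + 1)
    return prefix, opcode, next_offset
```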
  2883. [WebAssembly][NFC] Reorder SIMD section Summary: Reorders the sections in the SIMD tablegen file to roughly match the new opcode ordering. Depends on D54126. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision:
  2884. [WebAssembly] Renumber and LEB128-encode SIMD opcodes Reviewers: aheejin, dschuff, aardappel Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision:
  2885. [WebAssembly] Lower select for vectors Summary: Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision:
  2886. Ignore implicit things like ConstantExpr.
  2887. [not] Improve error reporting consistency. Makes `not` use WithColor from Support so it prints 'error' in color when applicable.
  2888. Use correct parameter name in comment.
  2889. Compound literals, enums, et al require const expr Summary: Compound literals, enums, file-scoped arrays, etc. require their initializers and size specifiers to be constant. Wrap the initializer expressions in a ConstantExpr so that we can easily check for this later on. Reviewers: rsmith, shafik Reviewed By: rsmith Subscribers: cfe-commits, jyknight, nickdesaulniers Differential Revision:
  2890. Fix a use-after-free introduced by r344915. r344915 added a call to ApplyDebugLocation to the sanitizer check function emitter. Some of the sanitizers are emitted in the function epilogue though and the LexicalScopeStack is emptied out before. By detecting this situation and early-exiting from ApplyDebugLocation the fallback location is used, which is equivalent to the return location. rdar://problem/45859802
  2891. [VFS] Add "expand tilde" argument to getRealPath. Add an optional argument to expand tildes in the path to mirror llvm's implementation of the corresponding function.
  2892. [hwasan] Remove dead code.
  2893. Attempt to enable -Wconversion
  2894. [llvm-rc] Support absolute filenames in manifests CMake generates manifests that contain absolute filenames, and these currently result in an assertion error. This change ensures that we handle these correctly. Differential Revision:
  2895. [docs][statepoint] Document explicitly provided stack slots Functionality for this was added a while ago, though never documented or extensively tested. Document it with an explicit warning.
  2896. [docs][statepoints] add a section spelling out simplifications for non-relocating GCs
  2897. [docs] Add some subsections to make it possible to find portions of the statepoint overview
  2898. [WebAssembly] Fix LowerEmscriptenEHSjLj when there's only longjmp Summary: The pass incorrectly assumed if there's a longjmp declaration in the module, there is also a setjmp function declaration. Fixed it, and now the pass only converts longjmp and does not do any other transformation when there's no setjmp declaration in the module. Fixes PR39562. Reviewers: jgravelle-google, sbc100 Subscribers: dschuff, sunfish, llvm-commits Differential Revision:
  2899. Handle builders which could be assigned to automatic schedulers.
  2900. [ARM64] [Windows] Improve error reporting for unsupported SEH unwind. Use report_fatal_error instead of crashing or miscompiling. (It's currently easier than it should be to hit this case because we don't reuse codes across epilogs.)
  2901. [Frontend/Modules] Show diagnostics on prebuilt module configuration mismatch too The current version only emits the below error for a module (attempted to be loaded) from the `prebuilt-module-path`: ``` error: module file blabla.pcm cannot be loaded due to a configuration mismatch with the current compilation [-Wmodule-file-config-mismatch] ``` With this change, if the prebuilt module is used, we allow the proper diagnostic behind the configuration mismatch to be shown. ``` error: POSIX thread support was disabled in PCH file but is currently enabled error: module file blabla.pcm cannot be loaded due to a configuration mismatch with the current compilation [-Wmodule-file-config-mismatch] ``` (A few lines later an error is emitted anyways, so there is no reason not to complain for configuration mismatches if a config mismatch is found and kills the build.) Reviewed By: dblaikie Tags: #clang Differential Revision:
  2902. [LoopInterchange] Support reductions across inner and outer loop. This patch adds logic to detect reductions across the inner and outer loop by following the incoming values of PHI nodes in the outer loop. If the incoming values take part in a reduction in the inner loop or come from outside the outer loop, we found a reduction spanning across inner and outer loop. With this change, ~10% more loops are interchanged in the LLVM test-suite + SPEC2006. Fixes Reviewers: mcrosier, efriedma, karthikthecool, davide, hfinkel, dmgreen Reviewed By: efriedma Differential Revision:
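As a source-level illustration of what a reduction across the inner and outer loop looks like, consider the sketch below. It does not model the pass itself; it only shows a loop nest whose accumulator is carried through both loops, and that interchanging the loops leaves the result unchanged for integer addition.

```python
def sum_row_major(matrix):
    total = 0  # accumulator carried across the outer loop
    for i in range(len(matrix)):
        for j in range(len(matrix[0])):
            total += matrix[i][j]  # reduction step in the inner loop
    return total


def sum_interchanged(matrix):
    total = 0
    for j in range(len(matrix[0])):      # former inner loop is now outermost
        for i in range(len(matrix)):
            total += matrix[i][j]
    return total
```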
  2903. [SelectionDAG] Assert on the width of DemandedElts argument to computeKnownBits for all vector typed operations not just build_vector. Fix AArch64 unit test that fails with the assertion added.
  2904. [LTO] Drop non-prevailing definitions only if linkage is not local or appending Summary: This fixes PR 37422. In ELF, non-weak symbols can also be non-prevailing. In this particular PR, the __llvm_profile_* symbols are non-prevailing but weren't getting dropped - causing multiply-defined errors with lld. Also add a test, strong_non_prevailing.ll, to ensure that multiple copies of a strong symbol are dropped. To fix the test regressions exposed by this fix, - do not mark prevailing copies for symbols with 'appending' linkage. There's no one prevailing copy for such symbols. - fix the prevailing version in dead-strip-fulllto.ll - explicitly pass exported symbols to llvm-lto in funcimport.ll and funcimport_var.ll Reviewers: tejohnson, pcc Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, dang, srhines, llvm-commits Differential Revision:
  2905. [X86] Regenerate loaduse test
  2906. [x86] use shuffles for scalar insertion into high elements of a constant vector As discussed in D54073, we have a potential regression from more aggressive vector narrowing here, so let's try to avoid that by changing build-vector lowering slightly. Insert-vector-element lowering always does this since there's no "pinsr" for ymm/zmm: // If the vector is wider than 128 bits, extract the 128-bit subvector, insert // into that, and then insert the subvector back into the result. ...but we can sometimes do better for insert-into-constant-vector by using shuffle lowering. Differential Revision:
  2907. [DAGCombine] Improve alias analysis for chain of independent stores. FindBetterNeighborChains simultaneously improves the chain dependencies of a chain of related stores, avoiding the generation of extra token factors. For chains longer than the GatherAllAliasDepths, stores further down in the chain will necessarily fail, a potentially significant waste and preventing otherwise trivial parallelization. This patch directly parallelizes the chains of stores before improving each store. This generally improves DAG-level parallelism. Reviewers: courbet, spatel, RKSimon, bogner, efriedma, craig.topper, rnk Subscribers: sdardis, javed.absar, hiraditya, jrtc27, atanasyan, llvm-commits Differential Revision:
  2908. [NativePDB] Higher fidelity reconstruction of AST from Debug Info. In order to accurately put a type into the correct location in the AST we construct from debug info, we need to be able to determine what DeclContext (namespace, global, nested class, etc) that it goes into. PDB doesn't contain this mapping. It does, however, contain the reverse mapping. That is, for a given class type T, you can determine all classes Q1, Q2, ..., Qn that are nested inside of T. We need to know, for a given class type Q, what type T is it nested inside of. This patch builds this map as a pre-processing step when we first load the PDB by scanning every type. Initial tests show that while this can be slow in debug builds of LLDB, it is quite fast in release builds (less than 2 seconds for a ~1GB PDB, and it only needs to happen once). Furthermore, having this pre-processing step in place allows us to repurpose it for building up other kinds of indexing to it down the line. For the time being, this gives us very accurate reconstruction of the DeclContext hierarchy. Differential Revision:
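The pre-processing step described above amounts to inverting a "parent lists its nested types" mapping into a "type knows its parent" mapping. The sketch below uses plain strings for type names; `build_parent_map` and `decl_context_chain` are hypothetical helpers, not LLDB APIs.

```python
def build_parent_map(nested_types):
    """Invert {parent: [nested types]} into {nested type: parent}."""
    parent_of = {}
    for parent, children in nested_types.items():
        for child in children:
            parent_of[child] = parent
    return parent_of


def decl_context_chain(type_name, parent_of):
    """List the enclosing types of type_name, innermost first."""
    chain = []
    while type_name in parent_of:
        type_name = parent_of[type_name]
        chain.append(type_name)
    return chain
```

The inversion is a single scan over every type, which matches the description of a one-time cost paid when the PDB is first loaded.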
  2909. [x86] add RUNs for AVX1; NFC Differences in splat-ability might be reason to differentiate some cases.
  2910. [NFC][BdVer2] Load and store throughput tests: also check sched stats (PR39465) As noted by Andrea Di Biagio in both the loads and stores occupy both the store and load queues. This is clearly wrong.
  2911. [llvm-mca] Partially revert r346417. Restored the llvm:: namespace qualifier on make_unique. This removes the ambiguity with std::make_unique.
  2912. Add test case for the regression caused by r344696 (That change has since been reverted.) Reduced from
  2913. InstCombine: Avoid introducing poison values when lowering llvm.amdgcn.[us]bfe Summary: When the 3rd argument to these intrinsics is zero, lowering them to shift instructions produces poison values, since we end up with shift amounts equal to the number of bits in the shifted value. This means we can only lower these intrinsics if we can prove that the 3rd argument is not zero. Reviewers: arsenm Reviewed By: arsenm Subscribers: bnieuwenhuizen, jvesely, wdng, nhaehnle, llvm-commits Differential Revision:
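The problem can be seen by writing out the shift amounts. The sketch below assumes a 32-bit unsigned bitfield extract lowered as a left shift by `bits - offset - width` followed by a logical right shift by `bits - width`; that formulation and the helper names are assumptions of this example, not a copy of the InstCombine code.

```python
def bfe_shift_amounts(offset, width, bits=32):
    """Return the (left, right) shift amounts of the assumed lowering."""
    return bits - offset - width, bits - width


def can_lower_to_shifts(offset, width, bits=32):
    """A shift by the full bit width (or more) would be poison in LLVM IR."""
    left, right = bfe_shift_amounts(offset, width, bits)
    return 0 <= left < bits and 0 <= right < bits


def ubfe_via_shifts(src, offset, width, bits=32):
    if not can_lower_to_shifts(offset, width, bits):
        raise ValueError("shift amount out of range for this lowering")
    left, right = bfe_shift_amounts(offset, width, bits)
    mask = (1 << bits) - 1
    return ((src << left) & mask) >> right
```

With a width of zero the right shift amount equals the bit width, which is the case the commit guards against.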
  2914. [CodeExtractor] Mark functions noreturn when applicable This eliminates the outlining penalty for llvm.trap/unreachable, because callers no longer have to emit cleanup/ret instructions after calling an outlined `noreturn` function. rdar://45523626
  2915. Introduce `sanitizer_malloc_introspect_t` for Darwin which is a sub-class of Darwin's `malloc_introspection_t` and use it when setting up the malloc zone. Summary: Currently `sanitizer_malloc_introspection_t` just adds a version field which is used to version the allocator ABI. The current allocator ABI version is returned by the new `GetMallocZoneAllocatorEnumerationVersion()` function. The motivation behind this change is to allow external processes to determine the allocator ABI of a sanitized process. rdar://problem/45284065 Reviewers: kubamracek, george.karpenkov, vitalybuka Subscribers: #sanitizers, llvm-commits Differential Revision:
  2916. [llvm-mca] PR39261: Rename FetchStage to EntryStage. This fixes PR39261. FetchStage is a misnomer. It causes confusion with the frontend fetch stage, which we don't currently simulate. I decided to rename it into EntryStage mainly because this is meant to be a "source" stage for all pipelines. Differential Revision:
  2917. [clang-tidy] Untangle layering in ClangTidyDiagnosticConsumer somewhat. NFC Summary: Clang's hierarchy is CompilerInstance -> DiagnosticsEngine -> DiagnosticConsumer. (Ownership is optional/shared, but this structure is fairly clear). Currently ClangTidyDiagnosticConsumer *owns* the DiagnosticsEngine: - this inverts the hierarchy, which is confusing - this means ClangTidyDiagnosticConsumer() mutates the passed-in context, which is both surprising and limits flexibility - it's not possible to use a different DiagnosticsEngine with ClangTidy This means a little bit more code in the places ClangTidy is used standalone, but more flexibility in using ClangTidy with other diagnostics configurations. Reviewers: hokein Subscribers: xazax.hun, cfe-commits Differential Revision:
  2918. [llvm-mca] Remove unneeded namespace qualifier. NFC.
  2919. [docs] Clarify ELF section naming for StackMaps and fix a typo
  2920. [clang-tidy] fix test after r346414
  2921. [Tooling] Produce diagnostics for missing input files. Summary: This was disabled way back in 2011, in the dark times before Driver was VFS-aware. Also, make driver more VFS-aware :-) This breaks one ClangTidy test (we improved the error message), will fix when submitting. Reviewers: ioeric Subscribers: cfe-commits, alexfh Differential Revision:
  2922. Fix bitcast to address space cast for coerced load/stores Coerced load/stores through memory do not take into account potential address space differences when it creates its bitcasts. Patch by David Salinas. Differential Revision:
  2923. [dsymutil] Copy the LC_BUILD_VERSION load command into the companion binary. LC_BUILD_VERSION contains platform information that is useful for LLDB to match up dSYM bundles with binaries. This patch copies the load command over into the dSYM. rdar://problem/44145175 rdar://problem/45883463 Differential Revision:
  2924. [PowerPC][llvm-exegesis] Add a PowerPC target This is a patch to add a PowerPC target to llvm-exegesis. The target does just enough to be able to run llvm-exegesis in latency mode for at least some opcodes. Differential Revision:
  2925. Revert "[MSP430] Add MC layer" This commit broke the module buildbots. Error: lib/Target/MSP430/ error: redundant namespace 'llvm' [-Wmodules-import-nested-redundant] ^
  2926. [Profile] The test for gcov-fork seems to be ok on arm Summary: Remove the XFAIL for arm since it seems to be ok Reviewers: marco-c Reviewed By: marco-c Subscribers: javed.absar, kristof.beyls, delcypher, chrib, llvm-commits, #sanitizers, sylvestre.ledru Differential Revision:
  2927. [OPENMP]Make lambda mapping follow reqs for PTR_AND_OBJ mapping. The base pointer for the lambda mapping must point to the lambda capture placement and pointer must point to the captured variable itself. Patch fixes this problem.
  2928. [SystemZ] Bugfix in shouldCoalesce() It was discovered in randomized testing that the SystemZ implementation of shouldCoalesce() could be caused to crash when subreg liveness was enabled. This was because an undef use of the virtual register was copied outside current MBB at the point of shouldCoalesce() being called. For more details, see This patch changes the check for MBB locality from livein/liveout checks to do checks for all instructions of both intervals being inside MBB. This avoids the cases with dead defs / undef uses outside MBB, which are not affecting liveness in/out of MBB. The original test case included as a reduced .mir test case. Review: Ulrich Weigand
  2929. [docs] Clarify expectations for stack map sections and AOT compilers
  2930. [NFC][BdVer2] Tests for load and store throughput (PR39465) During review it was noted that while it appears that the Piledriver can do two [consecutive] loads per cycle, it can only do one store per cycle. It was suggested that the sched model incorrectly models that, but it was opted to fix this afterwards. These tests show that the two consecutive loads are modelled correctly, and one consecutive stores is not modelled incorrectly. Unless i'm missing the point.
  2931. [LLD] Fix Microsoft precompiled headers cross-compile on Linux Differential revision:
  2932. [X86][SSE] Add PR39387 shuffle test case
  2933. [ARM] Enable spilling of the hGPR register class in Thumb2 Generalize code in Thumb2InstrInfo::storeRegToStackSlot() and loadRegToStackSlot() to allow the GPR class or any of its sub-classes (including hGPR) to be stored/loaded by ARM::t2STRi12/ARM::t2LDRi12. Differential Revision:
  2934. [llvm-exegesis][NFC] Add missing header guard + cosmetics. Reviewers: gchatelet Reviewed By: gchatelet Subscribers: tschuett, llvm-commits Differential Revision:
  2935. [X86][AVX] Tidyup prefixes and regenerate interleaved tests Share common AVX prefix and split off AVX2OR512 prefix instead
  2936. Revert "[llvm-exegesis] Add a snippet generator to generate snippets to compute ROB sizes." This reverts accidental commit rL346394.
  2937. Return "[IndVars] Smart hard uses detection" The patch has been reverted because it ended up prohibiting propagation of a constant to exit value. For such values, we should skip all checks related to hard uses because propagating a constant is always profitable. Differential Revision:
  2938. Adding Yvan as release test backup for Diana Thanks for offering to help, Yvan! :)
  2939. [llvm-exegesis] Add a snippet generator to generate snippets to compute ROB sizes.
  2940. clang-cl: Add "/clang:" pass-through arg support. The clang-cl driver disables access to command line options outside of the "Core" and "CLOption" sets of command line arguments. This filtering makes it impossible to pass arguments that are interpreted by the clang driver and not by either 'cc1' (the frontend) or one of the other tools invoked by the driver. An example driver-level flag is the '-fno-slp-vectorize' flag, which is processed by the driver in Clang::ConstructJob and used to set the cc1 flag "-vectorize-slp". There is no negative cc1 flag or -mllvm flag, so it is not currently possible to disable the SLP vectorizer from the clang-cl driver. This change introduces the "/clang:" argument that is available when the driver mode is set to CL compatibility. This option works similarly to the "-Xclang" option, except that the option values are processed by the clang driver rather than by 'cc1'. An example usage is: clang-cl /clang:-fno-slp-vectorize /O2 test.c Another example shows how "/clang:" can be used to pass a flag where there is a conflict between a clang-cl compat option and an overlapping clang driver option: clang-cl /MD /clang:-MD /clang:-MF /clang:test_dep_file.dep test.c In the previous example, the unprefixed /MD selects the DLL version of the msvc CRT, while the prefixed -MD flag and the -MF flags are used to create a make dependency file for included headers. One note about flag ordering: the /clang: flags are concatenated to the end of the argument list, so in cases where the last flag wins, the /clang: flags will be chosen regardless of their order relative to other flags on the driver command line. Patch by Neeraj K. Singh! Differential revision:
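The flag-ordering note can be modelled with a short sketch. This only mirrors the behaviour described in the commit message (pass-through values are moved to the end of the argument list); `expand_clang_passthrough` is a hypothetical helper and not part of the clang driver.

```python
def expand_clang_passthrough(args):
    """Strip the "/clang:" prefix and append those values after all other flags."""
    prefix = "/clang:"
    regular = [a for a in args if not a.startswith(prefix)]
    passthrough = [a[len(prefix):] for a in args if a.startswith(prefix)]
    return regular + passthrough
```

Because the pass-through values always come last in this model, a last-flag-wins option given via `/clang:` overrides an earlier or later unprefixed spelling, which is the consequence the commit message points out.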
  2941. [OpenCL] Add support of cl_intel_device_side_avc_motion_estimation extension Summary: Documentation can be found at Patch by Kristina Bessonova Reviewers: Anastasia, yaxunl, shafik Reviewed By: Anastasia Subscribers: arphaman, sidorovd, AlexeySotkin, krisb, bader, asavonic, cfe-commits Differential Revision:
  2942. [MSP430] Fix encodeInstruction() for big endian hosts Reviewers: asl Subscribers: llvm-commits Differential Revision:
  2943. [LSR] Combine unfolded offset into invariant register LSR reassociates constants as unfolded offsets when the constants fit as immediate add operands, which currently prevents such constants from being combined later with loop invariant registers. This patch modifies GenerateCombinations() to generate a second formula which includes the unfolded offset in the combined loop-invariant register. This commit fixes a bug in the original patch (committed at r345114, reverted at r345123). Differential Revision:
  2944. [SCEV][NFC] Verify IR in isLoop[Entry,Backedge]GuardedByCond We have a lot of various bugs that are caused by misuse of SCEV (in particular in LV), all of them can simply be described as "we ask SCEV to prove some fact on invalid IR". Some examples of those are PR36311, PR37221, PR39160. The problem is that these failures manifest differently (what we saw was failure of various asserts across SCEV, but there can also be miscompiles). This patch adds an assert into two SCEV methods that strongly rely on correctness of the IR and are involved in known failures. This will at least allow us to have a clear indication of what was wrong in this case. This patch also fixes a unit test with incorrect IR that fails this verification. Differential Revision: Reviewed By: fhahn
  2945. [bindings/go] Add Go bindings to LLVMGetIndices Summary: This instruction is useful for inspecting extractvalue/insertvalue in IR. Unlike most other operations, indices cannot be inspected using the generic Value.Opcode() function so a specialized function needs to be added. Reviewers: whitequark, pcc Reviewed By: whitequark Subscribers: llvm-commits Differential Revision:
  2946. [OCaml] Fix incorrect use of CAMLlocal in nested blocks Summary: The OCaml manual states: > Local variables of type value must be declared with one of the > CAMLlocal macros. [...] These macros must be used at the beginning > of the function, not in a nested block. This patch moves several instances of CAMLlocal macros from nested blocks to the function beginning. Reviewers: whitequark Reviewed By: whitequark Subscribers: CodaFi, llvm-commits Differential Revision:
  2947. [MergeFuncs] Improve ordering of equal functions Summary: MergeFunctions currently tries to process strong functions before weak functions, because weak functions can simply call strong functions, while a strong/weak function cannot call a weak function (a backing strong function is needed). This patch additionally tries to process external functions before local functions, because we definitely have to keep the external function, but may be able to drop the local one (and definitely can if it is also unnamed_addr). Unfortunately, this exposes an existing bug in the implementation: The FnTree and FNodesInTree structures can currently go out of sync in the case where two weak functions are merged, because the function in FnTree/FNodesInTree is RAUWed. This leaves it behind in FnTree (this is intended, as it is the strong backing function which should be used for further merges), while it is replaced in FNodesInTree (this is not intended). This is fixed by switching FNodesInTree from using a ValueMap to using a DenseMap of AssertingVH. This exposes another minor issue: Currently FNodesInTree is not cleared after MergeFunctions finishes running. Currently, this is potentially dangerous (e.g. if something else wants to RAUW a function with a non-function), but at the very least it is unnecessary/inefficient. After the change to use AssertingVH it becomes more problematic, because there are certainly passes that remove functions. This issue is fixed by clearing FNodesInTree at the end of the pass. Reviewers: jfb, whitequark Reviewed By: whitequark Subscribers: rkruppe, llvm-commits Differential Revision:
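The processing order described above (strong before weak, then external before local) can be expressed as a sort key. The `Fn` record and `processing_order` function are hypothetical and exist only for this illustration; the real pass works on LLVM functions and may break ties differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fn:
    name: str
    weak: bool   # True for weak linkage, False for strong
    local: bool  # True for local linkage, False for external


def processing_order(functions):
    """Strong functions first; among equals, external before local."""
    # False sorts before True, so strong and external functions come first.
    return sorted(functions, key=lambda f: (f.weak, f.local))
```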
  2948. [MergeFuncs] Call removeUsers() prior to unnamed_addr RAUW Summary: For unnamed_addr functions we RAUW instead of only replacing direct callers. However, functions in which replacements were performed currently are not added back to the worklist, resulting in missed merging opportunities. Fix this by calling removeUsers() prior to RAUW. Reviewers: jfb, whitequark Reviewed By: whitequark Subscribers: rkruppe, llvm-commits Differential Revision:
  2949. [WebAssembly] Add V128 to WebAssemblyInstrInfo::copyPhysReg Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision:
  2950. Revert "Reorder FindPythonInterp so that config-ix can use PYTHON_EXECUTABLE" This reverts commit rL346367 due to test error in compiler-rt.
  2951. Many builders checkout LNT and test-suite to a separate 'test' directory, outside of the common LLVM source code tree. This change is to support this use case.
  2952. [sancov] Put .SCOV* sections into the right comdat groups on COFF Avoids linker errors about relocations against discarded sections. This was uncovered during the Chromium clang roll here: After this change, Chromium's libGLESv2 links successfully for me. Reviewers: metzman, hans, morehouse Differential Revision:
  2953. NFC: DebugInfo: Track the origin CU rather than just the base address for range lists Turns out knowing more than just the base address might be useful - specifically a future change to respect a DICompileUnit flag for the use of base address specifiers in DWARF < 5.
  2954. [MachineOutliner][NFC] Only map blocks which have adjacent legal instructions If a block doesn't have any ranges of adjacent legal instructions, then it can't have outlining candidates. There's no point in mapping legal instructions in situations like this. I noticed this reduces the size of the suffix tree in sqlite3 for AArch64 at -Oz by about 3%.
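The filter can be sketched as a single pass over a block. The example below assumes that a "range of adjacent legal instructions" means at least two legal instructions in a row, and it represents instructions as plain strings with a caller-supplied legality predicate; neither detail comes from the MachineOutliner source.

```python
def has_adjacent_legal(block, is_legal):
    """Return True if the block contains two legal instructions back to back."""
    previous_legal = False
    for inst in block:
        current_legal = is_legal(inst)
        if current_legal and previous_legal:
            return True
        previous_legal = current_legal
    return False


def blocks_worth_mapping(blocks, is_legal):
    """Keep only the blocks that could contain an outlining candidate."""
    return [b for b in blocks if has_adjacent_legal(b, is_legal)]
```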
  2955. [clang] Set CMP0075 to new Make the check_include_file* macros honor CMAK