Success
Changes

Summary

  1. Prepare for multi-exit LFTR [NFC]
  2. [ELF][llvm-objdump] Treat dynamic tag values as virtual addresses instead of offsets
  3. [RISCV] Replace map with set in getReqFeatures
  4. [docs] Add 'git llvm revert' to getting started guide
  5. [llvm-mca] Enable bottleneck analysis when flag -all-views is specified.
  6. [FastISel] Skip creating unnecessary vregs for arguments
Revision 362971 by reames:
Prepare for multi-exit LFTR [NFC]

This change does the plumbing to wire an ExitingBB parameter through the LFTR implementation, and reorganizes the code to work in terms of a set of individual loop exits. Most of it is fairly obvious, but there's one key complexity which makes it worthy of consideration. The actual multi-exit LFTR patch is in D62625 for context.

Specifically, it turns out the existing code uses the backedge taken count from before an IV is widened. Oddly, we can end up with a different (more expensive, but semantically equivalent) BE count for the loop when requerying after widening. For the nestedIV example from elim-extend, we end up with the following BE counts:
BEFORE: (-2 + (-1 * %innercount) + %limit)
AFTER: (-1 + (sext i32 (-1 + %limit) to i64) + (-1 * (sext i32 %innercount to i64))<nsw>)

This is the only test in tree which seems sensitive to this difference. Using the wider BETC on this example actually produces slightly better code. :)

In review, we decided to accept that test change. This patch is structured to preserve the old behavior, but a separate change will immediately follow with the behavior change. (I wanted it separate for problem attribution purposes.)

Differential Revision: https://reviews.llvm.org/D62880
Change Type | Path in Repository | Path in Workspace
The file was modified | /llvm/trunk/lib/Transforms/Scalar/IndVarSimplify.cpp | trunk/lib/Transforms/Scalar/IndVarSimplify.cpp
Revision 362969 by wolfgangp:
[ELF][llvm-objdump] Treat dynamic tag values as virtual addresses instead of offsets

The ELF gABI requires the tag values of DT_REL, DT_RELA and DT_JMPREL to be
treated as virtual addresses. They were treated as offsets. Fixes PR41832.
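
To illustrate the difference between a virtual address and a file offset, here is a minimal sketch (not the llvm-objdump implementation) of how a value such as the one stored in DT_REL could be translated to a file offset through the loadable segments. The `Segment` struct and the `vaddrToOffset` helper are hypothetical names introduced only for this example.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical, simplified view of a PT_LOAD program header.
struct Segment {
  std::uint64_t VAddr;    // virtual address where the segment is mapped
  std::uint64_t Offset;   // file offset of the segment's first byte
  std::uint64_t FileSize; // number of bytes backed by the file
};

// Translate a virtual address to a file offset. Returns true and sets Out on
// success; returns false if no segment covers the address.
bool vaddrToOffset(const std::vector<Segment> &Segments, std::uint64_t VAddr,
                   std::uint64_t &Out) {
  for (const Segment &S : Segments) {
    if (VAddr >= S.VAddr && VAddr - S.VAddr < S.FileSize) {
      Out = S.Offset + (VAddr - S.VAddr);
      return true;
    }
  }
  return false;
}
```

Reading the tag value directly as an offset only gives the right bytes when a segment's virtual address and file offset happen to coincide, which this sketch does not assume.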

Differential Revision: https://reviews.llvm.org/D62972
Change Type | Path in Repository | Path in Workspace
The file was modified | /llvm/trunk/include/llvm/Object/ELFObjectFile.h | trunk/include/llvm/Object/ELFObjectFile.h
The file was added | /llvm/trunk/test/tools/llvm-objdump/X86/elf-dynamic-relocs.test | trunk/test/tools/llvm-objdump/X86/elf-dynamic-relocs.test
Revision 362968 by sabuasal:
[RISCV] Replace map with set in getReqFeatures

Summary:
Use a set in getReqFeatures() in RISCVCompressInstEmitter instead of a map
because the index we save is not needed.
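
As a rough illustration of the idea (not the actual RISCVCompressInstEmitter code), the sketch below collects distinct feature names in a `std::set`. The helper name `collectFeatures` and the feature strings used with it are assumptions made for this example.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: gather the distinct feature names from a list.
// A std::set deduplicates and orders the names by itself, so no per-entry
// index (the value a std::map would have carried) needs to be stored.
std::set<std::string> collectFeatures(const std::vector<std::string> &Names) {
  std::set<std::string> Features;
  for (const std::string &Name : Names)
    Features.insert(Name); // inserting a duplicate is a no-op
  return Features;
}
```

Iterating the returned set visits each name once, in sorted order, regardless of the order in which the names were inserted.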

This also fixes bug 41666.

Reviewers: llvm-commits, apazos, asb, nickdesaulniers

Reviewed By: asb

Subscribers: Jim, nickdesaulniers, rbar, johnrusso, simoncook, niosHD, kito-cheng, shiva0217, jrtc27, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, psnobl, benna

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61412
Change Type | Path in Repository | Path in Workspace
The file was modified | /llvm/trunk/utils/TableGen/RISCVCompressInstEmitter.cpp | trunk/utils/TableGen/RISCVCompressInstEmitter.cpp
Revision 362966 by rupprecht:
[docs] Add 'git llvm revert' to getting started guide

Summary: This documents `git llvm revert rNNNNNN` in the getting started guide for broader visibility.

Reviewers: jyknight, mehdi_amini

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63023
Change Type | Path in Repository | Path in Workspace
The file was modified | /llvm/trunk/docs/GettingStarted.rst | trunk/docs/GettingStarted.rst
Revision 362964 by adibiagio:
[llvm-mca] Enable bottleneck analysis when flag -all-views is specified.

Bottleneck Analysis is one of the many views available in llvm-mca. Therefore,
it should be enabled when flag -all-views is passed to the tool.
Change Type | Path in Repository | Path in Workspace
The file was modified | /llvm/trunk/test/tools/llvm-mca/X86/BtVer2/bottleneck-hints-none.s | trunk/test/tools/llvm-mca/X86/BtVer2/bottleneck-hints-none.s
The file was modified | /llvm/trunk/test/tools/llvm-mca/X86/option-all-views-1.s | trunk/test/tools/llvm-mca/X86/option-all-views-1.s
The file was modified | /llvm/trunk/test/tools/llvm-mca/X86/option-all-views-2.s | trunk/test/tools/llvm-mca/X86/option-all-views-2.s
The file was modified | /llvm/trunk/test/tools/llvm-mca/X86/option-no-stats-1.s | trunk/test/tools/llvm-mca/X86/option-no-stats-1.s
The file was modified | /llvm/trunk/tools/llvm-mca/llvm-mca.cpp | trunk/tools/llvm-mca/llvm-mca.cpp
Revision 362963 by thegameg:
[FastISel] Skip creating unnecessary vregs for arguments

This behavior was added in r130928 for both FastISel and SD, and then
disabled in r131156 for FastISel.

This re-enables it for FastISel with the corresponding fix.

This is triggered only when FastISel can't lower the arguments and falls
back to SelectionDAG for it.

FastISel contains a map of "register fixups" where at the end of the
selection phase it replaces all uses of a register with another
register that FastISel sometimes pre-assigned. Code at the end of
SelectionDAGISel::runOnMachineFunction is doing the replacement at the
very end of the function, while other pieces that come in before that
look through the MachineFunction and assume everything is done. In this
case, the real issue is that the code emitting COPY instructions for the
liveins (physreg to vreg) (EmitLiveInCopies) is checking if the vreg
assigned to the physreg is used, and if it's not, it will skip the COPY.
If a register wasn't replaced with its assigned fixup yet, the copy will
be skipped and we'll end up with uses of undefined registers.

This fix moves the replacement of registers before the emission of
copies for the live-ins.
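
The ordering problem can be sketched with a toy model. This is not LLVM code: `ToyFunction`, `applyFixups` and `emitLiveInCopies` are hypothetical stand-ins, registers are plain integers, and a "copy" is just recorded in a set.

```cpp
#include <algorithm>
#include <map>
#include <set>
#include <vector>

// Toy model of a function after instruction selection.
struct ToyFunction {
  std::vector<int> Uses;      // registers read by the selected instructions
  std::map<int, int> Fixups;  // register fixups: replace key with value
  std::map<int, int> LiveIns; // physical register -> assigned virtual register
};

// Rewrite every use of a fixed-up register to its replacement.
void applyFixups(ToyFunction &F) {
  for (int &Reg : F.Uses) {
    std::map<int, int>::const_iterator It = F.Fixups.find(Reg);
    if (It != F.Fixups.end())
      Reg = It->second;
  }
}

// Emit a copy for each live-in whose virtual register is actually used and
// return the set of virtual registers that received a copy. Unused virtual
// registers are skipped, mirroring the check described above.
std::set<int> emitLiveInCopies(const ToyFunction &F) {
  std::set<int> Copied;
  for (std::map<int, int>::const_iterator It = F.LiveIns.begin();
       It != F.LiveIns.end(); ++It) {
    int VReg = It->second;
    if (std::find(F.Uses.begin(), F.Uses.end(), VReg) != F.Uses.end())
      Copied.insert(VReg);
  }
  return Copied;
}
```

With `Uses = {100}`, a fixup from 100 to 200, and a live-in assigned to 200, calling `emitLiveInCopies` before `applyFixups` skips the copy and leaves a use of the undefined register 200. Calling `applyFixups` first makes the use visible, so the copy is emitted.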

The initial motivation for this fix is to enable tail calls for
swiftself functions, which were blocked because we couldn't prove that
the swiftself argument (which is callee-save) comes from a function
argument (live-in), because there was an extra copy (vreg to vreg).

A few tests are affected by this:

* llvm/test/CodeGen/AArch64/swifterror.ll: we used to spill x21
(callee-save) but never reload it because it's attached to the return.
We now don't even spill it anymore.
* llvm/test/CodeGen/*/swiftself.ll: we tail-call now.
* llvm/test/CodeGen/AMDGPU/mubuf-legalize-operands.ll: I believe this
test was not really testing the right thing, but it worked because the
same registers were re-used.
* llvm/test/CodeGen/ARM/cmpxchg-O0.ll: regalloc changes
* llvm/test/CodeGen/ARM/swifterror.ll: get rid of a copy
* llvm/test/CodeGen/Mips/*: get rid of spills and copies
* llvm/test/CodeGen/SystemZ/swift-return.ll: smaller stack
* llvm/test/CodeGen/X86/atomic-unordered.ll: smaller stack
* llvm/test/CodeGen/X86/swifterror.ll: same as AArch64
* llvm/test/DebugInfo/X86/dbg-declare-arg.ll: stack size changed

Differential Revision: https://reviews.llvm.org/D62361
Change Type | Path in Repository | Path in Workspace
The file was modified | /llvm/trunk/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp | trunk/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
The file was modified | /llvm/trunk/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp | trunk/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp
The file was modified | /llvm/trunk/test/CodeGen/AArch64/swifterror.ll | trunk/test/CodeGen/AArch64/swifterror.ll
The file was modified | /llvm/trunk/test/CodeGen/AArch64/swiftself.ll | trunk/test/CodeGen/AArch64/swiftself.ll
The file was modified | /llvm/trunk/test/CodeGen/AMDGPU/mubuf-legalize-operands.ll | trunk/test/CodeGen/AMDGPU/mubuf-legalize-operands.ll
The file was modified | /llvm/trunk/test/CodeGen/ARM/cmpxchg-O0.ll | trunk/test/CodeGen/ARM/cmpxchg-O0.ll
The file was modified | /llvm/trunk/test/CodeGen/ARM/swifterror.ll | trunk/test/CodeGen/ARM/swifterror.ll
The file was modified | /llvm/trunk/test/CodeGen/Mips/atomic.ll | trunk/test/CodeGen/Mips/atomic.ll
The file was modified | /llvm/trunk/test/CodeGen/Mips/atomic64.ll | trunk/test/CodeGen/Mips/atomic64.ll
The file was modified | /llvm/trunk/test/CodeGen/Mips/atomicCmpSwapPW.ll | trunk/test/CodeGen/Mips/atomicCmpSwapPW.ll
The file was modified | /llvm/trunk/test/CodeGen/Mips/dsp-spill-reload.ll | trunk/test/CodeGen/Mips/dsp-spill-reload.ll
The file was modified | /llvm/trunk/test/CodeGen/SystemZ/swift-return.ll | trunk/test/CodeGen/SystemZ/swift-return.ll
The file was modified | /llvm/trunk/test/CodeGen/X86/atomic-unordered.ll | trunk/test/CodeGen/X86/atomic-unordered.ll
The file was modified | /llvm/trunk/test/CodeGen/X86/swifterror.ll | trunk/test/CodeGen/X86/swifterror.ll
The file was modified | /llvm/trunk/test/CodeGen/X86/swiftself.ll | trunk/test/CodeGen/X86/swiftself.ll
The file was modified | /llvm/trunk/test/DebugInfo/X86/dbg-declare-arg.ll | trunk/test/DebugInfo/X86/dbg-declare-arg.ll

Summary

  1. [X86] Attempt to make the Intel core CPU inheritance a little more readable and maintainable
  2. [WebAssembly] Cleanup toolchain test files. NFC.
Revision 362965 by ctopper:
[X86] Attempt to make the Intel core CPU inheritance a little more readable and maintainable

The recently added cooperlake CPU has made our already ugly switch statement even worse. There's a CPU exclusion list around the bf16 feature in the cooperlake block. I worry that we'll have to keep adding new CPUs to that until bf16 intercepts a client space CPU. We have several other exclusion lists in other parts of the switch due to skylakeserver, cascadelake, and cooperlake not having sgx. There is another for cannonlake, which lacks clwb but has all other features from skx.

This removes all these special ifs at the cost of some duplication of features and a goto. I've copied all of the skx features into either cannonlake or icelakeclient (for clwb), and pulled skylakeserver, cascadelake, and cooperlake out of the main inheritance chain into their own chain. At the end of skylakeserver we merge back into the main chain at skylakeclient but below sgx. I think this is at least easier to follow.
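
The shape of the restructured switch can be sketched as follows. This is a heavily simplified illustration, not the code in X86.cpp: the `Cpu` enum, the `featuresFor` function, the `SkylakeCommon` label and the reduced feature lists are assumptions made for the example.

```cpp
#include <set>
#include <string>

// Hypothetical, reduced CPU list for illustration only.
enum class Cpu { SkylakeClient, Cannonlake, SkylakeServer, Cascadelake, Cooperlake };

std::set<std::string> featuresFor(Cpu Kind) {
  std::set<std::string> Features;
  switch (Kind) {
  // Server chain: each CPU adds its own features, then falls through.
  case Cpu::Cooperlake:
    Features.insert("bf16");
    // fall through
  case Cpu::Cascadelake:
    Features.insert("vnni");
    // fall through
  case Cpu::SkylakeServer:
    Features.insert("avx512f");
    Features.insert("clwb");
    goto SkylakeCommon; // rejoin the main chain below sgx

  // Main (client) chain.
  case Cpu::Cannonlake:
    Features.insert("avx512f"); // duplicated from the server chain, no clwb
    // fall through
  case Cpu::SkylakeClient:
    Features.insert("sgx");
  SkylakeCommon:
    Features.insert("avx2");
    break;
  }
  return Features;
}
```

Because the server CPUs jump past the sgx line instead of falling through it, no per-feature exclusion list is needed; the cost is the duplicated avx512f entry and the goto.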

Differential Revision: https://reviews.llvm.org/D63018
Change Type | Path in Repository | Path in Workspace
The file was modified | /cfe/trunk/lib/Basic/Targets/X86.cpp | trunk/lib/Basic/Targets/X86.cpp
Revision 362959 by sbc:
[WebAssembly] Cleanup toolchain test files. NFC.

Summary: Split up long lines to improve test readability.

Subscribers: dschuff, jgravelle-google, aheejin, sunfish, jfb, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D63081
Change Type | Path in Repository | Path in Workspace
The file was modified | /cfe/trunk/test/Driver/wasm-toolchain.c | trunk/test/Driver/wasm-toolchain.c
The file was modified | /cfe/trunk/test/Driver/wasm-toolchain.cpp | trunk/test/Driver/wasm-toolchain.cpp

Summary

  1. Add unused symbol to thunk files to force wholearchive inclusion
  2. [scudo][standalone] Introduce the thread specific data structures
Revision 362970 by rnk:
Add unused symbol to thunk files to force wholearchive inclusion

These "dynamic_runtime_thunk" object files exist to create a weak alias
from 'foo' to 'foo_dll' for all weak sanitizer runtime symbols. The weak
aliases are implemented as /alternatename linker options in the
.drectve section, so they are not actually in the symbol table. In
order to force the Visual C++ linker to load the object, even with
-wholearchive:, we have to provide at least one external symbol. Once we
do that, it will read the .drectve sections and see the weak aliases.

Fixes PR42074
Change Type | Path in Repository | Path in Workspace
The file was modified | /compiler-rt/trunk/lib/sanitizer_common/sanitizer_coverage_win_dynamic_runtime_thunk.cc | trunk/lib/sanitizer_common/sanitizer_coverage_win_dynamic_runtime_thunk.cc
The file was modified | /compiler-rt/trunk/lib/sanitizer_common/sanitizer_win_dynamic_runtime_thunk.cc | trunk/lib/sanitizer_common/sanitizer_win_dynamic_runtime_thunk.cc
Revision 362962 by cryptoad:
[scudo][standalone] Introduce the thread specific data structures

Summary:
This CL adds the structures dealing with thread specific data for the
allocator. This includes the thread specific data structure itself and
two registries for said structures: an exclusive one, where each thread
will have its own TSD struct, and a shared one, where a pool of TSD
structs will be shared by all threads, with dynamic reassignment at
runtime based on contention.

This departs from the current Scudo implementation: we intend to make
the Registry a template parameter of the allocator (as opposed to a
single global entity), allowing various allocators to coexist with
different TSD registry models. As a result, TSD registry and Allocator
are tightly coupled.
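
A minimal sketch of what "registry as a template parameter" could look like is given below. It is not the Scudo code: `TSD`, `ExclusiveRegistry`, `SharedRegistry` and `ToyAllocator` are hypothetical names, the shared model uses plain round-robin assignment instead of contention-based reassignment, and no locking is shown.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <new>

// Hypothetical per-thread state; a real TSD would hold caches and a lock.
struct TSD {
  std::size_t Allocations = 0;
};

// Exclusive model: every thread gets its own TSD.
struct ExclusiveRegistry {
  TSD &get() {
    static thread_local TSD Local;
    return Local;
  }
};

// Shared model: a fixed pool of TSDs handed out to callers. Round-robin is an
// assumption standing in for contention-based reassignment.
template <std::size_t N> struct SharedRegistry {
  std::array<TSD, N> Pool{};
  std::atomic<std::size_t> Next{0};
  TSD &get() { return Pool[Next.fetch_add(1) % N]; }
};

// The registry model is chosen per allocator type, so allocators using
// different models can coexist in one program.
template <class Registry> class ToyAllocator {
  Registry Reg;

public:
  void *allocate(std::size_t Size) {
    TSD &T = Reg.get();
    ++T.Allocations; // unsynchronized: a real shared TSD would be locked
    return ::operator new(Size);
  }
  void deallocate(void *Ptr) { ::operator delete(Ptr); }
  Registry &registry() { return Reg; }
};
```

For example, `ToyAllocator<ExclusiveRegistry>` and `ToyAllocator<SharedRegistry<2>>` are distinct types that can be instantiated side by side.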

This also corrects a couple of things in other files that I noticed
while adding this.

Reviewers: eugenis, vitalybuka, morehouse, hctim

Reviewed By: morehouse

Subscribers: srhines, mgorny, delcypher, jfb, #sanitizers, llvm-commits

Tags: #llvm, #sanitizers

Differential Revision: https://reviews.llvm.org/D62258
Change Type | Path in Repository | Path in Workspace
The file was modified | /compiler-rt/trunk/lib/scudo/standalone/CMakeLists.txt | trunk/lib/scudo/standalone/CMakeLists.txt
The file was modified | /compiler-rt/trunk/lib/scudo/standalone/internal_defs.h | trunk/lib/scudo/standalone/internal_defs.h
The file was modified | /compiler-rt/trunk/lib/scudo/standalone/mutex.h | trunk/lib/scudo/standalone/mutex.h
The file was modified | /compiler-rt/trunk/lib/scudo/standalone/quarantine.h | trunk/lib/scudo/standalone/quarantine.h
The file was modified | /compiler-rt/trunk/lib/scudo/standalone/tests/CMakeLists.txt | trunk/lib/scudo/standalone/tests/CMakeLists.txt
The file was modified | /compiler-rt/trunk/lib/scudo/standalone/tests/primary_test.cc | trunk/lib/scudo/standalone/tests/primary_test.cc
The file was added | /compiler-rt/trunk/lib/scudo/standalone/tests/tsd_test.cc | trunk/lib/scudo/standalone/tests/tsd_test.cc
The file was added | /compiler-rt/trunk/lib/scudo/standalone/tsd.h | trunk/lib/scudo/standalone/tsd.h
The file was added | /compiler-rt/trunk/lib/scudo/standalone/tsd_exclusive.h | trunk/lib/scudo/standalone/tsd_exclusive.h
The file was added | /compiler-rt/trunk/lib/scudo/standalone/tsd_shared.h | trunk/lib/scudo/standalone/tsd_shared.h

Summary

  1. [libc++] Fix leading zeros in std::to_chars
Revision 362967 by lichray:
[libc++] Fix leading zeros in std::to_chars

Summary:
It is a bugfix proposal for https://bugs.llvm.org/show_bug.cgi?id=42166.

`std::to_chars` emits leading zeros if the input 64-bit value has 9, 10 or 11 digits.
According to the documentation, `std::to_chars` must not emit leading zeros:
https://en.cppreference.com/w/cpp/utility/to_chars
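
A small check for the expected behavior might look like the sketch below. The `toDecimal` wrapper is a hypothetical helper written for this example; it needs a C++17 standard library that provides `<charconv>`.

```cpp
#include <charconv>
#include <cstdint>
#include <string>
#include <system_error>

// Format an unsigned 64-bit value in base 10 using std::to_chars.
// Returns an empty string if the conversion reports an error.
std::string toDecimal(std::uint64_t Value) {
  char Buffer[20]; // 2^64 - 1 has 20 decimal digits
  std::to_chars_result Result =
      std::to_chars(Buffer, Buffer + sizeof(Buffer), Value);
  if (Result.ec != std::errc())
    return std::string();
  return std::string(Buffer, Result.ptr);
}
```

With a conforming implementation, `toDecimal(1234567890ULL)` yields exactly ten characters and the first one is not `'0'`.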

Changeset should not affect `std::to_chars` performance:
http://quick-bench.com/CEpRs14xxA9WLvkXFtaJ3TWOVAg

A unit test checking that `std::from_chars` accepts both `std::to_chars` outputs (the previous and the fixed one) already exists:
https://github.com/llvm-mirror/libcxx/blob/1f60111b597e5cb80a4513ec86f79b7e137f7793/test/std/utilities/charconv/charconv.from.chars/integral.pass.cpp#L63

Reviewers: lichray, mclow.lists, ldionne, EricWF

Reviewed By: lichray, mclow.lists

Subscribers: zoecarver, christof, dexonsmith, libcxx-commits

Differential Revision: https://reviews.llvm.org/D63047
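Change Type | Path in Repository | Path in Workspace
The file was modified | /libcxx/trunk/src/charconv.cpp | trunk/src/charconv.cpp
The file was modified | /libcxx/trunk/test/std/utilities/charconv/charconv.to.chars/integral.pass.cpp | trunk/test/std/utilities/charconv/charconv.to.chars/integral.pass.cpp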
The file was modified/libcxx/trunk/test/std/utilities/charconv/charconv.to.chars/integral.pass.cpptrunk/test/std/utilities/charconv/charconv.to.chars/integral.pass.cpp