Changes

Summary

  1. [BasicBlockUtils] Fixup of an assumed typo in MergeBlockIntoPredecessor
  2. [CodeGen] RegisterCoalescer::buildVRegToDbgValueMap - use const-ref value in for-range loop. NFCI.
  3. [CodeGen] ProcessSDDbgValues - use const-ref value in for-range loop. NFCI.
  4. [RISCV] Add missing op type OPERAND_UIMM2, OPERAND_UIMM3 and OPERAND_UIMM7 for verifyInstruction
  5. [DSE] Track earliest escape, use for loads in isReadClobber.
  6. tsan: remove expected race leftover
  7. [libc++] Remove uses of _LIBCPP_HAS_NO_VARIABLE_TEMPLATES
  8. [CodeGen] update test file to not run the entire LLVM optimizer; NFC
  9. [gn build] (semi-manually) port 702cb7afe9de
  10. [gn build] (manually) port ac191bcc99e2f
  11. [gn build] Port f4abdb0c074b
  12. [InstCombine] fold cast of right-shift if high bits are not demanded (2nd try)
  13. [CostModel][X86] Increase i64 mul cost from 1 to 2
  14. clangd: Do not report inline overrides twice
Commit 85a586501bcc5b556d34a566b9d256d56d6fc5ba by bjorn.a.pettersson
[BasicBlockUtils] Fixup of an assumed typo in MergeBlockIntoPredecessor

The NFC commit e5692a564a73ef63b7b changed the logic for
DomTreeUpdates to use the range [succ_begin, succ_begin) when
looking for SuccsOfPredBB rather than using [succ_begin, succ_end).

As the commit was NFC this is identified as a typo (it has been
discussed briefly in Phabricator).

The typo was found when inspecting the code, so I've got no idea if
changing back to the old range has any significant impact (such as
solving any PRs or causing some new problems). But at least this
restores the code to the originally intended behavior.
The file was modified: llvm/lib/Transforms/Utils/BasicBlockUtils.cpp
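
The difference between the two ranges in the commit message above is easy to miss: a half-open range whose begin and end are the same iterator is empty, so nothing is ever collected from it. A minimal sketch of that effect, using a plain std::vector<int> as a stand-in for a block's successor list (the collectSuccessors helper is hypothetical and is not LLVM code):

```cpp
#include <set>
#include <vector>

// Hypothetical helper: gather the unique elements of the half-open range
// [first, last), much like building a set of successors from an iterator pair.
template <typename Iterator>
std::set<int> collectSuccessors(Iterator first, Iterator last) {
  return std::set<int>(first, last);
}
```

With succs = {1, 2, 3}, calling collectSuccessors(succs.begin(), succs.begin()) yields an empty set, while collectSuccessors(succs.begin(), succs.end()) yields all three entries.
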
Commit 5cabe4d9d3226bfd0856ac031b63ab641acc08f5 by llvm-dev
[CodeGen] RegisterCoalescer::buildVRegToDbgValueMap - use const-ref value in for-range loop. NFCI.

Avoid unnecessary copies, reported by MSVC static analyzer.
The file was modified: llvm/lib/CodeGen/RegisterCoalescer.cpp
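
As an illustration of the kind of copy this sort of change avoids, here is a small sketch that counts copy-constructor calls for the two loop forms. The Tracked type and both functions are made up for this example and do not appear in the commit:

```cpp
#include <vector>

// Counts how many times a Tracked object has been copy-constructed.
static int copyCount = 0;

// Hypothetical element type whose copies are observable.
struct Tracked {
  int value;
  explicit Tracked(int v) : value(v) {}
  Tracked(const Tracked &other) : value(other.value) { ++copyCount; }
};

// Iterating by value copies every element once per iteration.
int sumByValue(const std::vector<Tracked> &items) {
  int sum = 0;
  for (auto item : items)
    sum += item.value;
  return sum;
}

// Iterating by const reference binds to the existing element instead.
int sumByConstRef(const std::vector<Tracked> &items) {
  int sum = 0;
  for (const auto &item : items)
    sum += item.value;
  return sum;
}
```

Both functions return the same sum; only the by-value loop increments copyCount, once per element.
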
Commit 2a5936faf0f3da9f55109ac1ed3b7b45436e3ced by llvm-dev
[CodeGen] ProcessSDDbgValues - use const-ref value in for-range loop. NFCI.

Avoid unnecessary copies, reported by MSVC static analyzer.
The file was modified: llvm/lib/CodeGen/SelectionDAG/ScheduleDAGSDNodes.cpp
Commit fbacf5ad385c63c73060369ac11dd535e44a37ce by jim
[RISCV] Add missing op type OPERAND_UIMM2, OPERAND_UIMM3 and OPERAND_UIMM7 for verifyInstruction

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D110307
The file was modified: llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
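
Assuming, from their names, that these operand types denote unsigned immediates of 2, 3 and 7 bits, the check a verifier would apply to each is a simple range test. The helper below is a hypothetical stand-alone sketch of that test and is not the code from RISCVInstrInfo.cpp:

```cpp
#include <cstdint>

// Hypothetical range check: true if `value` fits in an unsigned N-bit field,
// i.e. 0 <= value < 2^N.
template <unsigned N>
bool fitsUnsignedImm(int64_t value) {
  static_assert(N > 0 && N < 63, "sketch only covers small field widths");
  return value >= 0 && value < (int64_t{1} << N);
}
```

Under that assumption, a 2-bit immediate accepts 0 through 3, a 3-bit immediate 0 through 7, and a 7-bit immediate 0 through 127.
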
Commit 5ce89279c0986d0bcbe526dce52f91dd0c16427c by flo
[DSE] Track earliest escape, use for loads in isReadClobber.

At the moment, DSE only considers whether a pointer may be captured at
all in a function. This leads to cases where we fail to remove stores to
local objects because we do not check if they escape before potential
read-clobbers or after.

Context-sensitive escape queries in isReadClobber were removed a while
ago in d1a1cce5b130 to save compile-time. See PR50220 for more
context.

This patch introduces a new capture tracker, which keeps track of the
'earliest' capture. An instruction A is considered earlier than instruction
B if A dominates B. If two escapes do not dominate each other, the
terminator of the common dominator is chosen. If not all uses can be
analyzed, the earliest escape is set to the first instruction in the
function entry block.

If the query instruction dominates the earliest escape and is not in a
cycle, then the pointer does not escape before the query instruction.
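
The rule for picking the earliest of two escapes can be sketched on a toy dominator tree. Everything below (ToyDomTree, Position, kTerminator, mergeEarliest) is a hypothetical simplification for illustration and is not the CaptureTracking implementation: blocks are plain integers, block 0 is the entry, and a position is a block plus an instruction index inside it.

```cpp
#include <climits>
#include <vector>

// Toy dominator tree: idom[b] is the immediate dominator of block b.
// Block 0 is the entry block and dominates every other block.
struct ToyDomTree {
  std::vector<int> idom;

  // True if block a dominates block b (every block dominates itself).
  bool dominates(int a, int b) const {
    while (true) {
      if (a == b) return true;
      if (b == 0) return false;
      b = idom[b];
    }
  }

  // Walk up from a until reaching a block that also dominates b.
  int nearestCommonDominator(int a, int b) const {
    while (!dominates(a, b)) a = idom[a];
    return a;
  }
};

// Hypothetical marker for "the terminator of this block".
constexpr int kTerminator = INT_MAX;

// An instruction position: a block and an index within that block.
struct Position {
  int block;
  int index;
};

// Combine two escape positions into the earlier one. If neither block
// dominates the other, fall back to the terminator of the common dominator.
Position mergeEarliest(const ToyDomTree &dt, Position cur, Position next) {
  if (cur.block == next.block)
    return cur.index <= next.index ? cur : next;
  if (dt.dominates(cur.block, next.block)) return cur;
  if (dt.dominates(next.block, cur.block)) return next;
  return {dt.nearestCommonDominator(cur.block, next.block), kTerminator};
}
```

In a diamond-shaped CFG where block 0 branches to blocks 1 and 2, escapes in blocks 1 and 2 merge to the terminator of block 0, while an escape in block 0 wins over any escape in a block it dominates. The sketch leaves out the cycle check mentioned above.
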

This patch uses this information when checking if a load of a loaded
underlying object may alias a write to a stack object. If the stack
object does not escape before the load, they do not alias.

I will share a follow-up patch to also use the information for call
instructions to fix PR50220.

In terms of compile-time, the impact is low in general,
    NewPM-O3: +0.05%
    NewPM-ReleaseThinLTO: +0.05%
    NewPM-ReleaseLTO-g: +0.03%

with the largest change being tramp3d-v4 (+0.30%)
http://llvm-compile-time-tracker.com/compare.php?from=1a3b3301d7aa9ab25a8bdf045c77298b087e3930&to=bc6c6899cae757c3480f4ad4874a76fc1eafb0be&stat=instructions

Compared to always computing the capture information on demand, we get
the following benefits from the caching:
NewPM-O3: -0.03%
NewPM-ReleaseThinLTO: -0.08%
NewPM-ReleaseLTO-g: -0.04%

The biggest speedup is tramp3d-v4 (-0.21%).
http://llvm-compile-time-tracker.com/compare.php?from=0b0c99177d1511469c633282ef67f20c851f58b1&to=bc6c6899cae757c3480f4ad4874a76fc1eafb0be&stat=instructions

Overall there is a small, but noticeable benefit from caching. I am not
entirely sure if the speedups warrant the extra complexity of caching.
The way the caching works also means that we might miss a few cases, as
it is less precise. Also, there may be a better way to cache things.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D109844
The file was modified: llvm/test/Transforms/DeadStoreElimination/captures-before-load.ll
The file was modified: llvm/lib/Analysis/CaptureTracking.cpp
The file was modified: llvm/include/llvm/Analysis/CaptureTracking.h
The file was modified: llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
Commit 7faf1285f2c416494aacf7ee13e70e330b2f0540 by dvyukov
tsan: remove expected race leftover

Remove the nmissed_expected variable.
It's a leftover from the removed "expected race" feature and is never incremented.

Reviewed By: melver

Differential Revision: https://reviews.llvm.org/D110321
The file was modified: compiler-rt/lib/tsan/rtl/tsan_rtl.cpp
The file was modified: compiler-rt/lib/tsan/rtl/tsan_rtl.h
Commit 1711a6ec650980764d97b3be148745e2813a3707 by Louis Dionne
[libc++] Remove uses of _LIBCPP_HAS_NO_VARIABLE_TEMPLATES

All supported compilers provide support for variable templates now.

Differential Revision: https://reviews.llvm.org/D110284
The file was modified: libcxx/test/libcxx/selftest/test_macros.pass.cpp
The file was modified: libcxx/include/ratio
The file was modified: libcxx/test/support/test_macros.h
The file was modified: libcxx/include/chrono
The file was modified: libcxx/include/type_traits
The file was modified: libcxx/include/__config
Commit c75c5c5f8f3740716c9a1c4fb1d8f7e753af2cf6 by spatel
[CodeGen] update test file to not run the entire LLVM optimizer; NFC

Clang regression tests should not break when changes are made to
the LLVM optimizer. This file broke on the 1st attempt at D110170,
so I'm trying to prevent that on another try.

Similar to other files in this directory, we make a compromise and
run -mem2reg to reduce noise by about 1000 lines out of 5000+ CHECK lines.
The file was modified: clang/test/CodeGen/aapcs-bitfield.c
Commit cef0280a95dd969f18a5cd3c2238a60a25ef4cce by thakis
[gn build] (semi-manually) port 702cb7afe9de
The file was modified: llvm/utils/gn/secondary/compiler-rt/lib/tsan/BUILD.gn
Commit 64f623d4c37c686e74fae6f1a3e33dd4f39e1423 by thakis
[gn build] (manually) port ac191bcc99e2f
The file was modified: llvm/utils/gn/secondary/compiler-rt/test/BUILD.gn
Commit ac889a5262f230eaccfbe60ad2c1d1ae60280623 by llvmgnsyncbot
[gn build] Port f4abdb0c074b
The file was modified: llvm/utils/gn/secondary/libcxx/include/BUILD.gn
Commit bb9333c3504a4a02b982526ad8264d14c6ec1ad4 by spatel
[InstCombine] fold cast of right-shift if high bits are not demanded (2nd try)

The 1st try at this was reverted because it caused an infinite loop in instcombine.
That should be fixed after:
1cd6b44f267b

(masked) trunc (lshr X, C) --> (masked) lshr (trunc X), C

Narrowing the shift should be better for analysis and can lead
to follow-on transforms as shown.
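
To see why the fold is only valid when the high bits are not demanded, here is a small sketch with a 32-bit source and an 8-bit destination. The widths and function names are assumptions chosen for this example; the actual transform works on LLVM IR, not C++:

```cpp
#include <cstdint>

// trunc (lshr X, C): shift in the wide type, then truncate to 8 bits.
uint8_t wideShiftThenTrunc(uint32_t x, unsigned c) {
  return static_cast<uint8_t>(x >> c);
}

// lshr (trunc X), C: truncate to 8 bits first, then shift in the narrow type.
uint8_t truncThenNarrowShift(uint32_t x, unsigned c) {
  return static_cast<uint8_t>(static_cast<uint8_t>(x) >> c);
}

// The low (8 - c) bits of the result; the top c bits are the ones the
// narrow shift fills with zeros instead of bits from the wide value.
uint8_t demandedMask(unsigned c) {
  return static_cast<uint8_t>(0xFFu >> c);
}

// True if both forms agree once the non-demanded high bits are masked off.
// Only meaningful for shift amounts 0 <= c < 8 in this sketch.
bool agreeOnDemandedBits(uint32_t x, unsigned c) {
  const uint8_t mask = demandedMask(c);
  return (wideShiftThenTrunc(x, c) & mask) == (truncThenNarrowShift(x, c) & mask);
}
```

For x = 0x1234 and c = 4, the wide form gives 0x23 and the narrow form gives 0x03. They differ in the top four bits but agree on the low four, which is exactly the case where a mask makes the narrow shift a safe replacement.
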

Attempt at a general proof in Alive2:
https://alive2.llvm.org/ce/z/tRnnSF

Here are a couple of the specific tests:
https://alive2.llvm.org/ce/z/bCnTp-
https://alive2.llvm.org/ce/z/TfaHnb

Differential Revision: https://reviews.llvm.org/D110170
The file was modified: llvm/test/Transforms/InstCombine/trunc-demand.ll
The file was modified: llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp
Commit c931d35216a320465947a5a46b158cf71024458d by llvm-dev
[CostModel][X86] Increase i64 mul cost from 1 to 2

Only the most recent CPUs really support 1cy 64-bit multiplies, and the X64 cost table represents a realistic worst case. The 1cy value was also discouraging vectorization when most vXi64 PMULDQ expansions aren't actually slower than scalarization.

Noticed while investigating PR51436.
The file was modified: llvm/test/Transforms/SLPVectorizer/X86/arith-fix.ll
The file was modified: llvm/test/Analysis/CostModel/X86/arith-overflow.ll
The file was modified: llvm/test/Analysis/CostModel/X86/rem.ll
The file was modified: llvm/test/Transforms/SLPVectorizer/X86/arith-mul.ll
The file was modified: llvm/lib/Target/X86/X86TargetTransformInfo.cpp
The file was modified: llvm/test/Transforms/IndVarSimplify/X86/loop-invariant-conditions.ll
The file was modified: llvm/test/Analysis/CostModel/X86/arith-fix.ll
The file was modified: llvm/test/Analysis/CostModel/X86/arith.ll
The file was modified: llvm/test/Analysis/CostModel/X86/slm-arith-costs.ll
Commit eb209c13cce99b1ad8d8e619bf2006f4376ed1ef by sam.mccall
clangd: Do not report inline overrides twice

... in textDocument/references.

Reviewed By: sammccall

Differential Revision: https://reviews.llvm.org/D110324
The file was modified: clang-tools-extra/clangd/unittests/XRefsTests.cpp
The file was modified: clang-tools-extra/clangd/XRefs.cpp