Success

Changes

Summary

  1. [VectorCombine] Scalarize vector load/extract.
  2. [Debug-Info]update section name to match AIX behaviour; nfc
  3. [AMDGPU][Libomptarget] Remove global KernelNameMap
  4. [CostModel][X86] Improve accuracy of vXi64 MUL costs on AVX2/AVX512 targets
  5. Revert "[VectorCombine] Scalarize vector load/extract."
  6. flang: include limits
  7. [LoopIdiom] 'logical right shift until zero': the value must be loop-invariant
  8. [NFCI][LoopIdiom] 'left-shift until bittest': assert that BaseX is loop-invariant
  9. [debuginfo-tests] Stop using installed LLDB and remove redundancy
  10. [RISCV] Prevent store combining from infinitely looping
  11. [MLIR] Drop old cmake var names
  12. [ARM] Fix inline memcpy trip count sequence
  13. [ARM] Ensure WLS preheader blocks have branches during memcpy lowering
  14. Recommit "[VectorCombine] Scalarize vector load/extract."
  15. [ARM] Allow findLoopPreheader to return headers with multiple loop successors
Commit 86497785d540e59eaca24bed4219ddec183cbc9b by flo
[VectorCombine] Scalarize vector load/extract.

This patch adds a new combine that tries to scalarize chains of
`extractelement (load %ptr), %idx` to `load (gep %ptr, %idx)`. This is
profitable when extracting only a few elements out of a large vector.
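
The rewrite described above can be pictured with a rough C++ analogy. This is only a sketch: the real combine operates on LLVM IR inside VectorCombine.cpp, and the names `Vec`, `extractAfterLoad`, and `scalarLoad` are hypothetical. Any legality checks the pass performs are not modeled here; both functions simply assume `idx` is in bounds.

```cpp
#include <array>
#include <cstddef>

// Stand-in for a large vector type such as <64 x i32> (assumed size).
using Vec = std::array<int, 64>;

// "Before" shape: load the whole vector, then extract one element.
// Corresponds to `extractelement (load %ptr), %idx`.
int extractAfterLoad(const Vec *ptr, std::size_t idx) {
  Vec loaded = *ptr;   // full-width vector load
  return loaded[idx];  // extract a single lane
}

// "After" shape: compute the element address and load only that scalar.
// Corresponds to `load (gep %ptr, %idx)`.
int scalarLoad(const Vec *ptr, std::size_t idx) {
  const int *elt = ptr->data() + idx;  // address of lane idx
  return *elt;                         // scalar load
}
```

Both functions return the same value for an in-bounds index; the second touches only one element instead of copying all 64.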

At the moment, `store (extractelement (load %ptr), %idx), %ptr`
operations on large vectors result in huge code in the backend.

This can easily be triggered by using the matrix extension, e.g.
https://clang.godbolt.org/z/qsccPdPf4

This should complement D98240.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D100273
The file was modified llvm/test/Transforms/VectorCombine/X86/load.ll
The file was modified llvm/test/Transforms/VectorCombine/AArch64/load-extractelement-scalarization.ll
The file was modified llvm/test/Transforms/VectorCombine/X86/load-inseltpoison.ll
The file was modified llvm/lib/Transforms/Vectorize/VectorCombine.cpp
Commit 486d6d2b8ef7bd7313d99b5cbfa487c52d75169b by czhengsz
[Debug-Info]update section name to match AIX behaviour; nfc
The file was modified llvm/test/DebugInfo/Generic/array.ll
The file was modified llvm/test/DebugInfo/Generic/2010-06-29-InlinedFnLocalVar.ll
Commit 486110eb413446dfa835d880bfd1c0d6bbe9f120 by Pushpinder.Singh
[AMDGPU][Libomptarget] Remove global KernelNameMap

KernelNameMap contains entries like "key.kd" => key, which could clearly
be replaced by the simple logic of removing the suffix from the key.
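
A minimal sketch of that suffix-removal logic, assuming the suffix is always the literal ".kd" shown in the example entry. The helper name `stripKdSuffix` is hypothetical and is not taken from system.cpp.

```cpp
#include <string>

// Hypothetical helper: maps a symbol such as "key.kd" to "key" by dropping
// a trailing ".kd". Symbols without that suffix are returned unchanged.
std::string stripKdSuffix(const std::string &symbol) {
  const std::string suffix = ".kd";
  if (symbol.size() >= suffix.size() &&
      symbol.compare(symbol.size() - suffix.size(), suffix.size(), suffix) == 0) {
    return symbol.substr(0, symbol.size() - suffix.size());
  }
  return symbol;
}
```

With a helper like this, a lookup table from "key.kd" to "key" carries no extra information, which is the point the message makes.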

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D102691
The file was modified openmp/libomptarget/plugins/amdgpu/impl/system.cpp
Commit 243e58868176102484c3ff1a338342633ede7361 by llvm-dev
[CostModel][X86] Improve accuracy of vXi64 MUL costs on AVX2/AVX512 targets

By llvm-mca analysis, Haswell/Broadwell has the worst v4i64 recip-throughput cost of the AVX2 targets at 6 (vs the currently used cost of 8). Similarly, SkylakeServer (our only AVX512 target model) implements PMULLQ with an average cost of 1.5 (rounded up to 2.0), and the PMULUDQ-sequence (without AVX512DQ) at a cost of 6.
The file was modified llvm/test/Analysis/CostModel/X86/reduce-mul.ll
The file was modified llvm/lib/Target/X86/X86TargetTransformInfo.cpp
The file was modified llvm/test/Analysis/CostModel/X86/rem.ll
The file was modified llvm/test/Analysis/CostModel/X86/arith-overflow.ll
The file was modified llvm/test/Analysis/CostModel/X86/arith-fix.ll
The file was modified llvm/test/Analysis/CostModel/X86/arith.ll
Commit 94d54155e2f38b56171811757044a3e6f643c14b by flo
Revert "[VectorCombine] Scalarize vector load/extract."

This reverts commit 86497785d540e59eaca24bed4219ddec183cbc9b.

One of the tests causes an ASAN failure.
https://lab.llvm.org/buildbot/#/builders/5/builds/7927/steps/12/logs/stdio
The file was modified llvm/lib/Transforms/Vectorize/VectorCombine.cpp
The file was modified llvm/test/Transforms/VectorCombine/X86/load.ll
The file was modified llvm/test/Transforms/VectorCombine/AArch64/load-extractelement-scalarization.ll
The file was modified llvm/test/Transforms/VectorCombine/X86/load-inseltpoison.ll
Commit 0f140ce33d64b2a8f4f0866debf5fdd36e49b3ad by schuett
flang: include limits
The file was modified flang/runtime/unit.cpp
Commit aa3dac95edbfb892b6236341b431b222f7bd0926 by lebedev.ri
[LoopIdiom] 'logical right shift until zero': the value must be loop-invariant

As per the reproducer provided by Mikael Holmén in post-commit review.
The file was modified llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp
The file was modified llvm/test/Transforms/LoopIdiom/X86/logical-right-shift-until-zero.ll
Commit 32bee42719ad8f0d8e15b55dd5b2a7563a817e34 by lebedev.ri
[NFCI][LoopIdiom] 'left-shift until bittest': assert that BaseX is loop-invariant

Given that BaseX is an incoming value when coming from the preheader,
it *should* be loop-invariant, but let's just document this assumption.
The file was modified llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp
Commit 5c4a5daf293c1c924caa3c6faed4487682d70045 by james.henderson
[debuginfo-tests] Stop using installed LLDB and remove redundancy

The removed code just replicated what use_llvm_tool does, plus looked
for an installed LLDB on the PATH to use. In a monorepo world, it seems
likely that if people want to run the tests that require LLDB, they
should enable and build LLDB itself. If users really want to use the
installed LLDB executable, they can specify the path to the executable
as an environment variable "LLDB".

See the discussion in https://reviews.llvm.org/D95339#2638619 for
more details.

Reviewed by: jmorse, aprantl

Differential Revision: https://reviews.llvm.org/D102680
The file was modified debuginfo-tests/lit.cfg.py
Commit 7a211ed110a72ad453305aba250da50c965c2f8e by fraser
[RISCV] Prevent store combining from infinitely looping

RVV code generation does not successfully custom-lower BUILD_VECTOR in all
cases. When it resorts to default expansion it may, on occasion, be expanded to
scalar stores through the stack. Unfortunately these stores may then be picked
up by the post-legalization DAGCombiner which merges them again. The merged
store uses a BUILD_VECTOR which is then expanded, and so on.

This patch addresses the issue by overriding the `mergeStoresAfterLegalization`
hook. A lack of granularity in this method (being passed the scalar type) means
we opt out in almost all cases when RVV fixed-length vector support is enabled.
The only exceptions to this rule are mask vectors, which are always either
custom-lowered or expanded to a load from a constant pool.

Reviewed By: HsiangKai

Differential Revision: https://reviews.llvm.org/D102913
The file was modified llvm/lib/Target/RISCV/RISCVISelLowering.h
The file was modified llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp-buildvec.ll
The file was modified llvm/lib/Target/RISCV/RISCVISelLowering.cpp
Commit 587408c199e8125bb454a44b7a7b20e015f4d317 by uday
[MLIR] Drop old cmake var names

Drop old cmake variable names that were kept around so that zorg
buildbot could be migrated, which has now happened (D102977). D102976
had fixed the inconsistent names.

Differential Revision: https://reviews.llvm.org/D102997
The file was modified mlir/CMakeLists.txt
Commit 6cc78b9245bcc0e7a52723e2c298d290284e779b by david.green
[ARM] Fix inline memcpy trip count sequence

The trip count for a memcpy/memset will be n/16 rounded up to the
nearest integer. So (n+15)>>4. The old code was including a BIC too, to
clear one of the bits, which does not seem correct. This removes the
extra BIC.
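
As a quick check of the arithmetic, here is a small sketch of the trip count formula quoted above. The function name `tripCount` is hypothetical, and the sketch assumes `n + 15` does not overflow 32 bits; it is not the lowering code from ARMISelLowering.cpp.

```cpp
#include <cstdint>

// Number of 16-byte iterations needed to cover n bytes, i.e. n/16 rounded
// up to the nearest integer, computed as (n + 15) >> 4.
// Assumption: n + 15 fits in 32 bits.
constexpr std::uint32_t tripCount(std::uint32_t n) {
  return (n + 15u) >> 4;
}

static_assert(tripCount(0) == 0, "no bytes, no iterations");
static_assert(tripCount(1) == 1, "a partial block still needs one iteration");
static_assert(tripCount(16) == 1, "exactly one full block");
static_assert(tripCount(17) == 2, "one full block plus a tail");
```

Adding 15 before the shift is what performs the rounding up, so no further bit manipulation of the result is needed.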

Note that ideally this would never actually be generated, as in the
creation of a tail predicated loop we will DCE that setup code, letting
the WLSTP perform the trip count calculation. So this doesn't usually
come up in testing (and apparently the ARMLowOverheadLoops pass does not
do any sort of validation on the trip count). Only if the generation of
the WLSTP fails will it use the incorrect BIC instructions.

Differential Revision: https://reviews.llvm.org/D102629
The file was modified llvm/test/CodeGen/Thumb2/mve-phireg.ll
The file was modified llvm/lib/Target/ARM/ARMISelLowering.cpp
The file was modified llvm/test/CodeGen/Thumb2/mve-tp-loop.mir
The file was modified llvm/test/CodeGen/Thumb2/mve-memtp-loop.ll
Commit 53c42f7700e824d6ec394614653abd8b33d5da34 by david.green
[ARM] Ensure WLS preheader blocks have branches during memcpy lowering

This makes sure that the blocks created for lowering memcpy to loops end
up with branches, even if they fall through to the successor. Otherwise
IfCvt gets confused by unanalyzable branches and creates
invalid block layouts.

The extra branches should be removed as the tail predicated loop is
finalized in almost all cases.
The file was added llvm/test/CodeGen/Thumb2/mve-memtp-branch.ll
The file was modified llvm/lib/Target/ARM/ARMISelLowering.cpp
The file was modified llvm/test/CodeGen/Thumb2/mve-memtp-loop.ll
Commit 4e8c28b6fbec95b47c435810ab5fc3b43c2935db by flo
Recommit "[VectorCombine] Scalarize vector load/extract."

This reverts commit 94d54155e2f38b56171811757044a3e6f643c14b.

This fixes a sanitizer failure by moving scalarizeLoadExtract(I)
before foldSingleElementStore(I), which may remove instructions.
The file was modified llvm/test/Transforms/VectorCombine/X86/load-inseltpoison.ll
The file was modified llvm/test/Transforms/VectorCombine/X86/load.ll
The file was modified llvm/lib/Transforms/Vectorize/VectorCombine.cpp
The file was modified llvm/test/Transforms/VectorCombine/AArch64/load-extractelement-scalarization.ll
Commit 543406a69b339e875c39c75f3935e27fedefc0a7 by david.green
[ARM] Allow findLoopPreheader to return headers with multiple loop successors

The findLoopPreheader function will currently not find a preheader if it
branches to multiple different loop headers. This patch adds an option
to relax that, allowing ARMLowOverheadLoops to process more loops
successfully. This helps with WhileLoopStart setup instructions that can
branch/fallthrough to the low overhead loop and branch to a separate
loop from the same preheader (but I don't believe it is possible for
both loops to be low overhead loops).

Differential Revision: https://reviews.llvm.org/D102747
The file was modified llvm/lib/CodeGen/MachineLoopInfo.cpp
The file was modified llvm/include/llvm/CodeGen/MachineLoopInfo.h
The file was modified llvm/test/CodeGen/Thumb2/mve-memtp-branch.ll
The file was modified llvm/test/CodeGen/Thumb2/mve-memtp-loop.ll
The file was modified llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp