Changes

Summary

  1. Revert "Revert D106562 "[clangd] Get rid of arg adjusters in CommandMangler"" (details)
  2. [SelectionDAG] Support scalable-vector splats in yet more cases (details)
  3. [Analysis] Add simple cost model for strict (in-order) reductions (details)
  4. [AArch64][AsmParser] NFC: Parser.getTok().getLoc() -> getLoc() (details)
  5. Revert "[clangd] Avoid range-loop init-list lifetime subtleties." (details)
  6. [X86][SSE] Don't scrub address math from interleaved shuffle tests (details)
  7. [X86][AVX] Prefer vinsertf128 to vperm2f128 on AVX1 targets (details)
  8. [AArch64][SVE] Improve code generation for vector_splice for Imm == -1 (details)
  9. Fix test failures caused by 0aff1798b5721d5f95d16f465b99d357012bb8d1 (details)
  10. [SVE][AArch64] Improve code generation for vector_splice for Imm > 0 (details)
  11. [SVE] Add support for folding for select + masked loads (details)
  12. [VPlan] Use stored value from recipes for interleave groups. (details)
  13. [Inliner] Make the CallPenalty configurable (details)
  14. [NFC] Change VFShape so it contains an ElementCount rather than separate VF and IsScalable properties. (details)
  15. [SLP]Fix costs calculations. (details)
  16. [mlir] split type conversion to two lines for GCC's sake (details)
  17. [AArch64][SVE] Remove vector_splice from AddedComplexity pattern (details)
  18. Revert "[SLP]Fix costs calculations." (details)
  19. [SVE] Fix casts to <FixedVectorType> in truncateToMinimalBitwidths (details)
Commit 0a3c7960cba15b57f679159c2bb4d20d10b86a5c by kadircet
Revert "Revert D106562 "[clangd] Get rid of arg adjusters in CommandMangler""

This reverts commit 2aa0cf19e7fe17c9eb5eb2555e10184061b933f1.
Get rid of reference to the temporary.
The file was modified clang-tools-extra/clangd/unittests/CompileCommandsTests.cpp
The file was modified clang-tools-extra/clangd/CompileCommands.cpp
The file was modified clang-tools-extra/clangd/Compiler.cpp
The file was modified clang-tools-extra/clangd/unittests/CompilerTests.cpp
The file was modified clang-tools-extra/clangd/CompileCommands.h
Commit f924a3d47492b7b586ccfd1333ca086a7e2d88b2 by fraser
[SelectionDAG] Support scalable-vector splats in yet more cases

This patch extends support for (scalable-vector) splats in the
DAGCombiner via the `ISD::matchBinaryPredicate` function, which enables a
variety of simple combines of constants.

Users of this function may now have to distinguish between
`BUILD_VECTOR` and `SPLAT_VECTOR` vector operands. The way of dealing
with this in-tree follows the approach added for
`ISD::matchUnaryPredicate` implemented in D94501.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D106575
The file was modified llvm/test/CodeGen/RISCV/rvv/urem-seteq-vec.ll
The file was modified llvm/test/CodeGen/RISCV/rvv/combine-splats.ll
The file was modified llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
The file was modified llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
The file was modified llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
Commit 0aff1798b5721d5f95d16f465b99d357012bb8d1 by david.sherwood
[Analysis] Add simple cost model for strict (in-order) reductions

I have added a new FastMathFlags parameter to getArithmeticReductionCost
to indicate what type of reduction we are performing:

  1. Tree-wise. This is the typical fast-math reduction that involves
  continually splitting a vector up into halves and adding each
  half together until we get a scalar result. This is the default
  behaviour for integers, whereas for floating point we only do this
  if reassociation is allowed.
  2. Ordered. This now allows us to estimate the cost of performing
  a strict vector reduction by treating it as a series of scalar
  operations in lane order. This is the case when FP reassociation
  is not permitted. For scalable vectors this is more difficult
  because at compile time we do not know how many lanes there are,
  and so we use the worst case maximum vscale value.
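
The two reduction kinds described above can be illustrated with a small
model. This is only a sketch: the function names, the one-unit-per-operation
counting, and the example max vscale of 16 are assumptions made for
illustration, not the formulas used by getArithmeticReductionCost.

```python
def tree_reduction_steps(num_lanes: int) -> int:
    """Halving steps needed to reduce num_lanes lanes to a single value."""
    return (num_lanes - 1).bit_length()


def ordered_reduction_ops(min_lanes: int, max_vscale: int = 1) -> int:
    """Scalar operations for an in-order reduction: one per lane.

    For a scalable vector the lane count is unknown at compile time, so
    the worst case min_lanes * max_vscale is assumed (hypothetical model).
    """
    return min_lanes * max_vscale


def tree_sum(lanes):
    """Tree-wise reduction: add the two halves together until one value is left.

    Assumes the number of lanes is a power of two.
    """
    vals = list(lanes)
    while len(vals) > 1:
        half = len(vals) // 2
        vals = [vals[i] + vals[i + half] for i in range(half)]
    return vals[0]


def ordered_sum(start, lanes):
    """Strict reduction: accumulate the lanes one by one in lane order."""
    acc = start
    for x in lanes:
        acc += x
    return acc


lanes = [1e16, 1.0, -1e16, 1.0]
print(tree_reduction_steps(len(lanes)))          # halving steps for 4 lanes
print(ordered_reduction_ops(4, max_vscale=16))   # worst case for a <vscale x 4> vector
print(tree_sum(lanes), ordered_sum(0.0, lanes))  # the two orders give different sums
```

With these inputs the tree-wise and ordered sums differ, which is why the
tree-wise form is only valid for floating point when reassociation is allowed.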

I have also fixed getTypeBasedIntrinsicInstrCost to pass in the
FastMathFlags, which meant fixing up some X86 tests where we always
assumed the vector.reduce.fadd/mul intrinsics were 'fast'.

New tests have been added here:

  Analysis/CostModel/AArch64/reduce-fadd.ll
  Analysis/CostModel/AArch64/sve-intrinsics.ll
  Transforms/LoopVectorize/AArch64/strict-fadd-cost.ll
  Transforms/LoopVectorize/AArch64/sve-strict-fadd-cost.ll

Differential Revision: https://reviews.llvm.org/D105432
The file was modified llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
The file was modified llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h
The file was modified llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h
The file was modified llvm/test/Analysis/CostModel/X86/intrinsic-cost-kinds.ll
The file was modified llvm/test/Analysis/CostModel/X86/reduce-fmul.ll
The file was modified llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
The file was modified llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp
The file was added llvm/test/Transforms/LoopVectorize/AArch64/sve-strict-fadd-cost.ll
The file was modified llvm/include/llvm/Analysis/TargetTransformInfo.h
The file was modified llvm/include/llvm/CodeGen/BasicTTIImpl.h
The file was modified llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
The file was modified llvm/lib/Target/X86/X86TargetTransformInfo.cpp
The file was modified llvm/lib/Target/X86/X86TargetTransformInfo.h
The file was modified llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
The file was modified llvm/test/Analysis/CostModel/AArch64/sve-intrinsics.ll
The file was modified llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
The file was added llvm/test/Transforms/LoopVectorize/AArch64/strict-fadd-cost.ll
The file was modified llvm/lib/Analysis/TargetTransformInfo.cpp
The file was added llvm/test/Analysis/CostModel/AArch64/reduce-fadd.ll
The file was modified llvm/lib/Target/ARM/ARMTargetTransformInfo.h
The file was modified llvm/test/Analysis/CostModel/X86/reduce-fadd.ll
Commit e6ff9179cee48096e7b2e739c9a79db62fa884bb by cullen.rhodes
[AArch64][AsmParser] NFC: Parser.getTok().getLoc() -> getLoc()

Reviewed By: tmatheson

Differential Revision: https://reviews.llvm.org/D106635
The file was modified llvm/test/MC/AArch64/shift_extend_op_w_symbol.s
The file was modified llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp
Commit e9274af7189333d1f50e47098d9ae30522d7193f by sam.mccall
Revert "[clangd] Avoid range-loop init-list lifetime subtleties."

This reverts commit 253b8145dedbe8d10792f44b4af7f52dbecd527f.

This doesn't actually fix anything - I should stop guessing.
See https://github.com/clangd/clangd/issues/800 for update
The file was modified clang-tools-extra/clangd/GlobalCompilationDatabase.cpp
Commit f64e251560203adf0258c96440c0cd637d3a43fc by llvm-dev
[X86][SSE] Don't scrub address math from interleaved shuffle tests
The file was modified llvm/test/CodeGen/X86/x86-interleaved-access.ll
The file was modified llvm/test/CodeGen/X86/vector-interleave.ll
Commit c8472db0a88701e8c1b183d6568028fefc3406c0 by llvm-dev
[X86][AVX] Prefer vinsertf128 to vperm2f128 on AVX1 targets

Splatting the lower xmm with vinsertf128 is at least as quick as vperm2f128, and a lot faster on some AMD targets.

First step towards PR50053
The file was modified llvm/lib/Target/X86/X86ISelLowering.cpp
The file was modified llvm/test/CodeGen/X86/var-permute-128.ll
The file was modified llvm/test/CodeGen/X86/x86-interleaved-access.ll
The file was modified llvm/test/CodeGen/X86/vector-shuffle-256-v8.ll
The file was modified llvm/test/CodeGen/X86/vector-interleave.ll
Commit 73e4e9cd007a71fb7186933abdcae024fe65cea7 by caroline.concatto
[AArch64][SVE] Improve code generation for vector_splice for Imm == -1

This patch implements vector_splice in tablegen for:
  a) when the immediate is equal to -1 (Imm == -1) and uses:
       INSR  +  LASTB
For instance:
@llvm.experimental.vector.splice(Vector_1, Vector_2, -1)
@llvm.experimental.vector.splice(<A,B,C,D>, <E,F,G,H>, -1) ==> <D, E, F, G>
    LASTB  RegLast, Vector_1                 // RegLast = D
    INSR   Res, (Vector_2 >> 1), RegLast     // Res = D + E, F, G
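
The LASTB + INSR sequence can be modelled on plain lists to check that it
produces <D, E, F, G>. This is a rough sketch under assumptions: the helper
names are hypothetical, LASTB is modelled with an all-true predicate, and
INSR is modelled as shifting the second vector up by one lane before
inserting the scalar into lane 0.

```python
def lastb(vec):
    """Model of LASTB with an all-true predicate: return the last element."""
    return vec[-1]


def insr(vec, scalar):
    """Model of INSR: shift every element up one lane, put scalar in lane 0."""
    return [scalar] + vec[:-1]


def splice_minus_one(vector_1, vector_2):
    """Model of vector_splice(vector_1, vector_2, -1) as LASTB followed by INSR."""
    reg_last = lastb(vector_1)       # RegLast = D
    return insr(vector_2, reg_last)  # Res = D + E, F, G


print(splice_minus_one(list("ABCD"), list("EFGH")))
```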

Differential Revision: https://reviews.llvm.org/D105633
The file was modified llvm/lib/Target/AArch64/AArch64ISelLowering.h
The file was modified llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
The file was modified llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
The file was modified llvm/test/CodeGen/AArch64/named-vector-shuffles-sve.ll
Commit b2a5f0029f278dadb62f9e98dec12b1840020324 by david.sherwood
Fix test failures caused by 0aff1798b5721d5f95d16f465b99d357012bb8d1
The file was modified llvm/test/Transforms/LoopVectorize/AArch64/strict-fadd-cost.ll
The file was modified llvm/test/Transforms/LoopVectorize/AArch64/sve-strict-fadd-cost.ll
Commit 0bfc26e3a4bf291f1d64610fe422c82789d752bc by caroline.concatto
[SVE][AArch64] Improve code generation for vector_splice for Imm > 0

This patch implements vector_splice in tablegen for all cases when the
immediate is positive and lower than the known minimum number of
elements of a scalable vector.
Vector_splice can be implemented using the SVE instruction EXT.
For instance:
    @llvm.experimental.vector.splice(Vector_1, Vector_2, Imm)
    @llvm.experimental.vector.splice(<A,B,C,D>, <E,F,G,H>, 1) ==> <B, C, D, E>
        EXT  Vector_1, Vector_2, Imm              // Vector_1 = B, C, D + Vector_2 = E
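
The positive-immediate case can be modelled as taking a window out of the
concatenation of the two vectors. This is only a sketch: the helper name is
hypothetical, and it works at element granularity, which is an assumption
made here for readability rather than a description of how the EXT
instruction encodes its immediate.

```python
def ext(vector_1, vector_2, imm):
    """Model of EXT at element granularity.

    Returns len(vector_1) elements of vector_1 ++ vector_2, starting at
    element imm. Only 0 <= imm < len(vector_1) is handled, matching the
    "positive and lower than the known minimum" condition above.
    """
    n = len(vector_1)
    if not 0 <= imm < n:
        raise ValueError("imm must be in [0, number of elements)")
    return (vector_1 + vector_2)[imm:imm + n]


print(ext(list("ABCD"), list("EFGH"), 1))
```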

Depends on D105633

Differential Revision: https://reviews.llvm.org/D106273
The file was modified llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
The file was modified llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
The file was modified llvm/lib/Target/AArch64/SVEInstrFormats.td
The file was modified llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
The file was modified llvm/test/CodeGen/AArch64/named-vector-shuffles-sve.ll
Commit 20b0fa91c9eebc7501e280049b61e8de352f3c94 by Dylan.Fleming
[SVE] Add support for folding for select + masked loads

Add folds to instcombine to support the removal of select instruction when the masked_load is guaranteed to zero the same lanes, i.e. select(mask, mload(,,mask,0), 0) -> mload(,,mask,0).
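
Why the select is redundant can be checked with a small list-based model.
This is an illustrative sketch, not the InstCombine implementation: the
helper names are hypothetical, and memory, masks and vectors are modelled as
Python lists of equal length.

```python
from itertools import product


def masked_load(memory, mask, passthru):
    """Model of a masked load: active lanes read memory, inactive lanes take passthru."""
    return [m if bit else p for m, bit, p in zip(memory, mask, passthru)]


def select(mask, on_true, on_false):
    """Model of a lane-wise select."""
    return [t if bit else f for bit, t, f in zip(mask, on_true, on_false)]


def fold_holds(memory):
    """Check select(mask, mload(mask, 0), 0) == mload(mask, 0) for every mask."""
    zero = [0] * len(memory)
    for mask in product([False, True], repeat=len(memory)):
        loaded = masked_load(memory, mask, zero)
        if select(mask, loaded, zero) != loaded:
            return False
    return True


print(fold_holds([10, 20, 30, 40]))
```

The check passes because the masked load already writes zero to every lane
the select would zero, so the select leaves the loaded value unchanged.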

Patch originally authored by @paulwalker-arm

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D106376
The file was modified llvm/test/Transforms/LoopVectorize/ARM/mve-reductions.ll
The file was modified llvm/include/llvm/IR/PatternMatch.h
The file was modified llvm/test/Transforms/LoopVectorize/ARM/mve-reduction-predselect.ll
The file was added llvm/test/Transforms/InstCombine/select-masked_load.ll
The file was modified llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
Commit d995d63767624a60a5d3276f9f16d7b995435af1 by flo
[VPlan] Use stored value from recipes for interleave groups.

Instead of getting the VPValue for the stored IR values through the
current plan, use the stored value of the recipes directly.

This way, the correct VPValues are used if the store recipes have been
modified in the VPlan and the IR value is not correct any longer. This
can happen, e.g. due to D105008.
The file was modified llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
Commit 46c03668774c27877bd96957931fafae24383e3f by simon.cook
[Inliner] Make the CallPenalty configurable

Tests with multiple benchmarks, like Embench [1], showed that the
CallPenalty magic number has the most influence on inlining decisions
when optimizing for size.

On the other hand, there was no good default value for this parameter.
Some benchmarks profited strongly from a reduced call penalty. One
example is the picojpeg benchmark compiled for RISC-V, which got 6%
smaller with a CallPenalty of 10 instead of 12. Other benchmarks,
like matmult, increased in size.

This commit makes the compromise of turning the magic number constant of
CallPenalty into a configurable value. This introduces the flag
`--inline-call-penalty`. With that flag users can fine tune the inliner
to their needs.

The CallPenalty constant was also used for loops. This commit replaces
the CallPenalty constant with a new LoopPenalty constant that is now
used instead.

This is a slimmed down version of https://reviews.llvm.org/D30899

[1]: https://github.com/embench/embench-iot

Differential Revision: https://reviews.llvm.org/D105976
The file was added llvm/test/Transforms/Inline/inline-call-penalty-option.ll
The file was modified llvm/include/llvm/Analysis/InlineCost.h
The file was modified llvm/lib/Analysis/InlineCost.cpp
Commit 8a8d01d58c14c65d6b1a40bf3335c72f6fcd1388 by paul.walker
[NFC] Change VFShape so it contains an ElementCount rather than separate VF and IsScalable properties.

Differential Revision: https://reviews.llvm.org/D106750
The file was modified llvm/include/llvm/Analysis/VectorUtils.h
The file was modified llvm/unittests/Analysis/VectorUtilsTest.cpp
The file was modified llvm/unittests/Analysis/VectorFunctionABITest.cpp
The file was modified llvm/lib/Analysis/VFABIDemangling.cpp
Commit a053afed49897aa34e08287f91c5255efa4e5131 by a.bataev
[SLP]Fix costs calculations.

Need to fix several cost-related problems. The final type may be defined
incorrectly because it is defined too early (we may end up with the wider
type), the CommonCost should not be redefined in ExtractElements
cost-related calculations, and the shuffle of the final insertelements
vectors should be calculated as the cost of a single-vector permutation
plus the costs of two-vector permutations for the other n-1 incoming vectors.

Differential Revision: https://reviews.llvm.org/D106578
The file was modified llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
The file was modified llvm/test/Transforms/SLPVectorizer/X86/vec_list_bias.ll
The file was modified llvm/test/Transforms/SLPVectorizer/X86/vec_list_bias-inseltpoison.ll
Commit 539437e288f2395288a46a550c4c3070c4b16101 by tpopp
[mlir] split type conversion to two lines for GCC's sake
The file was modified mlir/lib/Transforms/BufferDeallocation.cpp
Commit bf28111ebdb760f46f168c867f7e8453c23814ed by caroline.concatto
[AArch64][SVE] Remove vector_splice from AddedComplexity pattern

The pattern for vector_splice with an index greater than or equal to
zero was misplaced inside the AddedComplexity = 1 block in the AArch64
tablegen file. This patch fixes it by removing the vector_splice pattern
from inside AddedComplexity = 1.
The file was modified llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
Commit d7cb2a07967791867c245a6e2e8e4214d69140f7 by a.bataev
Revert "[SLP]Fix costs calculations."

This reverts commit a053afed49897aa34e08287f91c5255efa4e5131 to fix
buildbots.
The file was modified llvm/test/Transforms/SLPVectorizer/X86/vec_list_bias.ll
The file was modified llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
The file was modified llvm/test/Transforms/SLPVectorizer/X86/vec_list_bias-inseltpoison.ll
Commit e484e1ae03325823c469684d7d1532f2aadbe98d by kerry.mclaughlin
[SVE] Fix casts to <FixedVectorType> in truncateToMinimalBitwidths

Fixes more casts to `<FixedVectorType>` for the cases where the
instruction is an Insert/ExtractElementInst.

For fixed-width, this part of truncateToMinimalBitwidths is tested by
AArch64/type-shrinkage-insertelt.ll. I attempted to write a test case for this part
of truncateToMinimalBitwidths which uses scalable vectors, but was unable to add
one. The tests in type-shrinkage-insertelt.ll rely, for instance, on scalarization to
create extract element instructions, which is not possible for scalable vectors.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D106163
The file was modified llvm/lib/Transforms/Vectorize/LoopVectorize.cpp