Success

Changes

Summary

  1. [X86] Split FeatureFastVariableShuffle tuning into Lane-Crossing and Per-Lane variants
  2. [X86] AMD Zen 3 has fast variable per-lane shuffles
  3. [mlir][linalg] Cleanup LinalgOp usage in vectorization (NFC).
  4. [clangd] Fix -Wunused-variable warning (NFC)
  5. [mlir][linalg] Cleanup LinalgOp usage in tiling (NFC).
  6. [mlir][linalg] Cleanup LinalgOp usage in fusion (NFC).
  7. [mlir][linalg] Cleanup LinalgOp usage in dependence analysis (NFC).
  8. Mark test as requiring asserts.
  9. [VectorCombine] Add tests with multiple noundef indices for scalarization.
  10. [WebAssembly][CodeGen] IR support for WebAssembly local variables
  11. [RISCV] Support vector types in combination with fastcc
  12. [VectorCombine] Freeze index unless it is known to be non-poison.
Commit cf9b1f7a0e9da5d019a8bea853f3cff85d808d18 by lebedev.ri
[X86] Split FeatureFastVariableShuffle tuning into Lane-Crossing and Per-Lane variants

Currently, the X86 backend has only a single, one-size-fits-all `FeatureFastVariableShuffle` feature,
which controls the profitability of both cross-lane and per-lane variable shuffles.
I guess this has been fine so far.

But at least on AMD Zen 3, per-lane variable shuffles (e.g. `VPSHUFB`)
are as fast as shuffles with a fixed/immediate mask,
while lane-crossing shuffles (e.g. `VPERMPS`) perform worse.

So, to get the benefits of variable-mask shuffles without the drawbacks of lane-crossing shuffles,
split the feature flag into two, as suggested by @RKSimon.
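
The effect of the split can be illustrated with a minimal, self-contained C++ sketch. It is not the code from this commit: the struct, enum, and function names below are hypothetical, and the model only captures the idea that the profitability query is now answered per shuffle kind rather than by one global flag.

```cpp
// Hypothetical, simplified model of the two tuning flags. The names are
// illustrative assumptions and are not LLVM's actual API.
struct ShuffleTuning {
  bool FastVariableCrossLaneShuffle; // lane-crossing shuffles, e.g. VPERMPS
  bool FastVariablePerLaneShuffle;   // per-lane shuffles, e.g. VPSHUFB
};

enum class ShuffleKind { PerLane, CrossLane };

// Returns true when a variable-mask shuffle of the given kind is treated as
// profitable under the given tuning.
bool preferVariableMask(const ShuffleTuning &T, ShuffleKind K) {
  return K == ShuffleKind::PerLane ? T.FastVariablePerLaneShuffle
                                   : T.FastVariableCrossLaneShuffle;
}

// A Zen 3-like tuning as described in the commit message: per-lane variable
// shuffles are fast, lane-crossing ones are not.
ShuffleTuning zen3LikeTuning() {
  return ShuffleTuning{/*CrossLane=*/false, /*PerLane=*/true};
}
```

With a single flag, a Zen 3-like target would have to opt in or out of both kinds at once; with two flags it can opt in to the per-lane kind only.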

Differential Revision: https://reviews.llvm.org/D103274
The file was modified llvm/test/CodeGen/X86/vector-shuffle-128-v8.ll
The file was modified llvm/test/CodeGen/X86/uadd_sat_vec.ll
The file was modified llvm/test/CodeGen/X86/shuffle-strided-with-offset-128.ll
The file was modified llvm/test/CodeGen/X86/vector-shuffle-v1.ll
The file was modified llvm/lib/Target/X86/X86.td
The file was modified llvm/test/CodeGen/X86/vec_saddo.ll
The file was modified llvm/lib/Target/X86/X86TargetTransformInfo.h
The file was modified llvm/test/CodeGen/X86/psubus.ll
The file was modified llvm/test/CodeGen/X86/shuffle-vs-trunc-256.ll
The file was modified llvm/test/CodeGen/X86/bitcast-int-to-vector-bool-sext.ll
The file was modified llvm/test/CodeGen/X86/vec_uaddo.ll
The file was modified llvm/test/CodeGen/X86/avx2-conversions.ll
The file was modified llvm/test/CodeGen/X86/avx512-shuffles/partial_permute.ll
The file was modified llvm/test/CodeGen/X86/broadcastm-lowering.ll
The file was modified llvm/test/CodeGen/X86/sadd_sat_vec.ll
The file was modified llvm/test/CodeGen/X86/shuffle-vs-trunc-512.ll
The file was modified llvm/test/CodeGen/X86/oddsubvector.ll
The file was modified llvm/test/CodeGen/X86/ssub_sat_vec.ll
The file was modified llvm/test/CodeGen/X86/shuffle-strided-with-offset-256.ll
The file was modified llvm/test/CodeGen/X86/vector-shuffle-256-v4.ll
The file was modified llvm/test/CodeGen/X86/vector-trunc-packus.ll
The file was modified llvm/test/CodeGen/X86/usub_sat_vec.ll
The file was modified llvm/lib/Target/X86/X86ISelLowering.cpp
The file was modified llvm/test/CodeGen/X86/combine-sra.ll
The file was modified llvm/test/CodeGen/X86/vector-interleaved-load-i16-stride-6.ll
The file was modified llvm/test/CodeGen/X86/vector-shuffle-128-v4.ll
The file was modified llvm/test/CodeGen/X86/avx512-extract-subvector-load-store.ll
The file was modified llvm/test/CodeGen/X86/vector-shuffle-128-v16.ll
The file was modified llvm/test/CodeGen/X86/avx2-vector-shifts.ll
The file was modified llvm/test/CodeGen/X86/vector-interleaved-store-i16-stride-2.ll
The file was modified llvm/test/CodeGen/X86/vector-interleaved-store-i16-stride-6.ll
The file was modified llvm/test/CodeGen/X86/vector-shuffle-256-v16.ll
The file was modified llvm/test/CodeGen/X86/vector-interleaved-load-i16-stride-3.ll
The file was modified llvm/test/CodeGen/X86/vector-shuffle-256-v32.ll
The file was modified llvm/lib/Target/X86/X86Subtarget.h
The file was modified llvm/test/CodeGen/X86/shuffle-strided-with-offset-512.ll
The file was modified llvm/test/CodeGen/X86/paddus.ll
The file was modified llvm/test/CodeGen/X86/bitcast-int-to-vector-bool-zext.ll
The file was modified llvm/test/CodeGen/X86/phaddsub.ll
The file was modified llvm/test/CodeGen/X86/vector-interleaved-load-i16-stride-4.ll
The file was modified llvm/test/CodeGen/X86/vector-shuffle-combining.ll
The file was modified llvm/test/CodeGen/X86/vector-interleaved-load-i16-stride-2.ll
The file was modified llvm/test/CodeGen/X86/vector-half-conversions.ll
The file was modified llvm/test/CodeGen/X86/vector-shuffle-256-v8.ll
The file was modified llvm/test/CodeGen/X86/vector-interleaved-store-i16-stride-4.ll
The file was modified llvm/test/CodeGen/X86/vector-trunc.ll
The file was modified llvm/test/CodeGen/X86/shuffle-of-splat-multiuses.ll
The file was modified llvm/test/CodeGen/X86/vec_usubo.ll
The file was modified llvm/test/CodeGen/X86/vec_umulo.ll
The file was modified llvm/test/CodeGen/X86/combine-shl.ll
The file was modified llvm/test/CodeGen/X86/vec_smulo.ll
The file was modified llvm/test/CodeGen/X86/vec_ssubo.ll
The file was modified llvm/test/CodeGen/X86/vector-interleaved-load-i16-stride-5.ll
The file was modified llvm/test/CodeGen/X86/vector-trunc-ssat.ll
The file was modified llvm/test/CodeGen/X86/insertelement-zero.ll
The file was modified llvm/test/CodeGen/X86/combine-srl.ll
The file was modified llvm/test/CodeGen/X86/oddshuffles.ll
The file was modified llvm/test/CodeGen/X86/vector-shuffle-128-unpck.ll
The file was modified llvm/test/CodeGen/X86/vector-interleaved-store-i16-stride-5.ll
The file was modified llvm/test/CodeGen/X86/vector-zext.ll
The file was modified llvm/test/CodeGen/X86/vector-trunc-usat.ll
The file was modified llvm/test/CodeGen/X86/avx512-shuffles/broadcast-vector-int.ll
The file was modified llvm/test/CodeGen/X86/vector-interleaved-store-i16-stride-3.ll
The file was modified llvm/test/CodeGen/X86/vector-trunc-math.ll
The file was modified llvm/test/CodeGen/X86/avx512-trunc.ll
The file was modified llvm/test/CodeGen/X86/shuffle-vs-trunc-128.ll
Commit a3b8695bf5927f0a43d295dfdfeafeef4da022ea by lebedev.ri
[X86] AMD Zen 3 has fast variable per-lane shuffles

... but lane-crossing shuffles are slow.
The file was modified llvm/lib/Target/X86/X86.td
Commit 912ebf60b15123827299df73a7c9136f6693b487 by gysit
[mlir][linalg] Cleanup LinalgOp usage in vectorization (NFC).

Replace the uses of deprecated Structured Op Interface methods in Vectorization.cpp. This patch is based on https://reviews.llvm.org/D103394.

Differential Revision: https://reviews.llvm.org/D103410
The file was modified mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
Commit 5b747197f8fb83bb7c256fa6cb2010445deb0a85 by nullptr.cpp
[clangd] Fix -Wunused-variable warning (NFC)

GCC warning:
```
/llvm-project/clang-tools-extra/clangd/InlayHints.cpp: In member function ‘bool clang::clangd::InlayHintVisitor::VisitVarDecl(clang::VarDecl*)’:
/llvm-project/clang-tools-extra/clangd/InlayHints.cpp:81:15: warning: unused variable ‘AT’ [-Wunused-variable]
   81 |     if (auto *AT = D->getType()->getContainedAutoType()) {
      |               ^~

```
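
The pattern behind this warning can be reproduced in a small, self-contained C++ sketch. The types below are hypothetical stand-ins for the clang AST classes, and the fix shown (testing the pointer without binding a name) is one common way to silence the warning; it is an assumption, not necessarily the exact change made in this commit.

```cpp
// Hypothetical stand-ins for the clang AST types involved.
struct AutoType {};
struct Type {
  const AutoType *Contained = nullptr;
  const AutoType *getContainedAutoType() const { return Contained; }
};

// Before (warns, because the body never reads AT):
//   if (auto *AT = T.getContainedAutoType()) { ++Hints; }
// After: test the returned pointer directly without naming it.
int countAutoHints(const Type &T) {
  int Hints = 0;
  if (T.getContainedAutoType())
    ++Hints;
  return Hints;
}
```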
The file was modified clang-tools-extra/clangd/InlayHints.cpp
Commit c2e5226a851413464356163fc028f23653dad4cd by gysit
[mlir][linalg] Cleanup LinalgOp usage in tiling (NFC).

Replace the uses of deprecated Structured Op Interface methods in Tiling.cpp and Utils.cpp. This patch is based on https://reviews.llvm.org/D103394.

Differential Revision: https://reviews.llvm.org/D103438
The file was modified mlir/lib/Dialect/Linalg/Utils/Utils.cpp
The file was modified mlir/lib/Dialect/Linalg/Transforms/Tiling.cpp
Commit 7594f5028a11c68bcfdf631928ab44889127fab7 by gysit
[mlir][linalg] Cleanup LinalgOp usage in fusion (NFC).

Replace the uses of deprecated Structured Op Interface methods in Fusion.cpp. This patch is based on https://reviews.llvm.org/D103394.

Differential Revision: https://reviews.llvm.org/D103437
The file was modified mlir/lib/Dialect/Linalg/Transforms/Fusion.cpp
Commit 94643fda1346e8dab30243d02250cd44683445f2 by gysit
[mlir][linalg] Cleanup LinalgOp usage in dependence analysis (NFC).

Replace the uses of deprecated Structured Op Interface methods in DependenceAnalysis.cpp and DependenceAnalysis.h. This patch is based on https://reviews.llvm.org/D103394.

Differential Revision: https://reviews.llvm.org/D103411
The file was modified mlir/lib/Dialect/Linalg/Analysis/DependenceAnalysis.cpp
The file was modified mlir/include/mlir/Dialect/Linalg/Analysis/DependenceAnalysis.h
Commit 18225d45769b8c86c8291505de90ea539ae4f445 by douglas.yung
Mark test as requiring asserts.
The file was modified llvm/test/Transforms/LoopVectorize/RISCV/riscv-interleaved.ll
Commit f000c4cfb66ce5f86394920db35397d618c0855a by flo
[VectorCombine] Add tests with multiple noundef indices for scalarization.
The file was modified llvm/test/Transforms/VectorCombine/AArch64/load-extractelement-scalarization.ll
Commit 82f92e35c6464e23859c29422956caaceb623967 by wingo
[WebAssembly][CodeGen] IR support for WebAssembly local variables

This patch adds TargetStackID::WasmLocal.  This stack holds locations of
values that are only addressable by name -- not via a pointer to memory.
For the WebAssembly target, these objects are lowered to WebAssembly
local variables, which are managed by the WebAssembly run-time and are
not addressable by linear memory.

For the WebAssembly target, IR indicates that an AllocaInst should be put
on TargetStackID::WasmLocal by putting it in the non-integral address
space WASM_ADDRESS_SPACE_WASM_VAR, with value 1.  SROA will mostly lift
these allocations to SSA locals, but any alloca that reaches instruction
selection (usually in non-optimized builds) will be assigned the new
TargetStackID there.  Loads and stores to those values are transformed
to new WebAssemblyISD::LOCAL_GET / WebAssemblyISD::LOCAL_SET nodes,
which then lower to the type-specific LOCAL_GET_I32 etc. instructions via
tablegen patterns.
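
The mapping described above can be summarized in a minimal, self-contained C++ sketch. This is a toy model, not LLVM code: the enum, struct, and function names are hypothetical, and only the address-space value 1 is taken from the commit message.

```cpp
// Hypothetical model of the stack IDs; mirrors the idea of
// TargetStackID::WasmLocal but is not the LLVM enum.
enum class StackID { Default, WasmLocal };

// Address-space value taken from the commit message above.
constexpr unsigned WASM_ADDRESS_SPACE_WASM_VAR = 1;

// Hypothetical stand-in for an alloca that reached instruction selection.
struct Alloca {
  unsigned AddressSpace;
};

// An alloca in the WebAssembly-variable address space is assigned the
// WasmLocal stack ID; every other alloca stays on the default stack.
StackID stackIDFor(const Alloca &A) {
  return A.AddressSpace == WASM_ADDRESS_SPACE_WASM_VAR ? StackID::WasmLocal
                                                       : StackID::Default;
}

enum class AccessLowering { MemoryLoadStore, LocalGetSet };

// Loads and stores to WasmLocal objects become local.get / local.set style
// nodes instead of linear-memory accesses.
AccessLowering loweringFor(StackID ID) {
  return ID == StackID::WasmLocal ? AccessLowering::LocalGetSet
                                  : AccessLowering::MemoryLoadStore;
}
```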

Differential Revision: https://reviews.llvm.org/D101140
The file was modified llvm/include/llvm/CodeGen/MIRYamlMapping.h
The file was modified llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
The file was modified llvm/lib/Target/WebAssembly/WebAssemblyISelDAGToDAG.cpp
The file was modified llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
The file was modified llvm/include/llvm/CodeGen/TargetFrameLowering.h
The file was modified llvm/lib/Target/WebAssembly/WebAssemblyFrameLowering.cpp
The file was modified llvm/lib/Target/WebAssembly/WebAssemblyExplicitLocals.cpp
The file was modified llvm/lib/Target/WebAssembly/WebAssemblyISD.def
The file was modified llvm/lib/Target/WebAssembly/WebAssemblyFrameLowering.h
The file was modified llvm/lib/Target/WebAssembly/WebAssemblyInstrInfo.td
The file was modified llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
The file was added llvm/test/CodeGen/WebAssembly/ir-locals.ll
The file was added llvm/test/CodeGen/WebAssembly/ir-locals-stackid.ll
Commit 4f500c402b7357808d9595313438f223447dcace by fraser
[RISCV] Support vector types in combination with fastcc

This patch extends the RISC-V lowering of the 'fastcc' calling
convention to vector types, both fixed-length and scalable. Without this
patch, any function passing or returning vector types by value would
throw a compiler error.

Vectors are handled in 'fastcc' much as they are in the default calling
convention, the noticeable difference being the extended set of scalar
GPR registers that can be used to pass vectors indirectly.

Reviewed By: HsiangKai

Differential Revision: https://reviews.llvm.org/D102505
The file was added llvm/test/CodeGen/RISCV/rvv/calling-conv-fastcc.ll
The file was modified llvm/lib/Target/RISCV/RISCVISelLowering.h
The file was added llvm/test/CodeGen/RISCV/rvv/fixed-vectors-calling-conv-fastcc.ll
The file was modified llvm/lib/Target/RISCV/RISCVISelLowering.cpp
Commit d4c070d801413186c5a59cede9d721e9ca099708 by flo
[VectorCombine] Freeze index unless it is known to be non-poison.

If the index itself is already poison, the poison propagates through
the instructions clamping the index to a valid range. This still
introduces a load of poison, as flagged by Alive2 and pointed out
at 575e2aff5574.

This patch updates the code to freeze the index, unless it is proven to
not be poison.
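
The reasoning can be modeled with a small, self-contained C++ sketch. It is a toy model under stated assumptions, not the VectorCombine implementation: poison is represented by `std::nullopt`, the clamp is modeled as a remainder, freeze is modeled as picking the fixed value 0, and all function names are hypothetical.

```cpp
#include <optional>

// std::nullopt models a poison index; any other value is a concrete index.
using Index = std::optional<unsigned>;

// Clamping a poison index yields poison again: the clamp alone does not help.
Index clampIndex(Index Idx, unsigned NumElts) {
  if (!Idx)
    return std::nullopt;
  return *Idx % NumElts;
}

// Freeze turns poison into some arbitrary but fixed value (0 in this model)
// and leaves a non-poison value unchanged.
Index freezeIndex(Index Idx) { return Idx ? Idx : Index(0u); }

// Freeze the index first unless it is already known not to be poison, then
// clamp it to the valid range.
Index safeIndex(Index Idx, unsigned NumElts, bool KnownNonPoison) {
  return clampIndex(KnownNonPoison ? Idx : freezeIndex(Idx), NumElts);
}
```

In this model, `clampIndex` on a poison index stays poison, while `safeIndex` always produces a concrete in-range index when the freeze is applied.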

Reviewed By: nlopes

Differential Revision: https://reviews.llvm.org/D103378
The file was modified llvm/lib/Transforms/Vectorize/VectorCombine.cpp
The file was modified llvm/test/Transforms/VectorCombine/AArch64/load-extractelement-scalarization.ll