Changes

Summary

  1. [ELF][NFC] Do not pass region name to expandMemoryRegion()
  2. tsan: add another fork test
  3. [C++20] [Coroutines] Warn for deprecated form 'for co_await'
  4. Fix nits in clang-tidy's documentation (NFC)
  5. [AArch64] Sink splat shuffles to lane index intrinsics
  6. [clangd] IncludeCleaner: Mark possible expr resolutions as used
  7. Add missing clang-tidy args in index.rst (NFC)
  8. Fix various problems found by fuzzing.
  9. [libc] Remove unused variable
  10. [LV] Pre-commit test for D111846
  11. [MLIR][NFC] Simplex: remove repeated words in comment
  12. [BPI] Look-up tables for non-loop branches. NFC.
  13. [mlir][Vector] Add a vblendps-based impl for transpose8x8 (both intrin and inline_asm)
  14. [mlir][linalg] Fix tile and fuse for outermost reduction.
  15. [mlir] Fix unintentional mutation by VectorType/RankedTensorType::Builder dropDim
  16. [LV] Drop integer poison-generating flags from instructions that need predication
  17. [mlir][linalg] Add a tile and fuse on tensors pattern.
  18. [mlir] Add InitializeNativeTargetAsmParser to ExecutionEngine.
  19. [X86][TTI] Costmodel for AVX512DQ's VPMOVM2[DQ] / VPMOV[DQ]2M instructions
  20. [X86][TTI] Finish costmodel for AVX512BW's VPMOVM2[BW] / VPMOV[BW]2M instructions
  21. [DA][NFC] Update publication - add remarks
  22. [AArch64][ARM] Add missing SVE/SVE2 features from Cortex-A710
  23. [mlir][linalg] Remove tile and fuse test pass (NFC).
  24. Rename MlirExecutionEngine lookup to lookupPacked
  25. [mlir][linalg] Always generate an extract/insert slice pair when tiling output tensors.
  26. [mlir][linalg] Use getAsOpFoldResult in padding (NFC).
Commit a05b694b1e1d742d7702c1774abfaf98f502f04b by ikudrin
[ELF][NFC] Do not pass region name to expandMemoryRegion()

The name can easily be obtained on-site.

Differential Revision: https://reviews.llvm.org/D114228
The file was modified lld/ELF/LinkerScript.cpp
Commit 6a3958247aeeacdbf40833151220b089f066c82f by dvyukov
tsan: add another fork test

Add a fork test that models what happens on Mac
where fork calls malloc/free inside of our atfork
callbacks.

Reviewed By: vitalybuka, yln

Differential Revision: https://reviews.llvm.org/D114250
The file was modified compiler-rt/lib/tsan/rtl/tsan_rtl.cpp
The file was added compiler-rt/test/tsan/Linux/fork_deadlock.cpp
Commit 2ac339ef5f0feca2abe2b8a1720839c58184166c by yedeng.yd
[C++20] [Coroutines] Warn for deprecated form 'for co_await'

The form 'for co_await' is part of the Coroutines TS rather than C++20.
So if we detect the use of 'for co_await' in C++20, we should at least
emit a warning.
The file was modified clang/include/clang/Basic/DiagnosticParseKinds.td
The file was modified clang/include/clang/Basic/DiagnosticGroups.td
The file was modified clang/lib/Parse/ParseStmt.cpp
The file was modified clang/test/SemaCXX/co_await-range-for.cpp
Commit 83484f8472ad7f8ab91b4e944a6f092e8f4d16a8 by mail
Fix nits in clang-tidy's documentation (NFC)

Add commas, articles, and conjunctions where missing.
The file was modified clang-tools-extra/docs/clang-tidy/index.rst
Commit 760d4d03d5d3fc0e0d6e4222f670e5fd068645f2 by david.green
[AArch64] Sink splat shuffles to lane index intrinsics

This teaches AArch64TargetLowering::shouldSinkOperands to sink splat
shuffles to certain neon intrinsics, so that they can make use of the
lane variants of the instructions that are available.

Differential Revision: https://reviews.llvm.org/D112994
The file was modified llvm/test/Transforms/CodeGenPrepare/AArch64/sink-free-instructions.ll
The file was modified llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
The file was modified llvm/test/CodeGen/AArch64/sinksplat.ll
Commit b5f20372a82f72f03d47181b87fb55f62772324f by kbobyrev
[clangd] IncludeCleaner: Mark possible expr resolutions as used

Fixes: https://github.com/clangd/clangd/issues/934

Reviewed By: sammccall

Differential Revision: https://reviews.llvm.org/D114287
The file was modified clang-tools-extra/clangd/unittests/IncludeCleanerTests.cpp
The file was modified clang-tools-extra/clangd/IncludeCleaner.cpp
Commit a82942dd07ea652081f8f293b73801323a4dbbe9 by mail
Add missing clang-tidy args in index.rst (NFC)

The RST docs have gone out of sync with the command-line args that the
clang-tidy program actually supports.
The file was modified clang-tools-extra/docs/clang-tidy/index.rst
Commit 84bf5e328664db2e744c4651c52d2460b1733d09 by klimek
Fix various problems found by fuzzing.

1. IndexTokenSource::getNextToken cannot return nullptr, but some code was
still written assuming it can; make getNextToken more resilient against
incorrect input and fix its call sites.

2. Change various asserts that can fire on user-provided input to
conditionals in the code.
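
The second point can be illustrated with a minimal sketch. It is not the clang-format code: `Token` and `tokenAtOrEof` are hypothetical names, and the only idea carried over from the commit message is that input-dependent conditions become ordinary branches instead of asserts, and that the accessor never hands back a null value.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token type for illustration; not clang-format's FormatToken.
struct Token {
  std::string Text;
};

// An assert(Index < Tokens.size()) here would abort in assert-enabled builds
// and read out of bounds in release builds when malformed input is parsed.
// Handling the case with a conditional keeps both build modes well defined.
inline const Token &tokenAtOrEof(const std::vector<Token> &Tokens,
                                 std::size_t Index) {
  static const Token Eof{"<eof>"}; // sentinel returned instead of a null pointer
  if (Index >= Tokens.size())
    return Eof;
  return Tokens[Index];
}
```

Callers of such an accessor can then drop their own null checks, since an out-of-range request yields the sentinel rather than an invalid value.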
The file was modified clang/lib/Format/UnwrappedLineParser.cpp
The file was modified clang/lib/Format/TokenAnnotator.cpp
The file was modified clang/lib/Format/WhitespaceManager.cpp
The file was modified clang/lib/Format/ContinuationIndenter.cpp
Commit 2f1c037bbdc4a949e83466d6b315002d71c67731 by gchatelet
[libc] Remove unused variable
The file was modified libc/src/__support/str_to_float.h
Commit a7027bb7997184fd1e6d2ba370ebd4f109a6e737 by diegocaballero
[LV] Pre-commit test for D111846

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D112054
The file was added llvm/test/Transforms/LoopVectorize/X86/drop-poison-generating-flags.ll
Commit d92aabc33666e83612c93e7c9c5c454510ba9b07 by arjunpitchanathan
[MLIR][NFC] Simplex: remove repeated words in comment
The file was modified mlir/include/mlir/Analysis/Presburger/Simplex.h
Commit 4d21b64464ac548ec8442bc0d2a7e984ba78bd88 by sjoerd.meijer
[BPI] Look-up tables for non-loop branches. NFC.

This adds and uses look-up tables for non-loop branch probabilities, which
have probabilities directly encoded into the tables for the different condition
codes. Compared to having this logic inlined in different functions, as used
to be the case, I think this is more compact and thus also easier to check/cross
reference. This also adds a test for pointer heuristics that was missing.
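
As a rough sketch of the table-driven approach described above (not the code in BranchProbabilityInfo.cpp), a per-predicate table might look like the following. The `Pred` enum, the `Prob` struct, `PointerTable`, and the weights are all assumptions made for illustration.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical condition codes; LLVM's real predicate enum is larger.
enum class Pred : std::uint8_t { EQ, NE, LT, GE, Count };

// Probability that the branch is taken, stored as numerator/denominator.
struct Prob {
  std::uint32_t Num;
  std::uint32_t Den;
};

// Illustrative weights only; not claimed to match BranchProbabilityInfo.
constexpr std::array<Prob, static_cast<std::size_t>(Pred::Count)> PointerTable{{
    {3, 8}, // EQ: pointer equality assumed unlikely
    {5, 8}, // NE: pointer inequality assumed likely
    {1, 2}, // LT: no preference
    {1, 2}, // GE: no preference
}};

// One indexed read replaces a chain of per-predicate branches.
constexpr Prob takenProbability(Pred P) {
  return PointerTable[static_cast<std::size_t>(P)];
}
```

Keeping every condition code in one table makes it easy to see at a glance that complementary predicates (here EQ and NE) carry complementary probabilities.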

Differential Revision: https://reviews.llvm.org/D114009
The file was added llvm/test/Analysis/BranchProbabilityInfo/pointer_heuristics.ll
The file was modified llvm/lib/Analysis/BranchProbabilityInfo.cpp
Commit a9e236bed835c58be381dadb973a1db0681e4795 by nicolas.vasilache
[mlir][Vector] Add a vblendps-based impl for transpose8x8 (both intrin and inline_asm)

This revision follows up on the conversation titled:

```[llvm-dev] Understanding and controlling some of the AVX shuffle emission paths```

The revision adds a vblendps-based implementation for transpose8x8 and further distinguishes between an intrinsics and an inline_asm implementation.

This results in roughly 20% fewer cycles as reported by llvm-mca:

After this revision (intrinsic version, resolves to virtually identical assembly as per the llvm-dev discussion, no vblendps instruction is emitted):
```
Iterations:        100
Instructions:      5900
Total Cycles:      2415
Total uOps:        7300

Dispatch Width:    6
uOps Per Cycle:    3.02
IPC:               2.44
Block RThroughput: 24.0

Cycles with backend pressure increase [ 89.90% ]
Throughput Bottlenecks:
  Resource Pressure       [ 89.65% ]
  - SKXPort1  [ 0.04% ]
  - SKXPort2  [ 12.42% ]
  - SKXPort3  [ 12.42% ]
  - SKXPort5  [ 89.52% ]
  Data Dependencies:      [ 37.06% ]
  - Register Dependencies [ 37.06% ]
  - Memory Dependencies   [ 0.00% ]
```

After this revision (inline_asm version, vblendps instructions are indeed emitted):
```
Iterations:        100
Instructions:      6300
Total Cycles:      2015
Total uOps:        7700

Dispatch Width:    6
uOps Per Cycle:    3.82
IPC:               3.13
Block RThroughput: 20.0

Cycles with backend pressure increase [ 83.47% ]
Throughput Bottlenecks:
  Resource Pressure       [ 83.18% ]
  - SKXPort0  [ 14.49% ]
  - SKXPort1  [ 14.54% ]
  - SKXPort2  [ 19.70% ]
  - SKXPort3  [ 19.70% ]
  - SKXPort5  [ 83.03% ]
  - SKXPort6  [ 14.49% ]
  Data Dependencies:      [ 39.75% ]
  - Register Dependencies [ 39.75% ]
  - Memory Dependencies   [ 0.00% ]
```
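
As a quick check on the two reports above, the following sketch computes the relative change in Total Cycles. Only the two cycle counts come from the reports; the helper names are hypothetical.

```cpp
// Total Cycles copied from the two llvm-mca reports above.
constexpr double IntrinsicCycles = 2415.0;
constexpr double InlineAsmCycles = 2015.0;

// Fraction of cycles saved, relative to the slower variant.
constexpr double cycleReduction(double Before, double After) {
  return (Before - After) / Before;
}

// How many times faster the quicker variant is.
constexpr double speedup(double Before, double After) {
  return Before / After;
}
```

With these inputs, `cycleReduction(IntrinsicCycles, InlineAsmCycles)` is about 0.166 (16.6% fewer cycles) and `speedup(IntrinsicCycles, InlineAsmCycles)` is about 1.2.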

An accessible copy of the conversation is available [here](https://gist.github.com/nicolasvasilache/68c7f34012584b0e00f335bcb374ede0).

Reviewed By: ftynse, dcaballe

Differential Revision: https://reviews.llvm.org/D114335
The file was modified mlir/lib/Dialect/X86Vector/Transforms/AVXTranspose.cpp
The file was modified mlir/test/lib/Dialect/Vector/CMakeLists.txt
The file was modified mlir/include/mlir/Dialect/X86Vector/Transforms.h
The file was added mlir/test/Integration/Dialect/LLVMIR/CPU/X86/test-inline-asm-vector.mlir
The file was modified utils/bazel/llvm-project-overlay/mlir/test/BUILD.bazel
The file was modified mlir/test/Dialect/Vector/vector-transpose-lowering.mlir
The file was modified mlir/test/lib/Dialect/Vector/TestVectorTransforms.cpp
Commit 0ccc44cec067abbc702d5d3afb44e0395c55820d by gysit
[mlir][linalg] Fix tile and fuse for outermost reduction.

Tile and fuse failed if the outermost tile loop is a reduction dimension. Add the necessary check to handle outermost reductions and introduce a test case to verify the change.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D114012
The file was modified mlir/test/Dialect/Linalg/tile-and-fuse-on-tensors.mlir
The file was modified mlir/include/mlir/Dialect/Linalg/Utils/Utils.h
The file was modified mlir/lib/Dialect/Linalg/Transforms/FusionOnTensors.cpp
Commit 789c88e80e878ed866a2d8cfe29c7fd36082274c by nicolas.vasilache
[mlir] Fix unintentional mutation by VectorType/RankedTensorType::Builder dropDim

Differential Revision: https://reviews.llvm.org/D113933
The file was modified mlir/lib/Dialect/Linalg/Transforms/Transforms.cpp
The file was modified mlir/include/mlir/IR/BuiltinTypes.h
The file was modified mlir/lib/Dialect/Vector/VectorTransforms.cpp
Commit 4348cd42c385e71b63e5da7e492172cff6a79d7b by diegocaballero
[LV] Drop integer poison-generating flags from instructions that need predication

This patch fixes PR52111. The problem is that LV propagates poison-generating flags (`nuw`/`nsw`, `exact`
and `inbounds`) in instructions that contribute to the address computation of widened loads/stores that are
guarded by a condition. When the code is vectorized and the control flow within the loop
is linearized, these flags may generate a poison value that is effectively used as the base address
of the widened load/store. The fix drops all the integer poison-generating flags from instructions that
contribute to the address computation of a widened load/store whose original instruction was in a basic block
that needed predication and is not predicated after vectorization.
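
The effect can be shown with a toy model, assuming a scalar loop body of the form `if (i != 0) use(a[i - 1])` where the index is computed with `sub nuw`. This is not LLVM code: `std::optional` stands in for poison, and all function names are hypothetical.

```cpp
#include <cstdint>
#include <optional>

// Models `sub nuw`: the result is poison (std::nullopt here) on unsigned wrap.
inline std::optional<std::uint32_t> subNuw(std::uint32_t A, std::uint32_t B) {
  if (B > A)
    return std::nullopt;
  return static_cast<std::uint32_t>(A - B);
}

// Models plain `sub`: wraps modulo 2^32 and is never poison.
inline std::optional<std::uint32_t> subWrap(std::uint32_t A, std::uint32_t B) {
  return static_cast<std::uint32_t>(A - B);
}

// In the scalar loop the guard `i != 0` keeps `i - 1` from executing for
// i == 0. Once control flow is linearized, the index of the first lane is
// computed unconditionally and feeds the base address of the widened access.
inline std::optional<std::uint32_t>
baseIndexAfterLinearization(std::uint32_t FirstLane, bool KeepNuw) {
  return KeepNuw ? subNuw(FirstLane, 1) : subWrap(FirstLane, 1);
}
```

In this model, `baseIndexAfterLinearization(0, true)` is poison, while `baseIndexAfterLinearization(0, false)` is the well-defined wrapped value 4294967295, which corresponds to dropping the flag as the fix does.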

Reviewed By: fhahn, spatel, nlopes

Differential Revision: https://reviews.llvm.org/D111846
The file was modified llvm/test/Transforms/LoopVectorize/X86/gather_scatter.ll
The file was modified llvm/test/Transforms/LoopVectorize/single-value-blend-phis.ll
The file was modified llvm/test/Transforms/LoopVectorize/AArch64/sve-vector-reverse-mask4.ll
The file was modified llvm/test/Transforms/LoopVectorize/X86/invariant-store-vectorization.ll
The file was modified llvm/test/Transforms/LoopVectorize/X86/masked_load_store.ll
The file was modified llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
The file was modified llvm/test/Transforms/LoopVectorize/AArch64/vector-reverse-mask4.ll
The file was modified llvm/test/Transforms/PhaseOrdering/AArch64/hoisting-sinking-required-for-vectorization.ll
The file was modified llvm/test/Transforms/LoopVectorize/AArch64/sve-masked-loadstore.ll
The file was modified llvm/test/Transforms/LoopVectorize/X86/load-deref-pred.ll
The file was modified llvm/lib/Transforms/Vectorize/VPlan.h
The file was modified llvm/test/Transforms/LoopVectorize/X86/x86-pr39099.ll
The file was modified llvm/test/Transforms/LoopVectorize/X86/drop-poison-generating-flags.ll
The file was modified llvm/test/Transforms/LoopVectorize/X86/x86-interleaved-accesses-masked-group.ll
The file was modified llvm/test/Transforms/LoopVectorize/X86/x86-interleaved-store-accesses-with-gaps.ll
Commit e3d386ea27336edc04ae4fd324ab4337b9f3cf16 by gysit
[mlir][linalg] Add a tile and fuse on tensors pattern.

Add a pattern to apply the new tile and fuse on tensors method. Integrate the pattern into the CodegenStrategy and use the CodegenStrategy to implement the tests.

Depends On D114012

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D114067
The file was modified mlir/test/lib/Dialect/Linalg/TestLinalgCodegenStrategy.cpp
The file was modified mlir/lib/Dialect/Linalg/Transforms/Transforms.cpp
The file was modified mlir/lib/Dialect/Linalg/Transforms/LinalgStrategyPasses.cpp
The file was modified mlir/include/mlir/Dialect/Linalg/Transforms/CodegenStrategy.h
The file was modified mlir/include/mlir/Dialect/Linalg/Utils/Utils.h
The file was modified mlir/test/Dialect/Linalg/tile-and-fuse-sequence-on-tensors.mlir
The file was modified mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h
The file was modified mlir/test/Dialect/Linalg/tile-and-fuse-on-tensors.mlir
The file was modified mlir/lib/Dialect/Linalg/Transforms/FusionOnTensors.cpp
The file was modified mlir/include/mlir/Dialect/Linalg/Passes.td
The file was modified mlir/include/mlir/Dialect/Linalg/Passes.h
Commit 050cc1cd6e6882eadba6e5ea7b588ca0b8aa1b12 by nicolas.vasilache
[mlir] Add InitializeNativeTargetAsmParser to ExecutionEngine.

This is required to allow Python to work with lowerings that use inline_asm.

Differential Revision: https://reviews.llvm.org/D114338
The file was modified utils/bazel/llvm-project-overlay/mlir/BUILD.bazel
The file was modified mlir/lib/CAPI/ExecutionEngine/ExecutionEngine.cpp
The file was modified mlir/lib/ExecutionEngine/CMakeLists.txt
Commit 8d09dd61c381b9c037da0c172b7b4592d9503d2c by lebedev.ri
[X86][TTI] Costmodel for AVX512DQ's VPMOVM2[DQ] / VPMOV[DQ]2M instructions

Much like the VPMOVM2[BW] / VPMOV[BW]2M from AVX512BW,
these either sign-extend the mask register into a vector,
or pack the mask from a vector register.

Apparently, we didn't even have MCA tests for these,
added in rG2f364f6f0d3a2420ca78cbd80abb186657180e05,
so I'm just guessing that their perf characteristics
are optimal.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D114314
The file was modified llvm/test/Analysis/CostModel/X86/min-legal-vector-width.ll
The file was modified llvm/test/Analysis/CostModel/X86/trunc.ll
The file was modified llvm/test/Analysis/CostModel/X86/extend.ll
The file was modified llvm/lib/Target/X86/X86TargetTransformInfo.cpp
Commit 704d92607d26e696daba596b72cb70effe79a872 by lebedev.ri
[X86][TTI] Finish costmodel for AVX512BW's VPMOVM2[BW] / VPMOV[BW]2M instructions

Apparently my methodology was suboptimal, and not only did I miss all the +VL tuples,
I also missed some plain tuples. I believe this adds everything missing.
Indeed, these manual costmodels are just not okay long-term.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D114334
The file was modified llvm/test/Analysis/CostModel/X86/shuffle-replication-i1.ll
The file was modified llvm/test/Analysis/CostModel/X86/min-legal-vector-width.ll
The file was modified llvm/lib/Target/X86/X86TargetTransformInfo.cpp
The file was modified llvm/test/Analysis/CostModel/X86/trunc.ll
The file was modified llvm/test/Analysis/CostModel/X86/extend.ll
Commit 56db1c072c92be36fb1d76aa30487ad62dc58ea8 by simon.moll
[DA][NFC] Update publication - add remarks

Update the reference publication for the SyncDependenceAnalysis and DivergenceAnalysis. Fix phrasing and formatting. Add comments on the reducible loop limitation.

Reviewed By: sameerds

Differential Revision: https://reviews.llvm.org/D114146
The file was modified llvm/lib/Analysis/SyncDependenceAnalysis.cpp
The file was modified llvm/lib/Analysis/DivergenceAnalysis.cpp
Commit 955c72c35caf68fe4e2f026da67c6fdcd31d01ad by bradley.smith
[AArch64][ARM] Add missing SVE/SVE2 features from Cortex-A710

Differential Revision: https://reviews.llvm.org/D114169
The file was modified llvm/include/llvm/Support/AArch64TargetParser.def
The file was modified llvm/unittests/Support/TargetParserTest.cpp
Commit f7751a3a4218229c59adced4964831f7a57d256d by gysit
[mlir][linalg] Remove tile and fuse test pass (NFC).

Remove the tile and fuse test pass that has been replaced by codegen strategy.

Depends On D114067

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D114068
The file was modified mlir/lib/Dialect/Linalg/Transforms/FusionOnTensors.cpp
The file was modified mlir/include/mlir/Dialect/Linalg/Passes.td
The file was modified mlir/include/mlir/Dialect/Linalg/Passes.h
Commit 106f3074996c69ab732c6371d5ad6b25fcfd4fa5 by tpopp
Rename MlirExecutionEngine lookup to lookupPacked

The purpose of the change is to make clear whether the user is
retrieving the original function or the wrapper function, in line with
the invoke commands. This new functionality is useful for users that
have already defined their own packed interface, so they do not want the
extra layer of indirection, or for users wanting to look at the
resulting primary function rather than the wrapper function.

All locations except the Python bindings now have a `lookupPacked`
method that matches the original `lookup` functionality. `lookup`
still exists, but with new semantics.

- `lookup` returns the function with a given name. If `bool f(int,int)`
is compiled, `lookup` will return a reference to `bool(*f)(int,int)`.
- `lookupPacked` returns the packed wrapper of the function with the
given name. If `bool f(int,int)` is compiled, `lookupPacked` will return
`void(*mlir_f)(void**)`.
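
The difference between the two signatures can be sketched without MLIR at all. In the following stand-alone example, `f` plays the primary function and `mlir_f` is a hand-written stand-in for the packed wrapper. The argument layout (one pointer per argument, followed by a pointer to the result storage) and the helper `callPacked` are assumptions made for illustration, not taken from the commit.

```cpp
// The "primary" function, i.e. what `lookup` refers to under the new semantics.
inline bool f(int A, int B) { return A < B; }

// Hand-written stand-in for the packed wrapper. Assumed layout: Args[0] and
// Args[1] point to the arguments, Args[2] points to the result storage.
inline void mlir_f(void **Args) {
  int A = *static_cast<int *>(Args[0]);
  int B = *static_cast<int *>(Args[1]);
  *static_cast<bool *>(Args[2]) = f(A, B);
}

// Hypothetical helper that packs arguments and calls through the uniform
// `void(*)(void**)` signature.
inline bool callPacked(void (*Packed)(void **), int A, int B) {
  bool Result = false;
  void *Args[] = {&A, &B, &Result};
  Packed(Args);
  return Result;
}
```

A caller that already knows the real signature can call `f(1, 2)` directly, while a generic caller that only knows the uniform signature goes through `callPacked(mlir_f, 1, 2)`; both yield the same result.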

Differential Revision: https://reviews.llvm.org/D114352
The file was modified mlir/lib/CAPI/ExecutionEngine/ExecutionEngine.cpp
The file was modified mlir/lib/Bindings/Python/ExecutionEngineModule.cpp
The file was modified mlir/lib/ExecutionEngine/ExecutionEngine.cpp
The file was modified mlir/lib/ExecutionEngine/JitRunner.cpp
The file was modified mlir/include/mlir/ExecutionEngine/ExecutionEngine.h
The file was modified mlir/include/mlir-c/ExecutionEngine.h
Commit 32c43241e716280d3443d684416826b1e7e5781b by gysit
[mlir][linalg] Always generate an extract/insert slice pair when tiling output tensors.

Adapt tiling to always generate an extract/insert slice pair for output tensors even if the tensor is not tiled. Having an explicit extract/insert slice pair simplifies follow-up transformations such as padding and bufferization. In particular, it makes read and written iteration argument slices explicit.

Depends On D114067

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D114085
The file was modified mlir/test/Dialect/Linalg/tile-and-fuse-on-tensors.mlir
The file was modified mlir/test/Dialect/Linalg/fusion-tensor-pattern.mlir
The file was modified mlir/lib/Dialect/Linalg/Utils/Utils.cpp
Commit 247a1a55eb6a58199006565d594c6f6c6b58b736 by gysit
[mlir][linalg] Use getAsOpFoldResult in padding (NFC).

After padding, we introduce an ExtractSliceOp to get the final unpadded result. This revision uses getAsOpFoldResult to compute the size of the unpadded result, which guarantees the result type has a partially static shape if some of the sizes of the unpadded result are statically known. At the moment, we rely on canonicalization to clean up the types after padding.

Depends On D114085

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D114153
The file was modified mlir/lib/Dialect/Linalg/Transforms/Transforms.cpp