Changes

Summary

  1. [NFC] Rename Context->CtxI in SCEV for uniformity reasons (details)
  2. [Polly] Don't generate inter-iteration noalias metadata. (details)
  3. [SimplifyCFG] Redirect switch cases that lead to UB into an unreachable block (details)
  4. [OpAsmParser] Add a parseCommaSeparatedList helper and beef up Delimiter. (details)
  5. BPF: make 32bit register spill with 64bit alignment (details)
  6. [SCEV] Generalize implication when signedness of FoundPred doesn't matter (details)
  7. [GlobalISel][Legalizer] Don't use eraseFromParentAndMarkDBGValuesForRemoval() for some artifacts. (details)
Commit a06db78fd99014993b62b99c305c7b374c1579fc by mkazantsev
[NFC] Rename Context->CtxI in SCEV for uniformity reasons
The file was modified llvm/include/llvm/Analysis/ScalarEvolution.h
The file was modified llvm/lib/Analysis/ScalarEvolution.cpp
Commit cad9f98a2ad98fecf663e9ce39502b8e43676fc9 by llvm-project
[Polly] Don't generate inter-iteration noalias metadata.

This metadata was intended to mark all accesses within an iteration as pairwise non-aliasing, in this case because every memory location of a base pointer is touched (read or written) at most once. This is typical for 'sweeps' over all data. The stated motivation from D30606 is to ensure that unrolled iterations are considered non-aliasing.

The implementation had multiple issues:

* The structure of the noalias metadata was malformed. D110026 added a check in the verifier for this metadata, and the tests have been failing since then.

* The property does not hold for the outer loops of the BLIS matrix multiplication, where the metadata was being inserted. Each element of A, B, and C is accessed multiple times, as many times as the loop that does not index it iterates.

* Scopes were added to SecondLevelOtherAliasScopeList (used for the !noalias scope list) on-the-fly when another SCEV was seen. This meant that previously visited instructions were not updated with alias scopes that are only seen later, so they were missing the SCEVs they should not alias with.

* Since the !noalias scope list would ideally consist of all other SCEVs for this base pointer, we might quickly run into scalability issues. Especially after unrolling, there would probably be at least one SCEV per instruction and unroll instance.

* The inter-iteration noalias base pointer was not removed after leaving the loop marked with it, effectively marking everything after it as noalias as well.

A solution I considered was to mark each instruction as non-aliasing with its own scope. The instruction itself would obviously alias itself, but such a construction might also be considered invalid. Duplicating the instruction (e.g. due to speculation) would mark the instruction as non-aliasing with its clone. I don't want to go into this territory, especially since the original motivation, determining unrolled instances as noalias based on SCEV, is what scev-aa does as well.

This effectively reverts D30606 and D35761.
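
For reference, a minimal sketch (not Polly's actual code; the function, domain, and scope names are invented for illustration) of how well-formed !alias.scope / !noalias metadata is built with llvm::MDBuilder: each access gets a distinct scope node tied to a shared domain, and the metadata attached to an instruction must be a *list* of such scope nodes, which is the shape the D110026 verifier check expects.

   // Sketch only: well-formed alias-scope metadata via MDBuilder.
   #include "llvm/IR/Instruction.h"
   #include "llvm/IR/LLVMContext.h"
   #include "llvm/IR/MDBuilder.h"
   #include "llvm/IR/Metadata.h"

   using namespace llvm;

   // Marks two memory accesses as pairwise non-aliasing. The domain and scope
   // names are placeholders for illustration, not names used by Polly.
   void markPairwiseNoAlias(Instruction &AccessA, Instruction &AccessB) {
     LLVMContext &Ctx = AccessA.getContext();
     MDBuilder MDB(Ctx);
     // One domain groups all scopes created for this annotation.
     MDNode *Domain = MDB.createAnonymousAliasScopeDomain("example.domain");
     // Each access gets its own distinct scope node inside that domain.
     MDNode *ScopeA = MDB.createAnonymousAliasScope(Domain, "A");
     MDNode *ScopeB = MDB.createAnonymousAliasScope(Domain, "B");
     // !alias.scope and !noalias must be lists of scope nodes, not bare scopes.
     AccessA.setMetadata(LLVMContext::MD_alias_scope, MDNode::get(Ctx, {ScopeA}));
     AccessA.setMetadata(LLVMContext::MD_noalias, MDNode::get(Ctx, {ScopeB}));
     AccessB.setMetadata(LLVMContext::MD_alias_scope, MDNode::get(Ctx, {ScopeB}));
     AccessB.setMetadata(LLVMContext::MD_noalias, MDNode::get(Ctx, {ScopeA}));
   }
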
The file was modified polly/include/polly/CodeGen/IRBuilder.h
The file was modified polly/lib/CodeGen/IRBuilder.cpp
The file was modified polly/test/ScheduleOptimizer/mat_mul_pattern_data_layout_2.ll
The file was removed polly/test/ScheduleOptimizer/pattern-matching-based-opts_10.ll
The file was modified polly/test/ScheduleOptimizer/pattern-matching-based-opts_5.ll
The file was modified polly/lib/Transform/MatmulOptimizer.cpp
The file was modified polly/test/ScheduleOptimizer/pattern-matching-based-opts_14.ll
The file was modified polly/lib/CodeGen/IslNodeBuilder.cpp
The file was modified polly/test/ScheduleOptimizer/ensure-correct-tile-sizes.ll
The file was modified polly/test/ScheduleOptimizer/pattern-matching-based-opts_13.ll
The file was modified polly/test/ScheduleOptimizer/pattern-matching-based-opts_3.ll
Commit 073b254cffeffdef36ffbee0c9afdc0da9cd6ac3 by mkazantsev
[SimplifyCFG] Redirect switch cases that lead to UB into an unreachable block

When taking a case of a switch instruction is guaranteed to lead to
UB, we can safely break those edges and redirect the cases into a newly
created unreachable block. As a result, the CFG becomes simpler and we can
remove some Phi inputs, which makes further analyses easier.

Patch by Dmitry Bakunevich!

Differential Revision: https://reviews.llvm.org/D109428
Reviewed By: lebedev.ri
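
A hypothetical source-level example (not taken from the patch or its tests) of the pattern this targets: taking the default case below guarantees a null dereference, so the corresponding switch edge can be redirected to an unreachable block and its Phi input dropped.

   // Hypothetical example: the `default` edge is guaranteed to lead to UB.
   int load_selected(int sel, int *a, int *b) {
     int *p;
     switch (sel) {
     case 0: p = a; break;
     case 1: p = b; break;
     default: p = nullptr; break; // reaching here makes the load below UB
     }
     return *p; // after the transform, `default` can branch to `unreachable`
   }
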
The file was modified llvm/test/Transforms/SimplifyCFG/switch_ub.ll
The file was modified llvm/lib/Transforms/Utils/SimplifyCFG.cpp
The file was modified llvm/test/CodeGen/AArch64/arm64-ccmp.ll
Commit 58abc8c34bde7021bbfa0a7bdfd2af9524cba263 by clattner
[OpAsmParser] Add a parseCommaSeparatedList helper and beef up Delimiter.

Lots of custom ops have hand-rolled comma-delimited parsing loops, as does
the MLIR parser itself.  This provides a standard interface for doing so that
is less error prone and requires less boilerplate.

While here, extend Delimiter to support <> and {} delimited sequences as
well (I have a use for <> in CIRCT specifically).

Differential Revision: https://reviews.llvm.org/D110122
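
As an illustration, a sketch of how a custom op parser might use the helper instead of a hand-written loop. This assumes the signature described in the commit (a Delimiter plus a per-element callback); the function name and the bracketed-operand syntax are invented for the example.

   // Sketch: parsing `[ %a, %b, %c ]` with the new helper.
   #include "mlir/IR/OpImplementation.h"

   using namespace mlir;

   static ParseResult
   parseBracketedOperands(OpAsmParser &parser,
                          SmallVectorImpl<OpAsmParser::OperandType> &operands) {
     return parser.parseCommaSeparatedList(
         OpAsmParser::Delimiter::Square, [&]() -> ParseResult {
           // The callback parses exactly one element; the helper handles the
           // surrounding `[`, `]`, and the commas in between.
           return parser.parseOperand(operands.emplace_back());
         });
   }
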
The file was modified mlir/lib/Parser/Parser.cpp
The file was modified mlir/lib/Parser/Parser.h
The file was modified mlir/include/mlir/IR/OpImplementation.h
The file was modified mlir/test/IR/invalid-affinemap.mlir
The file was modified mlir/lib/Dialect/StandardOps/IR/Ops.cpp
The file was modified mlir/lib/Parser/AttributeParser.cpp
The file was modified mlir/test/IR/invalid.mlir
The file was modified mlir/lib/Dialect/Async/IR/Async.cpp
The file was modified mlir/lib/Parser/LocationParser.cpp
The file was modified mlir/lib/Parser/TypeParser.cpp
The file was modified mlir/lib/Parser/AffineParser.cpp
The file was modified mlir/lib/Dialect/SPIRV/IR/SPIRVOps.cpp
The file was modified mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp
Commit ea72b0319d7b0f0c2fcf41d121afa5d031b319d5 by yhs
BPF: make 32bit register spill with 64bit alignment

In llvm, for non-alu32 mode, the stack alignment is 64bit, so there is only
one 64bit spill per 64bit slot. For alu32 mode, the stack alignment
is 32bit, so it is possible to have two 32bit spills per
64bit slot.

Currently, the bpf kernel verifier does not preserve register states
for 32bit spills. That is, a 32bit register may hold a constant
value or a bounded range before the spill; after reloading from the
stack, that information is lost and this sometimes causes
verifier failures. For 64bit register spills, the verifier
does try to preserve the register state for reloading.

The current verifier can be modestly changed to handle one
32bit spill per 64bit stack slot with state-preserving reload.
Handling two 32bit spills per 64bit stack slot will require
substantial changes.

This patch changes the stack alignment for alu32 to 64bit.
This way, for any 64bit slot in alu32 mode, only one
32bit or 64bit register value can be saved. Together
with the previously mentioned verifier enhancement, 32bit
spills can then be handled with state preservation.

Note that llvm stack slot coalescing
seems to do only adjacent packing, which may leave some holes
in the stack. For example:
   stack slot 8   <== 8 bytes
   stack slot 4   <== 8 bytes with 4 byte hole
   stack slot 8   <== 8 bytes
   stack slot 4   <== 4 bytes
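
The layout above can be reproduced with a toy offset calculation (illustration only, not the backend's frame allocator), comparing a 32bit versus a 64bit minimum alignment for spill slots:

   // Toy frame layout: align each slot to the larger of its own size and a
   // minimum spill alignment, then pack slots adjacently.
   #include <algorithm>
   #include <cstdio>
   #include <vector>

   static void layout(const std::vector<unsigned> &sizes, unsigned spillAlign) {
     unsigned offset = 0;
     for (unsigned size : sizes) {
       unsigned align = std::max(size, spillAlign);
       offset = (offset + align - 1) & ~(align - 1); // round up to alignment
       std::printf("  %u-byte slot at offset %u\n", size, offset);
       offset += size;
     }
   }

   int main() {
     // The example from the message: with 32bit spill alignment the second
     // 8-byte slot must skip ahead, leaving a 4-byte hole at offset 12.
     std::printf("spill alignment 4 (alu32 before this patch):\n");
     layout({8, 4, 8, 4}, 4);
     std::printf("spill alignment 8 (alu32 with this patch):\n");
     layout({8, 4, 8, 4}, 8);
     // The case the verifier cares about: two 32bit spills can share one
     // 64bit slot only under 32bit alignment (offsets 0 and 4 below).
     std::printf("two 32bit spills, alignment 4:\n");
     layout({4, 4}, 4);
     std::printf("two 32bit spills, alignment 8:\n");
     layout({4, 4}, 8);
     return 0;
   }
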

Differential Revision: https://reviews.llvm.org/D109073
The file was added llvm/test/CodeGen/BPF/spill-alu32.ll
The file was modified llvm/lib/Target/BPF/BPFRegisterInfo.td
Commit 2c7d5fbc9ebf914f90acad8534289ea01e899ec8 by mkazantsev
[SCEV] Generalize implication when signedness of FoundPred doesn't matter

The implication logic for two values that are both negative or both non-negative
says that it doesn't matter whether their predicate is signed or unsigned,
but it only flips unsigned predicates into signed ones for further inference.
This patch adds support for flipping a signed predicate into an unsigned one as well.

Differential Revision: https://reviews.llvm.org/D109959
Reviewed By: nikic
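
The underlying fact (shown here as a standalone check, not SCEV code) is that when two values are both negative or both non-negative, signed and unsigned comparisons agree, so the predicate can be flipped in either direction:

   // Standalone check: within one "sign bucket", signed and unsigned
   // orderings coincide, so e.g. slt and ult are interchangeable.
   #include <cassert>
   #include <cstdint>

   static bool sameSignBucket(int64_t a, int64_t b) {
     return (a < 0) == (b < 0); // both negative, or both non-negative
   }

   int main() {
     const int64_t samples[] = {INT64_MIN, -7, -1, 0, 1, 42, INT64_MAX};
     for (int64_t a : samples)
       for (int64_t b : samples)
         if (sameSignBucket(a, b))
           assert((a < b) ==
                  (static_cast<uint64_t>(a) < static_cast<uint64_t>(b)));
     return 0;
   }
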
The file was modified llvm/test/Transforms/IndVarSimplify/negative_ranges.ll
The file was modified llvm/lib/Analysis/ScalarEvolution.cpp
Commit 7091a7f781c9889f109b6be7b07822bfd91094dc by Amara Emerson
[GlobalISel][Legalizer] Don't use eraseFromParentAndMarkDBGValuesForRemoval() for some artifacts.

For artifacts other than G_TRUNC/G_SEXT, which have IR counterparts, we don't
seem to have debug users of their defs. However, in the legalizer we always call
MachineInstr::eraseFromParentAndMarkDBGValuesForRemoval(), which is expensive.
In some rare cases, this contributes significantly to unreasonably long compile
times when there is a lot of artifact combiner activity.

To verify this, I added asserts to that function that fire when it actually replaces
a debug use operand with undef for one of these artifacts. On CTMark with both -O0 and
-Os and debug info enabled, I didn't see a single case where they triggered.

In my measurements I saw around a 0.5% geomean compile-time improvement on -g -O0
for AArch64 with this change.

Differential Revision: https://reviews.llvm.org/D109750
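
A minimal sketch of the kind of fast path this describes. The actual change lives in GlobalISel's Utils.cpp and is shaped by the artifact combiner, so the structure and the opcode selection below are assumptions for illustration, not a copy of the code.

   // Sketch: skip the expensive debug-value scan for artifacts whose defs are
   // not expected to have debug users.
   #include "llvm/CodeGen/MachineInstr.h"
   #include "llvm/CodeGen/TargetOpcodes.h"

   using namespace llvm;

   static void eraseArtifact(MachineInstr &MI) {
     switch (MI.getOpcode()) {
     case TargetOpcode::G_TRUNC:
     case TargetOpcode::G_SEXT:
       // These have IR counterparts and may be referenced by debug values,
       // so keep the debug-aware erase for them.
       MI.eraseFromParentAndMarkDBGValuesForRemoval();
       break;
     default:
       // Other artifacts (G_MERGE_VALUES, G_UNMERGE_VALUES, ...): a plain
       // erase avoids scanning for debug uses that in practice never exist.
       MI.eraseFromParent();
       break;
     }
   }
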
The file was modified llvm/test/CodeGen/AMDGPU/GlobalISel/bug-legalization-artifact-combiner-dead-def.mir
The file was modified llvm/lib/CodeGen/GlobalISel/Utils.cpp