Commit
09fe84abb4ee71f707c3ec8e960a42d8292f6211
by Piotr Sobczak
[AMDGPU] Move code sinking before structurizer
Moving the code sinking pass before the structurizer creates more sinking opportunities.
The extra flow edges introduced by the structurizer can have adverse effects on sinking, because the sinking pass prefers moving instructions to blocks with unique predecessors and the structurizer destroys that property in some cases.
A notable example is moving high-latency image instructions across kills.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D101115
|
 | llvm/test/CodeGen/AMDGPU/multilevel-break.ll |
 | llvm/test/CodeGen/AMDGPU/llc-pipeline.ll |
 | llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp |
 | llvm/test/CodeGen/AMDGPU/sink-image-sample.ll |
 | llvm/test/CodeGen/AMDGPU/loop_exit_with_xor.ll |
Commit
49950cb1f6f699cbb9d8f141c0c043d4795c3417
by spatel
[SLP] restrict matching of load combine candidates
The test example from https://llvm.org/PR50256 (and reduced here) shows that we can match a load combine candidate even when there are no "or" instructions. We can avoid that by confirming that we do see an "or". This doesn't apply when matching an or-reduction because that match begins from the operands of the reduction.
Differential Revision: https://reviews.llvm.org/D102074
|
 | llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp |
 | llvm/test/Transforms/SLPVectorizer/AArch64/widen.ll |
Commit
c02476f3158f2908ef0a6f628210b5380bd33695
by lebedev.ri
[X86][CostModel] X86TTIImpl::getMemoryOpCost(): rewrite vector handling again
Instead of handling power-of-two sized vector chunks, try handling the large vector in a stream mode, decreasing the operational vector size once it no longer works for the elements left to process.
Notably, this improves costs for overaligned loads, since loading padding is fine. This also tracks more directly when we need to insert/extract the YMM/XMM subvector; some costs fluctuate because of that.
Reviewed By: RKSimon, ABataev
Differential Revision: https://reviews.llvm.org/D100684
|
 | llvm/lib/Target/X86/X86TargetTransformInfo.cpp |
 | llvm/test/Analysis/CostModel/X86/interleaved-store-i16-stride-3.ll |
 | llvm/test/Analysis/CostModel/X86/load_store.ll |
 | llvm/test/Analysis/CostModel/X86/interleaved-load-i16-stride-3.ll |
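The "stream mode" idea in the getMemoryOpCost() entry above can be illustrated with a small standalone sketch. This is not the code from X86TargetTransformInfo.cpp: the function name `splitIntoChunks` and its parameters are hypothetical, and the real cost model also accounts for alignment, legalization, and subvector insert/extract costs. The sketch only shows the walk itself: start with the widest chunk and halve it once the remaining bytes no longer fill it.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper (not an LLVM API): split an access of `totalBytes`
// into chunks. Start with `maxChunkBytes` (assumed to be a power of two,
// e.g. 32 for a YMM register) and halve the chunk size whenever the
// remaining bytes are too few to fill the current chunk.
std::vector<uint32_t> splitIntoChunks(uint32_t totalBytes,
                                      uint32_t maxChunkBytes) {
  std::vector<uint32_t> chunks;
  uint32_t chunk = maxChunkBytes;
  uint32_t remaining = totalBytes;
  while (remaining > 0 && chunk > 0) {
    if (remaining >= chunk) {
      // The current chunk size still fits: emit one operation of that size.
      chunks.push_back(chunk);
      remaining -= chunk;
    } else {
      // It no longer fits: decrease the operational size and retry.
      chunk /= 2;
    }
  }
  return chunks;
}
```

Under these assumptions, a 56-byte access with a 32-byte maximum is processed as chunks of 32, 16, and 8 bytes, while a 96-byte access is processed as three 32-byte chunks.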
Commit
bce3cca4889a9e4ab7b9652b0c44bb49ca8f3bad
by Matthew.Arsenault
CodeGen: Fix null dereference before null check
|
 | llvm/lib/CodeGen/MachineBlockFrequencyInfo.cpp |
Commit
9acc03ad92c66b856f67bf11ff4460c7da45f413
by llvm-dev
[X86][SSE] Replace foldShuffleOfHorizOp with generalized version in canonicalizeShuffleMaskWithHorizOp
foldShuffleOfHorizOp only handled basic shufps(hop(x,y),hop(z,w)) folds. By moving this to canonicalizeShuffleMaskWithHorizOp we can work with more general/combined v4x32 shuffle masks and float/integer domains, and support shuffle-of-packs as well.
The next step will be to support 256/512-bit vector cases.
|
 | llvm/test/CodeGen/X86/horizontal-sum.ll |
 | llvm/lib/Target/X86/X86ISelLowering.cpp |
 | llvm/test/CodeGen/X86/horizontal-shuffle.ll |
Commit
759b97e55a4bd7b0d89493686f4a769718e385ee
by llvm-dev
[X86] Replace repeated isa/cast<ConstantSDNode> calls with single dyn_cast<>. NFCI.
Noticed while looking at D101944
|
 | llvm/lib/Target/X86/X86ISelDAGToDAG.cpp |
Commit
6ca2bdb03c0fdb6736ed5c6a30d7bec6b557d1a0
by Paul C. Anagnostopoulos
[TableGen] Make the NUL character invalid in .td files
Differential Revision: https://reviews.llvm.org/D101923
|
 | llvm/test/TableGen/nul-char.td |
 | llvm/lib/TableGen/TGLexer.cpp |
Commit
69ed93a4355123a45c1d7216aea7cd53d07a361b
by lebedev.ri
[X86] X86TTIImpl::getInterleavedMemoryOpCostAVX2(): use getMemoryOpCost()
Now that getMemoryOpCost() correctly handles all the vector variants, we should no longer hand-roll our own version of it, but use it directly.
The AVX512 variant probably needs a similar change, but there it is less obvious.
|
 | llvm/lib/Target/X86/X86TargetTransformInfo.cpp |
 | llvm/test/Analysis/CostModel/X86/interleaved-store-i8.ll |
 | llvm/test/Analysis/CostModel/X86/interleaved-load-i8.ll |
Commit
faebc6bf108eccdfd75917636c64137f73a7bda7
by flo
[VPlan] Register recipe for instr if the simplified value is recipe.
If the simplified VPValue is a recipe, we need to register it for Instr, in case it needs to be recorded. The way this is handled in general may change soon, following some post-commit comments.
This fixes PR50298.
|
 | llvm/test/Transforms/LoopVectorize/reduction.ll |
 | llvm/lib/Transforms/Vectorize/LoopVectorize.cpp |
Commit
c765d140fe45906fb503d843acccf5838e775245
by jonathan.l.peyton
[OpenMP] Fix hidden helper + affinity
When KMP_AFFINITY is set, each worker thread's gtid value is used as an index into the place list to determine the thread's placement. With hidden helpers enabled, this gtid value is shifted down, leading to unexpectedly shifted thread placement. This patch restores the previous behavior by adjusting the mask index to take the number of hidden helper threads into account.
Hidden helper threads are given the full initial mask and do not participate in any of the other affinity mechanisms (place partitioning, balanced affinity). Their affinity is only printed for debug builds.
Differential Revision: https://reviews.llvm.org/D101882
|
 | openmp/runtime/src/kmp_runtime.cpp |
 | openmp/runtime/src/kmp_affinity.cpp |
 | openmp/runtime/src/kmp.h |
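The index adjustment described in the OpenMP entry above might look roughly like the following sketch. It is not the code from kmp_affinity.cpp: the function `placeIndexForThread` and its parameters are hypothetical, and it assumes (this is not stated in the commit message) that the main thread keeps gtid 0 and that worker gtids are offset by the number of hidden helper threads.

```cpp
// Hypothetical sketch (names are illustrative, not libomp's).
// Assumption: gtid 0 is the main thread and, when hidden helpers are
// enabled, worker gtids are offset by `numHiddenHelpers`.
int placeIndexForThread(int gtid, int numHiddenHelpers, int numPlaces) {
  int adjusted = gtid;
  if (numHiddenHelpers > 0 && gtid > numHiddenHelpers) {
    // Undo the offset so workers index the place list as they did
    // before hidden helper threads existed.
    adjusted -= numHiddenHelpers;
  }
  // Wrap around the place list when there are more threads than places.
  return adjusted % numPlaces;
}
```

With 8 hidden helpers and 4 places, for instance, a worker with gtid 9 maps to place 1 under these assumptions, the same place it would get with gtid 1 when hidden helpers are disabled.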
Commit
46402eb103d06b1e695ecfd6f6c9571615042a9c
by Paul C. Anagnostopoulos
Revert "[TableGen] Make the NUL character invalid in .td files"
At least one build uses a 'sed' that does not understand \x00.
This reverts commit cf9647011c4f05e1eb4423c6637d84e2f26b2042.
|
 | llvm/test/TableGen/nul-char.td |
 | llvm/lib/TableGen/TGLexer.cpp |
Commit
c58912eca743c612fd2a22c03b64a1bda3d2180f
by aakanksha555
Fix typo "Execpt" in comments
Differential Revision: https://reviews.llvm.org/D101858
|
 | llvm/lib/Target/AMDGPU/SIInstrInfo.td |
Commit
29342291d25b83da97e74d75004b177ba41114fc
by congzhecao
[LoopInterchange] Fix legality for triangular loops
This is a bug fix in the legality check.
When we encounter triangular loops of the following forms:
  for (int i = 0; i < m; i++) for (int j = 0; j < i; j++)
or
  for (int i = 0; i < m; i++) for (int j = 0; j*i < n; j++)
we should not perform interchange, since the loop body would execute a different number of times before and after interchange, producing incorrect results.
Reviewed By: bmahjour
Differential Revision: https://reviews.llvm.org/D101305
|
 | llvm/lib/Transforms/Scalar/LoopInterchange.cpp |
 | llvm/test/Transforms/LoopInterchange/inner-indvar-depend-on-outer-indvar.ll |
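To see why the triangular case in the LoopInterchange entry above is unsafe, the following standalone sketch counts loop-body executions before and after a naive swap. The function names are hypothetical, and the "naive" variant is only one possible illustration: it assumes the swap treats the inner bound as if it did not depend on the outer induction variable, so both loops run to `m`. It does not reproduce what the LoopInterchange pass itself would emit.

```cpp
#include <cstdint>

// Body executions of the original triangular nest:
//   for (i = 0; i < m; i++) for (j = 0; j < i; j++)
uint64_t countOriginal(int m) {
  uint64_t count = 0;
  for (int i = 0; i < m; i++)
    for (int j = 0; j < i; j++)
      ++count;
  return count;
}

// Body executions after a naive swap that ignores the dependence of the
// inner bound on i and iterates both loops up to m (an assumption made
// purely for illustration).
uint64_t countNaiveSwap(int m) {
  uint64_t count = 0;
  for (int j = 0; j < m; j++)
    for (int i = 0; i < m; i++)
      ++count;
  return count;
}
```

For m = 4 the original nest runs its body 6 times, while the naively swapped nest runs it 16 times, so the two are not equivalent.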
Commit
eca3d68399246765bc6e8c94ffb4d5927b1add12
by Pushpinder.Singh
Revert "[AMDGPU][OpenMP] Emit textual IR for -emit-llvm -S"
This reverts commit 7f78e409d0280c62209e1a7dc8c6d1409acc9184.
|
 | clang/lib/Driver/ToolChains/Clang.cpp |
 | clang/test/Driver/amdgpu-openmp-toolchain.c |