Failed

Changes

Summary

  1. [Zorg][OpenMP] Add CUDA offloading worker.
Commit e4aa8a2773fe76c427a91229f021ab067eafc8e7 by llvm-zorg
[Zorg][OpenMP] Add CUDA offloading worker.

This worker tests OpenMP offloading for x86_64 and NVIDIA GPUs. In
addition to check-openmp, it runs the SOLLVE Validation & Verification Suite
via the LLVM test-suite External builder. The builder is configured to
only warn if the SOLLVE suite fails, as it also tests features that
have not been implemented in Clang yet.

CUDA is intentionally installed in a non-default location (/opt/cuda) to
resemble setups often found in computing clusters with multiple versions
of CUDA to choose from.

Reviewed By: gkistanova

Differential Revision: https://reviews.llvm.org/D101268
The file was modified buildbot/osuosl/master/config/workers.py
The file was modified buildbot/osuosl/master/config/builders.py
The file was modified zorg/buildbot/builders/OpenMPBuilder.py

Summary

  1. [AMDGPU] Move code sinking before structurizer
  2. [SLP] restrict matching of load combine candidates
  3. [X86][CostModel] X86TTIImpl::getMemoryOpCost(): rewrite vector handling again
  4. CodeGen: Fix null dereference before null check
  5. [X86][SSE] Replace foldShuffleOfHorizOp with generalized version in canonicalizeShuffleMaskWithHorizOp
  6. [X86] Replace repeated isa/cast<ConstantSDNode> calls with single dyn_cast<>. NFCI.
  7. [TableGen] Make the NUL character invalid in .td files
  8. [X86] X86TTIImpl::getInterleavedMemoryOpCostAVX2(): use getMemoryOpCost()
  9. [VPlan] Register recipe for instr if the simplified value is recipe.
  10. [OpenMP] Fix hidden helper + affinity
  11. Revert "[TableGen] Make the NUL character invalid in .td files"
  12. Fix typo "Execpt" in comments
  13. [LoopInterchange] Fix legality for triangular loops
  14. Revert "[AMDGPU][OpenMP] Emit textual IR for -emit-llvm -S"
Commit 09fe84abb4ee71f707c3ec8e960a42d8292f6211 by Piotr Sobczak
[AMDGPU] Move code sinking before structurizer

Moving the code sinking pass before the structurizer creates more sinking
opportunities.

The extra flow edges introduced by the structurizer can have adverse
effects on sinking, because the sinking pass prefers moving instructions
to blocks with unique predecessors and the structurizer destroys that
property in some cases.

A notable example is moving high-latency image instructions across kills.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D101115
The file was modified llvm/test/CodeGen/AMDGPU/multilevel-break.ll
The file was added llvm/test/CodeGen/AMDGPU/sink-image-sample.ll
The file was modified llvm/test/CodeGen/AMDGPU/llc-pipeline.ll
The file was modified llvm/test/CodeGen/AMDGPU/loop_exit_with_xor.ll
The file was modified llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
Commit 49950cb1f6f699cbb9d8f141c0c043d4795c3417 by spatel
[SLP] restrict matching of load combine candidates

The test example from https://llvm.org/PR50256 (and reduced here)
shows that we can match a load combine candidate even when there
are no "or" instructions. We can avoid that by confirming that we
do see an "or". This doesn't apply when matching an or-reduction
because that match begins from the operands of the reduction.
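
The added check can be sketched as follows. This is a simplified Python model of the idea described above, not the code in SLPVectorizer.cpp: the tuple encoding of expressions and the names `is_load_combine_candidate` and `must_match_or` are assumptions made for illustration.

```python
# Hypothetical expression encoding (illustration only):
#   ("or", lhs, rhs), ("shl", value, shift_amount), ("zext", value), ("load", name)

def is_load_combine_candidate(root, must_match_or):
    """Walk through 'or' and byte-multiple 'shl' nodes looking for zext(load)."""
    node = root
    found_or = False
    while isinstance(node, tuple) and (
        node[0] == "or" or (node[0] == "shl" and node[2] % 8 == 0)
    ):
        if node[0] == "or":
            found_or = True
        node = node[1]  # follow the first operand
    # The restriction: a shift alone is not enough, an 'or' must be seen.
    if must_match_or and not found_or:
        return False
    if node is root:
        return False
    return (
        isinstance(node, tuple)
        and node[0] == "zext"
        and isinstance(node[1], tuple)
        and node[1][0] == "load"
    )

shl_only = ("shl", ("zext", ("load", "p")), 8)
with_or = ("or", shl_only, ("zext", ("load", "q")))
print(is_load_combine_candidate(shl_only, must_match_or=True))
print(is_load_combine_candidate(with_or, must_match_or=True))
```

In this model an or-reduction caller would pass `must_match_or=False`, since its match starts from the operands of the reduction.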

Differential Revision: https://reviews.llvm.org/D102074
The file was modified llvm/test/Transforms/SLPVectorizer/AArch64/widen.ll
The file was modified llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
Commit c02476f3158f2908ef0a6f628210b5380bd33695 by lebedev.ri
[X86][CostModel] X86TTIImpl::getMemoryOpCost(): rewrite vector handling again

Instead of handling power-of-two sized vector chunks,
try handling the large vector in a stream mode,
decreasing the operational vector size
once it no longer works for the elements left to process.

Notably, this improves costs for overaligned loads - loading padding is fine.
This more directly tracks when we need to insert/extract the YMM/XMM subvector;
some costs fluctuate because of that.
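
The "stream mode" idea can be illustrated with a small sketch. It only models how a large vector might be split into progressively narrower operations; the function name `split_into_ops` and the width list are assumptions for illustration, and the real cost model also accounts for legalization, alignment and subvector insert/extract costs.

```python
def split_into_ops(total_bytes, widths=(32, 16, 8, 4, 2, 1)):
    """Greedily cover total_bytes, starting with the widest operation
    and narrowing only once the current width no longer fits."""
    ops = []
    remaining = total_bytes
    for width in widths:  # widest first (assumed YMM = 32 bytes, XMM = 16 bytes)
        while remaining >= width:
            ops.append(width)
            remaining -= width
    return ops

print(split_into_ops(96))  # three 32-byte operations
print(split_into_ops(56))  # one 32-byte, one 16-byte and one 8-byte operation
```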

Reviewed By: RKSimon, ABataev

Differential Revision: https://reviews.llvm.org/D100684
The file was modified llvm/test/Analysis/CostModel/X86/interleaved-load-i16-stride-3.ll
The file was modified llvm/lib/Target/X86/X86TargetTransformInfo.cpp
The file was modified llvm/test/Analysis/CostModel/X86/interleaved-store-i16-stride-3.ll
The file was modified llvm/test/Analysis/CostModel/X86/load_store.ll
Commit bce3cca4889a9e4ab7b9652b0c44bb49ca8f3bad by Matthew.Arsenault
CodeGen: Fix null dereference before null check
The file was modified llvm/lib/CodeGen/MachineBlockFrequencyInfo.cpp
Commit 9acc03ad92c66b856f67bf11ff4460c7da45f413 by llvm-dev
[X86][SSE] Replace foldShuffleOfHorizOp with generalized version in canonicalizeShuffleMaskWithHorizOp

foldShuffleOfHorizOp only handled basic shufps(hop(x,y),hop(z,w)) folds. By moving this to canonicalizeShuffleMaskWithHorizOp we can work with more general/combined v4x32 shuffle masks and float/integer domains, and support shuffle-of-packs as well.

The next step will be to support 256/512-bit vector cases.
The file was modified llvm/lib/Target/X86/X86ISelLowering.cpp
The file was modified llvm/test/CodeGen/X86/horizontal-sum.ll
The file was modified llvm/test/CodeGen/X86/horizontal-shuffle.ll
Commit 759b97e55a4bd7b0d89493686f4a769718e385ee by llvm-dev
[X86] Replace repeated isa/cast<ConstantSDNode> calls with single dyn_cast<>. NFCI.

Noticed while looking at D101944
The file was modified llvm/lib/Target/X86/X86ISelDAGToDAG.cpp
Commit 6ca2bdb03c0fdb6736ed5c6a30d7bec6b557d1a0 by Paul C. Anagnostopoulos
[TableGen] Make the NUL character invalid in .td files

Differential Revision: https://reviews.llvm.org/D101923
The file was added llvm/test/TableGen/nul-char.td
The file was modified llvm/lib/TableGen/TGLexer.cpp
Commit 69ed93a4355123a45c1d7216aea7cd53d07a361b by lebedev.ri
[X86] X86TTIImpl::getInterleavedMemoryOpCostAVX2(): use getMemoryOpCost()

Now that getMemoryOpCost() correctly handles all the vector variants,
we should no longer hand-roll our own version of it, but use it directly.

The AVX512 variant probably needs a similar change,
but there it is less obvious.
The file was modified llvm/lib/Target/X86/X86TargetTransformInfo.cpp
The file was modified llvm/test/Analysis/CostModel/X86/interleaved-load-i8.ll
The file was modified llvm/test/Analysis/CostModel/X86/interleaved-store-i8.ll
Commit faebc6bf108eccdfd75917636c64137f73a7bda7 by flo
[VPlan] Register recipe for instr if the simplified value is recipe.

If the simplified VPValue is a recipe, we need to register it for Instr,
in case it needs to be recorded. The way this is handled in general may
change soon, following some post-commit comments.

This fixes PR50298.
The file was modified llvm/test/Transforms/LoopVectorize/reduction.ll
The file was modified llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
Commit c765d140fe45906fb503d843acccf5838e775245 by jonathan.l.peyton
[OpenMP] Fix hidden helper + affinity

When KMP_AFFINITY is set, each worker thread's gtid value is used as an
index into the place list to determine the thread's placement. With hidden
helpers enabled, this gtid value is shifted down, leading to unexpectedly
shifted thread placement. This patch restores the previous behavior by
adjusting the mask index to take the number of hidden helper threads
into account.
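
A rough model of the index adjustment is sketched below. The function name `place_index` and the gtid layout (gtid 0 for the primary thread, followed by the hidden helpers, followed by the regular workers) are assumptions for illustration and are not taken from the runtime sources.

```python
def place_index(gtid, num_places, num_hidden_helpers=0):
    """Map a thread's gtid to an index into the place list.

    Assumed layout (illustration only): gtid 0 is the primary thread,
    gtids 1..num_hidden_helpers are hidden helpers, and regular workers
    come after them.
    """
    adjusted = gtid
    if num_hidden_helpers > 0 and gtid > num_hidden_helpers:
        # Discount the hidden helper threads so that regular workers
        # land on the same places as they would without helpers.
        adjusted = gtid - num_hidden_helpers
    return adjusted % num_places

# First regular worker with 3 hidden helpers and 4 places:
print(place_index(4, num_places=4, num_hidden_helpers=3))
# The same gtid used directly, without the adjustment:
print(4 % 4)
```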

Hidden helper threads are given the full initial mask and do not
participate in any of the other affinity mechanisms (place partitioning,
balanced affinity). Their affinity is only printed for debug builds.

Differential Revision: https://reviews.llvm.org/D101882
The file was modified openmp/runtime/src/kmp.h
The file was modified openmp/runtime/src/kmp_affinity.cpp
The file was modified openmp/runtime/src/kmp_runtime.cpp
Commit 46402eb103d06b1e695ecfd6f6c9571615042a9c by Paul C. Anagnostopoulos
Revert "[TableGen] Make the NUL character invalid in .td files"

At least one build uses a 'sed' that does not understand \x00.

This reverts commit cf9647011c4f05e1eb4423c6637d84e2f26b2042.
The file was modified llvm/lib/TableGen/TGLexer.cpp
The file was removed llvm/test/TableGen/nul-char.td
Commit c58912eca743c612fd2a22c03b64a1bda3d2180f by aakanksha555
Fix typo "Execpt" in comments

Differential Revision: https://reviews.llvm.org/D101858
The file was modified llvm/lib/Target/AMDGPU/SIInstrInfo.td
Commit 29342291d25b83da97e74d75004b177ba41114fc by congzhecao
[LoopInterchange] Fix legality for triangular loops

This is a bug fix in the legality check.

When we encounter triangular loops of the following forms:
    for (int i = 0; i < m; i++)
      for (int j = 0; j < i; j++), or

    for (int i = 0; i < m; i++)
      for (int j = 0; j*i < n; j++),

we should not perform interchange since the number of executions of the loop body
will be different before and after interchange, resulting in incorrect results.
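
The difference in execution counts can be checked with a small sketch. Once j becomes the outer loop its bound can no longer depend on i, so the sketch compares the triangular nest with a rectangular nest whose j bound is frozen at its maximum. That frozen bound and the function names are assumptions for illustration, not what the pass itself generates.

```python
def triangular_iterations(m):
    # for (int i = 0; i < m; i++)
    #   for (int j = 0; j < i; j++)
    return [(i, j) for i in range(m) for j in range(i)]

def rectangular_iterations(m, inner_trip_count):
    # Interchanged shape: j is the outer loop, so its trip count
    # has to be a fixed value that does not depend on i.
    return [(i, j) for j in range(inner_trip_count) for i in range(m)]

m = 4
tri = triangular_iterations(m)
rect = rectangular_iterations(m, inner_trip_count=m - 1)
print(len(tri), len(rect))  # the two nests execute the body a different number of times
```

The triangular iteration space is a strict subset of the rectangular one, so the interchanged nest would execute extra (i, j) pairs unless additional guards were introduced.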

Reviewed By: bmahjour

Differential Revision: https://reviews.llvm.org/D101305
The file was added llvm/test/Transforms/LoopInterchange/inner-indvar-depend-on-outer-indvar.ll
The file was modified llvm/lib/Transforms/Scalar/LoopInterchange.cpp
Commit eca3d68399246765bc6e8c94ffb4d5927b1add12 by Pushpinder.Singh
Revert "[AMDGPU][OpenMP] Emit textual IR for -emit-llvm -S"

This reverts commit 7f78e409d0280c62209e1a7dc8c6d1409acc9184.
The file was modified clang/test/Driver/amdgpu-openmp-toolchain.c
The file was modified clang/lib/Driver/ToolChains/Clang.cpp