Success

Changes

Summary

  1. [Zorg][OpenMP] Add CUDA offloading worker.
Commit e4aa8a2773fe76c427a91229f021ab067eafc8e7 by llvm-zorg
[Zorg][OpenMP] Add CUDA offloading worker.

This worker tests OpenMP offloading for x86_64 and NVIDIA GPUs. In
addition to check-openmp, it runs the SOLLVE Validation & Verification Suite
via the LLVM test-suite External builder. The builder is configured to
only warn if the SOLLVE suite fails, because that suite also tests features
that have not been implemented in Clang yet.

CUDA is intentionally not installed in a default location (/opt/cuda) to
resemble setups often found in computing clusters with multiple versions
of CUDA to choose from.

Reviewed By: gkistanova

Differential Revision: https://reviews.llvm.org/D101268
The file was modified: zorg/buildbot/builders/OpenMPBuilder.py
The file was modified: buildbot/osuosl/master/config/workers.py
The file was modified: buildbot/osuosl/master/config/builders.py

Summary

  1. [OpenCL] Allow use of double type without extension pragma.
  2. [AMDGPU] Move code sinking before structurizer
  3. [SLP] restrict matching of load combine candidates
  4. [X86][CostModel] X86TTIImpl::getMemoryOpCost(): rewrite vector handling again
  5. CodeGen: Fix null dereference before null check
  6. [X86][SSE] Replace foldShuffleOfHorizOp with generalized version in canonicalizeShuffleMaskWithHorizOp
  7. [X86] Replace repeated isa/cast<ConstantSDNode> calls with single dyn_cast<>. NFCI.
  8. [TableGen] Make the NUL character invalid in .td files
  9. [X86] X86TTIImpl::getInterleavedMemoryOpCostAVX2(): use getMemoryOpCost()
  10. [VPlan] Register recipe for instr if the simplified value is recipe.
  11. [OpenMP] Fix hidden helper + affinity
Commit 13ea238b1e1db96ef5fd409e869d9a8ebeef1332 by anastasia.stulova
[OpenCL] Allow use of double type without extension pragma.

Simplify use of extensions by allowing the use of supported
double types without the pragma. Since earlier standards
instructed that the pragma be used explicitly, a new warning
is introduced in pedantic mode to indicate that use of the
type without enabling the extension pragma can be non-portable.

This patch does not break backward compatibility since the
extension pragma is still supported and it makes the behavior
of the compiler less strict by accepting code without extra
pragma statements.

Differential Revision: https://reviews.llvm.org/D100980
The file was modified: clang/lib/Sema/Sema.cpp
The file was modified: clang/lib/Sema/SemaType.cpp
The file was modified: clang/test/Misc/warning-flags.c
The file was modified: clang/test/SemaOpenCL/extensions.cl
The file was modified: clang/include/clang/Basic/DiagnosticSemaKinds.td
Commit 09fe84abb4ee71f707c3ec8e960a42d8292f6211 by Piotr Sobczak
[AMDGPU] Move code sinking before structurizer

Moving the code sinking pass before the structurizer creates more sinking
opportunities.

The extra flow edges introduced by the structurizer can have adverse
effects on sinking, because the sinking pass prefers moving instructions
to blocks with unique predecessors and the structurizer destroys that
property in some cases.
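
To illustrate the unique-predecessor property mentioned above, here is a minimal Python sketch. The control-flow graphs, the block names, and the helper `unique_pred_blocks` are hypothetical and are not taken from the LLVM sinking pass or the AMDGPU structurizer; the sketch only shows how one extra edge into a block removes it from the set of blocks that have a single predecessor.

```python
from collections import defaultdict

def unique_pred_blocks(edges):
    """Return the set of blocks that have exactly one distinct predecessor."""
    preds = defaultdict(set)
    for src, dst in edges:
        preds[dst].add(src)
    return {block for block, ps in preds.items() if len(ps) == 1}

# Hypothetical CFG before structurization: a simple diamond.
before = [("entry", "then"), ("entry", "else"),
          ("then", "exit"), ("else", "exit")]

# Hypothetical CFG after one extra flow edge is routed into "else".
after = before + [("then", "else")]

print(sorted(unique_pred_blocks(before)))
print(sorted(unique_pred_blocks(after)))
```

In this toy graph, "else" stops being a candidate target once it gains a second predecessor, which is the kind of lost opportunity the commit message describes.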

A notable example is moving high-latency image instructions across kills.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D101115
The file was modified: llvm/test/CodeGen/AMDGPU/loop_exit_with_xor.ll
The file was modified: llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
The file was modified: llvm/test/CodeGen/AMDGPU/multilevel-break.ll
The file was modified: llvm/test/CodeGen/AMDGPU/llc-pipeline.ll
The file was added: llvm/test/CodeGen/AMDGPU/sink-image-sample.ll
Commit 49950cb1f6f699cbb9d8f141c0c043d4795c3417 by spatel
[SLP] restrict matching of load combine candidates

The test example from https://llvm.org/PR50256 (and reduced here)
shows that we can match a load combine candidate even when there
are no "or" instructions. We can avoid that by confirming that we
do see an "or". This doesn't apply when matching an or-reduction
because that match begins from the operands of the reduction.

Differential Revision: https://reviews.llvm.org/D102074
The file was modified: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
The file was modified: llvm/test/Transforms/SLPVectorizer/AArch64/widen.ll
Commit c02476f3158f2908ef0a6f628210b5380bd33695 by lebedev.ri
[X86][CostModel] X86TTIImpl::getMemoryOpCost(): rewrite vector handling again

Instead of handling power-of-two sized vector chunks,
try handling the large vector in a stream mode,
decreasing the operational vector size
once it no longer works for the elements left to process.

Notably, this improves costs for overaligned loads, since loading padding is fine.
This also tracks more directly when we need to insert/extract the YMM/XMM
subvector, so some costs fluctuate because of that.
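
The "stream mode" idea can be sketched roughly as follows. This is not the code from X86TargetTransformInfo.cpp: the function `split_into_chunks`, its parameters, and the 32-byte starting width (standing in for a YMM register) are assumptions made for illustration, and the sketch does not model actual costs, padding, or subvector insert/extract operations. It only shows the chunk size shrinking once it no longer fits the bytes left to process.

```python
def split_into_chunks(total_bytes, max_chunk_bytes=32):
    """Greedily cover total_bytes with power-of-two chunks.

    The chunk size only ever shrinks, and only when the current size
    is larger than the number of bytes still left to process.
    max_chunk_bytes is assumed to be a power of two.
    """
    chunks = []
    chunk = max_chunk_bytes
    remaining = total_bytes
    while remaining > 0:
        # Halve the chunk until it fits what is left.
        while chunk > remaining:
            chunk //= 2
        chunks.append(chunk)
        remaining -= chunk
    return chunks

for size in (64, 48, 44, 3):
    print(size, split_into_chunks(size))
```

For a hypothetical 44-byte vector, the sketch first uses one 32-byte chunk, then drops to 8 bytes and finally to 4 bytes for the tail.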

Reviewed By: RKSimon, ABataev

Differential Revision: https://reviews.llvm.org/D100684
The file was modified: llvm/test/Analysis/CostModel/X86/interleaved-store-i16-stride-3.ll
The file was modified: llvm/test/Analysis/CostModel/X86/load_store.ll
The file was modified: llvm/test/Analysis/CostModel/X86/interleaved-load-i16-stride-3.ll
The file was modified: llvm/lib/Target/X86/X86TargetTransformInfo.cpp
Commit bce3cca4889a9e4ab7b9652b0c44bb49ca8f3bad by Matthew.Arsenault
CodeGen: Fix null dereference before null check
The file was modified: llvm/lib/CodeGen/MachineBlockFrequencyInfo.cpp
Commit 9acc03ad92c66b856f67bf11ff4460c7da45f413 by llvm-dev
[X86][SSE] Replace foldShuffleOfHorizOp with generalized version in canonicalizeShuffleMaskWithHorizOp

foldShuffleOfHorizOp only handled basic shufps(hop(x,y),hop(z,w)) folds. By moving this to canonicalizeShuffleMaskWithHorizOp we can work with more general/combined v4x32 shuffle masks and float/integer domains, and support shuffle-of-packs as well.

The next step will be to support 256/512-bit vector cases.
The file was modified: llvm/lib/Target/X86/X86ISelLowering.cpp
The file was modified: llvm/test/CodeGen/X86/horizontal-sum.ll
The file was modified: llvm/test/CodeGen/X86/horizontal-shuffle.ll
Commit 759b97e55a4bd7b0d89493686f4a769718e385ee by llvm-dev
[X86] Replace repeated isa/cast<ConstantSDNode> calls with single dyn_cast<>. NFCI.

Noticed while looking at D101944
The file was modified: llvm/lib/Target/X86/X86ISelDAGToDAG.cpp
Commit 6ca2bdb03c0fdb6736ed5c6a30d7bec6b557d1a0 by Paul C. Anagnostopoulos
[TableGen] Make the NUL character invalid in .td files

Differential Revision: https://reviews.llvm.org/D101923
The file was modified: llvm/lib/TableGen/TGLexer.cpp
The file was added: llvm/test/TableGen/nul-char.td
Commit 69ed93a4355123a45c1d7216aea7cd53d07a361b by lebedev.ri
[X86] X86TTIImpl::getInterleavedMemoryOpCostAVX2(): use getMemoryOpCost()

Now that getMemoryOpCost() correctly handles all the vector variants,
we should no longer hand-roll our own version of it, but use it directly.

The AVX512 variant probably needs a similar change,
but there it is less obvious.
The file was modified: llvm/lib/Target/X86/X86TargetTransformInfo.cpp
The file was modified: llvm/test/Analysis/CostModel/X86/interleaved-load-i8.ll
The file was modified: llvm/test/Analysis/CostModel/X86/interleaved-store-i8.ll
Commit faebc6bf108eccdfd75917636c64137f73a7bda7 by flo
[VPlan] Register recipe for instr if the simplified value is recipe.

If the simplified VPValue is a recipe, we need to register it for Instr,
in case it needs to be recorded. The way this is handled in general may
change soon, following some post-commit comments.

This fixes PR50298.
The file was modified: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
The file was modified: llvm/test/Transforms/LoopVectorize/reduction.ll
Commit c765d140fe45906fb503d843acccf5838e775245 by jonathan.l.peyton
[OpenMP] Fix hidden helper + affinity

When KMP_AFFINITY is set, each worker thread's gtid value is used as an
index into the place list to determine the thread's placement. With hidden
helpers enabled, this gtid value is shifted, leading to unexpectedly
shifted thread placement. This patch restores the previous behavior by
adjusting the mask index to take the number of hidden helper threads
into account.

Hidden helper threads are given the full initial mask and do not
participate in any of the other affinity mechanisms (place partitioning,
balanced affinity). Their affinity is only printed for debug builds.
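
The index adjustment described above might look roughly like the following Python sketch. It is not the libomp implementation: the function `place_index`, the assumed gtid layout (gtid 0 for the primary thread, gtids 1 through `num_hidden_helpers` for hidden helpers, workers after that), and the wrap-around with modulo are all assumptions made for illustration.

```python
def place_index(gtid, num_places, num_hidden_helpers=0):
    """Map a thread's gtid to an index into the place list.

    Assumed layout (hypothetical): gtid 0 is the primary thread,
    gtids 1..num_hidden_helpers are hidden helpers, workers follow.
    Returns None for hidden helpers, which get the full initial mask
    instead of a single place.
    """
    if 1 <= gtid <= num_hidden_helpers:
        return None  # hidden helper: no place-list entry
    adjusted = gtid
    if gtid > num_hidden_helpers:
        # Undo the shift introduced by the hidden helper gtids.
        adjusted = gtid - num_hidden_helpers
    return adjusted % num_places

# With 2 hidden helpers and 4 places, the first worker has gtid 3.
print(place_index(3, 4, num_hidden_helpers=2))  # adjusted index
print(place_index(3, 4))                        # unadjusted index
```

Under these assumptions, the first worker lands on place 1 again, as it would with hidden helpers disabled, rather than on place 3.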

Differential Revision: https://reviews.llvm.org/D101882
The file was modified: openmp/runtime/src/kmp.h
The file was modified: openmp/runtime/src/kmp_affinity.cpp
The file was modified: openmp/runtime/src/kmp_runtime.cpp