Success

Changes

Summary

  1. AST Matchers test: use arrays instead of vectors
  2. [SemaOverload] Use iterator_range to iterate over VectorTypes (NFC).
  3. [mlir] [VectorOps] Add missing comments to CreateMaskOp lowering
  4. [flang] Add the conversions for types.
  5. [NFC] Move test vscale-factor-out-constant.ll to AArch64 sub-directory.
  6. [flang] Fixed crash on forward referenced `len` parameter
  7. [WebAssembly] Lower llvm.debugtrap properly
  8. [OPENMP50]Codegen for inscan reductions in worksharing directives.
  9. AMDGPU/GlobalISel: Fix trying to use wave32 for gfx9 test
  10. AMDGPU/GlobalISel: Fix making LDS FP atomics legal on SI/CI
  11. AMDGPU: Fix using unencodable instructions in tests
  12. [CUDA][HIP] Fix implicit HD function resolution
  13. [OpenMP] Improve D2D memcpy to use more efficient driver API
  14. [Fuchsia] Rely on linker switch rather than dead code ref for profile runtime
  15. [InstCombine] avoid crashing on select-shuffle detection
  16. AMDGPU: Set mayRaiseFPException
  17. AMDGPU: Add test for fdiv nofpexcept preservation
  18. [mlir] Add support for bf16 to StandardToLLVM conversion
  19. AMDGPU: Select strict_fadd
  20. AMDGPU: Select strict_fma
  21. AMDGPU: Select strict_fmul
  22. AMDGPU: Fix overriding global FP atomic feature predicates
Commit a180d5409f218d933bec99bc28f7a9970fb293d4 by gribozavr
AST Matchers test: use arrays instead of vectors

Subscribers: cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D81180
The file was modified clang/unittests/ASTMatchers/ASTMatchersTest.h
Commit 714e84be4615d6e1195f2798c0c3c8c54017dd5f by flo
[SemaOverload] Use iterator_range to iterate over VectorTypes (NFC).

We can simplify the code a bit by using iterator_range instead of
plain iterators. Matrix type support here (added in 6f6e91d19337)
already uses an iterator_range.
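
As an illustration of the simplification described above, here is a minimal sketch of the before and after loop shapes. `IterRange` and `makeRange` are hypothetical stand-ins written for this example (modelled on `llvm::iterator_range` and `llvm::make_range`), and the summing functions are not taken from SemaOverload.cpp.

```cpp
#include <vector>

// Hypothetical stand-in for llvm::iterator_range: a pair of iterators
// exposing begin()/end() so it can drive a range-based for loop.
template <typename It>
class IterRange {
  It First, Last;

public:
  IterRange(It B, It E) : First(B), Last(E) {}
  It begin() const { return First; }
  It end() const { return Last; }
};

// Hypothetical helper in the spirit of llvm::make_range.
template <typename It>
IterRange<It> makeRange(It B, It E) {
  return IterRange<It>(B, E);
}

// Before: the loop manages an explicit iterator pair.
int sumWithIterators(const std::vector<int> &V) {
  int Sum = 0;
  for (auto I = V.begin(), E = V.end(); I != E; ++I)
    Sum += *I;
  return Sum;
}

// After: the same traversal written over a range object.
int sumWithRange(const std::vector<int> &V) {
  int Sum = 0;
  for (int X : makeRange(V.begin(), V.end()))
    Sum += X;
  return Sum;
}
```

Both functions visit the same elements; the range form only removes the begin/end bookkeeping from the loop header.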

Reviewers: rjmccall, arphaman, jfb, Bigcheese

Reviewed By: rjmccall

Differential Revision: https://reviews.llvm.org/D81138
The file was modified clang/lib/Sema/SemaOverload.cpp
Commit c19fae507e311723b40a0cafa17d4e48b1664fb9 by ajcbik
[mlir] [VectorOps] Add missing comments to CreateMaskOp lowering

Summary: Add missing comment to CreateMask. Fixed typo in ConstantMask comment.

Reviewers: nicolasvasilache, rriddle, reidtatge, ftynse

Reviewed By: ftynse

Subscribers: mehdi_amini, rriddle, jpienaar, shauheen, antiagainst, nicolasvasilache, arpith-jacob, mgester, lucyrfox, liufengdb, stephenneuendorffer, Joonsoo, grosul1, frgossen, Kayjukh, jurahul

Tags: #mlir

Differential Revision: https://reviews.llvm.org/D81125
The file was modified mlir/lib/Dialect/Vector/VectorTransforms.cpp
Commit baa12ddb6fa6840c1125f0aa11cd0e05fe82385d by eschweitz
[flang] Add the conversions for types.

Part of lowering is to convert the front-end types to their FIR dialect representations. These conversions are done here in the ConvertType module.

Proactively update the code to conform better with LLVM coding conventions.

Differential Revision: https://reviews.llvm.org/D81034
The file was modified flang/lib/Lower/CMakeLists.txt
The file was added flang/include/flang/Lower/ConvertType.h
The file was added flang/lib/Lower/ConvertType.cpp
Commit 42048ff97230dcf64a488a1eb5bbbf2c785b47f8 by huihuiz
[NFC] Move test vscale-factor-out-constant.ll to AArch64 sub-directory.

Vscale scalable vector is specific to AArch64 target.

Bring back 'uglygep' check.
The file was added llvm/test/Transforms/LoopStrengthReduce/AArch64/vscale-factor-out-constant.ll
The file was removed llvm/test/Transforms/LoopStrengthReduce/vscale-factor-out-constant.ll
Commit 1746c8ed2660c83895c79de94453f44f8e729a94 by psteinfeld
[flang] Fixed crash on forward referenced `len` parameter

Summary:
Using a forward reference to define a `len` parameter causes a crash.
The underlying cause was that a previously declared type had an
erroneous expression for its `LEN` param value.  When this expression
was referenced to evaluate a subsequent expression, bad things happened.
I fixed this by putting in code to detect this case.

Reviewers: tskeith, klausler, DavidTruby

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80593
The file was modified flang/lib/Evaluate/variable.cpp
The file was modified flang/test/Semantics/resolve91.f90
Commit a07c08f74fafcbf196cda4b20f0761538fca3dbe by tlively
[WebAssembly] Lower llvm.debugtrap properly

Summary:
Unlike normal traps, debug traps are allowed to return and can have
additional instructions in the same basic block. Without explicit
backend support for debug traps, they are lowered in ISel as normal
traps. Since normal traps are lowered in the WebAssembly backend to
the UNREACHABLE instruction, which is a terminator, using debug traps
could lead to invalid MBBs when there are additional instructions
after the trap. This patch fixes the issue by lowering debug traps to
a new version of the UNREACHABLE instruction, DEBUG_UNREACHABLE, that
is not a terminator.

An alternative approach would have been to make UNREACHABLE not a
terminator, but that breaks a large number of tests. In particular, it
would require removing the traps inserted after noreturn calls to
@llvm.wasm.throw because otherwise the terminator throw would be
followed by a non-terminator UNREACHABLE and we would be back to
having invalid MBBs. Overall the approach in this patch seems simpler.
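
The constraint behind this fix can be sketched with a toy model: once a terminator appears in a block, only terminators may follow it. The `Inst` type, the `isWellFormedBlock` helper, and the instruction names other than UNREACHABLE and DEBUG_UNREACHABLE are assumptions made for this illustration, not the backend's actual verifier.

```cpp
#include <string>
#include <vector>

// Toy instruction: a name plus a flag saying whether it terminates a block.
struct Inst {
  std::string Name;
  bool IsTerminator;
};

// In this toy model a block is well formed if no non-terminator
// instruction appears after a terminator.
bool isWellFormedBlock(const std::vector<Inst> &Block) {
  bool SeenTerminator = false;
  for (const Inst &I : Block) {
    if (SeenTerminator && !I.IsTerminator)
      return false; // ordinary instruction after a terminator
    if (I.IsTerminator)
      SeenTerminator = true;
  }
  return true;
}
```

Under this model, a block holding a terminator UNREACHABLE followed by an ordinary instruction is rejected, while the same block with a non-terminator DEBUG_UNREACHABLE in its place is accepted.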

Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D81055
The file was modified llvm/lib/Target/WebAssembly/WebAssemblyInstrControl.td
The file was modified llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
The file was added llvm/test/CodeGen/WebAssembly/debugtrap.ll
Commit bd1c03d7b7c8bdd80b534cf2fa956c36a2f8249f by a.bataev
[OPENMP50]Codegen for inscan reductions in worksharing directives.

Summary:
Implemented codegen for reduction clauses with inscan modifiers in
worksharing constructs.

Emits the code for the directive with inscan reductions.
The code is the following:
```
size num_iters = <num_iters>;
<type> buffer[num_iters];
for (i: 0..<num_iters>) {
  <input phase>;
  buffer[i] = red;
}
for (int k = 0; k != ceil(log2(num_iters)); ++k)
  for (size cnt = last_iter; cnt >= pow(2, k); --cnt)
    buffer[cnt] op= buffer[cnt-pow(2,k)];
for (i: 0..<num_iters>) {
  red = InclusiveScan ? buffer[i] : buffer[i-1];
  <scan phase>;
}
```
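
A runnable sketch of the same scheme, specialised to `int` and `+`, may help when reading the generated code. It is an illustration only, not the code Clang emits: the function names are hypothetical, the inner loop decrements and indexes by its own counter, and the exclusive case uses the identity value 0 for the first iteration, which is an assumption since the pseudocode does not say what `buffer[i-1]` means at `i == 0`.

```cpp
#include <cstddef>
#include <vector>

// In-place inclusive prefix sum using the doubling passes from the
// pseudocode. Stride plays the role of pow(2, k). Walking Cnt downward
// means Buffer[Cnt - Stride] still holds the previous pass's value
// when it is read.
void inclusiveScanInPlace(std::vector<int> &Buffer) {
  const std::size_t N = Buffer.size();
  for (std::size_t Stride = 1; Stride < N; Stride *= 2)
    for (std::size_t Cnt = N - 1; Cnt >= Stride; --Cnt)
      Buffer[Cnt] += Buffer[Cnt - Stride];
}

// Values of `red` seen by each iteration of the scan phase.
std::vector<int> scanValues(std::vector<int> Input, bool InclusiveScan) {
  inclusiveScanInPlace(Input);
  if (InclusiveScan)
    return Input;
  // Exclusive scan: iteration i sees the inclusive result of i - 1.
  // Assumption: iteration 0 sees the identity of the operation (0 for +).
  std::vector<int> Out(Input.size(), 0);
  for (std::size_t I = 1; I < Input.size(); ++I)
    Out[I] = Input[I - 1];
  return Out;
}
```

The outer loop runs ceil(log2(num_iters)) times because the stride doubles until it reaches the buffer length.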

Reviewers: jdoerfert

Subscribers: yaxunl, guansong, arphaman, cfe-commits, caomhin

Tags: #clang

Differential Revision: https://reviews.llvm.org/D79948
The file was modified clang/lib/Serialization/ASTReader.cpp
The file was modified clang/tools/libclang/CIndex.cpp
The file was modified clang/lib/CodeGen/CodeGenFunction.h
The file was modified clang/test/OpenMP/scan_messages.cpp
The file was modified clang/lib/Serialization/ASTWriter.cpp
The file was modified clang/include/clang/AST/OpenMPClause.h
The file was modified clang/lib/CodeGen/CGStmt.cpp
The file was modified clang/include/clang/AST/RecursiveASTVisitor.h
The file was modified clang/lib/CodeGen/CGStmtOpenMP.cpp
The file was added clang/test/OpenMP/for_scan_codegen.cpp
The file was modified clang/lib/Sema/SemaOpenMP.cpp
The file was modified clang/lib/AST/StmtProfile.cpp
The file was modified clang/lib/AST/OpenMPClause.cpp
Commit 16acc12e1d6b549755674b63fa9d8cbfe217d316 by Matthew.Arsenault
AMDGPU/GlobalISel: Fix trying to use wave32 for gfx9 test
The file was modified llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-ptr-add.mir
Commit fe0d5121fa97cd0bbf6d310a2536cc36b435cf5b by Matthew.Arsenault
AMDGPU/GlobalISel: Fix making LDS FP atomics legal on SI/CI
The file was modified llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-atomicrmw-fadd-local.mir
The file was modified llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-atomicrmw-fadd.mir
The file was modified llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
Commit 54a8a8d5095fad1993ac3afaf04eb23f3ae06dcb by Matthew.Arsenault
AMDGPU: Fix using unencodable instructions in tests

There are a number of MIR tests using instructions on subtargets where
they don't really exist. These are some of the easy cases that don't
require splitting up test functions.
The file was modified llvm/test/CodeGen/AMDGPU/fold-sgpr-multi-imm.mir
The file was modified llvm/test/CodeGen/AMDGPU/remove-short-exec-branches-gpr-idx-mode.mir
The file was modified llvm/test/CodeGen/AMDGPU/waitcnt-preexisting.mir
The file was modified llvm/test/CodeGen/AMDGPU/i1_copy_phi_with_phi_incoming_value.mir
The file was modified llvm/test/CodeGen/AMDGPU/wqm.mir
The file was modified llvm/test/CodeGen/AMDGPU/vmem-to-salu-hazard.mir
The file was modified llvm/test/CodeGen/AMDGPU/s_add_co_pseudo_lowering.mir
The file was modified llvm/test/CodeGen/AMDGPU/memory_clause.mir
The file was modified llvm/test/CodeGen/AMDGPU/buffer-intrinsics-mmo-offsets.ll
The file was modified llvm/test/CodeGen/AMDGPU/shrink-carry.mir
The file was modified llvm/test/CodeGen/AMDGPU/smrd-fold-offset.mir
The file was modified llvm/test/CodeGen/AMDGPU/scalar-store-cache-flush.mir
The file was modified llvm/test/CodeGen/AMDGPU/waitcnt-loop-single-basic-block.mir
Commit 263390d4f5f23967a31af09eb6e0c12e633d6104 by Yaxun.Liu
[CUDA][HIP] Fix implicit HD function resolution

recommit e03394c6a6ff with fix

When implicit HD function calls a function in device compilation,
if one candidate is an implicit HD function, current resolution rule is:

D wins over HD and H
HD and H are equal

this caused regression when there is an otherwise worse D candidate

This patch changes that to

D, HD and H are all equal

The rationale is that we already know for host compilation there is already
a valid candidate in HD and H candidates that will not cause error. Allowing
HD and H gives us a fall back candidate that will not cause error. If D wins,
that means D has to be a better match otherwise, therefore D should also
be a valid candidate that will not cause error. In this way, we can guarantee
no regression.
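
The effect of the rule change can be sketched with a toy model. Everything here is an assumption made for illustration: `Candidate`, the single `Cost` number standing in for the ordinary overload ranking, and the `pickOld`/`pickNew` helpers. Clang's real resolution uses many more tie-breakers, and the rule only applies when the caller is an implicit HD function in device compilation.

```cpp
#include <cstddef>
#include <vector>

enum class Target { Device, Host, HostDevice };

// Toy candidate: its target attribute and a single number standing in
// for the ordinary overload ranking (lower is a better match).
struct Candidate {
  Target T;
  int Cost;
};

// Old rule in this toy model: a D candidate beats any HD or H candidate
// before the ordinary ranking is consulted. Requires a non-empty list.
std::size_t pickOld(const std::vector<Candidate> &Cs) {
  std::size_t Best = 0;
  for (std::size_t I = 1; I < Cs.size(); ++I) {
    bool IsDev = Cs[I].T == Target::Device;
    bool BestIsDev = Cs[Best].T == Target::Device;
    if (IsDev != BestIsDev) {
      if (IsDev)
        Best = I;
      continue;
    }
    if (Cs[I].Cost < Cs[Best].Cost)
      Best = I;
  }
  return Best;
}

// New rule: D, HD and H are treated as equal, so only the ordinary
// ranking decides. Requires a non-empty list.
std::size_t pickNew(const std::vector<Candidate> &Cs) {
  std::size_t Best = 0;
  for (std::size_t I = 1; I < Cs.size(); ++I)
    if (Cs[I].Cost < Cs[Best].Cost)
      Best = I;
  return Best;
}
```

With a D candidate of cost 2 and an H candidate of cost 0, `pickOld` selects the D candidate even though it is the worse match, which is the regression described above; `pickNew` selects the H candidate.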

Differential Revision: https://reviews.llvm.org/D80450
The file was modified clang/lib/Sema/SemaOverload.cpp
The file was modified clang/lib/Sema/SemaCUDA.cpp
The file was modified clang/test/SemaCUDA/function-overload.cu
The file was modified clang/include/clang/Sema/Sema.h
Commit a014fbbc219fc8e1dbce382fd6f9280c3b720219 by tianshilei1992
[OpenMP] Improve D2D memcpy to use more efficient driver API

Summary:
In the current implementation, a D2D memcpy first copies data back to the host and
then copies it from the host to the device. This is very inefficient if the device
supports D2D memcpy, like CUDA.

In this patch, D2D memcpy first tries to use the natively supported driver API. If
that fails, it falls back to the original way. It is worth noting that D2D memcpy in
this scenario covers two cases:
- Same device: this is the D2D memcpy in the CUDA context.
- Different devices: this is the PeerToPeer memcpy in the CUDA context.
My implementation merges these two parts. It chooses the best API according to
the source device and destination device.
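
The try-native-then-fall-back control flow can be sketched as follows. This is a host-only illustration, not libomptarget code: `NativeCopyFn`, `CopyPath`, and `deviceToDeviceCopy` are hypothetical names, and plain `std::memcpy` stands in for the device-to-host and host-to-device transfers.

```cpp
#include <cstddef>
#include <cstring>
#include <functional>
#include <vector>

// Hypothetical native copy hook: returns true on success. An empty
// std::function models a plugin without native device-to-device support.
using NativeCopyFn =
    std::function<bool(void *Dst, const void *Src, std::size_t Size)>;

enum class CopyPath { Native, ViaHost };

// Try the native path first; if it is missing or fails, stage the data
// through a host buffer (two copies instead of one).
CopyPath deviceToDeviceCopy(void *Dst, const void *Src, std::size_t Size,
                            const NativeCopyFn &Native) {
  if (Native && Native(Dst, Src, Size))
    return CopyPath::Native;

  if (Size != 0) {
    std::vector<unsigned char> HostStaging(Size);
    std::memcpy(HostStaging.data(), Src, Size); // stands in for device -> host
    std::memcpy(Dst, HostStaging.data(), Size); // stands in for host -> device
  }
  return CopyPath::ViaHost;
}
```

Returning which path was taken is a choice made for this sketch so the fallback is observable; the commit message does not describe how the real implementation reports it.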

Reviewers: jdoerfert, AndreyChurbanov, grokos

Reviewed By: jdoerfert

Subscribers: yaxunl, guansong, sstefan1, openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D80649
The file was modified openmp/libomptarget/src/rtl.h
The file was modified openmp/libomptarget/src/device.cpp
The file was modified openmp/libomptarget/src/rtl.cpp
The file was added openmp/libomptarget/test/offloading/d2d_memcpy.c
The file was modified openmp/libomptarget/plugins/exports
The file was modified openmp/libomptarget/src/api.cpp
The file was modified openmp/libomptarget/include/omptargetplugin.h
The file was modified openmp/libomptarget/src/device.h
The file was modified openmp/libomptarget/plugins/cuda/src/rtl.cpp
Commit d51054217403b47f452619e11318bd214749a845 by phosek
[Fuchsia] Rely on linker switch rather than dead code ref for profile runtime

Follow the model used on Linux, where the clang driver passes the
linker a -u switch to force the profile runtime to be linked in,
rather than having every TU emit a dead function with a reference.

Patch By: mcgrathr

Differential Revision: https://reviews.llvm.org/D79835
The file was modified clang/lib/Driver/ToolChains/Fuchsia.h
The file was modified clang/test/Driver/fuchsia.c
The file was modified clang/lib/Driver/ToolChains/Fuchsia.cpp
The file was modified llvm/test/Instrumentation/InstrProfiling/linkage.ll
The file was modified llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp
Commit 192cb718361dbd7be082bc0893f43bbc9782288f by spatel
[InstCombine] avoid crashing on select-shuffle detection

As mentioned in the post-commit comments of D81013 -
the mask check API has to assume the shuffle is
not length-changing, but we have not ruled that out
in this code. Use the ShuffleVectorInst call instead.
The file was modified llvm/test/Transforms/InstCombine/select-select.ll
The file was modified llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
Commit d259668731f4a7d6f477cefff102fe1e0b86f461 by Matthew.Arsenault
AMDGPU: Set mayRaiseFPException

This may be missing a few overrides to set it off still in some
special cases. Since the flags set during selection should now be
reliably preserved, this should not change codegen for non-strictfp
functions.
The file was modified llvm/lib/Target/AMDGPU/VOP3Instructions.td
The file was modified llvm/lib/Target/AMDGPU/VOP2Instructions.td
The file was modified llvm/lib/Target/AMDGPU/VOPInstructions.td
The file was modified llvm/lib/Target/AMDGPU/VOP1Instructions.td
Commit b71f574e7fab8c01867d2c84296c9e3657c22409 by Matthew.Arsenault
AMDGPU: Add test for fdiv nofpexcept preservation

This logically belongs with 89d48ccabe6a950369b2bd922b1d8e987b856ac7,
but this order was needed to avoid regressions before adding
mayRaiseFPExceptions to relevant instructions.
The file was added llvm/test/CodeGen/AMDGPU/fdiv-nofpexcept.ll
Commit 5c990d6994559225466cb256146f6440431b229e by diego.caballero
[mlir] Add support for bf16 to StandardToLLVM conversion

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D81127
The file was modified mlir/lib/Conversion/StandardToLLVM/StandardToLLVM.cpp
The file was modified mlir/lib/Dialect/LLVMIR/IR/LLVMDialect.cpp
The file was modified mlir/include/mlir/Dialect/LLVMIR/LLVMDialect.h
The file was modified mlir/test/Conversion/StandardToLLVM/convert-to-llvmir.mlir
Commit ae26c064ce98a0886bbb6fa3d4bbe123dcec8ffd by Matthew.Arsenault
AMDGPU: Select strict_fadd
The file was modified llvm/lib/Target/AMDGPU/VOP3Instructions.td
The file was added llvm/test/CodeGen/AMDGPU/strict_fadd.f16.ll
The file was modified llvm/lib/Target/AMDGPU/VOP3PInstructions.td
The file was added llvm/test/CodeGen/AMDGPU/strict_fadd.f32.ll
The file was added llvm/test/CodeGen/AMDGPU/strict_fadd.f64.ll
The file was modified llvm/lib/Target/AMDGPU/VOP2Instructions.td
Commit 483d4daa5e9b22b553a45eb91fe735fe81f6f067 by Matthew.Arsenault
AMDGPU: Select strict_fma

Like with strict_fadd, the legalization is scalarizing the v4f16 when
it should split.
The file was added llvm/test/CodeGen/AMDGPU/strict_fma.f64.ll
The file was added llvm/test/CodeGen/AMDGPU/strict_fma.f16.ll
The file was added llvm/test/CodeGen/AMDGPU/strict_fma.f32.ll
The file was modified llvm/lib/Target/AMDGPU/VOP3Instructions.td
The file was modified llvm/lib/Target/AMDGPU/VOP3PInstructions.td
Commit 651c36b5086db9038591bd1ac387dcb492d011f8 by Matthew.Arsenault
AMDGPU: Select strict_fmul
The file was added llvm/test/CodeGen/AMDGPU/strict_fmul.f32.ll
The file was added llvm/test/CodeGen/AMDGPU/strict_fmul.f16.ll
The file was modified llvm/lib/Target/AMDGPU/VOP2Instructions.td
The file was added llvm/test/CodeGen/AMDGPU/strict_fmul.f64.ll
The file was modified llvm/lib/Target/AMDGPU/VOP3PInstructions.td
Commit 1657f0ebc2b4d3ec5b9e717119238e9198ad203c by Matthew.Arsenault
AMDGPU: Fix overriding global FP atomic feature predicates

Global TableGen let override blocks are pretty dangerous and override
any local special cases. In this case, the broader HasFlatGlobalInsts
was overriding the more specific predicate for
FeatureAtomicFaddInsts. Make sure HasFlatGlobalInsts is implied by
FeatureAtomicFaddInsts, and make sure the right predicate is used.

One issue with independently setting the subtarget features on
incompatible targets is all of the encoding families do not define all
opcodes. This will hit an assert on gfx10 for example, since we set
the encoding independently based on the generation and not based on a
feature.
The file was modified llvm/lib/Target/AMDGPU/FLATInstructions.td
The file was modified llvm/test/CodeGen/AMDGPU/global-atomics-fp.ll
The file was modified llvm/lib/Target/AMDGPU/AMDGPU.td
The file was modified llvm/test/CodeGen/AMDGPU/llvm.amdgcn.atomic.fadd.ll