Changes

Summary

  1. [mlir][memref] Improve canonicalization of memref.clone
  2. [SLP]Improve handling of compensate external uses cost.
  3. AMDGPU/GlobalISel: Add subtarget to a test
  4. [LoopUnrollAndJam] Change LoopUnrollAndJamPass to LoopNest pass
  5. [X86] Pre-commit test for D90901
  6. [X86] Don't fold (fneg (fma (fneg X), Y, (fneg Z))) to (fma X, Y, Z)
  7. [libomptarget][amdgpu] Mark alloc, free weak to facilitate local experimentation
  8. [X86] Lower calls with clang.arc.attachedcall bundle
  9. [MLIR][GPU][NVVM] Add conversion of warp synchronous matrix-multiply accumulate GPU ops
  10. [mlir] Add support for fusion into TiledLoopOp.
  11. [PowerPC] Add stack guard tests
  12. Move a definition into cpp from header in advance of other changes [nfc]
  13. [mlir] NFC: Expose tiled_loop->scf pattern.
Commit 90e55dfcf4bec092ca63ba540e833ed42ee169bf by herhut
[mlir][memref] Improve canonicalization of memref.clone

The previous implementation did not handle casting behavior properly and
did not consider aliases.

Differential Revision: https://reviews.llvm.org/D102785
The file was modified mlir/lib/Dialect/MemRef/IR/MemRefOps.cpp
The file was modified mlir/lib/Transforms/BufferDeallocation.cpp
The file was modified mlir/test/Dialect/MemRef/canonicalize.mlir
Commit 8dab25954b0acb53731c4aa73e9a7f4f98263030 by a.bataev
[SLP]Improve handling of compensate external uses cost.

External insertelement users can also be represented as the result of a
shuffle of the vectorized element and non-consecutive insertelements. Added
support for handling non-consecutive insertelements.

Differential Revision: https://reviews.llvm.org/D101555
The file was modified llvm/test/Transforms/SLPVectorizer/X86/insert-element-build-vector.ll
The file was modified llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
Commit 7521fcd2698740cbb81495de7dfe1a3a4b39b21b by konndennsa
AMDGPU/GlobalISel: Add subtarget to a test

SelectionDAG forces us to have a weird ABI for 16-bit values without
legal 16-bit operations, but currently GlobalISel bypasses this and
sometimes ends up using the gfx8+ ABI in some contexts. Make sure
we're testing the normal ABI to avoid a test change in a future patch.
The file was modified llvm/lib/CodeGen/RegAllocBase.cpp
Commit cea7a3fe3d1fc91a00cb54cee3ac6f361343417e by konndennsa
[LoopUnrollAndJam] Change LoopUnrollAndJamPass to LoopNest pass

This patch changes LoopUnrollAndJamPass from FunctionPass to LoopNest pass.
The next patch will utilize LoopNest to effectively handle loop nests.

Reviewed By: Whitney

Differential Revision: https://reviews.llvm.org/D99149
The file was modified llvm/lib/Transforms/Scalar/LoopUnrollAndJamPass.cpp
The file was modified llvm/lib/Passes/PassBuilder.cpp
The file was modified llvm/include/llvm/Transforms/Scalar/LoopUnrollAndJamPass.h
The file was modified llvm/lib/Passes/PassRegistry.def
The file was modified llvm/test/Transforms/LoopUnrollAndJam/innerloop.ll
Commit 35e5c3310fb0d974b6a151c775ac46a3c3a0e151 by jim
[X86] Pre-commit test for D90901

Differential Revision: https://reviews.llvm.org/D102621
The file was added llvm/test/CodeGen/X86/fma-signed-zero.ll
Commit 445680593889199667d60207e302bc870f650fa5 by jim
[X86] Don't fold (fneg (fma (fneg X), Y, (fneg Z))) to (fma X, Y, Z)

Check for the no-signed-zeros flag (nsz) in getNegatedExpression for X86
before folding. This patch fixes a miscompilation: https://alive2.llvm.org/ce/z/XxwBAJ
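
To see why the fold is only safe under nsz, it may help to evaluate both forms on zero inputs. The following is a minimal sketch, not code from the patch: `unfused_fma`, `original`, and `folded` are hypothetical helper names, and the fused multiply-add is modeled with a plain multiply and add, which is enough here because the sign of a zero result does not depend on fused rounding.

```python
import math


def unfused_fma(x, y, z):
    # Hypothetical stand-in for fma(x, y, z); the zero-sign behaviour matches
    # a fused operation for the inputs used below.
    return x * y + z


def original(x, y, z):
    # The expression before the fold: fneg(fma(fneg X, Y, fneg Z)).
    return -unfused_fma(-x, y, -z)


def folded(x, y, z):
    # The expression after the fold: fma(X, Y, Z).
    return unfused_fma(x, y, z)


def sign_of(value):
    # copysign exposes the sign bit even when value is a zero.
    return math.copysign(1.0, value)


x, y, z = 0.0, 0.0, -0.0
print(sign_of(original(x, y, z)), sign_of(folded(x, y, z)))
```

With X = 0.0, Y = 0.0, Z = -0.0 the two forms produce zeros of opposite sign, so they only agree when signed zeros may be ignored.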

Reviewed By: RKSimon, spatel

Differential Revision: https://reviews.llvm.org/D90901
The file was modified llvm/test/CodeGen/X86/avx2-fma-fneg-combine.ll
The file was modified llvm/test/CodeGen/X86/fma-signed-zero.ll
The file was modified llvm/test/CodeGen/X86/fma_patterns.ll
The file was modified llvm/lib/Target/X86/X86ISelLowering.cpp
The file was modified llvm/test/CodeGen/X86/fma-fneg-combine.ll
The file was modified llvm/test/CodeGen/X86/fma_patterns_wide.ll
Commit d54712ab4deb5b83ef58db73ce18ed466201f4e1 by jonathanchesterfield
[libomptarget][amdgpu] Mark alloc, free weak to facilitate local experimentation

There are many ways we might implement the devicertl local alloc and free
functions: via the host, via local buffers (stack or arena), by specialising
per kernel, and so on. It is not yet clear what the right design is. This
change makes the alloc and free functions weak, so one can override them from
local tests while comparing options.

Not strictly necessary, as a comparable patch can be applied locally each time,
but it would be convenient for out-of-tree development. The plan is to drop the
weak attribute at the same time as introducing a working allocator to trunk.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D102499
The file was modified openmp/libomptarget/deviceRTLs/amdgcn/src/target_impl.hip
Commit c2d44bd2309c1e232d900fd6979aba320c913357 by flo
[X86] Lower calls with clang.arc.attachedcall bundle

This patch adds support for lowering function calls with the
`clang.arc.attachedcall` bundle. The goal is to expand such calls to the
following sequence of instructions:

    callq   @fn
    movq  %rax, %rdi
    callq   _objc_retainAutoreleasedReturnValue / _objc_unsafeClaimAutoreleasedReturnValue

This sequence of instructions triggers Objective-C runtime optimizations,
hence we want to ensure no instructions get moved in between them.
This patch achieves that by adding a new CALL_RVMARKER ISD node,
which gets turned into the CALL64_RVMARKER pseudo, which eventually gets
expanded into the sequence mentioned above.

The ObjC runtime function to call is determined by the
argument in the bundle, which is passed through as a
target constant to the pseudo.

@ahatanak is working on using this attribute in the front- & middle-end.

Together with the front- & middle-end changes, this should address
PR31925 for X86.

This is the X86 version of 46bc40e50246c1902a1ca7916c8286cb837643ee,
which added similar support for AArch64.

Reviewed By: ab

Differential Revision: https://reviews.llvm.org/D94597
The file was modified llvm/lib/Target/X86/X86InstrCompiler.td
The file was modified llvm/lib/Target/X86/X86InstrInfo.td
The file was modified llvm/lib/Target/X86/X86ExpandPseudo.cpp
The file was modified llvm/lib/Target/X86/X86ISelLowering.h
The file was added llvm/test/CodeGen/X86/expand-call-rvmarker.mir
The file was modified llvm/test/CodeGen/X86/call-rv-marker.ll
The file was modified llvm/lib/Target/X86/X86ISelLowering.cpp
The file was modified llvm/lib/Target/X86/X86InstrControl.td
Commit eaaf7a6a09da905cc314201f93e2be11773726a0 by uday
[MLIR][GPU][NVVM] Add conversion of warp synchronous matrix-multiply accumulate GPU ops

Add conversion of warp synchronous matrix-multiply accumulate GPU ops to
NVVM ops. The following conversions are added:
  1.) subgroup_mma_load_matrix -> wmma.m16n16k16.load.[a,b,c]..row.stride
  2.) subgroup_mma_store_matrix -> wmma.m16n16k16.store.d.[f16,f32].row.stride
  3.) subgroup_mma_compute -> wmma.m16n16k16.mma.row.row.[f16,f32].[f16,f32]
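
As a rough illustration of what the m16n16k16 compute op produces, here is a scalar reference for a single tile, D = A x B + C. It is only a sketch under stated assumptions: `mma_tile` is a hypothetical helper, not part of MLIR or NVVM, and it ignores the warp-cooperative data layout and the f16/f32 precision of the real operations.

```python
def mma_tile(a, b, c):
    """Return D = A x B + C for square tiles given as lists of rows."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) + c[i][j] for j in range(n)]
        for i in range(n)
    ]


N = 16  # tile edge implied by the m16n16k16 shape
identity = [[1.0 if i == j else 0.0 for j in range(N)] for i in range(N)]
b = [[float(i * N + j) for j in range(N)] for i in range(N)]
c = [[1.0] * N for _ in range(N)]

# With A set to the identity, every element of D is the matching element of B plus 1.
d = mma_tile(identity, b, c)
print(d[2][3])
```

In terms of the ops listed above, the load ops would supply the A, B, and C tiles, the compute op performs the multiply-accumulate, and the store op writes D back.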

Reviewed By: bondhugula, ftynse

Differential Revision: https://reviews.llvm.org/D95331
The file was modified mlir/include/mlir/Conversion/GPUToNVVM/GPUToNVVMPass.h
The file was added mlir/lib/Conversion/GPUToNVVM/WmmaOpsToNvvm.cpp
The file was modified mlir/lib/Conversion/GPUToNVVM/LowerGpuOpsToNVVMOps.cpp
The file was added mlir/test/Conversion/GPUToNVVM/wmma-ops-to-nvvm.mlir
The file was modified mlir/lib/Conversion/GPUToNVVM/CMakeLists.txt
Commit 9ecc8178d72097c8f9e31ea7c50085748d187aff by pifon
[mlir] Add support for fusion into TiledLoopOp.

Differential Revision: https://reviews.llvm.org/D102722
The file was modified mlir/tools/mlir-opt/mlir-opt.cpp
The file was modified mlir/include/mlir/Dialect/Linalg/IR/LinalgOps.td
The file was modified mlir/lib/Dialect/Linalg/Transforms/Fusion.cpp
The file was modified mlir/test/lib/Dialect/Linalg/TestLinalgFusionTransforms.cpp
The file was modified mlir/test/Dialect/Linalg/fusion-tensor-pattern.mlir
Commit f8bb0d97cb99adf7857854a18f96f8d499fff07d by Jinsong Ji
[PowerPC] Add stack guard tests

Copied from X86 to test the powerpc triple.
This prepares for the AIX stack guard tests.
The file was added llvm/test/CodeGen/PowerPC/stack-guard-oob.ll
Commit cc5f6ae4b4a2f812cc8d3964532c60a337fa79e9 by listmail
Move a definition into cpp from header in advance of other changes [nfc]
The file was modified llvm/include/llvm/Analysis/LoopUnrollAnalyzer.h
The file was modified llvm/lib/Analysis/LoopUnrollAnalyzer.cpp
Commit 335fa1802854d651b89e4e79916a10ca87795ff2 by pifon
[mlir] NFC: Expose tiled_loop->scf pattern.

Differential Revision: https://reviews.llvm.org/D102921
The file was modified mlir/lib/Dialect/Linalg/Transforms/Loops.cpp
The file was modified mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h