Commit
123e8dfcf86a74eb7ba08f33681df581d1be9dbd
by ajcbik: [mlir][sparse] add support for std unary operations
Adds zero-preserving unary operators from std. Also adds xor. Performs minor refactoring to remove the "zero" node, and pushes the irregular logic for negi (not supported in std) into one place.
Reviewed By: gussmith23
Differential Revision: https://reviews.llvm.org/D105928
|
 | mlir/test/Dialect/SparseTensor/sparse_int_ops.mlir |
 | mlir/test/Dialect/SparseTensor/sparse_fp_ops.mlir |
 | mlir/include/mlir/Dialect/SparseTensor/Utils/Merger.h |
 | mlir/lib/Dialect/SparseTensor/Transforms/Sparsification.cpp |
 | mlir/lib/Dialect/SparseTensor/Utils/Merger.cpp |
 | mlir/unittests/Dialect/SparseTensor/MergerTest.cpp |
Commit
f2b5e438aa3620cd60d115cad8dcb39cc417c8a8
by ravishankarm: [mlir][Tensor] Implement `reifyReturnTypeShapesPerResultDim` for `tensor.insert_slice`.
Differential Revision: https://reviews.llvm.org/D105852
|
 | mlir/include/mlir/Dialect/Tensor/IR/Tensor.h |
 | mlir/test/Dialect/Tensor/resolve-shaped-type-result-dims.mlir |
 | utils/bazel/llvm-project-overlay/mlir/BUILD.bazel |
 | mlir/include/mlir/Dialect/Tensor/IR/TensorOps.td |
 | mlir/lib/Dialect/Tensor/IR/CMakeLists.txt |
 | mlir/lib/Dialect/Tensor/IR/TensorOps.cpp |
Commit
18c19414eb70578d4c487d6f4b0f438aead71d6a
by wei.huang: [PowerPC] Add PowerPC compare and multiply related builtins and intrinsics for XL compatibility
This patch is part of a series to provide builtins for compatibility with the XL compiler. It adds the builtins and intrinsics for compare and multiply related operations.
Reviewed By: nemanjai, #powerpc
Differential revision: https://reviews.llvm.org/D102875
|
 | llvm/test/CodeGen/PowerPC/builtins-ppc-xlcompat-multiply-64bit-only.ll |
 | llvm/lib/Target/PowerPC/PPCInstr64Bit.td |
 | llvm/test/CodeGen/PowerPC/builtins-ppc-xlcompat-compare-64bit-only.ll |
 | clang/test/CodeGen/builtins-ppc-xlcompat-pwr9-error.c |
 | clang/lib/Basic/Targets/PPC.cpp |
 | clang/lib/Sema/SemaChecking.cpp |
 | clang/test/CodeGen/builtins-ppc-xlcompat-multiply.c |
 | clang/test/CodeGen/builtins-ppc-xlcompat-pwr9.c |
 | clang/include/clang/Basic/BuiltinsPPC.def |
 | llvm/test/CodeGen/PowerPC/builtins-ppc-xlcompat-multiply.ll |
 | llvm/include/llvm/IR/IntrinsicsPowerPC.td |
 | llvm/test/CodeGen/PowerPC/builtins-ppc-xlcompat-compare.ll |
 | clang/test/CodeGen/builtins-ppc-xlcompat-pwr9-64bit.c |
 | clang/test/CodeGen/builtins-ppc-xlcompat-multiply-64bit-only.c |
 | llvm/lib/Target/PowerPC/PPCInstrInfo.td |
Commit
9955c652eafdcb5f1d16ee3db857f03ee7e5cfbc
by gcmn: [NFC][MLIR][std] Clean up ArithmeticCastOps
The documentation on these was out of sync with the implementation. Also, the declaration of inputs was repeated even though it is already part of the ArithmeticCastOp definition.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D105934
|
 | mlir/include/mlir/Dialect/StandardOps/IR/Ops.td |
Commit
5df99954392e3a4448e4ff43d4cf644bc06bfa92
by Vitaly Buka: [NFC][sanitizer] Rename some MemoryMapper members
Part of D105778
|
 | compiler-rt/lib/sanitizer_common/sanitizer_allocator_primary64.h |
Commit
afa3fedcda98db4d47694ed596270a5396074224
by Vitaly Buka: [NFC][sanitizer] Extract DrainHalfMax
Part of D105778
|
 | compiler-rt/lib/sanitizer_common/sanitizer_allocator_local_cache.h |
Commit
bb8c7a980fe487eb322d38641db9145a6b6cb1d4
by efriedma: [ScalarEvolution] Make isKnownNonZero handle more cases.
Using an unsigned range instead of a signed range is a bit more precise.
Differential Revision: https://reviews.llvm.org/D105941
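As a rough illustration of why an unsigned range can be more precise (a toy 8-bit model, not SCEV's actual ConstantRange machinery): a value set that wraps the signed boundary but not the unsigned one gets a tight unsigned interval that excludes zero, while the tightest signed interval is useless.

```python
BITS = 8
M = 1 << BITS

def to_signed(v):
    """Reinterpret an unsigned BITS-bit value as two's-complement."""
    return v - M if v >= M // 2 else v

# Value set: every 8-bit value from 100 to 200 (unsigned).
values = list(range(100, 201))

# Tightest unsigned interval covering the set: [100, 200] -> excludes 0,
# so the value is provably nonzero.
u_lo, u_hi = min(values), max(values)

# Reinterpreted as signed, the set spans 100..127 and -128..-56, so the
# tightest signed interval is [-128, 127] -> it contains 0 and proves nothing.
s_vals = [to_signed(v) for v in values]
s_lo, s_hi = min(s_vals), max(s_vals)

print((u_lo, u_hi), (s_lo, s_hi))  # (100, 200) (-128, 127)
```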
|
 | llvm/test/Analysis/ScalarEvolution/trip-count9.ll |
 | llvm/lib/Analysis/ScalarEvolution.cpp |
Commit
eebe841a47cbbd55bdcc32da943c92d18f88a5b8
by Matthew.Arsenault: RegAlloc: Allow targets to split register allocation
AMDGPU normally spills SGPRs to VGPRs. Previously, since all register classes were handled at the same time, this was problematic. We don't know ahead of time how many registers will need to be reserved to handle the spilling. If no VGPRs were left for spilling, we would have to try to spill to memory. If the spilled SGPRs were required for exec mask manipulation, this is highly problematic because the lanes active at the point of spill are not necessarily the same as at the restore point.
Avoid this problem by fully allocating SGPRs in a separate regalloc run from VGPRs. This way we know the exact number of VGPRs needed, and can reserve them for a second run. This fixes the most serious issues, but it is still possible using inline asm to make all VGPRs unavailable. Start erroring in the case where we ever would require memory for an SGPR spill.
This is implemented by giving each regalloc pass a callback which reports if a register class should be handled or not. A few passes need some small changes to deal with leftover virtual registers.
In the AMDGPU implementation, a new pass is introduced to take the place of PrologEpilogInserter for SGPR spills emitted during the first run.
One disadvantage is that StackSlotColoring is currently no longer used for SGPR spills. It would need to be run again, which will require more work.
Error if the standard -regalloc option is used. Introduce new separate -sgpr-regalloc and -vgpr-regalloc flags, so the two runs can be controlled individually. PBQP is not currently supported, so this also prevents using the unhandled allocator.
|
 | llvm/lib/CodeGen/TargetPassConfig.cpp |
 | llvm/include/llvm/CodeGen/RegAllocCommon.h |
 | llvm/test/CodeGen/AMDGPU/llc-pipeline.ll |
 | llvm/test/CodeGen/AMDGPU/sgpr-regalloc-flags.ll |
 | llvm/test/CodeGen/AMDGPU/stack-slot-color-sgpr-vgpr-spills.mir |
 | llvm/test/CodeGen/AMDGPU/gfx-callable-argument-types.ll |
 | llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp |
 | llvm/test/CodeGen/AMDGPU/vgpr-tuple-allocation.ll |
 | llvm/lib/Target/AMDGPU/SIRegisterInfo.h |
 | llvm/test/CodeGen/AMDGPU/indirect-call.ll |
 | llvm/test/CodeGen/AMDGPU/agpr-csr.ll |
 | llvm/test/CodeGen/AMDGPU/spill-empty-live-interval.mir |
 | llvm/include/llvm/CodeGen/RegAllocRegistry.h |
 | llvm/test/CodeGen/AMDGPU/sgpr-spill-wrong-stack-id.mir |
 | llvm/lib/Target/AMDGPU/SIFrameLowering.cpp |
 | llvm/test/CodeGen/AMDGPU/GlobalISel/extractelement-stack-lower.ll |
 | llvm/test/CodeGen/AMDGPU/attr-amdgpu-flat-work-group-size-vgpr-limit.ll |
 | llvm/include/llvm/CodeGen/Passes.h |
 | llvm/test/CodeGen/AMDGPU/unstructured-cfg-def-use-issue.ll |
 | llvm/test/CodeGen/AMDGPU/spill-scavenge-offset.ll |
 | llvm/test/CodeGen/AMDGPU/alloc-aligned-tuples-gfx90a.mir |
 | llvm/lib/Target/AMDGPU/SILowerSGPRSpills.cpp |
 | llvm/test/CodeGen/AMDGPU/pei-build-spill.mir |
 | llvm/lib/CodeGen/RegAllocBasic.cpp |
 | llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp |
 | llvm/test/CodeGen/AMDGPU/alloc-aligned-tuples-gfx908.mir |
 | llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp |
 | llvm/lib/CodeGen/RegAllocBase.h |
 | llvm/lib/CodeGen/LiveIntervals.cpp |
 | llvm/test/CodeGen/AMDGPU/callee-frame-setup.ll |
 | llvm/test/CodeGen/AMDGPU/remat-vop.mir |
 | llvm/test/CodeGen/AMDGPU/sibling-call.ll |
 | llvm/test/CodeGen/AMDGPU/virtregrewrite-undef-identity-copy.mir |
 | llvm/test/CodeGen/AMDGPU/gfx-callable-preserved-registers.ll |
 | llvm/test/CodeGen/AMDGPU/sgpr-spill-no-vgprs.ll |
 | llvm/test/CodeGen/AMDGPU/mul24-pass-ordering.ll |
 | llvm/lib/CodeGen/RegAllocGreedy.cpp |
 | llvm/test/CodeGen/AMDGPU/spill_more_than_wavesize_csr_sgprs.ll |
 | llvm/lib/CodeGen/RegAllocBase.cpp |
 | llvm/lib/CodeGen/RegAllocFast.cpp |
Commit
99aebb62fb4f2a39c7f03579facf3a1e176b245d
by Vitaly Buka: [NFC][sanitizer] Don't store region_base_ in MemoryMapper
Part of D105778
|
 | compiler-rt/lib/sanitizer_common/sanitizer_allocator_primary64.h |
 | compiler-rt/lib/sanitizer_common/tests/sanitizer_allocator_test.cpp |
Commit
0024ec59a0f3deb206a21567ac2ebe0fc097ea9d
by aeubanks: [NewPM][SimpleLoopUnswitch] Add option to not trivially unswitch
To help with debugging non-trivial unswitching issues.
The legacy pass is left alone; nobody is using it.
If a pass's string params are empty (e.g. "simple-loop-unswitch"), don't default to the empty constructor for the pass params; let the parser handle it, since the parser may have its own defaults.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D105933
|
 | llvm/lib/Passes/PassRegistry.def |
 | llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp |
 | llvm/test/Other/print-passes.ll |
 | llvm/lib/Passes/PassBuilder.cpp |
 | llvm/test/Transforms/SimpleLoopUnswitch/options.ll |
 | llvm/include/llvm/Transforms/Scalar/SimpleLoopUnswitch.h |
Commit
832ba20710ee09b00161ea72cf80c9af800fda63
by Vitaly Buka: sanitizer_common: optimize memory drain
Currently we allocate a MemoryMapper per size class. MemoryMapper mmap's and munmap's an internal buffer. This results in 50 mmap/munmap calls under the global allocator mutex. Reuse the MemoryMapper and its buffer for all size classes instead. This radically reduces the number of mmap/munmap calls. Smaller size classes tend to have more objects allocated, so it's highly likely that the buffer allocated for the first size class will be enough for all subsequent size classes.
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D105778
|
 | compiler-rt/lib/sanitizer_common/sanitizer_allocator_primary64.h |
 | compiler-rt/lib/sanitizer_common/sanitizer_allocator_local_cache.h |
Commit
3191ac27e396dbd141243b8ca6cf5660c10ddf5c
by Matthew.Arsenault: AMDGPU: Try to fix test failure with EXPENSIVE_CHECKS
The machine verifier is enabled by default for EXPENSIVE_CHECKS, so its pass runs would pollute the output here.
|
 | llvm/test/CodeGen/AMDGPU/sgpr-regalloc-flags.ll |
Commit
7140382b17df7c33145cc6e9a2df7e84a2259444
by Vitaly Buka: [NFC][sanitizer] Move MemoryMapper template parameter
|
 | compiler-rt/lib/sanitizer_common/sanitizer_allocator_primary64.h |
 | compiler-rt/lib/sanitizer_common/tests/sanitizer_allocator_test.cpp |
Commit
8725b382b0a5ea375252d966bafbace62a21e93b
by Vitaly Buka: [NFC][sanitizer] Simplify MapPackedCounterArrayBuffer
|
 | compiler-rt/lib/sanitizer_common/sanitizer_allocator_primary64.h |
 | compiler-rt/lib/sanitizer_common/tests/sanitizer_allocator_test.cpp |
Commit
5bd7cc4f42488129adb135539c64bb3933d5da4c
by Jessica Paquette: [AArch64][GlobalISel] Mark v2s64 -> v2p0 G_INTTOPTR as legal
Allow
```
%x:_<2 x p0> = G_INTTOPTR %y:_<2 x s64>
```
This shows up when building clang for AArch64 with GlobalISel.
Also show that we can select it.
This should match SDAG's behaviour: https://godbolt.org/z/33oqYoaYv
Differential Revision: https://reviews.llvm.org/D105944
|
 | llvm/test/CodeGen/AArch64/GlobalISel/legalize-inttoptr.mir |
 | llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp |
 | llvm/test/CodeGen/AArch64/GlobalISel/select-int-ptr-casts.mir |
Commit
ed430023e864c3b3ff7f47d5740e5380828c26f6
by Vitaly Buka: Revert "[NFC][sanitizer] Simplify MapPackedCounterArrayBuffer"
Does not compile.
This reverts commit 8725b382b0a5ea375252d966bafbace62a21e93b.
|
 | compiler-rt/lib/sanitizer_common/sanitizer_allocator_primary64.h |
 | compiler-rt/lib/sanitizer_common/tests/sanitizer_allocator_test.cpp |