Commit
1a406cd5f2e2403b325f6d914b692459a2ab3b9b
by joker.eph: Remove unused llvm/Support/Parallel.h from MLIR (NFC)
This header isn't needed anymore: MLIR uses a thread pool injected into the context instead of a global one.
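For illustration, a minimal sketch of the pattern described above, assuming the mlir::parallelForEach helper from mlir/IR/Threading.h; this is not code from the commit, and processOps is a made-up function:

```cpp
// Hypothetical illustration only: dispatch per-operation work through the
// context-managed thread pool (via mlir/IR/Threading.h) instead of the
// global executor behind llvm/Support/Parallel.h.
#include "llvm/ADT/ArrayRef.h"
#include "mlir/IR/MLIRContext.h"
#include "mlir/IR/Operation.h"
#include "mlir/IR/Threading.h"

void processOps(mlir::MLIRContext *ctx,
                llvm::ArrayRef<mlir::Operation *> ops) {
  // parallelForEach honors the context's multithreading setting and uses the
  // thread pool injected into (or owned by) the MLIRContext.
  mlir::parallelForEach(ctx, ops, [](mlir::Operation *op) {
    // ... per-operation work ...
    (void)op;
  });
}
```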
|
 | mlir/lib/Pass/PassCrashRecovery.cpp |
 | mlir/lib/Transforms/Inliner.cpp |
 | mlir/lib/IR/Verifier.cpp |
 | mlir/lib/Pass/Pass.cpp |
Commit
81f8ad1769665a569a235b749e0e9e69ce7dc65e
by erasmus: [flang] Make 'this_image()' an intrinsic function
Added 'this_image()' to the list of functions that are evaluated as intrinsics. Added IsCoarray functions to determine whether an expression is a coarray (corank > 0).
Added the SAVE attribute to coarray variables in the test file this_image.f90.
reviewers: klausler, PeteSteinfeld
Differential Revision: https://reviews.llvm.org/D108059
|
 | flang/lib/Evaluate/intrinsics.cpp |
 | flang/lib/Evaluate/tools.cpp |
 | flang/test/Semantics/this_image.f90 |
 | flang/docs/Intrinsics.md |
 | flang/include/flang/Evaluate/tools.h |
 | flang/test/Semantics/call10.f90 |
Commit
99dfe90695a811f74fb7503703ffd52bd214dd2e
by Matthew.Arsenault: Attributor: Fix typos
|
 | llvm/include/llvm/Transforms/IPO/Attributor.h |
Commit
f12174204c639f9780f17cf7b8e910be703b6b8c
by Matthew.Arsenault: AMDGPU: Rename attributor class for uniform-work-group-size
This isn't really an AMDGPU-specific attribute and could be moved to generic code. It's also important to include the word 'uniform' in the name.
|
 | llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp |
Commit
fdd9761dd1a971c9f4d6776b511ea54d7765bfeb
by Matthew.Arsenault: Attributor: Fix crash on undef in !callees
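For context, a hedged sketch of the kind of guard this fix implies: when walking !callees metadata, an undef entry does not extract to a Function and has to be skipped. forEachCalleeHint and handleCallee below are hypothetical names, not the Attributor's API:

```cpp
// Illustrative only; not the actual AttributorAttributes.cpp change.
#include "llvm/ADT/STLExtras.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/InstrTypes.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Metadata.h"

// Walk the !callees metadata of a call site, tolerating undef entries.
void forEachCalleeHint(const llvm::CallBase &CB,
                       llvm::function_ref<void(llvm::Function &)> handleCallee) {
  if (llvm::MDNode *MD = CB.getMetadata(llvm::LLVMContext::MD_callees)) {
    for (const llvm::MDOperand &Op : MD->operands()) {
      // dyn_extract_or_null yields nullptr for operands that are not a
      // Function constant (e.g. undef), so such entries are skipped instead
      // of being cast unconditionally.
      if (auto *Callee = llvm::mdconst::dyn_extract_or_null<llvm::Function>(Op))
        handleCallee(*Callee);
    }
  }
}
```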
|
 | llvm/test/Transforms/Attributor/callgraph.ll |
 | llvm/lib/Transforms/IPO/AttributorAttributes.cpp |
Commit
88146230e1b21aa042da481e5fd702fab82408fc
by Matthew.Arsenault: SeparateConstOffsetFromGEP: Fix stack overflow in unreachable code
ConstantOffsetExtractor::Find was recursing infinitely on an add instruction that references itself, which can only occur in unreachable code.
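In unreachable code an instruction may use its own result (e.g. %a = add i32 %a, 1), so an unguarded recursive operand walk never terminates. A minimal, hypothetical sketch of a visited-set guard, not the actual ConstantOffsetExtractor fix:

```cpp
// Hypothetical sketch of a cycle guard for a recursive operand walk.
#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/IR/InstrTypes.h"
#include "llvm/IR/Value.h"

bool walkAddChain(llvm::Value *V,
                  llvm::SmallPtrSetImpl<llvm::Value *> &Visited) {
  // A value that reappears on the current path means we hit a cycle, e.g. a
  // self-referential add in unreachable code; stop instead of recursing.
  if (!Visited.insert(V).second)
    return false;
  if (auto *BO = llvm::dyn_cast<llvm::BinaryOperator>(V))
    if (BO->getOpcode() == llvm::Instruction::Add)
      return walkAddChain(BO->getOperand(0), Visited) ||
             walkAddChain(BO->getOperand(1), Visited);
  return true; // leaf value: nothing further to decompose
}
```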
|
 | llvm/lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp |
 | llvm/test/Transforms/SeparateConstOffsetFromGEP/crash-in-unreachable-code.ll |
Commit
9adc0114bfeb704ca62d8c369fa52d0530179274
by springerm: [mlir][linalg] PadTensorOp vectorization: Avoid redundant FillOps
Do not generate FillOps when they would be entirely overwritten.
Differential Revision: https://reviews.llvm.org/D109741
|
 | mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp |
Commit
aaf62958f1ae3c17ed1f4551bac37c2e202ffd5e
by i: [CMake] Delete obsoleted COMPILER_RT_TEST_TARGET_TRIPLE
Its last user, the Android configuration in llvm-zorg, has been removed.
|
 | compiler-rt/cmake/Modules/CompilerRTUtils.cmake |
Commit
4a36e96c3fc2a9128097bfc4f907ccebc5dc66af
by Matthew.Arsenault: RegAllocGreedy: Account for reserved registers in num regs heuristic
This heuristic uses the estimated live range length combined with the number of registers in the class to decide which allocation heuristic to apply. It was taking the raw number of registers in the class, even though not all of them may be available. AMDGPU relies heavily on dynamically reserving registers based on user attributes to satisfy occupancy constraints, so the raw number is highly misleading.
There are still a few problems here. In the original testcase that made me notice this, the live range size is incorrect after the scheduler rearranges instructions, since the instructions don't have the original InstrDist offsets. Additionally, I think it would be more appropriate to use the number of disjointly allocatable registers in the class. For the AMDGPU register tuples, there are a large number of registers in each tuple class, but only a small fraction can actually be allocated at the same time since they all overlap with each other. It seems we do not have a query that corresponds to the number of independently allocatable registers. Relatedly, I'm still debugging some allocation failures where overlapping tuples seem to not be handled correctly.
The test changes are mostly noise. There are a handful of x86 tests that look like regressions with an additional spill, and a handful that now avoid a spill. The worst-looking regression is likely test/Thumb2/mve-vld4.ll, which introduces a few additional spills. test/CodeGen/AMDGPU/soft-clause-exceeds-register-budget.ll shows a massive improvement by completely eliminating a large number of spills inside a loop.
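As a hedged sketch of the underlying idea (not the exact change to RegAllocGreedy.cpp): the heuristic should be fed the number of actually allocatable registers, which RegisterClassInfo already computes by excluding reserved registers, rather than the raw class size. numRegsForHeuristic is a made-up helper:

```cpp
// Sketch only: count allocatable registers, not the raw class size.
#include "llvm/CodeGen/RegisterClassInfo.h"
#include "llvm/CodeGen/TargetRegisterInfo.h"

unsigned numRegsForHeuristic(const llvm::RegisterClassInfo &RCI,
                             const llvm::TargetRegisterClass *RC) {
  // RC->getNumRegs() counts every register in the class, including registers
  // the target dynamically reserves (as AMDGPU does to satisfy occupancy);
  // getNumAllocatableRegs() excludes the reserved set.
  return RCI.getNumAllocatableRegs(RC);
}
```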
|
 | llvm/test/CodeGen/X86/i128-mul.ll |
 | llvm/test/CodeGen/X86/pr32284.ll |
 | llvm/test/CodeGen/X86/mul-i1024.ll |
 | llvm/test/CodeGen/X86/mul-constant-result.ll |
 | llvm/test/CodeGen/X86/fp128-cast.ll |
 | llvm/test/CodeGen/X86/nosse-vector.ll |
 | llvm/test/CodeGen/X86/pr31088.ll |
 | llvm/test/CodeGen/X86/pr32329.ll |
 | llvm/test/CodeGen/X86/umax.ll |
 | llvm/test/CodeGen/X86/widen_cast-4.ll |
 | llvm/test/CodeGen/Hexagon/reg-scavengebug-2.ll |
 | llvm/test/CodeGen/X86/funnel-shift-rot.ll |
 | llvm/test/CodeGen/AMDGPU/half.ll |
 | llvm/test/CodeGen/AMDGPU/GlobalISel/extractelement-stack-lower.ll |
 | llvm/test/CodeGen/X86/bitreverse.ll |
 | llvm/test/CodeGen/X86/shrink_vmul.ll |
 | llvm/test/CodeGen/X86/legalize-shl-vec.ll |
 | llvm/test/CodeGen/X86/umul-with-overflow.ll |
 | llvm/test/CodeGen/X86/2008-04-16-ReMatBug.ll |
 | llvm/test/CodeGen/X86/mul128.ll |
 | llvm/test/CodeGen/X86/mul-i256.ll |
 | llvm/test/CodeGen/X86/sdiv_fix_sat.ll |
 | llvm/test/CodeGen/AMDGPU/shl.ll |
 | llvm/test/CodeGen/X86/funnel-shift.ll |
 | llvm/test/CodeGen/X86/sse2-intrinsics-fast-isel.ll |
 | llvm/test/CodeGen/AMDGPU/GlobalISel/mul.ll |
 | llvm/test/CodeGen/X86/neg-abs.ll |
 | llvm/test/CodeGen/X86/subvector-broadcast.ll |
 | llvm/test/CodeGen/ARM/umulo-128-legalisation-lowering.ll |
 | llvm/test/CodeGen/X86/merge-consecutive-stores-nt.ll |
 | llvm/test/CodeGen/X86/avx512-calling-conv.ll |
 | llvm/test/CodeGen/AMDGPU/GlobalISel/sdiv.i64.ll |
 | llvm/test/CodeGen/X86/vector-trunc-ssat.ll |
 | llvm/test/CodeGen/X86/xmulo.ll |
 | llvm/test/CodeGen/X86/pr32610.ll |
 | llvm/test/CodeGen/AMDGPU/splitkit-copy-live-lanes.mir |
 | llvm/test/CodeGen/X86/vec-strict-inttofp-512.ll |
 | llvm/test/CodeGen/PowerPC/srem-vector-lkk.ll |
 | llvm/test/CodeGen/X86/horizontal-reduce-smin.ll |
 | llvm/test/CodeGen/Mips/cconv/vector.ll |
 | llvm/test/CodeGen/ARM/srem-seteq-illegal-types.ll |
 | llvm/test/CodeGen/PowerPC/urem-vector-lkk.ll |
 | llvm/test/CodeGen/X86/vec_shift4.ll |
 | llvm/test/CodeGen/X86/vector-lzcnt-128.ll |
 | llvm/test/CodeGen/AMDGPU/sdiv64.ll |
 | llvm/test/CodeGen/X86/smin.ll |
 | llvm/test/CodeGen/X86/umul_fix.ll |
 | llvm/test/CodeGen/X86/umul_fix_sat.ll |
 | llvm/test/CodeGen/X86/stack-align-memcpy.ll |
 | llvm/test/CodeGen/X86/pr46527.ll |
 | llvm/test/CodeGen/AMDGPU/GlobalISel/udiv.i64.ll |
 | llvm/test/CodeGen/X86/i256-add.ll |
 | llvm/test/CodeGen/X86/mul-i512.ll |
 | llvm/test/CodeGen/X86/div-rem-pair-recomposition-signed.ll |
 | llvm/test/CodeGen/X86/avx512-select.ll |
 | llvm/test/CodeGen/X86/mmx-arith.ll |
 | llvm/test/CodeGen/X86/vshift-6.ll |
 | llvm/test/CodeGen/AMDGPU/GlobalISel/urem.i64.ll |
 | llvm/test/CodeGen/X86/umin.ll |
 | llvm/test/CodeGen/X86/mul-constant-i64.ll |
 | llvm/test/CodeGen/X86/vector-sext.ll |
 | llvm/test/CodeGen/X86/sshl_sat.ll |
 | llvm/test/CodeGen/AMDGPU/llvm.round.f64.ll |
 | llvm/test/CodeGen/X86/overflow.ll |
 | llvm/test/CodeGen/RISCV/stack-store-check.ll |
 | llvm/test/CodeGen/X86/sshl_sat_vec.ll |
 | llvm/test/CodeGen/X86/usub_sat.ll |
 | llvm/test/CodeGen/X86/vector-fshl-128.ll |
 | llvm/test/CodeGen/AMDGPU/frem.ll |
 | llvm/test/CodeGen/X86/vector-tzcnt-128.ll |
 | llvm/test/CodeGen/Thumb2/mve-simple-arith.ll |
 | llvm/test/CodeGen/X86/unfold-masked-merge-vector-variablemask.ll |
 | llvm/test/CodeGen/AMDGPU/cvt_f32_ubyte.ll |
 | llvm/test/CodeGen/X86/avx512bw-intrinsics-upgrade.ll |
 | llvm/test/CodeGen/X86/i128-sdiv.ll |
 | llvm/test/CodeGen/X86/vec-strict-fptoint-256.ll |
 | llvm/test/CodeGen/X86/hoist-and-by-const-from-lshr-in-eqcmp-zero.ll |
 | llvm/test/CodeGen/X86/hoist-and-by-const-from-shl-in-eqcmp-zero.ll |
 | llvm/test/CodeGen/X86/statepoint-vreg-unlimited-tied-opnds.ll |
 | llvm/test/CodeGen/X86/umulo-64-legalisation-lowering.ll |
 | llvm/test/CodeGen/X86/bool-vector.ll |
 | llvm/test/CodeGen/X86/select.ll |
 | llvm/test/CodeGen/AMDGPU/GlobalISel/srem.i64.ll |
 | llvm/test/CodeGen/X86/setcc-wide-types.ll |
 | llvm/test/CodeGen/X86/sse-intrinsics-fast-isel.ll |
 | llvm/test/CodeGen/RISCV/rvv/fixed-vectors-bitreverse.ll |
 | llvm/test/CodeGen/X86/ushl_sat_vec.ll |
 | llvm/test/CodeGen/X86/div-rem-pair-recomposition-unsigned.ll |
 | llvm/test/CodeGen/X86/fshr.ll |
 | llvm/test/CodeGen/X86/memcmp-more-load-pairs-x32.ll |
 | llvm/lib/CodeGen/RegAllocGreedy.cpp |
 | llvm/test/CodeGen/X86/ushl_sat.ll |
 | llvm/test/CodeGen/X86/smul_fix_sat.ll |
 | llvm/test/CodeGen/X86/vector-rotate-128.ll |
 | llvm/test/CodeGen/RISCV/rv32zbp.ll |
 | llvm/test/CodeGen/X86/sadd_sat.ll |
 | llvm/test/CodeGen/X86/vec-strict-cmp-sub128.ll |
 | llvm/test/CodeGen/X86/smulo-128-legalisation-lowering.ll |
 | llvm/test/CodeGen/X86/vector-fshr-128.ll |
 | llvm/test/CodeGen/AMDGPU/srl.ll |
 | llvm/test/CodeGen/X86/vector-fshl-rot-128.ll |
 | llvm/test/CodeGen/Thumb2/srem-seteq-illegal-types.ll |
 | llvm/test/CodeGen/X86/scheduler-backtracking.ll |
 | llvm/test/CodeGen/X86/build-vector-128.ll |
 | llvm/test/CodeGen/Thumb2/mve-fptosi-sat-vector.ll |
 | llvm/test/CodeGen/X86/horizontal-reduce-smax.ll |
 | llvm/test/CodeGen/X86/clear-highbits.ll |
 | llvm/test/CodeGen/X86/known-signbits-vector.ll |
 | llvm/test/CodeGen/X86/gather-addresses.ll |
 | llvm/test/CodeGen/X86/vector-fshr-rot-128.ll |
 | llvm/test/CodeGen/X86/combine-sbb.ll |
 | llvm/test/CodeGen/X86/horizontal-reduce-umin.ll |
 | llvm/test/CodeGen/X86/popcnt.ll |
 | llvm/test/CodeGen/X86/load-combine.ll |
 | llvm/test/CodeGen/X86/avx512bwvl-intrinsics-upgrade.ll |
 | llvm/test/CodeGen/X86/smax.ll |
 | llvm/test/CodeGen/AMDGPU/greedy-global-heuristic.mir |
 | llvm/test/CodeGen/X86/sdiv_fix.ll |
 | llvm/test/CodeGen/X86/udiv_fix_sat.ll |
 | llvm/test/CodeGen/AMDGPU/load-constant-i16.ll |
 | llvm/test/CodeGen/AMDGPU/load-global-i16.ll |
 | llvm/test/CodeGen/X86/vector-shift-lshr-256.ll |
 | llvm/test/CodeGen/X86/illegal-bitfield-loadstore.ll |
 | llvm/test/CodeGen/X86/nontemporal.ll |
 | llvm/test/CodeGen/X86/horizontal-reduce-umax.ll |
 | llvm/test/CodeGen/X86/pr34080-2.ll |
 | llvm/test/CodeGen/X86/i64-to-float.ll |
 | llvm/test/CodeGen/Thumb2/mve-fptoui-sat-vector.ll |
 | llvm/test/CodeGen/RISCV/rvv/fixed-vectors-cttz.ll |
 | llvm/test/CodeGen/X86/vector-idiv-v2i32.ll |
 | llvm/test/CodeGen/RISCV/rv64zbp.ll |
 | llvm/test/CodeGen/X86/uadd_sat.ll |
 | llvm/test/CodeGen/RISCV/rvv/fixed-vectors-bswap.ll |
 | llvm/test/CodeGen/X86/avx512-regcall-NoMask.ll |
 | llvm/test/CodeGen/X86/vector-shift-shl-256.ll |
 | llvm/test/CodeGen/AMDGPU/GlobalISel/sdivrem.ll |
 | llvm/test/CodeGen/X86/vec_umulo.ll |
 | llvm/test/CodeGen/X86/2007-10-12-SpillerUnfold1.ll |
 | llvm/test/CodeGen/X86/vector-gep.ll |
 | llvm/test/CodeGen/X86/smul_fix.ll |
 | llvm/test/CodeGen/ARM/fptosi-sat-scalar.ll |
 | llvm/test/CodeGen/X86/64-bit-shift-by-32-minus-y.ll |
 | llvm/test/CodeGen/X86/abs.ll |
 | llvm/test/CodeGen/X86/masked_gather_scatter.ll |
 | llvm/test/CodeGen/X86/vec-strict-cmp-128.ll |
 | llvm/test/CodeGen/AMDGPU/soft-clause-exceeds-register-budget.ll |
 | llvm/test/CodeGen/X86/bswap.ll |
 | llvm/test/CodeGen/X86/fptosi-sat-scalar.ll |
 | llvm/test/CodeGen/X86/peephole-na-phys-copy-folding.ll |
 | llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-idiv.ll |
 | llvm/test/CodeGen/AMDGPU/copy-illegal-type.ll |
Commit
962acf0a27fbdb945b7b790cc57ba4ae4729879f
by tlively: [lld][WebAssembly] Use llvm-objdump to test __wasm_init_memory
Rather than depending on the hex dump from obj2yaml, the test now shows the expected function body in a human-readable format.
Differential Revision: https://reviews.llvm.org/D109730
|
 | lld/test/wasm/data-segments.ll |
Commit
299b5d420df15fafc9936bc24995f6cd6ad325be
by hoy: [CSSPGO] Enable pseudo probe instrumentation in O0 mode.
Pseudo probe instrumentation was missing from the O0 build. It is needed in cases where some source files are built at O0 while others are built with optimizations.
Reviewed By: wenlei, wlei, wmi
Differential Revision: https://reviews.llvm.org/D109531
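A hedged illustration of what enabling this at O0 roughly looks like; the wrapper below and its exact placement in the pipeline are assumptions, and the real edit is in PassBuilder.cpp:

```cpp
// Hypothetical helper; the actual change lives in PassBuilder.cpp.
#include "llvm/Passes/PassBuilder.h"
#include "llvm/Transforms/IPO/SampleProfileProbe.h"

void addPseudoProbesAtO0(llvm::ModulePassManager &MPM, llvm::TargetMachine *TM,
                         const llvm::PGOOptions *PGOOpt) {
  // Mirror the optimized pipelines: schedule pseudo probe instrumentation in
  // the O0 module pipeline when pseudo-probe profiling is requested.
  if (PGOOpt && PGOOpt->PseudoProbeForProfiling)
    MPM.addPass(llvm::SampleProfileProbePass(TM));
}
```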
|
 | clang/test/CodeGen/pseudo-probe-emit.c |
 | llvm/lib/Passes/PassBuilder.cpp |
Commit
54d755a034362814bd7a0b90f172cbba39729cf4
by Matthew.Arsenault: DAG: Fix incorrect folding of fmul -1 to fneg
fmul is a canonicalizing operation and fneg is not, so this fold would break denormals that need flushing and would not quiet signaling NaNs. Fold to fsub instead, which is also canonicalizing.
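A hedged sketch of the corrected fold in SelectionDAG terms, using standard SelectionDAG node-building APIs; the wrapper is illustrative, not the DAGCombiner entry point:

```cpp
// Sketch only; foldFMulByNegOne is a made-up wrapper, not the DAGCombiner hook.
#include "llvm/CodeGen/SelectionDAG.h"

llvm::SDValue foldFMulByNegOne(llvm::SelectionDAG &DAG, const llvm::SDLoc &DL,
                               llvm::EVT VT, llvm::SDValue X,
                               const llvm::ConstantFPSDNode *C) {
  // Fold (fmul X, -1.0) to (fsub -0.0, X) rather than (fneg X): fsub, like
  // fmul, is a canonicalizing operation, so denormal flushing and signaling
  // NaN quieting are preserved.
  if (C && C->isExactlyValue(-1.0))
    return DAG.getNode(llvm::ISD::FSUB, DL, VT,
                       DAG.getConstantFP(-0.0, DL, VT), X);
  return llvm::SDValue();
}
```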
|
 | llvm/test/CodeGen/AArch64/fp16_intrinsic_scalar_3op.ll |
 | llvm/test/CodeGen/Hexagon/opt-fneg.ll |
 | llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp |
 | llvm/test/CodeGen/AMDGPU/fneg-combines.ll |
 | llvm/test/CodeGen/ARM/fnegs.ll |
 | llvm/test/CodeGen/PowerPC/combine-fneg.ll |
 | llvm/test/CodeGen/AArch64/arm64-fmadd.ll |