Commit
119a9ea13f9f2e5fe78125bc3f9a76ebf85d3270
by huberjn
[OpenMP] Fix failing test due to change in offloading flags
Summary: Prior to D91261, the information output checked the OMP_MAP_TARGET_PARAM flag; change this, as it has been removed. The INFO macro was changed to accept a flag as input, which makes conditionally printing information easier.
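A minimal sketch of what a flag-gated information macro can look like. Every name here (`InfoFlagSketch`, `infoMaskSketch`, `setInfoMaskSketch`, `INFO_SKETCH`) is hypothetical and chosen for illustration; this is not the definition in Debug.h.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

// Hypothetical information categories, one bit each (not the runtime's enum).
enum InfoFlagSketch : uint32_t {
  INFO_SKETCH_KERNEL_ARGS = 0x1,
  INFO_SKETCH_DATA_MAPPING = 0x2,
};

constexpr uint32_t AllInfoFlagsSketch =
    INFO_SKETCH_KERNEL_ARGS | INFO_SKETCH_DATA_MAPPING;

// Hypothetical global mask; a real runtime would likely read it from the
// environment once at startup.
inline uint32_t &infoMaskSketch() {
  static uint32_t Mask = 0;
  return Mask;
}

inline void setInfoMaskSketch(uint32_t Mask) {
  assert((Mask & ~AllInfoFlagsSketch) == 0 && "unknown info flag");
  infoMaskSketch() = Mask;
}

// Counts emitted messages so the gating can be observed in a test.
inline int &infoCountSketch() {
  static int Count = 0;
  return Count;
}

// Print only when the requested flag is enabled in the mask. The first
// variadic argument must be a string literal (the printf-style format).
#define INFO_SKETCH(Flag, ...)                                                 \
  do {                                                                         \
    if (infoMaskSketch() & (Flag)) {                                           \
      std::fprintf(stderr, "info: " __VA_ARGS__);                              \
      ++infoCountSketch();                                                     \
    }                                                                          \
  } while (false)
```

With the mask set to `INFO_SKETCH_DATA_MAPPING`, a call such as `INFO_SKETCH(INFO_SKETCH_KERNEL_ARGS, "args: %d\n", 3)` prints nothing, while a call tagged with `INFO_SKETCH_DATA_MAPPING` is emitted.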
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D95133
|
 | openmp/libomptarget/include/Debug.h |
 | openmp/libomptarget/plugins/cuda/src/rtl.cpp |
 | openmp/libomptarget/src/private.h |
 | openmp/libomptarget/src/device.cpp |
Commit
f2fd41d7897e1cc8fc6e9fb2ea46e5b6527852e4
by Duncan P. N. Exon Smith
X86: Fix use-after-realloc in X86AsmParser::ParseIntelExpression
`X86AsmParser::ParseIntelExpression` has a while loop. In the body, calls to MCAsmLexer::UnLex can force a reallocation in the MCAsmLexer's `CurToken` SmallVector, invalidating saved references to `MCAsmLexer::getTok()`.
`const MCAsmToken &Tok` is such a saved reference, and this moves it from outside the while loop to inside the body, fixing a use-after-realloc.
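The hazard and the fix can be illustrated with a toy lexer built on `std::vector`. This is only a sketch: `ToyLexer`, `unLex`, and `collectToks` are hypothetical stand-ins, not the real MCAsmLexer or X86AsmParser code.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for a lexer with a token buffer; not the real MCAsmLexer.
struct ToyLexer {
  std::vector<std::string> CurTok{"start"};

  const std::string &getTok() const { return CurTok.front(); }

  // Pushing a token back may reallocate CurTok, which invalidates every
  // reference previously returned by getTok().
  void unLex(const std::string &T) { CurTok.insert(CurTok.begin(), T); }
};

// Safe pattern: bind the reference inside the loop body, after any call that
// can reallocate, instead of binding it once before the loop.
inline std::string collectToks(ToyLexer &Lexer, int Rounds) {
  assert(Rounds >= 0);
  std::string Seen;
  for (int I = 0; I < Rounds; ++I) {
    Lexer.unLex("tok" + std::to_string(I));   // May reallocate the buffer.
    const std::string &Tok = Lexer.getTok();  // Fresh reference each round.
    Seen += Tok;
    Seen += ' ';
  }
  return Seen;
}
```

Had `Tok` been bound once before the loop, the first `unLex` that triggered a reallocation would have left it dangling.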
`Tok` will still be reused across calls to `Lex()`, each of which effectively destroys and constructs the pointed-to token. I'm a bit skeptical of this usage pattern, but it seems broadly used in the X86AsmParser (and others) so I'm leaving it alone (for now).
Somehow this bug was exposed by https://reviews.llvm.org/D94739, resulting in test failures in dot-operator related tests in llvm/test/tools/llvm-ml. I suspect the exposure path is related to optimizer changes from splitting up the grow operation, but I haven't dug all the way in. Regardless, there are already tests in tree that cover this; they might fail consistently if we added ASan instrumentation to SmallVector.
Differential Revision: https://reviews.llvm.org/D95112
|
 | llvm/lib/Target/X86/AsmParser/X86AsmParser.cpp |
Commit
65fd034b95d69fa0e634861ee165b502ceb92a12
by nikita.ppv
[FunctionAttrs] Infer willreturn for functions without loops
If a function doesn't contain loops and does not call non-willreturn functions, then it is willreturn. Loops are detected by checking for backedges in the function. We don't attempt to handle finite loops at this point.
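As a rough illustration of the backedge check, the sketch below runs a depth-first search over a toy control-flow graph and reports an edge to a block that is still on the DFS stack. `ToyCFG`, `hasBackedge`, and `mayInferWillReturn` are hypothetical names; the actual pass works on LLVM IR and may use a different traversal.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy CFG: Succs[B] lists the successor block indices of block B; block 0 is
// the entry. An illustrative stand-in, not LLVM's data structures.
using ToyCFG = std::vector<std::vector<int>>;

enum class VisitState { Unvisited, OnStack, Done };

static bool dfsFindsBackedge(const ToyCFG &Succs, int Block,
                             std::vector<VisitState> &State) {
  State[Block] = VisitState::OnStack;
  for (int Succ : Succs[Block]) {
    assert(Succ >= 0 && static_cast<size_t>(Succ) < Succs.size());
    // An edge to a block still on the DFS stack is a backedge, i.e. a cycle.
    if (State[Succ] == VisitState::OnStack)
      return true;
    if (State[Succ] == VisitState::Unvisited &&
        dfsFindsBackedge(Succs, Succ, State))
      return true;
  }
  State[Block] = VisitState::Done;
  return false;
}

// Returns true if any cycle is reachable from the entry block.
inline bool hasBackedge(const ToyCFG &Succs) {
  if (Succs.empty())
    return false;
  std::vector<VisitState> State(Succs.size(), VisitState::Unvisited);
  return dfsFindsBackedge(Succs, 0, State);
}

// Conservative rule from the message: no loops and only willreturn callees.
inline bool mayInferWillReturn(const ToyCFG &Succs, bool AllCalleesWillReturn) {
  return AllCalleesWillReturn && !hasBackedge(Succs);
}
```

A diamond-shaped CFG passes the check, while any CFG with a cycle is rejected even if the loop is obviously finite, which matches the stated limitation.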
Differential Revision: https://reviews.llvm.org/D94633
|
 | llvm/test/Transforms/FunctionAttrs/atomic.ll |
 | llvm/test/Transforms/InferFunctionAttrs/norecurse_debug.ll |
 | llvm/test/CodeGen/AMDGPU/inline-attr.ll |
 | llvm/test/Transforms/FunctionAttrs/incompatible_fn_attrs.ll |
 | clang/test/CodeGenOpenCL/convergent.cl |
 | llvm/test/Transforms/FunctionAttrs/nofree.ll |
 | llvm/test/Analysis/TypeBasedAliasAnalysis/functionattrs.ll |
 | llvm/lib/Transforms/IPO/FunctionAttrs.cpp |
 | llvm/test/Transforms/FunctionAttrs/optnone.ll |
 | llvm/test/Transforms/FunctionAttrs/willreturn.ll |
 | llvm/test/Transforms/FunctionAttrs/writeonly.ll |
Commit
8e0b17931530e84f45586e31b58b031d6d68ee6c
by llvm
[ELF] report section sizes when output file too large
Fixes PR48523. When the linker errors with "output file too large", one question that comes to mind is how the section sizes differ from what they were previously. Unfortunately, this information is lost when the linker exits without writing the output file. This change makes it so that the error message includes the sizes of the largest sections.
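One way to picture the idea is a helper that sorts sections by size and appends the largest few to the error text. This is a sketch under assumptions: `SectionSize` and `describeLargestSections` are hypothetical, and the message wording is made up here because the commit message does not show the format lld prints.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

using SectionSize = std::pair<std::string, uint64_t>; // (name, size in bytes)

// Builds a message listing the Count largest sections, largest first.
// Sections is taken by value so the caller's order is left untouched.
inline std::string describeLargestSections(std::vector<SectionSize> Sections,
                                           size_t Count) {
  assert(Count > 0);
  std::stable_sort(Sections.begin(), Sections.end(),
                   [](const SectionSize &A, const SectionSize &B) {
                     return A.second > B.second;
                   });
  std::string Msg = "output file too large; largest sections:";
  for (size_t I = 0; I < Sections.size() && I < Count; ++I)
    Msg += "\n  " + Sections[I].first + ": " +
           std::to_string(Sections[I].second) + " bytes";
  return Msg;
}
```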
Reviewed By: MaskRay, grimar, jhenderson
Differential Revision: https://reviews.llvm.org/D94560
|
 | lld/test/ELF/linkerscript/output-too-large.s |
 | lld/ELF/Writer.cpp |
Commit
d77753381fe024434ae8ffaaacfe4b9ed9d4d760
by spatel
[SLP] simplify reduction matching
This is NFC-intended and removes the "OperationData" class which had become nothing more than a recurrence (reduction) type.
I adjusted the matching logic to distinguish instructions from non-instructions - that's all that the "IsLeafValue" member was keeping track of.
|
 | llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp |
Commit
4ab0f51a7518332b8b7691915b5fdad4c1ed045f
by craig.topper
Recommit "[RISCV] Legalize select when Zbt extension available"
This recommits 71ed4b6ce57d8843ef705af8f98305976a8f107a with the polarity of some of the patterns corrected.
Original commit message: The custom expansion of select operations in the RISC-V backend interferes with the matching of cmov instructions. Legalizing select when the Zbt extension is available solves that problem.
Reviewed By: luismarques, craig.topper
Differential Revision: https://reviews.llvm.org/D93767
|
 | llvm/test/CodeGen/RISCV/select-cc.ll |
 | llvm/test/CodeGen/RISCV/rv64Zbt.ll |
 | llvm/lib/Target/RISCV/RISCVISelLowering.cpp |
 | llvm/test/CodeGen/RISCV/rv32Zbb.ll |
 | llvm/test/CodeGen/RISCV/select-optimize-multiple.ll |
 | llvm/test/CodeGen/RISCV/select-bare.ll |
 | llvm/test/CodeGen/RISCV/rv32Zbs.ll |
 | llvm/lib/Target/RISCV/RISCVInstrInfoB.td |
 | llvm/test/CodeGen/RISCV/select-const.ll |
 | llvm/test/CodeGen/RISCV/rv32Zbt.ll |
 | llvm/test/CodeGen/RISCV/rv32Zbbp.ll |
 | llvm/test/CodeGen/RISCV/select-and.ll |
 | llvm/test/CodeGen/RISCV/select-or.ll |
Commit
d7ff0036463fbf049a240fe3792fcfcd8081c41e
by Duncan P. N. Exon Smith
ADT: Fix reference invalidation in SmallVector::emplace_back and assign(N,V)
This fixes the final (I think?) reference invalidation in `SmallVector` that we need to fix to align with `std::vector`. (There is still some left in the range insert / append / assign, but the standard calls that UB for `std::vector` so I think we don't care?)
For POD-like types, reimplement `emplace_back()` in terms of `push_back()`, taking a copy even for large `T` rather than lose the realloc optimization in `grow_pod()`.
For other types, split the grow operation in three and construct the new element in the middle.
- `mallocForGrow()` calculates the new capacity and returns the result of `safe_malloc()`. We only need a single definition per `SmallVectorBase` so this is defined in SmallVector.cpp to avoid code size bloat. Moving this part of non-POD grow to the source file also allows the logic to be easily shared with `grow_pod`, and `report_size_overflow()` and `report_at_maximum_capacity()` can move there too.
- `moveElementsForGrow()` moves elements from the old to the new allocation.
- `takeAllocationForGrow()` frees the old allocation and saves the new allocation and capacity.
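The three steps above can be sketched with a toy container. `ToyVec` is a hypothetical, heavily simplified stand-in for the non-POD path: it borrows the three helper names from the message, but the bodies are assumptions (plain `std::malloc`, doubling growth, no exception safety, no over-aligned types).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <string>
#include <utility>

// Toy growable array for non-trivial T; illustrative only, not SmallVector.
template <typename T> class ToyVec {
  T *Data = nullptr;
  size_t Size = 0, Capacity = 0;

  // Step 1: pick the new capacity and allocate raw storage.
  T *mallocForGrow(size_t &NewCapacity) {
    NewCapacity = Capacity ? Capacity * 2 : 1;
    void *Mem = std::malloc(NewCapacity * sizeof(T));
    if (!Mem)
      throw std::bad_alloc();
    return static_cast<T *>(Mem);
  }
  // Step 2: move existing elements over, then destroy the originals.
  void moveElementsForGrow(T *NewData) {
    for (size_t I = 0; I < Size; ++I) {
      ::new (static_cast<void *>(NewData + I)) T(std::move(Data[I]));
      Data[I].~T();
    }
  }
  // Step 3: free the old allocation and adopt the new one.
  void takeAllocationForGrow(T *NewData, size_t NewCapacity) {
    std::free(Data);
    Data = NewData;
    Capacity = NewCapacity;
  }

public:
  ToyVec() = default;
  ToyVec(const ToyVec &) = delete;
  ToyVec &operator=(const ToyVec &) = delete;
  ~ToyVec() {
    for (size_t I = 0; I < Size; ++I)
      Data[I].~T();
    std::free(Data);
  }

  size_t size() const { return Size; }
  T &operator[](size_t I) {
    assert(I < Size);
    return Data[I];
  }

  template <typename... ArgTs> T &emplace_back(ArgTs &&...Args) {
    if (Size == Capacity) {
      size_t NewCapacity;
      T *NewData = mallocForGrow(NewCapacity);
      // Construct the new element "in the middle": the old allocation is
      // still alive, so Args may safely refer to existing elements.
      ::new (static_cast<void *>(NewData + Size))
          T(std::forward<ArgTs>(Args)...);
      moveElementsForGrow(NewData);
      takeAllocationForGrow(NewData, NewCapacity);
    } else {
      ::new (static_cast<void *>(Data + Size)) T(std::forward<ArgTs>(Args)...);
    }
    return Data[Size++];
  }
};
```

Because the new element is constructed before the old elements are moved and freed, a call such as `V.emplace_back(V[0])` reads from valid storage even when it triggers a grow.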
`SmallVector::assign(size_type, const T&)` also uses the split-grow operations for non-POD, but it also has a semantic change when not growing. Previously, assign would start with `clear()`, and so the old elements were destructed and all elements of the new vector were copy-constructed (potentially invalidating references). The new implementation skips destruction and uses copy-assignment for the prefix of the new vector that fits. The new semantics match what libc++ does for `std::vector::assign()`.
Note that the following is another possible implementation:
```
void assign(size_type NumElts, ValueParamT Elt) {
  std::fill_n(this->begin(), std::min(NumElts, this->size()), Elt);
  this->resize(NumElts, Elt);
}
```
The downside of this simpler implementation is that if the vector has to grow there will be `size()` redundant copy operations.
(I had planned on splitting this patch up into three for committing (after getting performance numbers / initial review), but I've realized that if this does for some reason need to be reverted we'll probably want to revert the whole package...)
Differential Revision: https://reviews.llvm.org/D94739
|
 | llvm/unittests/ADT/SmallVectorTest.cpp |
 | llvm/lib/Support/SmallVector.cpp |
 | llvm/include/llvm/ADT/SmallVector.h |
Commit
8827e07aaf2114b7f09e229e22481cd58137ea6a
by csigg
Remove deprecated methods from OpState.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D95123
|
 | mlir/include/mlir/IR/OpDefinition.h |
 | mlir/lib/IR/Operation.cpp |
Commit
bfec9148a042c8fd6093ae0d54c784211a295c6c
by Duncan P. N. Exon Smith
Scalar: Don't visit constants in findInnerReductionPhi in LoopInterchange
In LoopInterchange, `findInnerReductionPhi()` looks for reduction variables, which cannot be constants. Update it to return early in that case.
This also addresses a blocker for removing use-lists from ConstantData, whose users could be spread across arbitrary modules in the same LLVMContext.
Differential Revision: https://reviews.llvm.org/D94712
|
 | llvm/lib/Transforms/Scalar/LoopInterchange.cpp |
 | llvm/test/Transforms/LoopInterchange/reductions-across-inner-and-outer-loop.ll |
Commit
2f03528f5e7fd9df0a12091392e000c697497262
by spatel
[SLP] rename reduction variable to avoid shadowing; NFC
The code structure can likely be improved now that 'OperationData' is gone.
|
 | llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp |
Commit
39db5753f993abcc4289dd165e8297a4e28f4b0a
by david.green
[LV][ARM] Inloop reduction cost modelling
This adds cost modelling for the inloop vectorization added in 745bf6cf4471. Up until now they have been modelled as the original underlying instruction, usually an add. This happens to work OK for MVE with instructions that are reducing into the same type as they are working on. But MVE's instructions can perform the equivalent of an extended MLA as a single instruction:
  %sa = sext <16 x i8> A to <16 x i32>
  %sb = sext <16 x i8> B to <16 x i32>
  %m = mul <16 x i32> %sa, %sb
  %r = vecreduce.add(%m)
  ->
  R = VMLADAV A, B
There are other instructions for performing add reductions of v4i32/v8i16/v16i8 into i32 (VADDV), for doing the same with v4i32->i64 (VADDLV) and for performing a v4i32/v8i16 MLA into an i64 (VMLALDAV). The i64 are particularly interesting as there are no native i64 add/mul instructions, leading to the i64 add and mul naturally getting very high costs.
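For reference, the extended MLA pattern above corresponds to the following scalar computation. This is only a sketch of the arithmetic being modelled; `extendedMlaReduce` is a hypothetical name and the code says nothing about how MVE executes it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Scalar reference for the pattern above: sign-extend each i8 lane to i32,
// multiply lane-wise, and add-reduce the products into a single i32.
inline int32_t extendedMlaReduce(const int8_t *A, const int8_t *B, size_t N) {
  assert(A && B);
  int32_t Sum = 0;
  for (size_t I = 0; I < N; ++I)
    Sum += static_cast<int32_t>(A[I]) * static_cast<int32_t>(B[I]);
  return Sum;
}
```

Costing the sext, mul, and reduction separately would charge for three vector operations on the wide type, whereas the whole loop body above maps to one instruction in the example.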
Also worth mentioning, under NEON there is the concept of a sdot/udot instruction which performs a partial reduction from a v16i8 to a v4i32. They extend and mul/sum the first four elements from the inputs into the first element of the output, repeating for each of the four output lanes. They could possibly be represented in the same way as above in llvm, so long as a vecreduce.add could perform a partial reduction. The vectorizer would then produce a combination of in and outer loop reductions to efficiently use the sdot and udot instructions. Although this patch does not do that yet, it does suggest that separating the input reduction type from the produced result type is a useful concept to model. It also shows that a MLA reduction as a single instruction is fairly common.
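The partial reduction described above can be modelled in scalar code as follows. `partialDotReduce` is a hypothetical helper, and the sketch assumes a plain "four products per output lane" shape; it leaves out any accumulator operand the real instructions may take.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Scalar model of a signed dot-product partial reduction: 16 i8 lanes in,
// 4 i32 lanes out, each output lane summing four adjacent products.
inline std::array<int32_t, 4>
partialDotReduce(const std::array<int8_t, 16> &A,
                 const std::array<int8_t, 16> &B) {
  std::array<int32_t, 4> Out{};
  for (size_t Lane = 0; Lane < 4; ++Lane) {
    for (size_t K = 0; K < 4; ++K) {
      const size_t Idx = 4 * Lane + K;
      assert(Idx < A.size());
      Out[Lane] += static_cast<int32_t>(A[Idx]) * static_cast<int32_t>(B[Idx]);
    }
  }
  return Out;
}
```

The result is still a vector, so a final reduction of the four i32 lanes outside the loop would be needed to get a scalar, which is the combination of in-loop and outer-loop reductions the message alludes to.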
This patch attempts to improve the cost modelling of in-loop reductions by:
- Adding some pattern matching in the loop vectorizer cost model to match extended reduction patterns that are optionally extended and/or MLA patterns. This marks the cost of the reduction instruction correctly and the sext/zext/mul leading up to it as free, which is otherwise difficult to tell and may get a very high cost. (In the long run this can hopefully be replaced by vplan producing a single node and costing it correctly, but that is not yet something that vplan can do).
- getExtendedAddReductionCost is added to query the cost of these extended reduction patterns.
- Expanded the ARM costs to account for these expanded sizes, which is a fairly simple change in itself.
- Some minor alterations to allow inloop reduction larger than the highest vector width and i64 MVE reductions.
- An extra InLoopReductionImmediateChains map was added to the vectorizer for it to efficiently detect which instructions are reductions in the cost model.
- The tests have some updates to show what I believe is optimal vectorization and where we are now.
Put together, this can greatly improve performance for reduction loops under MVE.
Differential Revision: https://reviews.llvm.org/D93476
|
 | llvm/lib/Analysis/TargetTransformInfo.cpp |
 | llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp |
 | llvm/include/llvm/CodeGen/BasicTTIImpl.h |
 | llvm/test/Transforms/LoopVectorize/ARM/mve-reduction-types.ll |
 | llvm/include/llvm/Analysis/TargetTransformInfo.h |
 | llvm/include/llvm/Analysis/TargetTransformInfoImpl.h |
 | llvm/lib/Target/ARM/ARMTargetTransformInfo.h |
 | llvm/test/Transforms/LoopVectorize/ARM/mve-reductions.ll |
 | llvm/lib/Transforms/Vectorize/LoopVectorize.cpp |
Commit
39239f9b5666bebb059fa562badeffb9f1c3afab
by a20012251
[lldb-vscode] improve modules request
lldb-vscode was communicating the list of modules to the IDE with events, which in practice ended up having some drawbacks:
- When debugging large targets, the number of these events was easily 10k, which polluted the messages being transmitted. This made the messages harder to debug and caused a lag after terminating the process while these messages were being processed (this could easily take several seconds). The latter was especially bad, as users were complaining about it even when they didn't check the modules view.
- These events were rarely used, as users only check the modules view when something is wrong and they try to debug things.
After getting some feedback from users, we realized that it's better not to use events but to make this simply a request that users trigger whenever they need it.
This diff achieves that and does some small clean up in the existing code.
Differential Revision: https://reviews.llvm.org/D94033
|
 | lldb/tools/lldb-vscode/lldb-vscode.cpp |
 | lldb/test/API/tools/lldb-vscode/module/TestVSCode_module.py |
 | lldb/packages/Python/lldbsuite/test/tools/lldb-vscode/vscode.py |
 | lldb/tools/lldb-vscode/VSCode.cpp |
Commit
866d480fe0549d616bfdd69986dd07a7b2dc5b52
by danalbert
[libc++abi] Add an option to avoid demangling in terminate.
We've been using this patch in Android so we can avoid including the demangler in libc++.so. It comes with a rather large cost in RSS and isn't commonly needed.
Reviewed By: #libc_abi, compnerd
Differential Revision: https://reviews.llvm.org/D88189
|
 | libcxxabi/src/cxa_default_handlers.cpp |
 | libcxxabi/CMakeLists.txt |
Commit
bd3a387ee76f58caa0d7901f3f84e9bb3d006f27
by csigg
Revert [mlir] Link mlir_runner_utils statically into cuda/rocm-runtime-wrappers (cf50f4f76456)
There are cmake failures that I do not know how to fix.
Differential Revision: https://reviews.llvm.org/D95162
|
 | mlir/test/mlir-rocm-runner/vecadd.mlir |
 | mlir/test/mlir-cuda-runner/all-reduce-or.mlir |
 | mlir/test/mlir-cuda-runner/shuffle.mlir |
 | mlir/tools/mlir-cuda-runner/CMakeLists.txt |
 | mlir/lib/ExecutionEngine/CMakeLists.txt |
 | mlir/test/mlir-cuda-runner/multiple-all-reduce.mlir |
 | mlir/test/mlir-rocm-runner/vector-transferops.mlir |
 | mlir/test/mlir-cuda-runner/all-reduce-and.mlir |
 | mlir/test/mlir-cuda-runner/all-reduce-max.mlir |
 | mlir/test/mlir-rocm-runner/gpu-to-hsaco.mlir |
 | mlir/test/mlir-cuda-runner/all-reduce-region.mlir |
 | mlir/test/mlir-cuda-runner/gpu-to-cubin.mlir |
 | mlir/test/mlir-cuda-runner/all-reduce-xor.mlir |
 | mlir/test/mlir-rocm-runner/two-modules.mlir |
 | mlir/tools/mlir-rocm-runner/CMakeLists.txt |
 | mlir/test/mlir-cuda-runner/all-reduce-op.mlir |
 | mlir/test/mlir-cuda-runner/all-reduce-min.mlir |
 | mlir/test/mlir-cuda-runner/two-modules.mlir |
Commit
d75b3719828f3e0c9736476e50a08e5083f90c0b
by sbc
[WebAssembly] Test that invalid symbol/relocation types generate errors
See https://bugs.llvm.org/show_bug.cgi?id=48827
Differential Revision: https://reviews.llvm.org/D95163
|
 | llvm/lib/Object/WasmObjectFile.cpp |
 | llvm/test/Object/Inputs/WASM/bad-symbol-type.wasm |
 | llvm/test/Object/Inputs/WASM/bad-reloc-type.wasm |
 | llvm/test/Object/wasm-bad-reloc-type.test |
 | llvm/test/Object/wasm-bad-symbol-type.test |