Commit
1cea25eec90e355c9b072edc1b6e1e9903d7bca4
by llvm-project[Polly] Remove isConstCall.
The function was intended to catch OpenMP functions such as get_thread_id(). If matched, the call would be considered synthesizable.
There were a few problems with this:
* get_thread_id() is not 'const' in the sense of how the GCC manual defines it: "do not examine any values except their arguments". get_thread_id() reads OpenCL runtime library global state. What was intended was probably 'speculable'.
* isConstCall was implemented using mayReadOrWriteMemory(). 'const' is stricter than that: mayReadOrWriteMemory is e.g. true for malloc(), since it may only read/write addresses that are considered inaccessible to the application. However, malloc is certainly not speculable.
* Values that are isConstCall were not handled consistently throughout Polly. In particular, they were not considered for referenced values (OpenMP outlining and PollyACC).
Fix by removing special handling for isConstCall entirely.
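To illustrate the 'const' distinction drawn above, here is a minimal sketch. The names `g_current_thread`, `get_thread_id_model`, and `twice` are hypothetical stand-ins and not part of Polly or any runtime library; the global variable merely models runtime-library state.

```cpp
#include <cassert>

#if defined(__GNUC__)
#define ATTR_CONST __attribute__((const))
#else
#define ATTR_CONST
#endif

// Hypothetical stand-in for global state kept by a runtime library.
static int g_current_thread = 0;

// Reads global state, so its result can change between two calls with the
// same (empty) argument list. It must not be marked 'const'.
int get_thread_id_model() { return g_current_thread; }

// Examines nothing but its argument, so the 'const' attribute is appropriate.
ATTR_CONST int twice(int x) { return 2 * x; }
```

Calling `get_thread_id_model()` before and after changing `g_current_thread` yields different results, which is exactly what a 'const' function is not allowed to do.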
|
 | polly/include/polly/Support/SCEVValidator.h |
 | polly/lib/Analysis/ScopInfo.cpp |
 | polly/test/ScopInfo/constant_functions_multi_dim.ll |
 | polly/test/ScopInfo/constant_functions_as_unknowns.ll |
 | polly/lib/Support/SCEVValidator.cpp |
 | polly/lib/Analysis/ScopDetection.cpp |
Commit
d5c87162db7763c89160dc66894bebf3bd1e90d7
by llvm-project[Polly] Use VirtualUse to determine references.
VirtualUse ensures consistency across the different sources of values within Polly. In particular, this enables its use for instructions moved between statements. Before the patch, the code wrongly assumed that the BB's instructions are also the ScopStmt's instructions. References are determined for OpenMP outlining and GPGPU kernel extraction.
GPGPU CodeGen had some problems. First, it generated GPU kernel parameters for constants. Second, it emitted GPU-side invariant loads for values that have already been loaded by the host. This has been partially fixed: it still generates a store for the invariant load result, but using the value that the host has already written.
WARNING: I did not test the generated PollyACC code on an actual GPU.
The improved consistency will be made use of in the next patch.
|
 | polly/test/GPGPU/phi-nodes-in-kernel.ll |
 | polly/test/GPGPU/invariant-load-of-scalar.ll |
 | polly/include/polly/CodeGen/IslNodeBuilder.h |
 | polly/lib/CodeGen/IslNodeBuilder.cpp |
 | polly/test/GPGPU/invariant-load-hoisting-read-in-kernel.ll |
Commit
9820dd970c1b72c7f77fad647b762053e2f60e31
by llvm-project[Polly] Support for InlineAsm.
Inline assembly was not handled at all and was treated like an llvm::Value. In particular, it tried to create a pointer to it, which is not allowed.
Fix by handling it like an llvm::Constant such that it is just reused when required, instead of trying to marshal it in memory.
Fixes llvm.org/PR51960
|
 | polly/test/CodeGen/OpenMP/inlineasm.ll |
 | polly/lib/Support/VirtualInstruction.cpp |
Commit
3538ee763d13a26515a224ddeb3a51a5af143e38
by llvm-dev[CostModel][X86] Improve AVX1/AVX2 v16i32->v16i16/v16i8 truncation costs (PR51972)
Based off worst case btver2 (AVX1) and haswell (AVX2) llvm-mca reports
|
 | llvm/test/Analysis/CostModel/X86/trunc.ll |
 | llvm/test/Analysis/CostModel/X86/min-legal-vector-width.ll |
 | llvm/test/Analysis/CostModel/X86/arith-fix.ll |
 | llvm/test/Analysis/CostModel/X86/cast.ll |
 | llvm/test/Analysis/CostModel/X86/arith-overflow.ll |
 | llvm/lib/Target/X86/X86TargetTransformInfo.cpp |
Commit
6063e6b499c7829b941f94456af493a1ecb93ea1
by spatel[InstCombine] move add after min/max intrinsic
This is another regression noted with the proposal to canonicalize to the min/max intrinsics in D98152.
Here are Alive2 attempts to show correctness without specifying exact constants: https://alive2.llvm.org/ce/z/bvfCwh (smax) https://alive2.llvm.org/ce/z/of7eqy (smin) https://alive2.llvm.org/ce/z/2Xtxoh (umax) https://alive2.llvm.org/ce/z/Rm4Ad8 (umin) (if you comment out the assume and/or no-wrap, you should see failures)
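The role of the no-wrap condition can be sketched with 8-bit arithmetic. This assumes the fold has the shape max(x + C0, C1) -> max(x, C1 - C0) + C0, which the message does not spell out; all function names below are hypothetical and the code models the arithmetic only, not InstCombine itself.

```cpp
#include <algorithm>
#include <cassert>

// Wrap an integer to the signed 8-bit range, as two's complement i8 would.
constexpr int wrap8(int v) {
  int w = ((v % 256) + 256) % 256;  // 0..255
  return w >= 128 ? w - 256 : w;
}
constexpr bool fits8(int v) { return v >= -128 && v <= 127; }

// Both forms in wide arithmetic, where nothing can wrap.
constexpr int maxOfAdd(int x, int c0, int c1) { return std::max(x + c0, c1); }
constexpr int addOfMax(int x, int c0, int c1) { return std::max(x, c1 - c0) + c0; }

// Both forms with every intermediate result wrapped to 8 bits.
constexpr int maxOfAddWrapped(int x, int c0, int c1) {
  return std::max(wrap8(x + c0), c1);
}
constexpr int addOfMaxWrapped(int x, int c0, int c1) {
  return wrap8(std::max(x, wrap8(c1 - c0)) + c0);
}

// Counts disagreements over all i8 inputs where neither x + c0 nor c1 - c0 wraps.
int mismatchesWithoutOverflow() {
  int mismatches = 0;
  for (int x = -128; x <= 127; ++x)
    for (int c0 = -128; c0 <= 127; ++c0)
      for (int c1 = -128; c1 <= 127; ++c1) {
        if (!fits8(x + c0) || !fits8(c1 - c0)) continue;
        if (maxOfAddWrapped(x, c0, c1) != addOfMaxWrapped(x, c0, c1)) ++mismatches;
      }
  return mismatches;
}
```

With x = 127, C0 = 1, C1 = 0 the add wraps and the two wrapped forms give 0 and -128 respectively, while the exhaustive check under the no-overflow precondition finds no mismatch.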
The different output for the umin test is due to a fold added with c4fc2cb5b2d98125 :
// umin(x, 1) == zext(x != 0)
We probably want to adjust that, so it applies more generally (umax --> sext or patterns where we can fold to select-of-constants). Some folds that were ok when starting with cmp+select may increase instruction count for the equivalent intrinsic, so we have to decide if it's worth altering a min/max.
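The umin fold quoted above can be checked exhaustively on 8-bit values. This is a small sketch with hypothetical helper names, modeling the arithmetic only:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// umin(x, 1) on an 8-bit unsigned value.
constexpr std::uint8_t umin_with_one(std::uint8_t x) {
  return std::min<std::uint8_t>(x, 1);
}

// zext(x != 0): the comparison result (0 or 1) widened to 8 bits.
constexpr std::uint8_t zext_is_nonzero(std::uint8_t x) {
  return static_cast<std::uint8_t>(x != 0);
}

// Compare both forms over all 256 possible inputs.
constexpr bool forms_agree() {
  for (int v = 0; v <= 255; ++v) {
    auto x = static_cast<std::uint8_t>(v);
    if (umin_with_one(x) != zext_is_nonzero(x)) return false;
  }
  return true;
}

static_assert(forms_agree(), "umin(x, 1) must equal zext(x != 0)");
```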
Differential Revision: https://reviews.llvm.org/D110038
|
 | llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp |
 | llvm/test/Transforms/InstCombine/minmax-intrinsics.ll |
Commit
8cf93a35d4b873b5e50c152d00adfc3701c679ea
by jonathanchesterfield[libomptarget][amdgpu] Destruct HSA queues
Store queues in unique_ptr so they are destroyed when the global DeviceInfo is. Currently they leak, which raises an assert in debug builds of hsa.
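The ownership pattern can be sketched as follows. `FakeQueue`, `fakeQueueCreate`, `fakeQueueDestroy`, and `DeviceInfoModel` are hypothetical stand-ins, not the HSA API or the plugin's actual types; a counter makes the destruction observable.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-in for a queue handle created by a C-style runtime API.
struct FakeQueue { int id; };

// Counts destroyed queues so the effect can be observed.
inline int &destroyedCount() { static int n = 0; return n; }

inline FakeQueue *fakeQueueCreate(int id) { return new FakeQueue{id}; }
inline void fakeQueueDestroy(FakeQueue *q) { ++destroyedCount(); delete q; }

// Custom deleter so unique_ptr calls the matching destroy function.
struct QueueDeleter {
  void operator()(FakeQueue *q) const { fakeQueueDestroy(q); }
};
using QueuePtr = std::unique_ptr<FakeQueue, QueueDeleter>;

// Stand-in for the global object that owns the queues.
struct DeviceInfoModel {
  std::vector<QueuePtr> queues;
  void addQueue(int id) { queues.push_back(QueuePtr(fakeQueueCreate(id))); }
};
```

When a `DeviceInfoModel` goes out of scope, every queue it holds is passed to `fakeQueueDestroy` without any explicit cleanup code.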
Reviewed By: pdhaliwal
Differential Revision: https://reviews.llvm.org/D109511
|
 | openmp/libomptarget/plugins/amdgpu/dynamic_hsa/hsa.h |
 | openmp/libomptarget/plugins/amdgpu/src/rtl.cpp |
 | openmp/libomptarget/plugins/amdgpu/dynamic_hsa/hsa.cpp |
Commit
f3c74b72f45ec3e6ca2402468cb070d7e485e3d4
by nikita.ppv[DSE] Make DSEState non-copyable (NFC)
As it contains a self-reference, the default copy/move ctors would not be safe.
Move the DSEState::get() method into the ctor to make sure no move occurs here even without NRVO.
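Why a self-reference makes the default copy/move unsafe can be sketched with a small model. `State` and `Helper` are hypothetical and much simpler than DSEState; the point is only that a copied `Helper` would keep referring to the original object's member.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical helper that keeps a reference into its owner.
struct Helper {
  explicit Helper(int &counter) : counter(counter) {}
  int &counter;
};

struct State {
  int counter = 0;
  Helper helper;  // refers to this->counter: a self-reference

  // All setup happens in the constructor, so no factory has to return
  // (and thereby potentially move) a fully built State.
  State() : helper(counter) {}

  // A defaulted copy would leave helper.counter bound to the source object.
  State(const State &) = delete;
  State &operator=(const State &) = delete;
};

static_assert(!std::is_copy_constructible<State>::value, "no copies");
static_assert(!std::is_move_constructible<State>::value, "no moves");
```

Declaring the copy constructor as deleted also suppresses the implicit move constructor, which is what the second `static_assert` confirms.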
This is a speculative fix for test failures on llvm-clang-x86_64-expensive-checks-win.
|
 | llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp |
Commit
14a49f5840a15791e7452200832a51bd11620df6
by nikita.ppv[DSE] Don't check getUnderlyingObject() return value (NFC)
getUnderlyingObject() never returns null. It will simply return something that is not the "root" underlying object.
Also drop a stale comment.
|
 | llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp |
Commit
d9413f46b308df5afd7fc106df2af809757bb0c9
by lebedev.ri[X86][Costmodel] Load/store i16 VF=2 interleaving costs
The only sched models for CPUs that support AVX2 but not AVX512 are: haswell, broadwell, skylake, zen1-3
For load we have: https://godbolt.org/z/M8vEKs5jY - for Intel CPUs, `Block RThroughput: =2.0`; for Ryzen CPUs, `Block RThroughput: <=1.0`. So pick a cost of `2`.
For store we have: https://godbolt.org/z/Kx1nKz7je - for Intel CPUs, `Block RThroughput: =1.0`; for Ryzen CPUs, `Block RThroughput: <=0.5`. So pick a cost of `1`.
I'm directly using the shuffling asm that llc produced, without any manual fixups that may be needed to ensure sequential execution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D103144
|
 | llvm/lib/Target/X86/X86TargetTransformInfo.cpp |
 | llvm/test/Analysis/CostModel/X86/interleaved-load-i16-stride-2.ll |
 | llvm/test/Analysis/CostModel/X86/interleaved-store-i16-stride-2.ll |
Commit
c4ae4a745dbdb2ac3d8a5c77d2cd12b5e5349154
by kazu[RISCV] Remove redundant declaration RISCVMnemonicSpellCheck (NFC)
Note that RISCVMnemonicSpellCheck is defined in RISCVGenAsmMatcher.inc, which RISCVAsmParser.cpp includes.
Identified with readability-redundant-declaration.
|
 | llvm/lib/Target/RISCV/AsmParser/RISCVAsmParser.cpp |
Commit
175c1a39e8f924f9199d4c8c94f06ad0d3502235
by Lang Hames[ORC][llvm-jitlink] Add debugging output to SimpleRemoteEPC (and Server).
Also adds an optional 'debug' argument to the llvm-jitlink-executor tool to enable debug-logging.
|
 | llvm/lib/ExecutionEngine/Orc/SimpleRemoteEPC.cpp |
 | llvm/include/llvm/ExecutionEngine/Orc/TargetProcess/SimpleRemoteEPCServer.h |
 | llvm/tools/llvm-jitlink/llvm-jitlink-executor/llvm-jitlink-executor.cpp |
 | llvm/include/llvm/ExecutionEngine/Orc/SimpleRemoteEPC.h |
 | llvm/lib/ExecutionEngine/Orc/TargetProcess/SimpleRemoteEPCServer.cpp |
Commit
3fe97672047bcdedbd5d34a26498b10f9dba369d
by llvm-dev[X86] Fold ADD(VPMADDWD(X,Y),VPMADDWD(Z,W)) -> VPMADDWD(SHUFFLE(X,Z), SHUFFLE(Y,W))
Merge addition of VPMADDWD nodes if each element pair doesn't use the upper element in each pair (i.e. it's zero) - we can generalize this to either element in the pair if we one day create VPMADDWD with zero lower elements.
There are still a number of issues with extending/shuffling with 256/512-bit VPMADDWD nodes so this initially only works for v2i32/v4i32 cases - I'm working on removing all these limitations but there's still a bit of yak shaving to go.....
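A scalar model may help to see why the fold is valid. The sketch below models a 128-bit VPMADDWD as "multiply adjacent i16 pairs and add" and assumes the shuffle interleaves the lower element of each pair from the two inputs; the function names are hypothetical and this is not the DAG combine itself. The single corner case in which the real instruction wraps (all four inputs equal to -32768) is not modeled.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using V8i16 = std::array<std::int16_t, 8>;
using V4i32 = std::array<std::int32_t, 4>;

// Scalar model of VPMADDWD: r[i] = a[2i]*b[2i] + a[2i+1]*b[2i+1].
V4i32 pmaddwd(const V8i16 &a, const V8i16 &b) {
  V4i32 r{};
  for (int i = 0; i < 4; ++i)
    r[i] = std::int32_t(a[2 * i]) * b[2 * i] +
           std::int32_t(a[2 * i + 1]) * b[2 * i + 1];
  return r;
}

// Lane-wise 32-bit add with wrapping semantics.
V4i32 add(const V4i32 &a, const V4i32 &b) {
  V4i32 r{};
  for (int i = 0; i < 4; ++i)
    r[i] = static_cast<std::int32_t>(static_cast<std::uint32_t>(a[i]) +
                                     static_cast<std::uint32_t>(b[i]));
  return r;
}

// Assumed shuffle: take the lower element of each pair from x and from z.
V8i16 interleaveLower(const V8i16 &x, const V8i16 &z) {
  V8i16 r{};
  for (int i = 0; i < 4; ++i) {
    r[2 * i] = x[2 * i];
    r[2 * i + 1] = z[2 * i];
  }
  return r;
}
```

When the upper element of every pair in X, Y, Z and W is zero, `add(pmaddwd(X, Y), pmaddwd(Z, W))` and `pmaddwd(interleaveLower(X, Z), interleaveLower(Y, W))` produce the same lanes, replacing two multiply-add nodes and an add with a single multiply-add.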
|
 | llvm/lib/Target/X86/X86ISelLowering.cpp |
 | llvm/test/CodeGen/X86/madd.ll |
 | llvm/test/CodeGen/X86/pmaddubsw.ll |
Commit
6498b0e991babe71e69ab02e1afa7f5535f2be0f
by Lang HamesReintroduce "[ORC] Introduce EPCGenericRTDyldMemoryManager."
This reintroduces "[ORC] Introduce EPCGenericRTDyldMemoryManager." (bef55a2b47a938ef35cbd7b61a1e5fa74e68c9ed) and "[lli] Add ChildTarget dependence on OrcTargetProcess library." (7a219d801bf2c3006482cf3cbd3170b3b4ea2e1b) which were reverted in 99951a56842d8e4cd0706cd17a04f77b5d0f6dd0 due to bot failures.
The root cause of the bot failures should be fixed by "[ORC] Fix uninitialized variable." (0371049277912afc201da721fa659ecef7ab7fba) and "[ORC] Wait for handleDisconnect to complete in SimpleRemoteEPC::disconnect." (320832cc9b7e7fea5fc8afbed75c34c4a43287ba).
|
 | llvm/lib/ExecutionEngine/Orc/TargetProcess/SimpleExecutorMemoryManager.cpp |
 | llvm/lib/ExecutionEngine/Orc/CMakeLists.txt |
 | llvm/tools/lli/ForwardingMemoryManager.h |
 | llvm/lib/ExecutionEngine/Orc/TargetProcess/RegisterEHFrames.cpp |
 | llvm/include/llvm/ExecutionEngine/Orc/EPCGenericRTDyldMemoryManager.h |
 | llvm/lib/ExecutionEngine/Orc/EPCGenericRTDyldMemoryManager.cpp |
 | llvm/tools/lli/lli.cpp |
 | llvm/lib/ExecutionEngine/Orc/TargetProcess/OrcRTBootstrap.cpp |
 | llvm/tools/lli/ChildTarget/CMakeLists.txt |
 | llvm/include/llvm/ExecutionEngine/Orc/Shared/OrcRTBridge.h |
 | llvm/tools/lli/ChildTarget/ChildTarget.cpp |
 | llvm/include/llvm/ExecutionEngine/Orc/TargetProcess/RegisterEHFrames.h |
 | llvm/lib/ExecutionEngine/Orc/Shared/OrcRTBridge.cpp |
 | llvm/tools/lli/RemoteJITUtils.h |
Commit
a44b122adead759e6af4d73ccf4cc2b3c813c400
by llvmgnsyncbot[gn build] Port 6498b0e991ba
|
 | llvm/utils/gn/secondary/llvm/lib/ExecutionEngine/Orc/BUILD.gn |
Commit
a12c0d5ea66a1059333b9b8ea364e9301c1413c5
by Lang Hames[ORC] Export process symbols in lli-child-target.
We want this behavior for future testing infrastructure anyway, and it may help with the failure in https://lab.llvm.org/buildbot/#/builders/98/builds/6401:
/b/fuchsia-x86_64-linux/llvm.obj/tools/clang/stage2-bins/bin/lli: warning: remote mcjit does not support lazy compilation
Finalization error: could not register eh-frame: __register_frame function not found
/b/fuchsia-x86_64-linux/llvm.obj/tools/clang/stage2-bins/bin/lli: disconnecting
|
 | llvm/tools/lli/ChildTarget/CMakeLists.txt |
Commit
f40685138ba182fc823f569f6d88b7d3ddf34b9e
by Lang Hames[ORC] Remove OrcRemoteTargetClient and OrcRemoteTargetServer.
Now that the lli and lli-child-target tools have been updated to use SimpleRemoteEPC (6498b0e991b) the OrcRemoteTarget* APIs are no longer needed.
Once the LLJITWithRemoteDebugging example has been migrated to SimpleRemoteEPC we will remove OrcRPCExecutorProcessControl, and the ORC RPC system itself.
|
 | llvm/include/llvm/ExecutionEngine/Orc/OrcRemoteTargetClient.h |
 | llvm/include/llvm/ExecutionEngine/Orc/OrcRemoteTargetServer.h |
 | llvm/include/llvm/ExecutionEngine/Orc/OrcRemoteTargetRPCAPI.h |
Commit
ed3e4917b36f2530703115066700daeb2b45b4f0
by llvm-dev[X86] Fold PACK(*_EXTEND_VECTOR_INREG, UNDEF) -> *_EXTEND_VECTOR_INREG
For 128-bit vectors, we can remove a PACK of an EXTEND_VECTOR_INREG node and just create a smaller extension to the result/packed type.
|
 | llvm/lib/Target/X86/X86ISelLowering.cpp |
 | llvm/test/CodeGen/X86/pmaddubsw.ll |
Commit
c0eff50fc5a48990ad9ebfcb7e81c6ab6fea79c5
by llvm-dev[X86][SSE] combineMulToPMADDWD - enable sext_extend_vector_inreg(vXi16) -> zext_extend_vector_inreg(vXi16) fold
The plan is to allow combineMulToPMADDWD to match illegal vector types (as long as they're still pow2), which should allow us to start removing the 128-bit limit on more of the PMADDWD combines.
|
 | llvm/lib/Target/X86/X86ISelLowering.cpp |
 | llvm/test/CodeGen/X86/madd.ll |
Commit
7a855596c3a29ba7a9b0cc9bcc820f7f78d07afe
by nikita.ppv[BasicAA] Don't check whether GEP is sized (NFC)
GEPs are required to have a sized source element type, so we can just assert that here.
|
 | llvm/lib/Analysis/BasicAliasAnalysis.cpp |
Commit
e2f780fba96c55b0dcb7aa3c4719110875b36dfb
by mgorny[lldb] [gdb-remote] Use llvm::StringRef.split() and llvm::to_integer()
Replace the uses of StringConvert combined with hand-rolled array splitting with llvm::StringRef.split() and llvm::to_integer().
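The general shape of "split, then parse strictly" can be sketched with the C++17 standard library. This is an analogue only: `splitOn` and `parseUnsigned` are hypothetical helpers built on `std::string_view` and `std::from_chars`, not the LLVM functions named above, and the input string in the usage note is made up.

```cpp
#include <cassert>
#include <charconv>
#include <cstddef>
#include <optional>
#include <string_view>
#include <system_error>
#include <vector>

// Split s on every occurrence of sep; empty fields are kept.
std::vector<std::string_view> splitOn(std::string_view s, char sep) {
  std::vector<std::string_view> parts;
  while (true) {
    std::size_t pos = s.find(sep);
    parts.push_back(s.substr(0, pos));
    if (pos == std::string_view::npos) break;
    s.remove_prefix(pos + 1);
  }
  return parts;
}

// Parse the whole field as an unsigned integer; reject empty input,
// trailing characters and out-of-range values.
std::optional<unsigned> parseUnsigned(std::string_view s, int base = 10) {
  unsigned value = 0;
  const char *first = s.data();
  const char *last = first + s.size();
  auto [ptr, ec] = std::from_chars(first, last, value, base);
  if (ec != std::errc() || ptr != last) return std::nullopt;
  return value;
}
```

For example, `splitOn("1a;2b;zz", ';')` yields three fields, of which the first two parse as hexadecimal and the third is rejected instead of silently becoming a default value.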
Differential Revision: https://reviews.llvm.org/D110472
|
 | lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemote.h |
 | lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemote.cpp |
 | lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunicationClient.cpp |
Commit
daf0b2f078171ce3a04a8804b0bfbed8f859115a
by Lang Hames[MCJIT] This test shouldn't require an unwind table.
This should fix the failures on the Fuchsia bot that started in https://lab.llvm.org/buildbot/#/builders/98/builds/6401.
|
 | llvm/test/ExecutionEngine/MCJIT/remote/test-global-init-nonzero-sm-pic.ll |
Commit
b3891f28a31248de264741d70401c42c969b284b
by joker.ephFix ClangTidyLegacy warning: "'virtual' is redundant since the function is already declared 'final' " (NFC)
|
 | mlir/test/lib/Dialect/Test/TestDialect.cpp |
Commit
9c2cd6e7c803eabbee652f9477c23aeda8ce02c8
by joker.ephFix clang-tidy warning "modernize-use-nullptr" in MLIR VulkanRuntime (NFC)
|
 | mlir/tools/mlir-vulkan-runner/VulkanRuntime.cpp |
Commit
acd13994d17fe64269f71324840bd45031a2e552
by Amara Emerson[GlobalISel] Re-generate some call lowering tests with the new CHECK-NEXT behaviour.
|
 | llvm/test/CodeGen/X86/GlobalISel/irtranslator-callingconv.ll |
 | llvm/test/CodeGen/AArch64/GlobalISel/arm64-callingconv.ll |
 | llvm/test/CodeGen/Mips/GlobalISel/irtranslator/float_args.ll |
 | llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-call-implicit-args.ll |
 | llvm/test/CodeGen/AArch64/GlobalISel/legalize-s128-div.mir |
 | llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-call.ll |
 | llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-call-return-values.ll |
 | llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-call-sret.ll |
 | llvm/test/CodeGen/Mips/GlobalISel/irtranslator/extend_args.ll |
 | llvm/test/CodeGen/Mips/GlobalISel/irtranslator/stack_args.ll |
 | llvm/test/CodeGen/AArch64/GlobalISel/arm64-callingconv-ios.ll |
 | llvm/test/CodeGen/AArch64/GlobalISel/irtranslator-arguments.ll |
 | llvm/test/CodeGen/AArch64/GlobalISel/irtranslator-exceptions.ll |
 | llvm/test/CodeGen/AArch64/GlobalISel/call-translator-tail-call.ll |
 | llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-sibling-call.ll |
Commit
4b37462aab4e5f5f1ffdb04294e87b990d189751
by Lang Hames[ORC] Fix SimpleRemoteEPC data races.
Adds a 'start' method to SimpleRemoteEPCTransport to defer transport startup until the client has been configured. This avoids races on client members if the first message arrives while the client is being configured.
Also fixes races on the file descriptors in FDSimpleRemoteEPCTransport.
|
 | llvm/include/llvm/ExecutionEngine/Orc/TargetProcess/SimpleRemoteEPCServer.h |
 | llvm/include/llvm/ExecutionEngine/Orc/Shared/SimpleRemoteEPCUtils.h |
 | llvm/include/llvm/ExecutionEngine/Orc/SimpleRemoteEPC.h |
 | llvm/lib/ExecutionEngine/Orc/Shared/SimpleRemoteEPCUtils.cpp |
 | llvm/lib/ExecutionEngine/Orc/SimpleRemoteEPC.cpp |
Commit
7d6889964ab534164698ef134de9cf11cd87a09d
by pengfei.wang[X86][FP16] Add more builtins to avoid multi evaluation problems & add 2 missed intrinsics
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D110336
|
 | clang/lib/CodeGen/CGBuiltin.cpp |
 | clang/test/CodeGen/X86/avx512fp16-builtins.c |
 | llvm/include/llvm/IR/IntrinsicsX86.td |
 | clang/lib/Headers/avx512fp16intrin.h |
 | clang/include/clang/Basic/BuiltinsX86.def |
 | clang/lib/Sema/SemaChecking.cpp |
Commit
1ea8d12510b9e1b208a7541c86e1b02a9a3db0e2
by Lang Hames[ORC] Add missing lock to CompileOnDemandLayer::getPerDylibResources.
The getPerDylibResources method may be called concurrently from multiple threads, so we need to protect access to the underlying map.
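The locking pattern can be sketched as follows. `ResourceTable` and `PerDylibResourcesModel` are hypothetical stand-ins rather than the ORC classes; the sketch only shows a lookup-or-create accessor guarding its map with a mutex.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <mutex>
#include <string>

// Hypothetical per-library resource record.
struct PerDylibResourcesModel { int stubs = 0; };

class ResourceTable {
public:
  // Lookup-or-create. The lock covers both the search and the possible
  // insertion, so concurrent callers never modify the map at the same time.
  PerDylibResourcesModel &get(const std::string &dylibName) {
    std::lock_guard<std::mutex> lock(mutex_);
    return map_[dylibName];
  }

  std::size_t size() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return map_.size();
  }

private:
  mutable std::mutex mutex_;
  std::map<std::string, PerDylibResourcesModel> map_;
};
```

`std::map` keeps references to existing elements valid across insertions, so the returned reference stays usable after the lock is released; protecting the record's own fields against concurrent use would be a separate concern.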
Possible fix for https://llvm.org/PR51064
|
 | llvm/lib/ExecutionEngine/Orc/CompileOnDemandLayer.cpp |
Commit
91f46bb77e6d56955c3b96e9e844ae6a251c41e9
by llvm-project[Polly] Reject regions entered by an indirectbr/callbr.
SplitBlockPredecessors is unable to insert an additional BasicBlock between an indirectbr/callbr terminator and the successor blocks. This is needed by Polly to normalize the control flow before emitting its optimized code.
This patch rejects regions entered by an indirectbr/callbr so that they do not fail later at code generation.
This fixes llvm.org/PR51964
|
 | polly/lib/Analysis/ScopDetection.cpp |
 | polly/lib/Analysis/ScopDetectionDiagnostic.cpp |
 | polly/include/polly/ScopDetectionDiagnostic.h |
 | polly/test/ScopDetect/callbr.ll |