Changes

Summary

  1. [Polly] Remove isConstCall.
  2. [Polly] Use VirtualUse to determine references.
  3. [Polly] Support for InlineAsm.
  4. [CostModel][X86] Improve AVX1/AVX2 v16i32->v16i16/v16i8 truncation costs (PR51972)
  5. [InstCombine] move add after min/max intrinsic
  6. [libomptarget][amdgpu] Destruct HSA queues
  7. [DSE] Make DSEState non-copyable (NFC)
  8. [DSE] Don't check getUnderlyingObject() return value (NFC)
  9. [X86][Costmodel] Load/store i16 VF=2 interleaving costs
  10. [RISCV] Remove redundant declaration RISCVMnemonicSpellCheck (NFC)
  11. [ORC][llvm-jitlink] Add debugging output to SimpleRemoteEPC (and Server).
  12. [X86] Fold ADD(VPMADDWD(X,Y),VPMADDWD(Z,W)) -> VPMADDWD(SHUFFLE(X,Z), SHUFFLE(Y,W))
  13. Reintroduce "[ORC] Introduce EPCGenericRTDyldMemoryManager."
  14. [gn build] Port 6498b0e991ba
  15. [ORC] Export process symbols in lli-child-target.
  16. [ORC] Remote OrcRemoteTargetClient and OrcRemoteTargetServer.
  17. [X86] Fold PACK(*_EXTEND_VECTOR_INREG, UNDEF) -> *_EXTEND_VECTOR_INREG
  18. [X86][SSE] combineMulToPMADDWD - enable sext_extend_vector_inreg(vXi16) -> zext_extend_vector_inreg(vXi16) fold
  19. [BasicAA] Don't check whether GEP is sized (NFC)
  20. [lldb] [gdb-remote] Use llvm::StringRef.split() and llvm::to_integer()
  21. [MCJIT] This test shouldn't require an unwind table.
  22. Fix ClangTidyLegacy warning: "'virtual' is redundant since the function is already declared 'final' " (NFC)
  23. Fix clang-tidy warning "modernize-use-nullptr" in MLIR VulkanRuntime (NFC)
  24. [GlobalISel] Re-generate some call lowering tests with the new CHECK-NEXT behaviour.
  25. [ORC] Fix SimpleRemoteEPC data races.
  26. [X86][FP16] Add more builtins to avoid multi evaluation problems & add 2 missed intrinsics
  27. [ORC] Add missing lock to CompileOnDemandLayer::getPerDylibResources.
  28. [Polly] Reject reject regions entered by an indirectbr/callbr.
Commit 1cea25eec90e355c9b072edc1b6e1e9903d7bca4 by llvm-project
[Polly] Remove isConstCall.

The function was intended to catch OpenMP functions such as
get_thread_id(). If matched, the call would be considered synthesizable.

There were a few problems with this:

* get_thread_id() is not 'const' in the sense of how the gcc manual
   defines it: "do not examine any values except their arguments".
   get_thread_id() reads global state of the OpenCL runtime library.
   What was intended was probably 'speculable'.

* isConstCall was implemented using mayReadOrWriteMemory(). 'const' is
   stricter than that: mayReadOrWriteMemory() is, e.g., true for malloc(),
   since it may only read/write addresses that are considered
   inaccessible from the application. However, malloc is certainly not
   speculable.

* Values for which isConstCall returned true were not handled
   consistently throughout Polly. In particular, they were not considered
   for referenced values (OpenMP outlining and PollyACC).

Fix by removing special handling for isConstCall entirely.
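A minimal sketch of the distinction drawn in the first bullet, assuming GCC/Clang's `__attribute__((const))`. `square` qualifies because its result depends only on its argument. `current_thread_id` and `set_runtime_thread_id` are hypothetical stand-ins for a runtime query such as get_thread_id(): the function reads global state, so two calls with identical (empty) arguments can return different values, and it must not be marked `const`.

```cpp
#include <cassert>

// Result depends only on the argument: a valid use of the `const` attribute.
__attribute__((const)) int square(int x) { return x * x; }

// Hypothetical runtime-global state (e.g. which thread is currently running).
static int g_runtime_thread_id = 0;

// Hypothetical helper simulating the runtime switching threads.
void set_runtime_thread_id(int id) { g_runtime_thread_id = id; }

// Reads global state, so it must NOT be declared `const`: calls with the
// same (empty) argument list may legitimately return different values.
int current_thread_id() { return g_runtime_thread_id; }
```

Marking `current_thread_id` as `const` would permit the compiler to merge or hoist its calls, which is exactly the kind of assumption that does not hold for such runtime functions.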
The file was modified polly/include/polly/Support/SCEVValidator.h
The file was modified polly/lib/Analysis/ScopInfo.cpp
The file was modified polly/lib/Support/SCEVValidator.cpp
The file was removed polly/test/ScopInfo/constant_functions_as_unknowns.ll
The file was removed polly/test/ScopInfo/constant_functions_multi_dim.ll
The file was modified polly/lib/Analysis/ScopDetection.cpp
Commit d5c87162db7763c89160dc66894bebf3bd1e90d7 by llvm-project
[Polly] Use VirtualUse to determine references.

VirtualUse ensures consistency across the different sources of values
within Polly. In particular, this enables the use of instructions moved
between statements. Before the patch, the code wrongly assumed that a BB's
instructions are also the ScopStmt's instructions. References are
determined for OpenMP outlining and GPGPU kernel extraction.

GPGPU CodeGen had some problems. First, it generated GPU kernel
parameters for constants. Second, it emitted GPU-side invariant loads
for values that had already been loaded by the host. This has been
partially fixed: it still generates a store for the invariant load
result, but using the value that the host has already written.

WARNING: I did not test the generated PollyACC code on an actual GPU.

The improved consistency will be made use of in the next patch.
The file was modified polly/lib/CodeGen/IslNodeBuilder.cpp
The file was modified polly/test/GPGPU/invariant-load-hoisting-read-in-kernel.ll
The file was modified polly/test/GPGPU/invariant-load-of-scalar.ll
The file was modified polly/test/GPGPU/phi-nodes-in-kernel.ll
The file was modified polly/include/polly/CodeGen/IslNodeBuilder.h
Commit 9820dd970c1b72c7f77fad647b762053e2f60e31 by llvm-project
[Polly] Support for InlineAsm.

Inline assembly was not handled at all and was treated like an ordinary
llvm::Value. In particular, Polly tried to create a pointer to it, which
is not allowed.

Fix by handling it like an llvm::Constant, so that it is simply reused
when required instead of being marshalled through memory.

Fixes llvm.org/PR51960
The file was modified polly/lib/Support/VirtualInstruction.cpp
The file was added polly/test/CodeGen/OpenMP/inlineasm.ll
Commit 3538ee763d13a26515a224ddeb3a51a5af143e38 by llvm-dev
[CostModel][X86] Improve AVX1/AVX2 v16i32->v16i16/v16i8 truncation costs (PR51972)

Based on worst-case btver2 (AVX1) and haswell (AVX2) llvm-mca reports.
The file was modified llvm/test/Analysis/CostModel/X86/arith-fix.ll
The file was modified llvm/test/Analysis/CostModel/X86/trunc.ll
The file was modified llvm/lib/Target/X86/X86TargetTransformInfo.cpp
The file was modified llvm/test/Analysis/CostModel/X86/arith-overflow.ll
The file was modified llvm/test/Analysis/CostModel/X86/min-legal-vector-width.ll
The file was modified llvm/test/Analysis/CostModel/X86/cast.ll
Commit 6063e6b499c7829b941f94456af493a1ecb93ea1 by spatel
[InstCombine] move add after min/max intrinsic

This is another regression noted with the proposal to canonicalize
to the min/max intrinsics in D98152.

Here are Alive2 attempts to show correctness without specifying
exact constants:
https://alive2.llvm.org/ce/z/bvfCwh (smax)
https://alive2.llvm.org/ce/z/of7eqy (smin)
https://alive2.llvm.org/ce/z/2Xtxoh (umax)
https://alive2.llvm.org/ce/z/Rm4Ad8 (umin)
(if you comment out the assume and/or no-wrap, you should see failures)
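The same identities can also be sanity-checked by brute force at 8 bits. This sketch is not the InstCombine code: it assumes the transform has the shape `max(X + C1, C2) -> max(X, C2 - C1) + C1` (and likewise for umin), and it models the no-wrap flags by skipping inputs where `X + C1` or `C2 - C1` would overflow. The authoritative preconditions are the ones in D110038 and the Alive2 proofs.

```cpp
#include <algorithm>
#include <cassert>

// Wraps an int into the signed 8-bit range (two's-complement i8 arithmetic).
int wrap_i8(int v) {
  v &= 0xFF;
  return v >= 128 ? v - 256 : v;
}

// Counts i8 counterexamples to  smax(X + C1, C2) == smax(X, C2 - C1) + C1.
// With require_no_wrap, inputs where X + C1 or C2 - C1 overflows are skipped.
long smax_counterexamples(bool require_no_wrap) {
  long failures = 0;
  for (int x = -128; x <= 127; ++x)
    for (int c1 = -128; c1 <= 127; ++c1)
      for (int c2 = -128; c2 <= 127; ++c2) {
        bool wraps = wrap_i8(x + c1) != x + c1 || wrap_i8(c2 - c1) != c2 - c1;
        if (require_no_wrap && wraps)
          continue;
        int before = std::max(wrap_i8(x + c1), c2);
        int after = wrap_i8(std::max(x, wrap_i8(c2 - c1)) + c1);
        if (before != after)
          ++failures;
      }
  return failures;
}

// Same check for  umin(X + C1, C2) == umin(X, C2 - C1) + C1  over unsigned
// 8-bit values, skipping inputs where X + C1 or C2 - C1 would wrap.
long umin_counterexamples() {
  long failures = 0;
  for (int x = 0; x <= 255; ++x)
    for (int c1 = 0; c1 <= 255; ++c1)
      for (int c2 = 0; c2 <= 255; ++c2) {
        if (x + c1 > 255 || c2 < c1)
          continue;
        int before = std::min(x + c1, c2);
        int after = std::min(x, c2 - c1) + c1;
        if (before != after)
          ++failures;
      }
  return failures;
}
```

Dropping the no-wrap requirement (`smax_counterexamples(false)`) does produce counterexamples, e.g. X = 127, C1 = 1, C2 = 0, in line with the note about commenting out the no-wrap flags.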

The different output for the umin test is due to a fold added in
c4fc2cb5b2d98125:

// umin(x, 1) == zext(x != 0)

We probably want to adjust that, so it applies more generally
(umax --> sext or patterns where we can fold to select-of-constants).
Some folds that were ok when starting with cmp+select may increase
instruction count for the equivalent intrinsic, so we have to decide
if it's worth altering a min/max.

Differential Revision: https://reviews.llvm.org/D110038
The file was modified llvm/test/Transforms/InstCombine/minmax-intrinsics.ll
The file was modified llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
Commit 8cf93a35d4b873b5e50c152d00adfc3701c679ea by jonathanchesterfield
[libomptarget][amdgpu] Destruct HSA queues

Store queues in unique_ptr so they are destroyed when the global DeviceInfo is. Currently they leak, which raises an assert in debug builds of HSA.
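A minimal sketch of the ownership pattern, assuming a custom deleter that forwards to the runtime's destroy call. `FakeQueue` and `fake_queue_destroy` are hypothetical stand-ins for the HSA queue handle and `hsa_queue_destroy()`; the plugin's actual code is not reproduced here.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-ins for the HSA queue type and its destroy function.
struct FakeQueue { int id; };
static int g_destroyed_queues = 0;
void fake_queue_destroy(FakeQueue *q) {
  ++g_destroyed_queues;
  delete q;
}

// Deleter so std::unique_ptr releases queues through the runtime's API.
struct QueueDeleter {
  void operator()(FakeQueue *q) const { fake_queue_destroy(q); }
};
using QueuePtr = std::unique_ptr<FakeQueue, QueueDeleter>;

// Owning container: destroying DeviceInfo destroys every queue it holds.
struct DeviceInfo {
  std::vector<QueuePtr> queues;
  void create_queues(int n) {
    for (int i = 0; i < n; ++i)
      queues.push_back(QueuePtr(new FakeQueue{i}));
  }
};
```

Because the deleter is part of the `unique_ptr` type, no explicit cleanup loop is needed when the global `DeviceInfo` is torn down.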

Reviewed By: pdhaliwal

Differential Revision: https://reviews.llvm.org/D109511
The file was modified openmp/libomptarget/plugins/amdgpu/dynamic_hsa/hsa.h
The file was modified openmp/libomptarget/plugins/amdgpu/src/rtl.cpp
The file was modified openmp/libomptarget/plugins/amdgpu/dynamic_hsa/hsa.cpp
Commit f3c74b72f45ec3e6ca2402468cb070d7e485e3d4 by nikita.ppv
[DSE] Make DSEState non-copyable (NFC)

As it contains a self-reference, the default copy/move ctors
would not be safe.

Move the DSEState::get() method into the ctor to make sure no move
occurs here even without NRVO.
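A minimal sketch of why a self-reference makes the defaulted copy/move constructors unsafe, and of constructing in place instead of through a by-value factory. `State` and `Helper` are hypothetical; they only mimic the shape of the problem, not DSEState's actual members.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical state object that, like DSEState, references one of its own members.
struct State {
  std::vector<int> Data;
  // A defaulted copy or move would copy this reference as-is, leaving the new
  // object's Helper bound to the *old* object's Data.
  struct Helper {
    const std::vector<int> &Ref;
  } H;

  // All setup happens in the constructor (instead of a static factory that
  // returns State by value), so no copy or move is needed, with or without NRVO.
  explicit State(std::vector<int> Init) : Data(std::move(Init)), H{Data} {}

  State(const State &) = delete;
  State &operator=(const State &) = delete;
  State(State &&) = delete;
  State &operator=(State &&) = delete;
};

static_assert(!std::is_copy_constructible<State>::value, "State must not be copied");
static_assert(!std::is_move_constructible<State>::value, "State must not be moved");
```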

This is a speculative fix for test failures on
llvm-clang-x86_64-expensive-checks-win.
The file was modified llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
Commit 14a49f5840a15791e7452200832a51bd11620df6 by nikita.ppv
[DSE] Don't check getUnderlyingObject() return value (NFC)

getUnderlyingObject() never returns null. It will simply return
something that is not the "root" underlying object.
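A simplified model of why the null check is dead, assuming a walk that strips at most a fixed number of cast/GEP levels. `PtrNode` and `underlying_object` are hypothetical, and the limit of 6 is an assumption modelled on LLVM's default lookup depth. The walk always returns the last value it reached: that value is never null, but it is only the true root if the chain was short enough.

```cpp
#include <cassert>

// Hypothetical pointer expression: casts/GEPs have a Base operand, while a
// "root" object (alloca, global, argument) has Base == nullptr.
struct PtrNode {
  const PtrNode *Base = nullptr;
};

// Bounded walk towards the root. Never returns null for a non-null input;
// if MaxLookup is exhausted it returns an intermediate, non-root node.
const PtrNode *underlying_object(const PtrNode *V, unsigned MaxLookup = 6) {
  for (unsigned I = 0; I < MaxLookup && V->Base; ++I)
    V = V->Base;
  return V;
}
```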

Also drop a stale comment.
The file was modified llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
Commit d9413f46b308df5afd7fc106df2af809757bb0c9 by lebedev.ri
[X86][Costmodel] Load/store i16 VF=2 interleaving costs

The only scheduling models for CPUs that support AVX2
but not AVX-512 are: haswell, broadwell, skylake, and zen1-3.

For load we have:
https://godbolt.org/z/M8vEKs5jY - for intels `Block RThroughput: =2.0`;
                                  for ryzens, `Block RThroughput: <=1.0`
So pick cost of `2`.

For store we have:
https://godbolt.org/z/Kx1nKz7je - for intels `Block RThroughput: =1.0`;
                                  for ryzens, `Block RThroughput: <=0.5`
So pick cost of `1`.

I'm directly using the shuffling asm that llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D103144
The file was modified llvm/lib/Target/X86/X86TargetTransformInfo.cpp
The file was modified llvm/test/Analysis/CostModel/X86/interleaved-load-i16-stride-2.ll
The file was modified llvm/test/Analysis/CostModel/X86/interleaved-store-i16-stride-2.ll
Commit c4ae4a745dbdb2ac3d8a5c77d2cd12b5e5349154 by kazu
[RISCV] Remove redundant declaration RISCVMnemonicSpellCheck (NFC)

Note that RISCVMnemonicSpellCheck is defined in
RISCVGenAsmMatcher.inc, which RISCVAsmParser.cpp includes.

Identified with readability-redundant-declaration.
The file was modified llvm/lib/Target/RISCV/AsmParser/RISCVAsmParser.cpp
Commit 175c1a39e8f924f9199d4c8c94f06ad0d3502235 by Lang Hames
[ORC][llvm-jitlink] Add debugging output to SimpleRemoteEPC (and Server).

Also adds an optional 'debug' argument to the llvm-jitlink-executor tool to
enable debug-logging.
The file was modified llvm/lib/ExecutionEngine/Orc/SimpleRemoteEPC.cpp
The file was modified llvm/include/llvm/ExecutionEngine/Orc/SimpleRemoteEPC.h
The file was modified llvm/tools/llvm-jitlink/llvm-jitlink-executor/llvm-jitlink-executor.cpp
The file was modified llvm/include/llvm/ExecutionEngine/Orc/TargetProcess/SimpleRemoteEPCServer.h
The file was modified llvm/lib/ExecutionEngine/Orc/TargetProcess/SimpleRemoteEPCServer.cpp
Commit 3fe97672047bcdedbd5d34a26498b10f9dba369d by llvm-dev
[X86] Fold ADD(VPMADDWD(X,Y),VPMADDWD(Z,W)) -> VPMADDWD(SHUFFLE(X,Z), SHUFFLE(Y,W))

Merge the addition of two VPMADDWD nodes if no element pair uses its upper element (i.e. the upper element is zero). We can generalize this to either element in the pair if we one day create VPMADDWD nodes with zero lower elements.

There are still a number of issues with extending/shuffling 256/512-bit VPMADDWD nodes, so this initially only works for v2i32/v4i32 cases. I'm working on removing all these limitations, but there's still a bit of yak shaving to go.
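A scalar model of the 128-bit (v4i32) case, to show why the fold is sound when every upper element is zero. It is a sketch, not the SelectionDAG combine: the interleaving shuffle below (`{P0, Q0, P2, Q2, ...}`) is our reconstruction of what SHUFFLE(X,Z) and SHUFFLE(Y,W) need to produce, and `interleave_lower`/`folded` are hypothetical helper names.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using V8i16 = std::array<int16_t, 8>;
using V4i32 = std::array<int32_t, 4>;

// Wraps a 64-bit intermediate to a two's-complement 32-bit lane.
int32_t wrap_i32(int64_t V) {
  uint32_t U = static_cast<uint32_t>(V);  // reduction modulo 2^32 is well-defined
  return U <= INT32_MAX ? static_cast<int32_t>(U)
                        : static_cast<int32_t>(static_cast<int64_t>(U) - (int64_t(1) << 32));
}

// Scalar model of 128-bit (V)PMADDWD: multiply signed 16-bit elements and
// add each adjacent pair of products into one 32-bit lane.
V4i32 pmaddwd(const V8i16 &A, const V8i16 &B) {
  V4i32 R{};
  for (int I = 0; I < 4; ++I)
    R[I] = wrap_i32(int64_t(A[2 * I]) * B[2 * I] +
                    int64_t(A[2 * I + 1]) * B[2 * I + 1]);
  return R;
}

// Lane-wise 32-bit ADD (the node being folded away).
V4i32 add_v4i32(const V4i32 &A, const V4i32 &B) {
  V4i32 R{};
  for (int I = 0; I < 4; ++I)
    R[I] = wrap_i32(int64_t(A[I]) + B[I]);
  return R;
}

// Moves the lower element of each pair of P into the lower slot and the
// lower element of each pair of Q into the upper slot: {P0, Q0, P2, Q2, ...}.
V8i16 interleave_lower(const V8i16 &P, const V8i16 &Q) {
  V8i16 R{};
  for (int I = 0; I < 4; ++I) {
    R[2 * I] = P[2 * I];
    R[2 * I + 1] = Q[2 * I];
  }
  return R;
}

// Folded form VPMADDWD(SHUFFLE(X, Z), SHUFFLE(Y, W)); only equivalent to
// ADD(VPMADDWD(X, Y), VPMADDWD(Z, W)) when every upper element is zero.
V4i32 folded(const V8i16 &X, const V8i16 &Y, const V8i16 &Z, const V8i16 &W) {
  return pmaddwd(interleave_lower(X, Z), interleave_lower(Y, W));
}
```

With zero upper elements, each original lane is `X[2i]*Y[2i] + Z[2i]*W[2i]`, which is exactly what one VPMADDWD computes on the interleaved operands; a nonzero upper element adds a product that the folded form drops.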
The file was modified llvm/lib/Target/X86/X86ISelLowering.cpp
The file was modified llvm/test/CodeGen/X86/madd.ll
The file was modified llvm/test/CodeGen/X86/pmaddubsw.ll
Commit 6498b0e991babe71e69ab02e1afa7f5535f2be0f by Lang Hames
Reintroduce "[ORC] Introduce EPCGenericRTDyldMemoryManager."

This reintroduces "[ORC] Introduce EPCGenericRTDyldMemoryManager."
(bef55a2b47a938ef35cbd7b61a1e5fa74e68c9ed) and "[lli] Add ChildTarget dependence
on OrcTargetProcess library." (7a219d801bf2c3006482cf3cbd3170b3b4ea2e1b) which were
reverted in 99951a56842d8e4cd0706cd17a04f77b5d0f6dd0 due to bot failures.

The root cause of the bot failures should be fixed by "[ORC] Fix uninitialized
variable." (0371049277912afc201da721fa659ecef7ab7fba) and "[ORC] Wait for
handleDisconnect to complete in SimpleRemoteEPC::disconnect."
(320832cc9b7e7fea5fc8afbed75c34c4a43287ba).
The file was added llvm/include/llvm/ExecutionEngine/Orc/EPCGenericRTDyldMemoryManager.h
The file was modified llvm/lib/ExecutionEngine/Orc/TargetProcess/RegisterEHFrames.cpp
The file was added llvm/lib/ExecutionEngine/Orc/EPCGenericRTDyldMemoryManager.cpp
The file was modified llvm/include/llvm/ExecutionEngine/Orc/Shared/OrcRTBridge.h
The file was modified llvm/tools/lli/ChildTarget/ChildTarget.cpp
The file was added llvm/tools/lli/ForwardingMemoryManager.h
The file was modified llvm/include/llvm/ExecutionEngine/Orc/TargetProcess/RegisterEHFrames.h
The file was modified llvm/tools/lli/lli.cpp
The file was modified llvm/lib/ExecutionEngine/Orc/Shared/OrcRTBridge.cpp
The file was modified llvm/lib/ExecutionEngine/Orc/TargetProcess/SimpleExecutorMemoryManager.cpp
The file was modified llvm/lib/ExecutionEngine/Orc/CMakeLists.txt
The file was modified llvm/tools/lli/ChildTarget/CMakeLists.txt
The file was removed llvm/tools/lli/RemoteJITUtils.h
The file was modified llvm/lib/ExecutionEngine/Orc/TargetProcess/OrcRTBootstrap.cpp
Commit a44b122adead759e6af4d73ccf4cc2b3c813c400 by llvmgnsyncbot
[gn build] Port 6498b0e991ba
The file was modified llvm/utils/gn/secondary/llvm/lib/ExecutionEngine/Orc/BUILD.gn
Commit a12c0d5ea66a1059333b9b8ea364e9301c1413c5 by Lang Hames
[ORC] Export process symbols in lli-child-target.

We want this behavior for future testing infrastructure anyway, and it may help
with the failure in https://lab.llvm.org/buildbot/#/builders/98/builds/6401:

/b/fuchsia-x86_64-linux/llvm.obj/tools/clang/stage2-bins/bin/lli: warning:
remote mcjit does not support lazy compilation
Finalization error: could not register eh-frame: __register_frame function not
found
/b/fuchsia-x86_64-linux/llvm.obj/tools/clang/stage2-bins/bin/lli: disconnecting
The file was modified llvm/tools/lli/ChildTarget/CMakeLists.txt
Commit f40685138ba182fc823f569f6d88b7d3ddf34b9e by Lang Hames
[ORC] Remote OrcRemoteTargetClient and OrcRemoteTargetServer.

Now that the lli and lli-child-target tools have been updated to use
SimpleRemoteEPC (6498b0e991b), the OrcRemoteTarget* APIs are no longer needed.

Once the LLJITWithRemoteDebugging example has been migrated to SimpleRemoteEPC
we will remove OrcRPCExecutorProcessControl, and the ORC RPC system itself.
The file was removed llvm/include/llvm/ExecutionEngine/Orc/OrcRemoteTargetRPCAPI.h
The file was removed llvm/include/llvm/ExecutionEngine/Orc/OrcRemoteTargetClient.h
The file was removed llvm/include/llvm/ExecutionEngine/Orc/OrcRemoteTargetServer.h
Commit ed3e4917b36f2530703115066700daeb2b45b4f0 by llvm-dev
[X86] Fold PACK(*_EXTEND_VECTOR_INREG, UNDEF) -> *_EXTEND_VECTOR_INREG

For 128-bit vectors, we can remove a PACK of an EXTEND_VECTOR_INREG node and just create a smaller extension to the result/packed type.
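A scalar sketch of the signed 128-bit case, assuming the pattern PACKSSDW(SIGN_EXTEND_VECTOR_INREG(v16i8 -> v4i32), UNDEF). A sign-extended i8 always fits in i16, so the saturation in the PACK never fires, and the defined lanes equal a direct v16i8 -> v8i16 sign extension; the lanes that come from UNDEF may hold anything. `pack_of_sext_matches` is a hypothetical helper, and the zero-extend/PACKUS case is assumed to be analogous.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>

using V16i8 = std::array<int8_t, 16>;

// Signed saturation of a 32-bit lane to 16 bits, as PACKSSDW does.
int16_t saturate_i16(int32_t V) {
  return static_cast<int16_t>(std::clamp<int32_t>(V, INT16_MIN, INT16_MAX));
}

// Compares PACKSSDW(SIGN_EXTEND_VECTOR_INREG(X) : v4i32, UNDEF) with
// SIGN_EXTEND_VECTOR_INREG(X) : v8i16 on the lanes the PACK defines.
bool pack_of_sext_matches(const V16i8 &X) {
  // SIGN_EXTEND_VECTOR_INREG v16i8 -> v4i32: sign-extend the low 4 bytes.
  std::array<int32_t, 4> Wide{};
  for (int I = 0; I < 4; ++I)
    Wide[I] = X[I];

  // PACKSSDW(Wide, UNDEF): lanes 0-3 come from Wide; lanes 4-7 come from
  // the undef operand, so any value is acceptable there.
  std::array<int16_t, 4> Packed{};
  for (int I = 0; I < 4; ++I)
    Packed[I] = saturate_i16(Wide[I]);

  // Replacement: SIGN_EXTEND_VECTOR_INREG v16i8 -> v8i16 (low 8 bytes).
  std::array<int16_t, 8> Narrow{};
  for (int I = 0; I < 8; ++I)
    Narrow[I] = X[I];

  for (int I = 0; I < 4; ++I)
    if (Packed[I] != Narrow[I])
      return false;
  return true;
}
```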
The file was modified llvm/lib/Target/X86/X86ISelLowering.cpp
The file was modified llvm/test/CodeGen/X86/pmaddubsw.ll
Commit c0eff50fc5a48990ad9ebfcb7e81c6ab6fea79c5 by llvm-dev
[X86][SSE] combineMulToPMADDWD - enable sext_extend_vector_inreg(vXi16) -> zext_extend_vector_inreg(vXi16) fold

The plan is to allow combineMulToPMADDWD to match illegal vector types (as long as they're still pow2), which should allow us to start removing the 128-bit limit on more of the PMADDWD combines.
The file was modified llvm/test/CodeGen/X86/madd.ll
The file was modified llvm/lib/Target/X86/X86ISelLowering.cpp
Commit 7a855596c3a29ba7a9b0cc9bcc820f7f78d07afe by nikita.ppv
[BasicAA] Don't check whether GEP is sized (NFC)

GEPs are required to have a sized source element type, so we can
just assert that here.
The file was modified llvm/lib/Analysis/BasicAliasAnalysis.cpp
Commit e2f780fba96c55b0dcb7aa3c4719110875b36dfb by mgorny
[lldb] [gdb-remote] Use llvm::StringRef.split() and llvm::to_integer()

Replace the uses of StringConvert combined with hand-rolled array
splitting with llvm::StringRef.split() and llvm::to_integer().
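A self-contained sketch of the same parsing pattern. The patch itself uses `llvm::StringRef::split()` and `llvm::to_integer()`; to avoid depending on LLVM headers, this sketch substitutes their standard-library counterparts (`std::string_view::find`/`substr` and `std::from_chars`). The comma-separated hex input and the name `parse_hex_list` are hypothetical, not the exact gdb-remote packet syntax.

```cpp
#include <cassert>
#include <charconv>
#include <cstdint>
#include <optional>
#include <string_view>
#include <system_error>
#include <vector>

// Splits a comma-separated list of hex numbers and parses each piece.
// Returns std::nullopt on any malformed piece (empty, non-hex, or trailing
// garbage) instead of silently producing a partial result.
std::optional<std::vector<uint64_t>> parse_hex_list(std::string_view S) {
  std::vector<uint64_t> Result;
  if (S.empty())
    return Result;
  while (true) {
    size_t Comma = S.find(',');
    std::string_view Piece = S.substr(0, Comma);  // npos: rest of the string
    uint64_t Value = 0;
    auto [Ptr, Ec] = std::from_chars(Piece.data(), Piece.data() + Piece.size(), Value, 16);
    if (Ec != std::errc() || Ptr != Piece.data() + Piece.size())
      return std::nullopt;
    Result.push_back(Value);
    if (Comma == std::string_view::npos)
      break;
    S.remove_prefix(Comma + 1);
  }
  return Result;
}
```

Compared with hand-rolled splitting, each piece is validated completely, so inputs such as `"1a,,2b"` or `"1a,zz"` are rejected rather than partially parsed.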

Differential Revision: https://reviews.llvm.org/D110472
The file was modified lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemote.h
The file was modified lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemote.cpp
The file was modified lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunicationClient.cpp
Commit daf0b2f078171ce3a04a8804b0bfbed8f859115a by Lang Hames
[MCJIT] This test shouldn't require an unwind table.

This should fix the failures on the Fuchsia bot that started in
https://lab.llvm.org/buildbot/#/builders/98/builds/6401.
The file was modified llvm/test/ExecutionEngine/MCJIT/remote/test-global-init-nonzero-sm-pic.ll
Commit b3891f28a31248de264741d70401c42c969b284b by joker.eph
Fix ClangTidyLegacy warning: "'virtual' is redundant since the function is already declared 'final' " (NFC)
The file was modified mlir/test/lib/Dialect/Test/TestDialect.cpp
Commit 9c2cd6e7c803eabbee652f9477c23aeda8ce02c8 by joker.eph
Fix clang-tidy warning "modernize-use-nullptr" in MLIR VulkanRuntime (NFC)
The file was modified mlir/tools/mlir-vulkan-runner/VulkanRuntime.cpp
Commit acd13994d17fe64269f71324840bd45031a2e552 by Amara Emerson
[GlobalISel] Re-generate some call lowering tests with the new CHECK-NEXT behaviour.
The file was modified llvm/test/CodeGen/AArch64/GlobalISel/irtranslator-arguments.ll
The file was modified llvm/test/CodeGen/AArch64/GlobalISel/arm64-callingconv-ios.ll
The file was modified llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-call.ll
The file was modified llvm/test/CodeGen/Mips/GlobalISel/irtranslator/float_args.ll
The file was modified llvm/test/CodeGen/Mips/GlobalISel/irtranslator/extend_args.ll
The file was modified llvm/test/CodeGen/X86/GlobalISel/irtranslator-callingconv.ll
The file was modified llvm/test/CodeGen/AArch64/GlobalISel/irtranslator-exceptions.ll
The file was modified llvm/test/CodeGen/Mips/GlobalISel/irtranslator/stack_args.ll
The file was modified llvm/test/CodeGen/AArch64/GlobalISel/arm64-callingconv.ll
The file was modified llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-call-implicit-args.ll
The file was modified llvm/test/CodeGen/AArch64/GlobalISel/call-translator-tail-call.ll
The file was modified llvm/test/CodeGen/AArch64/GlobalISel/legalize-s128-div.mir
The file was modified llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-call-sret.ll
The file was modified llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-sibling-call.ll
The file was modified llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-call-return-values.ll
Commit 4b37462aab4e5f5f1ffdb04294e87b990d189751 by Lang Hames
[ORC] Fix SimpleRemoteEPC data races.

Adds a 'start' method to SimpleRemoteEPCTransport to defer transport startup
until the client has been configured. This avoids races on client members if the
first message arrives while the client is still being configured.
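A minimal sketch of the two-phase startup, assuming a transport that delivers messages on its own reader thread. `Transport`, `setHandler` and `Client` are hypothetical and much simpler than SimpleRemoteEPCTransport; the point is only the ordering: construct, configure, then `start()`.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical transport: constructing it does NOT deliver any messages.
// Only start() launches the reader thread, so the owner can finish
// configuring itself (here: installing the handler) first.
class Transport {
public:
  using Handler = std::function<void(const std::string &)>;

  explicit Transport(std::vector<std::string> Msgs) : Incoming(std::move(Msgs)) {}

  // Must be called before start(); afterwards the reader thread uses it.
  void setHandler(Handler H) { OnMessage = std::move(H); }

  // Begin delivering messages on a background thread.
  void start() {
    Reader = std::thread([this] {
      for (const std::string &Msg : Incoming)
        OnMessage(Msg);
    });
  }

  // Wait for the reader thread before the members it uses are destroyed.
  ~Transport() {
    if (Reader.joinable())
      Reader.join();
  }

private:
  std::vector<std::string> Incoming;
  Handler OnMessage;
  std::thread Reader;
};

// Hypothetical client whose state is only touched after configuration.
struct Client {
  std::mutex M;
  std::vector<std::string> Received;

  void handle(const std::string &Msg) {
    std::lock_guard<std::mutex> Lock(M);
    Received.push_back(Msg);
  }
};
```

If the reader thread were launched in the constructor instead, the first message could race with `setHandler()`, which is the kind of race on client members that deferring startup avoids.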

Also fixes races on the file descriptors in FDSimpleRemoteEPCTransport.
The file was modified llvm/include/llvm/ExecutionEngine/Orc/Shared/SimpleRemoteEPCUtils.h
The file was modified llvm/include/llvm/ExecutionEngine/Orc/TargetProcess/SimpleRemoteEPCServer.h
The file was modified llvm/lib/ExecutionEngine/Orc/Shared/SimpleRemoteEPCUtils.cpp
The file was modified llvm/lib/ExecutionEngine/Orc/SimpleRemoteEPC.cpp
The file was modified llvm/include/llvm/ExecutionEngine/Orc/SimpleRemoteEPC.h
Commit 7d6889964ab534164698ef134de9cf11cd87a09d by pengfei.wang
[X86][FP16] Add more builtins to avoid multi evaluation problems & add 2 missed intrinsics

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D110336
The file was modified llvm/include/llvm/IR/IntrinsicsX86.td
The file was modified clang/lib/CodeGen/CGBuiltin.cpp
The file was modified clang/lib/Sema/SemaChecking.cpp
The file was modified clang/test/CodeGen/X86/avx512fp16-builtins.c
The file was modified clang/include/clang/Basic/BuiltinsX86.def
The file was modified clang/lib/Headers/avx512fp16intrin.h
Commit 1ea8d12510b9e1b208a7541c86e1b02a9a3db0e2 by Lang Hames
[ORC] Add missing lock to CompileOnDemandLayer::getPerDylibResources.

The getPerDylibResources method may be called concurrently from multiple
threads, so we need to protect access to the underlying map.
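A minimal sketch of the locking pattern, assuming a lookup-or-insert accessor over a map that several threads may call at once. `Layer` and `PerDylibResources` here are hypothetical simplifications (keyed by a dylib name rather than a JITDylib), not the CompileOnDemandLayer code.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical per-dylib state, keyed by name for simplicity.
struct PerDylibResources {
  std::string DylibName;
};

class Layer {
public:
  // Safe to call concurrently: the whole lookup-or-insert runs under the
  // mutex. std::map insertion does not invalidate references to existing
  // elements and this sketch never erases, so the returned reference stays
  // valid after the lock is released.
  PerDylibResources &getPerDylibResources(const std::string &Dylib) {
    std::lock_guard<std::mutex> Lock(M);
    auto It = Resources.find(Dylib);
    if (It == Resources.end())
      It = Resources.emplace(Dylib, PerDylibResources{Dylib}).first;
    return It->second;
  }

  std::size_t numDylibs() {
    std::lock_guard<std::mutex> Lock(M);
    return Resources.size();
  }

private:
  std::mutex M;
  std::map<std::string, PerDylibResources> Resources;
};
```

Without the lock, two threads could both miss in `find()` and insert concurrently, which is a data race on the map itself.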

Possible fix for https://llvm.org/PR51064
The file was modified llvm/lib/ExecutionEngine/Orc/CompileOnDemandLayer.cpp
Commit 91f46bb77e6d56955c3b96e9e844ae6a251c41e9 by llvm-project
[Polly] Reject reject regions entered by an indirectbr/callbr.

SplitBlockPredecessors is unable to insert an additional BasicBlock
between an indirectbr/callbr terminator and its successor blocks.
Polly needs this to normalize the control flow before emitting
its optimized code.

This patch rejects regions entered by an indirectbr/callbr so that
code generation does not fail later.
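A simplified sketch of the detection rule, assuming a toy CFG in which every predecessor of the region entry is treated as an entering block. `Terminator`, `Block` and `isEntryNormalizable` are hypothetical; the real check and its diagnostic live in ScopDetection and are not reproduced here.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified CFG model: each block records how it terminates.
enum class Terminator { Br, Switch, IndirectBr, CallBr, Ret };

struct Block {
  Terminator Term;
  std::vector<const Block *> Preds;
};

// The entry can only be normalized (by splitting a new block in front of it)
// if no entering edge comes from an indirectbr/callbr terminator, because
// those edges cannot be split. Such regions are rejected up front.
bool isEntryNormalizable(const Block &Entry) {
  for (const Block *P : Entry.Preds)
    if (P->Term == Terminator::IndirectBr || P->Term == Terminator::CallBr)
      return false;
  return true;
}
```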

This fixes llvm.org/PR51964
The file was modified polly/include/polly/ScopDetectionDiagnostic.h
The file was added polly/test/ScopDetect/callbr.ll
The file was modified polly/lib/Analysis/ScopDetectionDiagnostic.cpp
The file was modified polly/lib/Analysis/ScopDetection.cpp