Changes

Summary

  1. Reland [X86][CostModel] X86TTIImpl::getMemoryOpCost(): rewrite vector handling again
  2. Reland [X86] X86TTIImpl::getInterleavedMemoryOpCostAVX2(): use getMemoryOpCost()
  3. [CVP] Add test for PR50399 (NFC)
  4. [Demangle][Rust] Parse raw pointers
  5. [Demangle][Rust] Parse references
  6. [Demangle][Rust] Parse function signatures
  7. [mlir] ConvertStandardToLLVM: make AllocLikeOpLowering public
  8. [CostModel][X86] Improve v8i32 MUL costs on AVX1 targets to account for slower btver2
  9. [CostModel][X86] Add test coverage for sub-64bit vXi8 multiplication costs
  10. [Matrix] Bail out early if there are no matrix intrinsics.
  11. [MLIR] Drop stale reference to mlir-edsc-builder-api-test
  12. [MLIR][GPU] Add CUDA Tensor core WMMA test
  13. [CostModel][X86] vXi8 MUL is always promoted to vXi16
  14. [mlir][SCF] Canonicalize nested ParallelOp's
  15. [ARM] Clean up some tests, removing dead instructions. NFC
  16. Reapply [InstCombine] Fold multiuse shr eq zero
  17. [mlir][linalg][nfc] Fix signed/unsigned comparison warning in header
  18. [HIP] support ThinLTO
  19. [JITLink] Move some Block bitfields into Addressable to improve packing.
  20. [ORC] Add more synchronization to TestLookupWithUnthreadedMaterialization.
  21. [CostModel][X86] Pull out X86/X64 scalar int arithmetric costs from SSE tables. NFCI.
  22. [IR] Optimize no-op removal from AttributeSet (NFC)
  23. [IR] Optimize no-op removal from AttributeList (NFC)
  24. [CostModel][X86] Align v4i64 MUL costs on AVX1 targets with worst case
  25. [Driver] Support libc++ in MSVC
  26. [MinGW] Mark a number of library functions unavailable for mingw targets
  27. [Windows] Use TerminateProcess to exit without running destructors
  28. Revert "[Driver] Support libc++ in MSVC"
  29. [ELF][test] Avoid local signature symbols for section groups to match reality
Commit 05a4e4a89c6b6dc6e3edfb5efb9ddc950ae47469 by lebedev.ri
Reland [X86][CostModel] X86TTIImpl::getMemoryOpCost(): rewrite vector handling again

Instead of handling power-of-two sized vector chunks,
try handling the large vector in a stream mode,
decreasing the operational vector size
once it no longer works for the elements left to process.

Notably, this improves costs for overaligned loads, since loading padding is fine.
It also tracks more directly when we need to insert/extract the YMM/XMM subvector,
so some costs fluctuate.

This was initially landed in c02476f3158f2908ef0a6f628210b5380bd33695,
but reverted in 5fddc3312bad7e62493f1605385fad5e589e6450,
because the code made some very optimistic assumptions about invariants
that didn't hold in practice.

Reviewed By: RKSimon, ABataev

Differential Revision: https://reviews.llvm.org/D100684
The file was modified llvm/test/Analysis/CostModel/X86/interleaved-load-i16-stride-3.ll
The file was modified llvm/test/Analysis/CostModel/X86/load_store.ll
The file was modified llvm/lib/Target/X86/X86TargetTransformInfo.cpp
The file was modified llvm/test/Analysis/CostModel/X86/interleaved-store-i16-stride-3.ll
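
The "stream mode" idea from the getMemoryOpCost() commit above can be illustrated with a rough sketch. This is not the X86TTIImpl code: the helper name splitIntoStreamedOps, the byte-based widths, and the assumption that the maximum width is a power of two are all hypothetical, and the sketch ignores the overaligned-load and subvector insert/extract handling that the commit mentions.

```cpp
#include <vector>

// Hypothetical helper (not part of LLVM): split a memory access of
// TotalBytes into operations, starting at MaxOpBytes (assumed to be a
// power of two) and shrinking the operation width only once it no
// longer fits the bytes left to process.
std::vector<unsigned> splitIntoStreamedOps(unsigned TotalBytes,
                                           unsigned MaxOpBytes) {
  std::vector<unsigned> Ops;
  unsigned OpBytes = MaxOpBytes;
  unsigned Left = TotalBytes;
  while (Left > 0) {
    // Decrease the operational width until it fits what is left.
    while (OpBytes > Left)
      OpBytes /= 2;
    Ops.push_back(OpBytes);
    Left -= OpBytes;
  }
  return Ops;
}
```

Under these assumptions, a 48-byte access with a 32-byte maximum width is split into one 32-byte and one 16-byte operation, while a 96-byte access stays at three 32-byte operations.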
Commit 8ed0864fd76ded2646b33de8fc610519dd7f1eb5 by lebedev.ri
Reland [X86] X86TTIImpl::getInterleavedMemoryOpCostAVX2(): use getMemoryOpCost()

Now that getMemoryOpCost() correctly handles all the vector variants,
we should no longer hand-roll our own version of it, but use it directly.

The AVX512 variant probably needs a similar change,
but there it is less obvious.

This was initially landed in 69ed93a4355123a45c1d7216aea7cd53d07a361b,
but was reverted in 6b95fd199d96e3ba5c28a23b17b74203522bdaa8
because the patch it depends on was reverted.
The file was modified llvm/test/Analysis/CostModel/X86/interleaved-load-i8.ll
The file was modified llvm/lib/Target/X86/X86TargetTransformInfo.cpp
The file was modified llvm/test/Analysis/CostModel/X86/interleaved-store-i8.ll
Commit 069174a6349b18a05b7d48b09a8f8b113b402aae by nikita.ppv
[CVP] Add test for PR50399 (NFC)
The file was modified llvm/test/Transforms/CorrelatedValuePropagation/phi-common-val.ll
Commit 6aac56336d49fe27c8b8d6c1554a73065a10453b by tomasz.miasko
[Demangle][Rust] Parse raw pointers

Reviewed By: dblaikie

Part of https://reviews.llvm.org/D102580
The file was modified llvm/lib/Demangle/RustDemangle.cpp
The file was modified llvm/test/Demangle/rust.test
Commit e4fa6c95aca1555167f867a0205cbc99caa2ce09 by tomasz.miasko
[Demangle][Rust] Parse references

Reviewed By: dblaikie

Part of https://reviews.llvm.org/D102580
The file was modified llvm/test/Demangle/rust.test
The file was modified llvm/lib/Demangle/RustDemangle.cpp
Commit 75cc1cf0181a78d1e79c96b5d318f58a72050939 by tomasz.miasko
[Demangle][Rust] Parse function signatures

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D102581
The file was modified llvm/test/Demangle/rust.test
The file was modified llvm/include/llvm/Demangle/RustDemangle.h
The file was modified llvm/lib/Demangle/RustDemangle.cpp
Commit 9afbca746b6c93e5359e5723e5f39c21bca2f4ac by ivan.butygin
[mlir] ConvertStandardToLLVM: make AllocLikeOpLowering public

This is useful for anyone who wants to implement a custom AllocOp LLVM lowering.

Differential Revision: https://reviews.llvm.org/D102932
The file was modified mlir/lib/Conversion/StandardToLLVM/StandardToLLVM.cpp
The file was modified mlir/include/mlir/Conversion/StandardToLLVM/ConvertStandardToLLVM.h
Commit 9bd0dc83b55b53cbe4ae9544b5917e1d9d14dbfb by llvm-dev
[CostModel][X86] Improve v8i32 MUL costs on AVX1 targets to account for slower btver2

BTVER2 has a 2 cycle throughput for v4i32 multiplies (same as SSE41 targets), which is only partially hidden by the subvector extracts/insert when splitting v8i32.
The file was modified llvm/test/Analysis/CostModel/X86/arith.ll
The file was modified llvm/test/Analysis/CostModel/X86/vshift-shl-cost-inseltpoison.ll
The file was modified llvm/test/Analysis/CostModel/X86/fshr.ll
The file was modified llvm/test/Analysis/CostModel/X86/fshl.ll
The file was modified llvm/lib/Target/X86/X86TargetTransformInfo.cpp
The file was modified llvm/test/Analysis/CostModel/X86/reduce-mul.ll
The file was modified llvm/test/Analysis/CostModel/X86/arith-fix.ll
The file was modified llvm/test/Analysis/CostModel/X86/arith-overflow.ll
The file was modified llvm/test/Analysis/CostModel/X86/rem.ll
The file was modified llvm/test/Analysis/CostModel/X86/vectorized-loop.ll
The file was modified llvm/test/Analysis/CostModel/X86/vshift-shl-cost.ll
Commit 02918f1079432ae46792d2d2f4542a0fce1ba08b by llvm-dev
[CostModel][X86] Add test coverage for sub-64bit vXi8 multiplication costs

These can be cheaply promoted to a single v8i16 vector for multiplication.
The file was modified llvm/test/Analysis/CostModel/X86/arith.ll
Commit a6de8d95db484e07c7b1e2d86dfaeacf0d95e656 by flo
[Matrix] Bail out early if there are no matrix intrinsics.

If there are no matrix intrinsics in a function, we can directly bail
out, as there's nothing left to do.

Reviewed By: anemet

Differential Revision: https://reviews.llvm.org/D102931
The file was modified llvm/lib/Transforms/Scalar/LowerMatrixIntrinsics.cpp
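
The early bail-out described in the [Matrix] commit above follows a common pass pattern, sketched here with stand-in types. Instr, isMatrixIntrinsic, and lowerMatrixIntrinsics are hypothetical names for illustration and do not mirror LowerMatrixIntrinsics.cpp; the only detail taken from LLVM is that matrix intrinsic names start with "llvm.matrix.".

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical stand-in for an instruction: the called function's name
// and a flag recording whether the sketch "lowered" it.
struct Instr {
  std::string Callee;
  bool Lowered;
};

// Matrix intrinsics are identified here by their "llvm.matrix." prefix.
bool isMatrixIntrinsic(const Instr &I) {
  return I.Callee.rfind("llvm.matrix.", 0) == 0;
}

// Returns true if the function body was changed. SetupRuns counts how
// often the (assumed expensive) setup work was actually performed.
bool lowerMatrixIntrinsics(std::vector<Instr> &Body, unsigned &SetupRuns) {
  // Bail out early: without matrix intrinsics there is nothing to do.
  if (std::none_of(Body.begin(), Body.end(), isMatrixIntrinsic))
    return false;

  ++SetupRuns; // Placeholder for shape propagation and other setup.
  for (Instr &I : Body)
    if (isMatrixIntrinsic(I))
      I.Lowered = true;
  return true;
}
```

The point of the check is that functions without any matrix intrinsic never pay for the setup step.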
Commit 3597b2c37dd61eac680d7442234d05a29dc04d95 by uday
[MLIR] Drop stale reference to mlir-edsc-builder-api-test

Drop stale reference to mlir-edsc-builder-api-test.

Differential Revision: https://reviews.llvm.org/D102967
The file was modified mlir/test/lit.cfg.py
Commit e552fa28da286f20f963d51dd05bd3ec278553b7 by uday
[MLIR][GPU] Add CUDA Tensor core WMMA test

Add a test case that exercises the complete execution of WMMA ops on an NVIDIA
GPU with tensor cores. These tests are enabled under
MLIR_RUN_CUDA_TENSOR_CORE_TESTS.

Reviewed By: bondhugula

Differential Revision: https://reviews.llvm.org/D95334
The file was added mlir/test/Integration/GPU/CUDA/TensorCore/wmma-matmul-f32.mlir
The file was added mlir/test/Integration/GPU/CUDA/TensorCore/wmma-matmul-f16.mlir
The file was modified mlir/test/CMakeLists.txt
The file was modified mlir/test/lit.site.cfg.py.in
The file was added mlir/test/Integration/GPU/CUDA/TensorCore/lit.local.cfg
Commit 7a898477bbd4e06113f9aa67c9c53904889c7cbf by llvm-dev
[CostModel][X86] vXi8 MUL is always promoted to vXi16
The file was modified llvm/lib/Target/X86/X86TargetTransformInfo.cpp
The file was modified llvm/test/Analysis/CostModel/X86/slm-arith-costs.ll
The file was modified llvm/test/Transforms/SLPVectorizer/X86/blending-shuffle.ll
The file was modified llvm/test/Transforms/SLPVectorizer/X86/blending-shuffle-inseltpoison.ll
The file was modified llvm/test/Analysis/CostModel/X86/arith.ll
The file was modified llvm/test/Transforms/SLPVectorizer/X86/extract-shuffle-inseltpoison.ll
The file was modified llvm/test/Transforms/SLPVectorizer/X86/extract-shuffle.ll
The file was modified llvm/test/Analysis/CostModel/X86/rem.ll
Commit 4184018253e720b0f2449b2b83ce27fc682f8579 by ivan.butygin
[mlir][SCF] Canonicalize nested ParallelOp's

Differential Revision: https://reviews.llvm.org/D102799
The file was modified mlir/test/Dialect/SCF/canonicalize.mlir
The file was modified mlir/lib/Dialect/SCF/SCF.cpp
Commit 211ce51f27e3b00806a6b3df830e0799b9ee8207 by david.green
[ARM] Clean up some tests, removing dead instructions. NFC
The file was modified llvm/test/CodeGen/Thumb2/LowOverheadLoops/mve-tail-data-types.ll
The file was modified llvm/test/CodeGen/Thumb2/LowOverheadLoops/vector-arith-codegen.ll
The file was modified llvm/test/CodeGen/Thumb2/LowOverheadLoops/extending-loads.ll
The file was modified llvm/test/CodeGen/Thumb2/LowOverheadLoops/tail-pred-intrinsic-round.ll
The file was modified llvm/test/CodeGen/Thumb2/LowOverheadLoops/fast-fp-loops.ll
The file was modified llvm/test/CodeGen/Thumb2/LowOverheadLoops/tail-pred-disabled-in-loloops.ll
The file was modified llvm/test/CodeGen/Thumb2/mve-vecreduce-loops.ll
The file was modified llvm/test/CodeGen/Thumb2/mve-gather-scatter-optimisation.ll
The file was modified llvm/test/CodeGen/Thumb2/LowOverheadLoops/tail-pred-intrinsic-fabs.ll
The file was modified llvm/test/CodeGen/Thumb2/LowOverheadLoops/reductions.ll
The file was modified llvm/test/CodeGen/Thumb2/LowOverheadLoops/tail-pred-basic.ll
The file was modified llvm/test/CodeGen/Thumb2/LowOverheadLoops/tail-pred-intrinsic-add-sat.ll
The file was modified llvm/test/CodeGen/Thumb2/LowOverheadLoops/vector-reduce-mve-tail.ll
The file was modified llvm/test/CodeGen/Thumb2/LowOverheadLoops/varying-outer-2d-reduction.ll
The file was modified llvm/test/CodeGen/Thumb2/mve-gather-increment.ll
The file was modified llvm/test/CodeGen/Thumb2/LowOverheadLoops/tail-pred-intrinsic-sub-sat.ll
The file was modified llvm/test/CodeGen/Thumb2/mve-fma-loops.ll
The file was modified llvm/test/CodeGen/Thumb2/LowOverheadLoops/tail-pred-widen.ll
The file was modified llvm/test/CodeGen/Thumb2/LowOverheadLoops/tail-pred-const.ll
The file was modified llvm/test/CodeGen/Thumb2/LowOverheadLoops/tail-pred-reduce.ll
The file was modified llvm/test/CodeGen/Thumb2/mve-gather-optimisation-deep.ll
The file was modified llvm/test/CodeGen/Thumb2/LowOverheadLoops/cond-vector-reduce-mve-codegen.ll
The file was modified llvm/test/CodeGen/Thumb2/LowOverheadLoops/nested.ll
Commit 9a9421a461166482465e786a46f8cced63cd2e9f by nikita.ppv
Reapply [InstCombine] Fold multiuse shr eq zero

This was reverted due to performance regressions in ARM benchmarks,
which have since been addressed by D101196 (SCEV analysis improvement)
and D101778 (CGP reverse transform).

-----

The single-use case is handled implicitly by converting the icmp
into a mask check first. When comparing with zero in particular,
we don't need the one-use restriction, as we only produce a single
icmp.

https://alive2.llvm.org/ce/z/MSixcm
https://alive2.llvm.org/ce/z/GwpG0M
The file was modified llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
The file was modified llvm/test/Transforms/InstCombine/icmp_sdiv_with_and_without_range.ll
The file was modified llvm/test/Transforms/InstCombine/icmp-shr.ll
The file was modified llvm/test/Transforms/PhaseOrdering/X86/ctlz-loop.ll
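
The arithmetic behind the "shr eq zero" fold above can be checked exhaustively for a small bit width. The commit text does not spell out the resulting IR, so this sketch assumes the logical-shift case, where (X >> C) == 0 is replaced by a single unsigned compare X < (1 << C); the alive2 links in the commit message remain the authoritative proofs. The function name is hypothetical.

```cpp
#include <cstdint>

// Brute-force check over all 8-bit values X and shift amounts C in
// [0, 8): a logical shift right compared against zero is equivalent to
// one unsigned comparison against (1 << C).
bool lshrEqZeroMatchesUlt() {
  for (unsigned C = 0; C < 8; ++C) {
    for (unsigned X = 0; X < 256; ++X) {
      bool ShiftForm = (static_cast<uint8_t>(X) >> C) == 0;
      bool CompareForm = X < (1u << C);
      if (ShiftForm != CompareForm)
        return false;
    }
  }
  return true;
}
```

Because the replacement is a single icmp that does not depend on the shift result, the shift can keep its other users, which matches the commit's argument for dropping the one-use restriction.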
Commit 0dd36f81b9f894497773caed509603eb0f090cae by ivan.butygin
[mlir][linalg][nfc] Fix signed/unsigned comparison warning in header

Differential Revision: https://reviews.llvm.org/D102968
The file was modified mlir/include/mlir/Dialect/Linalg/IR/LinalgOps.td
Commit bf6124580dfba86b73d828851f03fb9eea1269bd by Yaxun.Liu
[HIP] support ThinLTO

Add options -f[no-]offload-lto and -foffload-lto=[thin,full] for controlling
LTO for offload compilation. Allow LTO for the AMDGPU target.

The AMDGPU target does not support codegen of object files containing
calls to external functions, so the LLVM module passed to the
AMDGPU backend needs to contain definitions of all the callees.
An LLVM option is added to allow the function importer to import
functions with the noinline attribute.

The HIP toolchain passes the proper LLVM options to lld to make sure
the function importer imports definitions of all the callees.

Reviewed by: Teresa Johnson, Artem Belevich

Differential Revision: https://reviews.llvm.org/D99683
The file was modified clang/include/clang/Driver/Options.td
The file was added llvm/test/Transforms/FunctionImport/Inputs/noinline.ll
The file was modified llvm/lib/Transforms/IPO/FunctionImport.cpp
The file was modified llvm/test/Transforms/FunctionImport/funcimport.ll
The file was modified clang/lib/Driver/Driver.cpp
The file was modified clang/lib/Driver/ToolChains/Clang.cpp
The file was added llvm/test/Transforms/FunctionImport/noinline.ll
The file was modified clang/include/clang/Driver/Driver.h
The file was modified clang/test/Driver/hip-options.hip
The file was modified llvm/test/Transforms/FunctionImport/Inputs/funcimport.ll
The file was modified llvm/test/Transforms/FunctionImport/adjustable_threshold.ll
The file was modified clang/lib/Driver/ToolChains/HIP.cpp
Commit 2b45895df46e3e87b9588bd207f417d2d2fe7482 by Lang Hames
[JITLink] Move some Block bitfields into Addressable to improve packing.

Moving these bitfields from Block to Addressable allows them to be packed with
the bitfields at the end of Addressable, reducing the size of Block by eight
bytes.
The file was modified llvm/include/llvm/ExecutionEngine/JITLink/JITLink.h
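
The packing effect described in the [JITLink] commit above can be reproduced with a pair of toy class hierarchies. The type names, field names, and bit widths below are illustrative assumptions and are not copied from JITLink.h; exact sizes depend on the target ABI, so the sketch only compares the two layouts.

```cpp
#include <cstddef>
#include <cstdint>

// Layout before: the base ends with two 1-bit flags, and the derived
// class starts its own, separate run of bitfields.
struct AddressableBefore {
  uint64_t Address;
  uint64_t IsDefined : 1;
  uint64_t IsAbsolute : 1;
};
struct BlockBefore : AddressableBefore {
  uint64_t P2Align : 5;
  uint64_t AlignmentOffset : 56;
  const char *Data;
  std::size_t Size;
};

// Layout after: the derived class's bitfields live in the base, where
// they share one 64-bit storage unit with the flags (1 + 1 + 5 + 56 = 63).
struct AddressableAfter {
  uint64_t Address;
  uint64_t IsDefined : 1;
  uint64_t IsAbsolute : 1;
  uint64_t P2Align : 5;
  uint64_t AlignmentOffset : 56;
};
struct BlockAfter : AddressableAfter {
  const char *Data;
  std::size_t Size;
};

// The second layout should never be larger; on common 64-bit ABIs it is
// expected to save one 8-byte word, in line with the commit message.
static_assert(sizeof(BlockAfter) <= sizeof(BlockBefore),
              "moving the bitfields into the base must not grow Block");
```

The saving comes from the base class's partially used bitfield word: fields declared in the derived class generally cannot be merged into it, whereas fields declared next to the flags can.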
Commit 1a1d6e6f98738be249b20994bcfed48dccac59e3 by Lang Hames
[ORC] Add more synchronization to TestLookupWithUnthreadedMaterialization.

Don't run tasks until their corresponding thread has been added to the running
threads vector. This is an extension to fda4300da82, which doesn't seem to have
been enough to fix the synchronization issues on its own.
The file was modified llvm/unittests/ExecutionEngine/Orc/CoreAPIsTest.cpp
Commit 6f9ac11e3960bf5953b3af4b0c4e2682ea802081 by llvm-dev
[CostModel][X86] Pull out X86/X64 scalar int arithmetric costs from SSE tables. NFCI.

These aren't dependent on any SSE level (and don't tend to get quicker either).
The file was modified llvm/lib/Target/X86/X86TargetTransformInfo.cpp
Commit fd46ed3f397d6cf41bc6c5a04ab2089f585afe44 by nikita.ppv
[IR] Optimize no-op removal from AttributeSet (NFC)

When removing an AttrBuilder from an AttributeSet, first check
whether there is any overlap. If nothing is being removed, we can
directly return the original set.
The file was modified llvm/lib/IR/Attributes.cpp
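
The overlap check described in the AttributeSet commit above can be sketched with a bitmask standing in for the attribute set. AttrSet, AttrSetRef, and removeAttributes are hypothetical names; the real AttributeSet and AttrBuilder types in Attributes.cpp are more involved, and this sketch only shows the "no overlap, return the original" shortcut.

```cpp
#include <cstdint>
#include <memory>

// Hypothetical immutable attribute set: one bit per attribute kind.
struct AttrSet {
  uint64_t Bits;
};
using AttrSetRef = std::shared_ptr<const AttrSet>;

// Remove the attributes in ToRemove from Set. If nothing overlaps, the
// removal is a no-op and the original set is returned without building
// a new one.
AttrSetRef removeAttributes(const AttrSetRef &Set, uint64_t ToRemove) {
  if ((Set->Bits & ToRemove) == 0)
    return Set; // No overlap: reuse the existing object.
  return std::make_shared<const AttrSet>(AttrSet{Set->Bits & ~ToRemove});
}
```

Returning the original object on the no-op path avoids the allocation and rebuild that the general path needs.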
Commit 05738ffcb87b76c6f166f965ba9b2db3257a4338 by nikita.ppv
[IR] Optimize no-op removal from AttributeList (NFC)

When removing an AttrBuilder from an index of an AttributeList,
directly return the original list if no attributes were actually
removed.
The file was modified llvm/lib/IR/Attributes.cpp
Commit fc01b9bdf8b55f6b09f1dcaedf78dad62ff205c1 by llvm-dev
[CostModel][X86] Align v4i64 MUL costs on AVX1 targets with worst case

Based on llvm-mca analysis of the worst case, sandybridge (vs btver2 + bdver2), which is a lot less than what we were predicting (I think the old prediction was based on total uop count).
The file was modified llvm/test/Analysis/CostModel/X86/rem.ll
The file was modified llvm/lib/Target/X86/X86TargetTransformInfo.cpp
The file was modified llvm/test/Transforms/SLPVectorizer/X86/arith-fix.ll
The file was modified llvm/test/Analysis/CostModel/X86/arith-fix.ll
The file was modified llvm/test/Analysis/CostModel/X86/arith.ll
The file was modified llvm/test/Analysis/CostModel/X86/reduce-mul.ll
The file was modified llvm/test/Transforms/SLPVectorizer/X86/pr46983.ll
The file was modified llvm/test/Analysis/CostModel/X86/arith-overflow.ll
Commit b604301be3559fb85a11779db79fc9bda4b62bce by phosek
[Driver] Support libc++ in MSVC

This implements support for using libc++ headers and library in the MSVC
toolchain.  We only support libc++ that is a part of the toolchain, and
not headers installed elsewhere on the system.

Differential Revision: https://reviews.llvm.org/D101479
The file was added clang/test/Driver/msvc-libcxx.cpp
The file was added clang/test/Driver/Inputs/msvc_libcxx_tree/usr/include/c++/v1/.keep
The file was added clang/test/Driver/Inputs/msvc_libcxx_tree/usr/bin/.keep
The file was modified clang/lib/Driver/ToolChains/MSVC.cpp
The file was added clang/test/Driver/Inputs/msvc_libcxx_tree/usr/include/x86_64-pc-windows-msvc/c++/v1/.keep
The file was added clang/test/Driver/Inputs/msvc_libcxx_tree/usr/lib/.keep
The file was added clang/test/Driver/Inputs/msvc_libcxx_tree/usr/lib/x86_64-pc-windows-msvc/.keep
Commit c5638a71d805330294e9b6e2c670e1ed7420b63a by martin
[MinGW] Mark a number of library functions unavailable for mingw targets

These functions were already marked unavailable for MSVC targets,
within a "T.isOSWindows() && !T.isOSCygMing()" block, but they
are unavailable on MinGW targets too.

This avoids generating calls to stpcpy for MinGW targets, which has
been happening since 6dbf0cfcf789365493f70ae69df8a7a59be41c75 (in
some cases).

This fixes https://github.com/mstorsjo/llvm-mingw/issues/201.

Differential Revision: https://reviews.llvm.org/D102946
The file was modified llvm/test/Transforms/InstCombine/sprintf-1.ll
The file was modified llvm/lib/Analysis/TargetLibraryInfo.cpp
Commit b4fd512c36ca344a3ff69350219e8b0a67e9472a by martin
[Windows] Use TerminateProcess to exit without running destructors

If exiting using _Exit or ExitProcess, DLLs are still unloaded
cleanly before exiting, running destructors and other cleanup in those
DLLs. When the caller expects to exit without cleanup, running
destructors in some loaded DLLs (which can be either libLLVM.dll or
e.g. libc++.dll) can cause deadlocks occasionally.

This is an alternative to D102684.

Differential Revision: https://reviews.llvm.org/D102944
The file was modified llvm/lib/Support/Unix/Process.inc
The file was modified llvm/include/llvm/Support/Process.h
The file was modified llvm/lib/Support/Process.cpp
The file was modified llvm/lib/Support/Windows/Process.inc
Commit 5ff79f001feb6584e87173348a24f3f317e35984 by phosek
Revert "[Driver] Support libc++ in MSVC"

This reverts commit b604301be3559fb85a11779db79fc9bda4b62bce since
it caused compilation failure in sanitizer_unwind_win.cpp when using
the runtimes build.
The file was removed clang/test/Driver/Inputs/msvc_libcxx_tree/usr/bin/.keep
The file was removed clang/test/Driver/Inputs/msvc_libcxx_tree/usr/include/c++/v1/.keep
The file was removed clang/test/Driver/Inputs/msvc_libcxx_tree/usr/lib/x86_64-pc-windows-msvc/.keep
The file was removed clang/test/Driver/msvc-libcxx.cpp
The file was modified clang/lib/Driver/ToolChains/MSVC.cpp
The file was removed clang/test/Driver/Inputs/msvc_libcxx_tree/usr/include/x86_64-pc-windows-msvc/c++/v1/.keep
The file was removed clang/test/Driver/Inputs/msvc_libcxx_tree/usr/lib/.keep
Commit 0f298ec6ccc0877117e4ee8036a58d10e5ff1ac3 by i
[ELF][test] Avoid local signature symbols for section groups to match reality

If we supported local signature symbols (PR43094), these tests would fail.

When that support is added, new tests specific to local signature symbols should be developed.
The file was modified lld/test/ELF/comdat-discarded-lazy.s
The file was modified lld/test/ELF/Inputs/comdat.s
The file was modified lld/test/ELF/undef-not-suggest.test
The file was modified lld/test/ELF/comdat-discarded-error.s
The file was modified lld/test/ELF/relocatable-comdat.s
The file was modified lld/test/ELF/lto/Inputs/comdat.s
The file was modified lld/test/ELF/Inputs/comdat-discarded-reloc.s
The file was modified lld/test/ELF/comdat-discarded-reloc.s
The file was modified lld/test/ELF/comdat.s
The file was modified lld/test/ELF/start-lib-comdat.s