Success

Changes

Summary

  1. Reapply [InstCombine] Fold multiuse shr eq zero
  2. [mlir][linalg][nfc] Fix signed/unsigned comparison warning in header
  3. [HIP] support ThinLTO
  4. [JITLink] Move some Block bitfields into Addressable to improve packing.
  5. [ORC] Add more synchronization to TestLookupWithUnthreadedMaterialization.
  6. [CostModel][X86] Pull out X86/X64 scalar int arithmetic costs from SSE tables. NFCI.
Commit 9a9421a461166482465e786a46f8cced63cd2e9f by nikita.ppv
Reapply [InstCombine] Fold multiuse shr eq zero

This was reverted due to performance regressions in ARM benchmarks,
which have since been addressed by D101196 (SCEV analysis improvement)
and D101778 (CGP reverse transform).

-----

The single-use case is handled implicitly by converting the icmp
into a mask check first. When comparing with zero in particular,
we don't need the one-use restriction, as we only produce a single
icmp.

https://alive2.llvm.org/ce/z/MSixcm
https://alive2.llvm.org/ce/z/GwpG0M
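
The equivalences behind this fold can be checked exhaustively at a small bit width. The sketch below is an illustration in Python, not the InstCombine code. It assumes an 8-bit unsigned value and a logical shift right; the single-compare form `x < (1 << c)` is an assumption about what the "single icmp" looks like, and all function names are hypothetical.

```python
BITS = 8  # small width chosen so every input can be enumerated


def shr_is_zero(x: int, c: int) -> bool:
    """Original pattern: (x >> c) == 0 with a logical shift."""
    return (x >> c) == 0


def mask_check(x: int, c: int) -> bool:
    """Mask form: no bit at position c or above is set."""
    high_bits = ((1 << BITS) - 1) & ~((1 << c) - 1)
    return (x & high_bits) == 0


def single_compare(x: int, c: int) -> bool:
    """Single unsigned comparison (assumed form of the folded icmp)."""
    return x < (1 << c)


def all_forms_agree() -> bool:
    """Compare the three forms for every 8-bit value and shift amount."""
    return all(
        shr_is_zero(x, c) == mask_check(x, c) == single_compare(x, c)
        for c in range(BITS)
        for x in range(1 << BITS)
    )
```

Calling `all_forms_agree()` enumerates all 256 values for each shift amount from 0 to 7. The comparison form needs no shifted value at all, which is why the shift having other users does not matter for it.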
The file was modified llvm/test/Transforms/InstCombine/icmp-shr.ll
The file was modified llvm/test/Transforms/PhaseOrdering/X86/ctlz-loop.ll
The file was modified llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
The file was modified llvm/test/Transforms/InstCombine/icmp_sdiv_with_and_without_range.ll
Commit 0dd36f81b9f894497773caed509603eb0f090cae by ivan.butygin
[mlir][linalg][nfc] Fix signed/unsigned comparison warning in header

Differential Revision: https://reviews.llvm.org/D102968
The file was modified mlir/include/mlir/Dialect/Linalg/IR/LinalgOps.td
Commit bf6124580dfba86b73d828851f03fb9eea1269bd by Yaxun.Liu
[HIP] support ThinLTO

Add options -f[no-]offload-lto and -foffload-lto=[thin,full] for controlling
LTO for offload compilation. Allow LTO for AMDGPU target.

The AMDGPU target does not support codegen of object files containing
calls to external functions, so the LLVM module passed to the
AMDGPU backend must contain definitions of all the callees.
An LLVM option is added to allow the function importer to import
functions with the noinline attribute.

HIP toolchain passes proper LLVM options to lld to make sure
function importer imports definitions of all the callees.
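
The requirement that the module contain definitions of all callees is transitive: an imported function's own callees need definitions too. The following is a toy model of that requirement in Python, not the LLVM function importer. The call graph representation, the function name `callees_to_import`, and the example function names are all hypothetical.

```python
from collections import deque


def callees_to_import(call_graph, defined_locally, roots):
    """Return every function reachable from `roots` that lacks a local definition.

    call_graph maps a function name to the names it calls (hypothetical model).
    """
    seen = set(roots)
    queue = deque(roots)
    needed = set()
    while queue:
        fn = queue.popleft()
        for callee in call_graph.get(fn, ()):
            if callee in seen:
                continue
            seen.add(callee)
            queue.append(callee)  # the callee's own callees must be visited too
            if callee not in defined_locally:
                needed.add(callee)
    return needed


example_graph = {
    "kernel": ["helper", "noinline_fn"],
    "helper": ["leaf"],
    "noinline_fn": [],
}
example_needed = callees_to_import(example_graph, {"kernel"}, ["kernel"])
```

In this model, `example_needed` contains `helper`, `noinline_fn`, and `leaf`. Note that `noinline_fn` is included: skipping it would leave an external call in the module, which is the situation the commit message says the backend cannot handle.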

Reviewed by: Teresa Johnson, Artem Belevich

Differential Revision: https://reviews.llvm.org/D99683
The file was modified clang/include/clang/Driver/Driver.h
The file was modified llvm/test/Transforms/FunctionImport/Inputs/funcimport.ll
The file was modified clang/test/Driver/hip-options.hip
The file was modified llvm/test/Transforms/FunctionImport/adjustable_threshold.ll
The file was modified clang/lib/Driver/ToolChains/Clang.cpp
The file was modified clang/lib/Driver/Driver.cpp
The file was modified llvm/test/Transforms/FunctionImport/funcimport.ll
The file was modified clang/lib/Driver/ToolChains/HIP.cpp
The file was added llvm/test/Transforms/FunctionImport/Inputs/noinline.ll
The file was added llvm/test/Transforms/FunctionImport/noinline.ll
The file was modified clang/include/clang/Driver/Options.td
The file was modified llvm/lib/Transforms/IPO/FunctionImport.cpp
Commit 2b45895df46e3e87b9588bd207f417d2d2fe7482 by Lang Hames
[JITLink] Move some Block bitfields into Addressable to improve packing.

Moving these bitfields from Block to Addressable allows them to be packed with
the bitfields at the end of Addressable, reducing the size of Block by eight
bytes.
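
The size saving can be illustrated with a simplified storage model. The sketch below is arithmetic only, not the actual JITLink layout: it assumes bitfields are packed into 64-bit storage units and that a class's bitfields cannot share a unit with another class's bitfields. The bit widths used in the example are hypothetical, and real layouts depend on the ABI.

```python
WORD_BITS = 64  # assumed width of one bitfield storage unit


def words_needed(bit_widths):
    """Number of storage units needed to hold the given bitfield widths."""
    total = sum(bit_widths)
    return -(-total // WORD_BITS)  # ceiling division


def bytes_for_separate(base_bits, derived_bits):
    """Each class packs its own bitfields into its own storage units."""
    return (words_needed(base_bits) + words_needed(derived_bits)) * WORD_BITS // 8


def bytes_for_merged(base_bits, derived_bits):
    """All bitfields live in one class and share storage units."""
    return words_needed(base_bits + derived_bits) * WORD_BITS // 8


# Hypothetical widths: two 1-bit flags in the base, 5 + 56 bits in the derived class.
saving = bytes_for_separate([1, 1], [5, 56]) - bytes_for_merged([1, 1], [5, 56])
```

With these hypothetical widths the fields total 63 bits, so the merged form fits in one 64-bit unit instead of two and `saving` is 8 bytes.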
The file was modified llvm/include/llvm/ExecutionEngine/JITLink/JITLink.h
Commit 1a1d6e6f98738be249b20994bcfed48dccac59e3 by Lang Hames
[ORC] Add more synchronization to TestLookupWithUnthreadedMaterialization.

Don't run tasks until their corresponding thread has been added to the running
threads vector. This is an extension to fda4300da82, which doesn't seem to have
been enough to fix the synchronization issues on its own.
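
The ordering rule described here (a task must not run before its thread is recorded) can be sketched with a per-thread event. This is a Python illustration of the idea using the standard `threading` module, not the C++ unit test. The names `run_tasks`, `running`, and `registered` are hypothetical.

```python
import threading


def run_tasks(tasks):
    """Run each task on its own thread, but only after the thread is registered."""
    running = []             # stands in for the "running threads vector"
    lock = threading.Lock()  # guards `running` and `results`
    results = []

    def worker(task, registered):
        registered.wait()    # block until this thread has been added to `running`
        with lock:
            was_registered = threading.current_thread() in running
        value = task()
        with lock:
            results.append((value, was_registered))

    for task in tasks:
        registered = threading.Event()
        thread = threading.Thread(target=worker, args=(task, registered))
        thread.start()
        with lock:
            running.append(thread)
        registered.set()     # only now may the task body run

    for thread in list(running):
        thread.join()
    return results
```

Each returned pair holds the task's result and a flag recording whether the thread found itself in `running` before doing any work. Without the `registered.wait()` call, a fast thread could start its task before the append, which is the kind of race the commit message describes.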
The file was modified llvm/unittests/ExecutionEngine/Orc/CoreAPIsTest.cpp
Commit 6f9ac11e3960bf5953b3af4b0c4e2682ea802081 by llvm-dev
[CostModel][X86] Pull out X86/X64 scalar int arithmetic costs from SSE tables. NFCI.

These aren't dependent on any SSE level (and don't tend to get quicker either).
The file was modified llvm/lib/Target/X86/X86TargetTransformInfo.cpp