SuccessChanges

Summary

  1. [X86] LowerBUILD_VECTOR - track zero/nonzero elements with APInt masks. NFCI. (details)
  2. [VE] Add logical mask intrinsic instructions (details)
  3. [AMDGPU] Make use of HasSMemRealTime predicate. NFC. (details)
  4. Revert "Re-apply "[CMake][compiler-rt][AArch64] Avoid preprocessing LSE builtins separately"" (details)
  5. [SLP] Control maximum vectorization factor from TTI (details)
  6. [libc][Obvious] Include <fenv.h> from DummyFenv.h. (details)
Commit 5f5a2547c174cf1eaf7874ff02c198629fe02c22 by llvm-dev
[X86] LowerBUILD_VECTOR - track zero/nonzero elements with APInt masks. NFCI.

Prep work for undef/zero 'upper elements' handling as proposed in D92645.
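
A minimal illustrative fragment of the pattern the title describes, not the actual X86ISelLowering.cpp code: per-lane zero/nonzero facts are recorded in APInt bitmasks instead of scalar uint64_t bitmasks. Op and NumElems are assumed from the surrounding BUILD_VECTOR lowering:

    // Illustrative fragment; identifiers are assumptions, not the patch itself.
    APInt ZeroMask = APInt::getNullValue(NumElems);
    APInt NonZeroMask = APInt::getNullValue(NumElems);
    for (unsigned i = 0; i != NumElems; ++i) {
      SDValue Elt = Op.getOperand(i);
      if (Elt.isUndef())
        continue;                 // undef lanes belong to neither mask
      if (X86::isZeroNode(Elt))
        ZeroMask.setBit(i);       // lane is provably zero
      else
        NonZeroMask.setBit(i);    // lane is not known to be zero
    }
    // Downstream checks become plain mask queries, e.g.:
    unsigned NumNonZero = NonZeroMask.countPopulation();
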
The file was modified llvm/lib/Target/X86/X86ISelLowering.cpp
Commit aefedb170734d680516c3875873c80fc29498b43 by marukawa
[VE] Add logical mask intrinsic instructions

Add andm, orm, xorm, eqvm, nndm, negm, pcvm, lzvm, and tovm intrinsic
instructions, a few pseudo instructions to expand logical intrinsics
using VM512, a mechanism to expand such pseudo instructions, and
regression tests. Also, assign vector mask types and vector mask
register classes correctly; this is required to use VM512 registers
as function arguments.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D93093
The file was added llvm/test/CodeGen/VE/VELIntrinsics/tovm.ll
The file was modified llvm/lib/Target/VE/VEInstrInfo.cpp
The file was added llvm/test/CodeGen/VE/VELIntrinsics/eqvm.ll
The file was added llvm/test/CodeGen/VE/VELIntrinsics/pcvm.ll
The file was modified llvm/include/llvm/IR/IntrinsicsVEVL.gen.td
The file was added llvm/test/CodeGen/VE/VELIntrinsics/lzvm.ll
The file was added llvm/test/CodeGen/VE/VELIntrinsics/xorm.ll
The file was added llvm/test/CodeGen/VE/VELIntrinsics/nndm.ll
The file was added llvm/test/CodeGen/VE/VELIntrinsics/orm.ll
The file was modified llvm/lib/Target/VE/VEInstrIntrinsicVL.gen.td
The file was modified llvm/lib/Target/VE/VEInstrVec.td
The file was added llvm/test/CodeGen/VE/VELIntrinsics/negm.ll
The file was added llvm/test/CodeGen/VE/VELIntrinsics/andm.ll
Commit 07e92e6b6002d95d438d24eaabf4452ad6e4ef8f by jay.foad
[AMDGPU] Make use of HasSMemRealTime predicate. NFC.

We have this subtarget feature, so it makes sense to use it here. This is
NFC because the feature is enabled by default on GFX8+.

Differential Revision: https://reviews.llvm.org/D93202
The file was modified llvm/lib/Target/AMDGPU/AMDGPU.td
The file was modified llvm/lib/Target/AMDGPU/SMInstructions.td
Commit c21df2a79c268d1e0f467ec25a1ec7cb4aff5dfb by raul
Revert "Re-apply "[CMake][compiler-rt][AArch64] Avoid preprocessing LSE builtins separately""

This reverts commit 03ebe1937192c247c4a7b8ec19dde2cf9845c914.

It's still breaking bots (e.g. http://green.lab.llvm.org/green/job/clang-stage1-RA/17027/console) even though it doesn't change any actual code.
The compile errors don't make much sense either. Revert for now.

Differential Revision: https://reviews.llvm.org/D93228
The file was modified compiler-rt/lib/builtins/aarch64/lse.S
The file was modified compiler-rt/cmake/Modules/CompilerRTDarwinUtils.cmake
The file was modified compiler-rt/lib/builtins/CMakeLists.txt
Commit 87d7757bbe14fed420092071ded3430072053316 by Stanislav.Mekhanoshin
[SLP] Control maximum vectorization factor from TTI

D82227 has added a proper check to limit PHI vectorization to the
maximum vector register size. That unfortunately resulted in at
least a couple of regressions on SystemZ and x86.

This change reverts the PHI handling from D82227 and replaces it with
a more general check in SLPVectorizerPass::tryToVectorizeList().
Moving the check into tryToVectorizeList() allows vectorization to
restart if the initial chunk fails.

However, this function is more general and handles not only PHIs
but everything SLP handles. If the vectorization factor were limited
to the maximum vector register size there, it would restrict much
more vectorization than before, leading to further regressions.
Therefore a new TTI callback, getMaximumVF(), is added. Its default
of 0 preserves the current behavior and limits nothing; targets can
then decide what is best for them.
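
For concreteness, a sketch of the default hook as the paragraph above describes it (treat the exact header it lives in as an assumption); returning 0 means the target imposes no limit:

    // Default TTI implementation sketch: 0 == "no maximum VF imposed".
    unsigned getMaximumVF(unsigned ElemWidth, unsigned Opcode) const {
      return 0;
    }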

The callback takes ElementSize, just like the similar getMinimumVF()
function, plus the main opcode of the chain. The latter is needed to
avoid regressions, at least on AMDGPU: loads and stores can be up to
128 bits wide, and some subtargets have <2 x 16> bit vector math,
while everything else shall not be vectorized. That is, we need to
differentiate based on both the element size and the operation itself.

Differential Revision: https://reviews.llvm.org/D92059
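
A hedged C++ sketch of a target-side override, mirroring the AMDGPU intent described above (loads and stores vectorizable up to 128 bits, 2-wide 16-bit math on subtargets that have it). The class name, the subtarget query, and the thresholds are assumptions for illustration, not a verbatim copy of the patch:

    // Illustrative override; identifiers are assumptions.
    unsigned GCNTTIImpl::getMaximumVF(unsigned ElemWidth, unsigned Opcode) const {
      // Loads and stores may use vectors up to 128 bits wide.
      if (Opcode == Instruction::Load || Opcode == Instruction::Store)
        return 128 / ElemWidth;
      // Only 16-bit element math gets a 2-wide factor; everything else stays scalar.
      return (ElemWidth == 16 && ST->has16BitInsts()) ? 2 : 1;
    }
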
The file was modified llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
The file was modified llvm/test/Transforms/SLPVectorizer/slp-max-phi-size.ll
The file was modified llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h
The file was modified llvm/include/llvm/Analysis/TargetTransformInfo.h
The file was modified llvm/lib/Analysis/TargetTransformInfo.cpp
The file was modified llvm/test/Transforms/SLPVectorizer/AMDGPU/add_sub_sat.ll
The file was modified llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
The file was modified llvm/test/Transforms/SLPVectorizer/AMDGPU/round.ll
The file was modified llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
Commit 9ad2091e78eb47e6707abbc7c83e208ea1150589 by sivachandra
[libc][Obvious] Include <fenv.h> from DummyFenv.h.
The file was modified libc/utils/FPUtil/DummyFEnv.h