Success

Changes

Changes from Git (git http://labmaster3.local/git/llvm-project.git)

Summary

  1. [X86] Add an assert that v32i16/v64i8 splitting in LowerVSETCC should only occur when AVX512BW is disabled. NFC
  2. [X86] Teach getUndefRegClearance that we use undef for inputs to PUNPCK in some cases.
  3. [X86] Add XOP vector shift by scalar amount tests
  4. [CodeGenPrepare][X86] Add x16i16, v32i8 and XOP vector shift by scalar amount tests
  5. AMDGPU: Skip GetUnderlyingObject check in pointsToConstantMemory
  6. Fix typo
  7. InstCombine: Broaden copy-constant-to-alloca optimization
  8. GlobalISel: Combine G_UNMERGE_VALUES with G_TRUNC
  9. GlobalISel: Move code into lowering for G_MERGE_VALUES
  10. [Clang] Pass --pack-dyn-relocs=relr to lld for Fuchsia
  11. [Clang] Pass -z max-page-size to linker for Fuchsia
  12. [X86] isVectorShiftByScalarCheap - don't limit fast XOP vector shifts to 128-bit vectors
  13. [LAA] Remove unneeded PtrRtChecking argument (NFC).
  14. [BreakFalseDeps] Harden pickBestRegisterForUndef against changing tied operands or physical registers that aren't renamable.
  15. GlobalISel: Handle more cases in lowerUnmergeValues
  16. [X86] Add test cases for vXi16 PMULH opportunities that don't end in truncate.

Commit 56bf0b58c24c4292a2345e646298f5aba67dffb5 by craig.topper
[X86] Add an assert that v32i16/v64i8 splitting in LowerVSETCC should only occur when AVX512BW is disabled. NFC

With BWI we should only get a v32i1/v64i1 result type.
The file was modified llvm/lib/Target/X86/X86ISelLowering.cpp

Commit c7be6a86f44a0df4f47c183c828ea3e29840ade2 by craig.topper
[X86] Teach getUndefRegClearance that we use undef for inputs to PUNPCK in some cases.

This enables the register to be changed from XMM/YMM/ZMM0 to
instead match the other source. This prevents a false
dependency.

I added all the integer unpck instructions, but the tests
only show changes for BW and WD.

Unfortunately, we can have undef on operand 1 or 2 of the AVX
instructions. This breaks the interface with hasUndefRegUpdate
which used to tell which operand to check.

Now we scan the input operands looking for an undef register and
then ask hasUndefRegUpdate if it's an instruction we care about
and which operands of that instruction we care about.

I also had to make some changes to the load folding code to
always pass operand 1 to hasUndefRegUpdate. I've updated
hasUndefRegUpdate to return false when ForLoadFold is set for
instructions that are not explicitly blocked for load folding in
isel patterns.

Differential Revision: https://reviews.llvm.org/D79615
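
The new lookup order described in this commit message (find an undef register input first, then ask the opcode-based query whether that operand matters) can be sketched roughly as follows. This is a toy model, not the code in X86InstrInfo.cpp: the Operand and Instr structs, the opcode numbers, and the two-argument hasUndefRegUpdate signature are all assumptions made for illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for LLVM's MachineOperand / MachineInstr.
struct Operand {
  bool IsReg;
  bool IsUse;
  bool IsUndef;
};

struct Instr {
  unsigned Opcode;
  std::vector<Operand> Operands;
};

// Hypothetical opcode numbers used only by this sketch.
enum : unsigned { OP_ADD = 1, OP_PUNPCKLBW = 2 };

// Toy opcode query: is this an instruction we care about, and is OpNum one
// of the operands we care about? Here only operands 1 and 2 of the unpack
// qualify, mirroring "undef on operand 1 or 2 of the AVX instructions".
bool hasUndefRegUpdate(unsigned Opcode, std::size_t OpNum) {
  return Opcode == OP_PUNPCKLBW && (OpNum == 1 || OpNum == 2);
}

// Scan the operands for an undef register use, then consult the opcode
// query. Returns the operand index, or -1 if nothing qualifies.
int findUndefOperand(const Instr &MI) {
  for (std::size_t I = 0; I < MI.Operands.size(); ++I) {
    const Operand &MO = MI.Operands[I];
    if (MO.IsReg && MO.IsUse && MO.IsUndef && hasUndefRegUpdate(MI.Opcode, I))
      return static_cast<int>(I);
  }
  return -1;
}
```

Scanning the operands first means the caller no longer needs to know in advance which operand position holds the undef input, which is the interface problem the message describes.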
The file was modified llvm/test/CodeGen/X86/vector-shift-shl-256.ll
The file was modified llvm/test/CodeGen/X86/vector-shuffle-256-v16.ll
The file was modified llvm/test/CodeGen/X86/vector-fshr-rot-128.ll
The file was modified llvm/test/CodeGen/X86/vector-shift-ashr-256.ll
The file was modified llvm/test/CodeGen/X86/pr45833.ll
The file was modified llvm/test/CodeGen/X86/midpoint-int-vec-256.ll
The file was modified llvm/test/CodeGen/X86/vector-idiv-sdiv-128.ll
The file was modified llvm/test/CodeGen/X86/vector-rotate-256.ll
The file was modified llvm/test/CodeGen/X86/mmx-arith.ll
The file was modified llvm/test/CodeGen/X86/cast-vsel.ll
The file was modified llvm/test/CodeGen/X86/vector-idiv-sdiv-512.ll
The file was modified llvm/test/CodeGen/X86/vector-reduce-mul.ll
The file was modified llvm/test/CodeGen/X86/vshli-simplify-demanded-bits.ll
The file was modified llvm/test/CodeGen/X86/vec_umulo.ll
The file was modified llvm/test/CodeGen/X86/widen_mul.ll
The file was modified llvm/test/CodeGen/X86/vector-fshl-rot-256.ll
The file was modified llvm/test/CodeGen/X86/vector-ext-logic.ll
The file was modified llvm/test/CodeGen/X86/pr45563-2.ll
The file was modified llvm/test/CodeGen/X86/vec_usubo.ll
The file was modified llvm/test/CodeGen/X86/vector-shift-ashr-512.ll
The file was modified llvm/test/CodeGen/X86/pmul.ll
The file was modified llvm/test/CodeGen/X86/vector-shift-ashr-sub128.ll
The file was modified llvm/test/CodeGen/X86/combine-shl.ll
The file was modified llvm/test/CodeGen/X86/div-rem-pair-recomposition-signed.ll
The file was modified llvm/test/CodeGen/X86/min-legal-vector-width.ll
The file was modified llvm/test/CodeGen/X86/vector-fshl-128.ll
The file was modified llvm/test/CodeGen/X86/div-rem-pair-recomposition-unsigned.ll
The file was modified llvm/test/CodeGen/X86/vector-shift-shl-sub128.ll
The file was modified llvm/test/CodeGen/X86/midpoint-int-vec-512.ll
The file was modified llvm/test/CodeGen/X86/midpoint-int-vec-128.ll
The file was modified llvm/test/CodeGen/X86/vec_setcc.ll
The file was modified llvm/test/CodeGen/X86/avx2-vector-shifts.ll
The file was modified llvm/test/CodeGen/X86/vector-fshl-rot-128.ll
The file was modified llvm/test/CodeGen/X86/vec_saddo.ll
The file was modified llvm/test/CodeGen/X86/vector-shift-ashr-128.ll
The file was modified llvm/test/CodeGen/X86/vec_smulo.ll
The file was modified llvm/test/CodeGen/X86/vector-fshr-128.ll
The file was modified llvm/test/CodeGen/X86/vector-fshl-256.ll
The file was modified llvm/test/CodeGen/X86/prefer-avx256-shift.ll
The file was modified llvm/test/CodeGen/X86/vector-rotate-128.ll
The file was modified llvm/test/CodeGen/X86/avx2-arith.ll
The file was modified llvm/test/CodeGen/X86/vec_ssubo.ll
The file was modified llvm/test/CodeGen/X86/combine-mul.ll
The file was modified llvm/test/CodeGen/X86/vector-fshr-rot-256.ll
The file was modified llvm/lib/Target/X86/X86InstrInfo.cpp
The file was modified llvm/test/CodeGen/X86/vector-shift-shl-128.ll
The file was modified llvm/test/CodeGen/X86/prefer-avx256-wide-mul.ll
The file was modified llvm/test/CodeGen/X86/vec_uaddo.ll
The file was modified llvm/test/CodeGen/X86/vector-fshr-256.ll
The file was modified llvm/test/CodeGen/X86/vector-idiv-sdiv-256.ll

Commit d7258c6a833a9e2be37f7044ae68e47ff683cc3d by llvm-dev
[X86] Add XOP vector shift by scalar amount tests

Helps improve test coverage of the XOP modes in X86TargetLowering::isVectorShiftByScalarCheap
The file was modified llvm/test/CodeGen/X86/vector-shift-by-select-loop.ll

Commit f8b09f7b52030fcb078c1f99666256d3927a6eea by llvm-dev
[CodeGenPrepare][X86] Add x16i16, v32i8 and XOP vector shift by scalar amount tests

Helps improve test coverage of the XOP modes in X86TargetLowering::isVectorShiftByScalarCheap (and where we always return false for vXi8 vector shifts).
The file was modified llvm/test/Transforms/CodeGenPrepare/X86/vec-shift.ll

Commit beda9d04c284ab68073c6b7d5a858ee609b5311c by Matthew.Arsenault
AMDGPU: Skip GetUnderlyingObject check in pointsToConstantMemory

Check the address space first before searching for the object
definition to save compile time. As an added bonus, this will now
treat casts to constant addrspace as constant.

We also seemed to be missing targeted tests for this, so add a few
other missing cases too.
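
The reordering described above (a cheap address-space check before the more expensive walk to the underlying object) might look something like the sketch below. The Pointer struct, the helper names, and the rule for constant globals are assumptions for illustration and do not reproduce AMDGPUAliasAnalysis.cpp; the address-space numbers follow the AMDGPU convention of 1 for global and 4 for constant.

```cpp
// Hypothetical pointer model: each value has an address space and may be
// derived (for example by a cast) from a base value.
struct Pointer {
  unsigned AddrSpace;
  bool IsConstantGlobal;
  const Pointer *Base; // nullptr when this value is the underlying object
};

enum : unsigned { GLOBAL_AS = 1, CONSTANT_AS = 4 };

// Stand-in for the comparatively expensive walk to the object definition.
const Pointer *getUnderlyingObject(const Pointer *P) {
  while (P->Base)
    P = P->Base;
  return P;
}

bool pointsToConstantMemory(const Pointer *P) {
  // Cheap check first: the pointer's own address space. This also treats a
  // cast into the constant address space as constant.
  if (P->AddrSpace == CONSTANT_AS)
    return true;

  // Only now search for the object definition.
  const Pointer *Obj = getUnderlyingObject(P);
  return Obj->AddrSpace == CONSTANT_AS || Obj->IsConstantGlobal;
}
```

In this model a pointer cast into the constant address space is reported as constant even though its underlying object lives in the global address space, which corresponds to the "added bonus" mentioned in the message.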
The file was modified llvm/lib/Target/AMDGPU/AMDGPUAliasAnalysis.cpp
The file was added llvm/test/CodeGen/AMDGPU/aa-points-to-constant-memory.ll

Commit a881dc1103579926f039e81c0d25626ff8a582a9 by Matthew.Arsenault
Fix typo
The file was modified clang/lib/CodeGen/TargetInfo.cpp

Commit 16295d521e294b27106e51fac29957c1aac8ff89 by Matthew.Arsenault
InstCombine: Broaden copy-constant-to-alloca optimization

Consider any constant memory type, not just global constants. AMDGPU
kernel parameters are effectively global constants, but appear as
either reads from an intrinsic derived pointer or function argument.
The file was modified llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
The file was added llvm/test/Transforms/InstCombine/AMDGPU/memcpy-from-constant.ll

Commit ee1a69824d9a9fceea2b51616c3363c4d210af4c by arsenm2
GlobalISel: Combine G_UNMERGE_VALUES with G_TRUNC

G_BITCAST can be lowered with a pair of G_UNMERGE_VALUES and
G_MERGE_VALUES with different types, but G_UNMERGE_VALUES of a vector
can also be implemented with a bitcast to a scalar, which introduces
the possibility for infinite loops. Try to eliminate an illegal source
register type in the artifact combiner to prevent this from happening.

Avoids infinite looping in the legalizer in a future patch which
allows lowering G_UNMERGE_VALUES of a vector source with a G_BITCAST.
The file was modified llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-and.mir
The file was modified llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-select.mir
The file was modified llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-sext-inreg.mir
The file was modified llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-xor.mir
The file was modified llvm/test/CodeGen/AMDGPU/GlobalISel/artifact-combiner-unmerge-values.mir
The file was modified llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-or.mir
The file was removed llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-unmerge-values-xfail.mir
The file was modified llvm/test/CodeGen/AMDGPU/GlobalISel/artifact-combiner-sext.mir
The file was modified llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-bitcast.mir
The file was modified llvm/test/CodeGen/AArch64/GlobalISel/legalize-select.mir
The file was modified llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-trunc.mir
The file was modified llvm/test/CodeGen/AArch64/arm64-vabs.ll
The file was modified llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-unmerge-values.mir
The file was modified llvm/include/llvm/CodeGen/GlobalISel/LegalizationArtifactCombiner.h

Commit 69999605ee91a4216ba1a29e2daa748d91212bad by arsenm2
GlobalISel: Move code into lowering for G_MERGE_VALUES

Currently this code exists in widenScalar for G_MERGE_VALUES
sources. I'm not sure if the existing expansion in widenScalar should
be removed or not. The widenScalar variant tries to extend to the
requested size, but this just uses the original bitwidth.
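
For readers unfamiliar with the operation, G_MERGE_VALUES concatenates narrower values into one wider value, with the first source in the least significant bits. One plausible scalar expansion is a zero-extend, shift, and OR per source, computed at the destination's own bit width. The sketch below models that arithmetic on plain integers; mergeValues is a hypothetical helper, and whether LegalizerHelper.cpp emits exactly this sequence is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Merge Parts (each PartBits wide, least significant part first) into one
// integer by zero-extending, shifting, and OR-ing each part into place.
std::uint64_t mergeValues(const std::vector<std::uint64_t> &Parts,
                          unsigned PartBits) {
  assert(PartBits > 0 && PartBits < 64 && "part width out of range");
  assert(Parts.size() * PartBits <= 64 && "result must fit in 64 bits");

  const std::uint64_t Mask = (std::uint64_t{1} << PartBits) - 1;
  std::uint64_t Result = 0;
  for (std::size_t I = 0; I < Parts.size(); ++I)
    Result |= (Parts[I] & Mask) << (I * PartBits); // zext, shl, or
  return Result;
}
```

For example, merging the 8-bit parts 0x34 and 0x12 (in that order) yields the 16-bit value 0x1234.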
The file was modified llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
The file was modified llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-load-local.mir
The file was modified llvm/include/llvm/CodeGen/GlobalISel/LegalizerHelper.h
The file was modified llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.sdot4.ll
The file was modified llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.udot4.ll
The file was modified llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-merge-values.mir
The file was modified llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-zextload-constant-32bit.mir
The file was modified llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-sextload-constant-32bit.mir
The file was modified llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-implicit-def-s1025.mir
The file was modified llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-load-private.mir
The file was modified llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-load-constant.mir
The file was modified llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-load-global.mir
The file was modified llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-load-flat.mir
The file was modified llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
The file was modified llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-load-constant-32bit.mir

Commit c8fbcb1e78adcbaeadca9db9188771d81f49493a by phosek
[Clang] Pass --pack-dyn-relocs=relr to lld for Fuchsia

The compact format is fully supported on Fuchsia and is the
preferred default.

Patch By: mcgrathr

Differential Revision: https://reviews.llvm.org/D79665
The file was modified clang/test/Driver/fuchsia.c
The file was modified clang/lib/Driver/ToolChains/Fuchsia.cpp

Commit 5b02be0b973a0d792bf8ce39170487f48b6cbd08 by phosek
[Clang] Pass -z max-page-size to linker for Fuchsia

Currently all Fuchsia ABIs use a 4k page size, departing from
the recommended page sizes in the respective psABI documents.

Differential Revision: https://reviews.llvm.org/D79667
The file was modified clang/test/Driver/fuchsia.c
The file was modified clang/lib/Driver/ToolChains/Fuchsia.cpp

Commit 9237d88001cad7effd1e5dbe2a20a4412ab6262c by llvm-dev
[X86] isVectorShiftByScalarCheap - don't limit fast XOP vector shifts to 128-bit vectors

XOP targets have fast per-element vector shifts and we're better off splitting to 128-bit shifts where necessary (which is what we already do in LowerShift).
The file was modified llvm/lib/Target/X86/X86ISelLowering.cpp
The file was modified llvm/test/CodeGen/X86/vector-shift-by-select-loop.ll
The file was modified llvm/test/Transforms/CodeGenPrepare/X86/vec-shift.ll

Commit 57fb56b30e85c8e9662075c671d02fbdc37d8f3b by flo
[LAA] Remove unneeded PtrRtChecking argument (NFC).

The argument is not required and simplifies D78460 a bit.
The file was modified llvm/lib/Analysis/LoopAccessAnalysis.cpp

Commit 24b3c2d0585f2f96574e9819313ab05e8943ee02 by craig.topper
[BreakFalseDeps] Harden pickBestRegisterForUndef against changing tied operands or physical registers that aren't renamable.

I don't have any test cases since X86 doesn't return any tied
operands from getUndefRegClearance today. But conceivably we could
want BreakFalseDeps to insert a dependency breaking XOR for
a tied operand in the future.
The file was modified llvm/lib/CodeGen/BreakFalseDeps.cpp

Commit 3af85fa8f06220b43f03f26de216a67be4568fe7 by arsenm2
GlobalISel: Handle more cases in lowerUnmergeValues

Handle scalar sources, as well as vectors.
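
For a scalar source, G_UNMERGE_VALUES splits one wide value into equally sized narrower pieces, least significant piece first. A natural expansion is a logical shift right followed by a truncate for each piece. The sketch below shows that arithmetic on plain integers; unmergeValues is a hypothetical helper, and the exact instructions lowerUnmergeValues emits are an assumption here.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Split Src into NumParts pieces of PartBits each, least significant first,
// using a logical shift right and a truncating mask per piece.
std::vector<std::uint64_t> unmergeValues(std::uint64_t Src, unsigned PartBits,
                                         unsigned NumParts) {
  assert(PartBits > 0 && PartBits < 64 && "part width out of range");
  assert(NumParts * PartBits <= 64 && "source must fit in 64 bits");

  const std::uint64_t Mask = (std::uint64_t{1} << PartBits) - 1;
  std::vector<std::uint64_t> Parts;
  for (unsigned I = 0; I < NumParts; ++I)
    Parts.push_back((Src >> (I * PartBits)) & Mask); // lshr, trunc
  return Parts;
}
```

For example, unmerging the 16-bit value 0x1234 into two 8-bit pieces gives 0x34 followed by 0x12.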
The file was modified llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-sext.mir
The file was modified llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-store-global.mir
The file was modified llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-freeze.mir
The file was modified llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
The file was modified llvm/test/CodeGen/AMDGPU/GlobalISel/artifact-combiner-unmerge-values.mir
The file was modified llvm/test/CodeGen/AMDGPU/GlobalISel/cvt_f32_ubyte.ll
The file was modified llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-anyext.mir
The file was modified llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-unmerge-values.mir
The file was modified llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
The file was modified llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-zext.mir
The file was modified llvm/include/llvm/CodeGen/GlobalISel/LegalizerHelper.h

Commit 66db6f21292dec25487fd8d8d2c3f544950ade8e by craig.topper
[X86] Add test cases for vXi16 PMULH opportunities that don't end in truncate.

We already have matching for extend+mul+shift+trunc. But we could
also match up to the shift without the truncate and just extend the
result. That would still be a savings.
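
The equivalence behind this opportunity can be checked on a single 16-bit lane. The sketch below compares the extend+mul+shift pattern without the final truncate against a multiply-high result that is then extended. The function names are hypothetical, and this is scalar arithmetic only, not the X86 DAG combine itself.

```cpp
#include <cstdint>

// Reference for one lane of an unsigned multiply-high (as PMULHUW computes):
// the upper 16 bits of the 32-bit product.
std::uint16_t mulhiU16(std::uint16_t A, std::uint16_t B) {
  std::uint32_t P = std::uint32_t(A) * std::uint32_t(B);
  return static_cast<std::uint16_t>(P >> 16);
}

// The pattern without the truncate: zext, mul, lshr by 16, kept at 32 bits.
std::uint32_t extMulShiftU(std::uint16_t A, std::uint16_t B) {
  return (std::uint32_t(A) * std::uint32_t(B)) >> 16;
}

// Signed variants (as PMULHW computes). The right shift of a negative value
// is assumed to be arithmetic, which holds on mainstream compilers and is
// guaranteed from C++20 on.
std::int16_t mulhiS16(std::int16_t A, std::int16_t B) {
  std::int32_t P = std::int32_t(A) * std::int32_t(B);
  return static_cast<std::int16_t>(P >> 16); // always fits in 16 bits
}

std::int32_t extMulShiftS(std::int16_t A, std::int16_t B) {
  return (std::int32_t(A) * std::int32_t(B)) >> 16;
}
```

Because the shifted product always fits in 16 bits, extMulShiftU(A, B) equals the zero-extended mulhiU16(A, B), and extMulShiftS(A, B) equals the sign-extended mulhiS16(A, B). That is why the multiply-high can still be used when the result is extended afterwards instead of truncated.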
The file was modified llvm/test/CodeGen/X86/pmulh.ll