Success Changes

Summary

  1. [InstCombine] matchFunnelShift - remove shift value commutation. NFCI.
  2. [InstCombine] matchFunnelShift - fold or(shl(a,x),lshr(b,sub(bw,x))) -> fshl(a,b,x) iff x < bw (REAPPLIED)
  3. [AMDGPU] Use @LINE for error checking in gfx10 assembler tests
  4. [SVE] Lower fixed length VECREDUCE_XOR operation
  5. [AMDGPU] Insert waterfall loops for divergent calls
  6. [LoopDeletion] Remove over-eager SCEV verification.
  7. [AMDGPU] Print metadata on error
  8. [NFC][Regalloc] Pass VirtRegMap by reference.
  9. [VPlan] Use operands for printing of VPWidenMemoryInstructionRecipe.
  10. [NFC][MC] Use MCRegister in LiveRangeMatrix
  11. [Tests] Regenerate test checks; NFC
  12. [GlobalISel][KnownBits] Early return on out of bound shift amounts
  13. Revert 1c021c64c "[SCEV] Model ptrtoint(SCEVUnknown) cast not as unknown, but as zext/trunc/self of SCEVUnknown"
  14. [compiler-rt] Suppress -Wunused-result due to ::write when _FORTIFY_SOURCE>0 in glibc
  15. Make likelihood lit test less brittle
  16. [VPlan] Use VPValue def for VPMemoryInstructionRecipe.
  17. Restore "[ThinLTO] Avoid temporaries when loading global decl attachment metadata"
  18. [InstCombine] FoldShiftByConstant - merge equivalent types. NFCI.
  19. [InstCombine] FoldShiftByConstant - create Scalar/Vector constant with ConstantInt::get(). NFCI.
  20. [flang][openacc] Update Loop Construct lowering to use fir::getBase
Commit fa566233706ce8345f2c0152b51312a217b848c9 by llvm-dev
[InstCombine] matchFunnelShift - remove shift value commutation. NFCI.

After rG02295e6d1a15 we no longer need to invert the shift values for fshr; this is just hidden at the moment, as funnel shifts only ever match for constant values and so never use the fshr "Sub on SHL" path.
The file was modified llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
Commit bbf3925879b56aea42daeecc794bb41e99ebc126 by llvm-dev
[InstCombine] matchFunnelShift - fold or(shl(a,x),lshr(b,sub(bw,x))) -> fshl(a,b,x) iff x < bw (REAPPLIED)

If value tracking can confirm that a shift value is less than the type bitwidth then we can more confidently fold general or(shl(a,x),lshr(b,sub(bw,x))) patterns to a funnel/rotate intrinsic pattern without causing bad codegen regressions in the backend (see D89139).
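
A C-level sketch of the pattern this fold targets (hypothetical function, not taken from the patch); it assumes the caller guarantees 0 < x < 32, matching the "x < bw" condition above:

  #include <cstdint>

  // Equivalent of or(shl(a,x), lshr(b, sub(32,x))) for 32-bit operands. Once value
  // tracking proves x < 32, InstCombine can replace the or-of-shifts with a single
  // llvm.fshl.i32(a, b, x) funnel-shift intrinsic call.
  uint32_t fshl_like(uint32_t a, uint32_t b, uint32_t x) {
    return (a << x) | (b >> (32u - x));
  }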

Reapplied after the shift canonicalization in rG02295e6d1a15 which removed the need to flip the shift values.

Differential Revision: https://reviews.llvm.org/D88783
The file was modified llvm/test/Transforms/InstCombine/rotate.ll
The file was modified llvm/test/Transforms/InstCombine/funnel.ll
The file was modified llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
Commit b8901230c07eb363f8d61f71c436c36375b750db by jay.foad
[AMDGPU] Use @LINE for error checking in gfx10 assembler tests
The file was modified llvm/test/MC/AMDGPU/gfx10_unsupported.s
Commit 974ddb54c9adfb533f4bd9665ef902ebe75fa7ee by mcinally
[SVE] Lower fixed length VECREDUCE_XOR operation

Differential Revision: https://reviews.llvm.org/D88974
The file was modified llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
The file was modified llvm/test/CodeGen/AArch64/sve-fixed-length-log-reduce.ll
Commit 7f2a641aad28fd9b15fa1bcae1dd496150638d79 by sebastian.neubauer
[AMDGPU] Insert waterfall loops for divergent calls

Extend loadSRsrcFromVGPR to allow moving a range of instructions into
the loop. The call instruction is surrounded by copies into physical
registers, which should be part of the waterfall loop.

Differential Revision: https://reviews.llvm.org/D88291
The file was modified llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
The file was modified llvm/test/CodeGen/AMDGPU/indirect-call.ll
Commit ad5541045a63fe3049fc910d843bcbb78f7c7056 by flo
[LoopDeletion] Remove over-eager SCEV verification.

60b852092c98dbdc6248d60109d90ae6f8ad841c introduced SCEV verification to
deleteDeadLoop, but this check currently appears to be a bit over-eager,
as some users of deleteDeadLoop only patch up SE after calling it
(e.g. PR47753).

Remove the extra check for now. We can consider adding it back once we
have tracked down the source of the inconsistency behind PR47753.
The file was modified llvm/lib/Transforms/Utils/LoopUtils.cpp
Commit c2216d796aab7659771c05303f9d78bad4aeca07 by sebastian.neubauer
[AMDGPU] Print metadata on error

If the metadata is valid YAML, we can print it even if it failed
validation. That makes it easier to debug invalid metadata.

Differential Revision: https://reviews.llvm.org/D89243
The file was modified llvm/tools/llvm-readobj/ELFDumper.cpp
Commit 596a9f6b89d0d3e3f2897132ef1283941bd3607b by mtrofin
[NFC][Regalloc] Pass VirtRegMap by reference.

It's never null; the reason it's modeled as a pointer is that the pass
can't initialize it in its ctor. Passing it by reference simplifies the
code, too, as the null checks were unnecessary complexity.
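
A generic sketch (illustrative types, not the actual regalloc code) of why switching from a pointer to a reference lets callees drop their null checks:

  struct Analysis { void doWork() {} };

  // Before: the analysis had to be stored as a pointer because it can't be set
  // in the ctor, and users checked it defensively even though it is never null
  // once the pass runs.
  struct Before {
    Analysis *A = nullptr;             // set later, e.g. when the pass runs
    void run() { if (A) A->doWork(); } // null check that never fires
  };

  // After: downstream helpers take the analysis by reference, so the checks go away.
  void helper(Analysis &A) { A.doWork(); }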

Differential Revision: https://reviews.llvm.org/D89171
The file was modified llvm/lib/CodeGen/CalcSpillWeights.cpp
The file was modified llvm/include/llvm/CodeGen/CalcSpillWeights.h
The file was modified llvm/lib/CodeGen/RegAllocPBQP.cpp
The file was modified llvm/lib/CodeGen/RegAllocBasic.cpp
The file was modified llvm/lib/CodeGen/LiveRangeEdit.cpp
The file was modified llvm/lib/CodeGen/RegAllocGreedy.cpp
Commit ea058d289cbf54e5b33aac7f7a13d0d58625f1b9 by flo
[VPlan] Use operands for printing of VPWidenMemoryInstructionRecipe.

Now that operands of the recipe are managed through VPUser, we can
simplify the printing by just using the operands.
The file was modified llvm/lib/Transforms/Vectorize/VPlan.cpp
The file was modified llvm/lib/Transforms/Vectorize/VPlan.h
Commit 43d347995c33a5f48f0b4d9cf3d541a1f6ba66c6 by mtrofin
[NFC][MC] Use MCRegister in LiveRangeMatrix

The change starts from LiveRegMatrix and also checks that the users of
its APIs are typed accordingly.

Differential Revision: https://reviews.llvm.org/D89145
The file was modified llvm/lib/CodeGen/RegAllocGreedy.cpp
The file was modified llvm/lib/CodeGen/RegAllocBase.cpp
The file was modified llvm/lib/Target/AMDGPU/GCNRegBankReassign.cpp
The file was modified llvm/include/llvm/CodeGen/LiveRegMatrix.h
The file was modified llvm/lib/CodeGen/RegAllocBase.h
The file was modified llvm/lib/CodeGen/RegAllocBasic.cpp
The file was modified llvm/lib/Target/AMDGPU/SIPreAllocateWWMRegs.cpp
The file was modified llvm/lib/CodeGen/LiveRegMatrix.cpp
The file was modified llvm/lib/Target/AMDGPU/GCNNSAReassign.cpp
Commit 2f66bfac280f9ae9299dccc357ae10e8a48525ed by Dávid Bolvanský
[Tests] Regenerate test checks; NFC
The file was modified llvm/test/Transforms/InstCombine/objsize.ll
The file was modified llvm/test/Transforms/InstCombine/cabs-discrete.ll
The file was modified llvm/test/Transforms/InstCombine/cabs-array.ll
The file was modified llvm/test/Transforms/InstCombine/fabs-libcall.ll
Commit 734112343917a011676c2915c5e5d29803a51ba6 by konstantin.schwarz
[GlobalISel][KnownBits] Early return on out of bound shift amounts

If the known shift amount is greater than or equal to the bit width of the type of the value being shifted,
the result is target dependent, so don't try to infer any bits.
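
A minimal sketch of the guard (assumed helper name and use of the KnownBits API; not the exact GISelKnownBits code):

  #include "llvm/Support/KnownBits.h"
  using namespace llvm;

  // If the shift amount is a known constant that is >= the bit width, the shifted
  // result is target dependent, so report "nothing known" rather than inferring bits.
  static KnownBits knownBitsForShl(const KnownBits &Val, const KnownBits &Amt) {
    unsigned BitWidth = Val.getBitWidth();
    if (Amt.isConstant() && Amt.getConstant().uge(BitWidth))
      return KnownBits(BitWidth);    // all bits unknown; early return
    return KnownBits::shl(Val, Amt); // otherwise compute known bits as usual
  }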

This fixes a crash we've seen in one of our internal test suites.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D89232
The file was modified llvm/unittests/CodeGen/GlobalISel/KnownBitsTest.cpp
The file was modified llvm/lib/CodeGen/GlobalISel/GISelKnownBits.cpp
Commit 17cec6a11a12f815052d56a17ef738cf246a2d9a by hans
Revert 1c021c64c "[SCEV] Model ptrtoint(SCEVUnknown) cast not as unknown, but as zext/trunc/self of SCEVUnknown"

> While we indeed can't treat them as no-ops, i believe we can/should
> do better than just modelling them as `unknown`. `inttoptr` story
> is complicated, but for `ptrtoint`, it seems straight-forward
> to model it just as a zext-or-trunc of unknown.
>
> This may be important now that we track towards
> making inttoptr/ptrtoint casts not no-op,
> and towards preventing folding them into loads/etc
> (see D88979/D88789/D88788)
>
> Reviewed By: mkazantsev
>
> Differential Revision: https://reviews.llvm.org/D88806

It caused the following assert during Chromium builds:

  llvm/lib/IR/Constants.cpp:1868:
  static llvm::Constant *llvm::ConstantExpr::getTrunc(llvm::Constant *, llvm::Type *, bool):
  Assertion `C->getType()->isIntOrIntVectorTy() && "Trunc operand must be integer"' failed.

See code review for a link to a reproducer.

This reverts commit 1c021c64caef83cccb719c9bf0a2554faa6563af.
The file was modified llvm/lib/Analysis/ScalarEvolution.cpp
The file was modified llvm/test/CodeGen/ARM/lsr-undef-in-binop.ll
The file was modified llvm/test/CodeGen/X86/ragreedy-hoist-spill.ll
The file was modified llvm/lib/Transforms/Utils/SimplifyIndVar.cpp
The file was modified llvm/test/Analysis/ScalarEvolution/ptrtoint.ll
The file was modified llvm/test/Transforms/IndVarSimplify/2011-11-01-lftrptr.ll
The file was modified llvm/test/Analysis/ScalarEvolution/add-expr-pointer-operand-sorting.ll
The file was modified polly/test/Isl/CodeGen/scev_looking_through_bitcasts.ll
The file was modified llvm/test/Analysis/ScalarEvolution/no-wrap-add-exprs.ll
Commit 1ef0e94d5b0206f69e4e822c6828d0b5121c11fb by i
[compiler-rt] Suppress -Wunused-result due to ::write when _FORTIFY_SOURCE>0 in glibc

Noticed by Peter Foley.
In glibc, ::write is declared with __attribute__((__warn_unused_result__)) when __USE_FORTIFY_LEVEL is greater than 0.
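
A hedged illustration of the warning and one common suppression idiom (not necessarily the exact change applied in compiler-rt):

  #include <unistd.h>

  // With glibc and _FORTIFY_SOURCE > 0, write() carries warn_unused_result, so a
  // bare call warns under -Wunused-result; GCC still warns even if the call is
  // cast to void. Explicitly consuming the result silences the warning:
  void WriteToStderr(const char *Buf, unsigned Len) {
    ssize_t Res = write(2, Buf, Len);
    (void)Res;  // result intentionally ignored
  }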
The file was modified compiler-rt/lib/ubsan_minimal/ubsan_minimal_handlers.cpp
The file was modified compiler-rt/lib/scudo/standalone/linux.cpp
The file was modified compiler-rt/lib/fuzzer/FuzzerIOPosix.cpp
Commit 551caec4a8af79483823e2940d40afb4c1df5da1 by koraq
Make likelihood lit test less brittle

Jeremy Morse discovered an issue with the lit test introduced in D88363. The
test gives different results with Sony's `-O1`.

The test needs to run at `-O1`, otherwise the likelihood attribute will be
ignored. Instead of running all `-O1` passes, it now runs only the lower-expect
pass, which is needed to lower `__builtin_expect`.

Differential Revision: https://reviews.llvm.org/D89204
The file was modified clang/test/CodeGenCXX/attr-likelihood-if-vs-builtin-expect.cpp
Commit 525b085a65d30a5f2ae2af38c0be252fe8d4781b by flo
[VPlan] Use VPValue def for VPMemoryInstructionRecipe.

This patch turns VPMemoryInstructionRecipe into a VPValue and uses it
during VPlan construction and code generation instead of the plain IR
reference where possible.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D84680
The file was modified llvm/lib/Transforms/Vectorize/VPlan.cpp
The file was modified llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
The file was modified llvm/lib/Transforms/Vectorize/VPlanValue.h
The file was modified llvm/lib/Transforms/Vectorize/VPlan.h
Commit c27ab339ad8fcdd0abbe81ec9f44a440570de708 by tejohnson
Restore "[ThinLTO] Avoid temporaries when loading global decl attachment metadata"

This restores commit ab1b4810b55279bcf6fdd87be74a403440be3991 which was
reverted in 01b9deba76a950f04574b656c7c31ae389104f2d, with a fix for the
issue it caused. We should use a temporary BitstreamCursor when
loading the global decl attachment records so that the abbrev ids held
in the lazy loading IndexCursor are not clobbered. Enhanced the test so
that the issue is exposed there.

Original description:

When performing ThinLTO importing, the metadata loader attempts to lazy
load by building an index. However, module level global decl attachment
metadata was being parsed early while building the index, since the
associated (module level) global values aren't materialized on demand.
This results in the creation of forward reference temporary metadatas,
which are expensive.

Normally, these module level global values don't have much attached
metadata. However, in the case of -fwhole-program-vtables (e.g. for
whole program devirtualization), the vtables may have many attached type
metadatas. This was resulting in very slow performance when performing
ThinLTO importing with the default lazy loading.

This patch restructures the handling of these global decl attachment
records, delaying their parsing until after the lazy loading index has
been built. Then the parser can use the interface that loads from the
index, which resolves forward references immediately instead of creating
expensive temporaries.

For one ThinLTO backend that imports from modules containing huge
numbers of vtables and associated types, I measured the following
compile times for the metadata materialization during function
importing, rounded to the nearest second:

No -fwhole-program-vtables:
  Lazy loading on (head):  1s
  Lazy loading off (head): 3s
  Lazy loading on (patch): 1s

With -fwhole-program-vtables:
  Lazy loading on (head):  440s
  Lazy loading off (head): 4s
  Lazy loading on (patch): 2s

Differential Revision: https://reviews.llvm.org/D87970
The file was modified llvm/test/ThinLTO/X86/devirt2.ll
The file was modified llvm/lib/Bitcode/Reader/MetadataLoader.cpp
The file was modified llvm/test/ThinLTO/X86/Inputs/devirt2.ll
Commit 2de368f6a780f4cdbaf9cf8a4f803272f5de5938 by llvm-dev
[InstCombine] FoldShiftByConstant - merge equivalent types. NFCI.

Consistently use the original shift instruction's Type/BitWidth instead of those of the operands, casted values, etc.
The file was modified llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp
Commit 24dd0cd1edd5e5a2cb3cc361c76a3751b4896132 by llvm-dev
[InstCombine] FoldShiftByConstant - create Scalar/Vector constant with ConstantInt::get(). NFCI.

There's no need to create constant vector splats manually.
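
A small sketch (hypothetical helper, not the actual InstCombine code) of the API behaviour relied on here: ConstantInt::get already returns a splat when given a vector type, so manually building one via ConstantVector::getSplat is unnecessary.

  #include "llvm/IR/Constants.h"
  using namespace llvm;

  // Build a constant shift amount shaped like Ty; works for both i32 and <4 x i32>,
  // returning a splat constant in the vector case.
  static Constant *makeShiftAmount(Type *Ty, uint64_t Amt) {
    return ConstantInt::get(Ty, Amt);
  }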
The file was modified llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp
Commit 726a6e84be1892e90e2f3572a36480c0fe616119 by clementval
[flang][openacc] Update Loop Construct lowering to use fir::getBase

This patch updates the loop construct lowering to match fir-dev changes.

Reviewed By: jeanPerier

Differential Revision: https://reviews.llvm.org/D88914
The file was modified flang/lib/Lower/OpenACC.cpp