Success
Changes

Changes from Git (git http://labmaster3.local/git/llvm-project.git)

Summary

  1. Revert "[RISCV] Make CanLowerReturn protected for downstream maintenance"
  2. [CodeGen][SVE] Add patterns for whole vector predicate select
  3. [libcxx testing] Remove ALLOW_RETRIES from sleep_for.pass.cpp
  4. [Target][ARM] Replace re-uses of old VPR values with VPNOTs
  5. [Target][ARM] Replace outdated getARMVPTBlockMask function
  6. DebugCounter.h - remove unused includes. NFC.
  7. FuzzerCLI.h - reduce StringRef.h include to forward declaration. NFC.
  8. [X86][AVX] Use X86ISD::VPERM2X128 for blend-with-zero if optimizing for size
  9. [NFC][AArch64] More casts tests...
  10. [CUDA][HIP] Workaround for resolving host device function against wrong-sided function
  11. [X86] combineX86ShuffleChain - use narrowShuffleMaskElts scale == 1 builtin handling. NFC.
Commit 9d6064ec49ec189d2ff032927e41bb90ac471ae1 by tclin914
Revert "[RISCV] Make CanLowerReturn protected for downstream maintenance"

This reverts commit d775841d7d6ee3e8bbf3a420590be9bb19433eaa.
The file was modified: llvm/lib/Target/RISCV/RISCVISelLowering.h
Commit 077d2d6802efefe6680cbae78f90e90ef7f04134 by sander.desmalen
[CodeGen][SVE] Add patterns for whole vector predicate select

Added patterns to implement `select i1 %p, <vty> %a, <vty> %b`

Reviewed By: efriedma

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D79356
The file was added: llvm/test/CodeGen/AArch64/select-sve.ll
The file was modified: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
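
The `select i1 %p, <vty> %a, <vty> %b` form in the commit above uses a single scalar predicate to choose one whole vector. As a rough illustration of that semantics only (not of the SVE patterns themselves), the following Python sketch models vectors as lists. The function names are hypothetical and do not come from LLVM.

```python
def whole_vector_select(p, a, b):
    """Return all of `a` when the scalar predicate `p` is true, else all of `b`."""
    if len(a) != len(b):
        raise ValueError("both operands must have the same number of lanes")
    return list(a) if p else list(b)


def lanewise_select(mask, a, b):
    """Per-lane select, shown only for contrast: each lane has its own predicate."""
    return [x if m else y for m, x, y in zip(mask, a, b)]


a, b = [1, 2, 3, 4], [10, 20, 30, 40]
# A whole-vector select behaves like a lane-wise select whose mask is the
# scalar predicate repeated in every lane.
for p in (True, False):
    assert whole_vector_select(p, a, b) == lanewise_select([p] * len(a), a, b)
```

The loop at the end checks the equivalence with a lane-wise select whose mask repeats the scalar predicate, which is one way to think about lowering a scalar-predicated select onto predicated vector hardware.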
Commit 9e32bf550d13ffbc75671c0968b466e0e5c9dea2 by dave
[libcxx testing] Remove ALLOW_RETRIES from sleep_for.pass.cpp

Operating systems are best effort by default, so we cannot assume that
sleep-like APIs return as soon as we'd like.

Even if a sleep-like API returns when we want it to, the potential for
preemption means that attempts to measure time are subject to delays.
The file was modified: libcxx/test/libcxx/thread/thread.threads/thread.thread.this/sleep_for.pass.cpp
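
The reasoning in the commit above suggests that a timing check can safely assert only a lower bound on elapsed time. The Python sketch below illustrates that idea under that assumption; it is not the libcxx test, and the helper names are hypothetical.

```python
import time


def sleep_and_measure(requested_s):
    """Sleep for `requested_s` seconds and return the elapsed monotonic time."""
    start = time.monotonic()
    time.sleep(requested_s)
    return time.monotonic() - start


def woke_too_early(requested_s, elapsed_s):
    """Only an early wake-up is treated as a failure; a late one is expected."""
    return elapsed_s < requested_s


elapsed = sleep_and_measure(0.01)
# The measured value varies from run to run, so it is printed, not asserted.
print(f"requested 0.010 s, measured {elapsed:.3f} s")
```

Because scheduling and preemption can add an arbitrary delay, the sketch deliberately has no upper-bound check.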
Commit bf2183374a6740a033db1f824b0c6a6e0d2e7ee4 by pierre.vanhoutryve
[Target][ARM] Replace re-uses of old VPR values with VPNOTs

Differential Revision: https://reviews.llvm.org/D76847
The file was modified: llvm/test/CodeGen/Thumb2/mve-vpt-optimisations.mir
The file was modified: llvm/lib/Target/ARM/MVEVPTOptimisationsPass.cpp
The file was modified: llvm/test/CodeGen/Thumb2/mve-vpt-blocks.ll
The file was modified: llvm/test/CodeGen/Thumb2/mve-pred-not.ll
Commit 24bf8063d677f261f26d8771180cc08d51007a2e by pierre.vanhoutryve
[Target][ARM] Replace outdated getARMVPTBlockMask function

getARMVPTBlockMask was an outdated function that only handled basic
block masks: T, TT, TTT and TTTT. This worked fine before the MVE
VPT Block Insertion Pass improvements, as those were the only kinds of
mask the pass could generate, but now it can generate more complex
masks that use E predicates, so it's dangerous to use that function
to calculate VPT/VPST block masks.

I replaced it with 2 different functions:
  - expandPredBlockMask, in ARMBaseInfo. This adds an "E" or "T" at
    the end of an existing PredBlockMask.
  - recomputeVPTBlockMask, in Thumb2InstrInfo. This takes an iterator
    to a VPT/VPST instruction and recomputes its block mask by looking
    at the predicated instructions that follow it. This should be
    used to recompute a block mask after removing/adding a predicated
    instruction to the block.

The expandPredBlockMask function is pretty much imported from the MVE
VPT Blocks pass.

I had to change the ARMLowOverheadLoops and MVEVPTBlocks passes as well
so they could use these new functions.

Differential Revision: https://reviews.llvm.org/D78201
The file was modified: llvm/lib/Target/ARM/Utils/ARMBaseInfo.h
The file was modified: llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp
The file was modified: llvm/lib/Target/ARM/Thumb2InstrInfo.cpp
The file was modified: llvm/lib/Target/ARM/MVEVPTBlockPass.cpp
The file was modified: llvm/lib/Target/ARM/Thumb2InstrInfo.h
The file was modified: llvm/lib/Target/ARM/Utils/ARMBaseInfo.cpp
The file was modified: llvm/lib/Target/ARM/ARMBaseInstrInfo.h
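
The two helpers described in the commit above can be pictured with a small model in which a block mask is a string of `T` and `E` letters. This Python sketch is an illustration only: the function names are hypothetical stand-ins, the string representation replaces whatever encoding LLVM actually uses, and the limits of four instructions per block and a leading `T` are assumptions drawn from the mask names in the commit message.

```python
MAX_BLOCK_LEN = 4  # assumption: a block predicates at most four instructions


def expand_pred_block_mask(mask, kind):
    """Append one 'T' (then) or 'E' (else) to an existing mask string."""
    if kind not in ("T", "E"):
        raise ValueError("kind must be 'T' or 'E'")
    if not mask or mask[0] != "T":
        raise ValueError("assumed: a mask is non-empty and starts with 'T'")
    if len(mask) >= MAX_BLOCK_LEN:
        raise ValueError("mask is already full")
    return mask + kind


def recompute_block_mask(following_kinds):
    """Rebuild a mask from the instructions that follow the block head.

    `following_kinds` holds 'T' or 'E' for each predicated instruction and
    None for the first unpredicated instruction, which ends the block.
    """
    mask = ""
    for kind in following_kinds:
        if kind is None or len(mask) == MAX_BLOCK_LEN:
            break
        if not mask:
            if kind != "T":
                raise ValueError("assumed: the first instruction is a 'T'")
            mask = "T"
        else:
            mask = expand_pred_block_mask(mask, kind)
    if not mask:
        raise ValueError("no predicated instruction follows the block head")
    return mask


# Removing the 'E' instruction from a T, E, T block and recomputing the mask.
assert recompute_block_mask(["T", "E", "T", None]) == "TET"
assert recompute_block_mask(["T", "T", None]) == "TT"
```

Recomputing from the instructions themselves, rather than patching the old mask, keeps the mask correct regardless of which instruction was added or removed.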
Commit e143253fa8bb382a33eb743b28341015d5d0d031 by llvm-dev
DebugCounter.h - remove unused includes. NFC.

Added explicit StringRef.h include as we need the full definition for several inline functions in DebugCounter.h.
The file was modified: llvm/include/llvm/Support/DebugCounter.h
Commit 24ac6a2d7dd0551f9681239075834d37732831d2 by llvm-dev
FuzzerCLI.h - reduce StringRef.h include to forward declaration. NFC.
The file was modified: llvm/include/llvm/FuzzMutate/FuzzerCLI.h
The file was modified: llvm/lib/FuzzMutate/FuzzerCLI.cpp
Commit 45aa1b88534c74246773aab6dd33c3568bb25d24 by llvm-dev
[X86][AVX] Use X86ISD::VPERM2X128 for blend-with-zero if optimizing for size

Last part of PR22984 - avoid the zero-register dependency if optimizing for size
The file was modified: llvm/lib/Target/X86/X86ISelLowering.cpp
The file was modified: llvm/test/CodeGen/X86/avx-vperm2x128.ll
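
To see why a 128-bit lane permute can stand in for a blend with zero, the Python sketch below models the immediate semantics of the AVX VPERM2F128/VPERM2I128 instructions: bits 1:0 and 5:4 select the source lane for the low and high halves, and bits 3 and 7 zero them. The helper names and the tuple representation are hypothetical, and the sketch does not show how LLVM chooses the immediate. The reading that the zeroing bits remove the need for a separate zeroed register is an interpretation of the commit message.

```python
ZERO_LANE = (0, 0, 0, 0)  # one 128-bit lane modeled as four 32-bit elements


def vperm2x128(src1, src2, imm8):
    """Model the 128-bit lane permute: each operand is a (low, high) lane pair."""
    lanes = (src1[0], src1[1], src2[0], src2[1])
    low = ZERO_LANE if imm8 & 0x08 else lanes[imm8 & 0x3]
    high = ZERO_LANE if imm8 & 0x80 else lanes[(imm8 >> 4) & 0x3]
    return (low, high)


def blend_with_zero(a, zero, keep_low):
    """Reference blend: keep one lane of `a`, take the other from a zero vector."""
    return (a[0], zero[1]) if keep_low else (zero[0], a[1])


a = ((1, 2, 3, 4), (5, 6, 7, 8))
zero = (ZERO_LANE, ZERO_LANE)
# Keep the low lane and zero the high one: selector 0 for low, bit 7 set.
assert vperm2x128(a, a, 0x80) == blend_with_zero(a, zero, keep_low=True)
# Keep the high lane and zero the low one: bit 3 set, selector 1 for high.
assert vperm2x128(a, a, 0x18) == blend_with_zero(a, zero, keep_low=False)
```

In both checks the permute reads only `a`, while the reference blend needs an explicit zero operand.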
Commit f1f8cffce49fe56817e25f648b29e1a8cfcfac8a by sam.parker
[NFC][AArch64] More casts tests...

Don't use truncs as users because sometimes they're free too.
The file was modified: llvm/test/Analysis/CostModel/AArch64/cast.ll
Commit e03394c6a6ff5832aa43259d4b8345f40ca6a22c by Yaxun.Liu
[CUDA][HIP] Workaround for resolving host device function against wrong-sided function

recommit c77a4078e01033aa2206c31a579d217c8a07569b with fix

https://reviews.llvm.org/D77954 caused regressions due to diagnostics in implicit
host device functions.

For now, it seems the most feasible workaround is to treat implicit host device functions and explicit host
device functions differently. Basically, in device compilation for implicit host device functions, keep the
old behavior, i.e. give host device candidates and wrong-sided candidates equal preference. For explicit
host device functions, favor host device candidates over wrong-sided candidates.

The rationale is that explicit host device functions are blessed by the user to be valid host device functions,
that is, they should not cause diagnostics in either host or device compilation. If diagnostics occur, the user is
able to fix them. However, there is no guarantee that an implicit host device function can be compiled in
device compilation, therefore we need to preserve its overload resolution in device compilation.

Differential Revision: https://reviews.llvm.org/D79526
The file was modified: clang/test/SemaCUDA/function-overload.cu
The file was modified: clang/lib/Sema/SemaCUDA.cpp
The file was modified: clang/include/clang/Sema/Sema.h
The file was modified: clang/lib/Sema/SemaOverload.cpp
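
The preference rule described in the commit above can be summarized as a small comparison function. The Python sketch below models only that rule for a host device caller in device compilation and ignores every other overload-resolution criterion. The function name and the string labels are hypothetical and are not Clang identifiers.

```python
def compare_candidates(caller_is_explicit_hd, cand_a, cand_b):
    """Return 1 if cand_a is preferred, -1 if cand_b is, 0 if neither.

    Each candidate is "host_device" or "wrong_sided". The caller is assumed
    to be a host device function that is being compiled for the device.
    """
    rank = {"host_device": 1, "wrong_sided": 0}
    if not caller_is_explicit_hd:
        # Implicit host device caller: old behavior, equal preference.
        return 0
    diff = rank[cand_a] - rank[cand_b]
    return (diff > 0) - (diff < 0)


# Explicit host device caller: the host device candidate wins.
assert compare_candidates(True, "host_device", "wrong_sided") == 1
# Implicit host device caller: neither candidate is preferred on this rule.
assert compare_candidates(False, "host_device", "wrong_sided") == 0
```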
Commit 0387df7f02f9a0a0239b5a90f840e98b823bc6c1 by llvm-dev
[X86] combineX86ShuffleChain - use narrowShuffleMaskElts scale == 1 builtin handling. NFC.

narrowShuffleMaskElts already has the fast-path for scale == 1, no need to reimplement it here.
The file was modified: llvm/lib/Target/X86/X86ISelLowering.cpp
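
As a rough picture of why a caller does not need its own scale == 1 case, the Python sketch below models a mask-narrowing helper: each mask element is replaced by `scale` consecutive narrower elements, and negative sentinel values are repeated unchanged. It is a simplified model written for illustration, not the LLVM implementation, and the Python name is hypothetical.

```python
def narrow_shuffle_mask_elts(scale, mask):
    """Rewrite a shuffle mask for elements that are `scale` times narrower."""
    if scale < 1:
        raise ValueError("scale must be positive")
    if scale == 1:
        # Fast path: the element width is unchanged, so the mask is copied.
        return list(mask)
    narrowed = []
    for m in mask:
        if m < 0:
            # Sentinel (for example, undefined): repeat it for each new element.
            narrowed.extend([m] * scale)
        else:
            narrowed.extend(m * scale + i for i in range(scale))
    return narrowed


# With scale == 1 the helper returns the mask unchanged.
assert narrow_shuffle_mask_elts(1, [3, -1, 0]) == [3, -1, 0]
# With scale == 2 each index expands to two consecutive narrower indices.
assert narrow_shuffle_mask_elts(2, [1, -1, 0]) == [2, 3, -1, -1, 0, 1]
```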