Changes

Summary

  1. [BOLT] Add AArch64 builder and worker (details)
Commit 69f0ee9d09665aa5ef5089eedbaca7c8ebb20ff7 by amir.aupov
[BOLT] Add AArch64 builder and worker

Reviewed By: gkistanova

Differential Revision: https://reviews.llvm.org/D124329
The file was modified buildbot/osuosl/master/config/workers.py (diff)
The file was modified buildbot/osuosl/master/config/builders.py (diff)

Summary

  1. [include-cleaner] Add missing deps from unittests (details)
  2. [X86] SimplifyDemandedVectorEltsForTargetNode - fold (uniform) shift(0,x) -> 0 (details)
  3. [AMDGPU] Simplify the test case for D124450 (details)
  4. [InstCombine] Add test for is_alpha check with logical or and nsw (NFC) (details)
  5. Reland "[lldb] Use shutil.which in Shell tests find_executable" (details)
  6. [DAGCombiner][SVE] Ensure MGATHER/MSCATTER addressing mode combines preserve index scaling (details)
  7. AVRExpandPseudoInsts.cpp: Fix a warning. [-Wunused-but-set-variable] (details)
  8. llvm/Support/Debug.h: Suppress warnings with -Asserts. [-Wunused-variable] (details)
  9. [lldb] Allow EXE or exe in toolchain-msvc.test (details)
  10. [VPlan] Simplify & adjust code as suggested in D123005. (details)
  11. [mlir][linalg][transform] Add TileOp to transform dialect (details)
  12. [InstCombine] Fold logical and/or of range icmps with nowrap flags (details)
  13. [LV] Rename CountRoundDown to VectorTripCount (NFC) (details)
  14. [InstCombine] Remove memset of undef value (details)
  15. [flang] Handle common block with different sizes in same file (details)
  16. [SelectionDAGBuilder] Don't create MGATHER/MSCATTER with Scale != ElemSize (details)
  17. [InstCombine] Add test for unused atomic load from non-constant global (NFC) (details)
  18. [OpenMP] Make generating offloading entries more generic (details)
  19. [OpenMP] Make clang argument handling for the new driver more generic (details)
  20. [Clang] Make enabling the new driver more generic (details)
  21. [CUDA] Add driver support for compiling CUDA with the new driver (details)
  22. [SVE][ISel] Ensure explicit gather/scatter offset extension isn't lost. (details)
  23. [DAGCombiner] Stop invalid sign conversion in refineIndexType. (details)
  24. [clang] Eliminate TypeProcessingState::trivial. (details)
  25. [PowerPC][NFC] Add a function to determine if a call needs to be NOTOC. (details)
  26. [CompileTime] [Passes] Avoid computing unnecessary analyses. NFC (details)
  27. Additionally set f32 mode with denormal-fp-math (details)
  28. [SimplifyCFG] Avoid shifting by a too large exponent. (details)
  29. [COST]Fix crash for non-power-2 vector shuffle mask. (details)
  30. [X86] lowerShuffleAsRepeatedMaskAndLanePermute - move the sublane split code into a lambda helper. NFC. (details)
  31. [InstCombine] Add additional tests for gep of minus ptrtoint (NFC) (details)
  32. [OpenMP] Add options to only compile the host or device when offloading (details)
  33. [InstCombine] Require LoopInfo in test (NFC) (details)
  34. [OpenMP] Allow CUDA to be linked with OpenMP using the new driver (details)
  35. [RISCV] Improve constant materialization for cases that can use LUI+ADDI instead of LUI+ADDIW. (details)
  36. [SLP][NFC]Fix a comment. (details)
  37. [RISCV] Extract getAllOnesMask helper [nfc] (details)
  38. [SVE] Move reg+reg gather/scatter addressing optimisations from lowering into DAG combine. (details)
  39. [AMDGPU] Add gfx11 subtarget ELF definition (details)
  40. [RISCV] Factor repeating code into getMaskTypeFor(VT) [nfc] (details)
  41. [Clang][Docs] Add new offloading flags to the clang documentation (details)
  42. [lldb] Define LLDB_VERSION_PATCH correctly (details)
  43. Silence -Wstrict-prototype diagnostics in C2x mode (details)
  44. Reapply [CodeGen][ARM] Enable Swing Modulo Scheduling for ARM (details)
  45. Fix sphinx build error in AMDGPUUsage.rst (details)
  46. [randstruct] Automatically randomize a structure of function pointers (details)
  47. Add a mutex to the ThreadPlanStackMap class. (details)
  48. [mlir] Prevent argStorage relocations (details)
  49. Add a paragraph showing how to use container commands. (details)
  50. [flang] Fix build bot problem (details)
  51. [AMDGPU] Produce waitcounts for LDS DMA (details)
  52. [AMDGPU][clang] Definition of gfx11 subtarget (details)
  53. Fix the encoding and decoding of UniqueCStringMap<T> objects when saved to cache files. (details)
  54. [lldb] Fix initialization of LazyBool/bool variables m_overwrite/m_overwrite_lazy. NFCI. (details)
  55. [flang] Update Google Doc link for Flang Biweekly Sync call notes (details)
  56. [memprof] Correct comment in test [NFC] (details)
  57. [AsmParser] Introduce a new "Argument" abstraction + supporting logic (details)
  58. [InstCombine] add tests for FP<->int casts; NFC (details)
  59. [LoopCacheAnalysis] Enable delinearization of fixed sized arrays (details)
Commit 97b6c92dcd56937bc27de7c4c08381fc71c402e7 by github
[include-cleaner] Add missing deps from unittests
The file was modified clang-tools-extra/include-cleaner/unittests/CMakeLists.txt
Commit 3562f855b71e159908806d3152e81ef439d041ca by llvm-dev
[X86] SimplifyDemandedVectorEltsForTargetNode - fold (uniform) shift(0,x) -> 0
The file was modified llvm/lib/Target/X86/X86ISelLowering.cpp
The file was modified llvm/test/CodeGen/X86/hoist-and-by-const-from-lshr-in-eqcmp-zero.ll
Commit 5fa169335f7d086ef71547c6033a9357a617c1b8 by jay.foad
[AMDGPU] Simplify the test case for D124450
The file was modified llvm/test/CodeGen/AMDGPU/setcc-multiple-use.ll
Commit 4cacd22418ceb31ad9277469a2a10d714c651deb by npopov
[InstCombine] Add test for is_alpha check with logical or and nsw (NFC)

The combination of logical or and nsw prevents the fold from
happening.
The file was modified llvm/test/Transforms/InstCombine/and-or-icmps.ll
Commit cacaa445c3a3a2551a6e2aef51414e47def9cc06 by david.spickett
Reland "[lldb] Use shutil.which in Shell tests find_executable"

This reverts commit d9247cc84825539d346c74eb1379c6cb948d3a71.

With the Windows tests updated to expect .EXE suffixes. This changed
because shutil.which uses PATHEXT which will contain, amongst others,
"EXE".

Also I noticed the "." in ".exe" was the wildcard dot not literal
dot so I've escaped those.
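
The wildcard-dot problem mentioned above can be illustrated with a small Python sketch. The pattern strings and the `matches` helper are hypothetical and are not the actual check lines from the tests.

```python
import re

# Hypothetical patterns; the real test files use their own check lines.
unescaped = re.compile(r"clang.exe$", re.IGNORECASE)
escaped = re.compile(r"clang\.exe$", re.IGNORECASE)

def matches(pattern, text):
    """Return True if the pattern is found anywhere in the text."""
    return pattern.search(text) is not None

# An unescaped "." matches any character, so an unrelated name slips through.
loose_hit = matches(unescaped, "clang_exe")
strict_hit = matches(escaped, "clang_exe")

# With IGNORECASE the escaped pattern still accepts an upper-case suffix.
upper_hit = matches(escaped, "clang.EXE")
```

Escaping the dot keeps the match strict while still tolerating either suffix case.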
The file was modified lldb/test/Shell/helper/build.py
The file was modified lldb/test/Shell/BuildScript/toolchain-clang.test
The file was modified lldb/test/Shell/BuildScript/toolchain-clang-cl.test
The file was modified lldb/test/Shell/BuildScript/toolchain-msvc.test
Commit 7a0b897e8664d11481230a69a88fca2b2ee5f904 by paul.walker
[DAGCombiner][SVE] Ensure MGATHER/MSCATTER addressing mode combines preserve index scaling

refineUniformBase and selectGatherScatterAddrMode both attempt the
transformation:

  base(0) + index(A+splat(B)) => base(B) + index(A)

However, this is only safe when index is not implicitly scaled.
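
A rough numeric model of why scaling matters, assuming each lane address is computed as base + index[i] * scale. The function names and the values of A and B are made up for illustration and do not come from the patch.

```python
def lane_addresses(base, index, scale):
    """Address of each lane, modelled as base + index[i] * scale."""
    return [base + i * scale for i in index]

A = [0, 1, 2, 3]  # per-lane index vector (illustrative values)
B = 16            # splatted constant (illustrative value)

def original(scale):
    # base(0) + index(A + splat(B))
    return lane_addresses(0, [a + B for a in A], scale)

def transformed(scale):
    # base(B) + index(A)
    return lane_addresses(B, A, scale)

# With scale 1 both forms agree; with scale 4 the splat is no longer
# multiplied by the scale after the rewrite, so the addresses differ.
unscaled_ok = original(1) == transformed(1)
scaled_ok = original(4) == transformed(4)
```

In this model the rewrite is only address-preserving when the index is not multiplied by a scale other than 1.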

Differential Revision: https://reviews.llvm.org/D123222
The file was modified llvm/test/CodeGen/AArch64/sve-gather-scatter-addr-opts.ll
The file was modified llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
The file was modified llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
Commit 61d3a3afe26fd49353d9fd37d0e7817a13c28659 by geek4civic
AVRExpandPseudoInsts.cpp: Fix a warning. [-Wunused-but-set-variable]

It has been enabled since llvmorg-15-init-5683-g2af845a6519c, aka D122271.
The file was modified llvm/lib/Target/AVR/AVRExpandPseudoInsts.cpp
Commit 2e6657b340f0f33414a97c79b3d1e37ad947ec7a by geek4civic
llvm/Support/Debug.h: Suppress warnings with -Asserts. [-Wunused-variable]

Re. setCurrentDebugTypes(X,N), the only user is llvm-ml.cpp (exc. DebugTests)
since llvmorg-15-init-8355-g82ecf9a0b1b3.

FIXME: X and N are evaluated regardless of NDEBUG.
Could we avoid evaluating (but w/o warnings) with NDEBUG?
The file was modified llvm/include/llvm/Support/Debug.h
Commit f8463da4a329b839cfd01d7f80ae72e18f3c061e by david.spickett
[lldb] Allow EXE or exe in toolchain-msvc.test

I suspect that one of link or cl is found by shutil.which
and one isn't, hence the case difference. It doesn't really
matter for what the test is looking for.
The file was modified lldb/test/Shell/BuildScript/toolchain-msvc.test
Commit e66127e69bfabe1b18857c3e3962125a9fe5aa7c by flo
[VPlan] Simplify & adjust code as suggested in D123005.

Improve code as suggested in D123005. Applied separately, because the
comments were made on a diff that has not been rebased to current main.
The file was modified llvm/lib/Transforms/Vectorize/VPlan.cpp
Commit 3c2a74a3ae02d16e899e280953c055f92aa6cdaa by springerm
[mlir][linalg][transform] Add TileOp to transform dialect

This commit adds a tiling op to the transform dialect as an external op.

Differential Revision: https://reviews.llvm.org/D124661
The file was added mlir/lib/Dialect/Linalg/TransformOps/LinalgTransformOps.cpp
The file was added mlir/include/mlir/Dialect/Linalg/TransformOps/LinalgTransformOps.td
The file was modified mlir/lib/Dialect/Linalg/Transforms/FusionOnTensors.cpp
The file was added mlir/test/Dialect/Linalg/transform-ops.mlir
The file was modified utils/bazel/llvm-project-overlay/mlir/BUILD.bazel
The file was added mlir/include/mlir/Dialect/Linalg/TransformOps/CMakeLists.txt
The file was modified mlir/lib/Dialect/Linalg/CMakeLists.txt
The file was modified mlir/lib/Dialect/Linalg/Transforms/Fusion.cpp
The file was modified mlir/include/mlir/Dialect/Linalg/CMakeLists.txt
The file was added mlir/include/mlir/Dialect/Linalg/TransformOps/LinalgTransformOps.h
The file was modified mlir/include/mlir/InitAllDialects.h
The file was added mlir/lib/Dialect/Linalg/TransformOps/CMakeLists.txt
Commit 982cbed81920b474d41b63760cbde68680b15966 by npopov
[InstCombine] Fold logical and/or of range icmps with nowrap flags

This is an edge-case where we don't convert to bitwise and/or based
on implies poison reasoning, so explicitly try to perform the fold
in logical form. The transform itself is poison-safe, as both icmps
are based on the same value and any nowrap flags are discarded as
part of the fold (https://alive2.llvm.org/ce/z/aCwC8b for the used
example).
The file was modified llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
The file was modified llvm/test/Transforms/InstCombine/and-or-icmps.ll
The file was modified llvm/lib/Transforms/InstCombine/InstCombineInternal.h
The file was modified llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
Commit 24a133e16fc50a39e530c7795edabbdf3b2edd2d by flo
[LV] Rename CountRoundDown to VectorTripCount (NFC)

The name CountRoundDown is potentially misleading, as the number of
iterations can be rounded up when folding the tail.
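
The rounding behaviour behind the rename can be sketched as follows. This is a simplified model that assumes the count is rounded to a multiple of the vector step; the function and parameter names are illustrative, not the ones used in LoopVectorize.cpp.

```python
def vector_trip_count(trip_count, step, fold_tail):
    """Scalar iterations covered by the vector loop (simplified model).

    Rounds down to a multiple of step normally, and rounds up when the
    tail is folded into the vector loop.
    """
    if fold_tail:
        trip_count += step - 1
    return trip_count - (trip_count % step)

rounded_down = vector_trip_count(10, 4, fold_tail=False)
rounded_up = vector_trip_count(10, 4, fold_tail=True)
```

With a trip count of 10 and a step of 4, the first call covers 8 iterations and the second covers 12, which is why a name implying "round down" is misleading.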

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D119681
The file was modified llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
Commit 1881711fbb7b0cd1b8d492b3ca4b70ce75824030 by npopov
[InstCombine] Remove memset of undef value

This removes memset with undef char. We already do this for stores
of undef value.

This comes with the caveat that this optimization is not, strictly
speaking, legal for undef values, because we might be overwriting
a poison value. However, our entire load/store model currently still
operates on undef values, so we need to support undef here as well
for internal consistency.

Once https://github.com/llvm/llvm-project/issues/52930 is resolved,
these and related folds can be limited to poison -- I've added
FIXMEs to that effect.

Differential Revision: https://reviews.llvm.org/D124173
The file was modified llvm/test/Transforms/InstCombine/store.ll
The file was modified llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
The file was modified llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
The file was modified llvm/test/Transforms/InstCombine/memset.ll
Commit 2c8cb9acb51e2fa74bf9339ddd0884ef9d921dfc by jperier
[flang] Handle common block with different sizes in same file

Semantics does not prevent a named common block from appearing with
different sizes in the same file (a named common block should always have
the same storage size (see Fortran 2018 8.10.2.5), but it is a common
extension to accept different sizes).

Lowering was not coping with this well, since it just used the first
common block appearance, starting with BLOCK DATAs, to define common
blocks (this was also an issue with the blank common block, which can
legally appear with different sizes in different scoping units).

Semantics also does not prevent a named common block from being initialized
outside of a BLOCK DATA, and lowering was dealing badly with this,
since it only gave an initial value to common block globals if the
first common block appearance, starting with BLOCK DATAs, had an initial
value.

Semantics also allows blank common to be initialized, while
lowering was assuming this would never happen, and was never creating
an initial value for it.

Lastly, semantics was not complaining if a COMMON block was initialized
in several scoping units in the same file, while lowering can only generate
one of these initial values.

To fix this, add a structure to keep track of COMMON block properties
(biggest size, and initial value if any) at the Program level. Once the
size of a common block appearance is known, the common block appearance
is checked against this information. This allows semantics to emit an error
in case of multiple initializations of the same common block in different
scopes, and to warn when a named common block appears with different
sizes. Lastly, this allows lowering to use the Program-level info about
common blocks to emit the right GlobalOp for a common block, regardless
of the order of the COMMON block appearances: it emits a GlobalOp with the
biggest size, whose lowest bytes are initialized with the initial value
if any is given in a scope where the common block appears.

Lowering is updated to emit the common blocks before anything else so
that the related GlobalOps are available when lowering the scopes where
common blocks appear. It is also updated to not assume that blank common
blocks are never initialized.

Differential Revision: https://reviews.llvm.org/D124622
The file was modified flang/lib/Lower/PFTBuilder.cpp
The file was added flang/test/Semantics/common-blocks-warn.f90
The file was modified flang/test/Semantics/resolve42.f90
The file was modified flang/test/Lower/module_use.f90
The file was modified flang/include/flang/Lower/ConvertVariable.h
The file was modified flang/lib/Lower/Bridge.cpp
The file was modified flang/lib/Lower/ConvertVariable.cpp
The file was added flang/test/Lower/common-block-2.f90
The file was modified flang/docs/Extensions.md
The file was modified flang/include/flang/Semantics/semantics.h
The file was modified flang/include/flang/Lower/PFTBuilder.h
The file was added flang/test/Semantics/common-blocks.f90
The file was modified flang/test/Lower/common-block.f90
The file was modified flang/lib/Semantics/compute-offsets.cpp
The file was modified flang/lib/Semantics/semantics.cpp
The file was modified flang/test/Lower/module_definition.f90
The file was modified flang/test/Lower/pointer-initial-target-2.f90
Commit 027c728f29889ea6502030ec3623774d830c2ac3 by npopov
[SelectionDAGBuilder] Don't create MGATHER/MSCATTER with Scale != ElemSize

This is an alternative to D124530. In getUniformBase() only create
scales that match the gather/scatter element size. If targets also
support other scales, then they can produce those scales in target
DAG combines. This is what X86 already does (as long as the
resulting scale would be 1, 2, 4 or 8).

This essentially restores the pre-opaque-pointer state of things.

Fixes https://github.com/llvm/llvm-project/issues/55021.

Differential Revision: https://reviews.llvm.org/D124605
The file was modified llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
The file was added llvm/test/CodeGen/X86/gather-scatter-opaque-ptr-2.ll
The file was added llvm/test/CodeGen/X86/gather-scatter-opaque-ptr.ll
Commit 5b524da42f6817920eb1f1dd30f8a2dc3241d614 by npopov
[InstCombine] Add test for unused atomic load from non-constant global (NFC)
The file was modified llvm/test/Transforms/InstCombine/atomic.ll
Commit 643c9b22ef527be8532d7b75ccf64180fa060339 by jhuber6
[OpenMP] Make generating offloading entries more generic

This patch moves the logic for generating the offloading entries to the
OpenMPIRBuilder. This makes it easier to re-use in other places, such as
for OpenMP support in Flang or using the same method for generating
offloading entries for other languages like CUDA.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D123460
The file was modified llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
The file was modified llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
The file was modified clang/lib/CodeGen/CGOpenMPRuntime.h
The file was modified clang/lib/CodeGen/CGOpenMPRuntime.cpp
The file was modified llvm/include/llvm/Frontend/OpenMP/OMPKinds.def
Commit ca6bbe008512c9dc6a1ac242466a9d42288daff8 by jhuber6
[OpenMP] Make clang argument handling for the new driver more generic

In preparation for accepting other offloading kinds with the new driver,
this patch makes the way we handle offloading actions more generic. A
new field to get the associated device action's toolchain is used rather
than manually iterating a list. This makes building the arguments easier
and makes sure that we don't rely on any implicit ordering.

Reviewed By: yaxunl

Differential Revision: https://reviews.llvm.org/D123313
The file was modified clang/lib/Driver/Action.cpp
The file was modified clang/include/clang/Driver/Action.h
The file was modified clang/lib/Driver/ToolChains/Clang.cpp
Commit 4e2b5a6693e299fcb8671d4dbb69c993d181b29f by jhuber6
[Clang] Make enabling the new driver more generic

In preparation for allowing other offloading kinds to use the new driver
a new opt-in flag `-foffload-new-driver` is added. This is distinct from
the existing `-fopenmp-new-driver` because OpenMP will soon use the new
driver by default while the others should not.

Reviewed By: yaxunl, tra

Differential Revision: https://reviews.llvm.org/D123325
The file was modified clang/lib/Driver/Driver.cpp
The file was modified clang/include/clang/Driver/Options.td
The file was modified clang/lib/Driver/ToolChains/Clang.cpp
Commit c5e5b54350fecd4b44c60eb4e982c13de5307aee by jhuber6
[CUDA] Add driver support for compiling CUDA with the new driver

This patch adds the basic support for the clang driver to compile and link CUDA
using the new offloading driver. This requires handling the CUDA offloading kind
and embedding the generated files into the host. This will allow us to link
OpenMP code with CUDA code in the linker wrapper. More support will be required
to create functional CUDA / HIP binaries using this method.

Depends on D120270 D120271 D120934

Reviewed By: tra

Differential Revision: https://reviews.llvm.org/D120272
The file was added clang/test/Driver/cuda-openmp-driver.cu
The file was modified clang/lib/Driver/ToolChains/Clang.cpp
The file was modified clang/lib/Driver/Driver.cpp
The file was modified clang/test/Driver/cuda-phases.cu
The file was modified clang/include/clang/Basic/Cuda.h
The file was modified clang/include/clang/Basic/DiagnosticDriverKinds.td
Commit 59588f0a3d47e3e366d675b8f9724c10a6222c0e by paul.walker
[SVE][ISel] Ensure explicit gather/scatter offset extension isn't lost.

getGatherScatterIndexIsExtended currently looks through all
SIGN_EXTEND_INREG operations regardless of their input type.  This
patch restricts the code to only look through i32->i64 extensions,
which are the ones supported implicitly by SVE addressing modes.

Differential Revision: https://reviews.llvm.org/D123318
The file was modified llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
The file was modified llvm/test/CodeGen/AArch64/sve-gather-scatter-addr-opts.ll
Commit 23c509754d4b81e5503f8da5caa3d4c00af85afb by paul.walker
[DAGCombiner] Stop invalid sign conversion in refineIndexType.

When looking through extends of gather/scatter indices it's safe
to convert a known positive signed index to unsigned, but unsigned
indices must remain unsigned.
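
The signedness rule can be checked with a small bit-level model of 32-bit to 64-bit extension. The helper names and sample values are illustrative only and are not taken from DAGCombiner.

```python
def sext32(x):
    """Sign-extend a 32-bit pattern to a wider integer."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x

def zext32(x):
    """Zero-extend a 32-bit pattern to a wider integer."""
    return x & 0xFFFFFFFF

small = 0x00001000  # sign bit clear: a known-positive index
large = 0xFFFFFFF0  # sign bit set

# A known-positive value extends to the same result either way, so
# treating it as unsigned is harmless.
same_when_positive = sext32(small) == zext32(small)

# Once the top bit is set the two extensions disagree, so an unsigned
# index cannot be reinterpreted as signed.
differs_when_high_bit_set = sext32(large) != zext32(large)
```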

Depends On D123318

Differential Revision: https://reviews.llvm.org/D123326
The file was modified llvm/test/CodeGen/AArch64/sve-gather-scatter-addr-opts.ll
The file was modified llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
Commit 23c10e8d0f97dc38c9f620541c7f3ffd04bef905 by mboehme
[clang] Eliminate TypeProcessingState::trivial.

This flag is redundant -- it's true iff `savedAttrs` is empty.

Querying `savedAttrs.empty()` should not take any more time than querying the
`trivial` flag, so this should not have a performance impact either.

I noticed this while working on https://reviews.llvm.org/D111548.

Reviewed By: aaron.ballman

Differential Revision: https://reviews.llvm.org/D123783
The file was modified clang/lib/Sema/SemaType.cpp
Commit f685bce8080cd817adc67a7c06fb6834fa356139 by stefanp
[PowerPC][NFC] Add a function to determine if a call needs to be NOTOC.

Add the isNoTOCCallInstr function to PPCInstrInfo to determine if a call opcode
does not need a TOC restore after the call. All call opcodes should be listed in
this function. A default unreachable in this function should force future call
opcodes to also be added.

This is a follow up patch to D122012

Reviewed By: jsji, shchenz

Differential Revision: https://reviews.llvm.org/D124415
The file was modified llvm/lib/Target/PowerPC/PPCInstrInfo.h
The file was modified llvm/lib/Target/PowerPC/MCTargetDesc/PPCMCCodeEmitter.cpp
Commit 205246cb64358aa6f03b54d47d73708122d76bbf by anna
[CompileTime] [Passes] Avoid computing unnecessary analyses. NFC

Similar to c515b2f39e77, if there are no loops in the function as seen
through LI, we should avoid computing the remaining expensive analyses
(such as SCEV, BPI). Reorder the analysis requests and return early
if there are no loops.

The logic of avoiding expensive analyses is applied to LoopVectorizer,
LoopLoadElimination and LoopUnrollPass, i.e. all function passes which operate
on loops.

This is an NFC with compile time improvement.

Differential Revision: https://reviews.llvm.org/D124529
The file was modified llvm/test/Transforms/SCCP/preserve-analysis.ll
The file was modified llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp
The file was modified llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
The file was modified llvm/lib/Transforms/Scalar/LoopLoadElimination.cpp
Commit 9e7c9967c3fd573ef53b145e24e6a1e6ba930c82 by david.candler
Additionally set f32 mode with denormal-fp-math

When the denormal-fp-math option is used, this should set the
denormal handling mode for all floating point types. However,
currently 32-bit float types can ignore this setting as there is a
variant of the option, denormal-fp-math-f32, specifically for that type
which takes priority when checking the mode based on type and remains
at the default of IEEE. From the description, denormal-fp-math would
be expected to set the mode for floats unless overridden by the f32
variant, and code in the front end only emits the f32 option if it is
different to the general one, so setting just denormal-fp-math should
be valid.

This patch changes the denormal-fp-math option to also set the f32
mode. If denormal-fp-math-f32 is also specified, this is then
overridden as expected, but if it is absent floats will be set to the
mode specified by the former option, rather than remain on the default.
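
The precedence described above can be modelled as a small lookup. This is a sketch of the described behaviour only: the function names are hypothetical, and the mode strings are used as plain labels rather than parsed option values.

```python
DEFAULT_MODE = "ieee"

def f32_mode_before(general=None, f32=None):
    """Old behaviour: the general option never reaches f32."""
    return f32 if f32 is not None else DEFAULT_MODE

def f32_mode_after(general=None, f32=None):
    """New behaviour: the f32 variant wins, else the general option applies."""
    if f32 is not None:
        return f32
    if general is not None:
        return general
    return DEFAULT_MODE

# Only the general option is given: f32 used to stay on the default.
old = f32_mode_before(general="preserve-sign")
new = f32_mode_after(general="preserve-sign")

# The f32-specific option still overrides the general one.
overridden = f32_mode_after(general="preserve-sign", f32="positive-zero")
```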

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D122589
The file was modified clang/lib/Driver/ToolChains/Clang.cpp
The file was modified clang/lib/Frontend/CompilerInvocation.cpp
The file was added clang/test/CodeGen/denormalfpmode-f32.c
Commit a80081763cb3792bf69ee95ee73c8754f1bfe074 by flo
[SimplifyCFG] Avoid shifting by a too large exponent.

TI->getBitWidth can be > 64 and in those cases the shift will be UB due
to the exponent being too large.

To fix this, cap the shift at 63. I think this should work out fine,
because TableSize is itself a 64 bit type and the maximum table size
must fit in the type. Also, if we would underestimate the size here, at
most we get an extra ZExt.
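
The capping idea can be sketched as below. Python integers do not overflow, so a 64-bit mask is applied to model a 64-bit value; the helper name is hypothetical and this is not the code from SimplifyCFG.cpp.

```python
MAX_SHIFT = 63           # largest valid shift amount for a 64-bit value
MASK64 = (1 << 64) - 1   # models truncation to 64 bits

def capped_pow2(bit_width):
    """Return 1 << bit_width as a 64-bit value, capping the exponent at 63."""
    shift = min(bit_width, MAX_SHIFT)
    return (1 << shift) & MASK64

narrow = capped_pow2(8)    # small widths are unaffected by the cap
wide = capped_pow2(128)    # widths above 63 all map to 1 << 63
```

Capping can only underestimate the bound for very wide types, which matches the note above that the worst case is an extra ZExt.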

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D124608
The file was added llvm/test/Transforms/SimplifyCFG/X86/switch-to-lookup-large-types.ll
The file was modified llvm/lib/Transforms/Utils/SimplifyCFG.cpp
Commit 371412e065a63107d5d79330da6757ff693d91cc by a.bataev
[COST]Fix crash for non-power-2 vector shuffle mask.

Need to normalize the mask to avoid possible crashes during attempts
to estimate cost of the very long shuffles with non-power-2 number of
elements in masks.
The file was modified llvm/test/Analysis/CostModel/X86/shuffle-non-pow-2.ll
The file was modified llvm/lib/Target/X86/X86TargetTransformInfo.cpp
Commit b424055b52a52c0a2ae8eb08de1460f7cfb4fb43 by llvm-dev
[X86] lowerShuffleAsRepeatedMaskAndLanePermute - move the sublane split code into a lambda helper. NFC.

This is a NFC cleanup as part of the work on #55066 - the idea being that we will be able to check for multiple sub lane scales.
The file was modified llvm/lib/Target/X86/X86ISelLowering.cpp
Commit b3826192fb6e3f7f05ff21911f5f948ad5eabcdc by npopov
[InstCombine] Add additional tests for gep of minus ptrtoint (NFC)
The file was modified llvm/test/Transforms/InstCombine/constant-fold-gep.ll
Commit 47d66255701a5cfeab6c05e3642a2cccf7a4c09f by jhuber6
[OpenMP] Add options to only compile the host or device when offloading

OpenMP recently moved to the new offloading driver, which had the effect
of making it more difficult to inspect intermediate code for the device.
This patch adds `-foffload-host-only` and `-foffload-device-only` to
control which sides get compiled. This will allow users to more easily
inspect output without needing the temp files.

Reviewed By: tra

Differential Revision: https://reviews.llvm.org/D124220
The file was modified clang/include/clang/Driver/Options.td
The file was modified clang/test/Driver/openmp-offload-gpu-new.c
The file was modified clang/lib/Driver/Driver.cpp
The file was modified clang/test/Driver/cuda-openmp-driver.cu
Commit 6aeb2a215ee170a0abbc4a99f75065816fcff744 by npopov
[InstCombine] Require LoopInfo in test (NFC)

This test case doesn't show what it was intended to without
require<loops>.
The file was modified llvm/test/Transforms/InstCombine/constant-fold-gep.ll
Commit d9c64d33b98be695fc78a65624242033058ed117 by jhuber6
[OpenMP] Allow CUDA to be linked with OpenMP using the new driver

After basic support for embedding and handling CUDA files was added to
the new driver, we should be able to call CUDA functions from OpenMP
code. This patch makes the necessary changes to successfully link in CUDA
programs that were compiled using the new driver. With this patch it
should be possible to compile device-only CUDA code (no kernels) and
call it from OpenMP as follows:

```
$ clang++ cuda.cu -fopenmp-new-driver -offload-arch=sm_70 -c
$ clang++ openmp.cpp cuda.o -fopenmp-new-driver -fopenmp -fopenmp-targets=nvptx64 -Xopenmp-target=nvptx64 -march=sm_70
```

Currently this requires using a host variant to suppress the generation
of a CPU-side fallback call.

Depends on D120272

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D120273
The file was modified clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
The file was modified clang/test/Driver/linker-wrapper.c
Commit 5c3837312503b4ef8443951194127c4ba2a03153 by craig.topper
[RISCV] Improve constant materialization for cases that can use LUI+ADDI instead of LUI+ADDIW.

It's possible that we have a constant that isn't simm32 so we can't
use LUI+ADDIW, but we can use LUI+ADDI. Because ADDI uses a sign
extended constant, it's possible that after subtracting it out, we
end up with a simm32 that maps to LUI.

This patch detects this case after removing Lo12 and before shifting
the value for SLLI.
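
A worked example of the case described above, modelled with plain integer arithmetic. The helper names are hypothetical and the sketch does not mirror RISCVMatInt.cpp; it only shows that a constant outside the simm32 range can still split into a LUI-representable high part plus a sign-extended 12-bit low part.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a signed integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def is_simm32(value):
    """True if value fits in a signed 32-bit immediate."""
    return -(1 << 31) <= value < (1 << 31)

def lui_addi_split(value):
    """Return (hi20, lo12) if value can be built as LUI hi20 + ADDI lo12."""
    lo12 = sign_extend(value, 12)
    hi = value - lo12  # low 12 bits are zero by construction
    if is_simm32(hi):
        return (hi >> 12) & 0xFFFFF, lo12
    return None

# One below the smallest simm32: not a simm32 itself, yet after removing
# the sign-extended low 12 bits (-1) the remainder is exactly -2**31.
value = -(1 << 31) - 1
split = lui_addi_split(value)
```

Here `split` is `(0x80000, -1)`: LUI produces -2^31 and ADDI then adds -1 on the full 64-bit register, which ADDIW could not do because it truncates to 32 bits.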

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D124222
The file was modified llvm/test/CodeGen/RISCV/rv64zbs.ll
The file was modified llvm/test/CodeGen/RISCV/imm.ll
The file was modified llvm/lib/Target/RISCV/MCTargetDesc/RISCVMatInt.cpp
The file was modified llvm/test/MC/RISCV/rv64zbs-aliases-valid.s
The file was modified llvm/test/MC/RISCV/rv64i-aliases-valid.s
Commit 484fcb98883ffc43d2daab6f29e2569399950936 by a.bataev
[SLP][NFC]Fix a comment.
The file was modified llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
Commit f927be0df8a599942314f761b672cb5faee69a0f by preames
[RISCV] Extract getAllOnesMask helper [nfc]
The file was modified llvm/lib/Target/RISCV/RISCVISelLowering.cpp
Commit b481512485a87a5510bf28f63cc512ad26c075a8 by paul.walker
[SVE] Move reg+reg gather/scatter addressing optimisations from lowering into DAG combine.

This is essentially a refactoring patch but allows more cases to
be caught, hence the output changes to some tests.

Differential Revision: https://reviews.llvm.org/D122994
The file was modified llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
The file was modified llvm/test/CodeGen/AArch64/sve-fixed-length-masked-scatter.ll
The file was modified llvm/test/CodeGen/AArch64/sve-fixed-length-masked-gather.ll
The file was modified llvm/test/CodeGen/AArch64/sve-gather-scatter-addr-opts.ll
Commit 813e521e55b11165138b071f446eda94b14570dc by Joseph.Nash
[AMDGPU] Add gfx11 subtarget ELF definition

This is the first patch of a series to upstream support for the new
subtarget.

Contributors:
Jay Foad <jay.foad@amd.com>
Konstantin Zhuravlyov <kzhuravl_dev@outlook.com>

Patch 1/N for upstreaming AMDGPU gfx11 architectures.

Reviewed By: foad, kzhuravl, #amdgpu

Differential Revision: https://reviews.llvm.org/D124536
The file was modified llvm/test/Object/AMDGPU/elf-header-flags-mach.yaml
The file was modified llvm/tools/llvm-readobj/ELFDumper.cpp
The file was modified llvm/docs/AMDGPUUsage.rst
The file was modified llvm/include/llvm/Support/TargetParser.h
The file was modified clang/test/Misc/target-invalid-cpu-note.c
The file was modified llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp
The file was modified llvm/test/tools/llvm-readobj/ELF/amdgpu-elf-headers.test
The file was modified llvm/lib/ObjectYAML/ELFYAML.cpp
The file was modified llvm/include/llvm/BinaryFormat/ELF.h
The file was modified llvm/lib/Support/TargetParser.cpp
The file was modified llvm/lib/Object/ELFObjectFile.cpp
Commit 3ea191ed03d40489357c5069aedd3383abb3ad58 by preames
[RISCV] Factor repeating code into getMaskTypeFor(VT) [nfc]
The file was modified llvm/lib/Target/RISCV/RISCVISelLowering.cpp
Commit 9c8a88382d86c731db3c5c92d8ecd6ef296329ab by jhuber6
[Clang][Docs] Add new offloading flags to the clang documentation

Summary:
Some previous patches introduced the `--offload-new-driver` flag, which
is a generic way to enable the new driver, and the `--offload-host-only`
and `--offload-device-only` flags which allow users to compile for one
side, making it easier to inspect intermediate code for offloading
compilations. This patch just documents them in the command line
reference.
The file was modified clang/docs/ClangCommandLineReference.rst
Commit 7abfaa0a815a37ef6abd3ad7eb169007bdc36619 by dimitry
[lldb] Define LLDB_VERSION_PATCH correctly

In commit ccf1469a4cdb lldb got its own generated Version.inc file, with
`LLDB_VERSION` macros. However, it used `LLDB_VERSION_PATCHLEVEL`
instead of the actually correct `LLDB_VERSION_PATCH`. Correct this.

Reviewed By: JDevlieghere

Differential Revision: https://reviews.llvm.org/D124672
The file was modified lldb/include/lldb/Version/Version.inc.in
The file was modified llvm/utils/gn/secondary/lldb/include/lldb/Version/BUILD.gn
Commit ef87865b98fa25af1d2c045bab1268b2a1503374 by aaron
Silence -Wstrict-prototype diagnostics in C2x mode

This also disables the diagnostic when the user passes -fno-knr-functions.
The file was added clang/test/Sema/c2x-warn-strict-prototypes.c
The file was modified clang/lib/Sema/SemaType.cpp
The file was modified clang/docs/ReleaseNotes.rst
Commit dcb77643e3440e948010ed8ecb4c2f8fe4fadb93 by david.penry
Reapply [CodeGen][ARM] Enable Swing Modulo Scheduling for ARM

Fixed "private field is not used" warning when compiled
with clang.

original commit: 28d09bbbc3d09c912b54a4d5edb32cab7de32a6f
reverted in: fa49021c68ef7a7adcdf7b8a44b9006506523191

------

This patch permits Swing Modulo Scheduling for ARM targets and
turns it on by default for the Cortex-M7.  The t2Bcc
instruction is recognized as a loop-ending branch.

MachinePipeliner is extended by adding support for
"unpipelineable" instructions.  These instructions are
those which contribute to the loop exit test; in the SMS
papers they are removed before creating the dependence graph
and then inserted into the final schedule of the kernel and
prologues. Support for these instructions was not previously
necessary because current targets supporting SMS have only
supported it for hardware loop branches, which have no
loop-exit-contributing instructions in the loop body.

The current structure of the MachinePipeliner makes it difficult
to remove/exclude these instructions from the dependence graph.
Therefore, this patch leaves them in the graph, but adds a
"normalization" method which moves them in the schedule to
stage 0, which causes them to appear properly in kernel and
prologues.

It was also necessary to be more careful about boundary nodes
when iterating across successors in the dependence graph because
the loop exit branch is now a non-artificial successor to
instructions in the graph. In addition, schedules with physical
use/def pairs in the same cycle should be treated as creating an
invalid schedule because the scheduling logic doesn't respect
physical register dependence once scheduled to the same cycle.
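
As a rough illustration of the "normalization" idea described above, the
following Python sketch computes stages from cycles and pulls flagged
instructions back to stage 0. It is not the MachinePipeliner code: the function
names, the dictionary layout, and the choice to keep each instruction's slot
within the initiation interval are all assumptions made for illustration, and
the sketch ignores the dependences a real pass would also have to respect.

```python
def stage_of(cycle, first_cycle, ii):
    """Stage index of a cycle, given the initiation interval `ii`."""
    return (cycle - first_cycle) // ii


def normalize_to_stage0(cycles, unpipelineable, ii):
    """Return a copy of `cycles` with the flagged instructions in stage 0.

    cycles: dict mapping instruction name -> scheduled cycle
    unpipelineable: names of instructions that feed the loop-exit test
    ii: initiation interval (number of cycles per stage)
    """
    first_cycle = min(cycles.values())
    result = dict(cycles)
    for name in unpipelineable:
        # Assumption: keep the position within the stage, drop the stage offset.
        slot = (cycles[name] - first_cycle) % ii
        result[name] = first_cycle + slot
    return result


# Hypothetical schedule: "cmp" and "t2Bcc" contribute to the loop exit test.
schedule = {"load": 0, "mul": 3, "cmp": 5, "t2Bcc": 7}
fixed = normalize_to_stage0(schedule, {"cmp", "t2Bcc"}, ii=4)
```

With an initiation interval of 4, "cmp" and "t2Bcc" start in stage 1 and end up
in stage 0, while the other instructions keep their cycles.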

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D122672
The file was modified llvm/lib/Target/ARM/ARMSubtarget.h
The file was added llvm/test/CodeGen/Thumb2/swp-exitbranchdir.mir
The file was modified llvm/lib/Target/ARM/ARM.td
The file was modified llvm/lib/Target/ARM/ARMTargetMachine.cpp
The file was modified llvm/lib/Target/ARM/ARMSubtarget.cpp
The file was modified llvm/include/llvm/CodeGen/ModuloSchedule.h
The file was modified llvm/lib/CodeGen/ModuloSchedule.cpp
The file was modified llvm/lib/Target/ARM/ARMBaseInstrInfo.cpp
The file was modified llvm/include/llvm/CodeGen/MachinePipeliner.h
The file was added llvm/test/CodeGen/Thumb2/swp-fixedii.mir
The file was modified llvm/test/CodeGen/ARM/O3-pipeline.ll
The file was modified llvm/lib/Target/ARM/ARMBaseInstrInfo.h
The file was modified llvm/lib/CodeGen/MachinePipeliner.cpp
Commit ec6d1a0278dd22606f085c3584e8d3a26a4478c1 by Joseph.Nash
Fix sphinx build error in AMDGPUUsage.rst

Corrects error from
813e521e55b11165138b071f446eda94b14570dc
The file was modified llvm/docs/AMDGPUUsage.rst
Commit 6f79700830292d86afec5f3cf5143b00e6f3f1fd by isanbard
[randstruct] Automatically randomize a structure of function pointers

Structures of function pointers are a good surface area for attacks. We
should therefore randomize them unless explicitly told not to.

Reviewed By: aaron.ballman, MaskRay

Differential Revision: https://reviews.llvm.org/D123544
The file was modified clang/lib/Sema/SemaDecl.cpp
The file was modified clang/unittests/AST/RandstructTest.cpp
Commit dca2bc408186667346ab3bbb951adab44feba5bd by jingham
Add a mutex to the ThreadPlanStackMap class.
We've seen very occasional crashes that we can only explain by
simultaneous access to the ThreadPlanStackMap, so I'm adding a
mutex to protect it.
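
The general pattern of guarding a shared map with a mutex can be sketched as
follows. This is a Python analogy only: the class and method names are
hypothetical and do not mirror the ThreadPlanStackMap interface.

```python
import threading


class PlanStackMap:
    """Hypothetical stand-in for a thread-id -> plan-stack map."""

    def __init__(self):
        self._lock = threading.Lock()
        self._stacks = {}

    def push(self, tid, plan):
        # Every access to the shared dictionary happens under the lock.
        with self._lock:
            self._stacks.setdefault(tid, []).append(plan)

    def pop(self, tid):
        with self._lock:
            stack = self._stacks.get(tid)
            return stack.pop() if stack else None

    def depth(self, tid):
        with self._lock:
            return len(self._stacks.get(tid, []))


def worker(plan_map, tid, count):
    for i in range(count):
        plan_map.push(tid, f"plan-{i}")


plan_map = PlanStackMap()
threads = [threading.Thread(target=worker, args=(plan_map, tid, 1000))
           for tid in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Keeping the lock private to the class means callers cannot touch the map
without taking it.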

Differential Revision: https://reviews.llvm.org/D124029
The file was modified lldb/include/lldb/Target/ThreadPlanStack.h
The file was modified lldb/source/Target/ThreadPlanStack.cpp
Commit f735b3a2b0ce348a5e3149622325c2d676703ef0 by Vitaly Buka
[mlir] Prevent argStorage relocations

This fixes msan reports like https://reviews.llvm.org/P8285

Reviewed By: rriddle

Differential Revision: https://reviews.llvm.org/D124576
The file was modified mlir/lib/Dialect/ControlFlow/IR/ControlFlowOps.cpp
Commit aa7470a1b31346f3509ba4b831be39fa1d327772 by jingham
Add a paragraph showing how to use container commands.

Differential Revision: https://reviews.llvm.org/D124028
The file was modified lldb/docs/use/python-reference.rst
Commit 5a7936401c0a5a8959273108051b094ffba84b8c by pklausler
[flang] Fix build bot problem

A recent change is eliciting a valid warning from the out-of-tree
flang build bot; fix by using a reference in a range-based for().

Differential Revision: https://reviews.llvm.org/D124682
The file was modified flang/lib/Lower/ConvertVariable.cpp
Commit 51e02409f0220c796d34f72b3a4b8ba3d3f34cb9 by Stanislav.Mekhanoshin
[AMDGPU] Produce waitcounts for LDS DMA

MUBUF and FLAT LDS DMA operations need a wait on vmcnt before the data
written to LDS can be accessed. A load from LDS to VMEM does not need a wait.

Differential Revision: https://reviews.llvm.org/D124626
The file was modified llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
The file was added llvm/test/CodeGen/AMDGPU/lds-dma-waitcnt.mir
Commit 8bdfc73f633dca9859123b8596bcb521700c6a7f by Joseph.Nash
[AMDGPU][clang] Definition of gfx11 subtarget

Contributors:
Jay Foad <jay.foad@amd.com>
Konstantin Zhuravlyov <kzhuravl_dev@outlook.com>

Patch 2/N for upstreaming of AMDGPU gfx11 architecture

Depends on D124536

Reviewed By: foad, kzhuravl, #amdgpu, arsenm

Differential Revision: https://reviews.llvm.org/D124537
The file was modified clang/lib/Basic/Targets/AMDGPU.cpp
The file was modified clang/include/clang/Basic/Cuda.h
The file was modified clang/test/Driver/amdgpu-mcpu.cl
The file was modified clang/test/Misc/target-invalid-cpu-note.c
The file was modified clang/lib/Basic/Cuda.cpp
The file was modified clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
The file was modified clang/test/Driver/amdgpu-macros.cl
The file was modified clang/test/CodeGenOpenCL/amdgpu-features.cl
The file was modified clang/lib/Basic/Targets/NVPTX.cpp
Commit 268089b6ac4bca2c87b272a61d7dcc6ad3e752e4 by gclayton
Fix the encoding and decoding of UniqueCStringMap<T> objects when saved to cache files.

UniqueCStringMap<T> objects are std::vector<UniqueCStringMap::Entry> objects where each Entry contains a ConstString + T. The values in the vector are sorted first by ConstString and then by the T value. ConstString objects are simply uniqued "const char *" values, and when we compare them we use the actual string pointer as the value we sort by. This caused a problem when we saved the symbol table name indexes and debug info indexes to disk in one process, where they were sorted, and then decoded them from the cache files in another process. Why? Because the order in which the ConstString objects were created is now completely different, so the string pointers are no longer sorted in the process the cache was loaded into.

The unit tests created for the initial patch didn't catch the encoding and decoding issues of UniqueCStringMap<T> because they were happening in the same process, where encoding and decoding would end up creating sorted UniqueCStringMap<T> objects due to the constant string pool being exactly the same.

This patch sorts the entries after decoding and also reserves the right number of entries in the UniqueCStringMap::m_map prior to adding them all to avoid doing multiple allocations.

Added a unit test that loads an object file from YAML. I created a cache file for the original file and removed the cache file's signature mod time check, since we will generate an object file from the YAML and use that as the object file for the Symtab object. Then we load the cache data from the array of symtab cache bytes so that the ConstString "const char *" values will not match the current process, and verify we can look up the 4 names from the object file in the symbol table.
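
The failure mode described above can be modeled with a small Python sketch. It
is a toy model, not lldb code: `StringPool`, the fake addresses, and the helper
names are all assumptions standing in for ConstString pointers and the sorted
entry vector.

```python
import bisect


class StringPool:
    """Toy intern pool: each new string gets the next fake 'address'."""

    def __init__(self):
        self._addr = {}

    def intern(self, s):
        # Addresses depend only on the order in which strings are interned.
        return self._addr.setdefault(s, 0x1000 + 16 * len(self._addr))


def build_sorted(pool, names):
    # Entries are (address, name) pairs ordered by address, not by text.
    return sorted((pool.intern(n), n) for n in names)


def lookup(entries, pool, name):
    # Binary search by address; only valid if `entries` is sorted.
    key = pool.intern(name)
    i = bisect.bisect_left(entries, (key, ""))
    return i < len(entries) and entries[i][0] == key


names = ["main", "foo", "bar", "baz"]

# "Process A": build the sorted map and write the names out in that order.
pool_a = StringPool()
encoded = [n for _, n in build_sorted(pool_a, names)]

# "Process B": the same strings were interned in a different order.
pool_b = StringPool()
for n in reversed(names):
    pool_b.intern(n)

decoded = [(pool_b.intern(n), n) for n in encoded]  # order is now stale
resorted = sorted(decoded)                          # re-sort after decoding
```

In this model, binary search over `decoded` misses names that are present,
while the same lookups succeed on `resorted`.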

Differential Revision: https://reviews.llvm.org/D124572
The file was modified lldb/source/Plugins/SymbolFile/DWARF/NameToDIE.cpp
The file was modified lldb/unittests/Symbol/SymtabTest.cpp
The file was modified lldb/source/Symbol/Symtab.cpp
Commit 2a84a86184392a7e18a958f36db0b2b3da6ae2bd by martin
[lldb] Fix initialization of LazyBool/bool variables m_overwrite/m_overwrite_lazy. NFCI.

This silences a GCC warning after
1f7b58f2a50461493f083b2ed807b25e036286f6 / D122680:

lldb/source/Commands/CommandObjectCommands.cpp:1650:22: warning: enum constant in boolean context [-Wint-in-bool-context]
1650 |   bool m_overwrite = eLazyBoolCalculate;
      |                      ^~~~~~~~~~~~~~~~~~

Differential Revision: https://reviews.llvm.org/D123204
The file was modified lldb/source/Commands/CommandObjectCommands.cpp
Commit 68ee5ad0082c0a72ff16a6571c9db7054afd6ea3 by aperry
[flang] Update Google Doc link for Flang Biweekly Sync call notes

Notes from the Flang Biweekly Sync calls have been merged into the same document as the notes from the Flang Technical calls. This patch updates the link in the GettingInvolved document to point to the new location.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D124689
The file was modified flang/docs/GettingInvolved.md
Commit 6e689cbaf412effbef392acbb3d123ad2c3b8eb5 by tejohnson
[memprof] Correct comment in test [NFC]

Correct comment referring incorrectly to address sanitizer (from which
the memprof tests were originally forked).
The file was modified llvm/test/Instrumentation/HeapProfiler/basic.ll
Commit d85eb4e2d62e51645922ec17678a319b3c7d872c by clattner
[AsmParser] Introduce a new "Argument" abstraction + supporting logic

MLIR has a common pattern for "arguments" that uses syntax
like `%x : i32 {attrs} loc("sourceloc")` which is implemented
in ad hoc ways throughout the codebase.  The approach this uses
is verbose (because it is implemented with parallel arrays) and
inconsistent (e.g. lots of things drop source location info).

Solve this by introducing OpAsmParser::Argument and make addRegion
(which sets up BlockArguments for the region) take it.  Convert the
world to propagating this down.  This means that we correctly
capture and propagate source location information in a lot more
cases (e.g. see the affine.for testcase example), and it also
simplifies much code.
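
The difference between parallel arrays and a single argument record can be
sketched in Python. This is an analogy rather than the MLIR API: the
`Argument` dataclass, its field names, and the `render` helper are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Argument:
    """Hypothetical record bundling everything known about one argument."""
    name: str
    type: str
    attrs: dict = field(default_factory=dict)
    loc: Optional[str] = None


def from_parallel(names, types, attrs, locs):
    # The parallel-array style: four lists that must stay in lockstep.
    return [Argument(n, t, a, l)
            for n, t, a, l in zip(names, types, attrs, locs)]


def render(arg):
    # Print in the `%x : i32 {attrs} loc("sourceloc")` shape.
    text = f"{arg.name} : {arg.type}"
    if arg.attrs:
        inner = ", ".join(f"{k} = {v}" for k, v in arg.attrs.items())
        text += f" {{{inner}}}"
    if arg.loc is not None:
        text += f' loc("{arg.loc}")'
    return text


args = from_parallel(["%x", "%y"], ["i32", "f32"],
                     [{}, {"foo.bar": 1}], ["a.mlir:1:1", None])
```

Once the fields travel together, a helper that receives an `Argument` cannot
silently lose the location the way a helper that only receives names and types
can.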

Differential Revision: https://reviews.llvm.org/D124649
The file was modified mlir/include/mlir/IR/FunctionImplementation.h
The file was modified mlir/lib/Parser/Parser.cpp
The file was modified mlir/lib/Dialect/Affine/IR/AffineOps.cpp
The file was modified mlir/lib/Dialect/PDLInterp/IR/PDLInterp.cpp
The file was modified mlir/lib/Dialect/Linalg/IR/LinalgOps.cpp
The file was modified mlir/include/mlir/IR/OpImplementation.h
The file was modified mlir/test/Dialect/GPU/invalid.mlir
The file was modified mlir/lib/Dialect/Async/IR/Async.cpp
The file was modified mlir/lib/Dialect/GPU/IR/GPUDialect.cpp
The file was modified mlir/lib/Dialect/LLVMIR/IR/LLVMDialect.cpp
The file was modified flang/lib/Optimizer/Dialect/FIROps.cpp
The file was modified mlir/test/IR/locations.mlir
The file was modified mlir/lib/IR/FunctionImplementation.cpp
The file was modified mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp
The file was modified mlir/lib/Dialect/SPIRV/IR/SPIRVOps.cpp
The file was modified mlir/lib/Parser/AttributeParser.cpp
The file was modified mlir/test/lib/Dialect/Test/TestDialect.cpp
The file was modified mlir/lib/Dialect/MemRef/IR/MemRefOps.cpp
The file was modified mlir/lib/Dialect/SCF/SCF.cpp
Commit ac33c335bd9997e80c25c384b5a559cdeac6ba68 by spatel
[InstCombine] add tests for FP<->int casts; NFC

This overlaps with at least some existing tests,
but the smaller types should be faster for alive2
to verify. We know that at least one of these is
currently wrong (miscompile) as shown in #55150.
The file was modified llvm/test/Transforms/InstCombine/sitofp.ll
Commit c428a3d2a09e2d144911290920b1fa59953d7898 by congzhecao
[LoopCacheAnalysis] Enable delinearization of fixed sized arrays

Currently loop cache cost (LCC) cannot analyze fixed-size arrays
since it cannot delinearize them. This patch adds the capability
to delinearize fixed-size arrays to LCC. Most of the code is ported
from DependenceAnalysis.cpp and some refactoring will be done in a
follow-up patch.
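
For readers unfamiliar with the term, delinearization recovers per-dimension
subscripts from a flattened access. The Python sketch below shows the idea on
concrete integers for a fixed-size array. The analysis itself works on symbolic
expressions, so this is only an illustration, and the function names are
assumptions.

```python
def linearize(subscripts, dims):
    """Flatten subscripts into a row-major element offset."""
    offset = 0
    for index, extent in zip(subscripts, dims):
        offset = offset * extent + index
    return offset


def delinearize(offset, dims):
    """Split a flat element offset into subscripts for a fixed-size array.

    dims lists the array extents, outermost first (e.g. [N, M] for A[N][M]).
    """
    subscripts = []
    for extent in reversed(dims):
        # Peel off the innermost remaining dimension.
        offset, index = divmod(offset, extent)
        subscripts.append(index)
    if offset != 0:
        raise ValueError("offset is outside the array")
    return list(reversed(subscripts))


dims = [10, 20, 30]            # models an array declared as A[10][20][30]
flat = linearize([2, 5, 7], dims)
recovered = delinearize(flat, dims)
```

Here the access A[2][5][7] flattens to offset 1357, and `delinearize` recovers
the three subscripts from that offset and the known extents.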

Reviewed By: #loopoptwg, Meinersbur

Differential Revision: https://reviews.llvm.org/D122857
The file was modified llvm/lib/Analysis/LoopCacheAnalysis.cpp
The file was modified llvm/include/llvm/Analysis/LoopCacheAnalysis.h
The file was added llvm/test/Analysis/LoopCacheAnalysis/PowerPC/LoopnestFixedSize.ll

Summary

  1. [BOLT] Add AArch64 builder and worker (details)
Commit 69f0ee9d09665aa5ef5089eedbaca7c8ebb20ff7 by amir.aupov
[BOLT] Add AArch64 builder and worker

Reviewed By: gkistanova

Differential Revision: https://reviews.llvm.org/D124329
The file was modified buildbot/osuosl/master/config/builders.py
The file was modified buildbot/osuosl/master/config/workers.py