Success

Changes

Summary

  1. [SVE][LoopVectorize] Add support for scalable vectorization of first-order recurrences (details)
  2. [SystemZ] Support builtin_frame_address with packed stack without backchain. (details)
  3. [AMDGPU] Regenerate shift tests. NFCI. (details)
  4. [AMDGPU] Regenerate fp2int tests. NFCI. (details)
  5. [mlir] Add support for ops with regions in 'gpu-async-region' rewriter. (details)
  6. [LLD] Improve --strip-all help text (details)
  7. [LV] Account for tripcount when calculation vectorization profitability (details)
  8. [ORC] Silence unused variable warnings in Release builds. NFC. (details)
  9. Revert "[ARM] Transforming memcpy to Tail predicated Loop" (details)
  10. [AMDGPU] Fix WQM failure with single block inactive demote (details)
  11. [amdgpu-arch] Fix rpath to run from build dir (details)
  12. [OpenCL] Remove subgroups pragma in enqueue kernel and pipe builtins. (details)
  13. [TableGen] [Clang] Clean up Options.td and add asserts. (details)
  14. [PowerPC] Provide some P8-specific altivec overloads for P7 (details)
  15. [AMDGPU] SIInsertHardClauses: move more stuff into the class. NFC. (details)
  16. [lldb][NFC] Make assert in TestStaticVariables more expressive (details)
  17. Revert "[PowerPC] Provide some P8-specific altivec overloads for P7" (details)
  18. [AIX][TLS] Add support for TLSGD relocations to XCOFF objects (details)
  19. [libc++] Rewrite std::to_address to avoid relying on element_type (details)
  20. [OpenMP] Temporarily require X86 target for parallel_for_codegen.cpp test (details)
  21. [AMDGPU][NFC] Fix typos in SIFormMemoryClauses description (details)
  22. [PowerPC] Re-commit ed87f512bb9eb5c1d44e9a1182ffeaf23d6c5ae8 (details)
  23. [mlir][vector] add pattern to cast away lead unit dimension for broadcast op (details)
  24. [mlir][NFC] Fix warning in VectorTransforms.cpp (details)
  25. [lld-macho][nfc] Convert the mock libSystem.tbd to TBDv4 (details)
  26. [lld-macho] Support loading of zippered dylibs (details)
  27. [SLP] Use empty() instead of size() == 0. NFCI. (details)
  28. [SLP] Constify the TreeEntry* input into dumpTreeCosts(). NFCI. (details)
  29. [SLP] Constify the TreeEntry* input into getEntryCost() + setInsertPointAfterBundle(). NFCI. (details)
  30. [AMDGPU] Fix 64 bit DPP validation (details)
  31. [clangd][ObjC] Highlight Objc Ivar refs (details)
  32. [LangRef][VP] Fix typos in VP sdiv/udiv examples (details)
  33. [RISCV] Cleanup instruction formats used for B extension ternary operations. (details)
  34. [SystemZ] Don't use libcall for 128 bit shifts. (details)
  35. Fix array attribute in bindings for linalg.init_tensor (details)
  36. [AIX][Test][ORC] Skip unsupported ORC C API tests on AIX (details)
Commit 8c9742bd239af602ee2743baa3c4281f24d45df1 by kerry.mclaughlin
[SVE][LoopVectorize] Add support for scalable vectorization of first-order recurrences

Adds support for scalable vectorization of loops containing first-order recurrences, e.g.:
```
for (int i = 0; i < n; i++)
  b[i] = a[i] + a[i - 1];
```
This patch changes fixFirstOrderRecurrence for scalable vectors to take vscale into
account when inserting into and extracting from the last lane of a vector.
CreateVectorSplice has been added to construct a vector for the recurrence, which
returns a splice intrinsic for scalable types. For fixed-width the behaviour
remains unchanged as CreateVectorSplice will return a shufflevector instead.
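
The splice step described above can be modelled outside LLVM. The following is a minimal Python sketch, not the patch's implementation: lists stand in for vectors, `vf` stands in for the runtime lane count (vscale times the minimum lane count), and the function names and the `init` parameter (the value read as `a[-1]`) are hypothetical. It only illustrates that combining the last lane of the previous vector with all but the last lane of the current one reproduces the scalar recurrence.

```python
def scalar_recurrence(a, init):
    """Reference loop: b[i] = a[i] + a[i - 1], with `init` standing in for a[-1]."""
    out, prev = [], init
    for x in a:
        out.append(x + prev)
        prev = x
    return out


def splice_last_lane(prev_vec, cur_vec):
    """Model of a splice at offset -1: the last lane of prev_vec followed by
    every lane of cur_vec except its last."""
    return prev_vec[-1:] + cur_vec[:-1]


def vectorized_recurrence(a, init, vf):
    """Process `a` in chunks of `vf` lanes (hypothetical runtime lane count)."""
    assert len(a) % vf == 0, "this sketch ignores the scalar remainder loop"
    # Only the last lane of the initial recurrence vector is ever read.
    prev_vec = [0] * (vf - 1) + [init]
    out = []
    for start in range(0, len(a), vf):
        cur_vec = a[start:start + vf]
        shifted = splice_last_lane(prev_vec, cur_vec)  # a[i - 1] for each lane
        out.extend(x + y for x, y in zip(cur_vec, shifted))
        prev_vec = cur_vec
    return out
```

Because the lane count is only a parameter here, the same code covers the fixed-width and scalable cases; in the patch the difference lies in whether a shufflevector or a splice intrinsic is emitted.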

The tests included here are the same as test/Transforms/LoopVectorize/first-order-recurrence.ll

Reviewed By: david-arm, fhahn

Differential Revision: https://reviews.llvm.org/D101076
The file was modified llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
The file was modified llvm/lib/IR/IRBuilder.cpp
The file was added llvm/test/Transforms/LoopVectorize/AArch64/first-order-recurrence.ll
The file was modified llvm/include/llvm/IR/IRBuilder.h
The file was added llvm/test/Transforms/LoopVectorize/scalable-first-order-recurrence.ll
Commit a0da66bc1330f9808ed9814aaa9c3c3d3244852d by paulsson
[SystemZ] Support builtin_frame_address with packed stack without backchain.

In order to use __builtin_frame_address(0) with packed stack and no
backchain, the address of where the backchain would have been written is
returned (like GCC).

This address may either contain a saved register or be unused.

Review: Ulrich Weigand

Differential Revision: https://reviews.llvm.org/D101897
The file was modified llvm/test/CodeGen/SystemZ/frameaddr-02.ll
The file was modified llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
Commit 20e976e2487f5b52541772e6e92954ebf2dcf13e by llvm-dev
[AMDGPU] Regenerate shift tests. NFCI.
The file was modified llvm/test/CodeGen/AMDGPU/srl.ll
The file was modified llvm/test/CodeGen/AMDGPU/shl.ll
The file was modified llvm/test/CodeGen/AMDGPU/sra.ll
Commit 0fdce16efb281ab52e1aa5a7a760aebcb7a59163 by llvm-dev
[AMDGPU] Regenerate fp2int tests. NFCI.
The file was modified llvm/test/CodeGen/AMDGPU/fp_to_uint.ll
The file was modified llvm/test/CodeGen/AMDGPU/fp_to_sint.ll
Commit a0d019fc89c57736e54a476aa4db63027a2dace2 by csigg
[mlir] Add support for ops with regions in 'gpu-async-region' rewriter.

Reviewed By: herhut

Differential Revision: https://reviews.llvm.org/D101757
The file was modified mlir/lib/Dialect/GPU/Transforms/AsyncRegionRewriter.cpp
Commit 5dd9f44c17ec0d8b6b88bb015560b3c566622fdc by Ben.Dunbobbin
[LLD] Improve --strip-all help text

This is a slight improvement to the help text, as I was slightly
surprised when strip-all did more than remove the symbol table.

Currently, we match gold's help text for strip-all and strip-debug.
I think that the GNU documentation for these options is not particularly
clear. However, I have opted to make only a minor change here and keep
the help text similar to gold's as these are mature options that are
well understood.

ld.bfd (https://sourceware.org/binutils/docs/ld/Options.html) has a
similar implication, although it defines strip-debug as a subset of
strip-all. However, I felt that noting that strip-all implies strip-debug
is better because, with the ld.bfd approach, you have to read both the
--strip-debug and the --strip-all help text to understand the behaviour
of --strip-all (and the --strip-all help text doesn't indicate that the
--strip-debug help text is related).

Differential Revision: https://reviews.llvm.org/D101890
The file was modified lld/ELF/Options.td
The file was modified lld/docs/ld.lld.1
Commit 4979c90458628c9463815d81c637f8787f72fff0 by david.green
[LV] Account for tripcount when calculation vectorization profitability

The loop vectorizer will currently assume a large trip count when
calculating which of several vectorization factors is more profitable.
That is often not a terrible assumption to make, as small trip count
loops will usually have been fully unrolled. There are cases, however,
where we will try to vectorize them, and especially when folding the
tail by masking we can incorrectly choose to vectorize loops where it is
not beneficial, due to the folded tail rounding the iteration count up
for the vectorized loop.

The motivating example here has a trip count of 5, so either performs 5
scalar iterations or 2 vector iterations (with VF=4). At a high enough
trip count the vectorization becomes profitable, but the rounding up to
2 vector iterations vs only 5 scalar makes it unprofitable.

This adds an alternative cost calculation when we know the max trip
count and are folding tail by masking, rounding the iteration count up
to the correct number for the vector width. We still do not account for
anything like setup cost or the mixture of vector and scalar loops, but
this is at least an improvement in a few cases that we have had
reported.
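
The rounding effect can be shown with a small cost comparison. This is a minimal Python sketch of the idea only, not the cost model in LoopVectorize.cpp: the function names and the per-iteration costs (4 for a scalar iteration, 12 for a VF=4 vector iteration) are assumptions chosen so that vectorization looks profitable per lane.

```python
def scalar_loop_cost(trip_count, scalar_iter_cost):
    """Total cost of running the scalar loop for `trip_count` iterations."""
    return trip_count * scalar_iter_cost


def folded_tail_vector_cost(trip_count, vf, vector_iter_cost):
    """Total cost of the vector loop when the tail is folded by masking:
    a partial final iteration costs as much as a full one, so round up."""
    vector_iterations = (trip_count + vf - 1) // vf  # ceil(trip_count / vf)
    return vector_iterations * vector_iter_cost


def is_profitable(trip_count, vf, scalar_iter_cost, vector_iter_cost):
    """Compare whole-loop costs instead of per-lane costs."""
    return (folded_tail_vector_cost(trip_count, vf, vector_iter_cost)
            < scalar_loop_cost(trip_count, scalar_iter_cost))
```

With these assumed costs the per-lane comparison (12 / 4 = 3 against 4) favours the vector loop, yet `is_profitable(5, 4, 4, 12)` is false because two vector iterations cost 24 against 20 for five scalar ones. At a trip count of 8 the same two vector iterations are compared against a scalar cost of 32, and vectorization wins.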

Differential Revision: https://reviews.llvm.org/D101726
The file was added llvm/test/Transforms/LoopVectorize/ARM/mve-known-trip-count.ll
The file was modified llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
Commit 3d746962ed1831987c6a1ab54fe8f6cbb6477e0e by benny.kra
[ORC] Silence unused variable warnings in Release builds. NFC.
The file was modified llvm/unittests/ExecutionEngine/Orc/OrcCAPITest.cpp
Commit fc690777fce0bf50a8f424b05993b1e218713ae5 by malhar.jajoo
Revert "[ARM] Transforming memcpy to Tail predicated Loop"

Reverting commit since it causes failure (10462).
This reverts commit b856f4a232cbd43476e9b9f75c80aacfc6f5c152.
The file was removed llvm/test/CodeGen/Thumb2/mve-tp-loop.ll
The file was modified llvm/lib/Target/ARM/ARMSelectionDAGInfo.cpp
The file was modified llvm/lib/Target/ARM/ARMInstrMVE.td
The file was modified llvm/test/CodeGen/Thumb2/LowOverheadLoops/memcall.ll
The file was modified llvm/lib/Target/ARM/ARMTargetTransformInfo.h
The file was modified llvm/lib/Target/ARM/ARMSubtarget.h
The file was removed llvm/test/CodeGen/Thumb2/mve-tp-loop.mir
The file was modified llvm/lib/Target/ARM/ARMISelLowering.h
The file was modified llvm/lib/Target/ARM/ARMISelLowering.cpp
Commit 67cfefebbbbb3a5923c47c31293a8f76596de8be by carl.ritson
[AMDGPU] Fix WQM failure with single block inactive demote

Instruction test for inactive kill/demote needs to be based on
actual opcode not whether instruction would be lowered to demote.

Reviewed By: piotr

Differential Revision: https://reviews.llvm.org/D101966
The file was modified llvm/lib/Target/AMDGPU/SIWholeQuadMode.cpp
The file was modified llvm/test/CodeGen/AMDGPU/llvm.amdgcn.wqm.demote.ll
Commit b24e9f82b71f325214c41fdc3f106207cc2244a6 by jonathanchesterfield
[amdgpu-arch] Fix rpath to run from build dir

Prior to this, amdgpu-arch has RUNPATH set to $ORIGIN/../lib which works
for some installs, but not from the build directory where clang executes
the tool from when running tests.

This cmake option adds the location of the rocr runtime to the RUNPATH
(note, it amends RUNPATH here, despite the cmake option referring to RPATH)
to create a binary that runs from build or install location.

Before:
RUNPATH [$ORIGIN/../lib]
After:
RUNPATH [$ORIGIN/../lib:$HOME/llvm-install/lib]

Credit to Greg for knowing this trick and pointing to examples of it in use
for the aomp build scripts.

Reviewed By: pdhaliwal

Differential Revision: https://reviews.llvm.org/D101926
The file was modified clang/tools/amdgpu-arch/CMakeLists.txt
Commit c28a602329a78db5c02cc85679b5035aaf6753b4 by anastasia.stulova
[OpenCL] Remove subgroups pragma in enqueue kernel and pipe builtins.

This patch simplifies the parser and makes the language semantics
consistent. There is no extension pragma requirement in the spec
for the subgroup functions in enqueue kernel or pipes and all other
builtin functions are available without the pragma.

Differential Revision: https://reviews.llvm.org/D100984
The file was modified clang/lib/Sema/SemaChecking.cpp
The file was modified clang/test/SemaOpenCL/cl20-device-side-enqueue.cl
Commit d40a0b8af771f9b37dd2985fc692443c0ac5473e by Paul C. Anagnostopoulos
[TableGen] [Clang] Clean up Options.td and add asserts.

Differential Revision: https://reviews.llvm.org/D101766
The file was modified clang/include/clang/Driver/Options.td
Commit ed87f512bb9eb5c1d44e9a1182ffeaf23d6c5ae8 by nemanja.i.ibm
[PowerPC] Provide some P8-specific altivec overloads for P7

This adds additional support for XL compatibility. There are a number
of functions in altivec.h that produce a single instruction (or a
very short sequence) for Power8 but can be done on Power7 without
scalarization. XL provides these implementations.
This patch adds the following overloads for doubleword vectors:
vec_add
vec_cmpeq
vec_cmpgt
vec_cmpge
vec_cmplt
vec_cmple
vec_sl
vec_sr
vec_sra
The file was modified clang/test/CodeGen/builtins-ppc-vsx.c
The file was modified clang/lib/Headers/altivec.h
Commit 9e026273b030d77b5429e31fd2d7ce3ca6b68cd8 by jay.foad
[AMDGPU] SIInsertHardClauses: move more stuff into the class. NFC.
The file was modified llvm/lib/Target/AMDGPU/SIInsertHardClauses.cpp
Commit 3026f75ed0f520f9be7ac354406687a549155ded by Raphael Isemann
[lldb][NFC] Make assert in TestStaticVariables more expressive
The file was modified lldb/test/API/lang/cpp/class_static/TestStaticVariables.py
Commit 3761b9a2345aff197707d23a68d4a178489f60e4 by thakis
Revert "[PowerPC] Provide some P8-specific altivec overloads for P7"

This reverts commit ed87f512bb9eb5c1d44e9a1182ffeaf23d6c5ae8.
Breaks check-clang, see e.g.
https://lab.llvm.org/buildbot/#/builders/139/builds/3818
The file was modified clang/test/CodeGen/builtins-ppc-vsx.c
The file was modified clang/lib/Headers/altivec.h
Commit bb113b984565b01355a7f6bb4a5fa2eb8284c2e1 by wei.huang
[AIX][TLS] Add support for TLSGD relocations to XCOFF objects

- Add branch absolute relocation R_RBA, R_TLS relocation for the variable offset
  for the tlsgd model and R_TLSM for the region handle for the tlsgd model
- Properly set the relocation fixed values for R_TLS and R_TLSM
- Emit the TCEntry with the variant kind in the XCOFFStreamer

Reviewed by: sfertile, nemanjai, DiggerLin

Differential Revision: https://reviews.llvm.org/D100214
The file was added llvm/test/CodeGen/PowerPC/aix-tls-xcoff-reloc.ll
The file was modified llvm/lib/Target/PowerPC/MCTargetDesc/PPCXCOFFObjectWriter.cpp
The file was modified llvm/lib/MC/XCOFFObjectWriter.cpp
The file was modified llvm/lib/Target/PowerPC/MCTargetDesc/PPCMCTargetDesc.cpp
The file was added llvm/test/CodeGen/PowerPC/aix-tls-xcoff-reloc-large.ll
Commit fe0e86e6026f79e0b18f877196fbddd1d9e140d8 by Louis Dionne
[libc++] Rewrite std::to_address to avoid relying on element_type

This is a rough reapplication of the change that fixed std::to_address
to avoid relying on element_type (da456167). It is somewhat different
because the fix to avoid breaking Clang (which caused it to be reverted
in 347f69c55) was a bit more involved.

Differential Revision: https://reviews.llvm.org/D101638
The file was modified libcxx/include/iterator
The file was modified libcxx/include/__memory/pointer_traits.h
The file was added libcxx/test/libcxx/utilities/memory/pointer.conversion/to_address_std_iterators.pass.cpp
The file was added libcxx/test/std/utilities/memory/pointer.conversion/to_address_std_iterators.pass.cpp
The file was added libcxx/test/libcxx/utilities/memory/pointer.conversion/to_address.pass.cpp
The file was modified libcxx/test/std/utilities/memory/pointer.conversion/to_address.pass.cpp
Commit e4b790c5e3653053819182a67c593bc65de860ac by david.spickett
[OpenMP] Temporarily require X86 target for parallel_for_codegen.cpp test

Since https://reviews.llvm.org/D101849 this test has been failing
on bots that only enable either Arm or AArch64 targets.

See: https://lab.llvm.org/buildbot/#/builders/107/builds/7601

Temporarily requires X86 for this test while the difference is figured out.
The file was modified clang/test/OpenMP/parallel_for_codegen.cpp
Commit 172d746e167b958e80dfae7c113bfb44b974f7c6 by Austin.Kerbow
[AMDGPU][NFC] Fix typos in SIFormMemoryClauses description

NFC.
The file was modified llvm/lib/Target/AMDGPU/SIFormMemoryClauses.cpp
Commit 1faf3b195e71dbc469d658d450949439dbf92f9f by nemanja.i.ibm
[PowerPC] Re-commit ed87f512bb9eb5c1d44e9a1182ffeaf23d6c5ae8

This was reverted in 3761b9a2345aff197707d23a68d4a178489f60e4 just
as I was about to commit the fix. This patch includes the
necessary fix.
The file was modified clang/test/CodeGen/builtins-ppc-vsx.c
The file was modified clang/test/CodeGen/builtins-ppc-p8vector.c
The file was modified clang/lib/Headers/altivec.h
Commit 0b303da6f821dcbcb3f72135b2431aaf94045839 by thomasraoux
[mlir][vector] add pattern to cast away lead unit dimension for broadcast op

Differential Revision: https://reviews.llvm.org/D101955
The file was modified mlir/test/Dialect/Vector/vector-transforms.mlir
The file was modified mlir/lib/Dialect/Vector/VectorTransforms.cpp
Commit 933551eaeb08d42d7891c8fbb67cb805e24f9727 by thomasraoux
[mlir][NFC] Fix warning in VectorTransforms.cpp
The file was modified mlir/lib/Dialect/Vector/VectorTransforms.cpp
Commit 7654d8e1a96cb9dda0318ff5489c17f3780f1944 by jezng
[lld-macho][nfc] Convert the mock libSystem.tbd to TBDv4

It doesn't seem like TBDv3 allows for specifying multiple platforms, so I'm
upgrading us to TBDv4. (We need to support multiple platforms in order to test
that we can handle zippered dylibs; that functionality will be added in an
upcoming diff.)

Differential Revision: https://reviews.llvm.org/D101953
The file was modified lld/test/MachO/Inputs/MacOSX.sdk/usr/lib/libSystem.tbd
Commit 9260760235261a5cd150b15a3499f7988da65a02 by jezng
[lld-macho] Support loading of zippered dylibs

ld64 can emit dylibs that support more than one platform (typically macOS and
macCatalyst). This diff allows LLD to read in those dylibs. Note that this is a
super bare-bones implementation -- in particular, I haven't added support for
LLD to emit those multi-platform dylibs, nor have I added a variety of
validation checks that ld64 does. Until we have a use-case for emitting zippered
dylibs, I think this is good enough.

Fixes PR49597.

Reviewed By: #lld-macho, oontvoo

Differential Revision: https://reviews.llvm.org/D101954
The file was modified lld/MachO/InputFiles.h
The file was modified lld/MachO/InputFiles.cpp
The file was modified lld/test/MachO/Inputs/MacOSX.sdk/usr/lib/libSystem.tbd
The file was added lld/test/MachO/zippered.yaml
Commit 1b47489fd0e1c3ddbeabb421b668b7bc623fd622 by llvm-dev
[SLP] Use empty() instead of size() == 0. NFCI.
The file was modified llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
Commit 2dab05902112042eb4cc2cd16adfbe0a9127c0af by llvm-dev
[SLP] Constify the TreeEntry* input into dumpTreeCosts(). NFCI.
The file was modified llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
Commit 338c1b701f23888eed67ca7e3214db175940df21 by llvm-dev
[SLP] Constify the TreeEntry* input into getEntryCost() + setInsertPointAfterBundle(). NFCI.
The file was modified llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
Commit 28f1d018b1c241968d3f426d81c6973b5cae7bcf by Stanislav.Mekhanoshin
[AMDGPU] Fix 64 bit DPP validation

AMDGPUAsmParser::isSupportedDPPCtrl() was failing to correctly
find a DPP register operand; regardless of the position, it is
always src0. Moved this check into a new validateDPP() method
where we already have the full instruction. In particular, it was
failing to reject this case:

v_cvt_u32_f64 v5, v[0:1] quad_perm:[0,2,1,1] row_mask:0xf bank_mask:0xf

Essentially it was broken for any case where size of dst and
src0 differ.

It also improves the diagnostics with a proper error message.

The check in the InstPrinter also drops verification of the dst
register as it does not have anything to do with the dpp operand.

Differential Revision: https://reviews.llvm.org/D101930
The file was modified llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
The file was modified llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUInstPrinter.cpp
The file was modified llvm/test/MC/AMDGPU/gfx9-asm-err.s
The file was modified llvm/test/MC/AMDGPU/gfx90a_err.s
Commit 159dd447fe98f558879343d660b5bfe90779609f by davg
[clangd][ObjC] Highlight Objc Ivar refs

Treat them just like we do for properties - as a `property` semantic
token although ideally we could differentiate the two.

Differential Revision: https://reviews.llvm.org/D101785
The file was modified clang-tools-extra/clangd/FindTarget.cpp
The file was modified clang-tools-extra/clangd/unittests/SemanticHighlightingTests.cpp
The file was modified clang-tools-extra/clangd/unittests/FindTargetTests.cpp
Commit 2e0ee68dc85c0a2b7e65e489a60ab363393b06a8 by fraser
[LangRef][VP] Fix typos in VP sdiv/udiv examples
The file was modified llvm/docs/LangRef.rst
Commit 58323be415ce0713fdd959edeab788252118c533 by craig.topper
[RISCV] Cleanup instruction formats used for B extension ternary operations.

Rename RVInstR4 as used by F/D/Zfh extensions to RVInstR4Frm.
Introduce new RVInstR4 that takes funct3 as a parameter.

Add new format classes for FSRI and FSRIW instead of trying to
bend RVInstR4 to use a shamt overlaid on rs2 and funct2.
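
To illustrate why a dedicated format reads more clearly than an overlay, here is a minimal Python sketch of the standard RISC-V R4 field layout next to a hypothetical layout with a 6-bit shift amount in bits 25:20. The function names, the single fixed bit at position 26, and all field values used with them are assumptions for illustration; the sketch does not reproduce the TableGen classes in the patch or the actual FSRI/FSRIW encodings.

```python
def _field(value, width, name):
    """Check that `value` fits in `width` bits and return it unchanged."""
    assert 0 <= value < (1 << width), f"{name} does not fit in {width} bits"
    return value


def encode_r4(rs3, funct2, rs2, rs1, funct3, rd, opcode):
    """Standard R4 layout: rs3[31:27] funct2[26:25] rs2[24:20] rs1[19:15]
    funct3[14:12] rd[11:7] opcode[6:0]."""
    return (_field(rs3, 5, "rs3") << 27
            | _field(funct2, 2, "funct2") << 25
            | _field(rs2, 5, "rs2") << 20
            | _field(rs1, 5, "rs1") << 15
            | _field(funct3, 3, "funct3") << 12
            | _field(rd, 5, "rd") << 7
            | _field(opcode, 7, "opcode"))


def encode_r4_shamt6(rs3, funct1, shamt, rs1, funct3, rd, opcode):
    """Hypothetical dedicated layout: one fixed bit at 26 and a 6-bit shamt
    occupying bits 25:20, the space R4 splits between funct2 and rs2."""
    return (_field(rs3, 5, "rs3") << 27
            | _field(funct1, 1, "funct1") << 26
            | _field(shamt, 6, "shamt") << 20
            | _field(rs1, 5, "rs1") << 15
            | _field(funct3, 3, "funct3") << 12
            | _field(rd, 5, "rd") << 7
            | _field(opcode, 7, "opcode"))
```

Producing the same word through `encode_r4` means splitting the shift amount by hand, passing `(funct1 << 1) | (shamt >> 5)` as `funct2` and `shamt & 0x1F` as `rs2`, which is the kind of bending the dedicated format avoids.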

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D100427
The file was modified llvm/lib/Target/RISCV/RISCVInstrFormats.td
The file was modified llvm/lib/Target/RISCV/RISCVInstrInfoD.td
The file was modified llvm/lib/Target/RISCV/RISCVInstrInfoF.td
The file was modified llvm/lib/Target/RISCV/RISCVInstrInfoB.td
The file was modified llvm/lib/Target/RISCV/RISCVInstrInfoZfh.td
Commit 1c4cb510b4daccc0f4763958567affc2b442f317 by paulsson
[SystemZ] Don't use libcall for 128 bit shifts.

Expand 128 bit shifts instead of using a libcall.

This patch removes the 128 bit shift libcalls and thereby causes
ExpandShiftWithUnknownAmountBit() to be called.
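
For readers unfamiliar with what expanding a wide shift involves, here is a minimal Python sketch of a 128-bit left shift built from 64-bit operations. It shows the general double-word technique only; it is an assumption-laden illustration, not the node sequence that ExpandShiftWithUnknownAmountBit() produces, and the function name is hypothetical.

```python
MASK64 = (1 << 64) - 1


def shl128_expanded(lo, hi, amt):
    """Shift the 128-bit value (hi:lo) left by `amt` (0..127) using only
    64-bit pieces. Returns the new (lo, hi) pair."""
    assert 0 <= amt < 128
    if amt >= 64:
        # The whole low half moves into the high half; the low half becomes 0.
        return 0, (lo << (amt - 64)) & MASK64
    if amt == 0:
        # Handled separately so the 64 - amt shift below never reaches 64,
        # which would be out of range for a 64-bit hardware shift.
        return lo, hi
    new_lo = (lo << amt) & MASK64
    # Bits shifted out of the low half are carried into the high half.
    new_hi = ((hi << amt) | (lo >> (64 - amt))) & MASK64
    return new_lo, new_hi
```

The right shifts follow the same pattern with the carry moving from the high half to the low half, and the arithmetic variant fills the high half with copies of the sign bit.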

Review: Ulrich Weigand

Differential Revision: https://reviews.llvm.org/D101993
The file was modified llvm/test/CodeGen/SystemZ/shift-12.ll
The file was modified llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
Commit 1f109f9d9cddbc90d97b50c154a8474e7e623356 by zinenko
Fix array attribute in bindings for linalg.init_tensor

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D101998
The file was modified mlir/python/mlir/dialects/_linalg_ops_ext.py
The file was modified mlir/test/python/dialects/linalg/ops.py
Commit e2d774a3dbbbbff21531289889f6906b22f04cfe by hubert.reinterpretcast
[AIX][Test][ORC] Skip unsupported ORC C API tests on AIX

As mentioned before in D78813, currently the XCOFF backend does not
support writing 64-bit object files, which the ORC JIT tests will try to
exercise if we are on AIX. This patch disables the tests on AIX for now.
This is consistent with what's been done, for example, regarding
`armv7`.

Reviewed By: lhames

Differential Revision: https://reviews.llvm.org/D101971
The file was modified llvm/unittests/ExecutionEngine/Orc/OrcCAPITest.cpp