Changes

Summary

  1. [clang-format] Fix C# nullable-related errors
  2. [clang-format] Rename common types between C#/JS
  3. [SVE][LoopVectorize] Add support for scalable vectorization of first-order recurrences
  4. [SystemZ] Support builtin_frame_address with packed stack without backchain.
  5. [AMDGPU] Regenerate shift tests. NFCI.
  6. [AMDGPU] Regenerate fp2int tests. NFCI.
  7. [mlir] Add support for ops with regions in 'gpu-async-region' rewriter.
  8. [LLD] Improve --strip-all help text
  9. [LV] Account for tripcount when calculation vectorization profitability
  10. [ORC] Silence unused variable warnings in Release builds. NFC.
  11. Revert "[ARM] Transforming memcpy to Tail predicated Loop"
  12. [AMDGPU] Fix WQM failure with single block inactive demote
  13. [amdgpu-arch] Fix rpath to run from build dir
  14. [OpenCL] Remove subgroups pragma in enqueue kernel and pipe builtins.
Commit ec725b307f3fdc5656459047bab6e69669d9534f by marek.kurdej+llvm.org
[clang-format] Fix C# nullable-related errors

This fixes two errors:

Previously, clang-format was splitting up type identifiers from the
nullable ?. This changes this behavior so that the type name sticks with
the operator.

Additionally, nullable operators attached to return types in interface
functions were not parsed correctly. Digging deeper, it looks like
interface bodies were being parsed differently than classes and structs,
causing MustBeDeclaration to be incorrect for interface members. They
now share the same logic.

One other change is reintroducing the CSharpNullable type independent of
JsTypeOptionalQuestion. Despite having a similar semantic purpose, their
actual syntax differs quite a bit.

Reviewed By: MyDeveloperDay, curdeius

Differential Revision: https://reviews.llvm.org/D101860
The file was modified clang/lib/Format/UnwrappedLineParser.cpp
The file was modified clang/lib/Format/FormatToken.h
The file was modified clang/lib/Format/TokenAnnotator.cpp
The file was modified clang/lib/Format/UnwrappedLineParser.h
The file was modified clang/unittests/Format/FormatTestCSharp.cpp
Commit cdf33962d9768fbd8d6b193aff463a21eaa984f3 by marek.kurdej+llvm.org
[clang-format] Rename common types between C#/JS

Reviewed By: curdeius

Differential Revision: https://reviews.llvm.org/D101862
The file was modified clang/lib/Format/FormatTokenLexer.cpp
The file was modified clang/lib/Format/FormatToken.h
The file was modified clang/lib/Format/UnwrappedLineParser.cpp
The file was modified clang/lib/Format/TokenAnnotator.cpp
Commit 8c9742bd239af602ee2743baa3c4281f24d45df1 by kerry.mclaughlin
[SVE][LoopVectorize] Add support for scalable vectorization of first-order recurrences

Adds support for scalable vectorization of loops containing first-order recurrences, e.g.:
```
for(int i = 0; i < n; i++)
  b[i] = a[i] + a[i - 1];
```
This patch changes fixFirstOrderRecurrence for scalable vectors to take vscale into
account when inserting into and extracting from the last lane of a vector.
CreateVectorSplice has been added to construct a vector for the recurrence, which
returns a splice intrinsic for scalable types. For fixed-width vectors the behaviour
remains unchanged, as CreateVectorSplice will return a shufflevector instead.
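
As a rough illustration of the lane selection described above (not the code from this patch), the sketch below models how the recurrence vector for one vector iteration can be formed: the last lane of the previous iteration's vector followed by all but the last lane of the current one. The helper name spliceLastLane is hypothetical, and plain std::vector stands in for a vector register.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not an LLVM API. Models the lanes a first-order
// recurrence needs: the last lane of `prev` followed by the first
// (VF - 1) lanes of `cur`, where VF is the common vector length.
std::vector<int> spliceLastLane(const std::vector<int> &prev,
                                const std::vector<int> &cur) {
  // Both inputs must be non-empty and have the same number of lanes.
  assert(!prev.empty() && prev.size() == cur.size());

  std::vector<int> out;
  out.reserve(cur.size());
  out.push_back(prev.back());                        // a[i - 1] for lane 0
  out.insert(out.end(), cur.begin(), cur.end() - 1); // a[i] .. a[i + VF - 2]
  return out;
}
```

With prev holding a[i-4..i-1] and cur holding a[i..i+3], the result holds a[i-1..i+2], which is the operand needed to compute b[i..i+3] in the loop above.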

The tests included here are the same as test/Transforms/LoopVectorize/first-order-recurrence.ll

Reviewed By: david-arm, fhahn

Differential Revision: https://reviews.llvm.org/D101076
The file was added llvm/test/Transforms/LoopVectorize/AArch64/first-order-recurrence.ll
The file was added llvm/test/Transforms/LoopVectorize/scalable-first-order-recurrence.ll
The file was modified llvm/include/llvm/IR/IRBuilder.h
The file was modified llvm/lib/IR/IRBuilder.cpp
The file was modified llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
Commit a0da66bc1330f9808ed9814aaa9c3c3d3244852d by paulsson
[SystemZ] Support builtin_frame_address with packed stack without backchain.

In order to use __builtin_frame_address(0) with packed stack and no
backchain, the address of where the backchain would have been written is
returned (like GCC).

This address may either contain a saved register or be unused.
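
For readers unfamiliar with the builtin, here is a minimal usage sketch. It only shows how __builtin_frame_address(0) is called from C++ with GCC or Clang; it does not demonstrate the SystemZ packed-stack behaviour itself, and the wrapper name currentFrameAddress is hypothetical.

```cpp
#include <cassert>

// Hypothetical wrapper, for illustration only. Returns the frame address of
// this function as reported by the compiler builtin. The noinline attribute
// keeps the function from being merged into its caller, so the value refers
// to this function's own frame.
__attribute__((noinline)) void *currentFrameAddress() {
  void *frame = __builtin_frame_address(0);
  // Level 0 always refers to the current frame, so a null result would be
  // unexpected here.
  assert(frame != nullptr);
  return frame;
}
```

What the returned address points at is target-specific; per the description above, with packed stack and no backchain on SystemZ it is the slot where the backchain would have been written.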

Review: Ulrich Weigand

Differential Revision: https://reviews.llvm.org/D101897
The file was modified llvm/test/CodeGen/SystemZ/frameaddr-02.ll
The file was modified llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
Commit 20e976e2487f5b52541772e6e92954ebf2dcf13e by llvm-dev
[AMDGPU] Regenerate shift tests. NFCI.
The file was modified llvm/test/CodeGen/AMDGPU/shl.ll
The file was modified llvm/test/CodeGen/AMDGPU/sra.ll
The file was modified llvm/test/CodeGen/AMDGPU/srl.ll
Commit 0fdce16efb281ab52e1aa5a7a760aebcb7a59163 by llvm-dev
[AMDGPU] Regenerate fp2int tests. NFCI.
The file was modified llvm/test/CodeGen/AMDGPU/fp_to_uint.ll
The file was modified llvm/test/CodeGen/AMDGPU/fp_to_sint.ll
Commit a0d019fc89c57736e54a476aa4db63027a2dace2 by csigg
[mlir] Add support for ops with regions in 'gpu-async-region' rewriter.

Reviewed By: herhut

Differential Revision: https://reviews.llvm.org/D101757
The file was modified mlir/lib/Dialect/GPU/Transforms/AsyncRegionRewriter.cpp
Commit 5dd9f44c17ec0d8b6b88bb015560b3c566622fdc by Ben.Dunbobbin
[LLD] Improve --strip-all help text

This is a slight improvement to the help text, as I was slightly
surprised when strip-all did more than remove the symbol table.

Currently, we match gold's help text for strip-all and strip-debug.
I think that the GNU documentation for these options is not particularly
clear. However, I have opted to make only a minor change here and keep
the help text similar to gold's as these are mature options that are
well understood.

ld.bfd (https://sourceware.org/binutils/docs/ld/Options.html) has a
similar implication, although it defines strip-debug as a subset of
strip-all. However, I felt that noting that strip-all implies strip-debug
is better because, with the ld.bfd approach, you have to read both the
--strip-debug and the --strip-all help text to understand the behaviour
of --strip-all (and the --strip-all help text doesn't indicate that the
--strip-debug help text is related).

Differential Revision: https://reviews.llvm.org/D101890
The file was modified lld/ELF/Options.td
The file was modified lld/docs/ld.lld.1
Commit 4979c90458628c9463815d81c637f8787f72fff0 by david.green
[LV] Account for tripcount when calculation vectorization profitability

The loop vectorizer will currently assume a large trip count when
calculating which of several vectorization factors is more profitable.
That is often not a terrible assumption to make, as small trip count
loops will usually have been fully unrolled. There are cases, however,
where we will try to vectorize them, and especially when folding the
tail by masking we can incorrectly choose to vectorize loops where it is
not beneficial, because the folded tail rounds the iteration count up for
the vectorized loop.

The motivating example here has a trip count of 5, so either performs 5
scalar iterations or 2 vector iterations (with VF=4). At a high enough
trip count the vectorization becomes profitable, but the rounding up to
2 vector iterations vs only 5 scalar makes it unprofitable.

This adds an alternative cost calculation when we know the max trip
count and are folding tail by masking, rounding the iteration count up
to the correct number for the vector width. We still do not account for
anything like setup cost or the mixture of vector and scalar loops, but
this is at least an improvement in a few cases that we have had
reported.
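
The arithmetic behind the motivating example can be sketched as follows. This is an illustrative model under stated assumptions, not the cost model in LoopVectorize.cpp: the function names and the per-iteration costs are hypothetical, and setup cost is ignored, as the description above notes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: number of vector iterations when the tail is folded
// by masking, i.e. the trip count rounded up to a multiple of VF, divided
// by VF.
std::uint64_t foldedTailIterations(std::uint64_t tripCount, std::uint64_t vf) {
  assert(vf > 0);
  return (tripCount + vf - 1) / vf;
}

// Hypothetical comparison: the vector loop is considered profitable only if
// its total cost (per-iteration cost times rounded-up iteration count) is
// lower than the total cost of running every scalar iteration.
bool vectorIsProfitable(std::uint64_t tripCount, std::uint64_t vf,
                        std::uint64_t scalarIterCost,
                        std::uint64_t vectorIterCost) {
  std::uint64_t vectorTotal = vectorIterCost * foldedTailIterations(tripCount, vf);
  std::uint64_t scalarTotal = scalarIterCost * tripCount;
  return vectorTotal < scalarTotal;
}
```

With a trip count of 5 and VF=4 the vector loop runs 2 iterations; assuming, purely for illustration, a scalar iteration cost of 1 and a vector iteration cost of 3, that is 6 against 5, so vectorization loses. At a trip count of 100 the same costs give 75 against 100, and it wins.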

Differential Revision: https://reviews.llvm.org/D101726
The file was modified llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
The file was added llvm/test/Transforms/LoopVectorize/ARM/mve-known-trip-count.ll
Commit 3d746962ed1831987c6a1ab54fe8f6cbb6477e0e by benny.kra
[ORC] Silence unused variable warnings in Release builds. NFC.
The file was modified llvm/unittests/ExecutionEngine/Orc/OrcCAPITest.cpp
Commit fc690777fce0bf50a8f424b05993b1e218713ae5 by malhar.jajoo
Revert "[ARM] Transforming memcpy to Tail predicated Loop"

Reverting commit since it causes failure (10462).
This reverts commit b856f4a232cbd43476e9b9f75c80aacfc6f5c152.
The file was modified llvm/lib/Target/ARM/ARMISelLowering.h
The file was modified llvm/lib/Target/ARM/ARMInstrMVE.td
The file was modified llvm/lib/Target/ARM/ARMTargetTransformInfo.h
The file was modified llvm/test/CodeGen/Thumb2/LowOverheadLoops/memcall.ll
The file was modified llvm/lib/Target/ARM/ARMSubtarget.h
The file was removed llvm/test/CodeGen/Thumb2/mve-tp-loop.ll
The file was modified llvm/lib/Target/ARM/ARMSelectionDAGInfo.cpp
The file was modified llvm/lib/Target/ARM/ARMISelLowering.cpp
The file was removed llvm/test/CodeGen/Thumb2/mve-tp-loop.mir
Commit 67cfefebbbbb3a5923c47c31293a8f76596de8be by carl.ritson
[AMDGPU] Fix WQM failure with single block inactive demote

The instruction test for inactive kill/demote needs to be based on the
actual opcode, not on whether the instruction would be lowered to demote.

Reviewed By: piotr

Differential Revision: https://reviews.llvm.org/D101966
The file was modified llvm/lib/Target/AMDGPU/SIWholeQuadMode.cpp
The file was modified llvm/test/CodeGen/AMDGPU/llvm.amdgcn.wqm.demote.ll
Commit b24e9f82b71f325214c41fdc3f106207cc2244a6 by jonathanchesterfield
[amdgpu-arch] Fix rpath to run from build dir

Prior to this, amdgpu-arch had RUNPATH set to $ORIGIN/../lib, which works
for some installs but not from the build directory, which is where clang
executes the tool when running tests.

This cmake option adds the location of the rocr runtime to the RUNPATH
(note, it amends RUNPATH here, despite the cmake option referring to RPATH)
to create a binary that runs from build or install location.

Before:
RUNPATH [$ORIGIN/../lib]
After:
RUNPATH [$ORIGIN/../lib:$HOME/llvm-install/lib]

Credit to Greg for knowing this trick and pointing to examples of it in use
for the aomp build scripts.

Reviewed By: pdhaliwal

Differential Revision: https://reviews.llvm.org/D101926
The file was modified clang/tools/amdgpu-arch/CMakeLists.txt
Commit c28a602329a78db5c02cc85679b5035aaf6753b4 by anastasia.stulova
[OpenCL] Remove subgroups pragma in enqueue kernel and pipe builtins.

This patch simplifies the parser and makes the language semantics
consistent. There is no extension pragma requirement in the spec
for the subgroup functions in enqueue kernel or pipes and all other
builtin functions are available without the pragma.

Differential Revision: https://reviews.llvm.org/D100984
The file was modified clang/lib/Sema/SemaChecking.cpp
The file was modified clang/test/SemaOpenCL/cl20-device-side-enqueue.cl