1. [ARM][MVE] Add intrinsics for immediate shifts. (reland) (details)
2. [ARM][LowOverheadLoops] Remove dead loop update instructions. (details)
3. [lldb][NFC] Cleanup includes in FormatManagerTests.cpp (details)
4. [Clang] Pragma vectorize_width() implies vectorize(enable) (details)
5. [PowerPC][NFC] add test case for lwa - loop ds form prep (details)
6. [AArch64][SVE] Implement intrinsics for non-temporal loads & stores (details)
Commit bd0f271c9e55ab69b45258e4922869099ed18307 by simon.tatham
[ARM][MVE] Add intrinsics for immediate shifts. (reland)
This adds the family of `vshlq_n` and `vshrq_n` ACLE intrinsics, which
shift every lane of a vector left or right by a compile-time immediate.
They mostly work by expanding to the IR `shl`, `lshr` and
`ashr` operations, with their second operand being a vector splat of the
immediate.
There's a fiddly special case, though. ACLE specifies that the immediate
in `vshrq_n` can take values up to and including the bit size of the
vector lane. But LLVM IR thinks that shifting right by the full size of
the lane is UB, and feels free to replace the `lshr` with an `undef`
halfway through the optimization pipeline. Hence, to keep this legal in
source code, I have to detect it at codegen time. Logical (unsigned)
right shifts by the element size are handled by simply emitting the zero
vector; arithmetic ones are converted into a shift of one bit less,
which will always give the same output.
In order to do that check, I also had to enhance the tablegen MveEmitter
so that it can cope with converting a builtin function's operand into a
bare integer to pass to a code-generating subfunction. Previously the
only bare integers it knew how to handle were flags generated from
within ``.
Reviewers: dmgreen, miyuki, MarkMurrayARM, ostannard
Reviewed By: dmgreen, MarkMurrayARM
Subscribers: echristo, hokein, rdhindsa, kristof.beyls, hiraditya,
cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision:
The file was modified clang/include/clang/Basic/ (diff)
The file was added clang/test/CodeGen/arm-mve-intrinsics/vector-shift-imm.c
The file was modified clang/lib/CodeGen/CGBuiltin.cpp (diff)
The file was modified clang/utils/TableGen/MveEmitter.cpp (diff)
The file was modified llvm/include/llvm/IR/ (diff)
The file was modified llvm/lib/Target/ARM/ (diff)
The file was modified clang/include/clang/Basic/ (diff)
The file was added llvm/test/CodeGen/Thumb2/mve-intrinsics/vector-shift-imm.ll
Commit d97cf1f88902026b6ebe7fb9d844a285c3b113c5 by sjoerd.meijer
[ARM][LowOverheadLoops] Remove dead loop update instructions.
After creating a low-overhead loop, the loop update instruction was
still lingering around, hurting performance. This removes dead loop
update instructions, which in our case are mostly SUBS instructions.
To support this, some helper functions were added to MachineLoopUtils
and ReachingDefAnalysis to analyse live-ins of loop exit blocks and find
uses before a particular loop instruction, respectively.
This is a first version that removes a SUBS instruction when there are
no other uses inside or outside the loop block, but there are some more
interesting cases in
test/CodeGen/Thumb2/LowOverheadLoops/mve-tail-data-types.ll which show
that there is room for improvement. For example, we can't handle this
case yet:
   dlstp.32  lr, r2
   mov r3, r2
   subs  r2, #4
   vldrh.u32 q2, [r1], #8
   vmov  q1, q0
   vmla.u32  q0, q2, r0
   letp  lr, .LBB0_1
@ %bb.2:
   vctp.32 r3
which is a lot trickier because r2 is not only used by the subs, but
also by the mov to r3, which is in turn used outside the low-overhead
loop by the vctp instruction. That requires a somewhat different
approach, and I will follow up on this.
Differential Revision:
The file was modified llvm/lib/CodeGen/ReachingDefAnalysis.cpp (diff)
The file was modified llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp (diff)
The file was added llvm/test/CodeGen/Thumb2/LowOverheadLoops/dont-remove-loop-update3.mir
The file was modified llvm/test/CodeGen/Thumb2/LowOverheadLoops/vector-arith-codegen.ll (diff)
The file was modified llvm/include/llvm/CodeGen/ReachingDefAnalysis.h (diff)
The file was modified llvm/lib/CodeGen/MachineLoopUtils.cpp (diff)
The file was modified llvm/include/llvm/CodeGen/MachineLoopUtils.h (diff)
The file was modified llvm/test/CodeGen/Thumb2/LowOverheadLoops/fast-fp-loops.ll (diff)
The file was added llvm/test/CodeGen/Thumb2/LowOverheadLoops/dont-remove-loop-update2.mir
The file was modified llvm/test/CodeGen/Thumb2/LowOverheadLoops/mve-tail-data-types.ll (diff)
The file was added llvm/test/CodeGen/Thumb2/LowOverheadLoops/dont-remove-loop-update.mir
Commit 987e7323fb53f968d5878483610fcf2319cdde86 by Raphael Isemann
[lldb][NFC] Cleanup includes in FormatManagerTests.cpp
The file was modified lldb/unittests/DataFormatter/FormatManagerTests.cpp (diff)
Commit 021685491727e023aeae9ca272a2d6cd727e20e4 by sjoerd.meijer
[Clang] Pragma vectorize_width() implies vectorize(enable)
Let's try this again; this has been reverted/recommited a few times.
Last time this got reverted because for this loop:
  void a() {
   #pragma clang loop vectorize(disable)
   for (;;)
     ;
  }
vectorisation was incorrectly enabled and the vectorize.enable metadata
was set due to a logic error. But with this fixed, we now imply
vectorisation when:
1) vectorisation is enabled, which means: VectorizeWidth > 1, and
2) the vectorize.enable metadata is not already present; we don't want
to add it when vectorisation is explicitly disabled or enabled,
otherwise we would be incorrectly setting it or duplicating the
metadata, respectively.
This should fix PR27643.
Differential Revision:
The file was modified clang/test/CodeGenCXX/pragma-loop.cpp (diff)
The file was modified clang/lib/CodeGen/CGLoopInfo.cpp (diff)
The file was added clang/test/CodeGenCXX/pragma-loop-pr27643.cpp
Commit bf4580b7e740a9deeba2608e4c2772181f33a67b by czhengsz
[PowerPC][NFC] add test case for lwa - loop ds form prep
The file was modified llvm/test/CodeGen/PowerPC/loop-instr-form-prepare.ll (diff)
Commit 3f5bf35f868d1e33cd02a5825d33ed4675be8cb1 by kerry.mclaughlin
[AArch64][SVE] Implement intrinsics for non-temporal loads & stores
Summary: Adds the following intrinsics:
- llvm.aarch64.sve.ldnt1
- llvm.aarch64.sve.stnt1
This patch creates masked loads and stores with the MONonTemporal flag
set when used with the intrinsics above.
Reviewers: sdesmalen, paulwalker-arm, dancgr, mgudim, efriedma, rengolin
Reviewed By: efriedma
Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl,
cfe-commits, llvm-commits
Tags: #llvm
Differential Revision:
The file was added llvm/test/CodeGen/AArch64/sve-intrinsics-stores.ll
The file was modified llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp (diff)
The file was modified llvm/lib/Target/AArch64/ (diff)
The file was added llvm/test/CodeGen/AArch64/sve-intrinsics-loads.ll
The file was modified llvm/lib/Target/AArch64/AArch64ISelLowering.cpp (diff)
The file was modified llvm/include/llvm/IR/ (diff)
The file was modified llvm/lib/Target/AArch64/ (diff)