Success

Changes

Summary

  1. [ARM,MVE] Add intrinsics for vector comparisons.
  2. [ARM,MVE] Add InstCombine rules for pred_i2v / pred_v2i.
Commit 4a4dd85e5ab51aa8c01c690cd14205af157178e7 by simon.tatham
[ARM,MVE] Add intrinsics for vector comparisons.
This adds the `vcmp` family of ACLE MVE intrinsics: vector/vector,
vector/scalar, and the predicated forms of both. All are represented
using standard existing IR: vector/scalar comparisons are represented by
making a vector out of the scalar first, and predicated forms are
represented by taking the bitwise AND of the input predicate and the
output of the comparison. Existing LLVM-side tests demonstrate that ISel
will pattern-match all of that back down to single MVE VCMPs.
The idiom of handling a vector/scalar operation by generating IR to
expand the scalar into a second vector is going to be needed for a lot
of MVE intrinsics, so to make that easy, I've provided a helper function
that automatically works out the element count.
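
The representation described above can be modelled in portable C. This is a rough sketch of the semantics only, not the clang codegen: the types `vec16x8` and `pred8` and the functions `splat`, `cmp_eq`, `cmp_eq_scalar` and `cmp_eq_scalar_pred` are hypothetical names invented for illustration, and the lane count is fixed at eight 16-bit lanes, whereas the real helper works it out from the vector type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LANES 8 /* assumed: eight 16-bit lanes in one 128-bit vector */

typedef struct { int16_t lane[LANES]; } vec16x8; /* stands in for <8 x i16> */
typedef struct { bool lane[LANES]; } pred8;      /* stands in for <8 x i1>  */

/* Hypothetical helper: expand a scalar into a vector with every lane equal. */
vec16x8 splat(int16_t s) {
    vec16x8 v;
    for (size_t i = 0; i < LANES; i++)
        v.lane[i] = s;
    return v;
}

/* Lane-wise equality, the analogue of a vector `icmp eq`. */
pred8 cmp_eq(vec16x8 a, vec16x8 b) {
    pred8 p;
    for (size_t i = 0; i < LANES; i++)
        p.lane[i] = (a.lane[i] == b.lane[i]);
    return p;
}

/* Vector/scalar form: splat the scalar, then reuse the vector/vector form. */
pred8 cmp_eq_scalar(vec16x8 a, int16_t s) {
    return cmp_eq(a, splat(s));
}

/* Predicated form: AND the input predicate with the comparison result. */
pred8 cmp_eq_scalar_pred(vec16x8 a, int16_t s, pred8 in) {
    pred8 out = cmp_eq_scalar(a, s);
    for (size_t i = 0; i < LANES; i++)
        out.lane[i] = out.lane[i] && in.lane[i];
    return out;
}

/* Small usage example: only lanes that match AND are active stay set. */
void demo_compare(void) {
    vec16x8 a = {{1, 2, 3, 2, 5, 2, 7, 2}};
    pred8 in = {{true, true, true, true, false, false, false, false}};
    pred8 out = cmp_eq_scalar_pred(a, 2, in);
    assert(out.lane[1] && out.lane[3]);   /* equal to 2 and active */
    assert(!out.lane[5] && !out.lane[7]); /* equal to 2 but inactive */
    assert(!out.lane[0]);                 /* active but not equal */
}
```

In this model the predicated and vector/scalar forms need no operation of their own, which mirrors the point that everything is built from standard existing IR.
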
The comparison intrinsics are the first ones that have to return a
predicate, in the user-facing `mve_pred16_t` format. This means we have
to use the `arm_mve_pred_v2i` low-level intrinsic to convert it back
from the logical `<n x i1>` form used in IR. I've done that explicitly
in the code gen specification for the builtins, because it happens much
more rarely in the ACLE API than passing a Predicate as input, so it
didn't seem worth automating in MveEmitter.
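
To make the two predicate formats concrete, here is a small C model of the conversion in each direction for 16-bit lanes. It is a sketch under stated assumptions, not the definition of the intrinsics: it assumes each bit of the 16-bit predicate covers one byte of the vector, so a 16-bit lane owns two adjacent bits, and it assumes the reverse direction reads the low bit of each pair. The names `lane_pred`, `model_v2i` and `model_i2v` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LANES 8         /* assumed: eight 16-bit lanes */
#define BITS_PER_LANE 2 /* assumed: one predicate bit per byte of a lane */

typedef struct { bool lane[LANES]; } lane_pred; /* stands in for <8 x i1> */

/* Model of the vector-to-integer direction: each true lane sets both of
   its bits in the 16-bit predicate word. */
uint16_t model_v2i(lane_pred p) {
    uint16_t bits = 0;
    for (int i = 0; i < LANES; i++)
        if (p.lane[i])
            bits = (uint16_t)(bits | (0x3u << (i * BITS_PER_LANE)));
    return bits;
}

/* Model of the integer-to-vector direction. Assumption: the low bit of
   each lane's pair decides whether the lane is active. */
lane_pred model_i2v(uint16_t bits) {
    lane_pred p;
    for (int i = 0; i < LANES; i++)
        p.lane[i] = ((bits >> (i * BITS_PER_LANE)) & 1u) != 0;
    return p;
}

/* Usage example: lanes 0, 3 and 7 true gives bits 0-1, 6-7 and 14-15. */
void demo_pred_format(void) {
    lane_pred p = {{true, false, false, true, false, false, false, true}};
    assert(model_v2i(p) == 0xC0C3);
}
```
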
Reviewers: ostannard, MarkMurrayARM, dmgreen
Reviewed By: dmgreen
Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D70297
The file was modified clang/include/clang/Basic/arm_mve.td
The file was modified clang/lib/CodeGen/CGBuiltin.cpp
The file was added clang/test/CodeGen/arm-mve-intrinsics/compare.c
The file was modified clang/include/clang/Basic/arm_mve_defs.td
Commit f4f77aa53e5b872bd8a93c3a193714d8eba9578c by simon.tatham
[ARM,MVE] Add InstCombine rules for pred_i2v / pred_v2i.
If you're writing C code using the ACLE MVE intrinsics that passes the
result of a vcmp as input to a predicated intrinsic, e.g.
  mve_pred16_t pred = vcmpeqq(v1, v2);
  v_out = vaddq_m(v_inactive, v3, v4, pred);
then clang's codegen for the compare intrinsic will create calls to
`@llvm.arm.mve.pred.v2i` to convert the output of `icmp` into an
`mve_pred16_t` integer representation, and then the next intrinsic will
call `@llvm.arm.mve.pred.i2v` to convert it straight back again. This
will be visible in the generated code as a `vmrs`/`vmsr` pair that moves
the predicate value pointlessly out of `p0` and back into it again.
To prevent that, I've added InstCombine rules to remove round trips of
the form `v2i(i2v(x))` and `i2v(v2i(x))`. Also I've taught InstCombine
about the known and demanded bits of those intrinsics. As a result, you
now get just the generated code you wanted:
  vpt.u16 eq, q1, q2
  vaddt.u16 q0, q3, q4
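
The round-trip removal can be pictured with a toy peephole over a tiny expression tree. This is an illustrative model only and not the code added to InstCombineCalls.cpp: the `node` type, the `op_kind` enum and `fold_round_trip` are hypothetical, and the sketch ignores the result types, known bits and demanded bits that a real rule has to consider.

```c
#include <assert.h>
#include <stddef.h>

/* Toy IR: a leaf value, or one of the two conversions applied to an operand. */
typedef enum { OP_VALUE, OP_V2I, OP_I2V } op_kind;

typedef struct node {
    op_kind kind;
    const struct node *arg; /* operand; NULL for OP_VALUE leaves */
    const char *name;       /* label for leaves, e.g. "icmp" */
} node;

/* Strip conversion pairs that cancel out: v2i(i2v(x)) and i2v(v2i(x))
   both reduce to x. Repeats so that nested round trips are removed too. */
const node *fold_round_trip(const node *n) {
    while ((n->kind == OP_V2I && n->arg->kind == OP_I2V) ||
           (n->kind == OP_I2V && n->arg->kind == OP_V2I))
        n = n->arg->arg;
    return n;
}

/* Usage example mirroring the C snippet above: the compare produces
   v2i(icmp), the predicated add consumes i2v of that, and the fold
   hands the add the original comparison result directly. */
void demo_fold(void) {
    node cmp = {OP_VALUE, NULL, "icmp"};
    node to_int = {OP_V2I, &cmp, NULL};
    node to_vec = {OP_I2V, &to_int, NULL};
    assert(fold_round_trip(&to_vec) == &cmp);
}
```

A lone conversion with no matching partner is left untouched by this fold, so a predicate that really is used as an integer is still converted.
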
Reviewers: ostannard, MarkMurrayARM, dmgreen
Reviewed By: dmgreen
Subscribers: kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70313
The file was added llvm/test/Transforms/InstCombine/ARM/mve-v2i2v.ll
The file was added llvm/test/CodeGen/Thumb2/mve-vpt-from-intrinsics.ll
The file was modified llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp