Success

Changes

Summary

  1. [AMDGPU] Skip additional folding on the same operand.
  2. [ARM] Begin adding IR intrinsics for MVE instructions.
  3. [ARM] Add some sample IR MVE intrinsics with C++ isel.
  4. [ARM] Add IR intrinsics for MVE VLD[24] and VST[24].
  5. [clang] New __attribute__((__clang_arm_mve_alias)).
  6. [clang,ARM] Initial ACLE intrinsics for MVE.
  7. [InstCombine] Known-bits optimization for ARM MVE VADC.
  8. [NFC][XCOFF][AIX] Serialize object file writing for each CsectGroup
Commit b2a65f0d70f529ce52004934867461fa5329da63 by michael.hliao
[AMDGPU] Skip additional folding on the same operand.
Reviewers: rampitec, arsenm
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr,
t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69355
The file was modified llvm/test/CodeGen/AMDGPU/operand-folding.ll
The file was modified llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
The file was modified llvm/test/CodeGen/AMDGPU/fold-imm-copy.mir
Commit 1b45297e013e1edae5028d844ca7cb591c79b07d by simon.tatham
[ARM] Begin adding IR intrinsics for MVE instructions.
This commit, together with the next few, will add a representative
sample of the kind of IR intrinsics that we'll need in order to
implement the user-facing ACLE intrinsics for MVE. Supporting all of
them will take more work; the intention of this initial series of
commits is to implement an intrinsic or two from lots of different
categories, as examples and proofs of concept.
This initial commit introduces a small number of IR intrinsics for
instructions simple enough that they can use Tablegen ISel patterns: the
predicated versions of the VADD and VSUB instructions (both integer and
FP), VMINV and VMAXV, and the float->half VCVT instruction
(predicated and unpredicated).
When using VPT-predicated instructions in automatic code generation, it
will be convenient to specify the predicate value as a vector of the
appropriate number of i1. To make it easy to specify all sizes of an
instruction in one go and give each one the matching predicate vector
type, I've added a system of Tablegen informational records describing
MVE's vector types: each one gives the underlying LLVM IR ValueType
(which may not be the same if the MVE vector is of explicitly signed or
unsigned integers) and an appropriate vNi1 to use as the predicate
vector.
(Also, those info records include the usual encoding for the types, so
that as we add associations between each instruction encoding and one of
the new `MVEVectorVTInfo` records, we can remove some of the existing
template parameters and replace them with references to the vector type
info's fields.)
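As a rough picture of what such an info record carries, here is a small C++ analogue (not the Tablegen from this commit). The struct, the record names such as "v4s32", and the helper functions are hypothetical; only the idea that each record names an underlying IR ValueType and a matching vNi1 predicate type comes from the description above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical C++ analogue of one "vector type info" record.
struct VecTypeInfo {
  std::string Name;      // signedness-aware name (illustrative only)
  std::string ValueType; // underlying IR value type, signedness erased
  unsigned Lanes;        // number of lanes in the 128-bit vector
};

// Predicate vector type with one i1 per lane, e.g. 4 lanes -> "v4i1".
std::string predicateTypeFor(const VecTypeInfo &Info) {
  assert(Info.Lanes == 4 || Info.Lanes == 8 || Info.Lanes == 16);
  return "v" + std::to_string(Info.Lanes) + "i1";
}

// Illustrative table: signed and unsigned records share one IR value type.
std::vector<VecTypeInfo> exampleTable() {
  return {
      {"v16s8", "v16i8", 16}, {"v16u8", "v16i8", 16},
      {"v8s16", "v8i16", 8},  {"v8u16", "v8i16", 8},
      {"v4s32", "v4i32", 4},  {"v4u32", "v4i32", 4},
  };
}
```

Looking up a record gives both types at once, which is what lets one pattern definition cover every element size.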
The user-facing ACLE intrinsics will receive a predicate mask as a
16-bit integer, so I've also provided a pair of intrinsics i2v and v2i,
to convert between an integer and a vector of i1 by just changing the
register class.
Reviewers: dmgreen, miyuki, ostannard
Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67158
The file was modified llvm/lib/Target/ARM/ARMInstrMVE.td
The file was modified llvm/lib/Target/ARM/ARMISelLowering.cpp
The file was modified llvm/include/llvm/IR/IntrinsicsARM.td
The file was added llvm/test/CodeGen/Thumb2/mve-intrinsics/vcvt.ll
The file was added llvm/test/CodeGen/Thumb2/mve-intrinsics/vaddq.ll
The file was added llvm/test/CodeGen/Thumb2/mve-intrinsics/vminvq.ll
Commit ceeff95ca48f0c1460c8feb4eebced9a5cd12b58 by simon.tatham
[ARM] Add some sample IR MVE intrinsics with C++ isel.
This adds some initial example IR intrinsics for MVE instructions that
deliver multiple output values, and hence, have to be instruction-
selected by custom C++ code instead of Tablegen patterns.
I've added the writeback gather load instructions (taking a vector of
base addresses and a single common offset, returning a vector of loaded
values and an updated vector of base addresses); one example from the
long shift family (taking and returning a 64-bit value in two GPRs); and
the VADC instruction (which propagates a carry bit from each vector-lane
addition to the next, taking an input carry flag in FPSCR and outputting
the final one in FPSCR as well).
To support the VPT-predicated forms of these instructions, I've written
some helper functions to add the cluster of MVE predicate operands to
the end of a MachineInstr. `AddMVEPredicateToOps` is used when the
instruction actually is predicated (so it takes a predicate mask
argument), and `AddEmptyMVEPredicateToOps` is for when the instruction
is unpredicated (so it fills in $noreg for the mask). Each one comes in
a form suitable for `vpred_n`, and one for `vpred_r` which takes the
extra 'inactive' parameter.
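The four helper variants can be pictured with a toy model in which operands are plain strings. This is only a sketch of the shape described above, not the code in ARMISelDAGToDAG.cpp: the function names, the "vcc:then" / "vcc:none" tags for the condition-code operand, and passing 'inactive' as a string are all assumptions made for illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy operand list; the real code appends SelectionDAG values instead.
using OperandList = std::vector<std::string>;

// Predicated, vpred_n form: condition code plus the predicate mask.
void addPredicate(OperandList &Ops, const std::string &Mask) {
  Ops.push_back("vcc:then"); // assumed tag for "instruction is predicated"
  Ops.push_back(Mask);
}

// Predicated, vpred_r form: same cluster plus the 'inactive' operand.
void addPredicate(OperandList &Ops, const std::string &Mask,
                  const std::string &Inactive) {
  addPredicate(Ops, Mask);
  Ops.push_back(Inactive);
}

// Unpredicated, vpred_n form: $noreg stands in for the mask.
void addEmptyPredicate(OperandList &Ops) {
  Ops.push_back("vcc:none"); // assumed tag for "instruction is unpredicated"
  Ops.push_back("$noreg");
}

// Unpredicated, vpred_r form: still needs a placeholder 'inactive' operand.
void addEmptyPredicate(OperandList &Ops, const std::string &Inactive) {
  addEmptyPredicate(Ops);
  Ops.push_back(Inactive);
}
```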
For VADC, the representation of the carry flag in the IR intrinsic is a
word intended to be moved directly to and from `FPSCR_nzcvqc`, i.e. with
the carry flag in bit 29 of the word. (The user-facing ACLE intrinsic
will want it to be in bit 0, but I'll do that on the clang side.)
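To make the carry convention concrete, here is a minimal software model of a four-lane 32-bit VADC. It is a sketch for illustration, not the instruction's full specification: the names `vadcModel` and `VadcResult` are made up, and the model zeroes every flag bit other than the carry, which real hardware need not do.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr unsigned CarryBit = 29; // carry position in the flags word

struct VadcResult {
  std::array<uint32_t, 4> Sum;
  uint32_t Flags; // carry-out in bit 29; other bits are zero in this model
};

// Add lane by lane, feeding each lane's carry-out into the next lane.
VadcResult vadcModel(const std::array<uint32_t, 4> &A,
                     const std::array<uint32_t, 4> &B, uint32_t FlagsIn) {
  uint64_t Carry = (FlagsIn >> CarryBit) & 1u; // only bit 29 is read
  VadcResult R{};
  for (std::size_t I = 0; I < 4; ++I) {
    uint64_t Wide = uint64_t(A[I]) + uint64_t(B[I]) + Carry;
    R.Sum[I] = static_cast<uint32_t>(Wide); // low 32 bits are the lane sum
    Carry = Wide >> 32;                     // bit 32 is the carry-out
  }
  R.Flags = static_cast<uint32_t>(Carry) << CarryBit;
  return R;
}
```

Chaining two calls by passing the first result's `Flags` straight into the second call gives a 256-bit addition, which is the usage pattern the later InstCombine commit is concerned with.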
Reviewers: dmgreen, miyuki, ostannard
Subscribers: kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68699
The file was added llvm/test/CodeGen/Thumb2/mve-intrinsics/scalar-shifts.ll
The file was added llvm/test/CodeGen/Thumb2/mve-intrinsics/vldr.ll
The file was modified llvm/include/llvm/IR/IntrinsicsARM.td
The file was modified llvm/lib/Target/ARM/ARMISelDAGToDAG.cpp
The file was added llvm/test/CodeGen/Thumb2/mve-intrinsics/vadc.ll
Commit e0ef4ebe2f6ac3523ee25081b36c114c0f0ea695 by simon.tatham
[ARM] Add IR intrinsics for MVE VLD[24] and VST[24].
The VST2 and VST4 instructions take two or four vector registers as
input, and store part of each register to memory in an interleaved
pattern. They come in variants indicating which part of each register
they store (VST20 and VST21; VST40 to VST43 inclusive); the intention is
that issuing each of those variants in turn has the combined effect of
storing the whole set of registers to a memory block of equal
size. The corresponding VLD2 and VLD4 instructions load from memory in
the same interleaved format: each one overwrites only part of its output
register set, and again, the idea is that if you use VLD4{0,1,2,3} or
VLD2{0,1} together, you end up having written to the whole of each
register.
I've implemented the stores and loads quite differently. The loads were
easiest to implement as a single intrinsic that expands to all four
VLD4x instructions or both VLD2x, delivering four complete output
registers. (Implementing each individual load as a separate instruction
taking four input registers to partially overwrite is possible in
theory, but pointless, and when I tried it, I found it would need extra
work to get the register allocation not to be horrible.) Since that
intrinsic delivers multiple outputs, it has to be instruction-selected
in custom C++.
But the store instructions are easier to model individually, because
they don't overwrite any register at all and you can write a DAG Isel
pattern in Tablegen for each one.
Hence, my new intrinsic `int_arm_mve_vld4q` expands to four load
instructions, delivers four full output vectors, and is handled by C++
code, whereas `int_arm_mve_vst4q` expands to just one store instruction,
takes four input vectors and a constant indicating which lanes to store,
and is handled entirely in Tablegen. (And similarly for vld2q/vst2q.)
This is asymmetric, but it was the easiest way to do each one.
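The asymmetry can be sketched with a small software model for 32-bit lanes. This is an illustration only: `vld4Model` and `vst4Model` are hypothetical names, and the way the store model splits the work between its four stages (stage S writes lane S of every register) is an assumption chosen for simplicity, not the partition the real VST40 to VST43 instructions use. What the model does preserve is that one load call yields four complete registers, while four store calls are needed to fill the block.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Vec = std::array<uint32_t, 4>;    // one 128-bit register, 32-bit lanes
using Block = std::array<uint32_t, 16>; // memory block the size of four regs

// Load side: a single call de-interleaves the block into four registers.
std::array<Vec, 4> vld4Model(const Block &Mem) {
  std::array<Vec, 4> Out{};
  for (std::size_t Lane = 0; Lane < 4; ++Lane)
    for (std::size_t Reg = 0; Reg < 4; ++Reg)
      Out[Reg][Lane] = Mem[4 * Lane + Reg];
  return Out;
}

// Store side: one call models one store instruction and writes a quarter
// of the block. ASSUMPTION: stage S handles lane S of each register.
void vst4Model(Block &Mem, const std::array<Vec, 4> &Regs, unsigned Stage) {
  assert(Stage < 4);
  for (std::size_t Reg = 0; Reg < 4; ++Reg)
    Mem[4 * Stage + Reg] = Regs[Reg][Stage];
}
```

Calling `vst4Model` with stages 0 to 3 and then `vld4Model` on the same block returns the original four registers.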
Reviewers: dmgreen, miyuki, ostannard
Subscribers: kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68700
The file was modified llvm/lib/Target/ARM/ARMInstrMVE.td
The file was modified llvm/include/llvm/IR/IntrinsicsARM.td
The file was added llvm/test/CodeGen/Thumb2/mve-intrinsics/vld24.ll
The file was modified llvm/lib/Target/ARM/ARMISelDAGToDAG.cpp
Commit 7c11da0cfd3396150c13657a2217f2044f314734 by simon.tatham
[clang] New __attribute__((__clang_arm_mve_alias)).
This allows you to declare a function with a name of your choice (say
`foo`), but have clang treat it as if it were a builtin function (say
`__builtin_foo`), by writing
  static __inline__
  __attribute__((__clang_arm_mve_alias(__builtin_foo)))
  int foo(args);
I'm intending to use this for the ACLE intrinsics for MVE, which have to
be polymorphic on their argument types and also need to be implemented
by builtins. To avoid having to implement the polymorphism with several
layers of nested _Generic and make error reporting hideous, I want to
make all the user-facing intrinsics correspond directly to clang
builtins, so that after clang resolves
__attribute__((overloadable)) polymorphism it's already holding the
right BuiltinID for the intrinsic it selected.
However, this commit itself just introduces the new attribute, and
doesn't use it for anything.
To avoid unanticipated side effects if this attribute is used to make
aliases to other builtins, there's a restriction mechanism: only
(BuiltinID, alias) pairs that are approved by the function
ArmMveAliasValid() will be permitted. At present, that function doesn't
permit anything, because the Tablegen that will generate its list of
valid pairs isn't yet implemented. So the only test of this facility is
one that checks that an unapproved builtin _can't_ be aliased.
Reviewers: dmgreen, miyuki, ostannard
Subscribers: cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D67159
The file was modified clang/test/Misc/pragma-attribute-supported-attributes-list.test
The file was modified clang/lib/AST/Decl.cpp
The file was modified clang/lib/Sema/SemaDeclAttr.cpp
The file was added clang/test/Sema/arm-mve-alias-attribute.c
The file was modified clang/include/clang/Basic/AttrDocs.td
The file was modified clang/include/clang/Basic/DiagnosticSemaKinds.td
The file was modified clang/include/clang/Basic/Attr.td
Commit 08074cc96557dcb7ec91d7cd84c412414fa9a516 by simon.tatham
[clang,ARM] Initial ACLE intrinsics for MVE.
This commit sets up the infrastructure for auto-generating <arm_mve.h>
and doing clang-side code generation for the builtins it relies on, and
demonstrates that it works by implementing a representative sample of
the ACLE intrinsics, more or less matching the ones introduced in LLVM
IR by D67158, D68699 and D68700.
Like NEON, that header file will provide a set of vector types like
uint16x8_t and C functions with names like vaddq_u32(). Unlike NEON, the
ACLE spec for <arm_mve.h> includes a polymorphism system, so that you
can write plain vaddq() and disambiguate by the vector types you pass to
it.
Unlike the corresponding NEON code, I've arranged to make every user-
facing ACLE intrinsic into a clang builtin, and implement all the code
generation inside clang. So <arm_mve.h> itself contains nothing but
typedefs and function declarations, with the latter all using the new
`__attribute__((__clang_arm_mve_alias))` system to arrange that the
user-facing function names correspond to the right internal BuiltinIDs.
So the new MveEmitter tablegen system specifies the full sequence of
IRBuilder operations that each user-facing ACLE intrinsic should
translate into. Where possible, the ACLE intrinsics map to standard IR
operations such as vector-typed `add` and `fadd`; where no standard
representation exists, I call down to the sample IR intrinsics
introduced in an earlier commit.
Doing it like this means that you get the polymorphism for free just by
using __attribute__((overloadable)): the clang overload resolution
decides which function declaration is the relevant one, and _then_ its
BuiltinID is looked up, so by the time we're doing code generation,
that's all been resolved by the standard system. It also means that you
get really nice error messages if the user passes the wrong combination
of types: clang will show the declarations from the header file and
explain why each one doesn't match.
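The "resolve the overload first, then look up its BuiltinID" order can be mimicked in plain C++, whose built-in overloading stands in here for clang's `__attribute__((overloadable))` in C. Everything in this sketch is hypothetical: the vector structs, the `BuiltinID` enumerators and `selectedVaddq` are illustrative stand-ins, not the declarations that <arm_mve.h> generates.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for two MVE vector types.
struct U32x4 { std::array<uint32_t, 4> Lanes; };
struct F32x4 { std::array<float, 4> Lanes; };

// Hypothetical builtin identifiers, one per monomorphic intrinsic.
enum class BuiltinID { vaddq_u32, vaddq_f32 };

// One declaration per type combination. Overload resolution picks a
// declaration from the argument types, and the ID attached to that
// declaration is then already the right one for code generation.
constexpr BuiltinID selectedVaddq(const U32x4 &, const U32x4 &) {
  return BuiltinID::vaddq_u32;
}
constexpr BuiltinID selectedVaddq(const F32x4 &, const F32x4 &) {
  return BuiltinID::vaddq_f32;
}
```

A call such as `selectedVaddq(U32x4{}, F32x4{})` matches neither declaration, and the compiler's diagnostic lists both candidates with the reason each was rejected, which is the error-reporting behaviour described above.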
(The obvious alternative approach would be to have wrapper functions in
<arm_mve.h> which pass their arguments to the underlying builtins. But
that doesn't work in the case where one of the arguments has to be a
constant integer: the wrapper function can't pass the constantness
through. So you'd have to do that case using a macro instead, and then
use C11 `_Generic` to handle the polymorphism. Then you have to add
horrible workarounds because `_Generic` requires even the untaken
branches to type-check successfully, and _then_ if the user gets the
types wrong, the error message is totally unreadable!)
Reviewers: dmgreen, miyuki, ostannard
Subscribers: mgorny, javed.absar, kristof.beyls, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D67161
The file was added clang/test/CodeGen/arm-mve-intrinsics/scalar-shifts.c
The file was added clang/test/CodeGen/arm-mve-intrinsics/vld24.c
The file was added clang/test/CodeGen/arm-mve-intrinsics/vldr.c
The file was added clang/include/clang/Basic/arm_mve_defs.td
The file was modified clang/lib/CodeGen/CGBuiltin.cpp
The file was added clang/test/CodeGen/arm-mve-intrinsics/vminvq.c
The file was modified clang/lib/Sema/SemaChecking.cpp
The file was modified clang/lib/Sema/SemaType.cpp
The file was modified clang/include/clang/Sema/Sema.h
The file was added clang/test/CodeGen/arm-mve-intrinsics/vcvt.c
The file was added clang/include/clang/Basic/arm_mve.td
The file was modified clang/lib/Headers/CMakeLists.txt
The file was modified clang/utils/TableGen/CMakeLists.txt
The file was modified clang/include/clang/Basic/DiagnosticSemaKinds.td
The file was modified clang/utils/TableGen/TableGen.cpp
The file was added clang/test/CodeGen/arm-mve-intrinsics/vaddq.c
The file was added clang/utils/TableGen/MveEmitter.cpp
The file was modified clang/lib/CodeGen/CodeGenFunction.h
The file was modified clang/utils/TableGen/TableGenBackends.h
The file was modified clang/lib/Sema/SemaDeclAttr.cpp
The file was added clang/test/CodeGen/arm-mve-intrinsics/vadc.c
The file was modified clang/include/clang/Basic/BuiltinsARM.def
The file was modified clang/include/clang/Basic/CMakeLists.txt
Commit e5f485c3bd9c719b4d78524a5b18c1d2524b62bf by simon.tatham
[InstCombine] Known-bits optimization for ARM MVE VADC.
The MVE VADC instruction reads and writes the carry bit at bit 29 of the
FPSCR register. The corresponding ACLE intrinsic is specified to work
with an integer in which the carry bit is stored at bit 0. So if a user
writes a code sequence in C that passes the carry from one VADC to the
next, like this,
    s0 = vadcq_u32(a0, b0, &carry);
    s1 = vadcq_u32(a1, b1, &carry);
then clang will generate IR for each of those operations that shifts the
carry bit up into bit 29 before the VADC, and after it, shifts it back
down and masks off all but the low bit. But in this situation what you
really wanted was two consecutive VADC instructions, so that the second
one directly reads the value left in FPSCR by the first, without wasting
several instructions on pointlessly clearing the other flag bits in
between.
This commit explains to InstCombine that the other bits of the flags
operand don't matter, and adds a test that demonstrates that all the
code between the two VADC instructions can be optimized away as a
result.
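The reasoning behind the optimization can be checked with a few lines of C++. This is a sketch of the argument, not the InstCombine code: `carryRead` stands in for "the only bit VADC reads from its flags operand", and `roundTrip` for the shift-down, mask, shift-up sequence emitted between the two intrinsic calls. Both names are made up for the illustration.

```cpp
#include <cassert>
#include <cstdint>

constexpr unsigned FlagsCarryBit = 29; // carry position in the flags word

// The only information the instruction takes from its flags operand.
constexpr bool carryRead(uint32_t Flags) {
  return ((Flags >> FlagsCarryBit) & 1u) != 0;
}

// What the code between two vadcq calls does: move the carry down to
// bit 0 (the ACLE convention), mask it, then move it back up to bit 29.
constexpr uint32_t roundTrip(uint32_t Flags) {
  uint32_t Carry = (Flags >> FlagsCarryBit) & 1u;
  return Carry << FlagsCarryBit;
}
```

Because `carryRead(roundTrip(X))` equals `carryRead(X)` for every `X`, the second instruction behaves the same whether or not the round trip is performed, so the round trip can be deleted.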
Reviewers: dmgreen, miyuki, ostannard
Subscribers: kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67162
The file was modified llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
The file was added llvm/test/CodeGen/Thumb2/mve-intrinsics/vadc-multiple.ll
Commit 78207e1f234eede120beaf730dfb7bb9d4e00a1b by jasonliu
[NFC][XCOFF][AIX] Serialize object file writing for each CsectGroup
Summary:
Right now we handle each CsectGroup (ProgramCodeCsects, BSSCsects)
individually when assigning indices, writing the symbol table, and writing
section raw data. However, these steps follow the same pattern, so we
can factor them out and apply them to every CsectGroup. This will
make adding new CsectGroups (read-write data, read-only data, TC/TOC,
mergeable string) easier and less error-prone.
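The shape of that refactoring can be sketched as follows. This is a simplified illustration rather than the code in XCOFFObjectWriter.cpp: the `Csect` and `Writer` types, the `groups()` helper and the one-index-per-csect numbering are assumptions; only the two group names come from the summary above.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Simplified stand-ins for the writer's data structures.
struct Csect {
  std::string Name;
  uint32_t SymbolIndex;
};
using CsectGroup = std::vector<Csect>;

struct Writer {
  CsectGroup ProgramCodeCsects;
  CsectGroup BSSCsects;

  // A new group only needs a member and an entry in this list.
  std::array<CsectGroup *, 2> groups() {
    return {{&ProgramCodeCsects, &BSSCsects}};
  }

  // One loop serves every group instead of one copy of the code per group.
  uint32_t assignIndices() {
    uint32_t Next = 0;
    for (CsectGroup *Group : groups())
      for (Csect &C : *Group)
        C.SymbolIndex = Next++;
    return Next; // total number of indices handed out
  }
};
```

The same `groups()` list could drive the symbol-table and raw-data passes, so each pass stays a single loop as groups are added.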
Reviewed by: sfertile, daltenty, DiggerLin
Approved by: daltenty
Differential Revision: https://reviews.llvm.org/D69112
The file was modified llvm/lib/MC/XCOFFObjectWriter.cpp