Changes

Summary

  1. [DSE] Move isOverwrite into DSEState. NFC
  2. [GVN] Clobber partially aliased loads.
  3. New tag for ittapi - fix an error related to cross-compiling ITTAPI in LLVM with mingw
  4. [llvm][AsmPrinter] Restore source location to register clobber warning
  5. [AMDGPU][AsmParser/Disassembler] Correct A16 and G16 handling
  6. [AMDGPU] Fix codegen of image intrinsics for g16 and a16
  7. [docs] Added llvm/cmake section
  8. [NFC][X86][MCA] AMD Zen 3: add same-reg AVX XMM VXORPS tests
  9. [X86] AMD Zen 3: same-reg AVX XMM VXORPS is a zero-cycle(!) dep-breaking zero-idiom
  10. [NFC][X86][MCA] AMD Zen 3: add same-reg AVX YMM VXORPS tests
  11. [X86] AMD Zen 3: same-reg AVX YMM VXORPS is a zero-cycle(!) dep-breaking zero-idiom
  12. [NFC][X86][MCA] AMD Zen 3: add same-reg SSE XMM XORPD tests
  13. [NFC][X86][MCA] AMD Zen 3: add same-reg AVX XMM VXORPD tests
  14. [NFC][X86][MCA] AMD Zen 3: add same-reg AVX YMM VXORPD tests
  15. [X86] AMD Zen 3: same-reg SSE XMM XORPD is a 1-cycle(!) dep-breaking zero-idiom
  16. [X86] AMD Zen 3: same-reg AVX XMM VXORPD is a zero-cycle(!) dep-breaking zero-idiom
  17. [X86] AMD Zen 3: same-reg AVX YMM VXORPD is a zero-cycle(!) dep-breaking zero-idiom
  18. [libcxx] [test] Change the generic_string_alloc test to test conversions to all char types
  19. [llvm-mc][AArch64] HINT instruction disassembled as BTI
  20. [AMDGPU] getMemOperandsWithOffset: add vaddr operand for stack access BUF instructions
  21. NFCI: Remove VF argument from isScalarWithPredication
  22. AArch64: support i128 cmpxchg in GlobalISel.
  23. [Test] Add test on missing opportunity in Loop Deletion
Commit f7cb654763ec353da56702f4f34ddc3570fb709a by david.green
[DSE] Move isOverwrite into DSEState. NFC

This moves the isOverwrite function into the DSEState so that it can
share the analyses and members from the state.

A few extra loop tests were also added to test stores in and around
multi-block loops for D100464.
The file was modified llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
The file was modified llvm/test/Transforms/DeadStoreElimination/multiblock-loops.ll
Commit fdae3fc8b3e9570073e7b6ae803195195fbf8bc2 by daniil.fukalov
[GVN] Clobber partially aliased loads.

Use offsets stored in `AliasResult` implemented in D98718.
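Roughly, knowing the offset of a partial alias lets an earlier access (for example, a wider load of the same memory) supply the value of a later load when the loaded bytes fall entirely inside the earlier access. The sketch below shows only that containment test. `loadCoveredBy` is a hypothetical name, not LLVM's API, and the sign convention (offset measured from the start of the earlier access to the start of the later load) is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not LLVM's API. Assumes loadOffset is the byte
// distance from the start of the earlier (clobbering) access to the start
// of the later load, as reported for a partial alias.
bool loadCoveredBy(int64_t loadOffset, uint64_t loadSize,
                   uint64_t earlierSize) {
  assert(loadSize > 0 && earlierSize > 0);
  // A load that starts before the earlier access cannot be fed from it.
  if (loadOffset < 0)
    return false;
  // Every loaded byte must lie within [0, earlierSize) of the earlier access.
  return static_cast<uint64_t>(loadOffset) + loadSize <= earlierSize;
}
```

For example, a 4-byte load at offset 4 inside an 8-byte access is covered, while the same load at offset 6 straddles the end and is not.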

Updated with a fix for the issue reported in https://reviews.llvm.org/D95543#2745161

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D95543
The file was modified llvm/test/Transforms/GVN/PRE/rle.ll
The file was modified llvm/lib/Transforms/Scalar/GVN.cpp
The file was modified llvm/lib/Analysis/MemoryDependenceAnalysis.cpp
The file was modified llvm/include/llvm/Analysis/MemoryDependenceAnalysis.h
Commit 444f02d73c6d21bf9fe990ee922d995a33ef8de5 by alexey.bader
New tag for ittapi - fix an error related to cross-compiling ITTAPI in LLVM with mingw

A fix was implemented in the ittapi repo to solve an error when cross-compiling ITTAPI in LLVM with mingw.
The problem occurred in the cross-compilation environment for Julia's dependencies.
The corresponding issue in the ittapi repo: https://github.com/intel/ittapi/issues/19
A new tag was created in the ittapi repo for that fix.

This patch contains changes to update the ittapi tag in LLVM.

Reviewed By: bader

Differential Revision: https://reviews.llvm.org/D102471
The file was modified llvm/lib/ExecutionEngine/IntelJITEvents/CMakeLists.txt
Commit 2db090a2ebd76f120bfae4fbe4b7241667aa585e by david.spickett
[llvm][AsmPrinter] Restore source location to register clobber warning

Since 5de2d189e6ad466a1f0616195e8c524a4eb3cbc0 this particular warning
hasn't had the location of the source file containing the inline
assembly.

Fix this by reporting via LLVMContext. This means that we no longer
have the "instantiated into assembly here" lines, but they would only
have pointed to the start of the inline asm string anyway.

This message is already tested via IR in LLVM. However, the required
location info isn't available there, so I've added a C file test in
clang to cover it (though strictly, this is testing LLVM code).

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D102244
The file was modified llvm/lib/CodeGen/AsmPrinter/AsmPrinterInlineAsm.cpp
The file was added clang/test/Misc/inline-asm-clobber-warning.c
Commit 72d570ca085c809edd70d355cad7129092afbf90 by david.stuttard
[AMDGPU][AsmParser/Disassembler] Correct A16 and G16 handling

A16 support for image instruction assembly/disassembly (gfx10) was missing.

Also refactor the MIMG op address size calculations into a common function.

We had 3 places where the same operation was being done.

One test is now marked XFAIL until a related codegen patch is in place.

Differential Revision: https://reviews.llvm.org/D102231

Change-Id: I7e86e730ef8c71901457855cba570581f4f576bb
The file was modified llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
The file was modified llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
The file was modified llvm/lib/Target/AMDGPU/MIMGInstructions.td
The file was modified llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h
The file was modified llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
The file was modified llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
The file was modified llvm/test/MC/AMDGPU/gfx10_asm_mimg.s
The file was modified llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.sample.a16.dim.ll
The file was added llvm/test/MC/Disassembler/AMDGPU/mimg_gfx10.txt
Commit 31b62aa162b464efadef942a801706127dd8a443 by david.stuttard
[AMDGPU] Fix codegen of image intrinsics for g16 and a16

For gfx10 gradient (g16) and address (a16) can be independent. Previous
implementation assumed that a16 implied g16.
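To make the independence of the two flags concrete, here is a simplified, hypothetical model of how many address dwords the gradient and coordinate operands of a sample instruction occupy. It is not the actual AMDGPU helper: it ignores extra operands such as bias, LOD or compare values, and the packing rule (16-bit values stored two per dword, with each gradient direction packed separately) is an assumption made for illustration.

```cpp
#include <cassert>

// Hypothetical, simplified model; not LLVM's AMDGPU code.
// dim: number of spatial dimensions (1, 2 or 3).
unsigned gradientDwords(unsigned dim, bool g16) {
  assert(dim >= 1 && dim <= 3);
  unsigned numGradients = 2 * dim; // d/dh and d/dv for each coordinate
  if (!g16)
    return numGradients;           // one 32-bit gradient per dword
  // 16-bit gradients: each direction (h, v) is packed in pairs,
  // so each direction needs ceil(dim / 2) dwords.
  return 2 * ((dim + 1) / 2);
}

unsigned coordDwords(unsigned numCoords, bool a16) {
  // 16-bit coordinates are packed two per dword.
  return a16 ? (numCoords + 1) / 2 : numCoords;
}

// g16 and a16 are evaluated independently: a16 does not imply g16.
unsigned addrDwords(unsigned dim, unsigned numCoords, bool g16, bool a16) {
  return gradientDwords(dim, g16) + coordDwords(numCoords, a16);
}
```

Under this model a 2D sample with 32-bit gradients and 16-bit coordinates needs 5 dwords, whereas assuming that a16 implies g16 would wrongly give 3.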

There are some other changes that fix the verification (as well as asm/disasm)
that are required for the included test to pass - the XFAIL will be removed in
those changes.

This also includes required fixes for GlobalISel.

Differential Revision: https://reviews.llvm.org/D102066

Change-Id: I7d171cc90994de05f41669b66a6d0ffa2ed05d09
The file was modified llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
The file was modified llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.sample.a16.dim.ll
The file was modified llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
The file was added llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.sample.g16.a16.dim.ll
The file was modified llvm/lib/Target/AMDGPU/SIISelLowering.cpp
Commit 4763c8c9e3c7ef2946fa575d5c6bf8dd7fb88639 by shivam98.tkg
[docs] Added llvm/cmake section

Added information about the llvm/cmake directory.

Reviewed By: xgupta, jroelofs

Differential Revision: https://reviews.llvm.org/D101925
The file was modified llvm/docs/GettingStarted.rst
Commit a9fb321a67943b9fffac6ff2d56ad5acb458b4f4 by lebedev.ri
[NFC][X86][MCA] AMD Zen 3: add same-reg AVX XMM VXORPS tests
The file was added llvm/test/tools/llvm-mca/X86/Znver3/zero-idioms-avx-xmm.s
Commit 26c1bffe675747d90513033f773c0aef63172608 by lebedev.ri
[X86] AMD Zen 3: same-reg AVX XMM VXORPS is a zero-cycle(!) dep-breaking zero-idiom

Unlike its legacy SSE XMM XORPS version, which measures as 1-cycle,
this one is certainly a zero-cycle instruction; both of them are
dependency-breaking.

This is confirmed by exegesis measurements and the reference docs.
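The idea behind this commit and the related Zen 3 commits in this list can be sketched as follows: `x ^ x` is zero for any `x`, so a same-register XOR never needs the register's previous value and can break the dependency chain. The latencies encoded below are the ones these commit messages report (legacy SSE forms 1 cycle, AVX XMM and YMM forms 0 cycles). The `Inst` record and both functions are hypothetical illustrations, not LLVM's MCInst or the TableGen scheduling model in X86ScheduleZnver3.td.

```cpp
#include <cassert>
#include <string>

// Hypothetical instruction record, not LLVM's MCInst.
struct Inst {
  std::string opcode; // e.g. "XORPS", "VXORPD"
  unsigned src1;      // register number of the first source
  unsigned src2;      // register number of the second source
};

// x ^ x == 0 for any x, so a same-register XOR yields zero without reading
// the register's old value: a dependency-breaking zero idiom.
bool isXorZeroIdiom(const Inst &I) {
  static const char *const XorOps[] = {"XORPS", "XORPD", "VXORPS", "VXORPD"};
  for (const char *Op : XorOps)
    if (I.opcode == Op)
      return I.src1 == I.src2;
  return false;
}

// Zen 3 latency as reported in these commits: legacy SSE forms take one
// cycle, VEX-encoded AVX forms (XMM and YMM) take zero cycles.
int zen3ZeroIdiomLatency(const Inst &I) {
  assert(isXorZeroIdiom(I));
  return I.opcode[0] == 'V' ? 0 : 1;
}
```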
The file was modified llvm/test/tools/llvm-mca/X86/Znver3/zero-idioms-avx-xmm.s
The file was modified llvm/lib/Target/X86/X86ScheduleZnver3.td
Commit 2a7c52ff7f8345cbf9956ddbe289326bdde0589b by lebedev.ri
[NFC][X86][MCA] AMD Zen 3: add same-reg AVX YMM VXORPS tests
The file was added llvm/test/tools/llvm-mca/X86/Znver3/zero-idioms-avx-ymm.s
Commit 59554c01ab7e3f6a9316bbac83544f10f742fa44 by lebedev.ri
[X86] AMD Zen 3: same-reg AVX YMM VXORPS is a zero-cycle(!) dep-breaking zero-idiom

This is confirmed by exegesis measurements and the reference docs.
The file was modified llvm/test/tools/llvm-mca/X86/Znver3/zero-idioms-avx-ymm.s
The file was modified llvm/lib/Target/X86/X86ScheduleZnver3.td
Commit fdc65e46b618acdb06d2bc59e57325b0112c3f71 by lebedev.ri
[NFC][X86][MCA] AMD Zen 3: add same-reg SSE XMM XORPD tests
The file was modified llvm/test/tools/llvm-mca/X86/Znver3/zero-idioms-sse-xmm.s
Commit 57eee56d0a9783e5fae7030bf732ffeadf1180e6 by lebedev.ri
[NFC][X86][MCA] AMD Zen 3: add same-reg AVX XMM VXORPD tests
The file was modified llvm/test/tools/llvm-mca/X86/Znver3/zero-idioms-avx-xmm.s
Commit 3567c7eda1fce98dd33341002c2062a2338761df by lebedev.ri
[NFC][X86][MCA] AMD Zen 3: add same-reg AVX YMM VXORPD tests
The file was modified llvm/test/tools/llvm-mca/X86/Znver3/zero-idioms-avx-ymm.s
Commit 9c596bc5416a16247c4dfb4a564d93c6cb97fb9f by lebedev.ri
[X86] AMD Zen 3: same-reg SSE XMM XORPD is a 1-cycle(!) dep-breaking zero-idiom

Same as its single-precision counterpart XORPS, and unlike their AVX versions.
This is confirmed by exegesis measurements and the reference docs.
The file was modified llvm/lib/Target/X86/X86ScheduleZnver3.td
The file was modified llvm/test/tools/llvm-mca/X86/Znver3/zero-idioms-sse-xmm.s
Commit 336b9dbe88c1f44e16bf98113e821b4eddcb0d0d by lebedev.ri
[X86] AMD Zen 3: same-reg AVX XMM VXORPD is a zero-cycle(!) dep-breaking zero-idiom

This is confirmed by exegesis measurements and the reference docs.
The file was modified llvm/lib/Target/X86/X86ScheduleZnver3.td
The file was modified llvm/test/tools/llvm-mca/X86/Znver3/zero-idioms-avx-xmm.s
Commit 43a7f130a7440f2c5eaa30faf69b90e2a9c571a0 by lebedev.ri
[X86] AMD Zen 3: same-reg AVX YMM VXORPD is a zero-cycle(!) dep-breaking zero-idiom

This is confirmed by exegesis measurements and the reference docs.
The file was modified llvm/test/tools/llvm-mca/X86/Znver3/zero-idioms-avx-ymm.s
The file was modified llvm/lib/Target/X86/X86ScheduleZnver3.td
Commit c12c8124e14217779eb5b8d3a2a92a6469a799e7 by martin
[libcxx] [test] Change the generic_string_alloc test to test conversions to all char types

On Windows, the native path char type is wchar_t; therefore, this test
didn't actually do the conversion that the test was supposed to exercise.

The charset conversions on Windows do cause extra allocations outside of
the provided allocator though, so that part of the test has to be waived
now that the test actually does something. (Other tests have similar
TEST_NOT_WIN32() guards on allocation checks for charset conversions.)

Also fix a typo, and amend the path.native.obs/string_alloc test to
test char8_t, too.

Differential Revision: https://reviews.llvm.org/D102360
The file was modified libcxx/test/std/input.output/filesystems/class.path/path.member/path.generic.obs/generic_string_alloc.pass.cpp
The file was modified libcxx/test/std/input.output/filesystems/class.path/path.member/path.native.obs/string_alloc.pass.cpp
Commit 10798709713a9b5d4ff8d8f5961b3c2fdb81d887 by alexandros.lamprineas
[llvm-mc][AArch64] HINT instruction disassembled as BTI

The Arm Architecture Reference Manual says that the SystemHintOp_BTI
opcode is preferred when CRm:op2 matches 0100:xx0, but llvm-mc
currently accepts 0100:xxx, which is incorrect.
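The corrected decoding rule can be sketched as a bit test on the 7-bit HINT immediate, which is CRm:op2 (CRm in bits [6:3], op2 in bits [2:0]). The BTI target names selected by op2<2:1> follow the Arm ARM. Both functions are hypothetical illustrations, not LLVM's AArch64InstPrinter code, and they only model the BTI decision: other named hints such as NOP are not handled.

```cpp
#include <cassert>
#include <string>

// HINT #imm encodes imm as CRm:op2, i.e. imm = (CRm << 3) | op2.
bool isBTIHint(unsigned crm, unsigned op2) {
  assert(crm < 16 && op2 < 8);
  // BTI is preferred only for CRm == 0b0100 and op2 == 0bxx0 (low bit clear).
  return crm == 0b0100 && (op2 & 1) == 0;
}

// Preferred disassembly for HINT #imm with respect to BTI only; any
// non-BTI value falls back to the generic "hint #imm" form here.
std::string disassembleHint(unsigned imm) {
  assert(imm < 128);
  unsigned crm = imm >> 3, op2 = imm & 7;
  if (!isBTIHint(crm, op2))
    return "hint #" + std::to_string(imm);
  // op2<2:1> selects the BTI target: none, c, j or jc.
  static const char *const Targets[] = {"bti", "bti c", "bti j", "bti jc"};
  return Targets[op2 >> 1];
}
```

With this rule HINT #34 prints as `bti c`, while HINT #33 and #39, which the old 0100:xxx pattern would also have matched, stay as plain hints.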

Differential Revision: https://reviews.llvm.org/D102415
The file was modified llvm/lib/Target/AArch64/AArch64SystemOperands.td
The file was modified llvm/lib/Target/AArch64/AArch64InstrFormats.td
The file was modified llvm/lib/Target/AArch64/MCTargetDesc/AArch64InstPrinter.cpp
The file was modified llvm/test/MC/Disassembler/AArch64/armv8.5a-bti.txt
The file was modified llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp
Commit 7f81c5a5bae8329379b226d4efff6a44dfdeece9 by jay.foad
[AMDGPU] getMemOperandsWithOffset: add vaddr operand for stack access BUF instructions

A consequence is that checkInstOffsetsDoNotOverlap can now distinguish
sp+offset from fp+offset, so it knows that it shouldn't try to work out
whether the accesses overlap just by comparing the offsets. For example
in these two instructions:

MIR:
BUFFER_STORE_DWORD_OFFSET %0:vgpr_32(s32), $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 4, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable store 4 into stack + 4, addrspace 5)
%4:vgpr_32 = BUFFER_LOAD_DWORD_OFFEN %stack.0.alloca, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 4 from `i8 addrspace(5)* undef`, addrspace 5)

ISA:
buffer_store_dword v0, off, s[0:3], s32 offset:4
buffer_load_dword v0, off, s[0:3], s34

Differential Revision: https://reviews.llvm.org/D73957
The file was modified llvm/test/CodeGen/AMDGPU/llvm.amdgcn.tbuffer.load.ll
The file was modified llvm/test/CodeGen/AMDGPU/salu-to-valu.ll
The file was modified llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
The file was modified llvm/test/CodeGen/AMDGPU/llvm.amdgcn.struct.tbuffer.load.ll
The file was modified llvm/test/CodeGen/AMDGPU/amdpal-callable.ll
The file was modified llvm/test/CodeGen/AMDGPU/gfx-callable-argument-types.ll
The file was modified llvm/test/CodeGen/AMDGPU/call-argument-types.ll
The file was modified llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.tbuffer.load.ll
Commit 459c48e04f25a40a81e9e11ccb9c17a88dc39999 by sander.desmalen
NFCI: Remove VF argument from isScalarWithPredication

As discussed in D102437, the VF argument to isScalarWithPredication
seems redundant, so this is intended to be a non-functional change. It
seems wrong to query the widening decision at this point. Removing the
operand and code to get the widening decision causes no unit/regression
tests to fail. I've also found no issues running the LLVM test-suite.

This subsequently removes the VF argument from isPredicatedInst as well,
since it is no longer required.
The file was modified llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
Commit 4789fc75d3501f14cfbd5b102f173721d498ff58 by Tim Northover
AArch64: support i128 cmpxchg in GlobalISel.

There are three essentially different cases to handle:

  * -O1, no LSE. The IR is expanded to ldxp/stxp and we need patterns to select
    them.
  * -O0, no LSE. We get G_ATOMIC_CMPXCHG, and need to produce CMP_SWAP_N
    pseudos. The registers are all 64-bit so this is easy.
  * LSE. We get G_ATOMIC_CMPXCHG and need to produce a CASP instruction with
    XSeqPair registers.

The last case is by far the hardest, and adds 128-bit GPR support as a
byproduct.
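The three cases amount to a small decision on optimization level and LSE availability, sketched below. The enum and `chooseLowering` are hypothetical names for illustration only. The message cites -O1; treating every nonzero level the same way, and using CASP whenever LSE is available regardless of level, are assumptions.

```cpp
#include <cassert>

// The three strategies from the commit message (names are hypothetical).
enum class Cmpxchg128Lowering {
  LdxpStxpLoop,  // IR expanded to ldxp/stxp; patterns select them
  CmpSwapPseudo, // G_ATOMIC_CMPXCHG selected to a CMP_SWAP_N pseudo
  Casp           // G_ATOMIC_CMPXCHG selected to CASP with XSeqPair regs
};

Cmpxchg128Lowering chooseLowering(unsigned optLevel, bool hasLSE) {
  assert(optLevel <= 3);
  if (hasLSE)
    return Cmpxchg128Lowering::Casp;
  return optLevel == 0 ? Cmpxchg128Lowering::CmpSwapPseudo
                       : Cmpxchg128Lowering::LdxpStxpLoop;
}
```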
The file was modified llvm/lib/Target/AArch64/GISel/AArch64RegisterBankInfo.h
The file was modified llvm/lib/Target/AArch64/AArch64GenRegisterBankInfo.def
The file was modified llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.h
The file was modified llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
The file was modified llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp
The file was added llvm/test/CodeGen/AArch64/GlobalISel/arm64-atomic-128.ll
The file was added llvm/test/CodeGen/AArch64/GlobalISel/legalize-cmpxchg-128.mir
The file was modified llvm/test/CodeGen/AArch64/GlobalISel/regbank-extract.mir
The file was modified llvm/lib/Target/AArch64/GISel/AArch64RegisterBankInfo.cpp
The file was modified llvm/lib/Target/AArch64/AArch64InstrGISel.td
The file was modified llvm/lib/Target/AArch64/AArch64RegisterBanks.td
The file was modified llvm/test/CodeGen/AArch64/GlobalISel/legalizer-info-validation.mir
Commit e51ef7f0706ada40d33819d1b3bdca2351e4fb4e by mkazantsev
[Test] Add test on missing opportunity in Loop Deletion

We can break the backedge in some cases when we can evaluate some of the
values and conditions on the first iteration.
The file was added llvm/test/Transforms/LoopDeletion/eval_first_iteration.ll