Commit
b2729fda60dbda595e7b5974279d8f860bce75ab
by nicolas.vasilache[mlir][Vector] Add a vblendps-based impl for transpose8x8 (both intrin and inline_asm)
This revision follows up on the conversation titled:
```[llvm-dev] Understanding and controlling some of the AVX shuffle emission paths```
The revision adds a vblendps-based implementation for transpose8x8 and further distinguishes between an intrinsics-based and an inline_asm implementation.
This results in roughly 17% fewer cycles (2415 vs. 2015, about a 1.2x speedup) as reported by llvm-mca:
After this revision (intrinsic version, resolves to virtually identical assembly as per the llvm-dev discussion, no vblendps instruction is emitted):
```
Iterations:        100
Instructions:      5900
Total Cycles:      2415
Total uOps:        7300

Dispatch Width:    6
uOps Per Cycle:    3.02
IPC:               2.44
Block RThroughput: 24.0

Cycles with backend pressure increase [ 89.90% ]
Throughput Bottlenecks:
  Resource Pressure       [ 89.65% ]
  - SKXPort1  [ 0.04% ]
  - SKXPort2  [ 12.42% ]
  - SKXPort3  [ 12.42% ]
  - SKXPort5  [ 89.52% ]
  Data Dependencies:      [ 37.06% ]
  - Register Dependencies [ 37.06% ]
  - Memory Dependencies   [ 0.00% ]
```
After this revision (inline_asm version, vblendps instructions are indeed emitted):
```
Iterations:        100
Instructions:      6300
Total Cycles:      2015
Total uOps:        7700

Dispatch Width:    6
uOps Per Cycle:    3.82
IPC:               3.13
Block RThroughput: 20.0

Cycles with backend pressure increase [ 83.47% ]
Throughput Bottlenecks:
  Resource Pressure       [ 83.18% ]
  - SKXPort0  [ 14.49% ]
  - SKXPort1  [ 14.54% ]
  - SKXPort2  [ 19.70% ]
  - SKXPort3  [ 19.70% ]
  - SKXPort5  [ 83.03% ]
  - SKXPort6  [ 14.49% ]
  Data Dependencies:      [ 39.75% ]
  - Register Dependencies [ 39.75% ]
  - Memory Dependencies   [ 0.00% ]
```
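The reports above suggest that the inline_asm version spreads work away from SKXPort5. One common way a blend-based transpose achieves this is to replace a pair of shuffles with one shuffle plus two blends. The following is a scalar Python model of that substitution, not the code in AVXTranspose.cpp; the helper names and the immediates (0x44, 0xEE, 0x4E, 0xCC, 0x33) come from a widely used AVX transpose formulation and are assumptions here, not taken from the revision.

```python
def shufps(a, b, imm):
    """Scalar model of the 256-bit shufps: selects within each 128-bit lane."""
    out = []
    for lane in (0, 4):  # two independent 4-element lanes
        out.append(a[lane + (imm & 3)])
        out.append(a[lane + ((imm >> 2) & 3)])
        out.append(b[lane + ((imm >> 4) & 3)])
        out.append(b[lane + ((imm >> 6) & 3)])
    return out


def blendps(a, b, imm):
    """Scalar model of the 256-bit blendps: bit i of imm picks b[i] over a[i]."""
    return [b[i] if (imm >> i) & 1 else a[i] for i in range(8)]


def two_shuffles(t0, t2):
    # Baseline: two shuffles produce the low and high halves of each lane pair.
    return shufps(t0, t2, 0x44), shufps(t0, t2, 0xEE)


def shuffle_plus_blends(t0, t2):
    # Alternative: one shuffle, then two blends that reuse its result.
    v = shufps(t0, t2, 0x4E)
    return blendps(t0, v, 0xCC), blendps(t2, v, 0x33)


t0 = [f"a{i}" for i in range(8)]
t2 = [f"b{i}" for i in range(8)]
# Both formulations yield the same pair of vectors.
assert two_shuffles(t0, t2) == shuffle_plus_blends(t0, t2)
```

The second form issues one extra instruction per pair, which is consistent with the higher instruction count (6300 vs. 5900) alongside the lower cycle count in the reports.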
An accessible copy of the conversation is available [here](https://gist.github.com/nicolasvasilache/68c7f34012584b0e00f335bcb374ede0).
Differential Revision: https://reviews.llvm.org/D114393
|
 | mlir/include/mlir/Dialect/X86Vector/Transforms.h |
 | mlir/lib/Dialect/X86Vector/Transforms/AVXTranspose.cpp |
 | mlir/test/Dialect/Vector/vector-transpose-lowering.mlir |
 | mlir/test/lib/Dialect/Vector/TestVectorTransforms.cpp |
 | utils/bazel/llvm-project-overlay/mlir/test/BUILD.bazel |
 | mlir/test/Integration/Dialect/LLVMIR/CPU/X86/test-inline-asm-vector.mlir |
 | mlir/test/lib/Dialect/Vector/CMakeLists.txt |
Commit
06d0d449d8555ae5f1ac33e8d4bb4ae40eb080d3
by martin[COFF] [ARM64] Create symbols with regular intervals for relocations against temporary symbols
For relocations against temporary symbols (that don't persist in the object file), we normally adjust them to reference the start of the section.
For adrp relocations, the immediate offset from the referenced symbol is stored in the opcode as the 21 bit signed immediate; this means that the target address must be within +/- 1 MB of the symbol that the relocation references.
Create label symbols at regular intervals (1 MB intervals). For relocations against temporary symbols, pick the nearest preceding offset symbol and make the relocation against that instead of against the start of the section.
This should fix the root issue behind https://bugs.llvm.org/show_bug.cgi?id=52378.
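As a rough illustration of the scheme described above, the sketch below maps an offset within a section to the preceding 1 MB offset symbol and the remaining addend. The function and constant names are hypothetical and do not reflect the identifiers used in WinCOFFObjectWriter.cpp.

```python
INTERVAL = 1 << 20  # 1 MB spacing between offset symbols (from the description)


def rebase_on_offset_symbol(section_offset):
    """Return (symbol_index, addend) for a target at section_offset.

    symbol_index counts the offset symbols from the start of the section
    (index 0 is the section start); addend is the distance from that
    symbol to the target, which always stays below 1 MB.
    """
    symbol_index = section_offset // INTERVAL
    addend = section_offset - symbol_index * INTERVAL
    return symbol_index, addend


index, addend = rebase_on_offset_symbol(0x250123)
# The addend fits in the positive half of a signed 21-bit immediate.
assert 0 <= addend < INTERVAL
```

Because the addend is always smaller than the interval, it fits in the immediate regardless of how large the section grows.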
Differential Revision: https://reviews.llvm.org/D114340
|
 | llvm/lib/MC/WinCOFFObjectWriter.cpp |
 | llvm/test/MC/AArch64/coff-relocations-offset.s |
Commit
7c15da67614eca9272553ecfe8c1a0f6f68c134b
by martin[LLD] [COFF] Interpret the immediate in ARM64 adr/adrp relocations as signed 21 bit
This matches how MS link.exe interprets this relocation.
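For illustration, reading the immediate as a signed 21 bit value amounts to a sign extension of the field extracted from the instruction word. The sketch below uses hypothetical helper names and is not the code in Chunks.cpp; the bit positions follow the AArch64 ADR/ADRP encoding (immlo in bits 30:29, immhi in bits 23:5).

```python
def sign_extend(value, bits=21):
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def adr_immediate(insn):
    """Extract the signed 21-bit immediate from an ADR/ADRP instruction word."""
    immlo = (insn >> 29) & 0x3
    immhi = (insn >> 5) & 0x7FFFF
    return sign_extend((immhi << 2) | immlo, 21)


# A field with all 21 bits set reads as -1 rather than 2097151.
assert sign_extend(0x1FFFFF) == -1
```

An unsigned reading of the same field would turn small negative offsets into offsets close to +2 MB, which is why the interpretation matters for relocations with negative addends.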
Differential Revision: https://reviews.llvm.org/D114347
|
 | lld/COFF/Chunks.cpp |
 | lld/test/COFF/arm64-relocs-imports.test |
Commit
4e5488afb27a64d12a76b770cc86bab8074e9c57
by martin[AArch64] [COFF] Move jump tables back to the readonly section
This essentially reverts f5884d255e78305d41c28c6e001a460ff83981d8 (D57277).
That commit was made as a workaround since LLVM back then didn't support cross-section relative relocations (IMAGE_REL_ARM64_REL32) in COFF for ARM64. Support for this was implemented later, in d5c5cf5ce8d921fc8c5e1b608c298a1ffa688d37 (D99572) and 382c505d9cfca8adaec47aea2da7bbcbc00fc05c (D102217).
The commit that moved jump tables to the function section noted that it would be ideal to utilize IMAGE_REL_ARM64_REL32.
Differential Revision: https://reviews.llvm.org/D113576
|
 | llvm/test/CodeGen/AArch64/win64-jumptable.ll |
 | llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp |
Commit
d703b922961e0d02a5effdd4bfbb23ad50a3cc9f
by martin[LLD] [COFF] Omit section symbols and IMAGE_SYM_CLASS_LABEL from the PE symbol table
The section symbols aren't of much practical use when looking at a linked image. This shrinks one observed mingw style unstripped binary by 14%.
IMAGE_SYM_CLASS_LABEL is in spirit the same as a temporary assembler label that isn't emitted on the object file level at all.
Differential Revision: https://reviews.llvm.org/D113866
|
 | lld/COFF/Writer.cpp |
 | lld/test/COFF/symtab.test |
 | lld/test/COFF/strtab-size.s |
Commit
dc79d73605305f9dfaa7eb777b6ed317363bdb04
by david.green[ARM] Add a test for showing the incorrect aliasing info around masked loads/stores. NFC
|
 | llvm/test/CodeGen/Thumb2/mve-masked-store-mmo.ll |
Commit
8ea3e70fb02e59ddfd6a050344c7d177b11104f7
by david.green[X86] Regenerate X86/vmaskmov-offset.ll check lines as per new mir format. NFC
|
 | llvm/test/CodeGen/X86/vmaskmov-offset.ll |