Changes

Summary

  1. clang-ve-ninja: Add hpce-ve-main, hpce-ve-staging workers
Commit 06987827915a936cb61dda7fc352c336f7f78ac6 by simon.moll
clang-ve-ninja: Add hpce-ve-main, hpce-ve-staging workers
The file was modified buildbot/osuosl/master/config/builders.py
The file was modified buildbot/osuosl/master/config/workers.py

Summary

  1. [mlir][Vector] Add a vblendps-based impl for transpose8x8 (both intrin and inline_asm)
  2. [COFF] [ARM64] Create symbols with regular intervals for relocations against temporary symbols
  3. [LLD] [COFF] Interpret the immediate in ARM64 adr/adrp relocations as signed 21 bit
  4. [AArch64] [COFF] Move jump tables back to the readonly section
  5. [LLD] [COFF] Omit section symbols and IMAGE_SYM_CLASS_LABEL from the PE symbol table
  6. [ARM] Add a test for showing the incorrect aliasing info around masked loads/stores. NFC
  7. [X86] Regenerate X86/vmaskmov-offset.ll check lines as per new mir format. NFC
Commit b2729fda60dbda595e7b5974279d8f860bce75ab by nicolas.vasilache
[mlir][Vector] Add a vblendps-based impl for transpose8x8 (both intrin and inline_asm)

This revision follows up on the conversation titled:

```[llvm-dev] Understanding and controlling some of the AVX shuffle emission paths```

The revision adds a vblendps-based implementation for transpose8x8 and further distinguishes between an intrinsics and an inline_asm implementation.

This results in roughly 17% fewer cycles (2415 vs. 2015 total cycles) as reported by llvm-mca:

After this revision (intrinsic version, resolves to virtually identical assembly as per the llvm-dev discussion, no vblendps instruction is emitted):
```
Iterations:        100
Instructions:      5900
Total Cycles:      2415
Total uOps:        7300

Dispatch Width:    6
uOps Per Cycle:    3.02
IPC:               2.44
Block RThroughput: 24.0

Cycles with backend pressure increase [ 89.90% ]
Throughput Bottlenecks:
  Resource Pressure       [ 89.65% ]
  - SKXPort1  [ 0.04% ]
  - SKXPort2  [ 12.42% ]
  - SKXPort3  [ 12.42% ]
  - SKXPort5  [ 89.52% ]
  Data Dependencies:      [ 37.06% ]
  - Register Dependencies [ 37.06% ]
  - Memory Dependencies   [ 0.00% ]
```

After this revision (inline_asm version, vblendps instructions are indeed emitted):
```
Iterations:        100
Instructions:      6300
Total Cycles:      2015
Total uOps:        7700

Dispatch Width:    6
uOps Per Cycle:    3.82
IPC:               3.13
Block RThroughput: 20.0

Cycles with backend pressure increase [ 83.47% ]
Throughput Bottlenecks:
  Resource Pressure       [ 83.18% ]
  - SKXPort0  [ 14.49% ]
  - SKXPort1  [ 14.54% ]
  - SKXPort2  [ 19.70% ]
  - SKXPort3  [ 19.70% ]
  - SKXPort5  [ 83.03% ]
  - SKXPort6  [ 14.49% ]
  Data Dependencies:      [ 39.75% ]
  - Register Dependencies [ 39.75% ]
  - Memory Dependencies   [ 0.00% ]
```

An accessible copy of the conversation is available [here](https://gist.github.com/nicolasvasilache/68c7f34012584b0e00f335bcb374ede0).

Differential Revision: https://reviews.llvm.org/D114393
The file was modified mlir/test/lib/Dialect/Vector/TestVectorTransforms.cpp
The file was modified mlir/test/Dialect/Vector/vector-transpose-lowering.mlir
The file was modified mlir/test/lib/Dialect/Vector/CMakeLists.txt
The file was modified mlir/include/mlir/Dialect/X86Vector/Transforms.h
The file was modified mlir/lib/Dialect/X86Vector/Transforms/AVXTranspose.cpp
The file was added mlir/test/Integration/Dialect/LLVMIR/CPU/X86/test-inline-asm-vector.mlir
The file was modified utils/bazel/llvm-project-overlay/mlir/test/BUILD.bazel
Commit 06d0d449d8555ae5f1ac33e8d4bb4ae40eb080d3 by martin
[COFF] [ARM64] Create symbols with regular intervals for relocations against temporary symbols

For relocations against temporary symbols (that don't persist in
the object file), we normally adjust them to reference the start of
the section.

For adrp relocations, the immediate offset from the referenced
symbol is stored in the opcode as the 21 bit signed immediate; this
means that the target address must be within +/- 1 MB of the
referenced symbol.

Create label symbols with regular intervals (1 MB intervals). For
relocations against temporary symbols, pick the preceding added
offset symbol and make the relocation against that instead of
against the start of the section.
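
The selection of the preceding offset symbol can be sketched as plain arithmetic. This is an illustrative sketch only, not the code in WinCOFFObjectWriter.cpp: the names `kOffsetLabelInterval`, `RebasedReloc` and `rebaseOnOffsetLabel` are hypothetical, and it assumes the label symbols sit at exact 1 MB multiples from the start of the section.

```cpp
#include <cstdint>

// Hypothetical constant: spacing between the added label symbols (1 MB).
constexpr uint64_t kOffsetLabelInterval = 1ull << 20;

// Hypothetical result type for this sketch.
struct RebasedReloc {
  uint64_t LabelIndex; // which label to reference (0 = start of section)
  uint64_t Addend;     // remaining offset from that label, always < 1 MB
};

// Given the offset of a temporary symbol within its section, pick the
// closest label at or before it and compute the residual offset that
// would be stored as the immediate.
constexpr RebasedReloc rebaseOnOffsetLabel(uint64_t OffsetInSection) {
  return RebasedReloc{OffsetInSection / kOffsetLabelInterval,
                      OffsetInSection % kOffsetLabelInterval};
}
```

For example, a temporary symbol at section offset 0x250000 would be expressed as label 2 plus 0x50000, which fits comfortably in the 21 bit signed immediate.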

This should fix the root issue behind
https://bugs.llvm.org/show_bug.cgi?id=52378.

Differential Revision: https://reviews.llvm.org/D114340
The file was added llvm/test/MC/AArch64/coff-relocations-offset.s
The file was modified llvm/lib/MC/WinCOFFObjectWriter.cpp
Commit 7c15da67614eca9272553ecfe8c1a0f6f68c134b by martin
[LLD] [COFF] Interpret the immediate in ARM64 adr/adrp relocations as signed 21 bit

This matches how MS link.exe interprets this relocation.
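
As a rough illustration of what "signed 21 bit" means here, the sketch below extracts the immediate from an adr/adrp instruction word (immlo in bits 29-30, immhi in bits 5-23) and sign-extends it. The helper names `extractAdrImm` and `signExtend21` are hypothetical and this is not the code in lld/COFF/Chunks.cpp; it only shows the bit manipulation.

```cpp
#include <cstdint>

// Hypothetical helper: reassemble the raw 21 bit immediate of an
// adr/adrp instruction from immhi (bits 5-23) and immlo (bits 29-30).
constexpr uint32_t extractAdrImm(uint32_t Insn) {
  return (((Insn >> 5) & 0x7FFFFu) << 2) | ((Insn >> 29) & 0x3u);
}

// Hypothetical helper: interpret the low 21 bits as a two's complement
// value, giving a range of -1048576 .. 1048575.
constexpr int64_t signExtend21(uint32_t Imm) {
  return static_cast<int64_t>((Imm & 0x1FFFFFu) ^ 0x100000u) - 0x100000;
}
```

With an unsigned reading, a raw immediate of 0x1FFFFF would mean an offset of 2097151; with the signed reading it means -1.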

Differential Revision: https://reviews.llvm.org/D114347
The file was modified lld/test/COFF/arm64-relocs-imports.test
The file was modified lld/COFF/Chunks.cpp
Commit 4e5488afb27a64d12a76b770cc86bab8074e9c57 by martin
[AArch64] [COFF] Move jump tables back to the readonly section

This essentially reverts f5884d255e78305d41c28c6e001a460ff83981d8
(D57277).

That commit was made as a workaround since LLVM back then didn't
support cross-section relative relocations (IMAGE_REL_ARM64_REL32)
in COFF for ARM64. Support for this was implemented later,
in d5c5cf5ce8d921fc8c5e1b608c298a1ffa688d37 (D99572) and
382c505d9cfca8adaec47aea2da7bbcbc00fc05c (D102217).

The commit that moved jump tables to the function section noted
that it would be ideal to utilize IMAGE_REL_ARM64_REL32.

Differential Revision: https://reviews.llvm.org/D113576
The file was modified llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp
The file was modified llvm/test/CodeGen/AArch64/win64-jumptable.ll
Commit d703b922961e0d02a5effdd4bfbb23ad50a3cc9f by martin
[LLD] [COFF] Omit section symbols and IMAGE_SYM_CLASS_LABEL from the PE symbol table

The section symbols aren't of much practical use when looking at
a linked image. This shrinks one observed mingw style unstripped
binary by 14%.

IMAGE_SYM_CLASS_LABEL is in spirit the same as a temporary assembler
label that isn't emitted on the object file level at all.

Differential Revision: https://reviews.llvm.org/D113866
The file was modified lld/COFF/Writer.cpp
The file was modified lld/test/COFF/strtab-size.s
The file was modified lld/test/COFF/symtab.test
Commit dc79d73605305f9dfaa7eb777b6ed317363bdb04 by david.green
[ARM] Add a test for showing the incorrect aliasing info around masked loads/stores. NFC
The file was added llvm/test/CodeGen/Thumb2/mve-masked-store-mmo.ll
Commit 8ea3e70fb02e59ddfd6a050344c7d177b11104f7 by david.green
[X86] Regenerate X86/vmaskmov-offset.ll check lines as per new mir format. NFC
The file was modified llvm/test/CodeGen/X86/vmaskmov-offset.ll
