Changes

Summary

  1. [lld-macho] Implement cstring deduplication
  2. [lld-macho] Implement -force_load_swift_libs
  3. BPF: fix relocation types in lib/Object/RelocationResolver.cpp
  4. [RISCV] Use AVL Operand instead of GPR for tied mask pseudo for vwadd.wv and similar.
  5. [RISCV] Masked compares should use a tail agnostic policy.
  6. [RISCV] Use 0 for Log2SEW for vle1/vse1 intrinsics to enable vsetvli optimization.
Commit 04259cde15a9fad31421e9161f160ee23f84ebdd by jezng
[lld-macho] Implement cstring deduplication

Our implementation draws heavily from LLD-ELF's, which in turn delegates
its string deduplication to llvm-mc's StringTableBuilder. The messiness of
this diff is largely due to the fact that we've previously assumed that
all InputSections get concatenated together to form the output. This is
no longer true with CStringInputSections, which split their contents into
StringPieces. StringPieces are much more lightweight than InputSections,
which is important as we create a lot of them. They may also overlap in
the output, which makes it possible for strings to be tail-merged. In
fact, the initial version of this diff implemented tail merging, but
I've dropped it for reasons I'll explain later.
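
For context, here is a minimal sketch of the StringTableBuilder API that
the deduplication is delegated to. This illustrates the library class,
not lld-macho's actual integration; the kind and alignment choices below
are assumptions for the example.

```cpp
#include "llvm/MC/StringTableBuilder.h"
#include "llvm/Support/raw_ostream.h"

int main() {
  // RAW: a raw blob with no size prefix. The 16-byte alignment mirrors
  // the choice discussed below for x86_64 cstrings.
  llvm::StringTableBuilder builder(llvm::StringTableBuilder::RAW,
                                   llvm::Align(16));
  builder.add("hello");
  builder.add("world");
  builder.add("hello"); // duplicate
  builder.finalize();   // lays out the table, deduplicating as it goes
  // Both "hello" entries resolve to the same output offset.
  llvm::outs() << builder.getOffset("hello") << " "
               << builder.getOffset("world") << "\n";
}
```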

**Alignment Issues**

Mergeable cstring literals are found under the `__TEXT,__cstring`
section. In contrast to ELF, which puts strings that need different
alignments into different sections, clang's Mach-O backend puts them all
in one section. Strings that need to be aligned have the `.p2align`
directive emitted before them, which simply translates into zero padding
in the object file.

I *think* ld64 extracts the desired per-string alignment from this data
by preserving each string's offset from the last section-aligned
address. I'm not entirely certain, since it doesn't seem consistent about
doing this, but perhaps that can be chalked up to cases where ld64 has
to deduplicate strings with different offset/alignment combos -- it
seems to pick one of their alignments to preserve. This doesn't seem
correct in general; we can in fact induce ld64 to produce a crashing
binary just by linking in an additional object file that only contains
cstrings and no code. See PR50563 for details.
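
To make the offset-derived alignment concrete, here is a sketch of my
reading of that scheme (an illustration, not ld64's actual code): the
most alignment a string can be credited with is the largest power of
two dividing its offset from the last section-aligned address.

```cpp
#include <algorithm>
#include <cstdint>

// E.g. a string at offset 0x28 (= 8 * 5) can only be assumed to need
// 8-byte alignment, the largest power of two dividing its offset.
uint64_t inferAlignment(uint64_t offsetInSection, uint64_t sectionAlign) {
  if (offsetInSection == 0)
    return sectionAlign; // section-aligned address: assume full alignment
  // Isolate the lowest set bit, i.e. the largest power-of-two divisor.
  uint64_t offsetAlign = offsetInSection & (~offsetInSection + 1);
  return std::min(offsetAlign, sectionAlign);
}
```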

Moreover, this scheme seems rather inefficient: since unaligned and
aligned strings are all put in the same section, which has a single
alignment value, it doesn't seem possible to tell whether a given string
doesn't have any alignment requirements. Preserving offset+alignment
pairs for strings that don't need them is wasteful.

In practice, the crashes seen so far seem to stem from x86_64 SIMD
operations on cstrings; many x86_64 SIMD instructions require their
memory operands to be 16-byte-aligned. So for now, I'm thinking of just
aligning all strings
to 16 bytes on x86_64. This is indeed wasteful, but implementation-wise
it's simpler than preserving per-string alignment+offsets. It also
avoids the aforementioned crash after deduplication of
differently-aligned strings. Finally, the overhead is not huge: using
16-byte alignment (vs no alignment) is only a 0.5% size overhead when
linking chromium_framework.

With these alignment requirements, it doesn't make sense to attempt tail
merging -- most strings will not be eligible since their overlaps aren't
likely to start at a 16-byte boundary. Tail-merging (with alignment) for
chromium_framework only improves size by 0.3%.
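
A small illustrative check (hypothetical helper, not code from this
diff) of why that is:

```cpp
#include <cstdint>
#include <string_view>

// "bar" is a suffix of "foobar", so the two strings could share storage,
// but only if the shared bytes start on the required (16-byte) boundary.
bool canTailMerge(std::string_view big, std::string_view small,
                  uint64_t bigOffset, uint64_t align) {
  if (small.size() > big.size() ||
      big.substr(big.size() - small.size()) != small)
    return false; // not a suffix, so no overlap is possible
  uint64_t overlapOffset = bigOffset + (big.size() - small.size());
  return overlapOffset % align == 0; // rarely true when align == 16
}
```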

It's worth noting that LLD-ELF only does tail merging at `-O2`. By
default (at `-O1`), it just deduplicates w/o tail merging. @thakis has
also mentioned that they saw it regress compressed size in some cases
and therefore turned it off. `ld64` does not seem to do tail merging at
all.

**Performance Numbers**

CString deduplication reduces chromium_framework from 250MB to 242MB, or
about a 3.2% reduction.

Numbers for linking chromium_framework on my 3.2 GHz 16-Core Intel Xeon W:

      N           Min           Max        Median           Avg        Stddev
  x  20          3.91          4.03         3.935          3.95   0.034641016
  +  20          3.99          4.14         4.015        4.0365     0.0492336
  Difference at 95.0% confidence
          0.0865 +/- 0.027245
          2.18987% +/- 0.689746%
          (Student's t, pooled s = 0.0425673)

As expected, cstring merging incurs some non-trivial overhead.

When passing `--no-literal-merge`, it seems that performance is the
same, i.e. the refactoring in this diff didn't cost us.

      N           Min           Max        Median           Avg        Stddev
  x  20          3.91          4.03         3.935          3.95   0.034641016
  +  20          3.89          4.02         3.935        3.9435   0.043197831
  No difference proven at 95.0% confidence

Reviewed By: #lld-macho, gkm

Differential Revision: https://reviews.llvm.org/D102964
The file was modified lld/MachO/SyntheticSections.cpp
The file was modified lld/MachO/Writer.cpp
The file was modified lld/test/MachO/invalid/reserved-section-name.s
The file was modified lld/MachO/InputSection.cpp
The file was modified lld/MachO/ConcatOutputSection.cpp
The file was modified lld/MachO/Config.h
The file was modified lld/MachO/Driver.cpp
The file was modified lld/MachO/InputFiles.cpp
The file was modified lld/MachO/SyntheticSections.h
The file was modified lld/test/MachO/x86-64-relocs.s
The file was modified lld/MachO/InputFiles.h
The file was added lld/test/MachO/cstring-dedup.s
The file was modified lld/test/MachO/subsections-section-relocs.s
The file was modified lld/MachO/Symbols.cpp
The file was added lld/test/MachO/invalid/cstring-dedup.s
The file was modified lld/MachO/UnwindInfoSection.cpp
The file was modified lld/MachO/ConcatOutputSection.h
The file was modified lld/MachO/InputSection.h
The file was modified lld/MachO/Options.td
Commit 447dfbe005a7884766c7d073566eae710aa9ed5b by jezng
[lld-macho] Implement -force_load_swift_libs

This flag causes libraries whose names start with "swift" to be force-loaded.
Note that unlike the more general `-force_load`, this flag only applies
to libraries specified via LC_LINKER_OPTIONS, and not those passed on
the command-line. This is what ld64 does.
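
A hedged sketch of the resulting rule (hypothetical helper name; the
real logic lives in lld/MachO/Driver.cpp):

```cpp
#include <string_view>

// Under -force_load_swift_libs, force-loading applies only to libraries
// that arrive via LC_LINKER_OPTION and whose names begin with "swift".
bool shouldForceLoad(std::string_view libName, bool viaLCLinkerOption,
                     bool forceLoadSwiftLibs) {
  if (!viaLCLinkerOption)
    return false; // command-line libs need an explicit -force_load
  return forceLoadSwiftLibs && libName.substr(0, 5) == "swift";
}
```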

Reviewed By: #lld-macho, thakis

Differential Revision: https://reviews.llvm.org/D103709
The file was added lld/test/MachO/force-load-swift-libs.ll
The file was modified lld/MachO/Config.h
The file was modified lld/MachO/Options.td
The file was modified lld/MachO/Driver.cpp
Commit 8ce45f97283460a330201a199e686e8e088805e0 by yhs
BPF: fix relocation types in lib/Object/RelocationResolver.cpp

Commit 6a2ea84600ba ("BPF: Add more relocation kinds")
added new relocations R_BPF_64_ABS64 and R_BPF_64_ABS32
for normal 64-bit and 32-bit data relocations.
These take over part of the role of R_BPF_64_64 and
R_BPF_64_32, whose new semantics apply only to the ld_imm64
and call instructions.

The BPF support in lib/Object/RelocationResolver.cpp is used to perform
normal data relocations in cases like DWARFObjInMemory reading an object
file (see getRelocationResolver() in DebugInfo/DWARF/DWARFContext.cpp)
or llvm-readobj dumping ".stack_sizes" section data.
In all these cases, normal 64-bit and 32-bit relocations are performed,
and that resolution is exactly what RelocationResolver.cpp implements.

However, commit 6a2ea84600ba neglected to change
R_BPF_64_64/R_BPF_64_32 to R_BPF_64_ABS64/R_BPF_64_ABS32 there.
This patch fixes the issue and adds a test that uses llvm-readobj
to dump the ".stack_sizes" section.
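
For reference, a sketch of the shape of the fix (close in spirit to the
patch, though exact signatures vary across LLVM versions):

```cpp
#include "llvm/BinaryFormat/ELF.h"
#include "llvm/Support/ErrorHandling.h"
#include <cstdint>

using namespace llvm;

// Data relocations are now matched as the ABS relocation types.
static bool supportsBPF(uint64_t Type) {
  switch (Type) {
  case ELF::R_BPF_64_ABS32: // was R_BPF_64_32 before this patch
  case ELF::R_BPF_64_ABS64: // was R_BPF_64_64 before this patch
    return true;
  default:
    return false;
  }
}

static uint64_t resolveBPF(uint64_t Type, uint64_t /*Offset*/, uint64_t S,
                           uint64_t LocData, int64_t /*Addend*/) {
  switch (Type) {
  case ELF::R_BPF_64_ABS32:
    return (S + LocData) & 0xFFFFFFFF; // 32-bit absolute data relocation
  case ELF::R_BPF_64_ABS64:
    return S + LocData; // 64-bit absolute data relocation
  default:
    llvm_unreachable("Invalid relocation type");
  }
}
```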

Differential Revision: https://reviews.llvm.org/D103864
The file was added llvm/test/Object/BPF/lit.local.cfg
The file was added llvm/test/Object/BPF/yaml2obj-elf-bpf-rel.yaml
The file was modified llvm/lib/Object/RelocationResolver.cpp
Commit 7a105b5768575c62802fd662366b5a759eb88447 by craig.topper
[RISCV] Use AVL Operand instead of GPR for tied mask pseudo for vwadd.wv and similar.

I mistakenly copied this from an older version of our internal
repo.
The file was modified llvm/lib/Target/RISCV/RISCVInstrInfoVPseudos.td
Commit ae3ab4f0ec6190ebc19775002b59a36c06848bf3 by craig.topper
[RISCV] Masked compares should use a tail agnostic policy.

Writes of a mask result are always tail agnostic.

Unfortunately, this seems to have made codegen worse. I can only
think this must be because the vsetvli was acting as some sort
of barrier that prevented some code movement in the scheduler.

Reviewed By: arcbbb

Differential Revision: https://reviews.llvm.org/D103331
The file was modified llvm/test/CodeGen/RISCV/rvv/vmsle-rv64.ll
The file was modified llvm/test/CodeGen/RISCV/rvv/vmsne-rv64.ll
The file was modified llvm/lib/Target/RISCV/RISCVInstrInfoVPseudos.td
The file was modified llvm/test/CodeGen/RISCV/rvv/vmsltu-rv64.ll
The file was modified llvm/test/CodeGen/RISCV/rvv/vmsgtu-rv32.ll
The file was modified llvm/test/CodeGen/RISCV/rvv/vmfgt-rv32.ll
The file was modified llvm/test/CodeGen/RISCV/rvv/vmfne-rv64.ll
The file was modified llvm/test/CodeGen/RISCV/rvv/vmsge-rv64.ll
The file was modified llvm/test/CodeGen/RISCV/rvv/vmsgeu-rv64.ll
The file was modified llvm/test/CodeGen/RISCV/rvv/vmfgt-rv64.ll
The file was modified llvm/test/CodeGen/RISCV/rvv/vmfge-rv32.ll
The file was modified llvm/test/CodeGen/RISCV/rvv/vmsltu-rv32.ll
The file was modified llvm/test/CodeGen/RISCV/rvv/vmsleu-rv64.ll
The file was modified llvm/test/CodeGen/RISCV/rvv/vmslt-rv64.ll
The file was modified llvm/test/CodeGen/RISCV/rvv/vmfne-rv32.ll
The file was modified llvm/test/CodeGen/RISCV/rvv/vmsne-rv32.ll
The file was modified llvm/test/CodeGen/RISCV/rvv/vmslt-rv32.ll
The file was modified llvm/test/CodeGen/RISCV/rvv/vmfle-rv64.ll
The file was modified llvm/test/CodeGen/RISCV/rvv/vmflt-rv32.ll
The file was modified llvm/test/CodeGen/RISCV/rvv/vmsgt-rv32.ll
The file was modified llvm/test/CodeGen/RISCV/rvv/vmsgtu-rv64.ll
The file was modified llvm/test/CodeGen/RISCV/rvv/vmflt-rv64.ll
The file was modified llvm/test/CodeGen/RISCV/rvv/vmsle-rv32.ll
The file was modified llvm/test/CodeGen/RISCV/rvv/vmsleu-rv32.ll
The file was modified llvm/test/CodeGen/RISCV/rvv/vmseq-rv32.ll
The file was modified llvm/test/CodeGen/RISCV/rvv/vmfeq-rv64.ll
The file was modified llvm/test/CodeGen/RISCV/rvv/vmseq-rv64.ll
The file was modified llvm/test/CodeGen/RISCV/rvv/vmsgt-rv64.ll
The file was modified llvm/test/CodeGen/RISCV/rvv/vmfle-rv32.ll
The file was modified llvm/test/CodeGen/RISCV/rvv/vmfeq-rv32.ll
The file was modified llvm/test/CodeGen/RISCV/rvv/vmfge-rv64.ll
The file was modified llvm/test/CodeGen/RISCV/rvv/vmsge-rv32.ll
The file was modified llvm/test/CodeGen/RISCV/rvv/vmsgeu-rv32.ll
Commit 7c4e9a68264ffeef6178865be76c45c4fb6390af by craig.topper
[RISCV] Use 0 for Log2SEW for vle1/vse1 intrinsics to enable vsetvli optimization.

Missed in D103299.
The file was modified llvm/test/CodeGen/RISCV/rvv/vse1-rv64.ll
The file was modified llvm/test/CodeGen/RISCV/rvv/vse1-rv32.ll
The file was modified llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp