Changes

Summary

  1. [NFC][scudo] Print errno of fork failure
  2. [AIX] Define __STDC_NO_ATOMICS__ and __STDC_NO_THREADS__ predefined macros
  3. [AMDGPU] Add v5f32/VReg_160 support for MIMG instructions
  4. Revert "[AIX] Define __STDC_NO_ATOMICS__ and __STDC_NO_THREADS__ predefined macros"
  5. [AIX] Define __STDC_NO_ATOMICS__ and __STDC_NO_THREADS__
  6. [AMDGPU] Allow oversize vaddr in GFX10 MIMG assembly
  7. [yaml2obj] Fix buildbot-issue-4886
  8. [lld-macho] Implement cstring deduplication
  9. [lld-macho] Implement -force_load_swift_libs
Commit b41b76b303cdfde389f70e0a311698d7391423f1 by Vitaly Buka
[NFC][scudo] Print errno of fork failure

This fork sometimes fails on the sanitizer-x86_64-linux-qemu bot.
The file was modified compiler-rt/lib/scudo/standalone/tests/wrappers_c_test.cpp
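The commit only says that errno is now printed when fork() fails in the test. As a minimal sketch of that pattern, assuming a POSIX system, the hypothetical helpers below (`describeForkFailure` and `forkAndWait`; neither is the code in wrappers_c_test.cpp) copy errno immediately after the failed call and turn it into a readable message:

```cpp
#include <cassert>
#include <cerrno>
#include <cstring>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Formats a diagnostic for a failed fork(). `err` must be the errno value
// captured right after fork() returned -1.
std::string describeForkFailure(int err) {
  return std::string("fork failed: ") + std::strerror(err) +
         " (errno=" + std::to_string(err) + ")";
}

// Hypothetical helper: forks a child that exits immediately and waits for it.
// Returns true on success; on fork failure, stores the errno details in *msg.
bool forkAndWait(std::string *msg) {
  pid_t pid = fork();
  if (pid < 0) {
    int err = errno;  // copy first: later library calls may overwrite errno
    *msg = describeForkFailure(err);
    return false;
  }
  if (pid == 0)
    _exit(0);  // child: nothing to do
  int status = 0;
  if (waitpid(pid, &status, 0) != pid)
    return false;
  return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

Copying errno before doing anything else matters because the string-formatting calls are themselves allowed to change it.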
Commit e6629be31e67190f0a524f009752d73410894560 by cbowler
[AIX] Define __STDC_NO_ATOMICS__ and __STDC_NO_THREADS__ predefined macros

Differential Revision: https://reviews.llvm.org/D103707
The file was modified clang/test/Preprocessor/init-ppc.c
The file was modified clang/lib/Basic/Targets/OSTargets.h
Commit f8816c7400250961af4810956248c3636a5fcb04 by carl.ritson
[AMDGPU] Add v5f32/VReg_160 support for MIMG instructions

Avoid having to round up to v8f32/VReg_256 when only 5 VGPRs are
required for a MIMG address operand.

Keep the _V8 variants of the pseudo instructions so that assembly
written for targets prior to GFX10 continues to work as-is. For GFX10,
the validator can currently determine the correct size, so it will
reject oversize address registers.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D103672
The file was modified llvm/lib/Target/AMDGPU/SIShrinkInstructions.cpp
The file was modified llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.sample.g16.encode.ll
The file was modified llvm/test/MC/Disassembler/AMDGPU/mimg_gfx10.txt
The file was modified llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.image.gather4.dim.ll
The file was modified llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.gather4.o.dim.ll
The file was modified llvm/test/CodeGen/AMDGPU/vgpr-tuple-allocation.ll
The file was modified llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.sample.o.dim.ll
The file was modified llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.image.sample.g16.ll
The file was modified llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.nsa.ll
The file was modified llvm/lib/Target/AMDGPU/SIISelLowering.cpp
The file was modified llvm/test/MC/AMDGPU/gfx10_asm_mimg.s
The file was modified llvm/test/MC/Disassembler/AMDGPU/gfx10_mimg.txt
The file was modified llvm/lib/Target/AMDGPU/MIMGInstructions.td
The file was modified llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
The file was modified llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
The file was modified llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.gather4.dim.ll
The file was modified llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.sample.a16.dim.ll
The file was modified llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.sample.dim.ll
The file was modified llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.image.gather4.o.dim.ll
The file was modified llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.sample.g16.ll
The file was modified llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.sample.g16.a16.dim.ll
The file was modified llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
The file was modified llvm/test/CodeGen/AMDGPU/nsa-reassign.mir
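To make the motivation concrete, here is a hypothetical model of how an address operand's dword count might be rounded up to a supported VGPR tuple width. The exact width sets are an assumption for illustration (before: 1, 2, 3, 4, 8 and 16 dwords; after: 5 is added via VReg_160), and `roundUpVAddr` is not an LLVM function:

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Assumed tuple widths (in dwords) usable for a MIMG vaddr operand.
// Before the change, 5 dwords had no exact class and rounded up to 8
// (VReg_256); afterwards, VReg_160 covers 5 exactly.
constexpr std::array<unsigned, 6> kWidthsBefore = {1, 2, 3, 4, 8, 16};
constexpr std::array<unsigned, 7> kWidthsAfter = {1, 2, 3, 4, 5, 8, 16};

// Returns the smallest supported width >= needed, or 0 if none fits.
template <std::size_t N>
unsigned roundUpVAddr(const std::array<unsigned, N> &widths, unsigned needed) {
  assert(needed > 0 && "an image instruction needs at least one address dword");
  for (unsigned w : widths)  // widths are sorted in ascending order
    if (w >= needed)
      return w;
  return 0;
}
```

Under these assumptions, a 5-dword address previously wasted 3 VGPRs (8 allocated, 5 used); with VReg_160 it wastes none.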
Commit f97e01e61af1115bb9cb68dbdc70cc7f9f884759 by cebowleratibm
Revert "[AIX] Define __STDC_NO_ATOMICS__ and __STDC_NO_THREADS__ predefined macros"

This reverts commit e6629be31e67190f0a524f009752d73410894560.
The file was modified clang/lib/Basic/Targets/OSTargets.h
The file was modified clang/test/Preprocessor/init-ppc.c
Commit f38eff777e46f42884d82815d0b39766520ac2bf by cebowleratibm
[AIX] Define __STDC_NO_ATOMICS__ and __STDC_NO_THREADS__

Revert/reapply to fix Git authorship metadata

Differential Revision: https://reviews.llvm.org/D103707
The file was modified clang/test/Preprocessor/init-ppc.c
The file was modified clang/lib/Basic/Targets/OSTargets.h
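`__STDC_NO_ATOMICS__` and `__STDC_NO_THREADS__` are the C11 conditional feature macros an implementation defines when it does not provide `<stdatomic.h>` or `<threads.h>`. A minimal sketch of how portable code might test them follows. It is written in C++ for consistency with the other examples here; whether a given compiler also defines these macros in C++ mode is not stated in the commit, and `missingC11Features` is a hypothetical name:

```cpp
#include <cassert>
#include <string>

// Reports which optional C11 features the compiler says are unavailable.
std::string missingC11Features() {
  std::string missing;
#ifdef __STDC_NO_ATOMICS__
  missing += "atomics ";  // no <stdatomic.h> / _Atomic support
#endif
#ifdef __STDC_NO_THREADS__
  missing += "threads ";  // no <threads.h>
#endif
  return missing.empty() ? "none" : missing;
}
```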
Commit c8bbfb8cf5eac991351c03c41a15abaac4b78ecf by carl.ritson
[AMDGPU] Allow oversize vaddr in GFX10 MIMG assembly

As a follow-up to D103672, we should allow vaddr to be larger than
required when assembling GFX10 MIMG instructions.

Reviewed By: dp

Differential Revision: https://reviews.llvm.org/D103733
The file was modified llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
The file was modified llvm/test/MC/AMDGPU/gfx10_err_pos.s
The file was modified llvm/test/MC/AMDGPU/gfx10_asm_mimg.s
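A hypothetical sketch of the rule change, assuming the validator compares the number of address dwords the instruction needs (`required`) with the width of the register tuple written in the assembly (`provided`). The function and enum names are illustrative, not AMDGPUAsmParser's:

```cpp
#include <cassert>

enum class VAddrCheck { Ok, TooSmall, TooLarge };

// Behavior described by D103672 for GFX10: only an exact match is accepted.
VAddrCheck checkExact(unsigned required, unsigned provided) {
  if (provided < required)
    return VAddrCheck::TooSmall;
  if (provided > required)
    return VAddrCheck::TooLarge;
  return VAddrCheck::Ok;
}

// Behavior after this commit: an oversize tuple is accepted, an undersize
// one is still rejected.
VAddrCheck checkAllowOversize(unsigned required, unsigned provided) {
  return provided < required ? VAddrCheck::TooSmall : VAddrCheck::Ok;
}
```

Accepting an oversize tuple is presumably harmless because only the first register of a non-NSA vaddr tuple is encoded; the instruction's other fields determine how many registers are read.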
Commit 310d2b4957c8c11387354c58d15e3249971accc7 by esme.yi
[yaml2obj] Fix buildbot-issue-4886

XCOFFEmitter.cpp:67:16: runtime error: null pointer passed as argument 2,
which is declared to never be null
The file was modified llvm/lib/ObjectYAML/XCOFFEmitter.cpp
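The UBSan message does not name the callee. Assuming it is `memcpy` (whose source, argument 2, is declared nonnull), the usual cause is passing the `data()` of an empty container together with a size of 0, which is still undefined behavior. A minimal sketch of the common fix, using a hypothetical `appendBytes` helper:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Appends `size` bytes from `src` to `out`. `src` may be null when size is 0
// (e.g. the data() of an empty container), so memcpy is skipped in that case:
// memcpy(dst, nullptr, 0) is undefined behavior because its pointer
// arguments are declared nonnull.
void appendBytes(std::vector<char> &out, const char *src, std::size_t size) {
  if (size == 0)
    return;
  std::size_t old = out.size();
  out.resize(old + size);
  std::memcpy(out.data() + old, src, size);
}
```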
Commit 04259cde15a9fad31421e9161f160ee23f84ebdd by jezng
[lld-macho] Implement cstring deduplication

Our implementation draws heavily from LLD-ELF's, which in turn delegates
its string deduplication to llvm-mc's StringTableBuilder. The messiness of
this diff is largely because we previously assumed that all
InputSections get concatenated together to form the output. This is
no longer true with CStringInputSections, which split their contents into
StringPieces. StringPieces are much more lightweight than InputSections,
which is important as we create a lot of them. They may also overlap in
the output, which makes it possible for strings to be tail-merged. In
fact, the initial version of this diff implemented tail merging, but
I've dropped it for reasons I'll explain later.
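A simplified model of the idea: split each cstring section at its NUL terminators into lightweight pieces, then give every distinct string a single output offset. This is a sketch only; the `StringPiece` struct and `deduplicate` function here are hypothetical stand-ins, and lld itself delegates the table building to StringTableBuilder. Tail merging is deliberately omitted, as in the text.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>
#include <unordered_map>
#include <vector>

// One NUL-terminated string inside an input section.
struct StringPiece {
  std::uint32_t inputOff;   // offset within the input section
  std::uint64_t outputOff;  // offset within the merged output section
};

// Splits a section into pieces; each piece includes its NUL terminator.
std::vector<StringPiece> splitIntoPieces(std::string_view data) {
  std::vector<StringPiece> pieces;
  std::size_t start = 0;
  while (start < data.size()) {
    std::size_t end = data.find('\0', start);
    assert(end != std::string_view::npos && "cstring must be NUL-terminated");
    pieces.push_back({static_cast<std::uint32_t>(start), 0});
    start = end + 1;
  }
  return pieces;
}

// Emits each distinct string once and records every piece's output offset.
// Returns the merged output contents.
std::string deduplicate(const std::vector<std::string_view> &sections,
                        std::vector<std::vector<StringPiece>> &piecesPerSection) {
  std::unordered_map<std::string_view, std::uint64_t> offsetOf;
  std::string out;
  for (std::size_t i = 0; i < sections.size(); ++i) {
    for (StringPiece &p : piecesPerSection[i]) {
      std::string_view rest = sections[i].substr(p.inputOff);
      std::string_view str = rest.substr(0, rest.find('\0') + 1);
      auto [it, inserted] = offsetOf.try_emplace(str, out.size());
      if (inserted)
        out.append(str.data(), str.size());
      p.outputOff = it->second;
    }
  }
  return out;
}
```

Because a piece is just two integers, creating one per string stays cheap even when an input has many of them, which is the point the text makes about StringPieces versus InputSections.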

**Alignment Issues**

Mergeable cstring literals are found under the `__TEXT,__cstring`
section. In contrast to ELF, which puts strings that need different
alignments into different sections, clang's Mach-O backend puts them all
in one section. Strings that need to be aligned have the `.p2align`
directive emitted before them, which simply translates into zero padding
in the object file.

I *think* ld64 extracts the desired per-string alignment from this data
by preserving each string's offset from the last section-aligned
address. I'm not entirely certain, since it doesn't seem to do this
consistently; perhaps this can be chalked up to cases where ld64 has
to deduplicate strings with different offset/alignment combos, where it
seems to pick one of their alignments to preserve. This doesn't seem
correct in general; we can in fact induce ld64 to produce a crashing
binary just by linking in an additional object file that only contains
cstrings and no code. See PR50563 for details.
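To illustrate the scheme described above (which, as the text notes, is a guess about ld64's behavior), the hypothetical helpers below record a string's offset modulo the section alignment and place the string at the next output offset with the same residue:

```cpp
#include <cassert>
#include <cstdint>

// Offset of a string from the last section-aligned address.
std::uint64_t alignmentResidue(std::uint64_t inputOff, std::uint64_t sectionAlign) {
  assert(sectionAlign != 0 && (sectionAlign & (sectionAlign - 1)) == 0);
  return inputOff & (sectionAlign - 1);
}

// First output offset >= cursor whose residue modulo sectionAlign matches.
std::uint64_t placeWithResidue(std::uint64_t cursor, std::uint64_t residue,
                               std::uint64_t sectionAlign) {
  assert(residue < sectionAlign);
  std::uint64_t base = cursor & ~(sectionAlign - 1);
  std::uint64_t off = base + residue;
  return off >= cursor ? off : off + sectionAlign;
}
```

If one copy of a string sits at residue 0 in a 16-byte-aligned section and an identical copy sits at residue 4, deduplication keeps only one placement. If residue 4 wins, code that relied on the first copy being 16-byte aligned can fault.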

Moreover, this scheme seems rather inefficient: since unaligned and
aligned strings are all put in the same section, which has a single
alignment value, it doesn't seem possible to tell whether a given string
doesn't have any alignment requirements. Preserving offsets and
alignments for strings that don't need them is wasteful.

In practice, the crashes seen so far seem to stem from x86_64 SIMD
operations on cstrings. On x86_64, aligned SIMD loads and stores (e.g.
`movaps`) fault unless their memory operand is 16-byte aligned. So for
now, I'm thinking of just aligning all strings
to 16 bytes on x86_64. This is indeed wasteful, but implementation-wise
it's simpler than preserving per-string alignment+offsets. It also
avoids the aforementioned crash after deduplication of
differently-aligned strings. Finally, the overhead is not huge: using
16-byte alignment (vs no alignment) is only a 0.5% size overhead when
linking chromium_framework.
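A minimal sketch of the simpler policy, assuming strings have already been deduplicated and each includes its NUL terminator: every string starts at the next 16-byte boundary, so the cost is at most 15 padding bytes per string. `layoutAligned` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>
#include <vector>

// Rounds `off` up to a multiple of `align` (a power of two).
constexpr std::uint64_t alignTo(std::uint64_t off, std::uint64_t align) {
  return (off + align - 1) & ~(align - 1);
}

// Places every string at an `align`-byte boundary (16 on x86_64 per the text).
// Returns each string's output offset; *totalSize receives the section size.
std::vector<std::uint64_t> layoutAligned(const std::vector<std::string_view> &strs,
                                         std::uint64_t align,
                                         std::uint64_t *totalSize) {
  assert(align != 0 && (align & (align - 1)) == 0);
  std::vector<std::uint64_t> offsets;
  std::uint64_t off = 0;
  for (std::string_view s : strs) {
    off = alignTo(off, align);
    offsets.push_back(off);
    off += s.size();  // s includes its NUL terminator
  }
  *totalSize = off;
  return offsets;
}
```

Passing `align = 1` gives the unaligned, tightly packed layout, which is the baseline the 0.5% overhead figure is measured against.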

With these alignment requirements, it doesn't make sense to attempt tail
merging -- most strings will not be eligible since their overlaps aren't
likely to start at a 16-byte boundary. Tail-merging (with alignment) for
chromium_framework only improves size by 0.3%.
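The alignment argument can be made concrete with a hypothetical eligibility check. A shorter string can share the tail of a longer one only if it is a suffix and, assuming the longer string starts on a 16-byte boundary, the suffix begins at an offset that is a multiple of 16. Most suffixes fail the second test.

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>

// Both strings include their NUL terminators; `longer` is assumed to be
// placed at an `align`-byte boundary.
bool canTailMerge(std::string_view longer, std::string_view shorter,
                  std::uint64_t align) {
  assert(align != 0 && (align & (align - 1)) == 0);
  if (shorter.size() > longer.size())
    return false;
  std::uint64_t start = longer.size() - shorter.size();
  bool isSuffix = longer.substr(start) == shorter;
  return isSuffix && start % align == 0;
}
```

For example, "bar" is a suffix of "foobar" starting at offset 3, so it can be tail-merged without alignment but not with 16-byte alignment.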

It's worth noting that LLD-ELF only does tail merging at `-O2`. By
default (at `-O1`), it just deduplicates without tail merging. @thakis has
also mentioned that they saw it regress compressed size in some cases
and therefore turned it off. `ld64` does not seem to do tail merging at
all.

**Performance Numbers**

CString deduplication reduces chromium_framework from 250MB to 242MB, or
about a 3.2% reduction.

Numbers for linking chromium_framework on my 3.2 GHz 16-Core Intel Xeon W:

      N           Min           Max        Median           Avg        Stddev
  x  20          3.91          4.03         3.935          3.95   0.034641016
  +  20          3.99          4.14         4.015        4.0365     0.0492336
  Difference at 95.0% confidence
          0.0865 +/- 0.027245
          2.18987% +/- 0.689746%
          (Student's t, pooled s = 0.0425673)

As expected, cstring merging incurs some non-trivial overhead.
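The summary line can be reproduced from the per-run statistics. Assuming the `x` row is the baseline and the `+` row is the run with deduplication (as the surrounding text implies), the pooled standard deviation and the 95% half-width follow from the standard two-sample Student's t formulas. The critical value t ≈ 2.024 for 20 + 20 - 2 = 38 degrees of freedom is passed in rather than computed; `Summary`, `pooledStddev` and `ciHalfWidth` are illustrative names.

```cpp
#include <cassert>
#include <cmath>

struct Summary {
  int n;
  double avg;
  double stddev;
};

// Pooled standard deviation of two samples.
double pooledStddev(const Summary &a, const Summary &b) {
  double num = (a.n - 1) * a.stddev * a.stddev + (b.n - 1) * b.stddev * b.stddev;
  return std::sqrt(num / (a.n + b.n - 2));
}

// Half-width of the confidence interval for the difference of the means;
// `t` is the two-sided critical value for a.n + b.n - 2 degrees of freedom.
double ciHalfWidth(const Summary &a, const Summary &b, double t) {
  return t * pooledStddev(a, b) * std::sqrt(1.0 / a.n + 1.0 / b.n);
}
```

With the numbers above, this gives a pooled s of about 0.04257 and a half-width of about 0.02725, matching ministat's 0.0425673 and 0.027245; the relative difference is 0.0865 / 3.95, about 2.19%.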

When passing `--no-literal-merge`, it seems that performance is the
same, i.e. the refactoring in this diff didn't cost us.

      N           Min           Max        Median           Avg        Stddev
  x  20          3.91          4.03         3.935          3.95   0.034641016
  +  20          3.89          4.02         3.935        3.9435   0.043197831
  No difference proven at 95.0% confidence

Reviewed By: #lld-macho, gkm

Differential Revision: https://reviews.llvm.org/D102964
The file was modified lld/MachO/Driver.cpp
The file was modified lld/MachO/InputFiles.cpp
The file was modified lld/MachO/InputSection.cpp
The file was modified lld/MachO/SyntheticSections.cpp
The file was modified lld/test/MachO/subsections-section-relocs.s
The file was modified lld/test/MachO/x86-64-relocs.s
The file was modified lld/test/MachO/invalid/reserved-section-name.s
The file was modified lld/MachO/SyntheticSections.h
The file was added lld/test/MachO/invalid/cstring-dedup.s
The file was modified lld/MachO/Symbols.cpp
The file was added lld/test/MachO/cstring-dedup.s
The file was modified lld/MachO/ConcatOutputSection.h
The file was modified lld/MachO/UnwindInfoSection.cpp
The file was modified lld/MachO/Config.h
The file was modified lld/MachO/InputSection.h
The file was modified lld/MachO/Writer.cpp
The file was modified lld/MachO/ConcatOutputSection.cpp
The file was modified lld/MachO/Options.td
The file was modified lld/MachO/InputFiles.h
Commit 447dfbe005a7884766c7d073566eae710aa9ed5b by jezng
[lld-macho] Implement -force_load_swift_libs

It causes libraries whose names start with "swift" to be force-loaded.
Note that unlike the more general `-force_load`, this flag only applies
to libraries specified via `LC_LINKER_OPTION` load commands, and not to
those passed on the command line. This is what ld64 does.

Reviewed By: #lld-macho, thakis

Differential Revision: https://reviews.llvm.org/D103709
The file was modified lld/MachO/Config.h
The file was modified lld/MachO/Options.td
The file was added lld/test/MachO/force-load-swift-libs.ll
The file was modified lld/MachO/Driver.cpp
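A hypothetical model of the selection rule described above, assuming the name being matched is the one passed to `-l` (so `swiftCore` for `libswiftCore`) and that the linker tracks where each library request came from. `shouldForceLoad` and `LibSource` are illustrative names, not lld's:

```cpp
#include <cassert>
#include <string_view>

// Where a library request originated.
enum class LibSource { CommandLine, LinkerOptionLoadCommand };

// True if the library should be force-loaded under -force_load_swift_libs.
bool shouldForceLoad(std::string_view libName, LibSource source,
                     bool forceLoadSwiftLibs) {
  // Only libraries requested by LC_LINKER_OPTION load commands qualify;
  // command-line libraries are never affected by this flag.
  if (!forceLoadSwiftLibs || source != LibSource::LinkerOptionLoadCommand)
    return false;
  return libName.substr(0, 5) == "swift";  // name starts with "swift"
}
```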