Aborted

Changes

Changes from Git (git http://labmaster3.local/git/llvm-project.git)

Summary

  1. [libcxxabi] Fix layout of __cxa_exception for win64 (details)
  2. [ELF] Decrease alignment of ThunkSection on 64-bit targets from 8 to 4 (details)
  3. [LLD][ELF][ARM][AArch64] Only round up ThunkSection Size when large OS. (details)
  4. AMDGPU: Fix handling of infinite loops in fragment shaders (details)
  5. [libcxx] [Windows] Store the lconv struct returned from localeconv in (details)
Commit 165a6367631d643f8f186d7a809013652cf321d6 by hans
[libcxxabi] Fix layout of __cxa_exception for win64
Win64 isn't LP64, it's LLP64, but there's no __LLP64__ predefined - just
check _WIN64 in addition to __LP64__.
This fixes compilation after static asserts about the struct layout were
added in f2a436058fcbc11291e73badb44e243f61046183.
Differential Revision: https://reviews.llvm.org/D73838
(cherry picked from commit 09dc884eb2e4a433eb8c5ed20a17108091279295)
The file was modified libcxxabi/src/cxa_exception.h
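A minimal sketch of the guard this change describes, using a simplified
stand-in for __cxa_exception (the field names and layout here are
illustrative, not the actual libcxxabi definition):

#include <cstdint>

// LP64 targets (most 64-bit Unix) predefine __LP64__, but 64-bit Windows
// is LLP64 and predefines _WIN64 instead, so both macros must be checked
// before choosing the 64-bit field layout.
struct exception_header_sketch {
#if defined(__LP64__) || defined(_WIN64)
  std::uint64_t referenceCount;   // 64-bit layout path
#else
  std::uint32_t referenceCount;   // 32-bit layout path
#endif
  // ... unwinder, handler, and type-info fields omitted ...
};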
Commit db51c41a646c79e60cc763f0ecd58494112a32d1 by hans
[ELF] Decrease alignment of ThunkSection on 64-bit targets from 8 to 4
ThunkSection contains 4-byte instructions on all targets that use
thunks. Thunks should not be used in any performance-sensitive places,
and locality/cache line/instruction fetching arguments should not apply.
We use 16 bytes as the preferred function alignment for modern PowerPC
cores; in any case, 8 is not optimal.
Differential Revision: https://reviews.llvm.org/D72819
(cherry picked from commit 870094decfc9fe80c8e0a6405421b7d09b97b02b)
The file was modified lld/test/ELF/aarch64-cortex-a53-843419-thunk.s
The file was modified lld/test/ELF/aarch64-thunk-script.s
The file was modified lld/test/ELF/ppc64-dtprel.s
The file was modified lld/test/ELF/ppc64-ifunc.s
The file was modified lld/test/ELF/ppc64-toc-restore.s
The file was modified lld/test/ELF/aarch64-call26-thunk.s
The file was modified lld/test/ELF/aarch64-jump26-thunk.s
The file was modified lld/test/ELF/aarch64-thunk-pi.s
The file was modified lld/test/ELF/ppc64-tls-gd.s
The file was modified lld/ELF/SyntheticSections.cpp
The file was modified lld/test/ELF/ppc64-long-branch.s
Commit 852b37f83b2dd31ff4d708c2a789857418171f93 by hans
[LLD][ELF][ARM][AArch64] Only round up ThunkSection Size when large OS.
In D71281 a fix was put in to round up the size of a ThunkSection to the
nearest 4 KiB when performing errata patching. This fixed a problem with
a very large instrumented program in which thunks and patches mutually
triggered each other. Unfortunately it triggers an assertion failure in an
AArch64 allyesconfig build of the kernel. There is a specific assertion
preventing an InputSectionDescription from being larger than 4 KiB; it will
always trigger if there is at least one Thunk needed in that
InputSectionDescription, which is possible for an allyesconfig build.
Abstractly the problem case is:

.text : {
         *(.text) ;
         ...
         . = ALIGN(SZ_4K);
         __idmap_text_start = .;
         *(.idmap.text)
         __idmap_text_end = .;
         ...
       }

The assertion checks that __idmap_text_end - __idmap_text_start is
< 4 KiB. Note that there is more than one InputSectionDescription in the
OutputSection, so we can't just restrict the fix to OutputSections
smaller than 4 KiB.
The fix presented here limits the D71281 round-up to
InputSectionDescriptions that meet both of the following conditions (see
the sketch below):
1.) The OutputSection is bigger than the thunkSectionSpacing, so adding
thunks will affect the addresses of following code.
2.) The InputSectionDescription is larger than 4 KiB.
This prevents the assertion failures, since an InputSectionDescription
smaller than 4 KiB is no longer rounded up.
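A hedged sketch of that condition; the function and parameter names here
are illustrative, not the actual lld code:

#include <cstdint>

constexpr std::uint64_t kPageSize = 0x1000; // 4 KiB

// Apply the D71281 4 KiB round-up only when both conditions from the
// commit message hold.
inline bool needsRoundUp(std::uint64_t outputSectionSize,
                         std::uint64_t inputSectionDescSize,
                         std::uint64_t thunkSectionSpacing) {
  // 1.) Thunks can shift the addresses of code that follows them.
  // 2.) The InputSectionDescription already exceeds 4 KiB, so rounding it
  //     up cannot newly violate a "< 4 KiB" assertion in a linker script.
  return outputSectionSize > thunkSectionSpacing &&
         inputSectionDescSize > kPageSize;
}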
We do this at ThunkSection creation time: at that point the addresses
are stable and up to date prior to adding the thunks, because
assignAddresses() will have been called immediately prior to thunk
generation.
The fix reverts the two tests affected by D71281 to their original state
as they no longer need the 4 KiB size roundup. I've added simpler tests
to check for D71281 when the OutputSection size is larger than the
ThunkSection spacing.
Fixes https://github.com/ClangBuiltLinux/linux/issues/812
Differential Revision: https://reviews.llvm.org/D72344
(cherry picked from commit 01ad4c838466bd5db180608050ed8ccb3b62d136)
The file was modified lld/ELF/SyntheticSections.cpp
The file was modified lld/ELF/Relocations.cpp
The file was modified lld/ELF/SyntheticSections.h
The file was added lld/test/ELF/arm-fix-cortex-a8-thunk-align.s
The file was modified lld/test/ELF/aarch64-cortex-a53-843419-thunk.s
The file was added lld/test/ELF/aarch64-cortex-a53-843419-thunk-align.s
The file was modified lld/test/ELF/arm-fix-cortex-a8-thunk.s
Commit 5f6fec2404c5135247ae9e4e515e8d9d3242f790 by hans
AMDGPU: Fix handling of infinite loops in fragment shaders
Summary: Because kill is just a normal intrinsic, even though it's
supposed to terminate the thread, we can end up with provably infinite
loops that are actually supposed to end successfully.
The AMDGPUUnifyDivergentExitNodes pass breaks up these loops, but
because there's no obvious place to make the loop branch to, it just
makes it return immediately, which skips the exports that are supposed
to happen at the end and hangs the GPU if all the threads end up being
killed.
While it would be nice if the fact that kill terminates the thread were
modeled in the IR, I think that the structurizer as-is would make a mess
if we did that when the kill is inside control flow. For now, we just
add a null export at the end to make sure that it always exports
something, which fixes the immediate problem without penalizing the more
common case. This means that we sometimes do two "done" exports when
only some of the threads enter the discard loop, but from testing the
hardware seems OK with that.
This fixes dEQP-VK.graphicsfuzz.while-inside-switch with radv.
Reviewers: arsenm, nhaehnle
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye,
hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70781
(cherry picked from commit 87d98c149504f9b0751189744472d7cc94883960)
The file was added llvm/test/CodeGen/AMDGPU/kill-infinite-loop.ll
The file was modified llvm/lib/Target/AMDGPU/AMDGPUUnifyDivergentExitNodes.cpp
Commit ca6b341bd5d7159d9e398eef1a787b649c5bc888 by hans
[libcxx] [Windows] Store the lconv struct returned from localeconv in locale_t
This fixes using non-default locales, which currently can crash when
e.g. formatting numbers.
Within the localeconv_l function, the per-thread locale is temporarily
changed with __libcpp_locale_guard, then localeconv() is called,
returning an lconv * struct pointer.
When localeconv_l returns, the __libcpp_locale_guard dtor restores the
per-thread locale back to the original. This invalidates the contents of
the earlier returned lconv struct, and all C strings that are pointed to
within it are also invalidated.
Thus, to have an actually working localeconv_l function, it needs to
allocate some sort of storage for the returned contents that stays valid
for as long as the caller needs to use the returned struct.
Extend the libcxx/win32-specific locale_t class with storage for a deep
copy of an lconv struct, and change localeconv_l to take a reference to
the locale_t so it can store the returned lconv struct there.
This works fine for libcxx itself, but wouldn't necessarily be right for
a caller that uses libcxx's localeconv_l function.
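A hedged sketch of the pattern described above; locale_storage, its
members, and localeconv_sketch are illustrative names, not the actual
libc++ win32 implementation:

#include <clocale>
#include <string>

struct locale_storage {        // stand-in for the extended locale_t
  std::lconv copied;           // deep copy of the lconv contents
  std::string decimal_point;   // owned copies of the C strings it points at
  std::string thousands_sep;
  // ... remaining string members omitted ...
};

std::lconv *localeconv_sketch(locale_storage &loc) {
  // In the real code a __libcpp_locale_guard temporarily switches the
  // per-thread locale here, so localeconv() reflects the requested locale.
  std::lconv *lc = std::localeconv();

  // Copy everything out before the guard's destructor restores the
  // original locale, which would invalidate *lc and the strings it
  // points at.
  loc.copied = *lc;
  loc.decimal_point = lc->decimal_point;
  loc.thousands_sep = lc->thousands_sep;
  loc.copied.decimal_point = &loc.decimal_point[0];
  loc.copied.thousands_sep = &loc.thousands_sep[0];
  return &loc.copied;
}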
This fixes around 11 of libcxx's currently failing tests on Windows.
Differential Revision: https://reviews.llvm.org/D69505
(cherry picked from commit 7db4f2c6945a24a7d81dad3362700353e2ec369e)
The file was modified libcxx/src/support/win32/locale_win32.cpp
The file was modified libcxx/include/support/win32/locale_win32.h