Success

Changes

Changes from Git (git http://labmaster3.local/git/llvm-zorg.git)

Summary

  1. [zorg] Remove flang-aarch64-ubuntu-out-of-tree-new-driver (details)
Commit de74f83353c32d08126bddb778b9d89cc2708e46 by diana.picus
[zorg] Remove flang-aarch64-ubuntu-out-of-tree-new-driver

The new driver is on by default now, so this builder does the same thing
as flang-aarch64-ubuntu-out-of-tree.

Differential Revision: https://reviews.llvm.org/D102319
The file was modified buildbot/osuosl/master/config/workers.py
The file was modified buildbot/osuosl/master/config/builders.py

Changes from Git (git http://labmaster3.local/git/llvm-project.git)

Summary

  1. [libomptarget] [amdgpu] Fix copy-paste error setting NumThreads for a corner case. (details)
  2. [OpenMP] Fix crashing critical section with hint clause (details)
  3. [SLP] Fix "gathering" of insertelement instructions (details)
  4. [mlir-opt] Don't enable `printOpOnDiagnostic` if it was explicitly disabled. (details)
  5. [scudo] Add unmapTestOnly() to secondary. (details)
  6. PR50456: Properly handle multiple escaped newlines in a '*/'. (details)
  7. [dsymutil] Compute the output location once per input file (NFC) (details)
  8. [dsymutil] Use EXIT_SUCCESS and EXIT_FAILURE (NFC) (details)
  9. [dsymutil] Emit an error when the Mach-O exceeds the 4GB limit. (details)
  10. [NFC][scudo] Avoid cast in test (details)
  11. [NFC][OMP] Fix 'unused' warning (details)
  12. Add a range-based wrapper for std::unique(begin, end, binary_predicate) (details)
  13. lld-coff: Simplify a few lambda uses after 7975dd033cb9 (details)
  14. [NFC][scudo] Add paramenters DCHECKs (details)
  15. Revert "Do not create LLVM IR `constant`s for objects with dynamic initialisation" (details)
  16. [libomptarget] [amdgpu] Added LDS usage to the kernel trace (details)
  17. Revert "[LoopUnrollAndJam] Change LoopUnrollAndJamPass to LoopNest pass" (details)
  18. Making Instrumentation aware of LoopNest Pass (details)
  19. [lld:elf] Weaken the requirement for a computed binding to be STB_LOCAL (details)
  20. [Sema] Always search the full function scope context if a potential availability violation is encountered (details)
  21. [cfe] Support target-specific escaped character in inline asm (details)
  22. [JITLink] Enable creation and management of mutable block content. (details)
  23. AMDGPU/GlobalISel: Legalize G_[SU]DIVREM instructions (details)
  24. [Test] Add test for unreachable backedge with duplicating predecessors (details)
  25. [LoopDeletion] Break backedge if we can prove that the loop is exited on 1st iteration (details)
  26. [JITLink] Suppress expect-death test in release mode. (details)
  27. [RISCV] Optimize xor/or with immediate in the zbs extension (details)
  28. [analyzer][ctu] Avoid parsing invocation list again and again during on-demand parsing of CTU (details)
  29. Revert "[analyzer][ctu] Avoid parsing invocation list again and again during on-demand parsing of CTU" (details)
  30. [GlobalISel] Fix MachineIRBuilder not using the DstOp argument for G_SHUFFLE_VECTOR. (details)
  31. [analyzer][ctu] Reland "Avoid parsing invocation list again and again.. (details)
  32. [libomptarget][nfc] Accept callable for hsa iterate_symbols (details)
  33. [TRE] Reland: allow TRE for non-capturing calls. (details)
  34. [mlir] Check only last dim stride in transfer op lowering (details)
  35. [clang][ARM] Remove non-existent arm1136jz-s CPU (details)
  36. [GlobalISel] Silence unused variable warning in Release builds. NFC. (details)
  37. [llvm][ARM] Remove non-existent arm1176j-s CPU (details)
  38. [clang][ARM] Remove non-existent arm9312 CPU (details)
  39. [ARM][NEON] Combine base address updates for vld1x intrinsics (details)
  40. [llvm-exegesis] Loop unrolling for loop snippet repetitor mode (details)
  41. [IR] Allow Value::replaceUsesWithIf() to process constants (details)
Commit ca17b26d4d7a1bd95346184d3f3ccdf006c33781 by Dhruva.Chakrabarti
[libomptarget] [amdgpu] Fix copy-paste error setting NumThreads for a corner case.

Fix the case where NumTeams was set incorrectly instead of NumThreads

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D103037
The file was modified openmp/libomptarget/plugins/amdgpu/src/rtl.cpp
Commit 95cefacfe1c1fe920f22b749f17f630925bd6094 by hansang.bae
[OpenMP] Fix crashing critical section with hint clause

Runtime was using the default lock type without using the hint.

Differential Revision: https://reviews.llvm.org/D102955
The file was added openmp/runtime/test/critical/omp_critical_with_hint.c
The file was modified openmp/runtime/src/kmp_csupport.cpp
Commit b2cd89501164ebadcddfd92304fe7b5675e59748 by anton.a.afanasyev
[SLP] Fix "gathering" of insertelement instructions

In rare exceptional cases a vector tree node (insertelements only, for now)
is marked as `NeedToGather`; this patch handles that case. This is a follow-up
to D98714 that fixes the bug reported at https://reviews.llvm.org/D98714#2764135.

Differential Revision: https://reviews.llvm.org/D102675
The file was modified llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
The file was modified llvm/test/Transforms/SLPVectorizer/X86/insert-element-build-vector.ll
The file was modified llvm/test/Transforms/SLPVectorizer/X86/insert-element-build-vector-inseltpoison.ll
Commit 60c735d409bf8077b2611d102b9a5e5af48464d5 by riddleriver
[mlir-opt] Don't enable `printOpOnDiagnostic` if it was explicitly disabled.

We are currently explicitly setting the flag solely based on the value of `-verify`, which ends up ignoring the situation where the user explicitly disabled this option from the command line.

Differential Revision: https://reviews.llvm.org/D102952
The file was modified mlir/lib/Support/MlirOptMain.cpp
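The intent of the mlir-opt change above can be modelled in a few lines. This is only a sketch under assumptions: the function name `effectivePrintOpOnDiagnostic`, the default value of the option, and the idea that `-verify` only ever switches printing off are all illustrative and not taken from MlirOptMain.cpp.

```cpp
#include <cassert>

// Hypothetical model, not MlirOptMain's code. UserSetting is the value of the
// print-op-on-diagnostic option as parsed from the command line (assumed to
// default to true); VerifyDiagnostics stands for -verify.
inline bool effectivePrintOpOnDiagnostic(bool UserSetting,
                                         bool VerifyDiagnostics) {
  // Verification may only turn printing off. It never turns it back on, so an
  // explicit "false" from the user survives.
  return UserSetting && !VerifyDiagnostics;
}
```

With this shape, a user who disabled the option keeps it disabled whether or not `-verify` is passed.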
Commit 1fb6a0307240b0c543ec5babb35e39db2c39052b by 31459023+hctim
[scudo] Add unmapTestOnly() to secondary.

When trying to track down a vaddr-poisoning bug, I found that the
secondary cache isn't emptied on test teardown. We should probably do
that to make the tests hermetic. Otherwise, repeating the tests lots of
times using --gtest_repeat fails after the mmap vaddr space is
exhausted.

To repro:
$ ninja check-scudo_standalone # build
$ ./projects/compiler-rt/lib/scudo/standalone/tests/ScudoUnitTest-x86_64-Test \
--gtest_filter=ScudoSecondaryTest.*:-ScudoSecondaryTest.SecondaryCombinations \
--gtest_repeat=10000

Reviewed By: cryptoad

Differential Revision: https://reviews.llvm.org/D102874
The file was modified compiler-rt/lib/scudo/standalone/secondary.h
The file was modified compiler-rt/lib/scudo/standalone/tests/secondary_test.cpp
The file was modified compiler-rt/lib/scudo/standalone/combined.h
Commit de6164ec4da0cfea1b0d0e472c432ea1be4d9c29 by richard
PR50456: Properly handle multiple escaped newlines in a '*/'.
The file was modified clang/lib/Lex/Lexer.cpp
The file was modified clang/test/Lexer/block_cmt_end.c
Commit aab488ac2a56d5829c6d51471987e5c630951074 by Jonas Devlieghere
[dsymutil] Compute the output location once per input file (NFC)

Compute the location of the output file just once outside the loop over
the different architectures.
The file was modified llvm/tools/dsymutil/dsymutil.cpp
Commit 7bf7b80b1958944f449960325f9a5e446f8d1d22 by Jonas Devlieghere
[dsymutil] Use EXIT_SUCCESS and EXIT_FAILURE (NFC)
The file was modified llvm/tools/dsymutil/dsymutil.cpp
Commit 1ec03f3de5d580d85cc256058cc0d2dd254b9e1a by Jonas Devlieghere
[dsymutil] Emit an error when the Mach-O exceeds the 4GB limit.

The Mach-O object file format is limited to 4GB because of its use of
32-bit offsets in the header. It is possible for dsymutil to (silently)
emit an invalid binary. Rather than having consumers deal with this, emit
an error.
The file was modified llvm/tools/dsymutil/dsymutil.cpp
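The 4GB limit in the dsymutil commit above follows from the width of a 32-bit offset. A minimal sketch of such a size check is shown here; `fitsIn32BitOffsets` is a hypothetical name and the exact boundary dsymutil enforces is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical check, not dsymutil's code: a file whose header stores 32-bit
// offsets cannot address anything past 2^32 - 1 bytes.
inline bool fitsIn32BitOffsets(std::uint64_t OutputSize) {
  return OutputSize <= std::numeric_limits<std::uint32_t>::max();
}
```

A tool would call this before writing the output and report an error instead of emitting a file whose offsets wrapped around.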
Commit f5bde3d476c2c6aee4f126d84982e8d2f0f7e408 by Vitaly Buka
[NFC][scudo] Avoid cast in test
The file was modified compiler-rt/lib/scudo/standalone/tests/common_test.cpp
Commit 676a789a5bc6d42838d01a1cddf8281dc1e058de by Vitaly Buka
[NFC][OMP] Fix 'unused' warning
The file was modified llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
Commit a08673d04a99efe200fb53f3ef57b5cfb8e513bb by dblaikie
Add a range-based wrapper for std::unique(begin, end, binary_predicate)
The file was modified llvm/include/llvm/ADT/STLExtras.h
The file was modified llvm/unittests/ADT/STLExtrasTest.cpp
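For readers unfamiliar with the idea behind the STLExtras commit above, a range-based wrapper around `std::unique(begin, end, binary_predicate)` might look roughly like this. The names `uniqueRange` and `dropAdjacentSameParity` are made up for illustration; this is not the code added to STLExtras.h.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Hypothetical helper, not the LLVM implementation: forwards a whole range
// to std::unique(begin, end, binary_predicate).
template <typename Range, typename Predicate>
auto uniqueRange(Range &&R, Predicate P) {
  return std::unique(std::begin(R), std::end(R), P);
}

// Example use: collapse runs of adjacent values that share the same parity,
// keeping the first element of each run. Intended for non-negative values.
inline std::vector<int> dropAdjacentSameParity(std::vector<int> V) {
  auto NewEnd = uniqueRange(V, [](int A, int B) { return A % 2 == B % 2; });
  V.erase(NewEnd, V.end());
  return V;
}
```

The wrapper only saves the caller from spelling out the begin/end pair; the erase step is still needed, as with `std::unique` itself.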
Commit e5b66a373414036db22d19647d913c2571df2701 by dblaikie
lld-coff: Simplify a few lambda uses after 7975dd033cb9
The file was modified lld/COFF/Chunks.cpp
Commit a0169b2ed198154117e82bf24ae7238454c2e9a2 by Vitaly Buka
[NFC][scudo] Add paramenters DCHECKs

Reviewed By: hctim

Differential Revision: https://reviews.llvm.org/D103042
The file was modified compiler-rt/lib/scudo/standalone/memtag.h
Commit d881319cc5606baa7668405a296d0960a83a1e4c by thakis
Revert "Do not create LLVM IR `constant`s for objects with dynamic initialisation"

This reverts commit 13dd65b3a1a3ac049b5f3a9712059f7c61649bea.
Breaks check-clang on macOS, see https://reviews.llvm.org/D102693
The file was removed clang/test/CodeGenCXX/clang-sections-1.cpp
The file was removed clang/test/CodeGenCXX/const-dynamic-init.cpp
The file was modified clang/lib/Sema/SemaDecl.cpp
Commit 96d70f4d289b2a5a43bc7bd6285816c792e55c35 by Dhruva.Chakrabarti
[libomptarget] [amdgpu] Added LDS usage to the kernel trace

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D103059
The file was modified openmp/libomptarget/plugins/amdgpu/src/rtl.cpp
Commit e77d24f70a8a0cea7de8101b7374fa45ffe18136 by konndennsa
Revert "[LoopUnrollAndJam] Change LoopUnrollAndJamPass to LoopNest pass"

This reverts commit d65c32fb41b03a35a2a16330ba1ea15cf6818f04.
The file was modified llvm/lib/Passes/PassRegistry.def
The file was modified llvm/lib/Transforms/Utils/LoopUnrollAndJam.cpp
The file was modified llvm/lib/Transforms/Scalar/LoopUnrollAndJamPass.cpp
The file was modified llvm/include/llvm/Transforms/Utils/UnrollLoop.h
The file was modified llvm/lib/Passes/PassBuilder.cpp
The file was modified llvm/include/llvm/Transforms/Scalar/LoopUnrollAndJamPass.h
The file was modified llvm/test/Transforms/LoopUnrollAndJam/innerloop.ll
Commit a2ae14514a26bda6d8dae11e56f3186931b8d8c7 by aeubanks
Making Instrumentation aware of LoopNest Pass

Instrumentation callbacks are not made aware of LoopNest passes. From the loop pass manager, we can pass the outermost loop of the LoopNest to instrumentation in case of LoopNest passes.

The current patch made the change in two places in StandardInstrumentation.cpp. I will submit a proper patch where the OuterMostLoop is passed from the LoopPassManager to the callbacks. That way we will avoid making changes at multiple places in StandardInstrumentation.cpp.

A testcase also will be submitted.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D102463
The file was added llvm/test/Other/loopnest-callback.ll
The file was modified llvm/include/llvm/Transforms/Scalar/LoopPassManager.h
Commit 2f6516605615e99221181ec2a1947f9ad55aa9d3 by nathan
[lld:elf] Weaken the requirement for a computed binding to be STB_LOCAL

Given the following scenario:

```
// Cat.cpp
struct Animal { virtual void makeNoise() const = 0; };
struct Cat : Animal { void makeNoise() const override; };

extern "C" int puts(char const *);
void Cat::makeNoise() const { puts("Meow"); }
void doThingWithCat(Animal *a) { static_cast<Cat *>(a)->makeNoise(); }

// CatUser.cpp
struct Animal { virtual void makeNoise() const = 0; };
struct Cat : Animal { void makeNoise() const override; };

void doThingWithCat(Animal *a);

void useDoThingWithCat() {
  Cat *d = new Cat;
  doThingWithCat(d);
}

// cat.ver
{
  global: _Z17useDoThingWithCatv;
  local: *;
};

$ clang++ Cat.cpp CatUser.cpp -fpic -flto=thin -fwhole-program-vtables
-shared -O3 -fuse-ld=lld -Wl,--lto-whole-program-visibility
-Wl,--version-script,cat.ver
```

We cannot devirtualize `Cat::makeNoise`. The issue is complex:

Due to `-fsplit-lto-unit` and usage of type metadata, we place the Cat
vtable declaration into module 0 and the Cat vtable definition with type
metadata into module 1, causing duplicate entries (Undefined followed by
Defined) in the `lto::InputFile::symbols()` output.
In `BitcodeFile::parse`, after processing the `Undefined` then the
`Defined`, the final state is `Defined`.
In `BitcodeCompiler::add`, for the first symbol, `computeBinding`
returns `STB_LOCAL`, then we reset it to `Undefined` because it is
prevailing (`versionId` is `preserved`). For the second symbol, because
the state is now `Undefined`, `computeBinding` returns `STB_GLOBAL`,
causing `ExportDynamic` to be true and suppressing devirtualization.

In D77280, the `computeBinding` change used a stricter `isDefined()`
condition to make weak `Lazy` symbols work.
This patch relaxes the condition to the weaker `!isLazy()` to keep that
working while making the devirtualization work as well.

Differential Revision: https://reviews.llvm.org/D98686
The file was added lld/test/ELF/lto/devirt_split_unit_localize.ll
The file was modified lld/ELF/Symbols.cpp
Commit a5a3efa82a77ab7a1c9787ef97b547a4a81f2440 by logan.r.smith0
[Sema] Always search the full function scope context if a potential availability violation is encountered

This fixes both https://bugs.llvm.org/show_bug.cgi?id=50309 and https://bugs.llvm.org/show_bug.cgi?id=50310.

Previously, lambdas inside functions would mark their own bodies for later analysis when encountering a potentially unavailable decl, without taking into consideration that the entire lambda itself might be correctly guarded inside an @available check. The same applied to inner class member functions. Blocks happened to work as expected already, since Sema::getEnclosingFunction() skips through block scopes.

This patch instead simply and conservatively marks the entire outermost function scope for search, and removes some special-case logic that prevented DiagnoseUnguardedAvailabilityViolations from traversing down into lambdas and nested functions. This correctly accounts for arbitrarily nested lambdas, inner classes, and blocks that may be inside appropriate @available checks at any ancestor level. It also treats all potential availability violations inside functions consistently, without being overly sensitive to the current DeclContext, which previously caused issues where e.g. nested struct members were warned about twice.

DiagnoseUnguardedAvailabilityViolations now has more work to do in some cases, particularly in functions with many (possibly deeply) nested lambdas and classes, but the big-O is the same, and the simplicity of the approach and the fact that it fixes at least two bugs feels like a strong win.

Differential Revision: https://reviews.llvm.org/D102338
The file was modified clang/test/SemaObjC/unguarded-availability.m
The file was modified clang/lib/Sema/SemaAvailability.cpp
The file was modified clang/include/clang/Sema/Sema.h
The file was modified clang/lib/Sema/SemaExpr.cpp
Commit 6685a3f3e4c497a3a0fd06aa4e77cb442325d1ba by minyihh
[cfe] Support target-specific escaped character in inline asm

GCC allows each target to define a set of non-letter and non-digit
escaped characters for inline assembly that will be replaced by another
string (GCC calls these "punctuation" characters; the existing "%%" and
"%{", replaced by '%' and '{' at the end, can be seen as special
cases shared by all targets).
This patch implements this feature by adding a new hook in `TargetInfo`.

Differential Revision: https://reviews.llvm.org/D103036
The file was modified clang/lib/AST/Stmt.cpp
The file was modified clang/lib/Basic/Targets/M68k.h
The file was modified clang/include/clang/Basic/TargetInfo.h
The file was modified clang/lib/Basic/Targets/M68k.cpp
The file was added clang/test/CodeGen/m68k-asm.c
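The escaped-character mechanism described in the inline asm commit above can be sketched as a small string expansion with a per-target hook. Everything here is an assumption made for illustration (C++17): `EscapeHook`, `expandEscapes`, `demoTargetHook` and the `%#` mapping are hypothetical and do not reflect the actual `TargetInfo` hook or any real target's punctuation set.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical per-target hook: returns the replacement for an escaped
// punctuation character, or std::nullopt if the target does not define one.
using EscapeHook = std::optional<std::string> (*)(char);

// Expand "%%" and "%{" (shared by all targets) plus any target-defined
// "%<punctuation>" escapes. Other text, including operand references such as
// "%0", is copied through unchanged.
inline std::string expandEscapes(const std::string &Asm, EscapeHook Hook) {
  assert(Hook && "a hook is required in this sketch");
  std::string Out;
  for (std::size_t I = 0; I < Asm.size(); ++I) {
    if (Asm[I] != '%' || I + 1 == Asm.size()) {
      Out += Asm[I];
      continue;
    }
    char Next = Asm[I + 1];
    if (Next == '%' || Next == '{') {
      Out += Next; // common escapes: "%%" -> "%", "%{" -> "{"
      ++I;
    } else if (auto Replacement = Hook(Next)) {
      Out += *Replacement; // target-specific escape
      ++I;
    } else {
      Out += Asm[I]; // keep '%' so "%0" style operands stay intact
    }
  }
  return Out;
}

// Hypothetical target that maps "%#" to "#imm:".
inline std::optional<std::string> demoTargetHook(char C) {
  if (C == '#')
    return std::string("#imm:");
  return std::nullopt;
}
```

Keeping the common escapes in the shared code and delegating only the unknown punctuation to the hook mirrors the "special cases shared by all targets" remark in the commit message.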
Commit 82ad2b6e94b6e9285de38aab9e2e5d87b06a377b by Lang Hames
[JITLink] Enable creation and management of mutable block content.

This patch introduces new operations on jitlink::Blocks: setMutableContent,
getMutableContent and getAlreadyMutableContent. The setMutableContent method
will set the block content data and size members and flag the content as
mutable. The getMutableContent method will return a mutable copy of the existing
content value, auto-allocating and populating a new mutable copy if the existing
content is marked immutable. The getAlreadyMutableContent method asserts that the
existing content is already mutable and returns it.

setMutableContent should be used when updating the block with totally new
content backed by mutable memory. It can be used to change the size of the
block. The argument value should *not* be shared with any other block.

getMutableContent should be used when clients want to modify the existing
content and are unsure whether it is mutable yet.

getAlreadyMutableContent should be used when clients want to modify the existing
content and know from context that it must already be mutable.

These operations reduce copy-modify-update boilerplate and unnecessary copies
introduced when clients couldn't be sure whether the existing content was
mutable or not.
The file was modified llvm/lib/ExecutionEngine/JITLink/ELF_x86_64.cpp
The file was modified llvm/include/llvm/ExecutionEngine/JITLink/JITLink.h
The file was modified llvm/lib/ExecutionEngine/JITLink/JITLinkGeneric.h
The file was modified llvm/lib/ExecutionEngine/JITLink/MachO_x86_64.cpp
The file was modified llvm/lib/ExecutionEngine/JITLink/JITLinkGeneric.cpp
The file was modified llvm/lib/ExecutionEngine/JITLink/MachO_arm64.cpp
The file was modified llvm/unittests/ExecutionEngine/JITLink/LinkGraphTests.cpp
The file was modified llvm/include/llvm/ExecutionEngine/JITLink/x86_64.h
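The three operations described in the JITLink commit above amount to copy-on-write semantics for block content. The following is a simplified stand-in, not the JITLink API: `DemoBlock` and its `std::vector` backing store are assumptions, whereas the method names are borrowed from the commit message.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified stand-in for a block whose content may live in read-only memory.
class DemoBlock {
public:
  DemoBlock(const char *ReadOnly, std::size_t Size)
      : Data(ReadOnly), Size(Size) {}

  // Replace the content with caller-provided mutable memory (may resize).
  void setMutableContent(char *NewData, std::size_t NewSize) {
    Data = MutableData = NewData;
    Size = NewSize;
  }

  // Return mutable content, copying into owned storage on first use.
  char *getMutableContent() {
    if (!MutableData) {
      Owned.assign(Data, Data + Size);
      Data = MutableData = Owned.data();
    }
    return MutableData;
  }

  // Caller promises the content is already mutable.
  char *getAlreadyMutableContent() {
    assert(MutableData && "content is not mutable yet");
    return MutableData;
  }

  bool isMutable() const { return MutableData != nullptr; }
  const char *data() const { return Data; }
  std::size_t size() const { return Size; }

private:
  const char *Data;            // current content, mutable or not
  char *MutableData = nullptr; // non-null once the content is mutable
  std::size_t Size;
  std::vector<char> Owned;     // backing store for the copy-on-write copy
};
```

A client that is unsure about mutability calls `getMutableContent()` once and pays for at most one copy; later edits can use `getAlreadyMutableContent()` to state the expectation explicitly.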
Commit 90d784053f070f734de5b23da892e470a3b4e738 by Christudasan.Devadasan
AMDGPU/GlobalISel: Legalize G_[SU]DIVREM instructions

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D100726
The file was modified llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.h
The file was added llvm/test/CodeGen/AMDGPU/GlobalISel/udivrem.ll
The file was added llvm/test/CodeGen/AMDGPU/GlobalISel/sdivrem.ll
The file was modified llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
The file was modified llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
Commit ce245246043d3c4f12515b2c773ed6c9174345b5 by mkazantsev
[Test] Add test for unreachable backedge with duplicating predecessors
The file was modified llvm/test/Transforms/LoopDeletion/eval_first_iteration.ll
Commit 2531fd70d19aa5d61feb533bbdeee7717a4129eb by mkazantsev
[LoopDeletion] Break backedge if we can prove that the loop is exited on 1st iteration

This patch handles one particular case of one-iteration loops for which SCEV
cannot straightforwardly prove BECount = 1. The idea of the optimization is to
symbolically execute conditional branches on the 1st iteration, moving in topological
order, and only visiting blocks that may be reached on the first iteration. If we find out
that we never reach the header via the latch, then the backedge can be broken.

Differential Revision: https://reviews.llvm.org/D102615
Reviewed By: reames
The file was modified llvm/test/Transforms/LoopDeletion/eval_first_iteration.ll
The file was modified llvm/lib/Transforms/Scalar/LoopDeletion.cpp
The file was modified llvm/test/Transforms/LoopDeletion/zero-btc.ll
The file was modified llvm/test/Transforms/LoopDeletion/noop-loops-with-subloops.ll
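The first-iteration walk described in the LoopDeletion commit above can be illustrated on a toy CFG. This sketch assumes the branch conditions for the first iteration have already been evaluated to "always true", "always false" or "unknown"; the real pass derives them symbolically from the IR. `ToyBlock`, `Cond` and `backedgeNeverTaken` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class Cond { AlwaysTrue, AlwaysFalse, Unknown };

// Toy basic block ending in a conditional branch. FirstIterCond is what is
// known about the condition on the first iteration. A successor of -1 means
// "leaves the loop".
struct ToyBlock {
  Cond FirstIterCond;
  int TrueSucc;
  int FalseSucc;
};

// Blocks are indexed in topological order with block 0 as the header; the only
// edge allowed to point backwards is the backedge to block 0.
// Returns true if the header is provably never re-entered on iteration one.
inline bool backedgeNeverTaken(const std::vector<ToyBlock> &Loop) {
  if (Loop.empty())
    return false; // nothing to prove on an empty description
  std::vector<bool> Live(Loop.size(), false);
  Live[0] = true;
  for (std::size_t I = 0; I < Loop.size(); ++I) {
    if (!Live[I])
      continue; // not reachable on the first iteration, so skip it
    // Marks a successor live; reports true if the edge is the backedge.
    auto Visit = [&](int Succ) {
      if (Succ == 0)
        return true;
      if (Succ > 0) {
        assert(static_cast<std::size_t>(Succ) < Loop.size());
        Live[static_cast<std::size_t>(Succ)] = true;
      }
      return false;
    };
    const ToyBlock &B = Loop[I];
    if (B.FirstIterCond != Cond::AlwaysFalse && Visit(B.TrueSucc))
      return false;
    if (B.FirstIterCond != Cond::AlwaysTrue && Visit(B.FalseSucc))
      return false;
  }
  return true;
}
```

Because blocks are visited in topological order, every block's liveness is final by the time it is examined, so one pass suffices.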
Commit 0ab14f19685eefa38cf2598071a18b0e117c4b30 by Lang Hames
[JITLink] Suppress expect-death test in release mode.
The file was modified llvm/unittests/ExecutionEngine/JITLink/LinkGraphTests.cpp
Commit bf77317049a880af541e31ba7ea43cb229ee4c0f by powerman1st
[RISCV] Optimize xor/or with immediate in the zbs extension

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D102893
The file was modified llvm/test/CodeGen/RISCV/rv64zbs.ll
The file was modified llvm/lib/Target/RISCV/RISCVInstrInfoB.td
The file was modified llvm/test/CodeGen/RISCV/rv32zbs.ll
Commit db8af0f21dc9aad4d336754c857c24470afe53e3 by balazs.benics
[analyzer][ctu] Avoid parsing invocation list again and again during on-demand parsing of CTU

During CTU, the *on-demand parsing* reads and parses the invocation
list to know how to compile the file being imported. However, it seems
that the invocation list is parsed again if a previous parse
has failed, so each later attempt parses again and fails again.
This patch tries to overcome the problem by storing the error code
from the first parse and re-creating the stored error on later parses.

Reviewed By: steakhal

Patch By: OikawaKirie!

Differential Revision: https://reviews.llvm.org/D101763
The file was modified clang/lib/CrossTU/CrossTranslationUnit.cpp
The file was added clang/test/Analysis/ctu-on-demand-parsing-multiple-invocation-list-parsing.cpp
The file was modified clang/include/clang/CrossTU/CrossTranslationUnit.h
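The "store the first error and replay it" idea from the CTU commit above can be sketched as follows (C++17). `InvocationListCache`, `lazyInit` and `parseCount` are hypothetical names chosen for this illustration; the real logic lives in CrossTranslationUnit.cpp and may be structured differently.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>
#include <system_error>
#include <utility>
#include <vector>

// Hypothetical loader that parses an invocation list at most once and replays
// the stored result, including a failure, on later calls.
class InvocationListCache {
public:
  using Parser = std::function<std::error_code(std::vector<std::string> &)>;

  explicit InvocationListCache(Parser P) : Parse(std::move(P)) {
    assert(Parse && "a parser callback is required");
  }

  // Runs the parser on the first call only; every call returns that result.
  std::error_code lazyInit() {
    if (!FirstResult) {
      FirstResult = Parse(Entries);
      ++ParseCount;
    }
    return *FirstResult;
  }

  int parseCount() const { return ParseCount; }

private:
  Parser Parse;
  std::vector<std::string> Entries;
  std::optional<std::error_code> FirstResult; // empty until the first parse
  int ParseCount = 0;
};
```

Caching the failure as well as the success is the point: without it, a broken invocation list would be re-read on every import.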
Commit f05b70c23687fdf3de349ab1dd99ad79c4c40e85 by balazs.benics
Revert "[analyzer][ctu] Avoid parsing invocation list again and again during on-demand parsing of CTU"

This reverts commit db8af0f21dc9aad4d336754c857c24470afe53e3.

clang-x86_64-debian-fast fails on this.

+ : 'RUN: at line 4'
+ /usr/bin/ccache
/b/1/clang-x86_64-debian-fast/llvm.src/clang/test/Analysis/ctu-on-demand-parsing-multiple-invocation-list-parsing.cpp
-fPIC -shared -o
/b/1/clang-x86_64-debian-fast/llvm.obj/tools/clang/test/Analysis/Output/ctu-on-demand-parsing-multiple-invocation-list-parsing.cpp.tmp/mock_open.so
ccache: error: execv of
/b/1/clang-x86_64-debian-fast/llvm.src/clang/test/Analysis/ctu-on-demand-parsing-multiple-invocation-list-parsing.cpp
failed: Permission denied
The file was modified clang/include/clang/CrossTU/CrossTranslationUnit.h
The file was modified clang/lib/CrossTU/CrossTranslationUnit.cpp
The file was removed clang/test/Analysis/ctu-on-demand-parsing-multiple-invocation-list-parsing.cpp
Commit ff30436dc5e54b85b5b942a3a84d0720f657b36f by Amara Emerson
[GlobalISel] Fix MachineIRBuilder not using the DstOp argument for G_SHUFFLE_VECTOR.
The file was modified llvm/lib/CodeGen/GlobalISel/MachineIRBuilder.cpp
Commit d59b4acf80d59c461decd41400988febaf0af8ca by balazs.benics
[analyzer][ctu] Reland "Avoid parsing invocation list again and again..

..during on-demand parsing of CTU"

During CTU, the *on-demand parsing* reads and parses the invocation
list to know how to compile the file being imported. However, it seems
that the invocation list is parsed again if a previous parse
has failed, so each later attempt parses again and fails again.
This patch tries to overcome the problem by storing the error code
from the first parse and re-creating the stored error on later parses.

Reland without test.

Reviewed By: steakhal

Patch By: OikawaKirie!

Differential Revision: https://reviews.llvm.org/D101763
The file was modified clang/include/clang/CrossTU/CrossTranslationUnit.h
The file was modified clang/lib/CrossTU/CrossTranslationUnit.cpp
Commit 75492e20fb7c8e3fc4bc0ff8a5eda844056652cb by jonathanchesterfield
[libomptarget][nfc] Accept callable for hsa iterate_symbols

Candidate refactor to simplify D102692

Reviewed By: pdhaliwal

Differential Revision: https://reviews.llvm.org/D103030
The file was modified openmp/libomptarget/plugins/amdgpu/impl/system.cpp
Commit 10c2e261598a9c1b641b5adb10d87d937aba8b58 by a.v.lapshin
[TRE] Reland: allow TRE for non-capturing calls.

The D82085 "allow TRE for non-capturing calls" caused failure during bootstrap.
This patch does the same as D82085 plus fixes bootstrap error.

The problem with D82085 is that it does not create copies for byval
operands when replacing a function call with a branch.

Consider the following example:

```
    int zoo ( S p1 );

    int foo ( int count, S p1 ) {
      if ( count > 10 )
        return zoo(p1);

      // temporary variable created for passing the byvalue parameter
      // p1 could be used when zoo(p1) is called (after TRE is done).
      // lifetime.start p1.byvalue.temp
      return foo(count+1, p1);
      // lifetime.end p1.byvalue.temp
    }
```

After the recursive call to foo is replaced with a jump to the
start of the function, its parameters could be passed to the
zoo function, i.e. the temporary variable created for the byvalue
parameter "p1" could be passed to zoo. In the end, zoo receives
a broken operand:

```
    int foo ( int count, S p1 ) {
    :tailrecurse
      p1_tr = phi p1, p1.byvalue.temp
      if ( count > 10 )
        return zoo(p1_tr);

      // temporary variable created for passing the byvalue parameter
      // p1 could be used when zoo(p1) is called (after TRE is done).
      lifetime.start p1.byvalue.temp
      memcpy (p1.byvalue.temp, p1_tr)
      count = count + 1
      lifetime.end p1.byvalue.temp
      br tailrecurse
    }
```

To prevent p1.byvalue.temp from being used after its scope is ended by
the lifetime.end marker, this patch copies the value from p1.byvalue.temp
into another temporary variable and then copies this variable
into the input parameter for the next iteration.

This patch passes bootstrap build and bootstrap build with AddressSanitizer.

Differential Revision: https://reviews.llvm.org/D85614
The file was added llvm/test/Transforms/TailCallElim/tre-multiple-exits.ll
The file was added llvm/test/Transforms/TailCallElim/tre-byval-parameter-2.ll
The file was added llvm/test/Transforms/TailCallElim/tre-byval-parameter.ll
The file was modified llvm/test/Transforms/TailCallElim/basic.ll
The file was modified llvm/lib/Transforms/Scalar/TailRecursionElimination.cpp
The file was added llvm/test/Transforms/TailCallElim/tre-noncapturing-alloca-calls.ll
Commit 5017b0f88b81083d3f723e7a8e5cc19b1c4eb366 by springerm
[mlir] Check only last dim stride in transfer op lowering

Lower a 1D vector transfer op to LLVM if the last dim stride is 1. Also fixes a bug in the original unit stride computation.

Differential Revision: https://reviews.llvm.org/D102897
The file was modified mlir/lib/Conversion/VectorToSCF/VectorToSCF.cpp
The file was modified mlir/test/Integration/Dialect/Vector/CPU/test-transfer-read-1d.mlir
The file was modified mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp
Commit 5f4d383a59351711d4f64cbb6a04ef9ffc0d8f88 by david.spickett
[clang][ARM] Remove non-existent arm1136jz-s CPU

There is an ARM1136JF-S and an ARM1136J-S, but I could find
no references to an ARM1136JZ-S in the CPU manuals or the manual
for Arm Compiler 5.

See:
https://developer.arm.com/documentation/ddi0211/latest/
https://developer.arm.com/documentation/dui0472/latest/

Using this CPU you get:
$ ./bin/clang --target=arm-linux-gnueabihf -march=armv3m -mcpu=arm1136jz-s -c /tmp/test.c -o /tmp/test.o
'arm1136jz-s' is not a recognized processor for this target (ignoring processor)

This is because the LLVM target does not know what it is.

This is part of fixing https://bugs.llvm.org/show_bug.cgi?id=50454.

Reviewed By: peter.smith

Differential Revision: https://reviews.llvm.org/D103019
The file was modified llvm/unittests/Support/TargetParserTest.cpp
The file was modified llvm/include/llvm/Support/ARMTargetParser.def
Commit 6359842bc088857799318dad366099eeca92a4d5 by benny.kra
[GlobalISel] Silence unused variable warning in Release builds. NFC.
The file was modified llvm/lib/CodeGen/GlobalISel/MachineIRBuilder.cpp
Commit 0cd2629d97e70f34adb8d0d2ac4a4d280e3bab86 by david.spickett
[llvm][ARM] Remove non-existent arm1176j-s CPU

This was removed in https://reviews.llvm.org/D52594 for clang.

The one test using it has been updated to use the mpcore
CPU as the linked clang change does.

This is part of fixing https://bugs.llvm.org/show_bug.cgi?id=50454.

Reviewed By: peter.smith

Differential Revision: https://reviews.llvm.org/D103022
The file was modified llvm/lib/Target/ARM/ARM.td
The file was modified llvm/unittests/Support/TargetParserTest.cpp
The file was modified llvm/test/CodeGen/ARM/build-attributes.ll
Commit de7729d47a8ba0060dd6a6190d20d698539f76fe by david.spickett
[clang][ARM] Remove non-existent arm9312 CPU

I cannot find documentation on this CPU, and it
is not supported by the Arm Compiler 5 product either.

It was likely a mistake or a different name for the
"ep9312", which is an Arm based Cirrus Logic chip.

Reviewed By: peter.smith

Differential Revision: https://reviews.llvm.org/D103024
The file was modified llvm/include/llvm/Support/ARMTargetParser.def
The file was modified llvm/unittests/Support/TargetParserTest.cpp
Commit 44843e2a046ef9959166e53d6c0cfb3b286fd4ce by kbessonova
[ARM][NEON] Combine base address updates for vld1x intrinsics

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D102855
The file was modified llvm/lib/Target/ARM/ARMExpandPseudoInsts.cpp
The file was modified llvm/lib/Target/ARM/ARMInstrNEON.td
The file was modified llvm/lib/Target/ARM/ARMISelLowering.cpp
The file was modified llvm/test/CodeGen/ARM/arm-vld1.ll
The file was modified llvm/lib/Target/ARM/ARMISelDAGToDAG.cpp
The file was modified llvm/lib/Target/ARM/ARMISelLowering.h
The file was removed llvm/test/CodeGen/ARM/pr45824.ll
Commit 78eaff2ef8a984859a04f944522280360ee825aa by lebedev.ri
[llvm-exegesis] Loop unrolling for loop snippet repetitor mode

I really needed this, like, factually, yesterday,
when verifying dependency breaking idioms for AMD Zen 3 scheduler model.

Consider the following example:
```
$ ./bin/llvm-exegesis --mode=inverse_throughput --snippets-file=/tmp/snippet.s --num-repetitions=1000000 --repetition-mode=duplicate
Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-4a7e50.o
---
mode:            inverse_throughput
key:
  instructions:
    - 'VPXORYrr YMM0 YMM0 YMM0'
  config:          ''
  register_initial_values: []
cpu_name:        znver3
llvm_triple:     x86_64-unknown-linux-gnu
num_repetitions: 1000000
measurements:
  - { key: inverse_throughput, value: 0.31025, per_snippet_value: 0.31025 }
error:           ''
info:            ''
assembled_snippet: C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C3
...

```
What does it tell us?
So wait, it can only execute ~3 x86 AVX YMM PXOR zero-idioms per cycle?
That doesn't seem right. That's even less than there are pipes supporting this type of op.

Now, second example:
```
$ ./bin/llvm-exegesis --mode=inverse_throughput --snippets-file=/tmp/snippet.s --num-repetitions=1000000 --repetition-mode=loop
Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-2418b5.o
---
mode:            inverse_throughput
key:
  instructions:
    - 'VPXORYrr YMM0 YMM0 YMM0'
  config:          ''
  register_initial_values: []
cpu_name:        znver3
llvm_triple:     x86_64-unknown-linux-gnu
num_repetitions: 1000000
measurements:
  - { key: inverse_throughput, value: 1.00011, per_snippet_value: 1.00011 }
error:           ''
info:            ''
assembled_snippet: 49B80800000000000000C5FDEFC0C5FDEFC04983C0FF75F2C3
...
```
Now that's just worse. Due to the looping, the throughput completely plummeted,
and now we can only do a single instruction/cycle!?

That's not great.
And final example:
```
$ ./bin/llvm-exegesis --mode=inverse_throughput --snippets-file=/tmp/snippet.s --num-repetitions=1000000 --repetition-mode=loop --loop-body-size=1000
Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-c402e2.o
---
mode:            inverse_throughput
key:
  instructions:
    - 'VPXORYrr YMM0 YMM0 YMM0'
  config:          ''
  register_initial_values: []
cpu_name:        znver3
llvm_triple:     x86_64-unknown-linux-gnu
num_repetitions: 1000000
measurements:
  - { key: inverse_throughput, value: 0.167087, per_snippet_value: 0.167087 }
error:           ''
info:            ''
assembled_snippet: 49B80800000000000000C5FDEFC0C5FDEFC04983C0FF75F2C3
...
```

So if we merge the previous two approaches, duplicate this single-instruction snippet 1000x
(loop-body-size/instruction count in snippet), and run a loop with 1000 iterations
over that duplicated/unrolled snippet, the measured throughput goes through the roof,
up to 5.9 instructions/cycle, which finally tells us that this idiom is zero-cycle!

Reviewed By: courbet

Differential Revision: https://reviews.llvm.org/D102522
The file was modified llvm/tools/llvm-exegesis/lib/SnippetRepetitor.cpp
The file was modified llvm/tools/llvm-exegesis/lib/BenchmarkRunner.h
The file was modified llvm/tools/llvm-exegesis/lib/BenchmarkResult.h
The file was modified llvm/tools/llvm-exegesis/llvm-exegesis.cpp
The file was modified llvm/unittests/tools/llvm-exegesis/X86/SnippetRepetitorTest.cpp
The file was modified llvm/tools/llvm-exegesis/lib/BenchmarkRunner.cpp
The file was modified llvm/docs/CommandGuide/llvm-exegesis.rst
The file was modified llvm/tools/llvm-exegesis/lib/SnippetRepetitor.h
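The numbers quoted in the llvm-exegesis commit above are related by simple arithmetic: inverse throughput is cycles per instruction, so its reciprocal is instructions per cycle. The helper names below are made up for this sketch and are not part of llvm-exegesis.

```cpp
#include <cassert>
#include <cmath>

// Inverse throughput is cycles per instruction, so instructions per cycle is
// its reciprocal.
inline double instructionsPerCycle(double InverseThroughput) {
  assert(InverseThroughput > 0.0 && "inverse throughput must be positive");
  return 1.0 / InverseThroughput;
}

// Number of snippet copies in one unrolled loop body, as described in the
// commit message: loop-body-size divided by the snippet's instruction count.
inline unsigned copiesPerLoopBody(unsigned LoopBodySize,
                                  unsigned SnippetInstrs) {
  assert(SnippetInstrs != 0 && "snippet must contain an instruction");
  return LoopBodySize / SnippetInstrs;
}
```

Applied to the three measurements in the message, 0.31025, 1.00011 and 0.167087 cycles per instruction correspond to roughly 3.2, 1.0 and just under 6 instructions per cycle.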
Commit 8f681d5b272eeb5c0d13d225313f4ea9517f59f5 by Stanislav.Mekhanoshin
[IR] Allow Value::replaceUsesWithIf() to process constants

The change is currently NFC, but exploited by the depending D102954.
Code to handle constants is borrowed from the general implementation
of Value::doRAUW().

Differential Revision: https://reviews.llvm.org/D103051
The file was modified llvm/include/llvm/IR/Value.h
The file was modified llvm/lib/IR/Value.cpp