Success

Changes

Summary

  1. [MemDep] Use BatchAA when computing pointer dependencies
  2. Reland "[DebugInfo] Move constructor homing case in shouldOmitDefinition."
  3. [ValueTracking] Add a noundef test for D86477; NFC
  4. [ValueTracking] Let getGuaranteedNonPoisonOp find multiple non-poison operands
  5. [SystemZ][z/OS] Add z/OS Target and define macros
  6. [lldb] Make Reproducer compatible with SubsystemRAII (NFC)
  7. [Hexagon] Check if EVT is simple type in HVX lowering
  8. [AMDGPU] Switch to named simm16 in vscnt insertion
  9. [OpenMP] Pack first-private arguments to improve efficiency of data transfer
Commit 3a54b6a4b71c21cf3bab4f132cbc2904fb9d997e by nikita.ppv
[MemDep] Use BatchAA when computing pointer dependencies

We're not changing IR while running a single MemDep query, so it's
safe to cache alias analysis results using BatchAA. This adds BatchAA
usage to getSimplePointerDependencyFrom(), which is non-intrusive --
covering larger parts (like a whole processNonLocalLoad query) is
also possible, but requires threading BatchAA through a bunch of APIs.
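
The caching idea can be illustrated with a minimal, self-contained sketch. It does not use LLVM's actual classes: `BatchedAliasCache`, `AliasKind`, and `Loc` are hypothetical stand-ins, and the sketch assumes that the underlying alias query is symmetric and that nothing it depends on changes while the cache is alive (the same condition the commit message relies on for a single MemDep query).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <utility>

// Hypothetical stand-ins for illustration only; these are not LLVM types.
enum class AliasKind { NoAlias, MayAlias, MustAlias };
using Loc = std::uintptr_t;

// Memoizes alias results for the duration of one query. The cache is only
// sound while the analyzed program is not modified.
class BatchedAliasCache {
public:
  explicit BatchedAliasCache(std::function<AliasKind(Loc, Loc)> Query)
      : SlowQuery(std::move(Query)) {
    assert(SlowQuery && "an underlying alias query is required");
  }

  AliasKind alias(Loc A, Loc B) {
    // Assumption: the underlying query is symmetric, so (A, B) and (B, A)
    // can share one cache entry.
    if (A > B)
      std::swap(A, B);
    const auto Key = std::make_pair(A, B);
    auto It = Cache.find(Key);
    if (It != Cache.end())
      return It->second; // Cache hit: skip the expensive query.
    AliasKind Result = SlowQuery(A, B);
    Cache.emplace(Key, Result);
    return Result;
  }

  std::size_t cachedPairs() const { return Cache.size(); }

private:
  std::function<AliasKind(Loc, Loc)> SlowQuery;
  std::map<std::pair<Loc, Loc>, AliasKind> Cache;
};
```

Creating one such object per query and discarding it afterwards keeps the cache from outliving the period in which the IR is known to be unchanged.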

For the ThinLTO configuration, this is a 1% geomean improvement on CTMark.

Differential Revision: https://reviews.llvm.org/D85583
The file was modified llvm/lib/Analysis/MemoryDependenceAnalysis.cpp
Commit b1009ee84fc0242bcebd07889306bf39d9b7170f by akhuang
Reland "[DebugInfo] Move constructor homing case in shouldOmitDefinition."

For some reason the ctor homing case was before the template
specialization case, and could have returned false too early.
I moved the code out into a separate function to avoid this.

This reverts commit 05777ab941063192b9ccb1775358a83a2700ccc1.
The file was modified clang/test/CodeGenCXX/debug-info-template-explicit-specialization.cpp
The file was modified clang/lib/CodeGen/CGDebugInfo.cpp
Commit 8e51bb249bc2a71ecd13092bc0e1e246995feba6 by aqjune
[ValueTracking] Add a noundef test for D86477; NFC
The file was modified llvm/test/Transforms/InstSimplify/freeze-noundef.ll
Commit f753f5b05033bf1d8b89b19b753b78c89de41ae3 by aqjune
[ValueTracking] Let getGuaranteedNonPoisonOp find multiple non-poison operands

This patch helps getGuaranteedNonPoisonOp find multiple non-poison operands.

Instead of special-casing llvm.assume, I think it is also a viable option to
add noundef to Intrinsics.td. If it makes sense, I'll make a patch for that.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D86477
The file was modified llvm/lib/Analysis/ValueTracking.cpp
The file was modified llvm/test/Transforms/InstSimplify/freeze-noundef.ll
The file was modified llvm/lib/Transforms/Instrumentation/PoisonChecking.cpp
The file was modified llvm/include/llvm/Analysis/ValueTracking.h
Commit 97ccf93b3615ff4c0d5fe116e6a7c7b616d8ec0c by hubert.reinterpretcast
[SystemZ][z/OS] Add z/OS Target and define macros

This patch adds the z/OS target and defines macros as a stepping stone
towards enabling a native build on z/OS.

Reviewed By: hubert.reinterpretcast

Differential Revision: https://reviews.llvm.org/D85324
The file was modified clang/lib/Basic/Targets/OSTargets.h
The file was added clang/test/Preprocessor/init-zos.c
The file was modified clang/lib/Basic/Targets.cpp
Commit 521220690ab7741e382344319b2a9d458be3eb41 by Jonas Devlieghere
[lldb] Make Reproducer compatible with SubsystemRAII (NFC)

Make Reproducer compatible with SubsystemRAII and use it in
LocateSymbolFileTest.
The file was modified lldb/source/Utility/Reproducer.cpp
The file was modified lldb/include/lldb/Utility/Reproducer.h
The file was modified lldb/unittests/Symbol/LocateSymbolFileTest.cpp
Commit 2da1eefb58a11f459461dc040c678da5fc7252b0 by kparzysz
[Hexagon] Check if EVT is simple type in HVX lowering
The file was modified llvm/lib/Target/Hexagon/HexagonISelLoweringHVX.cpp
The file was modified llvm/lib/Target/Hexagon/HexagonISelLowering.cpp
The file was added llvm/test/CodeGen/Hexagon/hvx-isel-vselect-v256i16.ll
Commit 817c831f023af3a5a08e72f7454c4fbb771edc1c by Stanislav.Mekhanoshin
[AMDGPU] Switch to named simm16 in vscnt insertion

Differential Revision: https://reviews.llvm.org/D86568
The file was modified llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
Commit 0775c1dfbce69d1d13414995de2e77acc942b7eb by tianshilei1992
[OpenMP] Pack first-private arguments to improve efficiency of data transfer

In this patch, we pack all small first-private arguments, then allocate and transfer them all at once to reduce the number of data transfers, which are very expensive.

Take the test case as an example.
```
int main() {
  int data1[3] = {1}, data2[3] = {2}, data3[3] = {3};
  int sum[16] = {0};
#pragma omp target teams distribute parallel for map(tofrom: sum) firstprivate(data1, data2, data3)
  for (int i = 0; i < 16; ++i) {
    for (int j = 0; j < 3; ++j) {
      sum[i] += data1[j];
      sum[i] += data2[j];
      sum[i] += data3[j];
    }
  }
}
```
Here `data1`, `data2`, and `data3` are three first-private arguments of the target region. Previously, `libomptarget` performed data allocation and data transfer three times, each time allocating and transferring 12 bytes. With this patch, it allocates and transfers only once. The total size is `(12+4)*3=48` bytes, where 12 is the size of each array and 4 is the padding that keeps each address 8-byte aligned. It is implemented as follows:
1. First, collect the information for all *first*-private arguments. _private_ arguments are excluded because they do not need to be mapped to the target device; they only need a data allocation. With the patch for the memory manager, that allocation can be very cheap, especially for small sizes. For each qualifying argument, push a placeholder pointer `nullptr` onto the `vector` of kernel arguments; these are updated later.
2. Once all information is collected, create a buffer that can hold all arguments plus their padding. Copy each argument into the buffer at the right place, i.e. at its aligned address.
3. Allocate target memory of the same size as the host buffer, transfer the host buffer to the target device, and finally update all placeholder pointers in the arguments `vector`.
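
The host-side part of these steps (computing aligned offsets and filling one buffer) can be sketched as below. This is a minimal illustration and not the code from the patch: `FirstPrivateArg`, `PackedBuffer`, `alignUp`, and `packFirstPrivateArgs` are hypothetical names, and the device allocation and transfer of step 3 are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical description of one first-private argument on the host.
struct FirstPrivateArg {
  const void *HostPtr;
  std::size_t Size;
};

// Every packed argument starts at a multiple of 8 bytes, as in the text.
constexpr std::size_t Alignment = 8;

// Rounds N up to the next multiple of Align (Align must be a power of two).
inline std::size_t alignUp(std::size_t N, std::size_t Align) {
  assert(Align != 0 && (Align & (Align - 1)) == 0);
  return (N + Align - 1) & ~(Align - 1);
}

struct PackedBuffer {
  std::vector<char> Bytes;          // One contiguous host buffer.
  std::vector<std::size_t> Offsets; // Offset of each argument in Bytes.
};

PackedBuffer packFirstPrivateArgs(const std::vector<FirstPrivateArg> &Args) {
  PackedBuffer Packed;
  std::size_t Total = 0;
  // First pass: assign an aligned offset to every argument.
  for (const FirstPrivateArg &Arg : Args) {
    Packed.Offsets.push_back(Total);
    Total += alignUp(Arg.Size, Alignment);
  }
  // Second pass: copy each argument to its offset; padding stays zero.
  Packed.Bytes.assign(Total, 0);
  for (std::size_t I = 0; I < Args.size(); ++I)
    if (Args[I].Size > 0)
      std::memcpy(Packed.Bytes.data() + Packed.Offsets[I], Args[I].HostPtr,
                  Args[I].Size);
  return Packed;
}
```

For three 12-byte arrays this yields offsets 0, 16, and 32 and a 48-byte buffer, matching the `(12+4)*3=48` computation. After a single device allocation, each placeholder pointer would be the device base address plus the corresponding offset.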

We only pack small arguments because the data transfer is asynchronous: for a large argument, the host can continue working while, hopefully, the data is being transferred. An argument counts as "small" if its size is less than a predefined value, currently 1024. I'm not sure whether that is a good value; it remains an open question. Another open question is whether the value should be configurable via an environment variable.
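
One way the open question about configurability could be explored is sketched below. Everything here is an assumption for illustration: the environment variable name `HYPOTHETICAL_FIRSTPRIVATE_THRESHOLD` and the functions `parseThreshold`, `packingThreshold`, and `shouldPack` do not come from the patch. Only the default of 1024 and the "less than" comparison are taken from the description above.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdlib>

// Default from the commit message; arguments smaller than this are packed.
constexpr std::size_t DefaultThreshold = 1024;

// Parses a decimal threshold, falling back to the default when the text is
// missing or is not a plain non-negative integer.
std::size_t parseThreshold(const char *Text) {
  if (Text == nullptr || !std::isdigit(static_cast<unsigned char>(*Text)))
    return DefaultThreshold;
  char *End = nullptr;
  unsigned long long Value = std::strtoull(Text, &End, 10);
  assert(End != nullptr);
  if (*End != '\0')
    return DefaultThreshold; // Trailing garbage such as "12x".
  return static_cast<std::size_t>(Value);
}

// Reads the threshold from a hypothetical environment variable.
std::size_t packingThreshold() {
  return parseThreshold(std::getenv("HYPOTHETICAL_FIRSTPRIVATE_THRESHOLD"));
}

// An argument is "small" when its size is strictly below the threshold.
bool shouldPack(std::size_t ArgSize, std::size_t Threshold) {
  return ArgSize < Threshold;
}
```

With the default in place, the 12-byte arrays from the test case are packed, while an argument of exactly 1024 bytes is transferred on its own.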

Reviewed By: ye-luo

Differential Revision: https://reviews.llvm.org/D86307
The file was added openmp/libomptarget/test/mapping/private_mapping.c
The file was modified openmp/libomptarget/src/omptarget.cpp