Changes

Summary

  1. [OpenMP] Improve ref count debug messages
  2. [OpenMP] Fix delete map type in ref count debug messages
  3. [DAGCombine] Check reassoc flags in aggressive fsub fusion
  4. [libc] add benchmarks for memcmp and bzero
  5. [OpenMP][AMDGCN] Apply fix for isnan, isinf and isfinite for amdgcn.
  6. [InstCombine] convert FP min/max with negated op to fabs
  7. [RISCV] Add explicit copy to V0 in the masked vmsge(u).vx intrinsic handling.
  8. [UpdateCCTestChecks][NFC] Permit other comments in common.py
  9. [InstCombine] Eliminate casts to optimize ctlz operation
  10. [ARM] Limit v6m unrolling with multiple live outs
  11. [ValueTracking] look through bitcast of vector in computeKnownBits
  12. [clang-format] Add IfMacros option
  13. Update Bazel BUILD files up to be9a87fe9b
  14. [Demangle][Rust] Hide implementation details NFC
  15. [LAA] Make getPointersDiff() API compatible with opaque pointers
  16. [ConstantFold] Allow propagation of poison for and/or i1
Commit 48421ac441bf64ec940b13c2dee1bc1a7671e878 by jdenny.ornl
[OpenMP] Improve ref count debug messages

For example, without this patch:

```
$ cat test.c
int main() {
  int x;
  #pragma omp target enter data map(alloc: x)
  #pragma omp target exit data map(release: x)
  ;
  return 0;
}
$ clang -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda test.c
$ LIBOMPTARGET_DEBUG=1 ./a.out |& grep 'Creating\|Mapping exists'
Libomptarget --> Creating new map entry with HstPtrBegin=0x00007ffcace8e448, TgtPtrBegin=0x00007f12ef600000, Size=4, Name=unknown
Libomptarget --> Mapping exists with HstPtrBegin=0x00007ffcace8e448, TgtPtrBegin=0x00007f12ef600000, Size=4, updated RefCount=1
```

There are two problems in this example:

* `RefCount` is not reported when a mapping is created, but it might
  be 1 or infinite.  In this case, because it's created by `omp target
  enter data`, it's 1.  Seeing that would make later `RefCount`
  messages easier to understand.
* `RefCount` is still 1 at the `omp target exit data`, but it's
  reported as `updated`.  The reason it's still 1 is that, upon
  deletions, the reference count is generally not updated in
  `DeviceTy::getTgtPtrBegin`, where the report is produced.  Instead,
  it's zeroed later in `DeviceTy::deallocTgtPtr`, where it's actually
  removed from the mapping table.

This patch makes the following changes:

* Report the reference count when creating a mapping.
* Where an existing mapping is reported, always report a reference
  count action:
    * `update suppressed` when `UpdateRefCount=false`
    * `incremented`
    * `decremented`
    * `deferred final decrement`, which replaces the misleading
      `updated` in the above example
* Add comments to `DeviceTy::getTgtPtrBegin` to explain why it does
  not zero the reference count.  (Please advise if these comments miss
  the point.)
* For unified shared memory, don't report confusing messages like
  `RefCount=` or `RefCount= updated` given that reference counts are
  irrelevant in this case.  Instead, just report `for unified shared
  memory`.
* Use `INFO` not `DP` consistently for `Mapping exists` messages.
* Fix device table dumps to print `INF` instead of `-1` for an
  infinite reference count.
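
The reporting rules in the list above can be sketched roughly as follows. This is a toy model, not the libomptarget code: the enum, the function names, and the parameters are hypothetical, and it assumes a finite reference count.

```c
#include <stdbool.h>

/* Hypothetical names for illustration; not the libomptarget API. */
enum ref_count_action {
  RC_UPDATE_SUPPRESSED,
  RC_INCREMENTED,
  RC_DECREMENTED,
  RC_DEFERRED_FINAL_DECREMENT
};

/* Pick the action to report for an existing mapping.
 * Assumes ref_count_before is finite and at least 1. */
enum ref_count_action classify_ref_count_action(bool update_ref_count,
                                                bool is_exit,
                                                unsigned ref_count_before) {
  if (!update_ref_count)
    return RC_UPDATE_SUPPRESSED;
  if (!is_exit)
    return RC_INCREMENTED;
  /* The last reference is not dropped at lookup time; that final
   * decrement is left to the later deallocation step. */
  if (ref_count_before > 1)
    return RC_DECREMENTED;
  return RC_DEFERRED_FINAL_DECREMENT;
}

/* Map each action to the label text listed in the commit message. */
const char *ref_count_action_label(enum ref_count_action action) {
  switch (action) {
  case RC_UPDATE_SUPPRESSED:
    return "update suppressed";
  case RC_INCREMENTED:
    return "incremented";
  case RC_DECREMENTED:
    return "decremented";
  case RC_DEFERRED_FINAL_DECREMENT:
    return "deferred final decrement";
  }
  return "unknown";
}
```

In the `test.c` example above, the `omp target exit data` lookup sees a count of 1, so this model would pick `deferred final decrement` rather than `updated`.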

Reviewed By: jhuber6, grokos

Differential Revision: https://reviews.llvm.org/D104559
The file was modified openmp/docs/design/Runtimes.rst
The file was modified openmp/libomptarget/src/private.h
The file was modified openmp/libomptarget/test/offloading/info.c
The file was modified openmp/libomptarget/src/device.h
The file was modified openmp/libomptarget/src/device.cpp
Commit 9fa5e3280d0bfdb90e3f2823f5bc63446628682d by jdenny.ornl
[OpenMP] Fix delete map type in ref count debug messages

For example, without this patch:

```
$ cat test.c
int main() {
  int x;
  #pragma omp target enter data map(alloc: x)
  #pragma omp target enter data map(alloc: x)
  #pragma omp target enter data map(alloc: x)
  #pragma omp target exit data map(delete: x)
  ;
  return 0;
}
$ clang -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda test.c
$ LIBOMPTARGET_DEBUG=1 ./a.out |& grep 'Creating\|Mapping exists\|last'
Libomptarget --> Creating new map entry with HstPtrBegin=0x00007ffddf1eaea8, TgtPtrBegin=0x00000000013bb040, Size=4, RefCount=1, Name=unknown
Libomptarget --> Mapping exists with HstPtrBegin=0x00007ffddf1eaea8, TgtPtrBegin=0x00000000013bb040, Size=4, RefCount=2 (incremented), Name=unknown
Libomptarget --> Mapping exists with HstPtrBegin=0x00007ffddf1eaea8, TgtPtrBegin=0x00000000013bb040, Size=4, RefCount=3 (incremented), Name=unknown
Libomptarget --> Mapping exists with HstPtrBegin=0x00007ffddf1eaea8, TgtPtrBegin=0x00000000013bb040, Size=4, RefCount=2 (decremented)
Libomptarget --> There are 4 bytes allocated at target address 0x00000000013bb040 - is not last
```

`RefCount` is reported as decremented to 2, but it ought to be reset
because of the `delete` map type, and `is not last` is incorrect.

This patch migrates the reset of reference counts from
`DeviceTy::deallocTgtPtr` to `DeviceTy::getTgtPtrBegin`, which then
correctly reports the reset.  Based on the `IsLast` result from
`DeviceTy::getTgtPtrBegin`, `targetDataEnd` then correctly reports `is
last` for any deletion.  `DeviceTy::deallocTgtPtr` is responsible only
for the final reference count decrement and mapping removal.

An obscure side effect of this patch is that a `delete` map type when
the reference count is infinite yields `DelEntry=IsLast=false` in
`targetDataEnd` and so no longer results in a
`DeviceTy::deallocTgtPtr` call.  Without this patch, that call is a
no-op anyway besides some unnecessary locking and mapping table
lookups.
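
The behavior described above can be modeled with a small sketch. It is only a toy: `toy_lookup_on_exit` and `TOY_REF_COUNT_INF` are hypothetical names, and treating "reset" as "set the count to 1" is an assumption about what the reset means here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an infinite reference count. */
#define TOY_REF_COUNT_INF UINT64_MAX

/* Toy model of the lookup done when a mapping is exited.
 * Updates *ref_count and returns whether this is the last reference.
 * The final decrement and the removal are left to a later step. */
bool toy_lookup_on_exit(uint64_t *ref_count, bool is_delete) {
  if (*ref_count == TOY_REF_COUNT_INF)
    return false; /* an infinite count is never last, even for delete */
  if (is_delete)
    *ref_count = 1; /* assumed meaning of "reset" */
  bool is_last = (*ref_count == 1);
  if (!is_last)
    --*ref_count;
  return is_last;
}
```

With a count of 3, as in the `test.c` example above, a `delete` resets the count and reports the reference as last, while a plain `release` decrements to 2 and reports it as not last.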

Reviewed By: grokos

Differential Revision: https://reviews.llvm.org/D104560
The file was modified openmp/libomptarget/src/device.h
The file was modified openmp/libomptarget/src/omptarget.cpp
The file was modified openmp/libomptarget/src/device.cpp
Commit c125af82a5ff5dbbbcb8ebc5cde156d41e6ac281 by Jinsong Ji
[DAGCombine] Check reassoc flags in aggressive fsub fusion

This is from the discussion in https://reviews.llvm.org/D104247#inline-993387

The contract and reassoc flags shouldn't imply each other.

All of the aggressive fsub fusions reassociate operations, so
we should guard them with a reassoc flag check.
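
As a reminder of why reassociation needs its own flag, here is a minimal C sketch (not the DAGCombiner code; the function names are hypothetical) showing that regrouping floating-point operations can change the result. It assumes the compiler is not run with fast-math style options.

```c
/* (a + b) + c, grouped left to right. */
double sum_left(double a, double b, double c) {
  return (a + b) + c;
}

/* a + (b + c), the reassociated grouping. */
double sum_right(double a, double b, double c) {
  return a + (b + c);
}
```

For `a = 1.0`, `b = 1e100`, `c = -1e100`, the first grouping loses the 1.0 when it is added to 1e100, while the second cancels the large terms first and keeps it.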

Reviewed By: mcberg2017

Differential Revision: https://reviews.llvm.org/D104723
The file was modified llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
The file was modified llvm/test/CodeGen/PowerPC/fma-assoc.ll
The file was modified llvm/test/CodeGen/AMDGPU/fpext-free.ll
Commit 87065c0d242d955e6f3fddf5cbc790d025c3521c by gchatelet
[libc] add benchmarks for memcmp and bzero

Differential Revision: https://reviews.llvm.org/D104511
The file was modified libc/benchmarks/CMakeLists.txt
The file was modified libc/src/string/CMakeLists.txt
The file was modified libc/benchmarks/LibcMemoryBenchmarkMain.cpp
Commit 5dfdc1812d9b9c043204d39318f6446424d8f2d7 by jonathanchesterfield
[OpenMP][AMDGCN] Apply fix for isnan, isinf and isfinite for amdgcn.

This fixes issues with various return types (bool/int). The fix was
already in place for the nvptx headers and is adjusted here to work
for amdgcn. It does not affect hip, as the change is guarded with
OPENMP_AMDGCN. Similar to D85879.

Reviewed By: jdoerfert, JonChesterfield, yaxunl

Differential Revision: https://reviews.llvm.org/D104677
The file was modified clang/test/Headers/hip-header.hip
The file was modified clang/lib/Headers/__clang_hip_cmath.h
The file was modified clang/test/Headers/openmp_device_math_isnan.cpp
Commit 1e9b6b89a7b5c49612018b120c2c142106056f82 by spatel
[InstCombine] convert FP min/max with negated op to fabs

This is part of improving floating-point patterns seen in:
https://llvm.org/PR39480

We don't require any FMF because the 2 potential corner cases
(-0.0 and NaN) are correctly handled without FMF:
1. -0.0 is treated as strictly less than +0.0 with
   maximum/minimum, so fabs/fneg work as expected.
2. +/- 0.0 with maxnum/minnum is indeterminate, so
   transforming to fabs/fneg is more defined.
3. The sign of a NaN may be altered by this transform,
   but that is allowed in the default FP environment.

If there are FMF, they are propagated from the min/max call to
one or both new operands, which seems to agree with Alive2:
https://alive2.llvm.org/ce/z/bem_xC
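
A rough numeric check of the identity behind this fold, for ordinary nonzero finite inputs only. The helpers `toy_max`, `toy_min`, and `toy_abs` are hypothetical stand-ins, not the LLVM intrinsics, and they deliberately ignore the -0.0 and NaN corner cases discussed above.

```c
/* Plain comparison-based stand-ins; no special -0.0 or NaN handling. */
double toy_max(double a, double b) { return a > b ? a : b; }
double toy_min(double a, double b) { return a < b ? a : b; }
double toy_abs(double x) { return x < 0.0 ? -x : x; }

/* max(x, -x) picks whichever of the two is non-negative: |x|. */
double max_of_x_and_neg_x(double x) { return toy_max(x, -x); }

/* min(x, -x) picks whichever of the two is non-positive: -|x|. */
double min_of_x_and_neg_x(double x) { return toy_min(x, -x); }
```

For example, `max_of_x_and_neg_x(-3.5)` gives 3.5 and `min_of_x_and_neg_x(2.0)` gives -2.0.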
The file was modified llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
The file was modified llvm/test/Transforms/InstCombine/maximum.ll
The file was modified llvm/test/Transforms/InstCombine/minimum.ll
The file was modified llvm/test/Transforms/InstCombine/minnum.ll
The file was modified llvm/test/Transforms/InstCombine/maxnum.ll
Commit a37cf17834d39411ed1d669098b428f8374c5b45 by craig.topper
[RISCV] Add explicit copy to V0 in the masked vmsge(u).vx intrinsic handling.

This is consistent with our other masked vector instructions.
Previously we found cases where not doing this broke fast reg
alloc.
The file was modified llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
Commit 38b7b1d4a2930cc82e69a8069fad4b363f73a212 by jdenny.ornl
[UpdateCCTestChecks][NFC] Permit other comments in common.py

Some parts of common.py already permit comment styles besides `;`.
Handle the remaining cases.  Specifically, a future patch will extend
update_cc_test_checks.py to call add_global_checks.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D104713
The file was modified llvm/utils/UpdateTestChecks/common.py
Commit ad0085d3381a28041244fe6847f6ac1ce8dd052e by spatel
[InstCombine] Eliminate casts to optimize ctlz operation

If a ctlz operation is performed on a wider type and the result is
then truncated, it can be optimized by performing the ctlz on the
narrower type and adding the difference in bit widths to the result,
which produces the same output:

https://alive2.llvm.org/ce/z/8uup9M

The original problem is shown in
https://llvm.org/PR50173
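
The equivalence can be checked exhaustively for one width pair with a small C sketch. The choice of 16-bit and 32-bit widths and all function names are assumptions for illustration; the counting helpers return the bit width for a zero input.

```c
#include <stdint.h>

/* Count leading zeros of a 32-bit value; returns 32 for 0. */
unsigned clz_u32(uint32_t v) {
  unsigned n = 0;
  for (uint32_t bit = UINT32_C(1) << 31; bit != 0 && (v & bit) == 0; bit >>= 1)
    ++n;
  return n;
}

/* Count leading zeros of a 16-bit value; returns 16 for 0. */
unsigned clz_u16(uint16_t v) {
  unsigned n = 0;
  for (uint32_t bit = UINT32_C(1) << 15; bit != 0 && (v & bit) == 0; bit >>= 1)
    ++n;
  return n;
}

/* Wide path: zero-extend, count on 32 bits, truncate the result. */
uint16_t ctlz_via_wide(uint16_t x) {
  return (uint16_t)clz_u32((uint32_t)x);
}

/* Narrow path: count on 16 bits, then add the width difference. */
uint16_t ctlz_via_narrow(uint16_t x) {
  return (uint16_t)(clz_u16(x) + (32u - 16u));
}

/* Returns 1 if both paths agree for every 16-bit input, else 0. */
int ctlz_paths_agree_for_all_u16(void) {
  for (uint32_t x = 0; x <= UINT16_MAX; ++x)
    if (ctlz_via_wide((uint16_t)x) != ctlz_via_narrow((uint16_t)x))
      return 0;
  return 1;
}
```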

Differential Revision: https://reviews.llvm.org/D103788
The file was modified llvm/test/Transforms/InstCombine/zext-ctlz-trunc-to-ctlz-add.ll
The file was modified llvm/lib/Transforms/InstCombine/InstCombineCasts.cpp
Commit 8cfc08013299d873edd364436aa78e7effb28dd4 by david.green
[ARM] Limit v6m unrolling with multiple live outs

v6m cores only have a limited number of registers available. Unrolling
can mean we spend more on stack spills and reloads than we save from the
unrolling. This patch adds an extra heuristic to put a limit on the
unroll count for loops with multiple live out values, as measured from
the LCSSA phi nodes.
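
For illustration, this is the kind of loop shape the heuristic is about: a hypothetical C function (not taken from the patch or its test) whose loop produces three values that are all read after the loop exits.

```c
#include <stddef.h>

struct stats {
  int sum;
  int max;
  int negatives;
};

/* sum, max, and negatives are all live out of the loop, so each one
 * needs a register (or a spill slot) across every unrolled iteration. */
struct stats summarize(const int *v, size_t n) {
  int sum = 0;
  int max = (n > 0) ? v[0] : 0;
  int negatives = 0;
  for (size_t i = 0; i < n; ++i) {
    sum += v[i];
    if (v[i] > max)
      max = v[i];
    if (v[i] < 0)
      ++negatives;
  }
  struct stats result = {sum, max, negatives};
  return result;
}
```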

Differential Revision: https://reviews.llvm.org/D104659
The file was added llvm/test/Transforms/LoopUnroll/ARM/v6munroll.ll
The file was modified llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp
Commit 656001e7b2b939d9bce4fb58831d314dc67ddf7a by spatel
[ValueTracking] look through bitcast of vector in computeKnownBits

This borrows as much as possible from the SDAG version of the code
(originally added with D27129 and since updated with big endian support).

In IR, we can test more easily for correctness than we did in the
original patch. I'm using the simplest cases that I could find for
InstSimplify: we computeKnownBits on variable shift amounts to see if
they are zero or in range. So shuffle constant elements into a vector,
cast it, and shift it.

The motivating x86 example from https://llvm.org/PR50123 is also here.
We computeKnownBits in the caller code, but we only check if the shift
amount is in range. That could be enhanced to catch the 2nd x86 test -
if the shift amount is known too big, the result is 0.

Alive2 understands the datalayout and agrees that the tests here are
correct - example:
https://alive2.llvm.org/ce/z/KZJFMZ
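
The core idea can be sketched as a toy model in C, not the ValueTracking code. It assumes a `<2 x i32>` vector bitcast to an `i64`, with element 0 in the low half on little-endian targets and in the high half on big-endian targets; the function name and parameters are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Combine the known-zero masks of two 32-bit elements into the
 * known-zero mask of the 64-bit value produced by the bitcast. */
uint64_t known_zero_after_bitcast(uint32_t elt0_zero, uint32_t elt1_zero,
                                  bool big_endian) {
  uint64_t lo = big_endian ? elt1_zero : elt0_zero;
  uint64_t hi = big_endian ? elt0_zero : elt1_zero;
  return (hi << 32) | lo;
}
```

If element 0 is known to be less than 16 and element 1 is known to be zero, the little-endian result is known to be less than 16 as well, which is enough to prove that a shift amount is in range.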

Differential Revision: https://reviews.llvm.org/D104472
The file was modified llvm/test/Transforms/InstSimplify/shift-knownbits.ll
The file was modified llvm/test/Transforms/InstCombine/X86/x86-vector-shifts.ll
The file was modified llvm/lib/Analysis/ValueTracking.cpp
Commit be9a87fe9bc395074c383c07fbd9c0bce953985f by vlovich
[clang-format] Add IfMacros option

https://bugs.llvm.org/show_bug.cgi?id=49354

Differential Revision: https://reviews.llvm.org/D102730
The file was modified clang/lib/Format/Format.cpp
The file was modified clang/include/clang/Format/Format.h
The file was modified clang/lib/Format/FormatTokenLexer.cpp
The file was modified clang/docs/ReleaseNotes.rst
The file was modified clang/docs/ClangFormatStyleOptions.rst
The file was modified clang/lib/Format/FormatToken.h
The file was modified clang/unittests/Format/FormatTest.cpp
The file was modified clang/lib/Format/TokenAnnotator.cpp
Commit b58dfd87da5cb19693764869a9a158f88c3d4bde by gcmn
Update Bazel BUILD files up to be9a87fe9b

Differential Revision: https://reviews.llvm.org/D104791
The file was modified utils/bazel/llvm-project-overlay/mlir/BUILD.bazel
Commit 6cc6ada143236f16faf8b383d73e00e709fa6a9f by tomasz.miasko
[Demangle][Rust] Hide implementation details NFC

Move content of the "public" header into the implementation file.

This also renames two enumerations that were previously used through
the `rust_demangle::` scope, to avoid breaking a build bot with an older
version of GCC that rejects uses of an enumerator through `E::A` if
there is a variable with the same name as the enumeration `E` in scope.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D104362
The file was modified llvm/lib/Demangle/RustDemangle.cpp
The file was removed llvm/include/llvm/Demangle/RustDemangle.h
Commit 00d3f7cc3c264adc360d0282ba8a27de2a004b94 by nikita.ppv
[LAA] Make getPointersDiff() API compatible with opaque pointers

Make getPointersDiff() and sortPtrAccesses() compatible with opaque
pointers by explicitly passing in the element type instead of
determining it from the pointer element type.

The SLPVectorizer result is slightly non-optimal in that unnecessary
pointer bitcasts are added.
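
A loose C analogy for the API change, not the LoopAccessAnalysis code: with untyped pointers, the element size cannot be derived from the pointer itself, so the caller has to pass it in. The function name and signature below are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Distance from a to b in elements of elem_size bytes.
 * Both pointers must point into the same array object.
 * Returns false if the byte distance is not a whole number of elements. */
bool element_distance(size_t elem_size, const void *a, const void *b,
                      ptrdiff_t *out) {
  ptrdiff_t bytes = (const char *)b - (const char *)a;
  if (elem_size == 0 || bytes % (ptrdiff_t)elem_size != 0)
    return false;
  *out = bytes / (ptrdiff_t)elem_size;
  return true;
}
```

Calling it with `sizeof(int)` on `&arr[1]` and `&arr[5]` yields a distance of 4, while the same two addresses with a different element size would give a different answer or no answer at all.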

Differential Revision: https://reviews.llvm.org/D104784
The file was modified llvm/include/llvm/Analysis/LoopAccessAnalysis.h
The file was modified llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
The file was modified llvm/lib/Analysis/LoopAccessAnalysis.cpp
The file was added llvm/test/Transforms/SLPVectorizer/X86/opaque-ptr.ll
Commit 2fd3037ac615643fe8058292d2b89bb19a49cb2f by aqjune
[ConstantFold] Allow propagation of poison for and/or i1

This was disallowed due to its bad interaction with the
select i1 -> and/or i1 transformation. That transformation is now
disabled by D101191, so let's revive this.
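
A toy three-state model of an `i1` value may help show why the two forms differ once poison is involved. This is a sketch under the assumption that and/or propagate poison from either operand while select only observes the operand it picks; all names are hypothetical and this is not the ConstantFold code.

```c
/* A toy i1: a concrete bit or poison. */
enum toy_i1 { TOY_FALSE = 0, TOY_TRUE = 1, TOY_POISON = 2 };

/* and: poison in either operand poisons the result. */
enum toy_i1 toy_and(enum toy_i1 a, enum toy_i1 b) {
  if (a == TOY_POISON || b == TOY_POISON)
    return TOY_POISON;
  return (a == TOY_TRUE && b == TOY_TRUE) ? TOY_TRUE : TOY_FALSE;
}

/* or: poison in either operand poisons the result. */
enum toy_i1 toy_or(enum toy_i1 a, enum toy_i1 b) {
  if (a == TOY_POISON || b == TOY_POISON)
    return TOY_POISON;
  return (a == TOY_TRUE || b == TOY_TRUE) ? TOY_TRUE : TOY_FALSE;
}

/* select: only the condition and the chosen operand matter. */
enum toy_i1 toy_select(enum toy_i1 cond, enum toy_i1 if_true,
                       enum toy_i1 if_false) {
  if (cond == TOY_POISON)
    return TOY_POISON;
  return cond == TOY_TRUE ? if_true : if_false;
}
```

In this model `toy_select(TOY_FALSE, TOY_POISON, TOY_FALSE)` is `TOY_FALSE`, but `toy_and(TOY_FALSE, TOY_POISON)` is `TOY_POISON`, so rewriting the select as an and would make the result more poisonous.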
The file was modified llvm/lib/IR/ConstantFold.cpp