SuccessChanges

Summary

  1. [ARM] Add additional fmin/fmax with nan tests (NFC) (details)
  2. [DAGCombiner] Fold fmin/fmax of NaN (details)
  3. [DSE,MemorySSA] Handle atomic stores explicitly in isReadClobber. (details)
  4. [AArch64][GlobalISel] Share address mode selection code for memops (details)
  5. Mark masked.{store,scatter,compressstore} intrinsics as write-only (details)
  6. [AMDGPU] Fix for folding v2.16 literals. (details)
  7. [libunwind] Bare-metal DWARF: set dso_base to 0 (details)
  8. [ValueTracking] isKnownNonZero, computeKnownBits for freeze (details)
  9. [Asan] Return nullptr for invalid chunks (details)
  10. AMDGPU: Fix inserting waitcnts before kill uses (details)
  11. AMDGPU: Skip all meta instructions in hazard recognizer (details)
  12. AMDGPU: Hoist check for VGPRs (details)
Commit 5a4a05c8116ebdcb434cd15796a255cf024a6bf0 by nikita.ppv
[ARM] Add additional fmin/fmax with nan tests (NFC)

Adding these to ARM, which has both FMINNUM and FMINIMUM.
The file was added: llvm/test/CodeGen/ARM/fminmax-folds.ll
Commit 0a5dc7effb191eff740e0e7ae7bd8e1f6bdb3ad9 by nikita.ppv
[DAGCombiner] Fold fmin/fmax of NaN

fminnum(X, NaN) is X, fminimum(X, NaN) is NaN. This mirrors the
behavior of existing InstSimplify folds.

This is expected to improve the reduction lowerings in D87391,
which use NaN as a neutral element.

Differential Revision: https://reviews.llvm.org/D87415
The file was modified: llvm/test/CodeGen/ARM/fminmax-folds.ll
The file was modified: llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
The file was modified: llvm/test/CodeGen/X86/fminnum.ll
The file was modified: llvm/test/CodeGen/X86/fmaxnum.ll
Commit 9969c317ff0877ed6155043422c70e1d4c028a35 by flo
[DSE,MemorySSA] Handle atomic stores explicitly in isReadClobber.

Atomic stores are modeled as MemoryDefs to capture the fact that they may
not be reordered, depending on their ordering constraints.

Atomic stores that are monotonic or weaker do not limit re-ordering, so
we do not have to treat them as potential read clobbers.

Note that llvm/test/Transforms/DeadStoreElimination/MSSA/atomic.ll
already contains a set of negative test cases.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D87386
The file was modified: llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
The file was modified: llvm/test/Transforms/DeadStoreElimination/MSSA/atomic-todo.ll
The file was modified: llvm/test/Transforms/DeadStoreElimination/MSSA/atomic.ll
Commit 480e7f43a22578beaa2edc7a271e77793222a1c3 by Jessica Paquette
[AArch64][GlobalISel] Share address mode selection code for memops

We were missing support for the G_ADD_LOW + ADRP folding optimization in the
manual selection code for G_LOAD, G_STORE, and G_ZEXTLOAD.

As a result, we were missing cases like this:

```
@foo = external hidden global i32*
define void @baz(i32* %0) {
store i32* %0, i32** @foo
ret void
}
```

https://godbolt.org/z/16r7ad

This functionality already existed in the addressing mode functions for the
importer. So, this patch makes the manual selection code use
`selectAddrModeIndexed` rather than duplicating work.

This is a 0.2% geomean code size improvement for CTMark at -O3.

There is one code size increase (0.1% on lencod) which is likely because
`selectAddrModeIndexed` doesn't look through constants.

Differential Revision: https://reviews.llvm.org/D87397
The file was modified: llvm/test/CodeGen/AArch64/GlobalISel/select-store.mir
The file was modified: llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp
Commit 8b7c8f2c549d301fcea75d8e6e98a8ee160d5ff4 by kparzysz
Mark masked.{store,scatter,compressstore} intrinsics as write-only
The file was modified: llvm/include/llvm/IR/Intrinsics.td
The file was modified: llvm/test/Analysis/BasicAA/intrinsics.ll
The file was modified: llvm/test/Analysis/TypeBasedAliasAnalysis/intrinsics.ll
Commit c259d3a061c8fc0f9520208eb265d4352a0ad447 by dfukalov
[AMDGPU] Fix for folding v2.16 literals.

It was found that some packed immediate operands (e.g. `<half 1.0, half 2.0>`) were
processed incorrectly, so one of the two packed values was lost.

Introduced a new function to check whether an immediate 32-bit operand can be folded,
and converted the condition on the current op_sel flags value to a fall-through.

Fixes: SWDEV-247595

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D87158
The file was modified: llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h
The file was modified: llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
The file was modified: llvm/test/CodeGen/AMDGPU/shrink-add-sub-constant.ll
The file was modified: llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
Commit 09d492902f178f60b3ab986360eadde9b5c8d359 by rprichard
[libunwind] Bare-metal DWARF: set dso_base to 0

Previously, DwarfFDECache::findFDE used 0 as a special value meaning
"search the entire cache, including dynamically-registered FDEs".
Switch this special value to -1, which doesn't make sense as a DSO
base.

Fixes PR47335.

Reviewed By: compnerd, #libunwind

Differential Revision: https://reviews.llvm.org/D86748
The file was modified: libunwind/src/UnwindCursor.hpp
The file was modified: libunwind/src/AddressSpace.hpp
Commit a6183d0f028cb73eccc82a7cce9534708a149762 by aqjune
[ValueTracking] isKnownNonZero, computeKnownBits for freeze

This implements support for isKnownNonZero and computeKnownBits when freeze is involved.

```
  br (x != 0), BB1, BB2
BB1:
  y = freeze x
```

In the above program, we can say that y is non-zero. The reason is as follows:

(1) If x was poison, `br (x != 0)` raised UB.
(2) If x was fully undef, the branch again raised UB.
(3) If x was partially undef but non-zero, say `undef | 1`, `freeze x` returns a nondeterministic value which is also non-zero.
(4) If x was just a concrete value, the claim holds trivially.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D75808
The file was modified: llvm/unittests/Analysis/ValueTrackingTest.cpp
The file was modified: llvm/test/Transforms/InstSimplify/known-non-zero.ll
The file was modified: llvm/lib/Analysis/ValueTracking.cpp
Commit 91c28bbe74f24e0e84edf84daae7659c11e7afd6 by Vitaly Buka
[Asan] Return nullptr for invalid chunks

CHUNK_ALLOCATED and CHUNK_QUARANTINE are the only states
which make an AsanChunk useful to GetAsanChunk callers.
In any other state the members of AsanChunk are not useful.

Fix a few cases which didn't expect nullptr. Most of the callers
already expect nullptr.

Reviewed By: morehouse

Differential Revision: https://reviews.llvm.org/D87135
The file was modified: compiler-rt/lib/asan/asan_allocator.cpp
Commit 82cbc9330a4dc61e867864d96b0dbec74abaca89 by Matthew.Arsenault
AMDGPU: Fix inserting waitcnts before kill uses
The file was added: llvm/test/CodeGen/AMDGPU/waitcnt-meta-instructions.mir
The file was modified: llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
Commit 85490874b23ba1337210dbcb700b258ffb751b78 by Matthew.Arsenault
AMDGPU: Skip all meta instructions in hazard recognizer

This was not adding a necessary nop because it counted the kill instruction as a real instruction.
The file was modified: llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
The file was added: llvm/test/CodeGen/AMDGPU/hazard-recognizer-meta-insts.mir
Commit e15215e04154e1bc8ea57d46f36b054adf49a3ed by Matthew.Arsenault
AMDGPU: Hoist check for VGPRs
The file was modified: llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp