Success

Changes

Summary

  1. [ARM][NEON] Combine base address updates for vld1x intrinsics
  2. [llvm-exegesis] Loop unrolling for loop snippet repetitor mode
  3. [IR] Allow Value::replaceUsesWithIf() to process constants
  4. [lldb] Re-enable and rewrite TestCPPStaticMembers
  5. [lldb] Disable minimal import mode for RecordDecls that back FieldDecls
  6. [AArch64] Add tests for lowering of vector load + single extract.
  7. [mlir] Fold memref.dim of OffsetSizeAndStrideOpInterface outputs
  8. [MLIR][Affine][LICM] Mark users of `iter_args` variant
  9. [AMDGPU] Remove dead declaration (NFC).
  10. [CostModel][X86] Improve accuracy of vXi8/vXi16 vector non-uniform shift costs on AVX2/AVX512 targets
  11. Fix MSVC "truncation of constant value" warning. NFCI.
Commit 44843e2a046ef9959166e53d6c0cfb3b286fd4ce by kbessonova
[ARM][NEON] Combine base address updates for vld1x intrinsics

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D102855
The file was modified llvm/lib/Target/ARM/ARMExpandPseudoInsts.cpp
The file was modified llvm/lib/Target/ARM/ARMISelLowering.cpp
The file was modified llvm/lib/Target/ARM/ARMInstrNEON.td
The file was removed llvm/test/CodeGen/ARM/pr45824.ll
The file was modified llvm/test/CodeGen/ARM/arm-vld1.ll
The file was modified llvm/lib/Target/ARM/ARMISelLowering.h
The file was modified llvm/lib/Target/ARM/ARMISelDAGToDAG.cpp
Commit 78eaff2ef8a984859a04f944522280360ee825aa by lebedev.ri
[llvm-exegesis] Loop unrolling for loop snippet repetitor mode

I really needed this, like, factually, yesterday,
when verifying dependency breaking idioms for AMD Zen 3 scheduler model.

Consider the following example:
```
$ ./bin/llvm-exegesis --mode=inverse_throughput --snippets-file=/tmp/snippet.s --num-repetitions=1000000 --repetition-mode=duplicate
Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-4a7e50.o
---
mode:            inverse_throughput
key:
  instructions:
    - 'VPXORYrr YMM0 YMM0 YMM0'
  config:          ''
  register_initial_values: []
cpu_name:        znver3
llvm_triple:     x86_64-unknown-linux-gnu
num_repetitions: 1000000
measurements:
  - { key: inverse_throughput, value: 0.31025, per_snippet_value: 0.31025 }
error:           ''
info:            ''
assembled_snippet: C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C3
...

```
What does it tell us?
So wait, it can only execute ~3 x86 AVX YMM PXOR zero-idioms per cycle?
That doesn't seem right. That's even less than there are pipes supporting this type of op.

Now, second example:
```
$ ./bin/llvm-exegesis --mode=inverse_throughput --snippets-file=/tmp/snippet.s --num-repetitions=1000000 --repetition-mode=loop
Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-2418b5.o
---
mode:            inverse_throughput
key:
  instructions:
    - 'VPXORYrr YMM0 YMM0 YMM0'
  config:          ''
  register_initial_values: []
cpu_name:        znver3
llvm_triple:     x86_64-unknown-linux-gnu
num_repetitions: 1000000
measurements:
  - { key: inverse_throughput, value: 1.00011, per_snippet_value: 1.00011 }
error:           ''
info:            ''
assembled_snippet: 49B80800000000000000C5FDEFC0C5FDEFC04983C0FF75F2C3
...
```
Now that's just worse. Due to the looping, the throughput completely plummeted,
and now we can only do a single instruction/cycle!?

That's not great.
And final example:
```
$ ./bin/llvm-exegesis --mode=inverse_throughput --snippets-file=/tmp/snippet.s --num-repetitions=1000000 --repetition-mode=loop --loop-body-size=1000
Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-c402e2.o
---
mode:            inverse_throughput
key:
  instructions:
    - 'VPXORYrr YMM0 YMM0 YMM0'
  config:          ''
  register_initial_values: []
cpu_name:        znver3
llvm_triple:     x86_64-unknown-linux-gnu
num_repetitions: 1000000
measurements:
  - { key: inverse_throughput, value: 0.167087, per_snippet_value: 0.167087 }
error:           ''
info:            ''
assembled_snippet: 49B80800000000000000C5FDEFC0C5FDEFC04983C0FF75F2C3
...
```

So if we merge the previous two approaches, duplicate this single-instruction snippet 1000x
(loop-body-size / instruction count in snippet), and run a loop with 1000 iterations
over that duplicated/unrolled snippet, the measured throughput goes through the roof,
up to 5.9 instructions/cycle, which finally tells us that this idiom is zero-cycle!
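
The arithmetic behind the three measurements can be sketched as follows. This is a minimal illustration, not llvm-exegesis code: `instructionsPerCycle`, `LoopPlan` and `planLoop` are hypothetical names, and rounding the iteration count up is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Instructions per cycle implied by a measured inverse throughput
// (cycles per instruction).
double instructionsPerCycle(double inverseThroughput) {
  assert(inverseThroughput > 0.0 && "inverse throughput must be positive");
  return 1.0 / inverseThroughput;
}

// Hypothetical helper: how a loop body of `loopBodySize` instructions is
// filled from a snippet, and how many loop iterations are then needed.
struct LoopPlan {
  std::uint64_t copiesPerIteration; // snippet copies unrolled into the body
  std::uint64_t iterations;         // loop trip count
};

LoopPlan planLoop(std::uint64_t numRepetitions, std::uint64_t loopBodySize,
                  std::uint64_t snippetInstructions) {
  assert(snippetInstructions > 0 && loopBodySize >= snippetInstructions);
  LoopPlan plan;
  plan.copiesPerIteration = loopBodySize / snippetInstructions;
  std::uint64_t bodyInstructions =
      plan.copiesPerIteration * snippetInstructions;
  // Assumption: round up so at least numRepetitions instructions execute.
  plan.iterations = (numRepetitions + bodyInstructions - 1) / bodyInstructions;
  return plan;
}
```

With the values from the runs above, `instructionsPerCycle(0.31025)` is a bit over 3, `instructionsPerCycle(1.00011)` is just under 1, and `instructionsPerCycle(0.167087)` is just under 6, while `planLoop(1000000, 1000, 1)` gives 1000 copies per iteration and 1000 iterations.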

Reviewed By: courbet

Differential Revision: https://reviews.llvm.org/D102522
The file was modified llvm/tools/llvm-exegesis/lib/SnippetRepetitor.h
The file was modified llvm/docs/CommandGuide/llvm-exegesis.rst
The file was modified llvm/tools/llvm-exegesis/lib/BenchmarkRunner.cpp
The file was modified llvm/tools/llvm-exegesis/llvm-exegesis.cpp
The file was modified llvm/tools/llvm-exegesis/lib/BenchmarkResult.h
The file was modified llvm/tools/llvm-exegesis/lib/BenchmarkRunner.h
The file was modified llvm/unittests/tools/llvm-exegesis/X86/SnippetRepetitorTest.cpp
The file was modified llvm/tools/llvm-exegesis/lib/SnippetRepetitor.cpp
Commit 8f681d5b272eeb5c0d13d225313f4ea9517f59f5 by Stanislav.Mekhanoshin
[IR] Allow Value::replaceUsesWithIf() to process constants

The change is currently NFC, but is used by the dependent D102954.
Code to handle constants is borrowed from the general implementation
of Value::doRAUW().
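
For readers unfamiliar with the operation, the idea of predicate-filtered use replacement can be sketched with a toy model. This is not LLVM's implementation: `ToyUse` and the free function below are hypothetical, values and users are plain integer ids, and the constant handling this commit adds (constants are uniqued, so their operands cannot simply be overwritten) is deliberately left out.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Toy model: a use records which user reads which value (both as plain ids).
struct ToyUse {
  int user;
  int value;
};

// Retarget every use of `from` to `to` when `shouldReplace` accepts the use.
// Returns how many uses were rewritten.
int replaceUsesWithIf(std::vector<ToyUse> &uses, int from, int to,
                      const std::function<bool(const ToyUse &)> &shouldReplace) {
  assert(from != to && "replacing a value with itself");
  int replaced = 0;
  for (ToyUse &use : uses) {
    if (use.value != from || !shouldReplace(use))
      continue;
    use.value = to;
    ++replaced;
  }
  return replaced;
}
```

A caller would pass a predicate such as `[](const ToyUse &u) { return u.user == 2; }` to rewrite only the uses belonging to one user and leave all others untouched.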

Differential Revision: https://reviews.llvm.org/D103051
The file was modified llvm/lib/IR/Value.cpp
The file was modified llvm/include/llvm/IR/Value.h
Commit 8b656b88462f51396c8c4772e0012549f76f204f by Raphael Isemann
[lldb] Re-enable and rewrite TestCPPStaticMembers

It's not clear why the whole test got disabled, but the linked bug report
has since been fixed and the only part of it that still fails is the test
for the too permissive lookup. This re-enables the test, rewrites it to use
the modern test functions we have and splits the failing part into its
own test that we can skip without disabling the rest.
The file was modified lldb/test/API/lang/cpp/static_members/main.cpp
The file was modified lldb/test/API/lang/cpp/static_members/TestCPPStaticMembers.py
Commit 3bf96b0329be554c67282b0d7d8da6a864b9e38f by Raphael Isemann
[lldb] Disable minimal import mode for RecordDecls that back FieldDecls

Clang adds a Decl to a DeclContext in two phases. First it adds it invisibly and
then it makes it visible (which will add it to the lookup data structures). It's
important that we don't do lookups into the DeclContext we are currently adding
the Decl to during this process, as once the Decl has been added, any lookup will
automatically build a new lookup map and add the added Decl to it. The second
step would then add the Decl a second time to the lookup, which will lead to
weird errors later on. I made adding a Decl twice to a lookup an assertion
error in D84827.

In the first step Clang also does some computations on the added Decl if it's
for example a FieldDecl that is added to a RecordDecl.

One of these computations is checking whether the FieldDecl is of a record type
and that record type has a deleted constexpr destructor, which will delete
the constexpr destructor of the record that got the FieldDecl.

This can lead to a bug with the way we implement MinimalImport in LLDB
and the following code:

```
struct Outer {
  typedef int HookToOuter;
  struct NestedClass {
    HookToOuter RefToOuter;
  } NestedClassMember; // We are adding this.
};
```

1. We just imported `Outer` minimally so far.
2. We are now asked to add `NestedClassMember` as a FieldDecl.
3. We import `NestedClass` minimally.
4. We add `NestedClassMember` and clang does a lookup for a constexpr dtor in
   `NestedClass`. `NestedClassMember` hasn't been added to the lookup.
5. The lookup into `NestedClass` will now load the members of `NestedClass`.
6. We try to import the type of `RefToOuter` which will try to import the `HookToOuter` typedef.
7. We import the typedef and while importing we check for conflicts in `Outer` via a lookup.
8. The lookup into `Outer` will cause the invisible `NestedClassMember` to be added to the lookup.
9. We continue normally until we get back to the `addDecl` call in step 2.
10. We now add `NestedClassMember` to the lookup even though we already did that in step 8.

The fix here is disabling the minimal import for RecordTypes from FieldDecls. We
actually already did this, but so far we only forced the definition of the type
to be imported *after* we imported the FieldDecl. This just moves that code
*before* we import the FieldDecl to prevent the issue above.
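
The ordering problem can be reproduced in a toy model. This is only a sketch under simplifying assumptions, not Clang or LLDB code: `ToyContext` and both helper functions are hypothetical, and the lookup table just counts registrations per name so the double add becomes visible.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy stand-in for a declaration context with a lazily built lookup table.
class ToyContext {
public:
  // Phase 1: store the declaration without touching the lookup table.
  void addHidden(const std::string &name) { decls.push_back(name); }

  // Phase 2: register the declaration in the lookup table, if one exists.
  void makeVisible(const std::string &name) {
    if (lookupBuilt)
      ++table[name];
  }

  // Any lookup lazily builds the table from every stored declaration,
  // including ones whose phase 2 has not run yet.
  int lookup(const std::string &name) {
    if (!lookupBuilt) {
      for (const std::string &decl : decls)
        ++table[decl];
      lookupBuilt = true;
    }
    auto it = table.find(name);
    return it == table.end() ? 0 : it->second;
  }

private:
  std::vector<std::string> decls;
  std::map<std::string, int> table; // name -> number of registrations
  bool lookupBuilt = false;
};

// Buggy order: a lookup into the context runs between the two phases.
int registrationsWithNestedLookup() {
  ToyContext outer;
  outer.addHidden("NestedClassMember");
  outer.lookup("HookToOuter");            // triggered by a lazy nested import
  outer.makeVisible("NestedClassMember"); // second registration
  return outer.lookup("NestedClassMember");
}

// Fixed order: finish the work that needs lookups before phase 1 starts.
int registrationsWithEagerImport() {
  ToyContext outer;
  outer.lookup("HookToOuter"); // nested type fully imported up front
  outer.addHidden("NestedClassMember");
  outer.makeVisible("NestedClassMember");
  return outer.lookup("NestedClassMember");
}
```

In this model `registrationsWithNestedLookup()` reports two registrations for `NestedClassMember`, which corresponds to the assertion failure described above, while `registrationsWithEagerImport()` reports one.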

Reviewed By: shafik, aprantl

Differential Revision: https://reviews.llvm.org/D102993
The file was added lldb/test/API/lang/cpp/reference-to-outer-type/Makefile
The file was added lldb/test/API/lang/cpp/reference-to-outer-type/main.cpp
The file was modified lldb/source/Plugins/ExpressionParser/Clang/ClangASTSource.cpp
The file was modified lldb/source/Plugins/ExpressionParser/Clang/ClangASTImporter.cpp
The file was added lldb/test/API/lang/cpp/reference-to-outer-type/TestCppReferenceToOuterClass.py
Commit 536447eb203c3f096d8d4d451d609ef7357c9c43 by flo
[AArch64] Add tests for lowering of vector load + single extract.

Currently the vector load + extract gets lowered to a single scalar
load, not accounting for the fact that the index could be
out-of-bounds, which is poison, not UB.

See PR50382.
The file was modified llvm/test/CodeGen/AArch64/arm64-indexed-vector-ldst.ll
Commit 9ccdc2e23be18eca0b09f055fd17115c0366166c by tpopp
[mlir] Fold memref.dim of OffsetSizeAndStrideOpInterface outputs

This previously handled memref::SubviewOp, but this can be extended to
all ops implementing the interface.

Differential Revision: https://reviews.llvm.org/D103076
The file was modified mlir/lib/Dialect/MemRef/IR/MemRefOps.cpp
The file was modified mlir/test/Dialect/MemRef/canonicalize.mlir
Commit eff269fc9f8b8c79e08e8295aa22da8950bcc341 by uday
[MLIR][Affine][LICM] Mark users of `iter_args` variant

Prevent users of `iter_args` of an affine for loop from being hoisted
out of it. Otherwise, LICM leads to a violation of the SSA dominance
(as demonstrated in the added test case).

Fixes: https://bugs.llvm.org/show_bug.cgi?id=50103

Reviewed By: bondhugula, ayzhuang

Differential Revision: https://reviews.llvm.org/D102984
The file was modified mlir/lib/Dialect/Affine/Transforms/AffineLoopInvariantCodeMotion.cpp
The file was modified mlir/test/Dialect/Affine/affine-loop-invariant-code-motion.mlir
Commit e3b8e6d48251a3b85f925fe695ef961013ddb940 by Christudasan.Devadasan
[AMDGPU] Remove dead declaration (NFC).
The file was modified llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.h
Commit 68ef68f8ac5cad1fdcd9c7b0e2a8f134d9f595ae by llvm-dev
[CostModel][X86] Improve accuracy of vXi8/vXi16 vector non-uniform shift costs on AVX2/AVX512 targets

Determined from llvm-mca analysis, AVX2+ capable targets have a higher throughput for VPBLENDVB and VPMOVZX ops, making it cheaper to perform shift+select patterns for vXi8 shifts or extend/shift/truncate for vXi16 shifts. Similarly AVX512BW can perform vXi8 as extend/shift/truncate patterns.
The file was modified llvm/test/Analysis/CostModel/X86/vshift-lshr-cost.ll
The file was modified llvm/test/Analysis/CostModel/X86/div.ll
The file was modified llvm/test/Analysis/CostModel/X86/fshr.ll
The file was modified llvm/test/Analysis/CostModel/X86/rem.ll
The file was modified llvm/test/Analysis/CostModel/X86/vshift-lshr-cost-inseltpoison.ll
The file was modified llvm/test/Analysis/CostModel/X86/vshift-ashr-cost.ll
The file was modified llvm/test/Analysis/CostModel/X86/vshift-shl-cost-inseltpoison.ll
The file was modified llvm/lib/Target/X86/X86TargetTransformInfo.cpp
The file was modified llvm/test/Analysis/CostModel/X86/fshl.ll
The file was modified llvm/test/Analysis/CostModel/X86/vshift-ashr-cost-inseltpoison.ll
The file was modified llvm/test/Analysis/CostModel/X86/vshift-shl-cost.ll
Commit ed14062be0c1769130b046880199bdba3c6a2ee2 by llvm-dev
Fix MSVC "truncation of constant value" warning. NFCI.
The file was modified llvm/lib/ExecutionEngine/JITLink/MachO_x86_64.cpp