Changes

Summary

  1. Revert "[GVNSink] Regenerate test checks (NFC)"
  2. Force insert zero-idiom and break false dependency of dest register for several instructions.
  3. [SimplifyCFG] Make FoldCondBranchOnPHI more amenable to extension
  4. [AST] Support template declaration found through using-decl for QualifiedTemplateName.
  5. [SimplifyCFG] Handle branch on same condition in pred more directly
  6. [clangd] tweak tile should start with a capital letter.
  7. [OpenCL] Guard read_write images with TypeExtension
  8. [flang] Do not ICE on recursive function definition in function result
  9. [AMDGPU][GFX90A+] Disabled ds_ordered_count and exp
  10. [BOLT] Fix build with GCC 7.3.0
  11. [BOLT] Add R_AARCH64_PREL16/32/64 relocations support
  12. Add async dependencies support for gpu.launch op
  13. [AMDGPU][MC][NFC][GFX940] Corrected an error position
  14. [libcxx][ranges] add views::join adaptor object. added test coverage to join_view
  15. [mlir] Make `Regions`s `cloneInto` multithread-readable
  16. [Debugify] Limit number of processed functions for original mode
  17. [lldb] Adjust libc++ string formatter for changes in D123580
  18. [libc++] Use bit field for checking if string is in long or short mode
  19. Revert "[RISCV] Precommit test for D122634"
  20. [PhaseOrdering] Remove RUN lines for legacy PM (NFC)
  21. Fix Sphinx build
  22. [InstCombine] Add nonpow2 (negative) test for D123374
  23. [NVPTX] Fix LIT tests with default nameTableKind
  24. [clang-tidy] Fix behavior of `modernize-use-using` with nested structs/unions
  25. [mlir] Fix `Region`s `takeBody` method if the region is not empty
  26. [InstCombine] Split up test for store with undef (NFC)
  27. [InstCombine] Add tests for memset with undef/poison value (NFC)
  28. [X86] Add test case for SetCCMOVMSK combine.
  29. [AMDGPU]: Fix failing assertion in SIMachineScheduler
  30. [InstCombine] Remove dead code (NFC)
  31. [AArch64] Add lowerings for {ADD,SUB}CARRY and S{ADD,SUB}O_CARRY
  32. [AArch64] Add `foldOverflowCheck` DAG combine
  33. [InstCombine] Fix typo in test (NFC)
  34. AMDGPU/GlobalISel: Precommit test for D124163
  35. AMDGPU/GlobalISel: Fix isVCC for uniform s1 with reg class on wave32
  36. [LLVM-ML] Add standard LLVM debug flags
  37. [mlir] Connect Transform dialect to PDL
  38. [llvm-ar] Fix thin archive being wrongly converted to a full archive
  39. [clangd] Correctly identify self-contained headers included rercursively
  40. [clangd] Include Cleaner: suppress unused warnings for IWYU pragma: export
  41. [PS4] Driver: use correct --shared option
  42. [InstCombine] add tests for C << (X - C1); NFC
  43. [InstCombine] C0 <<{nsw, nuw} (X - C1) --> (C0 >> C1) << X
  44. [compiler-rt][Darwin] Add arm64 to simulator platforms
  45. [fuchsia] Don't include duplicate profiling symbols for Fuchsia
  46. [M68k] Regenerate cmp.ll tests
  47. Revert D121279 "[MLIR][GPU] Add canonicalizer for gpu.memcpy"
  48. [X86] Add test case for Issue #54911
  49. [clangd] Add beforeExecute() callback to FeatureModules.
  50. [lld/mac] Warn that writing zippered outputs isn't implemented
  51. [Frontend] Simplify PrecompiledPreamble::PCHStorage. NFC
  52. [InstCombine] add baseline test for (X * C2) << C1 --> X * (C2 << C1) without one use; NFC
  53. Revert "[InstCombine] C0 <<{nsw, nuw} (X - C1) --> (C0 >> C1) << X"
  54. [clang][HIP] Updating driver to enable archive/bitcode to bitcode linking when targeting HIPAMD toolchain
  55. [InstCombine] Add one use limitation for  (X * C2) << C1 --> X * (C2 << C1)
  56. [AMDGPU] Refine 64 bit misaligned LDS ops selection
  57. [RISCV] Add special case to constant materialization to remove trailing zeros first.
  58. [mlir] enable doc generation for the transform dialect
  59. Revert "[InstCombine] Add one use limitation for  (X * C2) << C1 --> X * (C2 << C1)"
Commit 15fc293b11181177c9410e8715c2186bbe1390ed by npopov
Revert "[GVNSink] Regenerate test checks (NFC)"

This reverts commit 3b132300728e7ed06e59e449ceb8175305869a49.

It looks like GVNSink is currently non-deterministic, due to an
std::sort() on BasicBlock* pointers in ModelledPHI. This becomes
visible in the generated checks.
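
As a rough illustration of why ordering by pointer value is non-deterministic, the sketch below models basic blocks with a hypothetical `address` field that differs between two runs and contrasts it with a hypothetical stable `number` field. It is a toy model in Python under those assumptions, not GVNSink's code.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Block:
    name: str
    number: int   # hypothetical stable index, identical in every run
    address: int  # stands in for a BasicBlock* value, which varies per run

def order_by_address(blocks):
    # Mirrors sorting on raw pointers: the result depends on allocation.
    return [b.name for b in sorted(blocks, key=lambda b: b.address)]

def order_by_number(blocks):
    # Sorting on a stable key gives the same order in every run.
    return [b.name for b in sorted(blocks, key=lambda b: b.number)]

# Two runs of the "same" compilation with different (made-up) addresses.
run1 = [Block("entry", 0, 0x7F10), Block("then", 1, 0x7F40), Block("else", 2, 0x7F80)]
run2 = [Block("entry", 0, 0x5590), Block("then", 1, 0x5530), Block("else", 2, 0x5560)]
```

With these inputs the address-based order differs between the two runs, while the number-based order does not.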
The file was modified llvm/test/Transforms/GVNSink/sink-common-code.ll
Commit 3e6b904f0a5075a3f33683ce38b5a4fd18280e5e by gen.pei
Force insert zero-idiom and break false dependency of dest register for several instructions.

The related instructions are:

VPERMD/Q/PS/PD
VRANGEPD/PS/SD/SS
VGETMANTSS/SD/SH
VGETMANTPS/PD - mem version only
VPMULLQ
VFMULCSH/PH
VFCMULCSH/PH

Differential Revision: https://reviews.llvm.org/D116072
The file was added llvm/test/CodeGen/X86/perm.avx512-false-deps.ll
The file was added llvm/test/CodeGen/X86/range-false-deps.ll
The file was added llvm/test/CodeGen/X86/mulc-false-deps.ll
The file was modified llvm/lib/Target/X86/X86InstrInfo.cpp
The file was modified llvm/lib/Target/X86/X86TargetTransformInfo.h
The file was modified llvm/lib/Target/X86/X86.td
The file was added llvm/test/CodeGen/X86/perm.avx2-false-deps.ll
The file was added llvm/test/CodeGen/X86/getmant-false-deps.ll
The file was added llvm/test/CodeGen/X86/pmullq-false-deps.ll
Commit 8988254667fff67d1f585396aa0e9933f5ba69ad by npopov
[SimplifyCFG] Make FoldCondBranchOnPHI more amenable to extension

This general threading transform can be performed whenever we know
a constant value for the condition in a predecessor, which would
currently just be the case of a phi node with constant arguments.
The file was modified llvm/lib/Transforms/Utils/SimplifyCFG.cpp
Commit 1234b1c6d8113d50beef5801be607ad1d502b2f7 by hokein.wu
[AST] Support template declaration found through using-decl for QualifiedTemplateName.

This is a followup of https://reviews.llvm.org/D123127, adding support
for the QualifiedTemplateName.

Reviewed By: sammccall

Differential Revision: https://reviews.llvm.org/D123775
The file was modified clang/include/clang/AST/TemplateName.h
The file was modified clang/lib/AST/ASTImporter.cpp
The file was modified clang/lib/Sema/SemaDecl.cpp
The file was modified clang/lib/Sema/SemaTemplate.cpp
The file was modified clang/unittests/AST/TemplateNameTest.cpp
The file was modified clang/lib/AST/QualTypeNames.cpp
The file was modified clang/lib/Sema/TreeTransform.h
The file was modified clang/lib/AST/ASTContext.cpp
The file was modified clang/include/clang/AST/ASTContext.h
The file was modified clang/include/clang/AST/PropertiesBase.td
Commit 3df86e799e46bc1139372a2f40c31333716e3ad6 by npopov
[SimplifyCFG] Handle branch on same condition in pred more directly

Rather than creating a PHI node and then using the PHI threading
code, directly handle this case in
FoldCondBranchOnValueKnownInPredecessor().

This change is supposed to be NFC-ish, but may cause changes due
to different transform order.
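
A minimal sketch of the idea, assuming a toy CFG encoded as a Python dict: when a predecessor branches on the same condition as the block it jumps to, the condition's value is known on each incoming edge, so the edge can go straight to the matching successor. The encoding and the function name `thread_same_condition` are hypothetical, and the sketch assumes the intermediate block contains nothing but its branch.

```python
def thread_same_condition(cfg, pred, block):
    """Redirect pred's edges into `block` when both branch on the same condition.

    Terminators are tuples: ("condbr", cond, true_target, false_target) or ("ret",).
    Assumes `block` holds only its branch (no side effects, no PHI nodes).
    """
    p, b = cfg[pred], cfg[block]
    if p[0] != "condbr" or b[0] != "condbr" or p[1] != b[1]:
        return False
    _, cond, p_true, p_false = p
    _, _, b_true, b_false = b
    # On pred's true edge the condition is true, so `block` would take b_true;
    # on pred's false edge it would take b_false.
    new_true = b_true if p_true == block else p_true
    new_false = b_false if p_false == block else p_false
    if (new_true, new_false) == (p_true, p_false):
        return False  # pred does not jump to `block`, nothing to thread
    cfg[pred] = ("condbr", cond, new_true, new_false)
    return True

cfg = {
    "entry": ("condbr", "c", "mid", "exit"),
    "mid": ("condbr", "c", "a", "b"),
    "a": ("ret",),
    "b": ("ret",),
    "exit": ("ret",),
}
changed = thread_same_condition(cfg, "entry", "mid")
```

After the call, `entry` branches directly to `a` on the true edge and no longer passes through `mid`.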
The file was modified llvm/lib/Transforms/Utils/SimplifyCFG.cpp
Commit 82cddb173f3781860814fedd86dc83f11ac12e02 by hokein.wu
[clangd] tweak tile should start with a capital letter.

To be consistent with other tweaks.
The file was modified clang-tools-extra/clangd/refactor/tweaks/SpecialMembers.cpp
Commit 87a258366e5d4f3786c6c2b9fe5dbeb736def909 by sven.vanhaastregt
[OpenCL] Guard read_write images with TypeExtension

Ensure that any `read_write` image type carries the
`__opencl_c_read_write_images` extension upon construction of the `ImageType`.
The file was modified clang/lib/Sema/OpenCLBuiltins.td
Commit 488b9fd1030b1e75a6c3580d0a632009315e31f5 by d.dudkin
[flang] Do not ICE on recursive function definition in function result

The following code causes the compiler to ICE in several places due to
a lack of support for recursive procedure definitions through the function
result.

  function foo() result(r)
    procedure(foo), pointer :: r
  end function foo
The file was modified flang/lib/Evaluate/characteristics.cpp
The file was modified flang/test/Semantics/resolve102.f90
The file was modified flang/lib/Semantics/check-declarations.cpp
The file was modified flang/include/flang/Semantics/symbol.h
Commit b4231ac4bef653a798162f186d2ba7b4e88e7ff7 by d-pre
[AMDGPU][GFX90A+] Disabled ds_ordered_count and exp

Differential Revision: https://reviews.llvm.org/D124087
The file was modified llvm/lib/Target/AMDGPU/DSInstructions.td
The file was modified llvm/test/MC/Disassembler/AMDGPU/gfx90a_ldst_acc.txt
The file was modified llvm/test/MC/AMDGPU/gfx90a_ldst_acc.s
The file was modified llvm/test/MC/AMDGPU/gfx90a_err.s
The file was modified llvm/test/MC/AMDGPU/gfx940_err.s
The file was modified llvm/lib/Target/AMDGPU/EXPInstructions.td
Commit 63686af1e1d1848ccbc3deb87012050922a0c51b by och95
[BOLT] Fix build with GCC 7.3.0

GCC 7.3.0 raises a "could not convert" error unless std::move is
used explicitly.

Differential Revision: https://reviews.llvm.org/D124009
The file was modified bolt/lib/Rewrite/RewriteInstance.cpp
The file was modified bolt/lib/Core/BinaryContext.cpp
The file was modified bolt/lib/Rewrite/MachORewriteInstance.cpp
Commit 48e894a536417b07fc68093e1e6fa52b768e3753 by och95
[BOLT] Add R_AARCH64_PREL16/32/64 relocations support

Reviewed By: yota9, rafauler

Differential Revision: https://reviews.llvm.org/D122294
The file was modified llvm/lib/ExecutionEngine/RuntimeDyld/RuntimeDyldELF.cpp
The file was modified bolt/lib/Core/Relocation.cpp
The file was modified bolt/test/lit.cfg.py
The file was modified bolt/include/bolt/Core/BinaryFunction.h
The file was modified llvm/lib/Object/RelocationResolver.cpp
The file was added bolt/test/runtime/AArch64/r_aarch64_prelxx.s
Commit f47a38f51724fab217838aa09cb029c7e0392285 by uday
Add async dependencies support for gpu.launch op

Add async dependencies support for gpu.launch op: this allows specifying
a list of async tokens ("streams") as dependencies for the launch.

Update the GPU kernel outlining pass lowering to propagate async
dependencies from gpu.launch to gpu.launch_func op. Previously, a new
stream was being created and destroyed for a kernel launch. The async
deps support allows the kernel launch to be serialized on an existing
stream.

Differential Revision: https://reviews.llvm.org/D123499
The file was modified mlir/lib/Dialect/GPU/Transforms/KernelOutlining.cpp
The file was modified mlir/test/Dialect/GPU/ops.mlir
The file was modified mlir/test/Dialect/GPU/outlining.mlir
The file was modified mlir/test/Dialect/GPU/invalid.mlir
The file was modified mlir/include/mlir/Dialect/GPU/GPUOps.td
The file was modified mlir/lib/Dialect/GPU/IR/GPUDialect.cpp
Commit 81af32b9a3ec0d0925f66ec972ce68c8d6f4ffe4 by d-pre
[AMDGPU][MC][NFC][GFX940] Corrected an error position

Differential Revision: https://reviews.llvm.org/D124099
The file was added llvm/test/MC/AMDGPU/gfx940_err_pos.s
The file was modified llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
Commit 3d3103b733d4346d583a3ada3aabdaa9de4f0446 by nikolasklauser
[libcxx][ranges] add views::join adaptor object. added test coverage to join_view

- added views::join adaptor object
- added test for the adaptor object
- fixed some of join_view's tests, e.g. the iter_swap test
- added some negative tests for join_view to test that operations do not exist when constraints aren't met
- added tests that lock down issues that were already addressed in a previous change
  - LWG3500 `join_view::iterator::operator->()` is bogus
  - LWG3313 `join_view::iterator::operator--` is incorrectly constrained
  - LWG3517 `join_view::iterator`'s `iter_swap` is underconstrained
  - P2328R1 join_view should join all views of ranges
- fixed some issues in join_view and added tests
  - LWG3535 `join_view::iterator::iterator_category` and `::iterator_concept` lie
  - LWG3474 Nesting `join_views` is broken because of CTAD
- added tests for an LWG issue that isn't resolved in the standard yet, but for which the previous code has a workaround
  - LWG3569 Inner iterator not default_initializable

Reviewed By: #libc, var-const

Spies: var-const, libcxx-commits

Differential Revision: https://reviews.llvm.org/D123466
The file was modified libcxx/test/std/ranges/range.adaptors/range.join.view/types.h
The file was modified libcxx/test/std/ranges/range.adaptors/range.join.view/iterator/increment.pass.cpp
The file was modified libcxx/test/std/ranges/range.adaptors/range.join.view/sentinel/eq.pass.cpp
The file was modified libcxx/docs/Status/Cxx2bIssues.csv
The file was modified libcxx/include/__ranges/join_view.h
The file was modified libcxx/test/std/ranges/range.adaptors/range.join.view/iterator/ctor.parent.outer.pass.cpp
The file was modified libcxx/test/std/ranges/range.adaptors/range.join.view/iterator/member_types.compile.pass.cpp
The file was modified libcxx/test/std/ranges/range.adaptors/range.join.view/general.pass.cpp
The file was modified libcxx/test/std/ranges/range.adaptors/range.join.view/iterator/iter.swap.pass.cpp
The file was modified libcxx/test/std/ranges/range.adaptors/range.join.view/iterator/ctor.other.pass.cpp
The file was modified libcxx/test/std/ranges/range.adaptors/range.join.view/iterator/eq.pass.cpp
The file was modified libcxx/test/std/ranges/range.adaptors/range.join.view/begin.pass.cpp
The file was modified libcxx/test/std/ranges/range.adaptors/range.join.view/end.pass.cpp
The file was modified libcxx/test/std/ranges/range.adaptors/range.join.view/iterator/arrow.pass.cpp
The file was modified libcxx/test/std/ranges/range.adaptors/range.join.view/sentinel/ctor.other.pass.cpp
The file was added libcxx/test/std/ranges/range.adaptors/range.join.view/adaptor.pass.cpp
The file was modified libcxx/test/std/ranges/range.adaptors/range.join.view/ctad.compile.pass.cpp
The file was modified libcxx/test/std/ranges/range.adaptors/range.join.view/ctor.default.pass.cpp
The file was modified libcxx/docs/Status/RangesIssues.csv
The file was modified libcxx/test/std/ranges/range.adaptors/range.join.view/iterator/decrement.pass.cpp
The file was modified libcxx/docs/Status/Cxx20Issues.csv
The file was modified libcxx/test/std/ranges/range.adaptors/range.join.view/iterator/iter.move.pass.cpp
Commit a41aaf166fed03e18021885d0951f1dec63b25b9 by markus.boeck02
[mlir] Make `Regions`s `cloneInto` multithread-readable

Prior to this patch, `cloneInto` would do a simple walk over the blocks and contained operations, cloning and mapping them as it encountered them. As a finishing touch, it then remapped any successors and operands it had mapped during that process.

This is generally fine, but sadly leads to a lot of uses of both operations and blocks from the source region, in the cloned operations in the target region. Those uses lead to writes in the use-def list of the operations, making `cloneInto` never thread safe.

This patch reimplements `cloneInto` in three steps to avoid ever creating any extra uses on elements in the source region:
* It first creates the mapping of all blocks and block operands
* It then clones all operations to create the mapping of all operation results, but does not yet clone any regions or set the operands
* After all operation results have been mapped, it sets the operations' operands and clones their regions.

That way it is now possible to call `cloneInto` from multiple threads if the Region or Operation is isolated-from-above. This allows creating copies of functions or using `mlir::inlineCall` with the same source region from multiple threads. In the general case, the method is thread-safe if, through cloning, no new uses of `Value`s from outside the cloned Operation/Region are created. This can be ensured by mapping any outside operands via the `BlockAndValueMapping` to `Value`s owned by the caller thread.
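
The three-step scheme described above can be sketched on a toy IR. The classes `Value`, `Op`, and `Block` and the function `clone_region` are hypothetical stand-ins written for this illustration, not MLIR's API, and nested regions and successors are left out. The point is that operands are only set once the mapping is complete, so the use lists of source values are never written to.

```python
class Value:
    def __init__(self, name):
        self.name = name
        self.uses = []  # ops using this value; appending is a write

class Op:
    def __init__(self, name, num_results=0):
        self.name = name
        self.operands = []
        self.results = [Value(f"{name}#{i}") for i in range(num_results)]

    def set_operands(self, values):
        for v in values:
            v.uses.append(self)  # mutates the use list of every operand
        self.operands = list(values)

class Block:
    def __init__(self, num_args=0):
        self.args = [Value(f"arg{i}") for i in range(num_args)]
        self.ops = []

def clone_region(src_blocks):
    mapping = {}
    # Step 1: create the new blocks and map all block arguments.
    new_blocks = []
    for b in src_blocks:
        nb = Block(len(b.args))
        mapping.update(zip(b.args, nb.args))
        new_blocks.append(nb)
    # Step 2: clone every op without operands and map its results.
    pairs = []
    for b, nb in zip(src_blocks, new_blocks):
        for op in b.ops:
            nop = Op(op.name, len(op.results))
            mapping.update(zip(op.results, nop.results))
            nb.ops.append(nop)
            pairs.append((op, nop))
    # Step 3: set operands through the now-complete mapping. Values defined
    # outside the region are not in the mapping and would be used directly.
    for op, nop in pairs:
        nop.set_operands([mapping.get(v, v) for v in op.operands])
    return new_blocks
```

In this model, cloning a region whose operands are all defined inside it only reads the source and only appends to the use lists of the freshly created values.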

While I was at it, I also reworked the `clone` method of `Operation` a little bit and added a proper options class to avoid having a `cloneWithoutRegionsAndOperands` method, and be more extensible in the future. `cloneWithoutRegions` is now also a simple wrapper that calls `clone` with the proper options set. That way all the operation cloning code is now contained solely within `clone`.

Differential Revision: https://reviews.llvm.org/D123917
The file was modified mlir/lib/IR/Region.cpp
The file was modified mlir/include/mlir/IR/Operation.h
The file was modified mlir/include/mlir/IR/Region.h
The file was modified mlir/lib/IR/Operation.cpp
Commit c5600aef888b9c32c578edc9c807d61d72a37c08 by djordje.todorovic
[Debugify] Limit number of processed functions for original mode

Debugify in OriginalDebugInfo mode performs a (DebugInfo) collect-before-pass and
check-after-pass for each instruction, which is quite expensive. When used to analyze
DebugInfo losses in large projects (like LLVM), this raises the build time unacceptably.
This patch introduces a limit on the number of processed functions per compile unit.
By default, the limit is set to UINT_MAX (practically unlimited), and the introduced
option -debugify-func-limit can set the limit to any positive integer.
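
A minimal sketch of how such a per-compile-unit limit behaves, assuming a 32-bit `UINT_MAX` and a hypothetical helper named `process_functions`. This is a Python illustration of the counting logic only, not Debugify's implementation.

```python
UINT_MAX = 2**32 - 1  # assumed 32-bit unsigned int; the default is practically unlimited

def process_functions(functions, func_limit=UINT_MAX):
    """Return the functions whose debug info would be collected and checked."""
    if func_limit < 1:
        raise ValueError("the limit must be a positive integer")
    processed = []
    for fn in functions:
        if len(processed) >= func_limit:
            break  # remaining functions in this compile unit are skipped
        processed.append(fn)
    return processed
```

With the default limit every function is processed; a small limit trades coverage for build time.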

Differential revision: https://reviews.llvm.org/D115714
The file was modified llvm/test/Transforms/Util/Debugify/loc-only-original-mode.ll
The file was modified llvm/lib/Transforms/Utils/Debugify.cpp
The file was modified llvm/docs/HowToUpdateDebugInfo.rst
Commit 1056c56786c10866ffd7e878f8c75ad1f0914c07 by pavel
[lldb] Adjust libc++ string formatter for changes in D123580

The code needs more TLC, but for now I've tried making only the changes
that are necessary to get the tests passing, postponing the more
invasive changes until after I create a more comprehensive test.

In a couple of places I have changed the index-based element accesses to
name-based ones (as these are less sensitive to code perturbations). I'm
not sure why the code was using indexes in the first place, but I've
(manually) tested the change with various libc++ versions, and found no
issues with this approach.

Differential Revision: https://reviews.llvm.org/D124113
The file was modified lldb/source/Plugins/Language/CPlusPlus/LibCxx.cpp
Commit 29c8c070a1770fc510ccad3be753f6f50336f8cc by nikolasklauser
[libc++] Use bit field for checking if string is in long or short mode

This makes the code a bit simpler and (I think) removes the undefined behaviour from the normal string layout.

Reviewed By: ldionne, Mordante, #libc

Spies: labath, dblaikie, JDevlieghere, krytarowski, jgorbe, jingham, saugustine, arichardson, libcxx-commits

Differential Revision: https://reviews.llvm.org/D123580
The file was modified libcxx/utils/gdb/libcxx/printers.py
The file was modified libcxx/test/std/strings/basic.string/string.capacity/over_max_size.pass.cpp
The file was modified libcxx/include/string
The file was added libcxx/test/libcxx/strings/basic.string/string.capacity/max_size.pass.cpp
Commit b1620d40d0f4acbf6cb2a289718ce344eb171e57 by pc.wang
Revert "[RISCV] Precommit test for D122634"

This reverts commit 360d44e86defea94fb5608765fbdbfdb2a36f4c6.
The file was removed llvm/test/CodeGen/RISCV/machine-outliner-position.mir
The file was removed llvm/test/CodeGen/RISCV/machine-outliner-throw.ll
The file was removed llvm/test/CodeGen/RISCV/machine-outliner-cfi.mir
Commit 20cf4f8af8da9ad758bbb7972e5aa9cae6c5f995 by npopov
[PhaseOrdering] Remove RUN lines for legacy PM (NFC)
The file was modified llvm/test/Transforms/PhaseOrdering/pr45687.ll
The file was modified llvm/test/Transforms/PhaseOrdering/globalaa-retained.ll
The file was modified llvm/test/Transforms/PhaseOrdering/X86/hoist-load-of-baseptr.ll
The file was modified llvm/test/Transforms/PhaseOrdering/reassociate-after-unroll.ll
The file was modified llvm/test/Transforms/PhaseOrdering/loop-rotation-vs-common-code-hoisting.ll
The file was modified llvm/test/Transforms/PhaseOrdering/X86/earlycse-after-simplifycfg-two-entry-phi-node-folding.ll
The file was modified llvm/test/Transforms/PhaseOrdering/pr36760.ll
The file was modified llvm/test/Transforms/PhaseOrdering/X86/spurious-peeling.ll
The file was modified llvm/test/Transforms/PhaseOrdering/X86/speculation-vs-tbaa.ll
The file was modified llvm/test/Transforms/PhaseOrdering/pr39282.ll
The file was modified llvm/test/Transforms/PhaseOrdering/simplifycfg-switch-lowering-vs-correlatedpropagation.ll
Commit 408226f20ab51508350b2e683c39e1710ea73491 by aaron
Fix Sphinx build
The file was modified clang/include/clang/Basic/AttrDocs.td
Commit ac213375d9639974a29db99c5a9d145d2397385c by llvm-dev
[InstCombine] Add nonpow2 (negative) test for D123374
The file was modified llvm/test/Transforms/InstCombine/add-mask.ll
Commit 96e7487013776c26f0a5203b2e4b104b61efcedf by andrew.savonichev
[NVPTX] Fix LIT tests with default nameTableKind

Default nameTableKind results in the following DWARF section:

    .section .debug_pubnames
    {
      .b32 LpubNames_end0-LpubNames_start0    // Length of Public Names Info
      LpubNames_start0:
      [...]
      LpubNames_end0:
    }

Without -mattr=+ptx75 ptxas complains about labels and label
expressions:

error   : Feature 'labels1 - labels2 expression in .section' requires
PTX ISA .version 7.5 or later
error   : Feature 'Defining labels in .section' requires PTX ISA
.version 7.0 or later

The patch modifies dbg-value-const-byref.ll to let it run without PTX
7.5 (available from CUDA 11.0), and adds a new test just for this
case.

Differential revision: https://reviews.llvm.org/D124108
The file was added llvm/test/DebugInfo/NVPTX/debug-name-table.ll
The file was modified llvm/test/DebugInfo/NVPTX/dbg-value-const-byref.ll
Commit 95d77383f2ba8d3136856b52520d3f73f9bc89e7 by fabian.wolff
[clang-tidy] Fix behavior of `modernize-use-using` with nested structs/unions

Fixes https://github.com/llvm/llvm-project/issues/50334.

Reviewed By: aaron.ballman

Differential Revision: https://reviews.llvm.org/D113804
The file was modified clang-tools-extra/test/clang-tidy/checkers/modernize-use-using.cpp
The file was modified clang-tools-extra/clang-tidy/modernize/UseUsingCheck.h
The file was modified clang-tools-extra/clang-tidy/modernize/UseUsingCheck.cpp
Commit 850b2c6b3c73a48cce05c163c49fcea89491a5e1 by markus.boeck02
[mlir] Fix `Region`s `takeBody` method if the region is not empty

The current implementation of takeBody first clears the Region before taking ownership of the blocks of the other region. The issue, however, is that clearing the region does not take into account references of operations to each other. In particular, blocks are deleted from front to back, and operations within a block are very likely to be deleted despite still having uses, causing an assertion to trigger [0].

This patch fixes that issue by simply calling dropAllReferences() before clearing the blocks.
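
The failure mode and the fix can be sketched with a toy model in which each operation counts its users and refuses to be destroyed while it still has any. The class `Op` and both helper functions are hypothetical, written only for this illustration; they are not MLIR's implementation.

```python
class Op:
    def __init__(self, name, operands=()):
        self.name = name
        self.operands = list(operands)
        self.users = 0
        for o in self.operands:
            o.users += 1

    def drop_all_references(self):
        # Stop using every operand so the defining ops become use-free.
        for o in self.operands:
            o.users -= 1
        self.operands = []

    def destroy(self):
        if self.users != 0:
            raise RuntimeError(f"{self.name} destroyed while still in use")
        self.drop_all_references()

def clear_front_to_back(ops):
    # Deleting a definition before its users trips the check above.
    for op in ops:
        op.destroy()
    ops.clear()

def clear_safely(ops):
    # Drop all references first, then delete in any order.
    for op in ops:
        op.drop_all_references()
    clear_front_to_back(ops)
```

In this model, clearing a definition followed by its user raises an error front to back, while the two-pass version succeeds.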

[0] https://github.com/llvm/llvm-project/blob/9a8bb4bc635de9d56706262083c15eb1e0cf3e87/mlir/lib/IR/Operation.cpp#L154

Differential Revision: https://reviews.llvm.org/D123913
The file was modified mlir/test/lib/IR/CMakeLists.txt
The file was added mlir/test/lib/IR/TestRegions.cpp
The file was modified mlir/tools/mlir-opt/mlir-opt.cpp
The file was modified mlir/include/mlir/IR/Region.h
The file was added mlir/test/IR/test-take-body.mlir
Commit 9001edc5355c75becb7b47afb63b867dbeea2737 by npopov
[InstCombine] Split up test for store with undef (NFC)
The file was modified llvm/test/Transforms/InstCombine/store.ll
Commit 662f57ee21a45919ed85f912c3a05153d125d081 by npopov
[InstCombine] Add tests for memset with undef/poison value (NFC)
The file was modified llvm/test/Transforms/InstCombine/memset.ll
Commit fa4347261e76e37f9632fe010e2e7beae97d117b by yuanke.luo
[X86] Add test case for SetCCMOVMSK combine.

Create 2 users for MOVMSK to test if compiler would perform the combine
"MOVMSK(CONCAT(X,Y)) == 0 ->  MOVMSK(OR(X,Y))".
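
The equivalence behind this combine can be checked on a small model, reading the abbreviated pattern as comparing both masks against zero (an assumption about the shorthand). The helper `movmsk` below is a hypothetical Python model that gathers the sign bit of each lane; it is not the X86 backend's code.

```python
from itertools import product

def movmsk(lanes, bits):
    """Collect the sign bit of each lane into an integer mask (lane 0 -> bit 0)."""
    mask = 0
    for i, lane in enumerate(lanes):
        mask |= ((lane >> (bits - 1)) & 1) << i
    return mask

def concat_is_zero(x, y, bits):
    # MOVMSK(CONCAT(X, Y)) == 0: no lane of either half has its sign bit set.
    return movmsk(x + y, bits) == 0

def or_is_zero(x, y, bits):
    # MOVMSK(OR(X, Y)) == 0: no lane of the lane-wise OR has its sign bit set.
    return movmsk([a | b for a, b in zip(x, y)], bits) == 0

# Exhaustive check over two 2-lane vectors of 2-bit lanes.
all_equal = all(
    concat_is_zero([a, b], [c, d], 2) == or_is_zero([a, b], [c, d], 2)
    for a, b, c, d in product(range(4), repeat=4)
)
```

The OR form needs only a half-width mask, which is why the two forms agree on the zero test even though the masks themselves differ.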
The file was modified llvm/test/CodeGen/X86/vector-compare-any_of.ll
Commit 607f8ced39259b672f4141190f4e7e5ab3375cd1 by jay.foad
[AMDGPU]: Fix failing assertion in SIMachineScheduler

This fixes the assertion failure "Loop in the Block Graph!".

SIMachineScheduler groups instructions into blocks (also referred to
as coloring or groups) and then performs a two-level scheduling:
inter-block scheduling, and intra-block scheduling.

This approach requires that the dependency graph on the blocks which
is obtained by contracting the blocks in the original dependency graph
is acyclic. In other words: Whenever A and B end up in the same block,
all vertices on a path from A to B must be in the same block.

When compiling an example consisting of an export followed by
a buffer store, we see a dependency between these two. This dependency
may be false, but that is a different issue.
This dependency was not correctly accounted for by SIMachineScheduler.

A new test case si-scheduler-exports.ll demonstrating this is
also added in this commit.

The problematic part of SIMachineScheduler was a post-optimization of
the block assignment that tried to group all export instructions into
a separate export block for better execution performance. This routine
correctly checked that any paths from exports to exports did not
contain any non-exports, but not vice-versa: In case of an export with
a non-export successor dependency, that single export was moved
to a separate block, which could then be both a successor and a
predecessor block of a non-export block.
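
To make the cycle concrete, the sketch below contracts a tiny instruction-level dependency graph into a block graph and tests it for cycles. The instruction names, block names, and helper functions are hypothetical, chosen only to mirror the situation described above.

```python
def contract(edges, block_of):
    """Contract instruction-level edges into edges between distinct blocks."""
    return {(block_of[a], block_of[b]) for a, b in edges if block_of[a] != block_of[b]}

def has_cycle(nodes, edges):
    succs = {n: [] for n in nodes}
    for a, b in edges:
        succs[a].append(b)
    state = {n: 0 for n in nodes}  # 0 = unvisited, 1 = on the DFS stack, 2 = done

    def visit(n):
        state[n] = 1
        for m in succs[n]:
            if state[m] == 1 or (state[m] == 0 and visit(m)):
                return True
        state[n] = 2
        return False

    return any(state[n] == 0 and visit(n) for n in nodes)

# compute -> export -> buffer_store (dependencies point from producer to consumer)
deps = [("compute", "export"), ("export", "buffer_store")]

same_block = {"compute": "B0", "export": "B0", "buffer_store": "B0"}
export_moved = {"compute": "B0", "export": "EXP", "buffer_store": "B0"}

cyclic_before = has_cycle(set(same_block.values()), contract(deps, same_block))
cyclic_after = has_cycle(set(export_moved.values()), contract(deps, export_moved))
```

Moving only the export into its own block makes that block both a successor and a predecessor of `B0`, so the contracted graph is no longer acyclic.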

As a fix, we now skip export grouping if there are exports with direct
non-export successor dependencies. This fixes the issue at hand,
but is slightly pessimistic:
We *could* group all exports into a separate block that have neither
direct nor indirect export successor dependencies.
We will review the potential performance impact and potentially
revisit with a more sophisticated implementation.

Note that just grouping all exports without direct non-export successor
dependencies could still lead to illegal blocks, since non-export A
could depend on export B that depends on export C. In that case,
export C has no non-export successor, but still may not be grouped
into an export block.
The file was modified llvm/lib/Target/AMDGPU/SIMachineScheduler.cpp
The file was added llvm/test/CodeGen/AMDGPU/si-scheduler-exports.ll
Commit 46c2b41d02e385acfbfeba64e76d6f448fb0b275 by npopov
[InstCombine] Remove dead code (NFC)

This was a leftover condition without code.
The file was modified llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
Commit 13403a70e45b2d22878ba59fc211f8dba3a8deba by karl.meakin
[AArch64] Add lowerings for {ADD,SUB}CARRY and S{ADD,SUB}O_CARRY

Differential Revision: https://reviews.llvm.org/D123322
The file was modified llvm/test/CodeGen/AArch64/sadd_sat_vec.ll
The file was modified llvm/test/CodeGen/AArch64/adc.ll
The file was modified llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
The file was modified llvm/test/CodeGen/AArch64/neon-abd.ll
The file was modified llvm/test/CodeGen/AArch64/atomicrmw-O0.ll
The file was modified llvm/test/CodeGen/AArch64/usub_sat_vec.ll
The file was modified llvm/test/CodeGen/AArch64/arm64-atomic-128.ll
The file was modified llvm/test/CodeGen/AArch64/uadd_sat_vec.ll
The file was modified llvm/test/CodeGen/AArch64/addcarry-crash.ll
The file was modified llvm/test/CodeGen/AArch64/nzcv-save.ll
The file was modified llvm/test/CodeGen/AArch64/arm64-vabs.ll
The file was modified llvm/test/CodeGen/AArch64/neg-abs.ll
The file was modified llvm/test/CodeGen/AArch64/icmp-shift-opt.ll
The file was modified llvm/test/CodeGen/AArch64/vec_uaddo.ll
The file was modified llvm/test/CodeGen/AArch64/ssub_sat_vec.ll
The file was modified llvm/test/CodeGen/AArch64/vecreduce-add-legalization.ll
The file was modified llvm/test/CodeGen/AArch64/i128-math.ll
Commit 81904454f7cdebecedb1185d8112b630a7124350 by karl.meakin
[AArch64] Add `foldOverflowCheck` DAG combine

Differential Revision: https://reviews.llvm.org//D123779
The file was modified llvm/test/CodeGen/AArch64/atomicrmw-O0.ll
The file was modified llvm/test/CodeGen/AArch64/usub_sat_vec.ll
The file was modified llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
The file was modified llvm/test/CodeGen/AArch64/uadd_sat_vec.ll
The file was modified llvm/test/CodeGen/AArch64/i128-math.ll
The file was modified llvm/test/CodeGen/AArch64/icmp-shift-opt.ll
The file was modified llvm/test/CodeGen/AArch64/sadd_sat_vec.ll
The file was modified llvm/test/CodeGen/AArch64/neon-abd.ll
The file was modified llvm/test/CodeGen/AArch64/addcarry-crash.ll
The file was modified llvm/test/CodeGen/AArch64/neg-abs.ll
The file was modified llvm/test/CodeGen/AArch64/adc.ll
The file was modified llvm/test/CodeGen/AArch64/vec_uaddo.ll
The file was modified llvm/test/CodeGen/AArch64/ssub_sat_vec.ll
The file was modified llvm/test/CodeGen/AArch64/vecreduce-add-legalization.ll
The file was modified llvm/test/CodeGen/AArch64/arm64-atomic-128.ll
The file was modified llvm/test/CodeGen/AArch64/arm64-vabs.ll
The file was modified llvm/test/CodeGen/AArch64/nzcv-save.ll
Commit ead231dec0fc6c13f5f8209eaacfb06e0d0be433 by npopov
[InstCombine] Fix typo in test (NFC)

This is a copy-paste mistake: this variant of the test was supposed
to use poison instead of undef.
The file was modified llvm/test/Transforms/InstCombine/memset.ll
Commit 4e0dacb2cf325158c3c672f45202ab166aec99b0 by petar.avramovic
AMDGPU/GlobalISel: Precommit test for D124163
The file was added llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-i1-copy.mir
The file was added llvm/test/CodeGen/AMDGPU/GlobalISel/i1-copy.ll
Commit e06290e53f2880962fef582f118482d70f1c27f0 by petar.avramovic
AMDGPU/GlobalISel: Fix isVCC for uniform s1 with reg class on wave32

Fix isVCC for a register that was assigned a register class during
inst-selection. This happens when the register has multiple uses.
For wave32, a uniform i1 to vcc copy was selected like a vcc to vcc
copy when the uniform i1 had an assigned register class.
A uniform i1 register with an assigned register class will have s1 LLT,
be defined using G_TRUNC, and its class will be SReg_32RegClass.
A vcc i1 register with an assigned register class will have s1 LLT,
its class will be SReg_32RegClass for wave32 and SReg_64RegClass for
wave64, and the register will not be defined by G_TRUNC.

Differential Revision: https://reviews.llvm.org/D124163
The file was modified llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
The file was modified llvm/test/CodeGen/AMDGPU/GlobalISel/i1-copy.ll
The file was modified llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll
The file was modified llvm/test/CodeGen/AMDGPU/GlobalISel/fshr.ll
The file was modified llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-i1-copy.mir
Commit 82ecf9a0b1b3b90e99e90cb16f4bff78c4e8be3c by epastor
[LLVM-ML] Add standard LLVM debug flags

Adds support for -debug and -debug-only= flags.

Reviewed By: ayzhao

Differential Revision: https://reviews.llvm.org/D123545
The file was modified llvm/tools/llvm-ml/Opts.td
The file was modified llvm/tools/llvm-ml/llvm-ml.cpp
Commit 30f22429d38944e126db75296a1ffc6c12c7b87a by zinenko
[mlir] Connect Transform dialect to PDL

This introduces a pair of ops to the Transform dialect that connect it to PDL
patterns. Transform dialect relies on PDL for matching the Payload IR ops that
are about to be transformed. For this purpose, it provides a container op for
patterns, a "pdl_match" op and transform interface implementations that call
into the pattern matching infrastructure.

To enable the caching of compiled patterns, this also provides the extension
mechanism for TransformState. Extensions allow one to store additional
information in the TransformState and thus communicate it between different
Transform dialect operations when they are applied. They can be added and
removed when applying transform ops. An extension containing a symbol table in
which the pattern names are resolved and a pattern compilation cache is
introduced as the first client.

Depends On D123664

Reviewed By: Mogball

Differential Revision: https://reviews.llvm.org/D124007
The file was modified mlir/include/mlir/Dialect/Transform/IR/TransformInterfaces.h
The file was modified mlir/test/lib/Dialect/Transform/TestTransformDialectExtension.cpp
The file was modified mlir/test/Dialect/Transform/ops-invalid.mlir
The file was modified mlir/lib/Dialect/Transform/IR/TransformDialect.cpp
The file was modified mlir/include/mlir/Dialect/Transform/IR/TransformOps.h
The file was modified mlir/lib/Dialect/Transform/IR/CMakeLists.txt
The file was modified utils/bazel/llvm-project-overlay/mlir/BUILD.bazel
The file was modified mlir/test/Dialect/Transform/test-interpreter.mlir
The file was modified mlir/test/lib/Dialect/Transform/TestTransformDialectExtension.td
The file was modified mlir/include/mlir/Dialect/Transform/IR/TransformDialect.h
The file was modified mlir/include/mlir/Dialect/Transform/IR/TransformOps.td
The file was modified mlir/lib/Dialect/Transform/IR/TransformInterfaces.cpp
The file was modified mlir/include/mlir/Dialect/Transform/IR/TransformDialect.td
The file was modified mlir/test/Dialect/Transform/ops.mlir
The file was modified mlir/lib/Dialect/Transform/IR/TransformOps.cpp
Commit 1f71b5a38605a4f101af43288ae22feb89c8a469 by gbreynoo
[llvm-ar] Fix thin archive being wrongly converted to a full archive

When using the L option to quick append a full archive to a thin
archive, the thin archive was being wrongly converted to a full archive.
I've fixed the issue and added a check for it in
thin-to-full-archive.test and expanded some tests.

Differential Revision: https://reviews.llvm.org/D123142
The file was modified llvm/test/tools/llvm-ar/regular-to-thin-archive.test
The file was modified llvm/tools/llvm-ar/llvm-ar.cpp
The file was modified llvm/test/tools/llvm-ar/thin-to-regular-archive.test
The file was modified llvm/test/tools/llvm-ar/flatten-thin-archive.test
Commit e1c0d2fb8272dd7f8e406334ac14077154217031 by kbobyrev
[clangd] Correctly identify self-contained headers included recursively

Right now, when exiting the file, Headers.cpp will identify the recursive
inclusion (with a new FileID) as non-self-contained and will add it to the set
from which it will never be removed. As a result, we get incorrect results in
the IncludeStructure and Include Cleaner. This patch fixes that.

Reviewed By: sammccall

Differential Revision: https://reviews.llvm.org/D124166
The file was modified clang-tools-extra/clangd/unittests/IncludeCleanerTests.cpp
The file was modified clang-tools-extra/clangd/unittests/HeadersTests.cpp
The file was modified clang-tools-extra/clangd/Headers.cpp
Commit 9f05b111ee1fc48974ed515c865bdaddb5998d01 by kbobyrev
[clangd] Include Cleaner: suppress unused warnings for IWYU pragma: export

Add limited support for "IWYU pragma: export": for now it just suppresses the
warning, similar to "IWYU pragma: keep".

Reviewed By: sammccall

Differential Revision: https://reviews.llvm.org/D124170
The file was modified clang-tools-extra/clangd/unittests/IncludeCleanerTests.cpp
The file was modified clang-tools-extra/clangd/Headers.cpp
Commit f80e369f61ebd33dd9377bb42fcab64d17072b18 by paul.robinson
[PS4] Driver: use correct --shared option
The file was modified clang/test/Driver/ps4-ps5-linker-non-win.c
The file was modified clang/lib/Driver/ToolChains/PS4CPU.cpp
The file was modified clang/test/Driver/ps4-ps5-linker-win.c
Commit 782d0105ba245aaded586931a997618a25f80690 by spatel
[InstCombine] add tests for C << (X - C1); NFC
The file was modified llvm/test/Transforms/InstCombine/shift-add.ll
Commit 5819f4a422865fc9a8ea4dc772769e14010ff6a7 by spatel
[InstCombine] C0 <<{nsw, nuw} (X - C1) --> (C0 >> C1) << X

This is similar to an existing pre-shift-of-constant fold:
8a9c70fc01e6
...but in this case, we need no-wrap on the shl and a negative
offset:
https://alive2.llvm.org/ce/z/_RVz99

Fixes #54890
The file was modified llvm/test/Transforms/InstCombine/shift-add.ll
The file was modified llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp
Commit 8a3afc6da5bc94fcbac708156fc1cf4220e7d1f1 by tobias
[compiler-rt][Darwin] Add arm64 to simulator platforms

This patch relands a8e5ce76b475a22546090a73c22fa4f83529aa4e,
with additional SDK version checks to ensure that
Xcode's headers support arm64 builds.

Differential Revision: https://reviews.llvm.org/D119174
The file was modified compiler-rt/cmake/builtin-config-ix.cmake
Commit d8c1d37ba37d22351e3edd4667e639898418419b by abrachet
[fuchsia] Don't include duplicate profiling symbols for Fuchsia

InstrProfilingPlatformLinux.c already provides these symbols. Linker order
saved us from noticing before.

Reviewed By: mcgrathr

Differential Revision: https://reviews.llvm.org/D124136
The file was modified compiler-rt/lib/profile/InstrProfilingPlatformOther.c
Commit 13d59a8ee46ff57fffded278764d8e076c9daca8 by llvm-dev
[M68k] Regenerate cmp.ll tests

M68k is still experimental, so its tests weren't updated by a recent DAG combine change.
The file was modified llvm/test/CodeGen/M68k/Control/cmp.ll
Commit ae46b3e01faa0515fd85b8e107389348bd0e341a by i
Revert D121279 "[MLIR][GPU] Add canonicalizer for gpu.memcpy"

This reverts commit 12f55cac69d8978d1c433756a8b2114bf9ed1e1b.

Causes a miscompile. Will follow up with a reproducer.
The file was modified mlir/lib/Dialect/GPU/IR/GPUDialect.cpp
The file was modified mlir/include/mlir/Dialect/GPU/GPUOps.td
The file was modified mlir/test/Dialect/GPU/canonicalize.mlir
Commit f8a078f20c59620c44cb9bbf1ac32e9e2e611961 by llvm-dev
[X86] Add test case for Issue #54911
The file was modified llvm/test/CodeGen/X86/bitcast-int-to-vector-bool.ll
Commit ad46aaede6e4d5a6951fc9827da994d3fbe1af44 by adamcz
[clangd] Add beforeExecute() callback to FeatureModules.

It runs immediately before FrontendAction::Execute() with a mutable
CompilerInstance, allowing FeatureModules to register callbacks, remap
files, etc.

Differential Revision: https://reviews.llvm.org/D124176
The file was modified clang-tools-extra/clangd/Preamble.cpp
The file was modified clang-tools-extra/clangd/FeatureModule.h
The file was modified clang-tools-extra/clangd/ParsedAST.cpp
The file was modified clang-tools-extra/clangd/unittests/FeatureModulesTests.cpp
Commit 889847922dc6b9247f7f9189cf06e46fa5591049 by thakis
[lld/mac] Warn that writing zippered outputs isn't implemented

A "zippered" dylib contains several LC_BUILD_VERSION load commands, usually
one for "normal" macOS and one for macCatalyst.

These are usually created by passing something like

   -shared -target arm64-apple-macos -darwin-target-variant arm64-apple-ios13.1-macabi

to clang, which turns it into

    -platform_version macos 12.0.0 12.3 -platform_version "mac catalyst" 14.0.0 15.4

for the linker.

ld64.lld can read these files fine, but it can't write them.  Before this
change, it would just silently use the last -platform_version flag and ignore
the rest.

This change adds a warning that writing zippered dylibs isn't implemented yet
instead.

Sadly, parts of ld64.lld's test suite relied on the previous
"silently use last flag" semantics: `%lld` always expanded
to `ld64.lld -platform_version macos 10.15 11.0` and tests that wanted a
different value passed a 2nd `-platform_version` flag later on. But this now
produces a warning if the platform passed to `-platform_version` is not `macos`.

There weren't very many cases of this, so move these to use `%no-arg-lld` and
manually pass `-arch`.

Differential Revision: https://reviews.llvm.org/D124106
The file was modified lld/MachO/InputFiles.cpp
The file was modified lld/test/MachO/invalid/incompatible-arch.s
The file was modified lld/MachO/Driver.cpp
The file was modified lld/test/MachO/invalid/incompatible-target-tapi.test
The file was modified lld/test/MachO/zippered.yaml
The file was modified lld/test/MachO/platform-version.s
The file was modified lld/test/MachO/tapi-link-by-arch.s
The file was modified lld/test/MachO/lc-build-version.s
The file was modified lld/test/MachO/objc-uses-custom-personality.s
Commit af3fb071545918f2de61142fa12d8782e5a37fa5 by sam.mccall
[Frontend] Simplify PrecompiledPreamble::PCHStorage. NFC

- Remove fiddly union, preambles are heavyweight
- Remove fiddly move constructors in TempPCHFile and PCHStorage, use unique_ptr
- Remove unnecessary accessors on PCHStorage
- Remove trivial InMemoryStorage
- Move implementation details into cpp file

This is a prefactoring; a follow-up change will modify the in-memory PCHStorage to
avoid extra string copies while creating it.

Differential Revision: https://reviews.llvm.org/D124177
The file was modified clang/include/clang/Frontend/PrecompiledPreamble.h
The file was modified clang/lib/Frontend/PrecompiledPreamble.cpp
Commit e077e3a6483efcb2099670537a6d7168629b2eb3 by chenglin.bi
[InstCombine] add baseline test for (X * C2) << C1 --> X * (C2 << C1) without one use; NFC
The file was modified llvm/test/Transforms/InstCombine/apint-shift.ll
Commit 8960ba7491e8551f28bbc7a177e8cfb8404513b0 by spatel
Revert "[InstCombine] C0 <<{nsw, nuw} (X - C1) --> (C0 >> C1) << X"

This reverts commit 5819f4a422865fc9a8ea4dc772769e14010ff6a7.
This caused bots to fail with a crash/assert during the fold,
so some constraint was missed.
The file was modified llvm/test/Transforms/InstCombine/shift-add.ll
The file was modified llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp
Commit afcc6baac52fcc91d1636f6803f5c230e7018016 by jacob.lambert
[clang][HIP] Updating driver to enable archive/bitcode to bitcode linking when targeting HIPAMD toolchain

Differential Revision: https://reviews.llvm.org/D124151
The file was modified clang/lib/Driver/ToolChains/HIPAMD.cpp
The file was modified clang/lib/Driver/ToolChains/HIPAMD.h
The file was modified clang/lib/Driver/Driver.cpp
The file was added clang/test/Driver/hip-link-bc-to-bc.hip
The file was modified clang/test/Driver/hip-phases.hip
Commit b543d28df7b067dcda833c717a59faa28c1151a1 by chenglin.bi
[InstCombine] Add one use limitation for  (X * C2) << C1 --> X * (C2 << C1)

Following up on D123453, add a one-use limitation for
(X * C2) << C1 --> X * (C2 << C1)
to make it consistent with
lshr (mul nuw x, MulC), ShAmtC -> mul nuw x, (MulC >> ShAmtC)
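
As a rough illustration of the rewrite itself (not the InstCombine code), the
sketch below brute-forces the identity on 8-bit wrapping integers. The bit
width and the helper names are assumptions of this sketch. The identity holds
whether or not the multiply has other users, so the one-use limitation is
presumably about profitability rather than correctness; that reading is an
inference, not something the commit message states.

```python
BITS = 8                      # assumed width, chosen to keep the loop small
MASK = (1 << BITS) - 1

def shl_of_mul(x, c2, c1):
    """(X * C2) << C1, truncated to BITS bits at every step."""
    return (((x * c2) & MASK) << c1) & MASK

def mul_of_shl(x, c2, c1):
    """X * (C2 << C1), truncated to BITS bits at every step."""
    return (x * ((c2 << c1) & MASK)) & MASK

# Collect every (X, C2, C1) where the two forms disagree.
mismatches = [
    (x, c2, c1)
    for x in range(1 << BITS)
    for c2 in range(1 << BITS)
    for c1 in range(BITS)     # only in-range shift amounts
    if shl_of_mul(x, c2, c1) != mul_of_shl(x, c2, c1)
]
```

If the multiply had a second user, it would have to stay alive after the
rewrite, leaving two multiplies where there used to be one multiply and one
shift.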

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D124183
The file was modified llvm/test/Transforms/InstCombine/apint-shift.ll
The file was modified llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp
Commit ac94073daa18687b76dc49a60bb2844799f28ee3 by Stanislav.Mekhanoshin
[AMDGPU] Refine 64 bit misaligned LDS ops selection

Here is the performance data:
```
Using platform: AMD Accelerated Parallel Processing
Using device: gfx900:xnack-

ds_write_b64                       aligned by  8:  3.2 sec
ds_write2_b32                      aligned by  8:  3.2 sec
ds_write_b16 * 4                   aligned by  8:  7.0 sec
ds_write_b8 * 8                    aligned by  8: 13.2 sec
ds_write_b64                       aligned by  1:  7.3 sec
ds_write2_b32                      aligned by  1:  7.5 sec
ds_write_b16 * 4                   aligned by  1: 14.0 sec
ds_write_b8 * 8                    aligned by  1: 13.2 sec
ds_write_b64                       aligned by  2:  7.3 sec
ds_write2_b32                      aligned by  2:  7.5 sec
ds_write_b16 * 4                   aligned by  2:  7.1 sec
ds_write_b8 * 8                    aligned by  2: 13.3 sec
ds_write_b64                       aligned by  4:  4.6 sec
ds_write2_b32                      aligned by  4:  3.2 sec
ds_write_b16 * 4                   aligned by  4:  7.1 sec
ds_write_b8 * 8                    aligned by  4: 13.3 sec
ds_read_b64                        aligned by  8:  2.3 sec
ds_read2_b32                       aligned by  8:  2.2 sec
ds_read_u16 * 4                    aligned by  8:  4.8 sec
ds_read_u8 * 8                     aligned by  8:  8.6 sec
ds_read_b64                        aligned by  1:  4.4 sec
ds_read2_b32                       aligned by  1:  7.3 sec
ds_read_u16 * 4                    aligned by  1: 14.0 sec
ds_read_u8 * 8                     aligned by  1:  8.7 sec
ds_read_b64                        aligned by  2:  4.4 sec
ds_read2_b32                       aligned by  2:  7.3 sec
ds_read_u16 * 4                    aligned by  2:  4.8 sec
ds_read_u8 * 8                     aligned by  2:  8.7 sec
ds_read_b64                        aligned by  4:  4.4 sec
ds_read2_b32                       aligned by  4:  2.3 sec
ds_read_u16 * 4                    aligned by  4:  4.8 sec
ds_read_u8 * 8                     aligned by  4:  8.7 sec

Using platform: AMD Accelerated Parallel Processing
Using device: gfx1030

ds_write_b64                       aligned by  8:  4.4 sec
ds_write2_b32                      aligned by  8:  4.3 sec
ds_write_b16 * 4                   aligned by  8:  7.9 sec
ds_write_b8 * 8                    aligned by  8: 13.0 sec
ds_write_b64                       aligned by  1: 23.2 sec
ds_write2_b32                      aligned by  1: 23.1 sec
ds_write_b16 * 4                   aligned by  1: 44.0 sec
ds_write_b8 * 8                    aligned by  1: 13.0 sec
ds_write_b64                       aligned by  2: 23.2 sec
ds_write2_b32                      aligned by  2: 23.1 sec
ds_write_b16 * 4                   aligned by  2:  7.9 sec
ds_write_b8 * 8                    aligned by  2: 13.1 sec
ds_write_b64                       aligned by  4: 13.5 sec
ds_write2_b32                      aligned by  4:  4.3 sec
ds_write_b16 * 4                   aligned by  4:  7.9 sec
ds_write_b8 * 8                    aligned by  4: 13.1 sec
ds_read_b64                        aligned by  8:  3.5 sec
ds_read2_b32                       aligned by  8:  3.4 sec
ds_read_u16 * 4                    aligned by  8:  5.3 sec
ds_read_u8 * 8                     aligned by  8:  8.5 sec
ds_read_b64                        aligned by  1: 13.1 sec
ds_read2_b32                       aligned by  1: 22.7 sec
ds_read_u16 * 4                    aligned by  1: 43.9 sec
ds_read_u8 * 8                     aligned by  1:  7.9 sec
ds_read_b64                        aligned by  2: 13.1 sec
ds_read2_b32                       aligned by  2: 22.7 sec
ds_read_u16 * 4                    aligned by  2:  5.6 sec
ds_read_u8 * 8                     aligned by  2:  7.9 sec
ds_read_b64                        aligned by  4: 13.1 sec
ds_read2_b32                       aligned by  4:  3.4 sec
ds_read_u16 * 4                    aligned by  4:  5.6 sec
ds_read_u8 * 8                     aligned by  4:  7.9 sec
```

GFX10 exhibits a different pattern for sub-DWORD load/store performance
than GFX9. On GFX9 it is faster to issue a single unaligned load or
store than a fully split b8 access, whereas on GFX10 even a full split
is faster. However, this gain is only theoretical, because splitting
an access to the sub-dword level requires more registers and packing/
unpacking logic. Ignoring that option, it is better to use a single
64-bit instruction on misaligned data, except for 4-byte
aligned data, where ds_read2_b32/ds_write2_b32 is better.
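
The conclusion above can be restated as a small decision function. This is
only a sketch of the rule as described in this message, not the logic in
SIISelLowering.cpp; the function name `pick_lds_op` is hypothetical, and
treating 8-byte (or better) alignment as a single 64-bit access is an
assumption of the sketch.

```python
def pick_lds_op(kind, align):
    """Pick a mnemonic for a 64-bit LDS access.

    kind  -- "read" or "write"
    align -- known alignment of the address in bytes
    """
    if kind not in ("read", "write"):
        raise ValueError("kind must be 'read' or 'write'")
    if align <= 0 or align & (align - 1):
        raise ValueError("align must be a positive power of two")
    if align == 4:
        # The one exception called out above: two 32-bit halves win.
        return f"ds_{kind}2_b32"
    # Everything else, including 1- and 2-byte alignment, stays a single
    # 64-bit access (for align >= 8 this is the sketch's own assumption).
    return f"ds_{kind}_b64"
```

For example, `pick_lds_op("read", 4)` yields `ds_read2_b32`, while
`pick_lds_op("write", 1)` yields `ds_write_b64`.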

Differential Revision: https://reviews.llvm.org/D123956
The file was modified llvm/lib/Target/AMDGPU/SIISelLowering.cpp
The file was modified llvm/test/CodeGen/AMDGPU/ds_read2.ll
The file was modified llvm/test/CodeGen/AMDGPU/ds-alignment.ll
The file was modified llvm/lib/Target/AMDGPU/DSInstructions.td
The file was modified llvm/test/CodeGen/AMDGPU/ds_write2.ll
Commit 98b866892d657010795cb6416454bc33ebf0cc2b by craig.topper
[RISCV] Add special case to constant materialization to remove trailing zeros first.

If there are fewer than 12 trailing zeros, we'll try to use an ADDI
at the end of the sequence. If we strip trailing zeros and end the
sequence with a SLLI we might find a shorter sequence.
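
The trailing-zero step can be pictured as follows. This is a minimal sketch
of the idea only: it does not model the ADDI handling or any other part of
RISCVMatInt.cpp, and the helper name `split_trailing_zeros` is hypothetical.

```python
def split_trailing_zeros(value):
    """Return (shifted, tz) with shifted << tz == value and shifted odd."""
    if value <= 0:
        raise ValueError("sketch only handles positive constants")
    # value & -value isolates the lowest set bit; its position is the
    # number of trailing zero bits.
    tz = (value & -value).bit_length() - 1
    return value >> tz, tz

# Example: 0x12340 has six trailing zero bits.
shifted, tz = split_trailing_zeros(0x12340)
```

Here `shifted` is `0x48D` and `tz` is `6`, so the constant could be built by
materializing `0x48D` and finishing with a left shift by 6.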

Differential Revision: https://reviews.llvm.org/D124148
The file was modified llvm/test/CodeGen/RISCV/rv64zbp.ll
The file was modified llvm/lib/Target/RISCV/MCTargetDesc/RISCVMatInt.cpp
The file was modified llvm/test/CodeGen/RISCV/rv64zba.ll
The file was modified llvm/test/CodeGen/RISCV/vararg.ll
The file was modified llvm/test/MC/RISCV/rv64zba-aliases-valid.s
The file was modified llvm/test/CodeGen/RISCV/imm.ll
The file was modified llvm/test/MC/RISCV/rv64zbs-aliases-valid.s
The file was modified llvm/test/MC/RISCV/rv64i-aliases-valid.s
Commit 0edb262d914a67b7d3284b013141cc8d5bf77bc6 by zinenko
[mlir] enable doc generation for the transform dialect
The file was modified mlir/include/mlir/Dialect/Transform/IR/CMakeLists.txt
The file was added mlir/docs/Dialects/Transform.md
Commit 25aba1abb546a4486d2fe9c2bdb6d8c25047bf85 by chenglin.bi
Revert "[InstCombine] Add one use limitation for  (X * C2) << C1 --> X * (C2 << C1)"

This reverts commit b543d28df7b067dcda833c717a59faa28c1151a1.
The file was modified llvm/test/Transforms/InstCombine/apint-shift.ll
The file was modified llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp