Success

Changes

Summary

  1. [NPM] Add target specific hook to add passes for New Pass Manager
  2. [X86] Canonicalize (x > 1) ? x : 1 -> (x >= 1) ? x : 1 for signed and unsigned to enable the use of test instructions for the compare.
  3. [asan][test] XFAIL Posix/no_asan_gen_globals.c on Solaris
  4. [NFC] Fix spacing in clang/test/Driver/aix-ld.c
  5. [flang] Fix descriptor-based array data item I/O for list-directed CHARACTER & LOGICAL
  6. [clangd] Remove dead variable. NFC
  7. [PDB] Merge types in parallel when using ghashing
  8. Revert "[PDB] Merge types in parallel when using ghashing"
  9. [mlir][Linalg] Add pattern to tile and fuse Linalg operations on buffers.
  10. [Msan] Add ptsname, ptsname_r interceptors
  11. [AMDGPU] Reorganize VOP3P encoding
Commit ce5379f0f0675592fd10a522009fd5b1561ca72b by aeubanks
[NPM] Add target specific hook to add passes for New Pass Manager

The patch adds a new TargetMachine member "registerPassBuilderCallbacks" for targets to add passes to the pass pipeline using the New Pass Manager (similar to adjustPassManager for the Legacy Pass Manager).

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D88138
The file was added llvm/test/CodeGen/Hexagon/registerpassbuildercallbacks.ll
The file was modified llvm/lib/Target/Hexagon/HexagonTargetMachine.h
The file was modified clang/lib/CodeGen/BackendUtil.cpp
The file was modified llvm/tools/opt/NewPMDriver.cpp
The file was modified llvm/include/llvm/Target/TargetMachine.h
The file was modified llvm/lib/Target/Hexagon/HexagonTargetMachine.cpp
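As an illustration only, the hook pattern can be modeled in a few self-contained lines: the base target machine provides a no-op virtual, a target overrides it to register callbacks on the pass builder, and the driver invokes it before building the pipeline (presumably why clang's BackendUtil.cpp and opt's NewPMDriver.cpp are touched). Apart from the hook name `registerPassBuilderCallbacks`, every class, method, and pass name below (`ToyPassBuilder`, `ToyTargetMachine`, `ToyHexagonTM`, `registerOptimizerLastCallback`, `hexagon-specific-pass`) is a hypothetical stand-in, not LLVM's API.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a pass pipeline: an ordered list of pass names.
using ToyPipeline = std::vector<std::string>;

// Hypothetical pass builder with a single extension point that runs
// registered callbacks at the end of the optimizer pipeline.
class ToyPassBuilder {
public:
  using Callback = std::function<void(ToyPipeline &)>;
  void registerOptimizerLastCallback(Callback cb) {
    lastCallbacks.push_back(std::move(cb));
  }
  ToyPipeline buildPipeline() const {
    ToyPipeline p = {"early-pass", "mid-pass", "late-pass"};
    for (const auto &cb : lastCallbacks)
      cb(p); // targets append their passes here
    return p;
  }

private:
  std::vector<Callback> lastCallbacks;
};

// Base "target machine": the hook does nothing unless a target overrides it.
class ToyTargetMachine {
public:
  virtual ~ToyTargetMachine() = default;
  virtual void registerPassBuilderCallbacks(ToyPassBuilder &) {}
};

// A target that injects its own pass at the end of the optimizer pipeline.
class ToyHexagonTM : public ToyTargetMachine {
public:
  void registerPassBuilderCallbacks(ToyPassBuilder &pb) override {
    pb.registerOptimizerLastCallback(
        [](ToyPipeline &p) { p.push_back("hexagon-specific-pass"); });
  }
};

// Driver side: let the target register its callbacks, then build.
ToyPipeline buildFor(ToyTargetMachine &tm) {
  ToyPassBuilder pb;
  tm.registerPassBuilderCallbacks(pb);
  return pb.buildPipeline();
}
```

The design keeps the driver target-agnostic: it only calls the virtual, and targets that do not override it get the default pipeline unchanged.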
Commit d1d7fc98325d948bede85e6304c5ca93f79e050e by craig.topper
[X86] Canonicalize (x > 1) ? x : 1 -> (x >= 1) ? x : 1 for signed and unsigned to enable the use of test instructions for the compare.

This will be further canonicalized to a compare involving 0,
which will enable the use of test instructions, using either
cmovg for signed or cmovne for unsigned.

Fixes more cases for PR47049.
The file was modified llvm/test/CodeGen/X86/cmov.ll
The file was modified llvm/lib/Target/X86/X86ISelLowering.cpp
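The rewrite is value-preserving because, on integers, x > 1 and x >= 1 differ only when x == 1, and in that case both arms of the select are 1. The brute-force check below (an illustrative sketch in plain C++, not code from the X86 backend) confirms this over all 8-bit signed and unsigned values, together with the compare-against-0 forms: for signed, x >= 1 is x > 0 (test + cmovg); for unsigned, x >= 1 is x != 0 (test + cmovne).

```cpp
#include <cstdint>

// Original select: (x > 1) ? x : 1.
template <typename T> T selOriginal(T x) { return x > 1 ? x : T(1); }

// Canonicalized select: (x >= 1) ? x : 1. Differs only at x == 1,
// where both arms yield 1.
template <typename T> T selCanonical(T x) { return x >= 1 ? x : T(1); }

// Signed: x >= 1 is x > 0, a compare against 0 (test + cmovg).
std::int8_t selSignedZero(std::int8_t x) { return x > 0 ? x : std::int8_t(1); }

// Unsigned: x >= 1 is x != 0, a compare against 0 (test + cmovne).
std::uint8_t selUnsignedZero(std::uint8_t x) { return x != 0 ? x : std::uint8_t(1); }

// Exhaustively compare all forms over every 8-bit value.
bool allFormsAgree() {
  for (int v = -128; v <= 127; ++v) {
    std::int8_t s = static_cast<std::int8_t>(v);
    if (selOriginal(s) != selCanonical(s) || selCanonical(s) != selSignedZero(s))
      return false;
  }
  for (int v = 0; v <= 255; ++v) {
    std::uint8_t u = static_cast<std::uint8_t>(v);
    if (selOriginal(u) != selCanonical(u) || selCanonical(u) != selUnsignedZero(u))
      return false;
  }
  return true;
}
```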
Commit 8a1084a9486313e9f46e61ab69f80309c7050e1f by ro
[asan][test] XFAIL Posix/no_asan_gen_globals.c on Solaris

`Posix/no_asan_gen_globals.c` currently `FAIL`s on Solaris:

  $ nm no_asan_gen_globals.c.tmp.exe | grep ___asan_gen_
  0809696a r .L___asan_gen_.1
  0809a4cd r .L___asan_gen_.2
  080908e2 r .L___asan_gen_.4
  0809a4cd r .L___asan_gen_.5
  0809a529 r .L___asan_gen_.7
  0809a4cd r .L___asan_gen_.8

As detailed in Bug 47607, there are two factors here:

- `clang` plays games by emitting some local labels into the symbol
  table.  When instead one uses `-fno-integrated-as` to have `gas` create
  the object files, they don't land in the objects in the first place.
- Unlike GNU `ld`, the Solaris `ld` doesn't support
  `-X`/`--discard-locals` but instead relies on the assembler to follow its
  specification and not emit local labels.

Therefore this patch `XFAIL`s the test on Solaris.

Tested on `amd64-pc-solaris2.11` and `x86_64-pc-linux-gnu`.

Differential Revision: https://reviews.llvm.org/D88218
The file was modified compiler-rt/test/asan/TestCases/Posix/no_asan_gen_globals.c
Commit ae4c400e02fc3f7cff11cc332e6b107353b3e6a2 by hubert.reinterpretcast
[NFC] Fix spacing in clang/test/Driver/aix-ld.c

Fix one line with mismatched indentation after afc277b0ed0d.
The file was modified clang/test/Driver/aix-ld.c
Commit 0c3c8f4ae69a619efd8dc088e2572db172d40547 by pklausler
[flang] Fix descriptor-based array data item I/O for list-directed CHARACTER & LOGICAL

These types have to distinguish list-directed I/O from formatted I/O,
and the subscript incrementation call was in the formatted branch
of the if() rather than after the if().

Differential Revision: https://reviews.llvm.org/D88606
The file was modified flang/runtime/descriptor-io.h
The file was modified flang/unittests/Runtime/hello.cpp
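The shape of the fix can be sketched as a toy loop over array elements: whichever branch handles an element, the subscript must advance exactly once per element, so the increment belongs after the if/else rather than inside the formatted branch. `ElementCursor`, `ioArrayItems`, and the string output are hypothetical simplifications, not the flang runtime's descriptor-io.h code.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a descriptor: a flat array plus a current subscript.
struct ElementCursor {
  explicit ElementCursor(const std::vector<bool> &d) : data(d) {}
  bool current() const { return data[subscript]; }
  void increment() { ++subscript; } // stands in for advancing the subscripts
  const std::vector<bool> &data;
  std::size_t subscript = 0;
};

// Emit LOGICAL elements either list-directed ("T F T") or formatted
// (simplified here to no separators).
std::string ioArrayItems(const std::vector<bool> &values, bool listDirected) {
  ElementCursor cursor(values);
  std::string out;
  for (std::size_t i = 0; i < values.size(); ++i) {
    if (listDirected) {
      if (i > 0)
        out += ' ';
      out += cursor.current() ? 'T' : 'F';
    } else {
      out += cursor.current() ? 'T' : 'F';
      // The bug: the increment used to live only in this branch, so the
      // list-directed branch kept re-reading the first element.
    }
    cursor.increment(); // the fix: advance after the if/else, for both paths
  }
  return out;
}
```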
Commit 85fc5bf341395171e67490061f6fbc76b297b78d by sam.mccall
[clangd] Remove dead variable. NFC
The file was modified clang-tools-extra/clangd/URI.cpp
Commit 49b3459930655d879b2dc190ff8fe11c38a8be5f by rnk
[PDB] Merge types in parallel when using ghashing

This makes type merging much faster (-24% on chrome.dll) when multiple
threads are available, but it slightly increases the time to link (+10%)
when /threads:1 is passed. With only one more thread, the new type
merging is faster (-11%). The output PDB should be identical to what it
was before this change.

To give an idea, here is the /time output placed side by side:
                              BEFORE    | AFTER
  Input File Reading:           956 ms  |  968 ms
  Code Layout:                  258 ms  |  190 ms
  Commit Output File:             6 ms  |    7 ms
  PDB Emission (Cumulative):   6691 ms  | 4253 ms
    Add Objects:               4341 ms  | 2927 ms
      Type Merging:            2814 ms  | 1269 ms  -55%!
      Symbol Merging:          1509 ms  | 1645 ms
    Publics Stream Layout:      111 ms  |  112 ms
    TPI Stream Layout:          764 ms  |   26 ms  trivial
    Commit to Disk:            1322 ms  | 1036 ms  -300ms
  ---------------------------------------------------
  Total Link Time:             8416 ms  | 5882 ms  -30% overall

The main source of the additional overhead in the single-threaded case
is the need to iterate all .debug$T sections up front to check which
type records should go in the IPI stream. See fillIsItemIndexFromDebugT.
With changes to the .debug$H section, we could pre-calculate this info
and eliminate the need to do this walk up front. That should restore
single-threaded performance back to what it was before this change.

This change will cause LLD to be much more parallel than it used to be, and
for users who do multiple links in parallel, it could regress
performance. However, when the user is only doing one link, it's a huge
improvement. In the future, we can use NT worker threads to avoid
oversaturating the machine with work, but for now, this is such an
improvement for the single-link use case that I think we should land
this as is.

Algorithm
----------

Before this change, we essentially used a
DenseMap<GloballyHashedType, TypeIndex> to check if a type has already
been seen, and if it hasn't been seen, insert it now and use the next
available type index for it in the destination type stream. DenseMap
does not support concurrent insertion, and even if it did, the linker
must be deterministic: it cannot produce different PDBs by using
different numbers of threads. The output type stream must be in the same
order regardless of the order of hash table insertions.
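For reference, the serial scheme described above fits in a few lines. In this sketch a 64-bit integer stands in for the global type hash and `std::unordered_map` stands in for `DenseMap`; `GHash` and `mergeSerial` are hypothetical names. It is deterministic only because the inputs are walked one at a time in command-line order.

```cpp
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

using GHash = std::uint64_t; // stand-in for a global type hash

// Serial dedup: each input stream is a list of ghashes. Returns, per input,
// the destination type index for each source type. The first occurrence of a
// hash takes the next free destination index; later duplicates reuse it.
std::vector<std::vector<std::uint32_t>>
mergeSerial(const std::vector<std::vector<GHash>> &inputs) {
  std::unordered_map<GHash, std::uint32_t> seen;
  std::uint32_t nextIndex = 0;
  std::vector<std::vector<std::uint32_t>> maps;
  for (const auto &stream : inputs) {
    std::vector<std::uint32_t> m;
    for (GHash h : stream) {
      auto res = seen.emplace(h, nextIndex);
      if (res.second)
        ++nextIndex; // a new type: it consumed the next destination index
      m.push_back(res.first->second);
    }
    maps.push_back(std::move(m));
  }
  return maps;
}
```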

In order to create a hash table that supports concurrent insertion, the
table cells must be small enough that they can be updated atomically.
The algorithm I used for updating the table using linear probing is
described in this paper, "Concurrent Hash Tables: Fast and General(?)!":
https://dl.acm.org/doi/10.1145/3309206

The GHashCell in this change is essentially a pair of 32-bit integer
indices: <sourceIndex, typeIndex>. The sourceIndex is the index of the
TpiSource object, and it represents an input type stream. The typeIndex
is the index of the type in the stream. Together, we have something like
a ragged 2D array of ghashes, which can be looked up as:
  tpiSources[tpiSrcIndex]->ghashes[typeIndex]

By using these side tables, we can omit the key data from the hash
table, and keep the table cell small. There is a cost to this: resolving
hash table collisions requires many more loads than simply looking at
the key in the same cache line as the insertion position. However, most
supported platforms should have a 64-bit CAS operation to update the
cell atomically.

To make the result of concurrent insertion deterministic, the cell
payloads must have a priority function. Defining one is pretty
straightforward: compare the two 32-bit numbers as a combined 64-bit
number. This means that types coming from inputs earlier on the command
line have a higher priority and are more likely to appear earlier in the
final PDB type stream than types from an input appearing later on the
link line.

After table insertion, the non-empty cells in the table can be copied
out of the main table and sorted by priority to determine the ordering
of the final type index stream. At this point, item and type records
must be separated, either by sorting or by splitting into two arrays,
and I chose sorting. This is why the GHashCell must contain the isItem
bit.
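The ordering step can be sketched as follows, assuming (for illustration, not as LLD's exact bit layout) that the isItem flag occupies the top bit of the 64-bit cell and that 0 marks an empty slot. Under that assumption a single plain sort both separates type records from item records and orders each group by priority; `kIsItemBit` and `orderCells` are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uint64_t kIsItemBit = std::uint64_t(1) << 63; // assumed layout

// Copy the non-empty cells out of the table and sort them. With isItem in
// the top bit, all type records (bit clear) sort before all item records
// (bit set), and within each group smaller <source, type> pairs come first.
// Returns the split point: ordered[0, split) are TPI records, the rest IPI.
std::size_t orderCells(const std::vector<std::uint64_t> &table,
                       std::vector<std::uint64_t> &ordered) {
  ordered.clear();
  for (std::uint64_t c : table)
    if (c != 0)
      ordered.push_back(c);
  std::sort(ordered.begin(), ordered.end());
  auto firstItem = std::find_if(ordered.begin(), ordered.end(), [](std::uint64_t c) {
    return (c & kIsItemBit) != 0;
  });
  return static_cast<std::size_t>(firstItem - ordered.begin());
}
```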

Once the final PDB TPI stream ordering is known, we need to compute a
mapping from source type index to PDB type index. To avoid starting over
from scratch and looking up every type again by its ghash, we save the
insertion position of every hash table insertion during the first
insertion phase. Because the table does not support rehashing, the
insertion position is stable. Using the array of insertion positions
indexed by source type index, we can replace the source type indices in
the ghash table cells with the PDB type indices.

Once the table cells have been updated to contain PDB type indices, the
mapping for each type source can be computed in parallel. Simply iterate
the list of cell positions and replace them with the PDB type index,
since the insertion positions are no longer needed.
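The two mapping steps can be sketched like this, assuming the earlier phases produced the final table, the priority-sorted winning cells, and the saved insertion positions (`slotOf[source][type]`). Cells use an illustrative encoding `(source + 1) << 32 | typeIndex` (an assumption, not necessarily LLD's layout), the separate TPI/IPI numbering is ignored for brevity, and `computeMappings` is a hypothetical name.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative cell encoding (assumption): (source + 1) << 32 | typeIndex.
inline std::uint32_t cellSource(std::uint64_t c) { return std::uint32_t(c >> 32) - 1; }
inline std::uint32_t cellType(std::uint64_t c) { return std::uint32_t(c); }

// table: final hash table cells. ordered: winning cells sorted by priority.
// slotOf[s][t]: slot returned when source s inserted its type t.
// Returns map[s][t] = destination (PDB) type index.
std::vector<std::vector<std::uint32_t>>
computeMappings(std::vector<std::uint64_t> table, // copied: overwritten below
                const std::vector<std::uint64_t> &ordered,
                const std::vector<std::vector<std::size_t>> &slotOf) {
  // Step 1: ordered[i] is the winner for one ghash. Its own saved insertion
  // position names the slot it owns; overwrite that slot with index i.
  for (std::size_t i = 0; i < ordered.size(); ++i) {
    std::uint64_t c = ordered[i];
    table[slotOf[cellSource(c)][cellType(c)]] = i;
  }
  // Step 2: sources are now independent (this loop could run per source in
  // parallel): each type's slot holds its destination index, duplicates too.
  std::vector<std::vector<std::uint32_t>> map(slotOf.size());
  for (std::size_t s = 0; s < slotOf.size(); ++s)
    for (std::size_t slot : slotOf[s])
      map[s].push_back(static_cast<std::uint32_t>(table[slot]));
  return map;
}
```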

Once we have a source to destination type index mapping for every type
source, there are no more data dependencies. We know which type records
are "unique" (not duplicates), and what their final type indices will
be. We can do the remapping in parallel, and accumulate type sizes and
type hashes in parallel by type source.

Lastly, TPI stream layout must be done serially. Accumulate all the type
records, sizes, and hashes, and add them to the PDB.

Differential Revision: https://reviews.llvm.org/D87805
The file was modified lld/test/COFF/precomp-link.test
The file was modified lld/include/lld/Common/ErrorHandler.h
The file was modified lld/COFF/DebugTypes.cpp
The file was modified lld/test/COFF/pdb-type-server-missing.yaml
The file was modified llvm/include/llvm/DebugInfo/PDB/Native/TpiStreamBuilder.h
The file was modified lld/test/COFF/pdb-type-server-simple.test
The file was modified llvm/lib/DebugInfo/CodeView/RecordName.cpp
The file was modified lld/COFF/DebugTypes.h
The file was modified llvm/include/llvm/DebugInfo/CodeView/TypeHashing.h
The file was modified lld/COFF/PDB.cpp
The file was modified llvm/include/llvm/DebugInfo/CodeView/TypeIndex.h
The file was modified lld/test/COFF/s_udt.s
The file was modified lld/COFF/Driver.cpp
The file was modified lld/test/COFF/pdb-global-hashes.test
The file was modified lld/COFF/PDB.h
The file was modified lld/test/COFF/pdb-procid-remapping.test
The file was modified llvm/lib/DebugInfo/PDB/Native/TpiStreamBuilder.cpp
The file was modified lld/COFF/TypeMerger.h
Commit 8d250ac3cd48d0f17f9314685a85e77895c05351 by rnk
Revert "[PDB] Merge types in parallel when using ghashing"

This reverts commit 49b3459930655d879b2dc190ff8fe11c38a8be5f.
The file was modified llvm/include/llvm/DebugInfo/CodeView/TypeHashing.h
The file was modified lld/COFF/PDB.h
The file was modified llvm/lib/DebugInfo/PDB/Native/TpiStreamBuilder.cpp
The file was modified llvm/include/llvm/DebugInfo/PDB/Native/TpiStreamBuilder.h
The file was modified lld/test/COFF/pdb-procid-remapping.test
The file was modified lld/test/COFF/pdb-type-server-missing.yaml
The file was modified lld/test/COFF/precomp-link.test
The file was modified lld/test/COFF/pdb-type-server-simple.test
The file was modified llvm/lib/DebugInfo/CodeView/RecordName.cpp
The file was modified lld/test/COFF/s_udt.s
The file was modified lld/COFF/Driver.cpp
The file was modified llvm/include/llvm/DebugInfo/CodeView/TypeIndex.h
The file was modified lld/COFF/TypeMerger.h
The file was modified lld/test/COFF/pdb-global-hashes.test
The file was modified lld/COFF/DebugTypes.cpp
The file was modified lld/COFF/DebugTypes.h
The file was modified lld/COFF/PDB.cpp
The file was modified lld/include/lld/Common/ErrorHandler.h
Commit c694588fc52a8845174fee06ad0bcfa338e87816 by ravishankarm
[mlir][Linalg] Add pattern to tile and fuse Linalg operations on buffers.

The pattern is structured similarly to other patterns like
LinalgTilingPattern. The fusion pattern takes options that allow you
to fuse with the producers of multiple operands at once.
- The pattern fuses only at the level that is known to be legal, i.e.,
  if a reduction loop in the consumer is tiled, then fusion should
  happen "before" this loop. Some refactoring of the fusion code is
  needed to fuse only where it is legal.
- Since fusion on buffers uses the LinalgDependenceGraph, which is
  not mutable in place, the fusion pattern keeps the original
  operations in the IR but tags them with a marker that can later be
  used to find them.

This change also fixes an issue with tiling and
distribution/interchange where a loop with a tile size of 0 was not
accounted for.

Differential Revision: https://reviews.llvm.org/D88435
The file was modified mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h
The file was modified mlir/test/lib/Transforms/CMakeLists.txt
The file was modified mlir/tools/mlir-opt/mlir-opt.cpp
The file was modified mlir/include/mlir/Dialect/Linalg/IR/LinalgStructuredOpsInterface.td
The file was modified mlir/include/mlir/Dialect/Linalg/Utils/Utils.h
The file was modified mlir/lib/Dialect/Linalg/Transforms/Tiling.cpp
The file was added mlir/test/Dialect/Linalg/fusion-pattern.mlir
The file was modified mlir/lib/Dialect/Linalg/Transforms/Fusion.cpp
The file was modified mlir/lib/Dialect/Linalg/Transforms/Transforms.cpp
The file was added mlir/test/lib/Transforms/TestLinalgFusionTransforms.cpp
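To see why the legal fusion level matters, consider a toy buffer example in plain C++: a fill producer (C = 0) feeding a matmul consumer (C += A * B). With the consumer tiled along i and j (parallel) and k (reduction), the producer's tile can be computed inside the i/j tile loops, but it must run before the k tile loop; placing it inside the k loop would re-zero C on every k tile. The functions and the tile size `T` are hypothetical; this only illustrates the legality rule, not the Linalg pattern's implementation.

```cpp
#include <algorithm>
#include <vector>

using Buf = std::vector<double>; // row-major buffers

// Reference: fill C with 0, then C += A * B. A is MxK, B is KxN, C is MxN.
void fillThenMatmul(const Buf &A, const Buf &B, Buf &C, int M, int N, int K) {
  std::fill(C.begin(), C.end(), 0.0);
  for (int i = 0; i < M; ++i)
    for (int j = 0; j < N; ++j)
      for (int k = 0; k < K; ++k)
        C[i * N + j] += A[i * K + k] * B[k * N + j];
}

// Consumer tiled by T along i, j, k, with the producer (the fill) fused per
// (i, j) tile. fuseInsideK == false: legal, fill runs before the reduction
// tile loop. fuseInsideK == true: illegal, fill re-zeroes C on every k tile.
void tiledFused(const Buf &A, const Buf &B, Buf &C, int M, int N, int K,
                int T, bool fuseInsideK) {
  for (int i0 = 0; i0 < M; i0 += T)
    for (int j0 = 0; j0 < N; j0 += T) {
      int iEnd = std::min(i0 + T, M), jEnd = std::min(j0 + T, N);
      auto fillTile = [&] {
        for (int i = i0; i < iEnd; ++i)
          for (int j = j0; j < jEnd; ++j)
            C[i * N + j] = 0.0;
      };
      if (!fuseInsideK)
        fillTile();
      for (int k0 = 0; k0 < K; k0 += T) {
        if (fuseInsideK)
          fillTile();
        int kEnd = std::min(k0 + T, K);
        for (int i = i0; i < iEnd; ++i)
          for (int j = j0; j < jEnd; ++j)
            for (int k = k0; k < kEnd; ++k)
              C[i * N + j] += A[i * K + k] * B[k * N + j];
      }
    }
}
```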
Commit 7475bd5411a3f62a7860db09a5bcf1fc147c43d6 by Vitaly Buka
[Msan] Add ptsname, ptsname_r interceptors

Reviewed By: eugenis, MaskRay

Differential Revision: https://reviews.llvm.org/D88547
The file was added compiler-rt/test/sanitizer_common/TestCases/Linux/ptsname.c
The file was modified compiler-rt/lib/sanitizer_common/sanitizer_common_interceptors.inc
The file was modified compiler-rt/lib/sanitizer_common/sanitizer_platform_interceptors.h
Commit 722d792499a4b60dd582f870cbdfb572897906b4 by Stanislav.Mekhanoshin
[AMDGPU] Reorganize VOP3P encoding

This changes the widths of the encoding and opcode fields to match the
documentation.

Differential Revision: https://reviews.llvm.org/D88619
The file was modified llvm/lib/Target/AMDGPU/VOPInstructions.td
The file was modified llvm/lib/Target/AMDGPU/VOP3PInstructions.td