Success

Changes

Summary

  1. Re-land "[PDB] Merge types in parallel when using ghashing"
  2. [flang] Semantic analysis for FINAL subroutines
  3. [OpenMP][libomptarget] make omp_get_initial_device 5.1 compliant
  4. [OpenMP][OMPT] Update OMPT tests for newly added GOMP interface patches
  5. Handle unknown OSes in DarwinTargetInfo::getExnObjectAlignment
  6. [PowerPC] Add outer product instructions for MMA
  7. Patch IEEEFloat::isSignificandAllZeros and IEEEFloat::isSignificandAllOnes (bug 34579)
  8. [OpenMP][libarcher] Allow all possible argument separators in TSAN_OPTIONS
  9. [ARM] Add missing target for Arm neon test case.
  10. [AArch64][GlobalISel] NFC: Refactor G_FCMP selection code
  11. [lldb] Make TestGuiBasicDebug more lenient
  12. [flang] Allow record advancement in external formatted sequential READ
  13. [AArch64][GlobalISel] Add some more legal types for G_PHI, G_IMPLICIT_DEF, G_FREEZE.
  14. [WholeProgramDevirt][NewPM] Add NPM testing path to match legacy pass
  15. Try to fix build. May have used a C++ feature too new/not supported on all platforms.
  16. [lld][WebAssembly] Allow exporting of mutable globals
  17. Remove `Ops` suffix from dialect library names
  18. [flang] Fix Gw.d format output
  19. [mlir] Split Dialect::addOperations into two functions
  20. [AArch64][GlobalISel] Clamp oversize FP arithmetic vectors.
Commit 5519e4da83d1abc66620334692394749eceb0e50 by rnk
Re-land "[PDB] Merge types in parallel when using ghashing"

Stored Error objects have to be checked, even if they are success
values.

This reverts commit 8d250ac3cd48d0f17f9314685a85e77895c05351.
Relands commit 49b3459930655d879b2dc190ff8fe11c38a8be5f.

Original commit message:
-----------------------------------------

This makes type merging much faster (-24% on chrome.dll) when multiple
threads are available, but it slightly increases the time to link (+10%)
when /threads:1 is passed. With only one more thread, the new type
merging is faster (-11%). The output PDB should be identical to what it
was before this change.

To give an idea, here is the /time output placed side by side:
                              BEFORE    | AFTER
  Input File Reading:           956 ms  |  968 ms
  Code Layout:                  258 ms  |  190 ms
  Commit Output File:             6 ms  |    7 ms
  PDB Emission (Cumulative):   6691 ms  | 4253 ms
    Add Objects:               4341 ms  | 2927 ms
      Type Merging:            2814 ms  | 1269 ms  -55%!
      Symbol Merging:          1509 ms  | 1645 ms
    Publics Stream Layout:      111 ms  |  112 ms
    TPI Stream Layout:          764 ms  |   26 ms  trivial
    Commit to Disk:            1322 ms  | 1036 ms  -300ms
  --------------------------------------------------
  Total Link Time:             8416 ms  | 5882 ms  -30% overall

The main source of the additional overhead in the single-threaded case
is the need to iterate all .debug$T sections up front to check which
type records should go in the IPI stream. See fillIsItemIndexFromDebugT.
With changes to the .debug$H section, we could pre-calculate this info
and eliminate the need to do this walk up front. That should restore
single-threaded performance back to what it was before this change.

This change will cause LLD to be much more parallel than it used to, and
for users who do multiple links in parallel, it could regress
performance. However, when the user is only doing one link, it's a huge
improvement. In the future, we can use NT worker threads to avoid
oversaturating the machine with work, but for now, this is such an
improvement for the single-link use case that I think we should land
this as is.

Algorithm
----------

Before this change, we essentially used a
DenseMap<GloballyHashedType, TypeIndex> to check if a type has already
been seen, and if it hasn't been seen, insert it now and use the next
available type index for it in the destination type stream. DenseMap
does not support concurrent insertion, and even if it did, the linker
must be deterministic: it cannot produce different PDBs by using
different numbers of threads. The output type stream must be in the same
order regardless of the order of hash table insertions.

In order to create a hash table that supports concurrent insertion, the
table cells must be small enough that they can be updated atomically.
The algorithm I used for updating the table using linear probing is
described in this paper, "Concurrent Hash Tables: Fast and General(?)!":
https://dl.acm.org/doi/10.1145/3309206

The GHashCell in this change is essentially a pair of 32-bit integer
indices: <sourceIndex, typeIndex>. The sourceIndex is the index of the
TpiSource object, and it represents an input type stream. The typeIndex
is the index of the type in the stream. Together, we have something like
a ragged 2D array of ghashes, which can be looked up as:
  tpiSources[tpiSrcIndex]->ghashes[typeIndex]

By using these side tables, we can omit the key data from the hash
table, and keep the table cell small. There is a cost to this: resolving
hash table collisions requires many more loads than simply looking at
the key in the same cache line as the insertion position. However, most
supported platforms should have a 64-bit CAS operation to update the
cell atomically.
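
The cell layout and side-table lookup described above can be sketched roughly as follows. This is only an illustration, not the GHashCell from the patch: the names `PackedCell`, `Source`, and `lookupHash` are hypothetical, the "+1" bias used to reserve zero as the empty cell is an assumption, and the isItem bit is left out.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical 64-bit cell: source index in the high half, type index in the
// low half. The source index is stored biased by one so that an all-zero cell
// can mean "empty". Assumes sourceIndex < 0xFFFFFFFF.
struct PackedCell {
  uint64_t data = 0;

  static PackedCell make(uint32_t sourceIndex, uint32_t typeIndex) {
    PackedCell c;
    c.data = ((uint64_t(sourceIndex) + 1) << 32) | uint64_t(typeIndex);
    return c;
  }
  bool isEmpty() const { return data == 0; }
  uint32_t sourceIndex() const { return uint32_t(data >> 32) - 1; }
  uint32_t typeIndex() const { return uint32_t(data); }
};

// Hypothetical stand-in for an input type stream and its hash side table.
struct Source {
  std::vector<uint64_t> ghashes;
};

// The key is not stored in the cell; it is found through the ragged 2D array.
uint64_t lookupHash(const std::vector<Source> &sources, PackedCell c) {
  return sources[c.sourceIndex()].ghashes[c.typeIndex()];
}
```

Because the whole cell is one 64-bit integer, it can be read and replaced with a single atomic operation, at the price of the extra loads in `lookupHash` whenever a key has to be compared.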

To make the result of concurrent insertion deterministic, the cell
payloads must have a priority function. Defining one is pretty
straightforward: compare the two 32-bit numbers as a combined 64-bit
number. This means that types coming from inputs earlier on the command
line have a higher priority and are more likely to appear earlier in the
final PDB type stream than types from an input appearing later on the
link line.
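
A minimal sketch of priority-aware insertion with linear probing and a 64-bit CAS is shown below. It is written under stated assumptions and is not the code from the patch: `CellTable`, `insert`, and the `keyOf` callback (which stands in for the side-table lookup) are hypothetical names, zero means "empty", a numerically smaller cell has higher priority, and the table never rehashes.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct CellTable {
  std::vector<std::atomic<uint64_t>> cells;

  explicit CellTable(size_t capacity) : cells(capacity) {
    for (auto &c : cells)
      c.store(0, std::memory_order_relaxed);
  }

  // Inserts newCell under the given hash and returns the slot that holds this
  // key afterwards, or cells.size() if the table is full.
  template <typename KeyOf>
  size_t insert(uint64_t hash, uint64_t newCell, KeyOf keyOf) {
    const size_t n = cells.size();
    for (size_t probe = 0; probe < n; ++probe) {
      const size_t pos = (hash + probe) % n;
      uint64_t old = cells[pos].load();
      for (;;) {
        if (old != 0 && keyOf(old) != hash)
          break; // Slot belongs to a different key: probe the next slot.
        if (old != 0 && old <= newCell)
          return pos; // Same key, and the stored cell already has priority.
        // Empty slot, or same key with lower priority: try to take it.
        if (cells[pos].compare_exchange_weak(old, newCell))
          return pos;
        // CAS failed; 'old' now holds the current value, so re-examine it.
      }
    }
    return n;
  }
};
```

The slot a key lands in may vary from run to run, but the cell that survives for each key is always the smallest one offered, so the set of surviving cells does not depend on thread scheduling.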

After table insertion, the non-empty cells in the table can be copied
out of the main table and sorted by priority to determine the ordering
of the final type index stream. At this point, item and type records
must be separated, either by sorting or by splitting into two arrays,
and I chose sorting. This is why the GHashCell must contain the isItem
bit.
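
One way to picture the sort-based separation is to place the isItem flag in the most significant bit of the cell, so that a plain integer sort puts every type record before every item record while keeping priority order inside each group. The sketch below assumes that layout; `kItemBit`, `collectAndSort`, and `countTypeRecords` are hypothetical names, not taken from the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed layout: top bit is the isItem flag, zero means "empty cell".
constexpr uint64_t kItemBit = uint64_t(1) << 63;

// Copy the non-empty cells out of the table and order them by priority.
std::vector<uint64_t> collectAndSort(const std::vector<uint64_t> &table) {
  std::vector<uint64_t> out;
  for (uint64_t cell : table)
    if (cell != 0)
      out.push_back(cell);
  std::sort(out.begin(), out.end());
  return out;
}

// In the sorted array, type records form a prefix and item records a suffix.
size_t countTypeRecords(const std::vector<uint64_t> &sorted) {
  auto firstItem = std::partition_point(
      sorted.begin(), sorted.end(),
      [](uint64_t cell) { return (cell & kItemBit) == 0; });
  return size_t(firstItem - sorted.begin());
}
```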

Once the final PDB TPI stream ordering is known, we need to compute a
mapping from source type index to PDB type index. To avoid starting over
from scratch and looking up every type again by its ghash, we save the
insertion position of every hash table insertion during the first
insertion phase. Because the table does not support rehashing, the
insertion position is stable. Using the array of insertion positions
indexed by source type index, we can replace the source type indices in
the ghash table cells with the PDB type indices.

Once the table cells have been updated to contain PDB type indices, the
mapping for each type source can be computed in parallel. Simply iterate
the list of cell positions and replace them with the PDB type index,
since the insertion positions are no longer needed.
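
The two remapping steps above can be sketched as follows, under simplifying assumptions: final indices are zero-based, the table is modeled as a plain array of final indices keyed by insertion position, and `UniqueCell`, `TypeSource`, and `buildIndexMaps` are hypothetical names rather than the ones used in the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// A surviving (unique) cell: which source it came from and its index there.
struct UniqueCell {
  uint32_t source;
  uint32_t index;
};

struct TypeSource {
  // On entry: hash table insertion position for each source type index.
  // On exit: final destination index for each source type index.
  std::vector<uint32_t> indexMap;
};

void buildIndexMaps(const std::vector<UniqueCell> &sortedCells,
                    std::vector<TypeSource> &sources, size_t tableSize) {
  // Step 1: write each unique cell's final index at its insertion position.
  std::vector<uint32_t> finalIndexAt(tableSize, 0);
  for (uint32_t i = 0; i < sortedCells.size(); ++i) {
    const UniqueCell &c = sortedCells[i];
    finalIndexAt[sources[c.source].indexMap[c.index]] = i;
  }
  // Step 2: each source only reads the table and writes its own map, so this
  // loop could run once per source in parallel.
  for (TypeSource &src : sources)
    for (uint32_t &slot : src.indexMap)
      slot = finalIndexAt[slot];
}
```

Duplicate types in different sources share an insertion position, so they resolve to the same final index without any further hashing.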

Once we have a source to destination type index mapping for every type
source, there are no more data dependencies. We know which type records
are "unique" (not duplicates), and what their final type indices will
be. We can do the remapping in parallel, and accumulate type sizes and
type hashes in parallel by type source.

Lastly, TPI stream layout must be done serially. Accumulate all the type
records, sizes, and hashes, and add them to the PDB.

Differential Revision: https://reviews.llvm.org/D87805
The file was modified lld/COFF/PDB.h
The file was modified lld/test/COFF/pdb-type-server-simple.test
The file was modified lld/COFF/PDB.cpp
The file was modified lld/include/lld/Common/ErrorHandler.h
The file was modified lld/test/COFF/s_udt.s
The file was modified llvm/lib/DebugInfo/PDB/Native/TpiStreamBuilder.cpp
The file was modified lld/COFF/DebugTypes.h
The file was modified llvm/include/llvm/DebugInfo/PDB/Native/TpiStreamBuilder.h
The file was modified lld/COFF/TypeMerger.h
The file was modified lld/test/COFF/pdb-procid-remapping.test
The file was modified lld/test/COFF/pdb-type-server-missing.yaml
The file was modified lld/test/COFF/precomp-link.test
The file was modified lld/COFF/DebugTypes.cpp
The file was modified lld/COFF/Driver.cpp
The file was modified llvm/lib/DebugInfo/CodeView/RecordName.cpp
The file was modified lld/test/COFF/pdb-global-hashes.test
The file was modified llvm/include/llvm/DebugInfo/CodeView/TypeHashing.h
The file was modified llvm/include/llvm/DebugInfo/CodeView/TypeIndex.h
Commit 37b2e2b04cf434b368b1edf29609be21952316f9 by pklausler
[flang] Semantic analysis for FINAL subroutines

Represent FINAL subroutines in the symbol table entries of
derived types.  Enforce constraints.  Update tests that have
inadvertent violations or modified messages.  Added a test.

The specific procedure distinguishability checking code for generics
was used to enforce distinguishability of FINAL procedures.
(Also cleaned up some confusion and redundancy noticed in the
type compatibility infrastructure while digging into that area.)

Differential revision: https://reviews.llvm.org/D88613
The file was modified flang/lib/Semantics/mod-file.h
The file was modified flang/include/flang/Evaluate/type.h
The file was modified flang/lib/Evaluate/tools.cpp
The file was modified flang/test/Semantics/resolve55.f90
The file was modified flang/lib/Semantics/mod-file.cpp
The file was added flang/test/Semantics/final01.f90
The file was modified flang/include/flang/Semantics/symbol.h
The file was modified flang/lib/Semantics/pointer-assignment.cpp
The file was modified flang/lib/Semantics/resolve-names.cpp
The file was modified flang/test/Semantics/modfile10.f90
The file was modified flang/lib/Semantics/symbol.cpp
The file was modified flang/lib/Evaluate/characteristics.cpp
The file was modified flang/lib/Semantics/check-call.cpp
The file was modified flang/lib/Semantics/tools.cpp
The file was modified flang/test/Semantics/call03.f90
The file was modified flang/include/flang/Evaluate/characteristics.h
The file was modified flang/lib/Evaluate/type.cpp
The file was modified flang/test/Semantics/resolve32.f90
The file was modified flang/include/flang/Semantics/tools.h
The file was modified flang/lib/Semantics/check-declarations.cpp
The file was modified flang/test/Semantics/call05.f90
Commit 55cff5b288650f0ce814c3c85041852bbed554b8 by protze
[OpenMP][libomptarget] make omp_get_initial_device 5.1 compliant

OpenMP 5.1 defines omp_get_initial_device to return the same value as omp_get_num_devices.
Since this change is also 5.0 compliant, no versioning is needed.

Differential Revision: https://reviews.llvm.org/D88149
The file was modified openmp/libomptarget/src/api.cpp
The file was modified openmp/libomptarget/include/omptarget.h
The file was modified openmp/runtime/src/kmp_ftn_entry.h
The file was modified openmp/runtime/src/kmp.h
Commit 6104b30446aa976006fd322af4a57a8f0124f94f by protze
[OpenMP][OMPT] Update OMPT tests for newly added GOMP interface patches

This patch updates the expected results for the GOMP interface patches: D87267, D87269, and D87271.
The taskwait-depend test is changed to really use taskwait-depend and copied to a task_if0-depend test.

To pass the tests, the handling of the return address was fixed.

Differential Revision: https://reviews.llvm.org/D87680
The file was modified openmp/runtime/test/ompt/tasks/dependences_mutexinoutset.c
The file was modified openmp/runtime/src/kmp_gsupport.cpp
The file was modified openmp/runtime/src/kmp_taskdeps.cpp
The file was modified openmp/runtime/test/ompt/tasks/taskwait-depend.c
The file was modified openmp/runtime/src/ompt-specific.h
The file was added openmp/runtime/test/ompt/tasks/task_if0-depend.c
Commit 21cf2e6c263d7a50654653bce4e83ab463fae580 by Akira
Handle unknown OSes in DarwinTargetInfo::getExnObjectAlignment

rdar://problem/69727650
The file was modified clang/lib/Basic/Targets/OSTargets.h
The file was modified clang/test/SemaCXX/warn-overaligned-type-thrown.cpp
Commit 66d2e3f495948412602db4507359b4612639e523 by saghir
[PowerPC] Add outer product instructions for MMA

This patch adds outer product instructions for MMA, including related infrastructure, and their tests.

Depends on D84968.

Reviewed By: #powerpc, bsaleil, amyk

Differential Revision: https://reviews.llvm.org/D88043
The file was modified llvm/test/MC/PowerPC/ppc64-encoding-ISA31.s
The file was modified llvm/lib/Target/PowerPC/AsmParser/PPCAsmParser.cpp
The file was modified llvm/lib/Target/PowerPC/MCTargetDesc/PPCMCCodeEmitter.cpp
The file was modified llvm/lib/Target/PowerPC/MCTargetDesc/PPCMCCodeEmitter.h
The file was modified llvm/lib/Target/PowerPC/PPCInstrPrefix.td
The file was modified llvm/test/MC/Disassembler/PowerPC/ppc64-encoding-ISA31.txt
The file was modified llvm/lib/Target/PowerPC/Disassembler/PPCDisassembler.cpp
Commit b23916504a1a9f29c7519ed83813774eecce1789 by craig.topper
Patch IEEEFloat::isSignificandAllZeros and IEEEFloat::isSignificandAllOnes (bug 34579)

Patch IEEEFloat::isSignificandAllZeros and IEEEFloat::isSignificandAllOnes to behave correctly in the case that the size of the significand is a multiple of the width of the integerParts making up the significand.

The patch to IEEEFloat::isSignificandAllOnes fixes bug 34579, and the patch to IEEEFloat::isSignificandAllZeros fixes the unit test "APFloatTest.x87Next" I added here. I have included both in this diff since the changes are very similar.
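
The kind of edge case involved can be illustrated with a generic multi-word bit check. This is not the APFloat code: `lowBitsAllOnes` is a hypothetical helper, and 64-bit words are assumed. The point is that when the bit count is an exact multiple of the word width there is no partial top word, and that case needs its own branch because shifting a 64-bit value by 64 is undefined.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Returns true if the low 'bits' bits of the little-endian word array are all
// ones. Assumes parts holds at least ceil(bits / 64) words.
bool lowBitsAllOnes(const std::vector<uint64_t> &parts, unsigned bits) {
  const unsigned fullParts = bits / 64;
  const unsigned rem = bits % 64;
  for (unsigned i = 0; i < fullParts; ++i)
    if (parts[i] != ~uint64_t(0))
      return false;
  if (rem == 0)
    return true; // No partial word; a shift by 64 below would be undefined.
  const uint64_t mask = (uint64_t(1) << rem) - 1;
  return (parts[fullParts] & mask) == mask;
}
```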

Patch by Andrew Briand
The file was modified llvm/lib/Support/APFloat.cpp
The file was modified llvm/unittests/ADT/APFloatTest.cpp
Commit 23419bfd1c8f26617bda47e6d4732dcbfe0c09a3 by protze
[OpenMP][libarcher] Allow all possible argument separators in TSAN_OPTIONS

Currently, the parser used to tokenize the TSAN_OPTIONS in libomp uses
only spaces as separators, even though TSAN in compiler-rt supports
other separators like ':' or ','.
CTest uses ':' to separate sanitizer options by default.
The documentation for other sanitizers mentions ':' as separator,
but TSAN only lists spaces, which is probably where this mismatch originated.
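
As a rough sketch of what accepting all three separators means, the tokenizer below splits an option string on spaces, ':' and ','. It is an illustration only, not the parser in ompt-tsan.cpp: `splitOptions` is a hypothetical name, the separator set is taken from the description above, and quoted values are not handled.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Split an options string into tokens on any of the assumed separators.
std::vector<std::string> splitOptions(const std::string &options) {
  const char *separators = " :,";
  std::vector<std::string> tokens;
  std::string::size_type start = options.find_first_not_of(separators);
  while (start != std::string::npos) {
    std::string::size_type end = options.find_first_of(separators, start);
    tokens.push_back(options.substr(start, end - start));
    start = options.find_first_not_of(separators, end);
  }
  return tokens;
}
```

With this split, `"a=1:b=2,c=3 d=4"` yields four tokens, whereas a space-only tokenizer would see `a=1:b=2,c=3` as a single option.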

Patch provided by upsj

Differential Revision: https://reviews.llvm.org/D87144
The file was added openmp/tools/archer/tests/parallel/parallel-nosuppression.c
The file was modified openmp/tools/archer/tests/parallel/parallel-simple.c
The file was modified openmp/tools/archer/tests/lit.cfg
The file was modified openmp/tools/archer/ompt-tsan.cpp
Commit e4f50e587f077c246b7f29db0b7daddf583e2b64 by ranjeet.singh
[ARM] Add missing target for Arm neon test case.

This is a follow-up to https://reviews.llvm.org/D61717, where Richard
described the issue with compiling arm_neon.h under
-flax-vector-conversions=none. It looks like the example reproducer does
actually work, but what was missing was a test entry for that target.

Differential Revision: https://reviews.llvm.org/D88546
The file was modified clang/test/Headers/arm-neon-header.c
Commit bc43ddf42fff5a43f23354e25a32aca19541fec5 by Jessica Paquette
[AArch64][GlobalISel] NFC: Refactor G_FCMP selection code

Refactor this so it's similar to the existing integer comparison code.

Also add some missing 64-bit testcases to select-fcmp.mir.

Refactoring to prep for improving selection for G_FCMP-related conditional
branches etc.

Differential Revision: https://reviews.llvm.org/D88614
The file was modified llvm/test/CodeGen/AArch64/GlobalISel/select-fcmp.mir
The file was modified llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp
Commit d689570d7dcb16ee241676e22324dc456837eb23 by Jonas Devlieghere
[lldb] Make TestGuiBasicDebug more lenient

Matt's change to the register allocator in 89baeaef2fa9 changed where we
end up after the `finish`. Before, we'd end up on line 4.

* thread #1, queue = 'com.apple.main-thread', stop reason = step out
Return value: (int) $0 = 1
    frame #0: 0x0000000100003f7d a.out`main(argc=1, argv=0x00007ffeefbff630) at main.c:4:3
   1    extern int func();
   2
   3    int main(int argc, char **argv) {
-> 4      func(); // Break here
   5      func(); // Second
   6      return 0;
   7    }

Now, we end up on line 5.

* thread #1, queue = 'com.apple.main-thread', stop reason = step out
Return value: (int) $0 = 1

    frame #0: 0x0000000100003f8d a.out`main(argc=1, argv=0x00007ffeefbff630) at main.c:5:3
   2
   3    int main(int argc, char **argv) {
   4      func(); // Break here
-> 5      func(); // Second
   6      return 0;
   7    }

Given that this is not expected to be stable, I've made the test a
bit more lenient to accept both scenarios.
The file was modified lldb/test/API/commands/gui/basicdebug/TestGuiBasicDebug.py
Commit e24f0ac7a389fcb5c2f5295e717d9f7d3fcd4cea by pklausler
[flang] Allow record advancement in external formatted sequential READ

The '/' control edit descriptor causes a runtime crash for an
external formatted sequential READ because the AdvanceRecord()
member function for external units implemented only the tasks
to finish reading the current record.  Split those out into
a new FinishReadingRecord() member function, call that instead
from EndIoStatement(), and change AdvanceRecord() to both
finish reading the current record and to begin reading the next
one.

Differential revision: https://reviews.llvm.org/D88607
The file was modified flang/runtime/unit.h
The file was modified flang/runtime/io-stmt.h
The file was modified flang/runtime/io-stmt.cpp
The file was modified flang/runtime/unit.cpp
Commit 4ab45cc2260d87f18e1b05517d5d366b2e754b72 by Amara Emerson
[AArch64][GlobalISel] Add some more legal types for G_PHI, G_IMPLICIT_DEF, G_FREEZE.

Also use this opportunity to start cleaning up the mess of vector type lists we
have in the LegalizerInfo. Unfortunately, since the legalizer rule builders require
std::initializer_list objects as parameters, we can't programmatically generate the
type lists.
The file was modified llvm/test/CodeGen/AArch64/GlobalISel/legalize-freeze.mir
The file was modified llvm/test/CodeGen/AArch64/GlobalISel/legalize-phi.mir
The file was modified llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
Commit 460dda071e091df3b5584f21954c9209e7334c50 by aeubanks
[WholeProgramDevirt][NewPM] Add NPM testing path to match legacy pass

The legacy pass's default constructor sets UseCommandLine = true and
goes down a separate testing route. Match that in the NPM pass.

This fixes all tests in llvm/test/Transforms/WholeProgramDevirt under NPM.

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D88588
The file was modified llvm/test/Transforms/WholeProgramDevirt/import.ll
The file was modified llvm/lib/Transforms/IPO/WholeProgramDevirt.cpp
The file was modified llvm/include/llvm/Transforms/IPO/WholeProgramDevirt.h
The file was modified llvm/lib/Passes/PassRegistry.def
Commit 93a1fc2e18b452216be70f534da42f7702adbe1d by Amara Emerson
Try to fix build. May have used a C++ feature too new/not supported on all platforms.
The file was modified llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
Commit 3c45a06f26edfb7e94003adf58cb8951ea9c2ce6 by sbc
[lld][WebAssembly] Allow exporting of mutable globals

In particular, allow explicit exporting of `__stack_pointer` but
exclude this from `--export-all` to avoid requiring the mutable
globals feature whenever `--export-all` is used.

This uncovered a bug in populateTargetFeatures regarding checking
if the mutable-globals feature is allowed.

See: https://github.com/WebAssembly/binaryen/issues/2934

Differential Revision: https://reviews.llvm.org/D88506
The file was added lld/test/wasm/mutable-global-exports.s
The file was modified lld/test/wasm/mutable-globals.s
The file was modified lld/wasm/Writer.cpp
The file was modified lld/docs/WebAssembly.rst
Commit d4e889f1f5723105dbab12b749503d2462eb1755 by stellaraccident
Remove `Ops` suffix from dialect library names

Dialects include more than just ops, so this suffix is outdated. Follows
discussion in
https://llvm.discourse.group/t/rfc-canonical-file-paths-to-dialects/621

Reviewed By: stellaraccident

Differential Revision: https://reviews.llvm.org/D88530
The file was modified mlir/lib/Dialect/StandardOps/CMakeLists.txt
The file was modified mlir/lib/ExecutionEngine/CMakeLists.txt
The file was modified mlir/lib/CAPI/Standard/CMakeLists.txt
The file was modified mlir/lib/Conversion/SCFToSPIRV/CMakeLists.txt
The file was modified mlir/lib/Conversion/LinalgToStandard/CMakeLists.txt
The file was modified mlir/lib/Conversion/SCFToGPU/CMakeLists.txt
The file was modified mlir/test/lib/Transforms/CMakeLists.txt
The file was modified mlir/lib/Conversion/StandardToSPIRV/CMakeLists.txt
The file was modified mlir/lib/Dialect/Vector/CMakeLists.txt
The file was modified mlir/docs/Tutorials/CreatingADialect.md
The file was modified mlir/lib/Conversion/LinalgToLLVM/CMakeLists.txt
The file was modified mlir/lib/Dialect/SCF/Transforms/CMakeLists.txt
The file was modified mlir/lib/Analysis/CMakeLists.txt
The file was modified mlir/lib/Dialect/Affine/Utils/CMakeLists.txt
The file was modified mlir/lib/Dialect/StandardOps/Transforms/CMakeLists.txt
The file was modified mlir/lib/Dialect/Linalg/EDSC/CMakeLists.txt
The file was modified mlir/lib/Dialect/Affine/IR/CMakeLists.txt
The file was modified mlir/lib/Dialect/Quant/CMakeLists.txt
The file was modified mlir/lib/Transforms/CMakeLists.txt
The file was modified mlir/lib/Dialect/Linalg/IR/CMakeLists.txt
The file was modified mlir/lib/Dialect/GPU/CMakeLists.txt
The file was modified mlir/lib/Dialect/Shape/IR/CMakeLists.txt
The file was modified mlir/test/lib/Dialect/Test/CMakeLists.txt
The file was modified mlir/lib/Dialect/Affine/EDSC/CMakeLists.txt
The file was modified mlir/lib/Dialect/Linalg/Analysis/CMakeLists.txt
The file was modified mlir/lib/Transforms/Utils/CMakeLists.txt
The file was modified mlir/lib/Dialect/Linalg/Transforms/CMakeLists.txt
The file was modified mlir/lib/Dialect/Affine/Transforms/CMakeLists.txt
The file was modified mlir/lib/Conversion/GPUToVulkan/CMakeLists.txt
The file was modified mlir/lib/Dialect/SCF/CMakeLists.txt
The file was modified mlir/test/EDSC/CMakeLists.txt
The file was modified flang/lib/Lower/CMakeLists.txt
The file was modified mlir/lib/Conversion/AffineToStandard/CMakeLists.txt
The file was modified mlir/lib/Dialect/Linalg/Utils/CMakeLists.txt
The file was modified mlir/lib/Conversion/GPUToSPIRV/CMakeLists.txt
The file was modified mlir/lib/Conversion/LinalgToSPIRV/CMakeLists.txt
Commit 4fb679d3b159f0a5e4ff87f4e7ecf44fbbf331b9 by pklausler
[flang] Fix Gw.d format output

The estimation of the decimal exponent needs to allow for all
'd' of the requested significant digits.

Also accept a plus sign on a "+kP" scaling factor in a format.

Differential revision: https://reviews.llvm.org/D88618
The file was modified flang/runtime/edit-output.cpp
The file was modified flang/runtime/format-implementation.h
Commit f0505534900bb1fcdee368136cd733aefd20ce39 by riddleriver
[mlir] Split Dialect::addOperations into two functions

The current implementation uses a fold expression to add all of the operations at once. This is really nice, but apparently the lifetime of each of the AbstractOperation instances is for the entire expression which may lead to a stack overflow for large numbers of operations. This splits the method in two to allow for the lifetime of the AbstractOperation to be properly scoped.
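
The lifetime effect described above can be reproduced with a small C++17 sketch. None of these names come from MLIR: `OpInfo` stands in for AbstractOperation, and `registerOp`, `addAllInOneExpression`, `addOne`, and `addAllScoped` are hypothetical. The sketch counts how many temporaries are alive at once.

```cpp
#include <cassert>

int liveNow = 0;    // Temporaries currently alive.
int maxLive = 0;    // High-water mark of liveNow.
int registered = 0; // Number of registerOp calls.

// Stand-in for a large per-operation description object.
struct OpInfo {
  int id;
  explicit OpInfo(int id) : id(id) {
    if (++liveNow > maxLive)
      maxLive = liveNow;
  }
  OpInfo(const OpInfo &) = delete;
  ~OpInfo() { --liveNow; }
};

void registerOp(const OpInfo &) { ++registered; }

// One fold expression: every temporary lives until the end of the whole
// expression, so all of them occupy the stack at the same time.
template <int... Ids> void addAllInOneExpression() {
  (registerOp(OpInfo(Ids)), ...);
}

// Split version: each temporary is created and destroyed inside its own call.
template <int Id> void addOne() { registerOp(OpInfo(Id)); }
template <int... Ids> void addAllScoped() { (addOne<Ids>(), ...); }
```

With three ids, the single-expression version peaks at three live objects and the split version peaks at one, which is why the split keeps stack usage bounded as the number of operations grows.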
The file was modified mlir/include/mlir/IR/Dialect.h
Commit 196c097bba8b0b3932f3fcdcd5310f78ebaa43a3 by Amara Emerson
[AArch64][GlobalISel] Clamp oversize FP arithmetic vectors.
The file was modified llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
The file was modified llvm/test/CodeGen/AArch64/GlobalISel/legalize-fp-arith.mir