Success

Changes

Summary

  1. [MSan] Tweak CopyOrigin
  2. [gn build] (manually) port 79f99ba65d96
  3. [mlir][Python] Add checking process before create an AffineMap from a permutation.
  4. [X86][AMX] Prohibit pointer cast on load.
  5. [Coroutine] Update promise object's final layout index
  6. [PDB] Defer relocating .debug$S until commit time and parallelize it
Commit 82655c151450e0103a3aa60725639da607f9220c by jianzhouzh
[MSan] Tweak CopyOrigin

There could be some misalignments when copying origins that are not aligned.

I believe unaligned memcpy is rare, so these cases do not matter much
in practice.

1) About the change at line 50

Let dst be (void*)5,
then d=5, beg=4
so we need to write 3 (4+4-5) bytes from 5 to 7.

2) About the change around line 77.

Let dst be (void*)5,
because of lines 50-55, the bytes from 5-7 were already written.
So the aligned copy is from 8.
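
The arithmetic in the two cases above can be sketched as follows. This is only an illustration of the address math, assuming a 4-byte origin granule; `AlignDown4`, `HeadSplit`, and `SplitUnalignedHead` are hypothetical names, not functions from msan_poisoning.cpp, and the sketch ignores the copy size.

```cpp
#include <cstdint>

// Hypothetical helper: round an address down to a 4-byte boundary
// (assumes origins are tracked per 4 bytes of application memory).
constexpr std::uintptr_t AlignDown4(std::uintptr_t p) {
  return p & ~std::uintptr_t{3};
}

// Hypothetical result type for this sketch.
struct HeadSplit {
  std::uintptr_t beg;            // aligned address at or below dst
  std::uintptr_t head_bytes;     // unaligned bytes before the next boundary
  std::uintptr_t aligned_start;  // where the aligned copy begins
};

// Split a destination address into its unaligned head and aligned start.
HeadSplit SplitUnalignedHead(std::uintptr_t d) {
  std::uintptr_t beg = AlignDown4(d);
  // If dst is unaligned, the head covers d .. beg + 3, i.e. beg + 4 - d bytes.
  std::uintptr_t head = (beg < d) ? beg + 4 - d : 0;
  return {beg, head, d + head};
}
```

With dst = 5 this gives beg = 4, a 3-byte head covering addresses 5 to 7, and an aligned copy starting at 8, matching the two cases described above.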

Reviewed-by: eugenis
Differential Revision: https://reviews.llvm.org/D94552
The file was modified: compiler-rt/lib/msan/msan_poisoning.cpp
Commit 25b3921f2fcd8fb3241c2f79e488f25a6374b99f by thakis
[gn build] (manually) port 79f99ba65d96
The file was modified: llvm/utils/gn/secondary/libcxx/include/BUILD.gn
Commit c0f3ea8a08ca9a9ec473f6e9072ccf30dad5def8 by zhanghb97
[mlir][Python] Add checking process before create an AffineMap from a permutation.

An invalid permutation will trigger a C++ assertion when attempting to create an AffineMap from the permutation.
This patch adds an `isPermutation` function to check the given permutation before creating the AffineMap.
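
A check of this kind might look roughly like the sketch below. It is not the code from the patch: `isPermutationSketch` is a hypothetical name, and the sketch assumes a permutation of size n must contain each value in [0, n) exactly once.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the validity check described above.
// Returns true if `perm` contains every value in [0, perm.size())
// exactly once.
bool isPermutationSketch(const std::vector<uint32_t>& perm) {
  std::vector<bool> seen(perm.size(), false);
  for (uint32_t v : perm) {
    // Reject out-of-range values and repeated values.
    if (v >= perm.size() || seen[v])
      return false;
    seen[v] = true;
  }
  return true;
}
```

Running such a check before building the map lets the binding report an error to the Python caller instead of hitting the C++ assertion.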

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D94492
The file was modified: mlir/lib/Bindings/Python/IRModules.cpp
The file was modified: mlir/test/Bindings/Python/ir_affine_map.py
Commit 055644cc459eb204613ac788b73c51d5dab2fcbb by yuanke.luo
[X86][AMX] Prohibit pointer cast on load.

The load/store instruction will be transformed to AMX intrinsics in the
AMX type lowering pass. Prohibiting the pointer cast makes that pass
happy.

Differential Revision: https://reviews.llvm.org/D94372
The file was modified: llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
The file was added: llvm/test/Transforms/InstCombine/X86/x86-amx-load-store.ll
Commit 5c7dcd7aead7b33ba065b98ab3573278feb42228 by Yuanfang Chen
[Coroutine] Update promise object's final layout index

The promise is a header field, but it is not guaranteed to be the third
field of the frame because of `performOptimizedStructLayout`.

Reviewed By: lxfind

Differential Revision: https://reviews.llvm.org/D94137
The file was modified: llvm/lib/Transforms/Coroutines/CoroFrame.cpp
The file was added: llvm/test/Transforms/Coroutines/coro-spill-promise.ll
Commit 6529d7c5a45b1b9588e512013b02f891d71bc134 by rnk
[PDB] Defer relocating .debug$S until commit time and parallelize it

This is a pretty classic optimization. Instead of processing symbol
records and copying them to temporary storage, do a first pass to
measure how large the module symbol stream will be, and then copy the
data into place in the PDB file. This requires deferring relocation until
much later, which accounts for most of the complexity in this patch.
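
The measure-then-copy pattern can be sketched as below. This is a simplified illustration, not lld's implementation: `packRecordsTwoPass` is a hypothetical name, records are modeled as plain strings, and relocation is left out entirely.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical sketch: pack records into one buffer without staging
// them in temporary storage first.
std::vector<char> packRecordsTwoPass(const std::vector<std::string>& records) {
  // Pass 1: measure. Record where each record will land and the total size.
  std::vector<std::size_t> offsets;
  offsets.reserve(records.size());
  std::size_t total = 0;
  for (const std::string& r : records) {
    offsets.push_back(total);
    total += r.size();
  }

  // Allocate the destination once, at its final size.
  std::vector<char> out(total);

  // Pass 2: copy each record to its precomputed offset. Every iteration
  // writes a disjoint range, so this loop is a candidate for parallelism.
  for (std::size_t i = 0; i < records.size(); ++i) {
    if (!records[i].empty())
      std::memcpy(out.data() + offsets[i], records[i].data(),
                  records[i].size());
  }
  return out;
}
```

Because the offsets are fixed after the first pass, the copies in the second pass do not depend on each other, which is what makes splitting the work across threads possible.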

This patch avoids copying the contents of all live .debug$S sections
into heap memory, which is worth about 20% of private memory usage when
making PDBs. However, this is not an unmitigated performance win,
because it can be faster to read dense, temporary, heap data than it is
to iterate symbol records in object file backed memory a second time.

Results on release chrome.dll:
peak mem: 5164.89MB -> 4072.19MB (-1,092.7MB, -21.2%)
wall-j1:  0m30.844s -> 0m32.094s (slightly slower)
wall-j3:  0m20.968s -> 0m20.312s (slightly faster)
wall-j8:  0m19.062s -> 0m17.672s (meaningfully faster)

I gathered similar numbers for a debug, component build of content.dll
in Chrome, and the performance impact of this change was in the noise.
The memory usage reduction was visible and similar.

Because of the new parallelism in the PDB commit phase, more cores make
the new approach faster. I'm assuming that most C++ developer machines
these days are at least quad core, so I think this is a win.

Differential Revision: https://reviews.llvm.org/D94267
The file was modified: lld/COFF/Chunks.cpp
The file was modified: llvm/lib/DebugInfo/PDB/Native/DbiModuleDescriptorBuilder.cpp
The file was modified: lld/COFF/PDB.cpp
The file was modified: llvm/include/llvm/DebugInfo/PDB/Native/DbiModuleDescriptorBuilder.h
The file was modified: llvm/lib/DebugInfo/PDB/Native/DbiStreamBuilder.cpp
The file was modified: lld/COFF/Chunks.h