1. [X86] Don't match x87 register inline asm constraints unless the VT is floating point or its a clobber (details)
  2. [VectorCombine] limit load+insert transform to one-use (details)
  3. [AArch64][GlobalISel] Make <8 x s16> and <16 x s8> legal for shifts. (details)
  4. [AArch64][GlobalISel] Widen G_EXTRACT_VECTOR_ELT element types if < 8b. (details)
  5. [PDB] Split TypeServerSource and extend type index map lifetime (details)
  6. [SVE][WIP] Implement lowering for fixed length VSELECT to Scalable (details)
  7. [IRSim] Adding IR Instruction Mapper (details)
  8. [gn build] Port 7e4c6fb8546 (details)
  9. AArch64::ArchKind's underlying type is uint64_t (details)
  10. [Lsan] Use fp registers to search for pointers (details)
  11. Disable hoisting MI to hotter basic blocks when using pgo (details)
  12. [SCEV] Add test cases for max BTC with loop guard info. (details)
  13. [GVN] Add additional assume tests (NFC) (details)
  14. [GVN] Use that assume(!X) implies X==false (PR47496) (details)
  15. [LoopUnrollAndJam] Allow unroll and jam loops forced by user. (details)
  16. [InstCombine] Canonicalize SPF_ABS to abs intrinc (details)
Commit 3783d3bc7b3dd966ac3b9436b73f16f855d12ff2 by craig.topper
[X86] Don't match x87 register inline asm constraints unless the VT is floating point or its a clobber

The register class picked will be the RFP80 register class which has a f80 VT. The code in SelectionDAGBuilder that generates copies around inline assembly doesn't know how to handle an integer and floating point type of different bit widths.

The test case is derived from this which gcc accepts but clang crashes on. This patch just gives a more graceful error. I'm not sure if the single element struct case is special in gcc. Adding another field to the struct makes gcc reject it. If we want to support this correctly I think we need a change in the frontend to give us the true element type. Right now the frontend just realizes the constraint can take a memory argument so creates an integer type of the same size and bitcasts.

Differential Revision:
The file was addedllvm/test/CodeGen/X86/asm-reject-x87-int.ll
The file was modifiedllvm/lib/Target/X86/X86ISelLowering.cpp
Commit 48a23bccf3732e1480ad169bd4a08a68bb100bfa by spatel
[VectorCombine] limit load+insert transform to one-use

As discussed in:
...there are several potential fixes/follow-ups visible
in the test case, but this is the quickest and safest
fix of the perf regression.
The file was modifiedllvm/test/Transforms/VectorCombine/X86/load.ll
The file was modifiedllvm/lib/Transforms/Vectorize/VectorCombine.cpp
Commit bea7749d0364a8c694f236a97d58167a33efdb9e by Amara Emerson
[AArch64][GlobalISel] Make <8 x s16> and <16 x s8> legal for shifts.
The file was modifiedllvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
The file was modifiedllvm/test/CodeGen/AArch64/GlobalISel/legalize-shift.mir
Commit 7d5b10348371644c69041965b9864886e9961ddd by Amara Emerson
[AArch64][GlobalISel] Widen G_EXTRACT_VECTOR_ELT element types if < 8b.

In order to not unnecessarily promote the source vector to greater than our
native vector size of 128b, I've added some cascading rules to widen based on
the number of elements.
The file was modifiedllvm/test/CodeGen/AArch64/GlobalISel/regbank-extract-vector-elt.mir
The file was modifiedllvm/test/CodeGen/AArch64/GlobalISel/legalize-extract-vector-elt.mir
The file was modifiedllvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
Commit 1e5b7e91aa64c267e495cb4bd8351b1840694437 by rnk
[PDB] Split TypeServerSource and extend type index map lifetime

Extending the lifetime of these type index mappings does increase memory
usage (+2% in my case), but it decouples type merging from symbol
merging. This is a pre-requisite for two changes that I have in mind:
- parallel type merging: speeds up slow type merging
- defered symbol merging: avoid heap allocating (relocating) all symbols

This eliminates CVIndexMap and moves its data into TpiSource. The maps
are also split into a SmallVector and ArrayRef component, so that the
ipiMap can alias the tpiMap for /Z7 object files, and so that both maps
can simply alias the PDB type server maps for /Zi files.

Splitting TypeServerSource establishes that all input types to be merged
can be identified with two 32-bit indices:
- The index of the TpiSource object
- The type index of the record
This is useful, because this information can be stored in a single
64-bit atomic word to enable concurrent hashtable insertion.

One last change is that now all object files with debugChunks get a
TpiSource, even if they have no type info. This avoids some null checks
and special cases.

Differential Revision:
The file was modifiedlld/COFF/DebugTypes.cpp
The file was modifiedlld/COFF/DebugTypes.h
The file was modifiedlld/COFF/InputFiles.cpp
The file was modifiedlld/COFF/PDB.cpp
The file was modifiedlld/COFF/TypeMerger.h
Commit a35c7f30769b4bc3745796af58c932f303a014e1 by mcinally
[SVE][WIP] Implement lowering for fixed length VSELECT to Scalable

Map fixed length VSELECT to its Scalable equivalent.

Differential Revision:
The file was addedllvm/test/CodeGen/AArch64/sve-fixed-length-int-select.ll
The file was modifiedllvm/lib/Target/AArch64/AArch64ISelLowering.h
The file was addedllvm/test/CodeGen/AArch64/sve-fixed-length-fp-select.ll
The file was modifiedllvm/lib/Target/AArch64/AArch64ISelLowering.cpp
Commit 7e4c6fb854660318dc31ecb9842f6cfebb18c8e0 by andrew_litteken
[IRSim] Adding IR Instruction Mapper

This introduces the IRInstructionMapper, and the associated wrapper for
instructions, IRInstructionData, that maps IR level Instructions to
unsigned integers.

Mapping is done mainly by using the "isSameOperationAs" comparison
between two instructions.  If they return true, the opcode, result type,
and operand types of the instruction are used to hash the instruction
with an unsigned integer.  The mapper accepts instruction ranges, and
adds each resulting integer to a list, and each wrapped instruction to
a separate list.

At present, branches, phi nodes are not mapping and exception handling
is illegal.  Debug instructions are not considered.

The different mapping schemes are tested in

Recommit of: b04c1a9d3127730c05e8a22a0e931a12a39528df

Differential Revision:
The file was addedllvm/lib/Analysis/IRSimilarityIdentifier.cpp
The file was modifiedllvm/lib/Analysis/CMakeLists.txt
The file was addedllvm/unittests/Analysis/IRSimilarityIdentifierTest.cpp
The file was modifiedllvm/unittests/Analysis/CMakeLists.txt
The file was addedllvm/include/llvm/Analysis/IRSimilarityIdentifier.h
Commit 667762c64e0b2925112037c197709402b4f2221d by llvmgnsyncbot
[gn build] Port 7e4c6fb8546
The file was modifiedllvm/utils/gn/secondary/llvm/lib/Analysis/
The file was modifiedllvm/utils/gn/secondary/llvm/unittests/Analysis/
Commit c145a1ca2593e3b8b79687d5ba8c3230c41b5130 by jonathan_roelofs
AArch64::ArchKind's underlying type is uint64_t
The file was modifiedllvm/include/llvm/Support/AArch64TargetParser.h
The file was modifiedllvm/lib/Support/AArch64TargetParser.cpp
The file was modifiedclang/lib/Driver/ToolChains/Arch/AArch64.cpp
The file was modifiedllvm/unittests/Support/TargetParserTest.cpp
Commit 5813fca1076089c835de737834955a0fe7eb3898 by Vitaly Buka
[Lsan] Use fp registers to search for pointers

X86 can use xmm registers for pointers operations. e.g. for std::swap.
I don't know yet if it's possible on other platforms.

NT_X86_XSTATE includes all registers from NT_FPREGSET so
the latter used only if the former is not available. I am not sure how
reasonable to expect that but LLD has such fallback in

Reviewed By: morehouse

Differential Revision:
The file was modifiedcompiler-rt/lib/sanitizer_common/sanitizer_stoptheworld_linux_libcdep.cpp
The file was modifiedcompiler-rt/test/lsan/TestCases/use_registers.cpp
The file was addedcompiler-rt/test/lsan/TestCases/use_registers_extra.cpp
Commit a4bb71b1c0d9952208ad32bc4992cc211d43c5bb by wei.huang
Disable hoisting MI to hotter basic blocks when using pgo

This is a follow up patch for to
enable the feature when using pgo.

Differential Revision:
The file was modifiedllvm/test/CodeGen/AArch64/O3-pipeline.ll
The file was modifiedllvm/test/CodeGen/ARM/O3-pipeline.ll
The file was modifiedllvm/lib/CodeGen/MachineLICM.cpp
The file was modifiedllvm/test/CodeGen/X86/opt-pipeline.ll
Commit 51973a607dfa4681037aff43e295f3ea1fb0f3f4 by flo
[SCEV] Add test cases for max BTC with loop guard info.

This adds test cases for PR40961 and PR47247. They illustrate cases in
which the max backedge-taken count can be improved by information from
the loop guards.
The file was addedllvm/test/Analysis/ScalarEvolution/max-backedge-taken-count-guard-info.ll
Commit 59855b9d3bacc4321e3dd22ccf09bd9d177fdb6f by nikita.ppv
[GVN] Add additional assume tests (NFC)

The other assume tests seem to be dealing with equalities in
particular. Test implication for the condition itself, especially
the negated case from PR47496.
The file was addedllvm/test/Transforms/GVN/assume.ll
Commit 91ce8e121b7f24ef68fad0ab07f6ab7e1ee06855 by nikita.ppv
[GVN] Use that assume(!X) implies X==false (PR47496)

We already use that assume(X) implies X==true, do the same for
assume(!X) implying X==false. This fixes PR47496.
The file was modifiedllvm/lib/Transforms/Scalar/GVN.cpp
The file was modifiedllvm/test/Transforms/GVN/assume.ll
Commit 1cee33e9dbb6c30ff1dd30453a263696bfccfd8a by whitneyt
[LoopUnrollAndJam] Allow unroll and jam loops forced by user.

Summary: Allow unroll and jam loops forced by user.
LoopUnrollAndJamPass is still disabled by default in the NPM pipeline,
and can be controlled by -enable-npm-unroll-and-jam.

Reviewed By: Meinersbur, dmgreen

Differential Revision:
The file was modifiedllvm/lib/Transforms/Scalar/LoopUnrollAndJamPass.cpp
The file was modifiedllvm/test/Transforms/LoopUnrollAndJam/pragma-explicit.ll
Commit 05d4c4ebc2fb006b8a2bd05b24c6aba10dd2eef8 by nikita.ppv
[InstCombine] Canonicalize SPF_ABS to abs intrinc

Enable canonicalization of SPF_ABS and SPF_NABS to the abs intrinsic.

To be conservative, the one-use check on the comparison is retained,
this may be relaxed if all goes well.

It's pretty likely that this will uncover places that missing
handling for the abs() intrinsic. Please report any seen performance

Differential Revision:
The file was modifiedllvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
The file was modifiedllvm/test/Transforms/InstCombine/abs_abs.ll
The file was modifiedllvm/test/Transforms/InstCombine/icmp.ll
The file was modifiedllvm/test/Transforms/InstCombine/select_meta.ll
The file was modifiedllvm/test/Transforms/PhaseOrdering/X86/vector-reductions.ll
The file was modifiedclang/test/CodeGen/builtins-wasm.c
The file was modifiedllvm/test/Transforms/InstCombine/cttz-abs.ll
The file was modifiedllvm/test/Transforms/InstCombine/abs-1.ll
The file was modifiedllvm/test/Transforms/InstCombine/call-callconv.ll
The file was modifiedllvm/test/Transforms/InstCombine/max-of-nots.ll
The file was modifiedllvm/test/Transforms/InstCombine/sub-of-negatible.ll
The file was modifiedllvm/test/Transforms/PhaseOrdering/min-max-abs-cse.ll