  1. [ARM] Fold predicate_cast(load) into vldr p0
  2. [X86] Make lowerShuffleAsLanePermuteAndPermute use sublanes on AVX2
  3. CallingConvLower.h - remove unnecessary MachineFunction.h include. NFC.
  4. [modules] Correctly parse LateParsedTemplates in case of dependent modules.
  5. [lldb][NFC] Rewrite CPP11EnumTypes test to make it faster
  6. Fix typos in doc LangRef.rst
  7. [Test] Range fix in test
Commit 294c0cc3ebad969819be4b5b8d091418b0704595 by
[ARM] Fold predicate_cast(load) into vldr p0

This adds a simple tablegen pattern for folding predicate_cast(load)
into vldr p0, provided the alignment and offset are correct.

Differential Revision:
The file was modified llvm/lib/Target/ARM/
The file was modified llvm/test/CodeGen/Thumb2/mve-pred-loadstore.ll
Commit 740625fecd1a4cd8e5521bd1c98627eca6f7565d by llvm-dev
[X86] Make lowerShuffleAsLanePermuteAndPermute use sublanes on AVX2

Extends lowerShuffleAsLanePermuteAndPermute to search for opportunities to use vpermq (64-bit cross-lane shuffle) and vpermd (32-bit cross-lane shuffle) to get elements into the correct lane, in addition to the 128-bit full-lane permutes it previously searched for.

This is especially helpful in cross-lane byte shuffles, where the alternative tends to be "vpshufb both lanes separately and blend them with a vpblendvb", which is very expensive, especially on Haswell where vpblendvb uses the same execution port as all the shuffles.
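
The decomposition described above can be illustrated with a scalar model. This is only a sketch and not the code in X86ISelLowering.cpp: the types and the functions shuffleBytes, permuteQwords, shuffleBytesInLane and interleaveViaSublanePermute are hypothetical names invented here, and the vpshufb model ignores the instruction's zeroing bit. It shows one cross-lane byte shuffle (interleaving the low and high 16 bytes) that a 64-bit sublane permute followed by a single in-lane byte shuffle reproduces exactly.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Vec32 = std::array<std::uint8_t, 32>; // one 256-bit vector of bytes
using Mask32 = std::array<int, 32>;

// Reference semantics: an arbitrary (possibly cross-lane) byte shuffle.
Vec32 shuffleBytes(const Vec32 &In, const Mask32 &Mask) {
  Vec32 Out{};
  for (int I = 0; I < 32; ++I)
    Out[I] = In[Mask[I]];
  return Out;
}

// Model of a vpermq-style permute: output 64-bit chunk C is input chunk Sel[C].
Vec32 permuteQwords(const Vec32 &In, const std::array<int, 4> &Sel) {
  Vec32 Out{};
  for (int C = 0; C < 4; ++C)
    for (int B = 0; B < 8; ++B)
      Out[C * 8 + B] = In[Sel[C] * 8 + B];
  return Out;
}

// Model of a vpshufb-style shuffle: every output byte picks a byte from its
// own 128-bit lane (the zeroing bit of the real instruction is not modelled).
Vec32 shuffleBytesInLane(const Vec32 &In, const Mask32 &Idx) {
  Vec32 Out{};
  for (int I = 0; I < 32; ++I)
    Out[I] = In[(I / 16) * 16 + (Idx[I] & 15)];
  return Out;
}

// Interleave the low and high 16 bytes. Each output lane needs two 64-bit
// chunks that start out in different 128-bit lanes, so a whole-lane permute
// alone cannot prepare the input for a single in-lane byte shuffle.
bool interleaveViaSublanePermute() {
  Vec32 In{};
  Mask32 CrossLane{}, InLane{};
  for (int I = 0; I < 32; ++I) {
    In[I] = static_cast<std::uint8_t>(I);
    CrossLane[I] = (I & 1) * 16 + I / 2;           // direct cross-lane mask
    InLane[I] = ((I % 16) & 1) * 8 + (I % 16) / 2; // mask after the permute
  }
  const std::array<int, 4> Sel = {0, 2, 1, 3}; // move chunks 0,2 and 1,3 together
  Vec32 Expected = shuffleBytes(In, CrossLane);
  Vec32 Actual = shuffleBytesInLane(permuteQwords(In, Sel), InLane);
  return Expected == Actual;
}
```

Calling interleaveViaSublanePermute() compares the two-step result against the direct cross-lane shuffle and returns whether they agree.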

Addresses PR47262

Patch By: @TellowKrinkle (TellowKrinkle)

Differential Revision:
The file was modified llvm/test/CodeGen/X86/vector-shuffle-combining-avx2.ll
The file was modified llvm/test/CodeGen/X86/vector-shuffle-combining.ll
The file was modified llvm/lib/Target/X86/X86ISelLowering.cpp
The file was modified llvm/test/CodeGen/X86/oddshuffles.ll
The file was modified llvm/test/CodeGen/X86/vector-shuffle-256-v32.ll
The file was modified llvm/test/CodeGen/X86/vector-shuffle-256-v16.ll
The file was modified llvm/test/CodeGen/X86/vector-shuffle-512-v32.ll
Commit 7582c5c023a8d6bff224e80dc5ded916122d8c99 by llvm-dev
CallingConvLower.h - remove unnecessary MachineFunction.h include. NFC.

Reduce to forward declaration, add the Register.h include that we still needed, move CCState::ensureMaxAlignment into CallingConvLower.cpp as it was the only function that needed the full definition of MachineFunction.

Fix a few implicit dependencies further down.
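
The general technique (forward-declare in the header, define out of line where the complete type is visible) can be sketched in a single file as follows. The class names MachineFunctionSketch and CCStateSketch and the body of ensureMaxAlignment are hypothetical stand-ins written for this illustration; they are not the LLVM definitions.

```cpp
#include <algorithm>
#include <cassert>

// --- "Header" part: a forward declaration is enough here. ---
class MachineFunctionSketch; // hypothetical stand-in, incomplete at this point

class CCStateSketch {
  MachineFunctionSketch &MF; // references to incomplete types are allowed

public:
  explicit CCStateSketch(MachineFunctionSketch &MF) : MF(MF) {}

  // Declared only: the body calls a member of MachineFunctionSketch, which
  // needs the complete type, so it is defined in the "source" part below.
  void ensureMaxAlignment(unsigned Align);
};

// --- "Source" part: the full definition is visible from here on. ---
class MachineFunctionSketch {
  unsigned MaxAlign = 1;

public:
  void ensureAlignment(unsigned A) { MaxAlign = std::max(MaxAlign, A); }
  unsigned getMaxAlign() const { return MaxAlign; }
};

// Assumed behaviour for the sketch: forward the request to the function.
void CCStateSketch::ensureMaxAlignment(unsigned Align) {
  MF.ensureAlignment(Align);
}
```

With this split, files that include only the "header" part no longer pull in the full class definition, which is why code that silently relied on the transitive include has to add its own.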
The file was modified llvm/lib/CodeGen/CallingConvLower.cpp
The file was modified llvm/lib/Target/Mips/MipsCallLowering.h
The file was modified llvm/include/llvm/CodeGen/GlobalISel/CallLowering.h
The file was modified llvm/include/llvm/CodeGen/CallingConvLower.h
Commit 2c9dbcda4f71497d4a58020bb093af438fb6e967 by v.g.vassilev
[modules] Correctly parse LateParsedTemplates in case of dependent modules.

While parsing LateParsedTemplates, Clang assumes that the Global DeclID matches
the Local DeclID of a Decl. This is not the case when we have multiple
dependent modules, each having its own LateParsedTemplate section. In such a
case, a Local/Global DeclID confusion occurs which leads to improper casting of

This commit creates a vector that maps the LateParsedTemplate section of each
module to its module file, thereby resolving the Global/Local DeclID
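
The idea of keeping each section paired with the module it came from can be sketched as below. Everything here is an assumption made for illustration: ModuleFileSketch, BaseDeclID, toGlobalID and resolveSections are hypothetical names, and translating a local ID by adding a per-module base is a simplification, not Clang's actual ASTReader logic.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for a module file; real module files carry far more.
struct ModuleFileSketch {
  std::uint32_t BaseDeclID; // assumed: offset of this module's IDs globally
};

using DeclIDList = std::vector<std::uint32_t>;
// One entry per module: the module plus the local IDs from its own section.
using SectionList =
    std::vector<std::pair<const ModuleFileSketch *, DeclIDList>>;

// Assumed translation rule for the sketch: global = module base + local.
std::uint32_t toGlobalID(const ModuleFileSketch &M, std::uint32_t LocalID) {
  return M.BaseDeclID + LocalID;
}

// Resolve every local ID against the module that owns its section, so equal
// local IDs from different modules do not collide.
DeclIDList resolveSections(const SectionList &Sections) {
  DeclIDList Global;
  for (const auto &Section : Sections)
    for (std::uint32_t LocalID : Section.second)
      Global.push_back(toGlobalID(*Section.first, LocalID));
  return Global;
}
```

Without the pairing, local ID 1 from two different modules would be indistinguishable once the sections are merged into one flat list.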

Reviewed By: rsmith

Differential Revision:
The file was modified clang/include/clang/Serialization/ASTReader.h
The file was modified clang/lib/Serialization/ASTReader.cpp
Commit 101f37a1b330e3f0ae57762db47bba28f72cf50d by Raphael Isemann
[lldb][NFC] Rewrite CPP11EnumTypes test to make it faster

TestCPP11EnumTypes is one of the most expensive tests on my system and takes
around 35 seconds to run. A relatively large amount of that time is actually
doing CPU intensive work it seems (and not waiting on timeouts like other
slow tests).

The main issue is that this test repeatedly compiles the same source files
with different compiler defines. The test is also including standard library
headers, so it will also build all system modules with the gmodules debug
info variant. This leads to the problem that this test ends up compiling all
system Clang modules 8 times (once for each subtest with a unique define). As
the system modules are quite large, this test spends most of its runtime just
recompiling all system modules on macOS.

There is also the small issue that this test starts and stops the test
process a few hundred times.

This rewrites the test to instead just use a macro to instantiate all the
enum types in a single source and uses global variables to test the values
(which means there is no more need to continue/stop or even start a process).
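
A minimal sketch of that approach, assuming a hypothetical macro named DEFINE_ENUM_CASE and an illustrative list of underlying types (neither is taken from the actual main.cpp):

```cpp
#include <cassert>

// Hypothetical macro: defines one scoped enum with the given underlying type
// plus a global variable holding one of its values. A debugger can inspect
// the globals without running or stepping the program.
#define DEFINE_ENUM_CASE(Suffix, Underlying)                                   \
  enum class Enum_##Suffix : Underlying { Case1 = 1, Case2 = 2 };              \
  Enum_##Suffix var_##Suffix = Enum_##Suffix::Case2;

// All variants live in one translation unit, so the source (and any modules
// it pulls in) is compiled once instead of once per compiler define.
DEFINE_ENUM_CASE(uc, unsigned char)
DEFINE_ENUM_CASE(c, signed char)
DEFINE_ENUM_CASE(us, unsigned short)
DEFINE_ENUM_CASE(s, signed short)
DEFINE_ENUM_CASE(ui, unsigned int)
DEFINE_ENUM_CASE(i, signed int)
DEFINE_ENUM_CASE(ul, unsigned long)
DEFINE_ENUM_CASE(l, signed long)

// The underlying type really is the one requested.
static_assert(sizeof(Enum_uc) == sizeof(unsigned char), "underlying type");
static_assert(sizeof(Enum_l) == sizeof(long), "underlying type");
```

Each invocation expands to a distinct enum and variable, so a test can check every variant by name (var_uc, var_c, and so on) from one binary.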

I kept running all the debug info variants (even though it doesn't seem really
relevant) to keep this as NFC as possible.

This reduced the test runtime to around 1.5 seconds on my system (or in relative
numbers, the runtime of this test decreases by 95%).
The file was modified lldb/test/API/lang/cpp/enum_types/main.cpp
The file was modified lldb/test/API/lang/cpp/enum_types/
Commit 691d436685fa2394b088a9e4726c075027ac9c51 by Vitaly Buka
Fix typos in doc LangRef.rst

Reviewed By: vitalybuka

Differential Revision:
The file was modified llvm/docs/LangRef.rst
Commit 8784e9016d3d586dca90d6dd24fe663ce2e096ae by mkazantsev
[Test] Range fix in test

test02_neg is not testing what it claims to test because its starting
value -1 lies outside of the specified range.
The file was modified llvm/test/Transforms/IndVarSimplify/monotonic_checks.ll