1. [X86][SSE] lowerShuffleAsDecomposedShuffleBlend - support decomposed unpacks for some vXi8/vXi16 cases (details)
  2. [InstCombine] Fix incorrect SimplifyWithOpReplaced transform (PR47322) (details)
  3. [ARM] Recognize "double extend" reduction patterns (details)
  4. [InstCombine][X86] getNegativeIsTrueBoolVec - use ConstantExpr evaluators. NFCI. (details)
  5. [Intrinsics] define semantics for experimental fmax/fmin vector reductions (details)
  6. [ARM] Fixup single source mla reductions. (details)
  7. [InstCombine][X86] Add tests for masked load/stores with comparisons. (details)
  8. Reland [AssumeBundles] Use operand bundles to encode alignment assumptions (details)
  9. [MachineScheduler] Fix operand scheduling for pre/post-increment loads (details)
  10. [Clang] Add option to allow marking pass-by-value args as noalias. (details)
Commit 35dc91aee2013ce1a57dfee965fa5fdee1987ee0 by llvm-dev
[X86][SSE] lowerShuffleAsDecomposedShuffleBlend - support decomposed unpacks for some vXi8/vXi16 cases

Follow up to D86429 to handle the remaining regressions.

This patch generalizes lowerShuffleAsDecomposedShuffleBlend to lowerShuffleAsDecomposedShuffleMerge, and attempts to use an UNPCKL shuffle mask instead of a blend for the cases where the inputs are coming from alternating vXi8/vXi16 sources. Technically they don't have to be alternating (just as long as they can fit into a lower lane half for the unpack) but I didn't find as many general cases and it needed a lot more of the function to be altered.

For vXi32/vXi64 cases this could still be beneficial but in most cases the existing permute+blend approach was better.

Differential Revision:
The file was modifiedllvm/lib/Target/X86/X86ISelLowering.cpp
The file was modifiedllvm/test/CodeGen/X86/vector-shuffle-128-v16.ll
The file was modifiedllvm/test/CodeGen/X86/vector-shuffle-256-v32.ll
The file was modifiedllvm/test/CodeGen/X86/vector-shuffle-512-v32.ll
The file was modifiedllvm/test/CodeGen/X86/vector-shuffle-256-v16.ll
Commit 36e2e2e12efb6b02ad07f502d61b9a95937edb08 by nikita.ppv
[InstCombine] Fix incorrect SimplifyWithOpReplaced transform (PR47322)

This is a followup to D86834, which partially fixed this issue in
InstSimplify. However, InstCombine repeats the same transform while
dropping poison flags -- which does not cover cases where poison is
introduced in some other way.

The fix here is a bit more comprehensive, because things are quite
entangled, and it's hard to only partially address it without
regressing optimization. There are really two changes here:

* Export the SimplifyWithOpReplaced API from InstSimplify, with an
   added AllowRefinement flag. For replacements inside the TrueVal
   we don't actually care whether refinement occurs or not, the
   replacement is always legal. This part of the transform is now
   done in InstSimplify only. (It should be noted that the current
   AllowRefinement check is not sufficient -- that's an issue we
   need to address separately.)
* Change the InstCombine fold to work by temporarily dropping
   poison generating flags, running the fold and then restoring the
   flags if it didn't work out. This will ensure that the InstCombine
   fold is correct as long as the InstSimplify fold is correct.

Differential Revision:
The file was modifiedllvm/test/Transforms/InstCombine/select.ll
The file was modifiedllvm/lib/Analysis/InstructionSimplify.cpp
The file was modifiedllvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
The file was modifiedllvm/include/llvm/Analysis/InstructionSimplify.h
Commit c437446d90be17c3fe8a216a90ee442222f2fe9d by
[ARM] Recognize "double extend" reduction patterns

We can sometimes get code that does:
  xe = zext i16 x to i32
  ye = zext i16 y to i32
  m = mul i32 xe, ye
  me = zext i32 m to i64
  r = vecreduce.add(me)
This "double extend" can trip up the reduction identification, but
should give identical results.

This extends the pattern matching to handle them.

Differential Revision:
The file was modifiedllvm/test/CodeGen/Thumb2/mve-vecreduce-mla.ll
The file was modifiedllvm/lib/Target/ARM/ARMISelLowering.cpp
The file was modifiedllvm/test/CodeGen/Thumb2/mve-vecreduce-mlapred.ll
Commit 50ee0b99ec2902f5cf7a62a5e9b4a4f882b17031 by llvm-dev
[InstCombine][X86] getNegativeIsTrueBoolVec - use ConstantExpr evaluators. NFCI.

Don't do this manually, we can just use the ConstantExpr evaluators to do it more tidily for us.
The file was modifiedllvm/lib/Target/X86/X86InstCombineIntrinsic.cpp
Commit 3a8ea8609b82b7e5401698b7c63df6680e1257a8 by spatel
[Intrinsics] define semantics for experimental fmax/fmin vector reductions

As discussed on llvm-dev:

This is hopefully the final remaining showstopper before we can remove
the 'experimental' from the reduction intrinsics.

No behavior was specified for the FP min/max reductions, so we have a
mess of different interpretations.

There are a few potential options for the semantics of these max/min ops.
I think this is the simplest based on current behavior/implementation:
make the reductions inherit from the existing llvm.maxnum/minnum intrinsics.
These correspond to libm fmax/fmin, and those are similar to the (now
deprecated?) IEEE-754 maxNum/minNum functions (NaNs are treated as missing
data). So the default expansion creates calls to libm functions.

Another option would be to inherit from llvm.maximum/minimum (NaNs propagate),
but most targets just crash in codegen when given those nodes because no
default expansion was ever implemented AFAICT.

We could also just assume 'nnan' semantics by default (we are already
assuming 'nsz' semantics in the maxnum/minnum intrinsics), but some targets
(AArch64, PowerPC) support the more defined behavior, so it doesn't make much
sense to not allow a tighter spec. Fast-math-flags (nnan) can be used to
loosen the semantics.

(Note that D67507 was proposed to update the LangRef to acknowledge the more
recent IEEE-754 2019 standard, but that patch seems to have stalled. If we do
update based on the new standard, the reduction instructions can seamlessly
inherit from whatever updates are made to the max/min intrinsics.)

x86 sees a regression here on 'nnan' tests because we have underlying,
longstanding bugs in FMF creation/propagation. Those need to be fixed apart
from this change (for example: The expansion
sequence before this patch may not have been correct.

Differential Revision:
The file was modifiedllvm/lib/Target/AArch64/AArch64TargetTransformInfo.h
The file was modifiedllvm/test/CodeGen/Generic/expand-experimental-reductions.ll
The file was modifiedllvm/docs/LangRef.rst
The file was modifiedllvm/test/CodeGen/X86/vector-reduce-fmin.ll
The file was modifiedllvm/test/CodeGen/X86/vector-reduce-fmax-nnan.ll
The file was modifiedllvm/test/CodeGen/AArch64/vecreduce-fmax-legalization.ll
The file was modifiedllvm/lib/Target/ARM/ARMTargetTransformInfo.h
The file was modifiedllvm/test/CodeGen/X86/vector-reduce-fmin-nnan.ll
The file was modifiedllvm/include/llvm/CodeGen/BasicTTIImpl.h
The file was modifiedllvm/test/CodeGen/Thumb2/mve-vecreduce-fminmax.ll
The file was modifiedllvm/lib/CodeGen/ExpandReductions.cpp
The file was modifiedllvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
The file was modifiedllvm/test/CodeGen/Thumb2/mve-vecreduce-loops.ll
The file was modifiedllvm/test/CodeGen/AArch64/vecreduce-fmax-legalization-nan.ll
The file was modifiedllvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
The file was modifiedllvm/lib/Target/AArch64/AArch64ISelLowering.cpp
The file was modifiedllvm/test/CodeGen/X86/vector-reduce-fmax.ll
Commit 6cfd38d03d5fc3cde929ebf82529415595e8ef8e by
[ARM] Fixup single source mla reductions.

This fixes a complication on top of D87276. If we are sign extending
around a mul with the two operands that are the same, instcombine will
helpfully convert one of the sext to a zext. Reverse that so that we
again generate a reduction.

Differnetial Revision:
The file was modifiedllvm/test/CodeGen/Thumb2/mve-vecreduce-mla.ll
The file was modifiedllvm/test/CodeGen/Thumb2/mve-vecreduce-mlapred.ll
The file was modifiedllvm/lib/Target/ARM/ARMISelLowering.cpp
Commit d030aad7893a8cf7a68877b8b55eed1cd632411a by llvm-dev
[InstCombine][X86] Add tests for masked load/stores with comparisons.

As detailed on PR11210, if the mask is known to come from a (sign extended) bool vector (e.g. comparisons) then we can represent with a generic masked load/store without losing anything.
The file was modifiedllvm/test/Transforms/InstCombine/X86/x86-masked-memops.ll
Commit 78de7297abe2e8fa782682168989c70e3cb34a5c by tyker
Reland [AssumeBundles] Use operand bundles to encode alignment assumptions

NOTE: There is a mailing list discussion on this:

Complemantary to the assumption outliner prototype in D71692, this patch
shows how we could simplify the code emitted for an alignemnt
assumption. The generated code is smaller, less fragile, and it makes it
easier to recognize the additional use as a "assumption use".

As mentioned in D71692 and on the mailing list, we could adopt this
scheme, and similar schemes for other patterns, without adopting the
assumption outlining.
The file was modifiedllvm/lib/IR/Verifier.cpp
The file was modifiedllvm/include/llvm/Transforms/Scalar/AlignmentFromAssumptions.h
The file was modifiedclang/test/OpenMP/target_teams_distribute_parallel_for_simd_codegen.cpp
The file was modifiedllvm/unittests/Analysis/AssumeBundleQueriesTest.cpp
The file was modifiedclang/test/CodeGen/alloc-align-attr.c
The file was modifiedclang/test/CodeGen/catch-alignment-assumption-openmp.cpp
The file was modifiedclang/test/CodeGen/align_value.cpp
The file was modifiedllvm/test/Transforms/AlignmentFromAssumptions/simple.ll
The file was modifiedllvm/lib/IR/IRBuilder.cpp
The file was modifiedclang/test/CodeGen/catch-alignment-assumption-attribute-align_value-on-paramvar.cpp
The file was modifiedllvm/test/Transforms/Inline/byref-align.ll
The file was modifiedclang/test/CodeGen/catch-alignment-assumption-attribute-alloc_align-on-function-variable.cpp
The file was modifiedclang/test/CodeGen/catch-alignment-assumption-builtin_assume_aligned-two-params.cpp
The file was modifiedllvm/lib/Transforms/Scalar/AlignmentFromAssumptions.cpp
The file was modifiedclang/test/CodeGen/catch-alignment-assumption-builtin_assume_aligned-three-params-variable.cpp
The file was modifiedclang/test/CodeGen/builtin-assume-aligned.c
The file was modifiedllvm/test/Transforms/AlignmentFromAssumptions/simple32.ll
The file was modifiedclang/test/CodeGen/assume-aligned-and-alloc-align-attributes.c
The file was modifiedclang/test/CodeGen/catch-alignment-assumption-attribute-assume_aligned-on-function-two-params.cpp
The file was modifiedclang/test/OpenMP/simd_codegen.cpp
The file was modifiedclang/test/CodeGen/catch-alignment-assumption-attribute-assume_aligned-on-function.cpp
The file was modifiedclang/test/CodeGen/non-power-of-2-alignment-assumptions.c
The file was modifiedllvm/test/Verifier/assume-bundles.ll
The file was modifiedclang/test/CodeGen/catch-alignment-assumption-attribute-alloc_align-on-function.cpp
The file was modifiedllvm/lib/Analysis/AssumeBundleQueries.cpp
The file was modifiedclang/test/CodeGen/builtin-align-array.c
The file was modifiedllvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
The file was modifiedllvm/test/Transforms/InstCombine/assume.ll
The file was modifiedllvm/test/Transforms/PhaseOrdering/inlining-alignment-assumptions.ll
The file was modifiedclang/test/CodeGen/builtin-align.c
The file was modifiedclang/test/OpenMP/simd_metadata.c
The file was modifiedllvm/include/llvm/IR/IRBuilder.h
The file was modifiedclang/test/CodeGen/catch-alignment-assumption-attribute-align_value-on-lvalue.cpp
The file was modifiedclang/lib/CodeGen/CodeGenFunction.cpp
The file was modifiedclang/test/CodeGen/catch-alignment-assumption-builtin_assume_aligned-three-params.cpp
The file was modifiedllvm/test/Transforms/Inline/align.ll
Commit 2e61cd1295e0031b2379af2b65373e2798a551cb by eleviant
[MachineScheduler] Fix operand scheduling for pre/post-increment loads

Differential revision:
The file was modifiedllvm/lib/Target/AArch64/
The file was modifiedllvm/test/tools/llvm-mca/AArch64/Exynos/load.s
Commit a874d63344093752c912d01de60211f65745ea6f by flo
[Clang] Add option to allow marking pass-by-value args as noalias.

After the recent discussion on cfe-dev 'Can indirect class parameters be
noalias?' [1], it seems like using using noalias is problematic for
current C++, but should be allowed for C-only code.

This patch introduces a new option to let the user indicate that it is
safe to mark indirect class parameters as noalias. Note that this also
applies to external callers, e.g. it might not be safe to use this flag
for C functions that are called by C++ functions.

In targets that allocate indirect arguments in the called function, this
enables more agressive optimizations with respect to memory operations
and brings a ~1% - 2% codesize reduction for some programs.

[1] :

Reviewed By: rjmccall

Differential Revision:
The file was addedclang/test/CodeGen/pass-by-value-noalias.c
The file was addedclang/test/CodeGenObjC/pass-by-value-noalias.m
The file was modifiedclang/include/clang/Driver/
The file was modifiedclang/lib/Frontend/CompilerInvocation.cpp
The file was modifiedclang/include/clang/Basic/CodeGenOptions.def
The file was modifiedclang/lib/CodeGen/CGCall.cpp
The file was addedclang/test/CodeGenCXX/pass-by-value-noalias.cpp