Success

Changes

Summary

  1. [SelectionDAG] Fix argument copy elision with irregular types
  2. Reland [X86][CostModel] X86TTIImpl::getMemoryOpCost(): rewrite vector handling again
  3. Reland [X86] X86TTIImpl::getInterleavedMemoryOpCostAVX2(): use getMemoryOpCost()
  4. [CVP] Add test for PR50399 (NFC)
Commit fd5cc418186ab0fc0650ec373fdf016101eba21d by thatlemon
[SelectionDAG] Fix argument copy elision with irregular types

D29668 made it possible to avoid a useless copy of the argument value into an alloca when the caller places the argument in memory (as often happens on x86), by directly forwarding the pointer to it. This optimization is illegal if the type contains padding bytes: if a truncating store into the alloca is replaced, the upper bits are filled with garbage, and the resulting code misbehaves at runtime.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D102153
The file was modified: llvm/test/CodeGen/X86/arg-copy-elide.ll
The file was modified: llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
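
To make the hazard described in this commit concrete, here is a minimal C++ sketch. It is not LLVM code: the function names are hypothetical, the 1-bit value is modelled as living in a byte-sized slot, and the truncating store is assumed to zero the padding bits. It contrasts reading a callee-local copy with reading the caller's slot directly.

```cpp
#include <cstdint>

// Hypothetical model of a truncating store of a 1-bit value into a
// byte-sized slot: only the value bit is kept, and the padding bits are
// assumed (for this sketch) to be written as zero.
inline void truncating_store_i1(std::uint8_t *slot, std::uint32_t value) {
    *slot = static_cast<std::uint8_t>(value & 1u);
}

// Without elision: the callee copies the argument into its own slot via
// a truncating store, so a later full-byte read sees clean padding.
inline std::uint8_t read_with_copy(std::uint32_t incoming_slot) {
    std::uint8_t local = 0;
    truncating_store_i1(&local, incoming_slot);
    return local;
}

// With (illegal) elision: the callee reuses the caller's slot as-is, so a
// full-byte read also picks up whatever the caller left in the padding.
inline std::uint8_t read_with_elision(std::uint32_t incoming_slot) {
    return static_cast<std::uint8_t>(incoming_slot & 0xFFu);
}
```

For an incoming slot of 0xABCDEFF1, the copying version reads 1, while the eliding version reads 0xF1: the low bit agrees, but the padding bits differ, which is the kind of mismatch the commit message warns about.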
Commit 05a4e4a89c6b6dc6e3edfb5efb9ddc950ae47469 by lebedev.ri
Reland [X86][CostModel] X86TTIImpl::getMemoryOpCost(): rewrite vector handling again

Instead of handling power-of-two sized vector chunks,
try handling the large vector in a stream mode,
decreasing the operational vector size
once it no longer works for the elements left to process.

Notably, this improves costs for overaligned loads, since loading padding is fine.
This also tracks more directly when we need to insert/extract the YMM/XMM subvector,
so some costs fluctuate.

This was initially landed in c02476f3158f2908ef0a6f628210b5380bd33695,
but reverted in 5fddc3312bad7e62493f1605385fad5e589e6450,
because the code made some very optimistic assumptions about invariants
that didn't hold in practice.

Reviewed By: RKSimon, ABataev

Differential Revision: https://reviews.llvm.org/D100684
The file was modified: llvm/test/Analysis/CostModel/X86/load_store.ll
The file was modified: llvm/lib/Target/X86/X86TargetTransformInfo.cpp
The file was modified: llvm/test/Analysis/CostModel/X86/interleaved-load-i16-stride-3.ll
The file was modified: llvm/test/Analysis/CostModel/X86/interleaved-store-i16-stride-3.ll
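
The "stream mode" idea in this commit message can be illustrated with a small standalone sketch. This is not the X86TTIImpl implementation: the function name split_into_ops and its parameters are hypothetical, and the sketch ignores alignment, element types, and subvector insert/extract costs. It only shows the shape of the approach: start at the widest operation, and narrow the operation width once it no longer fits the bytes left to process.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: split a memory access of `total_bytes` into a
// sequence of operation widths, starting at `max_op_bytes` and halving
// the width whenever it exceeds the bytes still left to process.
inline std::vector<std::size_t> split_into_ops(std::size_t total_bytes,
                                               std::size_t max_op_bytes) {
    std::vector<std::size_t> ops;
    if (max_op_bytes == 0)
        return ops; // nothing sensible to do without a legal width

    std::size_t width = max_op_bytes;
    std::size_t remaining = total_bytes;
    while (remaining > 0) {
        // Narrow the operation until it fits what is left. Since
        // remaining >= 1, the width never drops below 1.
        while (width > remaining)
            width /= 2;
        ops.push_back(width);
        remaining -= width;
    }
    return ops;
}
```

With a 32-byte maximum width (a YMM-sized operation), a 56-byte access splits into widths 32, 16, 8, while a 96-byte access stays at 32, 32, 32. A real cost model would then price each operation and the subvector moves between them.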
Commit 8ed0864fd76ded2646b33de8fc610519dd7f1eb5 by lebedev.ri
Reland [X86] X86TTIImpl::getInterleavedMemoryOpCostAVX2(): use getMemoryOpCost()

Now that getMemoryOpCost() correctly handles all the vector variants,
we should no longer hand-roll our own version of it, but use it directly.

The AVX512 variant probably needs a similar change,
but there it is less obvious.

This was initially landed in 69ed93a4355123a45c1d7216aea7cd53d07a361b,
but was reverted in 6b95fd199d96e3ba5c28a23b17b74203522bdaa8
because the patch it depends on was reverted.
The file was modified: llvm/test/Analysis/CostModel/X86/interleaved-load-i8.ll
The file was modified: llvm/test/Analysis/CostModel/X86/interleaved-store-i8.ll
The file was modified: llvm/lib/Target/X86/X86TargetTransformInfo.cpp
Commit 069174a6349b18a05b7d48b09a8f8b113b402aae by nikita.ppv
[CVP] Add test for PR50399 (NFC)
The file was modified: llvm/test/Transforms/CorrelatedValuePropagation/phi-common-val.ll