Changes

Summary

  1. [InstCombine] add/adjust tests for min/max intrinsics; NFC
  2. [X86] combineX86ShufflesRecursively(): call SimplifyMultipleUseDemandedVectorElts() on after finishing recursing
  3. [NFC] combineX86ShufflesRecursively(): actually address nits for previous patch
  4. [X86] lowerShuffleAsDecomposedShuffleMerge(): if both inputs are broadcastable/identities, canonicalize broadcasts as such
  5. [X86][TLI] SimplifyDemandedVectorEltsForTargetNode(): don't break apart broadcasts from which not just the 0'th elt is demanded
  6. [X86][Atom] Specific uops for all IMUL/IDIV instructions
Commit 9555d1edb0d16f135ae57695fc2da55deaabf082 by spatel
[InstCombine] add/adjust tests for min/max intrinsics; NFC

If we transform these, we have to propagate no-wrap/undef carefully.
The file was modified llvm/test/Transforms/InstCombine/minmax-intrinsics.ll
Commit 1e72ca94e5796a744d0e1a8871c33b1b4edb0acb by lebedev.ri
[X86] combineX86ShufflesRecursively(): call SimplifyMultipleUseDemandedVectorElts() on after finishing recursing

This was suggested in https://reviews.llvm.org/D108382#inline-1039018,
and it avoids regressions in that patch.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D109065
The file was modified llvm/test/CodeGen/X86/vselect.ll
The file was modified llvm/lib/Target/X86/X86ISelLowering.cpp
The file was modified llvm/test/CodeGen/X86/oddshuffles.ll
Commit 0852313e47836152b00fb8b8fd62a7e12bf92abd by lebedev.ri
[NFC] combineX86ShufflesRecursively(): actually address nits for previous patch
The file was modified llvm/lib/Target/X86/X86ISelLowering.cpp
Commit 07f1d8f0caa1516e0d97616adfea4aa94f7883a4 by lebedev.ri
[X86] lowerShuffleAsDecomposedShuffleMerge(): if both inputs are broadcastable/identities, canonicalize broadcasts as such

Split off from D108253.
Broadcast is simpler than any other shuffle we might produce
to do what we want to do here, so prefer it.
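
As a rough illustration of what "broadcastable" and "identity" mean for a shuffle input, the toy model below classifies a shuffle mask. It is only a sketch and not the code from X86ISelLowering.cpp: the function names are hypothetical, and it assumes a mask is a list of source element indices with -1 marking an undefined lane.

```cpp
#include <vector>

// Toy model, not LLVM code. A mask lists, for each output lane, the index of
// the source element to copy; -1 is assumed to mean "undefined lane".

// Hypothetical helper: true if every defined lane reads the same source element.
bool isBroadcastMask(const std::vector<int> &Mask) {
  int Seen = -1;
  for (int M : Mask) {
    if (M < 0)
      continue; // undefined lanes match anything
    if (Seen < 0)
      Seen = M; // first defined lane fixes the broadcast source
    else if (M != Seen)
      return false;
  }
  return Seen >= 0; // an all-undefined mask is not treated as a broadcast here
}

// Hypothetical helper: true if every defined lane reads its own position.
bool isIdentityMask(const std::vector<int> &Mask) {
  for (int I = 0, E = static_cast<int>(Mask.size()); I != E; ++I)
    if (Mask[I] >= 0 && Mask[I] != I)
      return false;
  return true;
}

// The case named in the commit title: an input qualifies if it is either kind.
bool isBroadcastOrIdentity(const std::vector<int> &Mask) {
  return isBroadcastMask(Mask) || isIdentityMask(Mask);
}
```

Under this model, a mask such as {2, 2, -1, 2} counts as a broadcast of element 2, while {0, -1, 2, 3} counts as an identity.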

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D108382
The file was modified llvm/test/CodeGen/X86/horizontal-sum.ll
The file was modified llvm/test/CodeGen/X86/copy-low-subvec-elt-to-high-subvec-elt.ll
The file was modified llvm/lib/Target/X86/X86ISelLowering.cpp
Commit 5f2fe48d06c742872804da8b3d86596ed2bb9acb by lebedev.ri
[X86][TLI] SimplifyDemandedVectorEltsForTargetNode(): don't break apart broadcasts from which not just the 0'th elt is demanded

Apparently this had no test coverage before D108382,
but D108382 itself shows a few regressions that this fixes.

It doesn't seem worthwhile breaking apart broadcasts,
assuming we want the broadcasted value to be present in several elements,
not just the 0'th one.
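
The decision described above can be sketched with a demanded-elements bitmask. This is a minimal standalone model, not the logic in SimplifyDemandedVectorEltsForTargetNode(): the function name is hypothetical, and it assumes bit I of the mask is set when lane I of the broadcast result is used.

```cpp
#include <cstdint>

// Toy model, not LLVM code. Bit I of DemandedElts is assumed to be set when
// lane I of the broadcast result is actually used by some consumer.

// Hypothetical helper: keep the broadcast intact whenever any lane other than
// lane 0 is demanded, since the value is then wanted in a non-zero position.
bool shouldKeepBroadcast(std::uint64_t DemandedElts) {
  const std::uint64_t LowestLane = 1;
  return (DemandedElts & ~LowestLane) != 0;
}
```

With this model, a mask of 0x1 (only lane 0 used) allows the broadcast to be simplified away, while 0x5 or 0x4 keeps it.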

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D108411
The file was modified llvm/test/CodeGen/X86/horizontal-sum.ll
The file was modified llvm/lib/Target/X86/X86ISelLowering.cpp
The file was modified llvm/test/CodeGen/X86/sse41.ll
The file was modified llvm/test/CodeGen/X86/copy-low-subvec-elt-to-high-subvec-elt.ll
Commit cf8fac7d07307bc6679d60c3ad3e7a7792a2caa6 by llvm-dev
[X86][Atom] Specific uops for all IMUL/IDIV instructions

Based off a mixture of llvm-exegesis captures (PR36895) and Intel AoM / Agner / InstLatX64 reports.
The file was modified llvm/lib/Target/X86/X86ScheduleAtom.td
The file was modified llvm/test/tools/llvm-mca/X86/Atom/resources-x86_64.s