1. [CostModel][X86] Improve AVX1/AVX2 v16i32->v16i16/v16i8 truncation costs (PR51972) (details)
  2. [InstCombine] move add after min/max intrinsic (details)
Commit 3538ee763d13a26515a224ddeb3a51a5af143e38 by llvm-dev
[CostModel][X86] Improve AVX1/AVX2 v16i32->v16i16/v16i8 truncation costs (PR51972)

Based off worst case btver2 (AVX1) and haswell (AVX2) llvm-mca reports
The file was modifiedllvm/test/Analysis/CostModel/X86/cast.ll
The file was modifiedllvm/lib/Target/X86/X86TargetTransformInfo.cpp
The file was modifiedllvm/test/Analysis/CostModel/X86/arith-overflow.ll
The file was modifiedllvm/test/Analysis/CostModel/X86/arith-fix.ll
The file was modifiedllvm/test/Analysis/CostModel/X86/min-legal-vector-width.ll
The file was modifiedllvm/test/Analysis/CostModel/X86/trunc.ll
Commit 6063e6b499c7829b941f94456af493a1ecb93ea1 by spatel
[InstCombine] move add after min/max intrinsic

This is another regression noted with the proposal to canonicalize
to the min/max intrinsics in D98152.

Here are Alive2 attempts to show correctness without specifying
exact constants: (smax) (smin) (umax) (umin)
(if you comment out the assume and/or no-wrap, you should see failures)

The different output for the umin test is due to a fold added with
c4fc2cb5b2d98125 :

// umin(x, 1) == zext(x != 0)

We probably want to adjust that, so it applies more generally
(umax --> sext or patterns where we can fold to select-of-constants).
Some folds that were ok when starting with cmp+select may increase
instruction count for the equivalent intrinsic, so we have to decide
if it's worth altering a min/max.

Differential Revision:
The file was modifiedllvm/test/Transforms/InstCombine/minmax-intrinsics.ll
The file was modifiedllvm/lib/Transforms/InstCombine/InstCombineCalls.cpp