Failed Changes

Summary

  1. [DAGCombine] Match a pattern where a wide type scalar value is stored by several narrow stores
  2. [X86] When promoting i16 compare with immediate to i32, try to use sign_extend for eq/ne if the input is truncated from a type with enough sign bits.
  3. [X86] Disable f32->f64 extload when sse2 is enabled
  4. Do not derive no-recurse attribute if function does not have exact definition.
  5. [NFC] Test if commit access granted.
  6. Make test not write to source directory
  7. [X86] Use EVEX instructions for f128 FAND/FOR/FXOR when avx512vl is enabled.
  8. [X86] Convert f32/f64 FANDN/FAND/FOR/FXOR to vector logic ops and scalar_to_vector/extract_vector_elts to reduce isel patterns.
Revision 362921 by qshanz:
[DAGCombine] Match a pattern where a wide type scalar value is stored by several narrow stores
This opportunity was found in SPEC 2017 557.xz_r, where it is used by the SHA encrypt/decrypt code. See sha-2/sha512.c:

static void store64(u64 x, unsigned char* y)
{
    for(int i = 0; i != 8; ++i)
        y[i] = (x >> ((7-i) * 8)) & 255;
}

static u64 load64(const unsigned char* y)
{
    u64 res = 0;
    for(int i = 0; i != 8; ++i)
        res |= (u64)(y[i]) << ((7-i) * 8);
    return res;
}
The load64 side has already been handled by https://reviews.llvm.org/D26149;
this patch implements the corresponding store pattern.

Match a pattern where a wide-type scalar value is stored by several narrow
stores, and fold it into a single store, or into a BSWAP and a store, if the
target supports it.

Assuming little endian target:
i8 *p = ...
i32 val = ...
p[0] = (val >> 0) & 0xFF;
p[1] = (val >> 8) & 0xFF;
p[2] = (val >> 16) & 0xFF;
p[3] = (val >> 24) & 0xFF;

=>
*((i32)p) = val;

i8 *p = ...
i32 val = ...
p[0] = (val >> 24) & 0xFF;
p[1] = (val >> 16) & 0xFF;
p[2] = (val >> 8) & 0xFF;
p[3] = (val >> 0) & 0xFF;

=>
*((i32)p) = BSWAP(val);
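
As a sanity check, here is a minimal standalone C sketch (mine, not part of the patch; it assumes a little-endian host and the GCC/Clang __builtin_bswap32 builtin) showing the equivalence the second fold relies on: the byte-reversed store sequence produces the same memory image as one BSWAP followed by one wide store.

#include <stdint.h>
#include <string.h>
#include <assert.h>

/* The byte-by-byte form the combine matches. */
static void store_bytes(uint32_t val, uint8_t *p) {
    p[0] = (val >> 24) & 0xFF;
    p[1] = (val >> 16) & 0xFF;
    p[2] = (val >> 8) & 0xFF;
    p[3] = (val >> 0) & 0xFF;
}

/* What the fold produces: a single BSWAP and a single i32 store. */
static void store_wide(uint32_t val, uint8_t *p) {
    uint32_t swapped = __builtin_bswap32(val);
    memcpy(p, &swapped, sizeof(swapped));
}

int main(void) {
    uint8_t a[4], b[4];
    store_bytes(0x12345678u, a);
    store_wide(0x12345678u, b);
    assert(memcmp(a, b, 4) == 0); /* identical on little-endian hosts */
    return 0;
}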

Differential Revision: https://reviews.llvm.org/D62897
Change Type | Path in Repository | Path in Workspace
The file was modified | /llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | llvm.src/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
The file was modified | /llvm/trunk/test/CodeGen/PowerPC/store-combine.ll | llvm.src/test/CodeGen/PowerPC/store-combine.ll
Revision 362920 by ctopper:
[X86] When promoting i16 compare with immediate to i32, try to use sign_extend for eq/ne if the input is truncated from a type with enough sign bits.

Summary:
Our default behavior is to use sign_extend for signed comparisons and zero_extend for everything else. But for equality we have the freedom to use either extension. If we can prove the input has been truncated from something with enough sign bits, we can use sign_extend instead and let DAG combine optimize it out. A similar rule is used by type legalization in LegalizeIntegerTypes.

This gets rid of the movzx in PR42189. The immediate will still take 4 bytes, instead of the 2 bytes plus 0x66 prefix that a cmp di, 32767 would get, but it avoids a length-changing prefix.
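
As an illustrative self-check (mine, not from the patch) of why eq/ne gives us this freedom: zero-extending and sign-extending a 16-bit operand give the same eq/ne answer as long as the immediate is extended the same way, because both extensions are injective on 16-bit values. The check below is exhaustive over all inputs, using the 32767 immediate from the cmp above.

#include <stdint.h>
#include <assert.h>

int main(void) {
    const uint16_t imm = 32767;
    for (uint32_t v = 0; v <= 0xFFFF; ++v) {
        int eq_zext = ((uint32_t)(uint16_t)v == (uint32_t)imm);
        int eq_sext = ((int32_t)(int16_t)v == (int32_t)(int16_t)imm);
        assert(eq_zext == eq_sext); /* either extension works for eq/ne */
    }
    return 0;
}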

Reviewers: RKSimon, spatel, xbolva00

Reviewed By: xbolva00

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63032
Change Type | Path in Repository | Path in Workspace
The file was modified | /llvm/trunk/lib/Target/X86/X86ISelLowering.cpp | llvm.src/lib/Target/X86/X86ISelLowering.cpp
The file was modified | /llvm/trunk/test/CodeGen/X86/cmp.ll | llvm.src/test/CodeGen/X86/cmp.ll
Revision 362919 by ctopper:
[X86] Disable f32->f64 extload when sse2 is enabled

Summary:
We can only use the memory form of cvtss2sd under optsize due to a partial register update. So previously we were emitting 2 instructions for the extload when optimizing for speed. Also, due to a late optimization in PreprocessISelDAG, we had to handle (fpextend (loadf32)) under optsize as well.

This patch forces extload to expand so that it will always be in the (fpextend (loadf32)) form during isel. And when optimizing for speed we can just let each of those pieces select an instruction independently.
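
For reference, a minimal C sketch of the source pattern involved (illustrative; the function name is mine). The conversion is an f32 load followed by an fpextend to f64; with the extload forced to expand, isel sees the two nodes separately and can pick a plain load plus the register form of cvtss2sd, sidestepping the partial register update of the memory form.

/* Compiles to (fpextend (loadf32 p)) during isel. */
double ext_f32_to_f64(const float *p) {
    return (double)*p;
}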

Reviewers: spatel, RKSimon

Reviewed By: RKSimon

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62710
Change Type | Path in Repository | Path in Workspace
The file was modified | /llvm/trunk/lib/Target/X86/X86ISelLowering.cpp | llvm.src/lib/Target/X86/X86ISelLowering.cpp
The file was modified | /llvm/trunk/lib/Target/X86/X86InstrAVX512.td | llvm.src/lib/Target/X86/X86InstrAVX512.td
The file was modified | /llvm/trunk/lib/Target/X86/X86InstrSSE.td | llvm.src/lib/Target/X86/X86InstrSSE.td
Revision 362918 by vivekvpandya:
Do not derive no-recurse attribute if function does not have exact definition.
This is a fix for https://bugs.llvm.org/show_bug.cgi?id=41336
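
A minimal C sketch of the underlying issue (illustrative; the function name is hypothetical): a function without an exact definition, such as a weak one, can be replaced at link time, so its visible body proves nothing about recursion.

/* This weak body does not recurse, but the linker may substitute a
 * different definition that does, so norecurse must not be derived. */
__attribute__((weak)) int may_be_replaced(int x) {
    return x + 1;
}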

Reviewers: jdoerfert
Reviewed by: jdoerfert

Differential Revision: https://reviews.llvm.org/D63045
Change Type | Path in Repository | Path in Workspace
The file was modified | /llvm/trunk/lib/Transforms/IPO/FunctionAttrs.cpp | llvm.src/lib/Transforms/IPO/FunctionAttrs.cpp
The file was modified | /llvm/trunk/test/Transforms/FunctionAttrs/arg_returned.ll | llvm.src/test/Transforms/FunctionAttrs/arg_returned.ll
Revision 362917 by lkail:
[NFC] Test if commit access granted.
Change Type | Path in Repository | Path in Workspace
The file was modified | /llvm/trunk/test/CodeGen/PowerPC/extract-and-store.ll | llvm.src/test/CodeGen/PowerPC/extract-and-store.ll
Revision 362916 by nico:
Make test not write to source directory
Change Type | Path in Repository | Path in Workspace
The file was modified | /llvm/trunk/test/tools/llvm-lib/machine-mismatch.test | llvm.src/test/tools/llvm-lib/machine-mismatch.test
Revision 362915 by ctopper:
[X86] Use EVEX instructions for f128 FAND/FOR/FXOR when avx512vl is enabled.
Change Type | Path in Repository | Path in Workspace
The file was modified | /llvm/trunk/lib/Target/X86/X86InstrVecCompiler.td | llvm.src/lib/Target/X86/X86InstrVecCompiler.td
Revision 362914 by ctopper:
[X86] Convert f32/f64 FANDN/FAND/FOR/FXOR to vector logic ops and scalar_to_vector/extract_vector_elts to reduce isel patterns.

Previously we did the equivalent operation in isel patterns with
COPY_TO_REGCLASS operations to transition between register classes.
By inserting scalar_to_vectors and extract_vector_elts before isel
we can allow each piece to be selected individually and accomplish
the same final result.

Ideally we'd use vector operations earlier in lowering/combine,
but that looks to be more difficult.

The scalar-fp-to-i64.ll changes are because we have a pattern for
using movlpd for store+extract_vector_elt, while an f64 store
uses movsd; the encoding sizes are the same.
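
For context, a hedged sketch (mine, not from the commit) of C-level patterns that typically lower to these scalar FP logic nodes: fabs clears the sign bit (an AND with a mask), negation flips it (an XOR), and copysign combines AND/ANDN/OR.

#include <math.h>

double abs_f64(double x)  { return fabs(x); }                 /* sign-bit AND */
float  neg_f32(float x)   { return -x; }                      /* sign-bit XOR */
double copy_sign(double x, double y) { return copysign(x, y); } /* AND/ANDN/OR */
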
Change Type | Path in Repository | Path in Workspace
The file was modified | /llvm/trunk/lib/Target/X86/X86ISelDAGToDAG.cpp | llvm.src/lib/Target/X86/X86ISelDAGToDAG.cpp
The file was modified | /llvm/trunk/lib/Target/X86/X86InstrAVX512.td | llvm.src/lib/Target/X86/X86InstrAVX512.td
The file was modified | /llvm/trunk/lib/Target/X86/X86InstrSSE.td | llvm.src/lib/Target/X86/X86InstrSSE.td
The file was modified | /llvm/trunk/test/CodeGen/X86/scalar-fp-to-i64.ll | llvm.src/test/CodeGen/X86/scalar-fp-to-i64.ll
The file was modified | /llvm/trunk/test/CodeGen/X86/sqrt-fastmath-mir.ll | llvm.src/test/CodeGen/X86/sqrt-fastmath-mir.ll