Success

Changes

Summary

  1. [llvm] Add a parser from JSON to TensorSpec
  2. [mlir][Vector] Add transformation + pattern to split vector.transfer_read into full and partial copies.
  3. [mlir][DialectConversion] Add support for mergeBlocks in ConversionPatternRewriter.
  4. [mlir][DialectConversion] Remove usage of std::distance to track position.
  5. [X86] Use h-register for final XOR of __builtin_parity on 64-bit targets.
  6. [PGO] Change a `NumVSites == 0` workaround to assert
  7. [FPEnv] IRBuilder fails to add strictfp attribute
  8. [NewPM][LoopVersioning] Port LoopVersioning to NPM
  9. [X86][SSE] Shuffle combine blends to OR(X,Y) if the relevant elements are known zero.
  10. [X86] Make ENDBR instruction a scheduling boundary
  11. [compiler-rt][profile] Fix various InstrProf tests on Solaris
  12. [PGO] Extend the value profile buckets for mem op sizes.
  13. [gn build] Port f78f509c758

Commit 4b1b109c5126efc963cc19949df5201e40f1bcc1 by mtrofin
[llvm] Add a parser from JSON to TensorSpec

A JSON->TensorSpec utility we will use subsequently to specify
additional outputs needed for certain training scenarios.

Differential Revision: https://reviews.llvm.org/D84976
The file was modified llvm/lib/Analysis/TFUtils.cpp
The file was modified llvm/include/llvm/Analysis/Utils/TFUtils.h
The file was modified llvm/unittests/Analysis/TFUtilsTest.cpp
Commit d313e9c12ed3541f63a36e3b0d59e9e1185603d2 by ntv
[mlir][Vector] Add transformation + pattern to split vector.transfer_read into full and partial copies.

This revision adds a transformation and a pattern that rewrites a "maybe masked" `vector.transfer_read %view[...], %pad` into a pattern resembling:

```
   %1:3 = scf.if (%inBounds) {
      scf.yield %view : memref<A...>, index, index
   } else {
      %2 = vector.transfer_read %view[...], %pad : memref<A...>, vector<...>
      %3 = vector.type_cast %extra_alloc : memref<...> to memref<vector<...>>
      store %2, %3[] : memref<vector<...>>
      %4 = memref_cast %extra_alloc: memref<B...> to memref<A...>
      scf.yield %4 : memref<A...>, index, index
   }
   %res = vector.transfer_read %1#0[%1#1, %1#2] {masked = [false ... false]}
```
where `extra_alloc` is a buffer of one vector, alloca'ed at the top of the function.

This rewrite makes it possible to realize the "always full tile" abstraction where vector.transfer_read operations are guaranteed to read from a padded full buffer.
The extra work only occurs on the boundary tiles.
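
The full/partial split described above can be illustrated outside MLIR with a minimal C++ sketch. It is not the code from this revision: `readTile`, `kTile`, and the use of `std::vector<int>` as the "view" are hypothetical names chosen for illustration. A tile that lies fully in bounds is read directly from the source; a boundary tile is first copied into a padded scratch buffer, so the final read is always a full, unmasked one.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <vector>

constexpr std::size_t kTile = 4;  // hypothetical vector width

// Reads kTile elements starting at `offset`. The final copy never needs a
// mask: it reads either from the original buffer (full tile) or from a padded
// scratch buffer (boundary tile).
std::array<int, kTile> readTile(const std::vector<int>& view,
                                std::size_t offset, int pad) {
  std::array<int, kTile> scratch;  // plays the role of `extra_alloc`
  const int* src = nullptr;

  const bool inBounds = offset + kTile <= view.size();
  if (inBounds) {
    // Fast path: read straight from the source buffer.
    src = view.data() + offset;
  } else {
    // Slow path, boundary tiles only: stage the valid elements plus padding.
    scratch.fill(pad);
    const std::size_t avail = offset < view.size() ? view.size() - offset : 0;
    if (avail > 0)
      std::copy_n(view.data() + offset, avail, scratch.begin());
    src = scratch.data();
  }

  // Unconditional full read, analogous to `masked = [false ... false]`.
  std::array<int, kTile> result;
  std::copy_n(src, kTile, result.begin());
  return result;
}
```

With a six-element buffer, `readTile(v, 0, pad)` takes the fast path, while `readTile(v, 4, pad)` returns the last two elements followed by two copies of `pad`.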

Differential Revision: https://reviews.llvm.org/D84631
The file was modified mlir/lib/Dialect/Vector/VectorTransforms.cpp
The file was modified mlir/include/mlir/Interfaces/VectorInterfaces.td
The file was modified mlir/test/lib/Transforms/TestVectorTransforms.cpp
The file was added mlir/test/Dialect/Vector/vector-transfer-full-partial-split.mlir
The file was modified mlir/include/mlir/Dialect/Vector/VectorTransforms.h
The file was modified mlir/lib/Dialect/Vector/CMakeLists.txt
Commit e888886cc3daf2c2d6c20cad51cd5ec2ffc24789 by ravishankarm
[mlir][DialectConversion] Add support for mergeBlocks in ConversionPatternRewriter.

Differential Revision: https://reviews.llvm.org/D84795
The file was modified mlir/lib/Transforms/DialectConversion.cpp
The file was modified mlir/test/lib/Dialect/Test/TestOps.td
The file was modified mlir/test/lib/Dialect/Test/TestPatterns.cpp
The file was added mlir/test/Transforms/test-merge-blocks.mlir
Commit 32f3a9a9d68eea7d40a19767b591622b4b737990 by ravishankarm
[mlir][DialectConversion] Remove usage of std::distance to track position.

Remove use of iterator::difference_type to know where to insert a
moved or erased block during undo actions.
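
As an illustration of the general idea only (this is not the code from `DialectConversion.cpp`), an undo record can remember a neighboring element instead of a numeric position obtained from `std::distance`. The sketch below uses `std::list`, whose iterators stay valid when other elements are erased; `BlockList`, `ErasedBlock`, `eraseBlock`, and `undoErase` are hypothetical names.

```cpp
#include <iterator>
#include <list>
#include <string>
#include <utility>

using BlockList = std::list<std::string>;

// Hypothetical undo record: stores the element that followed the erased
// block (or end()), so no index has to be computed or replayed.
struct ErasedBlock {
  std::string name;
  BlockList::iterator insertBefore;
};

// Erases the block at `it` and returns what is needed to restore it.
ErasedBlock eraseBlock(BlockList& blocks, BlockList::iterator it) {
  ErasedBlock record{std::move(*it), std::next(it)};
  blocks.erase(it);
  return record;
}

// Restores the block in O(1). This assumes the recorded neighbor is still in
// the list, which holds when actions are undone in reverse order.
void undoErase(BlockList& blocks, ErasedBlock record) {
  blocks.insert(record.insertBefore, std::move(record.name));
}
```

Erasing the middle block of `{"entry", "body", "exit"}` and then calling `undoErase` puts `"body"` back between its original neighbors without walking the list.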

Differential Revision: https://reviews.llvm.org/D85066
The file was modified mlir/lib/Transforms/DialectConversion.cpp
Commit ac82b918c74f3fab8d4a7c1905277bda6b9bccb4 by craig.topper
[X86] Use h-register for final XOR of __builtin_parity on 64-bit targets.

This adds an isel pattern and special XOR8rr_NOREX instruction
to enable the use of h-registers for __builtin_parity. This avoids
a copy and a shift instruction. The NOREX instruction is in case
register allocation doesn't use the matching l-register for some
reason. If an R8-R15 register gets picked instead, we won't be
able to encode the instruction since an h-register can't be used
with a REX prefix.
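
For readers unfamiliar with how a parity computation ends in an 8-bit XOR, here is a portable C++ sketch of parity by XOR folding. It only illustrates the arithmetic and is not the isel pattern from this commit; `parity64` is a hypothetical helper name. After the 64-bit value has been folded down to 16 bits, the remaining step XORs the low byte with the next byte, which on x86 correspond to an l-register and its matching h-register.

```cpp
#include <cstdint>

// Returns 1 if `x` has an odd number of set bits, 0 otherwise.
unsigned parity64(std::uint64_t x) {
  // Fold the upper halves onto the lower ones; XOR preserves parity.
  x ^= x >> 32;
  x ^= x >> 16;

  // The parity of the input now equals the parity of the low 16 bits.
  const std::uint8_t lo = static_cast<std::uint8_t>(x);       // e.g. AL
  const std::uint8_t hi = static_cast<std::uint8_t>(x >> 8);  // e.g. AH

  // The 8-bit XOR of the two bytes; no copy or shift would be needed if the
  // high byte can be addressed directly as an h-register.
  std::uint8_t b = static_cast<std::uint8_t>(lo ^ hi);

  // Finish the fold in software to stay portable.
  b ^= static_cast<std::uint8_t>(b >> 4);
  b ^= static_cast<std::uint8_t>(b >> 2);
  b ^= static_cast<std::uint8_t>(b >> 1);
  return b & 1u;
}
```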

Fixes PR46954
The file was modified llvm/test/CodeGen/X86/vector-reduce-xor-bool.ll
The file was modified llvm/lib/Target/X86/X86InstrCompiler.td
The file was modified llvm/lib/Target/X86/X86InstrArithmetic.td
The file was modified llvm/test/CodeGen/X86/parity.ll
Commit 317e00dc54c74a2e0fd0c62bdc6a6d68b0d2ca7e by i
[PGO] Change a `NumVSites == 0` workaround to assert

The root cause was fixed by 3d6f53018f845e893ad34f64ff2851a2e5c3ba1d.
The workaround added in 99ad956fdaee5398fdcf46fa49cb433cf52dc461 can be changed
to an assert now. (In case the fix regresses, there will be a heap-use-after-free.)
The file was modified compiler-rt/lib/profile/InstrProfilingValue.c
Commit d535a91d13b88b547ba24ec50337aa0715d74d4d by kevin.neal
[FPEnv] IRBuilder fails to add strictfp attribute

The strictfp attribute is required on all function calls in a function
that is itself marked with the strictfp attribute. The IRBuilder knows
this and has a method for adding the attribute to function call instructions.

If a function being called has the strictfp attribute itself then the
IRBuilder will refuse to add the attribute to the calling instruction
despite being asked to add it. Eliminate this error.

Differential Revision: https://reviews.llvm.org/D84878
The file was modified llvm/unittests/IR/IRBuilderTest.cpp
The file was modified llvm/include/llvm/IR/IRBuilder.h
Commit 7c19c89dd5c532fef533e008fb5911d20992d2ac by aeubanks
[NewPM][LoopVersioning] Port LoopVersioning to NPM

Reviewed By: ychen, fhahn

Differential Revision: https://reviews.llvm.org/D85063
The file was modified llvm/lib/Transforms/Scalar/Scalar.cpp
The file was modified llvm/lib/Passes/PassBuilder.cpp
The file was modified llvm/include/llvm/InitializePasses.h
The file was modified llvm/test/Transforms/LoopVersioning/basic.ll
The file was modified llvm/lib/Passes/PassRegistry.def
The file was modified llvm/include/llvm/Transforms/Utils/LoopVersioning.h
The file was modified llvm/lib/Transforms/Utils/LoopVersioning.cpp
Commit 219f32f4b68679563443cdaae7b8174c9976409a by llvm-dev
[X86][SSE] Shuffle combine blends to OR(X,Y) if the relevant elements are known zero.

This allows us to remove the (depth violating) code in getFauxShuffleMask where we were combining the OR(SHUFFLE,SHUFFLE) shuffle inputs as well, and not just the OR().

This is a minor step toward being able to shuffle combine from/to SELECT/BLENDV as a faux shuffle.
The file was modified llvm/test/CodeGen/X86/shuffle-vs-trunc-256.ll
The file was modified llvm/test/CodeGen/X86/vector-shuffle-128-v8.ll
The file was modified llvm/test/CodeGen/X86/insertelement-ones.ll
The file was modified llvm/test/CodeGen/X86/vector-shuffle-256-v32.ll
The file was modified llvm/lib/Target/X86/X86ISelLowering.cpp
Commit f208c659fb76b1ad8ae83dd10c4f0c30986d48ee by craig.topper
[X86] Make ENDBR instruction a scheduling boundary

Instructions should not be scheduled across ENDBR instructions, as this would result in the ENDBR being displaced, breaking the parity needed for the Indirect Branch Tracking feature of CET.

Currently, the X86IndirectBranchTracking pass runs later in the pipeline than instruction scheduling, which makes the bug unnoticeable and very hard (if not infeasible) to trigger while compiling C files with the standard LLVM setup. Yet, for correctness and to prevent issues in future changes, the compiler should prevent such scheduling.

Differential Revision: https://reviews.llvm.org/D84862
The file was modified llvm/lib/Target/X86/X86InstrInfo.h
The file was modified llvm/lib/Target/X86/X86InstrInfo.cpp
Commit 39494d9c21bab3281e4af30578af10f37ea09470 by ro
[compiler-rt][profile] Fix various InstrProf tests on Solaris

Currently, several InstrProf tests `FAIL` on Solaris (both sparc and x86):

  Profile-i386 :: Posix/instrprof-visibility.cpp
  Profile-i386 :: instrprof-merging.cpp
  Profile-i386 :: instrprof-set-file-object-merging.c
  Profile-i386 :: instrprof-set-file-object.c

On sparc there's also

  Profile-sparc :: coverage_comments.cpp

The failure mode is always the same:

  error: /var/llvm/local-amd64/projects/compiler-rt/test/profile/Profile-i386/Posix/Output/instrprof-visibility.cpp.tmp: Failed to load coverage: Malformed coverage data

The error is from `llvm/lib/ProfileData/Coverage/CoverageMappingReader.cpp`
(`loadBinaryFormat`), l.926:

  InstrProfSymtab ProfileNames;
  std::vector<SectionRef> NamesSectionRefs = *NamesSection;
  if (NamesSectionRefs.size() != 1)
    return make_error<CoverageMapError>(coveragemap_error::malformed);

where .size() is 2 instead.

Looking at the executable, I find (with `elfdump -c -N __llvm_prf_names`):

  Section Header[15]:  sh_name: __llvm_prf_names
      sh_addr:      0x8053ca5       sh_flags:   [ SHF_ALLOC ]
      sh_size:      0x86            sh_type:    [ SHT_PROGBITS ]
      sh_offset:    0x3ca5          sh_entsize: 0
      sh_link:      0               sh_info:    0
      sh_addralign: 0x1

  Section Header[31]:  sh_name: __llvm_prf_names
      sh_addr:      0x8069998       sh_flags:   [ SHF_WRITE SHF_ALLOC ]
      sh_size:      0               sh_type:    [ SHT_PROGBITS ]
      sh_offset:    0x9998          sh_entsize: 0
      sh_link:      0               sh_info:    0
      sh_addralign: 0x1

Unlike GNU `ld` (which primarily operates on section names) the Solaris
linker, following the ELF spirit, only merges input sections into an output
section if both section name and section flags match, so two separate
sections are maintained.
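
The difference between the two merging policies can be modeled with a small C++ sketch. This is a deliberately simplified model, not how either linker is implemented: `InputSection`, `countMergedByName`, and `countMergedByNameAndFlags` are hypothetical names, and only the `SHF_WRITE` and `SHF_ALLOC` values are taken from the ELF specification.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

constexpr std::uint32_t kShfWrite = 0x1;  // SHF_WRITE
constexpr std::uint32_t kShfAlloc = 0x2;  // SHF_ALLOC

struct InputSection {
  std::string name;
  std::uint32_t flags;
  std::uint64_t size;
};

// Simplified "merge on name only" policy: one output section per name.
std::size_t countMergedByName(const std::vector<InputSection>& inputs) {
  std::map<std::string, std::uint64_t> outputs;
  for (const InputSection& s : inputs)
    outputs[s.name] += s.size;
  return outputs.size();
}

// Simplified "merge only if name and flags both match" policy.
std::size_t countMergedByNameAndFlags(const std::vector<InputSection>& inputs) {
  std::map<std::pair<std::string, std::uint32_t>, std::uint64_t> outputs;
  for (const InputSection& s : inputs)
    outputs[{s.name, s.flags}] += s.size;
  return outputs.size();
}
```

Fed a read-only and a read-write `__llvm_prf_names` input section, the first policy yields one output section and the second yields two, matching the `.size()` of 2 reported above. Once both inputs carry the same flags, both policies yield one.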

The read-write one comes from `lib/clang/12.0.0/lib/sunos/libclang_rt.profile-i386.a(InstrProfilingPlatformLinux.c.o)`
while the read-only one is generated by
`llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp` (`InstrProfiling::emitNameData`)
at l.1004 where `isConstant = true`.

The easiest way to avoid the mismatch is to change the definition in
`compiler-rt/lib/profile/InstrProfilingPlatformLinux.c` to `const`.

This fixes all failures observed.

Tested on `amd64-pc-solaris2.11`, `sparcv9-sun-solaris2.11`, and
`x86_64-pc-linux-gnu`.

Differential Revision: https://reviews.llvm.org/D85116
The file was modified compiler-rt/lib/profile/InstrProfilingPlatformLinux.c
Commit f78f509c75861dc4e26f9a22ad12996bf8005a2e by yamauchi
[PGO] Extend the value profile buckets for mem op sizes.

Extend the memop value profile buckets to be more flexible (could accommodate a
mix of individual values and ranges) and to cover more value ranges (from 11 to
22 buckets).
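
To make "a mix of individual values and ranges" concrete, here is a minimal C++ sketch of a bucket table that maps a memop size to a representative value. The table contents are purely illustrative and are not the 22 buckets introduced by this commit; `Bucket`, `kBuckets`, and `bucketFor` are hypothetical names.

```cpp
#include <cstdint>
#include <limits>
#include <vector>

// Inclusive range [lo, hi]; lo == hi describes an individual value.
struct Bucket {
  std::uint64_t lo;
  std::uint64_t hi;
};

// Illustrative layout only: small sizes get their own bucket, larger sizes
// share ranges, and the last bucket is open-ended.
const std::vector<Bucket> kBuckets = {
    {0, 0},   {1, 1},   {2, 2},   {3, 3},   {4, 4},
    {5, 5},   {6, 6},   {7, 7},   {8, 8},   {9, 15},
    {16, 16}, {17, 31}, {32, 32},
    {33, std::numeric_limits<std::uint64_t>::max()}};

// Maps a size to the lower bound of its bucket, used as the profiled value.
std::uint64_t bucketFor(std::uint64_t size) {
  for (const Bucket& b : kBuckets)
    if (size >= b.lo && size <= b.hi)
      return b.lo;
  return kBuckets.back().lo;  // not reached: the last bucket is open-ended
}
```

In this layout a size of 12 is recorded as 9 and a size of 1000 as 33, while 16 and 32 keep their own exact buckets.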

The new buckets are disabled behind a flag (to be enabled separately); the
existing code is to be removed later.

Differential Revision: https://reviews.llvm.org/D81682
The file was modified llvm/test/Transforms/PGOProfile/memop_profile_funclet.ll
The file was modified compiler-rt/include/profile/InstrProfData.inc
The file was modified llvm/include/llvm/Transforms/Instrumentation/InstrProfiling.h
The file was modified llvm/lib/ProfileData/InstrProf.cpp
The file was modified llvm/include/llvm/ProfileData/InstrProfData.inc
The file was modified compiler-rt/lib/profile/InstrProfilingValue.c
The file was modified llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp
The file was modified llvm/include/llvm/ProfileData/InstrProf.h
The file was added llvm/unittests/ProfileData/InstrProfDataTest.cpp
The file was modified llvm/unittests/ProfileData/CMakeLists.txt
The file was modified llvm/test/Transforms/PGOProfile/memcpy.ll
The file was modified llvm/lib/Transforms/Instrumentation/PGOMemOPSizeOpt.cpp
Commit c12bd8dac91adac81cd9721fe34daf473ebd5e10 by llvmgnsyncbot
[gn build] Port f78f509c758
The file was modified llvm/utils/gn/secondary/llvm/unittests/ProfileData/BUILD.gn