SuccessChanges

Summary

  1. [clang] NRVO: Improvements and handling of more cases.
  2. [SimplifyCFG] avoid 'tmp' variables in test file; NFC
  3. [LV] Parallel annotated loop does not imply all loads can be hoisted.
  4. 2d Arm Neon sdot op, and lowering to the intrinsic.
  5. [MLIR] Document that Dialect Conversion traverses in preorder
  6. [AArch64][GlobalISel] Legalize scalar G_CTTZ + G_CTTZ_ZERO_UNDEF
  7. [libcxx][ranges] removes default_initializable from weakly_incrementable and view
  8. Preserve more MD_mem_parallel_loop_access and MD_access_group in SROA
  9. [clang] Implement P2266 Simpler implicit move
  10. [Profile] Handle invalid profile data
  11. [IR] make -warn-frame-size into a module attr
  12. [Profile] Remove redundant check
  13. LoadStoreVectorizer: support different operand orders in the add sequence match
Commit 667fbcdd0b2ee5e78f5ce9789b862e3bbca94644 by mizvekov
[clang] NRVO: Improvements and handling of more cases.

This expands NRVO propagation for more cases:

Parse analysis improvement:
* Lambdas and Blocks with dependent return type can have their variables
  marked as NRVO Candidates.

Variable instantiation improvements:
* Fixes crash when instantiating NRVO variables in Blocks.
* Functions, Lambdas, and Blocks which have auto return type have their
  variables' NRVO status propagated. For Blocks with non-auto return type,
  as a limitation, this propagation does not consider the actual return
  type.

This also implements exclusion of VarDecls which are references to
dependent types.
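
As an illustration of the kind of code this affects, here is a minimal sketch of a lambda with a deduced return type that returns a named local. The `Tracker` type and the function names are hypothetical and not taken from the patch or its tests. Whether the copy is actually elided is up to the compiler; the sketch only relies on the language rule that the returned local is at worst moved, never copied.

```cpp
#include <cassert>

// Hypothetical helper type that counts how often it is copied or moved.
struct Tracker {
    static int copies;
    static int moves;
    int value = 0;
    Tracker() = default;
    Tracker(const Tracker& other) : value(other.value) { ++copies; }
    Tracker(Tracker&& other) noexcept : value(other.value) { ++moves; }
};
int Tracker::copies = 0;
int Tracker::moves = 0;

// The lambda has a deduced return type and returns a named local variable.
// `result` is the sort of variable that can be considered an NRVO candidate.
inline Tracker makeViaLambda(int v) {
    auto make = [v]() {
        Tracker result;
        result.value = v;
        return result;  // elided if NRVO applies, otherwise implicitly moved
    };
    return make();
}

// Usage: no copy constructor call is expected on any path.
inline int nrvoDemo() {
    Tracker::copies = 0;
    Tracker::moves = 0;
    Tracker t = makeViaLambda(7);
    assert(Tracker::copies == 0);
    return t.value;
}
```

The number of moves is deliberately not checked, since it depends on the language standard and on whether the compiler performs the elision.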

Signed-off-by: Matheus Izvekov <mizvekov@gmail.com>

Reviewed By: Quuxplusone

Differential Revision: https://reviews.llvm.org/D99696
The file was modified clang/lib/Sema/Sema.cpp
The file was modified clang/lib/Sema/SemaStmt.cpp
The file was modified clang/test/CodeGen/nrvo-tracking.cpp
The file was modified clang/include/clang/Sema/Sema.h
The file was modified clang/lib/Sema/SemaCoroutine.cpp
The file was modified clang/lib/Sema/SemaExprCXX.cpp
The file was modified clang/lib/Sema/SemaTemplateInstantiateDecl.cpp
Commit 7b969ef8b4eb93d7a2be093b27280f12b8cd9ccb by spatel
[SimplifyCFG] avoid 'tmp' variables in test file; NFC
The file was modified llvm/test/Transforms/SimplifyCFG/two-entry-phi-return.ll
Commit 4f01122c3f6c70beee8f736f196a09976602685f by joachim
[LV] Parallel annotated loop does not imply all loads can be hoisted.

As noted in https://bugs.llvm.org/show_bug.cgi?id=46666, the current behavior of assuming if-conversion safety when a loop is annotated parallel (`!llvm.loop.parallel_accesses`) is not what one would expect. The documentation for this behavior has since been removed from the LangRef again, and the behavior can lead to invalid reads.
This was observed in POCL (https://github.com/pocl/pocl/issues/757) and would require similar workarounds in current work at hipSYCL.

The question remains why this was initially added and what the implications of removing this optimization would be.
Do we need an alternative mechanism to propagate the information about legality of if-conversion?
Or is the idea that conditional loads in `#pragma clang loop vectorize(assume_safety)` can be executed unmasked without additional checks flawed in general?
I think this implication is not part of what a user of that pragma (and corresponding metadata) would expect and thus dangerous.
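
To make the hazard concrete, here is a minimal sketch of a loop with a conditional load. The function name and data layout are hypothetical, not taken from the patch, POCL, or hipSYCL. The load `data[index[i]]` is only valid when `valid[i]` is set, so executing it unmasked for every lane, as if-conversion would, could read out of bounds.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sums data[index[i]] for the iterations where valid[i] is nonzero.
// index[i] may be out of range when valid[i] == 0, so the load must
// stay guarded by the condition.
inline long guardedSum(const std::vector<int>& data,
                       const std::vector<std::size_t>& index,
                       const std::vector<char>& valid) {
    assert(index.size() == valid.size());
    long sum = 0;
    // With Clang, a user might annotate this loop with
    //   #pragma clang loop vectorize(assume_safety)
    // to assert that there are no loop-carried memory dependences.
    // That assertion says nothing about index[i] being in range
    // when valid[i] is zero.
    for (std::size_t i = 0; i < index.size(); ++i) {
        if (valid[i]) {
            sum += data[index[i]];  // only safe under the guard
        }
    }
    return sum;
}
```

In the scalar loop the out-of-range index is never dereferenced; a vectorized version has to keep the load masked (or prove it safe by other means) to preserve that property.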

Only two additional tests failed, which are adapted in this patch. Depending on the further direction force-ifcvt.ll should be removed or further adapted.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D103907
The file was removed llvm/test/Transforms/LoopVectorize/X86/force-ifcvt.ll
The file was modified llvm/test/Transforms/LoopVectorize/X86/tail_folding_and_assume_safety.ll
The file was modified llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
The file was modified llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h
Commit 20daedacca803b81db6d8773b705345702bf0fc3 by ataei
2d Arm Neon sdot op, and lowering to the intrinsic.

This adds Sdot2d op, which is similar to the usual Neon
intrinsic except that it takes 2d vector operands, reflecting the
structure of the arithmetic that it's performing: 4 separate
4-dimensional dot products, whence the vector<4x4xi8> shape.
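
The arithmetic described above can be sketched as a scalar reference in C++. This is only an illustration of "4 separate 4-dimensional dot products" accumulated into a 4 x i32 vector; the type aliases and function names are hypothetical, and the operand order is an assumption rather than the op's documented signature.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Vec4x4i8 = std::array<std::array<std::int8_t, 4>, 4>;  // models vector<4x4xi8>
using Vec4i32 = std::array<std::int32_t, 4>;                 // models vector<4xi32>

// For each of the 4 rows, compute the dot product of the row of `a` with
// the matching row of `b`, and add it to the corresponding accumulator lane.
inline Vec4i32 sdot2dReference(Vec4i32 acc, const Vec4x4i8& a, const Vec4x4i8& b) {
    for (std::size_t row = 0; row < 4; ++row) {
        std::int32_t dot = 0;
        for (std::size_t k = 0; k < 4; ++k) {
            dot += static_cast<std::int32_t>(a[row][k]) *
                   static_cast<std::int32_t>(b[row][k]);
        }
        acc[row] += dot;
    }
    return acc;
}

// Usage: the first row contributes 1*1 + 2*1 + 3*1 + 4*1 = 10 to lane 0.
inline void sdot2dDemo() {
    const Vec4x4i8 a = {{{{1, 2, 3, 4}}, {{0, 0, 0, 0}}, {{0, 0, 0, 0}}, {{0, 0, 0, 0}}}};
    const Vec4x4i8 b = {{{{1, 1, 1, 1}}, {{0, 0, 0, 0}}, {{0, 0, 0, 0}}, {{0, 0, 0, 0}}}};
    const Vec4i32 r = sdot2dReference(Vec4i32{{1, 0, 0, 0}}, a, b);
    assert(r[0] == 11);
}
```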

This also adds a new pass, arm-neon-2d-to-intr, lowering
this new 2d op to the 1d intrinsic.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D102504
The file was added mlir/include/mlir/Conversion/ArmNeon2dToIntr/ArmNeon2dToIntr.h
The file was modified mlir/lib/Conversion/PassDetail.h
The file was added mlir/lib/Conversion/ArmNeon2dToIntr/ArmNeon2dToIntr.cpp
The file was added mlir/test/Dialect/ArmNeon/invalid.mlir
The file was modified mlir/include/mlir/Dialect/ArmNeon/ArmNeon.td
The file was modified mlir/lib/Conversion/CMakeLists.txt
The file was modified mlir/include/mlir/Conversion/Passes.td
The file was added mlir/lib/Conversion/ArmNeon2dToIntr/CMakeLists.txt
The file was modified mlir/include/mlir/Conversion/Passes.h
The file was added mlir/test/Target/LLVMIR/arm-neon-2d.mlir
Commit 4f6ec382c8b7204f3b1f48060025f970925f5804 by gcmn
[MLIR] Document that Dialect Conversion traverses in preorder

Reviewed By: rriddle

Differential Revision: https://reviews.llvm.org/D102525
The file was modified mlir/docs/DialectConversion.md
Commit 933df6ca796c0ace889bcc64706ec53462bd859a by Jessica Paquette
[AArch64][GlobalISel] Legalize scalar G_CTTZ + G_CTTZ_ZERO_UNDEF

This adds legalization for scalar G_CTTZ and G_CTTZ_ZERO_UNDEF. Vector support
requires handling vector G_BITREVERSE, which I haven't gotten around to yet.

For G_CTTZ_ZERO_UNDEF, we just lower it to G_CTTZ.

For G_CTTZ, we match SelectionDAG's lowering to a G_BITREVERSE + G_CTLZ.

e.g. https://godbolt.org/z/nPEseYh1s
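
The identity behind this lowering, cttz(x) = ctlz(bitreverse(x)), can be checked with a small portable C++ sketch. The function names below are hypothetical stand-ins for the generic opcodes, fixed to 32 bits for simplicity; this is not the legalizer code itself.

```cpp
#include <cassert>
#include <cstdint>

// Reverses the bit order of a 32-bit value (stand-in for G_BITREVERSE).
inline std::uint32_t bitReverse32(std::uint32_t x) {
    x = ((x >> 1) & 0x55555555u) | ((x & 0x55555555u) << 1);
    x = ((x >> 2) & 0x33333333u) | ((x & 0x33333333u) << 2);
    x = ((x >> 4) & 0x0F0F0F0Fu) | ((x & 0x0F0F0F0Fu) << 4);
    x = ((x >> 8) & 0x00FF00FFu) | ((x & 0x00FF00FFu) << 8);
    return (x >> 16) | (x << 16);
}

// Counts leading zero bits (stand-in for G_CTLZ); yields 32 for x == 0.
inline unsigned countLeadingZeros32(std::uint32_t x) {
    unsigned n = 0;
    for (std::uint32_t mask = 0x80000000u; mask != 0 && (x & mask) == 0; mask >>= 1) {
        ++n;
    }
    return n;
}

// Trailing zeros of x are the leading zeros of the bit-reversed x.
inline unsigned countTrailingZeros32(std::uint32_t x) {
    return countLeadingZeros32(bitReverse32(x));
}

// Usage: 8 is 0b1000, so it has three trailing zero bits.
inline void cttzDemo() {
    assert(countTrailingZeros32(8u) == 3);
}
```

Because the leading-zero count is defined as 32 for a zero input here, the trailing-zero count is too, which matches the defined-at-zero behavior expected of G_CTTZ as opposed to G_CTTZ_ZERO_UNDEF.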

(With this patch, we have slightly worse codegen than SDAG for types smaller
than s32; it seems like we're missing a combine.)

Also, this adds in a function to build G_BITREVERSE to MachineIRBuilder.

Differential Revision: https://reviews.llvm.org/D104065
The file was modified llvm/test/CodeGen/AArch64/GlobalISel/legalizer-info-validation.mir
The file was added llvm/test/CodeGen/AArch64/GlobalISel/legalize-cttz-zero-undef.mir
The file was modified llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
The file was added llvm/test/CodeGen/AArch64/GlobalISel/legalize-cttz.mir
The file was modified llvm/include/llvm/CodeGen/GlobalISel/MachineIRBuilder.h
The file was modified llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.h
Commit 462f8f06113616ac5646144972d3f453639aac69 by cjdb
[libcxx][ranges] removes default_initializable from weakly_incrementable and view

also:

* removes default constructors from predefined iterators
* makes span and string_view views

Partially implements P2325.
Partially resolves LWG3326.

Differential Revision: https://reviews.llvm.org/D102468
The file was removed libcxx/test/std/iterators/predef.iterators/insert.iterators/insert.iter.ops/insert.iter.cons/default.pass.cpp
The file was modified libcxx/test/std/ranges/range.req/range.view/view.compile.pass.cpp
The file was modified libcxx/test/std/ranges/range.req/range.view/view.subsumption.compile.pass.cpp
The file was modified libcxx/include/span
The file was modified libcxx/include/string_view
The file was removed libcxx/test/std/iterators/predef.iterators/insert.iterators/back.insert.iter.ops/back.insert.iter.cons/default.pass.cpp
The file was removed libcxx/test/std/iterators/stream.iterators/ostream.iterator/ostream.iterator.cons.des/default.pass.cpp
The file was removed libcxx/test/std/iterators/predef.iterators/insert.iterators/front.insert.iter.ops/front.insert.iter.cons/default.pass.cpp
The file was removed libcxx/test/std/iterators/stream.iterators/ostreambuf.iterator/ostreambuf.iter.cons/default.pass.cpp
The file was modified libcxx/docs/Cxx2aStatusPaperStatus.csv
The file was modified libcxx/include/__ranges/concepts.h
The file was modified libcxx/include/iterator
The file was modified libcxx/test/std/iterators/iterator.requirements/iterator.concepts/iterator.concept.winc/weakly_incrementable.compile.pass.cpp
The file was modified libcxx/include/__iterator/concepts.h
The file was modified libcxx/test/std/containers/views/range_concept_conformance.compile.pass.cpp
The file was removed libcxx/test/std/iterators/iterator.requirements/iterator.concepts/iterator.concept.winc/subsumption.compile.pass.cpp
The file was modified libcxx/test/std/strings/string.view/range_concept_conformance.compile.pass.cpp
The file was modified libcxx/docs/Cxx2aStatusIssuesStatus.csv
The file was modified libcxx/include/__ranges/enable_view.h
Commit 41555eaf65b12db00c8a18e7fe530f72ab9ebfc0 by andrew.kaylor
Preserve more MD_mem_parallel_loop_access and MD_access_group in SROA

SROA sometimes preserves MD_mem_parallel_loop_access and MD_access_group metadata on loads/stores, and sometimes fails to do so. This change adds copying of the MD after other CreateAlignedLoad/CreateAlignedStores. Also fix a case where the metadata was being copied from a load, rather than the store.

Added a LIT test to catch one case.

Patch by Mark Mendell

Differential Revision: https://reviews.llvm.org/D103254
The file was added llvm/test/Transforms/SROA/mem-par-metadata-sroa-cast.ll
The file was modified llvm/lib/Transforms/Scalar/SROA.cpp
Commit cbd0054b9eb17ec48f0702e3828209646c8f5ebd by mizvekov
[clang] Implement P2266 Simpler implicit move

This implements P2266 Simpler implicit move (http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2021/p2266r1.html).

Signed-off-by: Matheus Izvekov <mizvekov@gmail.com>

Reviewed By: Quuxplusone

Differential Revision: https://reviews.llvm.org/D99005
The file was modified clang/lib/Sema/SemaType.cpp
The file was modified clang/lib/Sema/SemaCoroutine.cpp
The file was modified clang/test/SemaCXX/constant-expression-cxx14.cpp
The file was modified clang/test/CXX/class/class.init/class.copy.elision/p3.cpp
The file was modified clang/test/SemaCXX/warn-return-std-move.cpp
The file was modified clang/test/CXX/temp/temp.decls/temp.mem/p5.cpp
The file was modified clang/test/SemaCXX/return-stack-addr.cpp
The file was modified clang/test/CXX/drs/dr3xx.cpp
The file was modified clang/lib/Sema/SemaStmt.cpp
The file was modified clang/test/SemaCXX/deduced-return-type-cxx14.cpp
The file was modified clang/test/CXX/dcl.dcl/dcl.spec/dcl.type/dcl.spec.auto/p7-cxx14.cpp
The file was modified clang/lib/Sema/SemaExprCXX.cpp
The file was modified clang/test/SemaCXX/coroutines.cpp
The file was modified clang/include/clang/Sema/Sema.h
The file was modified clang/test/CXX/expr/expr.prim/expr.prim.lambda/p4-cxx14.cpp
The file was modified clang/test/SemaCXX/constant-expression-cxx11.cpp
The file was modified clang/test/SemaCXX/coroutine-rvo.cpp
Commit 189428c8fc2465c25efbf4f0bb73e26fecf150ce by aeubanks
[Profile] Handle invalid profile data

This mostly follows LLVM's InstrProfReader.cpp error handling.
Previously, attempting to merge corrupted profile data would result in
crashes. See https://crbug.com/1216811#c4.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D104050
The file was modified compiler-rt/lib/profile/InstrProfilingFile.c
The file was added compiler-rt/test/profile/Linux/corrupted-profile.c
The file was modified compiler-rt/test/profile/instrprof-merge.c
The file was modified compiler-rt/test/profile/instrprof-without-libc.c
The file was modified compiler-rt/test/profile/Linux/instrprof-merge-vp.c
The file was modified compiler-rt/lib/profile/InstrProfilingMerge.c
The file was modified compiler-rt/lib/profile/InstrProfiling.h
Commit fc018ebb608ee0c1239b405460e49f1835ab6175 by ndesaulniers
[IR] make -warn-frame-size into a module attr

-Wframe-larger-than= is an interesting warning: we can't know the frame
size until PrologueEpilogueInsertion (PEI), very late in the compilation
pipeline.

-Wframe-larger-than= was propagated through CC1 as an -mllvm flag, then
was a cl::opt in LLVM's PEI pass; this meant it was dropped during LTO
and needed to be re-specified via -plugin-opt.

Instead, make it part of the IR proper as a module level attribute,
similar to D103048. Introduce -fwarn-stack-size CC1 option.

Reviewed By: rsmith, qcolombet

Differential Revision: https://reviews.llvm.org/D103928
The file was modified clang/include/clang/Driver/Options.td
The file was modified llvm/lib/CodeGen/PrologEpilogInserter.cpp
The file was modified llvm/test/CodeGen/ARM/warn-stack.ll
The file was added llvm/test/Linker/warn-stack-frame.ll
The file was modified clang/test/Frontend/backend-diagnostic.c
The file was modified clang/include/clang/Basic/CodeGenOptions.def
The file was modified llvm/lib/IR/Module.cpp
The file was modified llvm/test/CodeGen/X86/warn-stack.ll
The file was added clang/test/Driver/Wframe-larger-than.c
The file was modified llvm/include/llvm/IR/Module.h
The file was modified clang/lib/Driver/ToolChains/Clang.cpp
The file was modified clang/lib/CodeGen/CodeGenModule.cpp
The file was modified clang/test/Misc/backend-stack-frame-diagnostics-fallback.cpp
Commit b73742bc8d2ec53f0892f1609837c088f9cfcf64 by aeubanks
[Profile] Remove redundant check

This is already checked outside the loop.

Followup to D104050.
The file was modified compiler-rt/lib/profile/InstrProfilingMerge.c
Commit 119965865cc730060e4cc95690ee7dab91c2c440 by vkeles
LoadStoreVectorizer: support different operand orders in the add sequence match

First we refactor the code that matches no-wrap add sequences:
we need to allow different operand orders for
the key add instructions involved in the match.

Then we use the refactored code to try 4 variants of matching operands.

Originally the code relied on the fact that the matching operands
of the two last add instructions of memory index calculations
had the same LHS argument. But which operand is the same
in the two instructions is actually not essential, so now we allow
that to be any of LHS or RHS of each of the two instructions.
This increases the chances of vectorization to happen.
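
The "4 variants" idea can be sketched in isolation as follows. This is a simplified illustration under stated assumptions, not the pass's code: operands are modeled as plain integer ids, the `AddInst` and `OperandMatch` types and the function name are hypothetical, and the no-wrap and offset checks that the real match also performs are omitted.

```cpp
#include <cassert>

// Hypothetical model of an add instruction: two operand ids.
struct AddInst {
    int lhs;
    int rhs;
};

// Result of the match: whether a shared operand was found, and the
// remaining (non-shared) operand of each instruction.
struct OperandMatch {
    bool found;
    int otherA;
    int otherB;
};

// Tries all 4 pairings (LHS/LHS, LHS/RHS, RHS/LHS, RHS/RHS) to find an
// operand common to both adds, instead of requiring it to be the LHS of both.
inline OperandMatch matchSharedOperand(const AddInst& a, const AddInst& b) {
    const int opsA[2] = {a.lhs, a.rhs};
    const int opsB[2] = {b.lhs, b.rhs};
    for (int i = 0; i < 2; ++i) {
        for (int j = 0; j < 2; ++j) {
            if (opsA[i] == opsB[j]) {
                return {true, opsA[1 - i], opsB[1 - j]};
            }
        }
    }
    return {false, 0, 0};
}

// Usage: operand 5 is the LHS of the first add and the RHS of the second.
inline void operandMatchDemo() {
    const OperandMatch m = matchSharedOperand(AddInst{5, 1}, AddInst{2, 5});
    assert(m.found && m.otherA == 1 && m.otherB == 2);
}
```

A match that only compared the two LHS operands would reject the pair in the usage example, which is the kind of missed opportunity the commit message describes.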

Reviewed By: volkan

Differential Revision: https://reviews.llvm.org/D103912
The file was modified llvm/test/Transforms/LoadStoreVectorizer/X86/vectorize-i8-nested-add.ll
The file was modified llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp