Commit
2c53215e99cbd7b24768d9f0915aff9fedfcaa4a
by david.green: [ARM] Skip debug info in recomputeVPTBlockMask
The ARMLowOverheadLoops pass recalculates VPT block masks when it converts VCMPs inside VPT blocks into VPTs. The function that does so doesn't handle debug info though, leading to invalid block creation or asserts at compile time. Make sure the function skips any debug info between the MVE instructions it inspects.
Differential Revision: https://reviews.llvm.org/D110564
|
 | llvm/test/CodeGen/Thumb2/LowOverheadLoops/vpt-block-debug.mir |
 | llvm/lib/Target/ARM/Thumb2InstrInfo.cpp |
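The fix follows a common MIR-walking pattern: step over interleaved debug instructions so they never participate in block-mask recomputation. A minimal standalone sketch of that pattern, with illustrative stand-in names (`Inst`, `IsDebug`) rather than the actual LLVM `MachineInstr` API:

```cpp
#include <vector>

// Illustrative stand-in for a machine instruction; the real pass queries
// MachineInstr::isDebugInstr() while walking the VPT block.
struct Inst {
  bool IsDebug; // e.g. a DBG_VALUE interleaved between MVE instructions
  int Opcode;
};

// Return the next non-debug instruction at or after `It`, or `End`.
// Skipping here prevents debug info from being counted as part of the block.
std::vector<Inst>::iterator
skipDebugInstructions(std::vector<Inst>::iterator It,
                      std::vector<Inst>::iterator End) {
  while (It != End && It->IsDebug)
    ++It;
  return It;
}
```

With this helper, the recomputation loop inspects only real MVE instructions, so the presence or absence of debug info no longer changes the resulting mask.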
Commit
1a1aed8da8c19fd58a7208be9ab5bf185474392b
by spatel: [InstCombine] add tests for icmp-gep; NFC
We need more coverage for commuted and (un)signed preds to verify that things behave as expected here. Currently, we do not transform signed preds or non-inbounds geps.
|
 | llvm/test/Transforms/InstCombine/icmp-gep.ll |
Commit
1f8bead67820ed3e1ad590ae4f49433c3771f86c
by spatel: [InstCombine] reduce code for swapped predicate; NFC
|
 | llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp |
Commit
bd379915de38a9af3d65e19075a6a64ebbb8d6db
by sguelton: Refine the constraint for isInlineBuiltinDeclaration
Require it to be always_inline, to more closely match how _FORTIFY_SOURCE behaves.
This avoids generation of `.inline` suffixed functions - these should always be inlined.
|
 | clang/lib/AST/Decl.cpp |
 | clang/test/CodeGen/memcpy-nobuiltin.inc |
Commit
0ea77502e22115dca6bcf82f895072ec67f3d565
by sjoerd.meijer: [LoopFlatten] Updating Phi nodes after IV widening
In rG6a076fa9539e, a problem with updating the old/narrow phi nodes after IV widening was introduced. If the transformation is *not* applied after widening the IV, the narrow phi node was incorrectly modified; that should only happen when flattening actually takes place. This can be seen in the added test widen-iv2.ll, which incorrectly had 1 incoming value but should have its original 2 incoming values; that is now restored.
Differential Revision: https://reviews.llvm.org/D110234
|
 | llvm/test/Transforms/LoopFlatten/widen-iv2.ll |
 | llvm/lib/Transforms/Scalar/LoopFlatten.cpp |
Commit
f701505c45c708338202532f0f14af55b5509794
by a.bataev: [SLP] Improve vectorization of phi nodes by trying wider vectors.
Try to improve vectorization of PHI nodes by first vectorizing similar instructions at the width of the widest possible vectors, then aggregating PHIs of compatible type and trying to vectorize again; only if this fails, try smaller vector factors for compatible PHI nodes. This restores the performance of several benchmarks after the tuning of fp/int conversion instruction costs.
Differential Revision: https://reviews.llvm.org/D108740
|
 | llvm/include/llvm/Transforms/Vectorize/SLPVectorizer.h |
 | llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp |
 | llvm/test/Transforms/SLPVectorizer/X86/vectorize-widest-phis.ll |
Commit
73a196a11c0e6fe7bbf33055cc2c96ce3c61ff0d
by jingu.kang: Recommit "[AArch64] Split bitmask immediate of bitwise AND operation"
This reverts the revert commit f85d8a5bed95cc17a452b6b63b9866fbf181d94d with bug fixes.
Original message:
MOVi32imm + ANDWrr ==> ANDWri + ANDWri
MOVi64imm + ANDXrr ==> ANDXri + ANDXri
The mov pseudo instruction could be expanded to multiple mov instructions later. In this case, try to split the constant operand of the mov instruction into two bitmask immediates. This produces only two AND instructions instead of multiple mov + and instructions.
Added a peephole optimization pass on MIR level to implement it.
Differential Revision: https://reviews.llvm.org/D109963
|
 | llvm/lib/Target/AArch64/CMakeLists.txt |
 | llvm/test/CodeGen/AArch64/O3-pipeline.ll |
 | llvm/test/CodeGen/AArch64/aarch64-split-and-bitmask-immediate.ll |
 | llvm/lib/Target/AArch64/AArch64.h |
 | llvm/lib/Target/AArch64/AArch64TargetMachine.cpp |
 | llvm/lib/Target/AArch64/AArch64MIPeepholeOpt.cpp |
 | llvm/lib/Target/AArch64/MCTargetDesc/AArch64AddressingModes.h |
 | llvm/test/CodeGen/AArch64/unfold-masked-merge-scalar-constmask-innerouter.ll |
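The split rests on a simple identity: an AND with constant C can be replaced by two successive ANDs with masks M1 and M2 whenever M1 & M2 == C. A minimal sketch of just that identity (the real pass additionally requires M1 and M2 to be valid AArch64 logical immediates so each AND encodes as ANDWri/ANDXri; that encoding check is omitted here):

```cpp
#include <cstdint>

// Splitting `and x, C` into `and (and x, M1), M2` is sound whenever the
// two masks intersect to exactly C.
bool isValidSplit(uint64_t C, uint64_t M1, uint64_t M2) {
  return (M1 & M2) == C;
}

// The rewritten two-instruction form.
uint64_t andSplit(uint64_t X, uint64_t M1, uint64_t M2) {
  return (X & M1) & M2;
}
```

The peephole's job is then to find M1 and M2 that both satisfy this identity and are each encodable as a logical immediate, which removes the need to materialize C with a (possibly multi-instruction) mov.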
Commit
1cd3ae019892c81a8a81daf4e2baffa3be6270db
by erich.keane: Fix missing return from 9324cc2ca951fe5fe11c85470cb08e699c59499c
No idea how my local machine missed this; I saw no warning for it. It seems to have been lost at some level of translating this back for upstreaming.
|
 | clang/lib/AST/Expr.cpp |
Commit
fdd8c10959544c14ddcf874fd9c2841a8bea1d21
by david.green: [ARM] Delay reverting WLS in arm-block-placement
As we have to split blocks, we may be left in an invalid loop state after a WLS is reverted to a DLS. Instead remember the WLS that could not be fixed and revert them after finishing processing all other loops.
Differential Revision: https://reviews.llvm.org/D110567
|
 | llvm/test/CodeGen/Thumb2/LowOverheadLoops/wls-revert-placement.mir |
 | llvm/lib/Target/ARM/ARMBlockPlacement.cpp |
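The commit's approach, remembering failed candidates and reverting them only after all loops are processed, is a general defer-mutation pattern: mutating a structure while still iterating over it can leave it in an invalid state. A standalone sketch under that assumption (the `Loop` type and `Revert` callback are illustrative, not the ARMBlockPlacement API):

```cpp
#include <functional>
#include <vector>

// Illustrative loop record: NeedsRevert marks a WLS that could not be fixed.
struct Loop {
  int Id;
  bool NeedsRevert;
};

// Phase 1 collects the loops needing a revert; phase 2 applies the reverts
// once iteration is finished, so block splitting cannot invalidate loops
// that are still waiting to be processed.
int processLoops(std::vector<Loop> &Loops,
                 const std::function<void(Loop &)> &Revert) {
  std::vector<Loop *> Deferred;
  for (Loop &L : Loops)
    if (L.NeedsRevert)
      Deferred.push_back(&L); // remember, don't mutate mid-iteration
  for (Loop *L : Deferred)
    Revert(*L); // safe: all other loops already processed
  return static_cast<int>(Deferred.size());
}
```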
Commit
86cd2369b6cd7eb17374fb31bccac7895fe34658
by mgorny: [lldb] [DynamicRegisterInfo] Refactor SetRegisterInfo()
Move the "slice" and "composite" handling into separate methods to avoid if/else hell. Use more LLVM types whenever possible. Replace printf()s with llvm::Error combined with LLDB logging.
Differential Revision: https://reviews.llvm.org/D110619
|
 | lldb/source/Plugins/Process/Utility/DynamicRegisterInfo.cpp |
 | lldb/source/Plugins/Process/Utility/DynamicRegisterInfo.h |
Commit
f3932ae1a078075e9e35f51ead3aaca05a9a23c7
by dvyukov: tsan: fix cur_thread alignment
Commit 354ded67b3 ("tsan: align ThreadState to cache line") was incomplete. It marked ThreadState as cache-line aligned, but the thread-local ThreadState instance is declared as an aligned char array with hard-coded 64-byte alignment. On PowerPC the cache line size is 128 bytes, so the hard-coded 64-byte alignment is not enough. Use cache line alignment consistently.
Differential Revision: https://reviews.llvm.org/D110629
|
 | compiler-rt/lib/tsan/rtl/tsan_rtl.cpp |
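The underlying pitfall is stating the alignment twice: once on the type, once as a literal on the backing storage. A minimal sketch of the consistent approach, where both sites derive from one named constant (`ThreadStateLike` and `kCacheLineSize` are illustrative names, not the tsan runtime's):

```cpp
#include <cstddef>

// One source of truth for the alignment. Hard-coding alignas(64) on the
// backing array is wrong on platforms where this is larger (128 on PowerPC).
constexpr std::size_t kCacheLineSize = 64;

struct alignas(kCacheLineSize) ThreadStateLike {
  char Data[256];
};

// The storage takes its alignment from the type, not from a literal,
// so the two can never disagree.
alignas(ThreadStateLike) static char Storage[sizeof(ThreadStateLike)];

static_assert(alignof(ThreadStateLike) == kCacheLineSize,
              "type and storage alignment must agree");
```

If the constant is raised to 128 for a platform, both the type and its storage pick up the change automatically, which is the "consistently" in the commit message.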
Commit
993ada05f5a05615ec16da4a69bd368529a7e5d1
by emaste: [lldb] [unittests] Fix building the FreeBSD arm64 Register Context test
Differential Revision: https://reviews.llvm.org/D110545
|
 | lldb/unittests/Process/Utility/RegisterContextFreeBSDTest.cpp |
Commit
b38c04ab7f8f847eac2b4b6f965137a85c967d21
by clementval: [fir][NFC] Rename operand of EmboxOp
Rename `lenParams` to `typeparams` to be in sync with fir-dev.
This patch is part of the upstreaming effort from fir-dev branch.
Reviewed By: kiranchandramohan
Differential Revision: https://reviews.llvm.org/D110628
Co-authored-by: Jean Perier <jperier@nvidia.com>
Co-authored-by: Eric Schweitz <eschweitz@nvidia.com>
|
 | flang/lib/Optimizer/Dialect/FIROps.cpp |
 | flang/lib/Optimizer/CodeGen/PreCGRewrite.cpp |
 | flang/include/flang/Optimizer/Dialect/FIROps.td |
Commit
ade5023c54cffcbefe0557b5473d55b06e40809b
by dvyukov: tsan: fix tls_race3 test on darwin
Darwin also needs to use __tsan_tls_initialization to pass the test.
Differential Revision: https://reviews.llvm.org/D110631
|
 | compiler-rt/lib/tsan/rtl/tsan_platform_mac.cpp |
Commit
ccc83ac7c501c8e117753af0729414350aa9c117
by dvyukov: tsan: print a meaningful frame for stack races
Depends on D110631.
Differential Revision: https://reviews.llvm.org/D110632
|
 | compiler-rt/lib/tsan/rtl/tsan_rtl_thread.cpp |
Commit
c93da7d9cf161ffda2366a96eb060c3b824cb549
by lebedev.ri: Revert "[CMake] Enable LLVM_ENABLE_PER_TARGET_RUNTIME_DIR by default on Linux"
See original review https://reviews.llvm.org/D107799
This reverts commit f9dbca68d48e705f6d45df8f58d6b2ee88bce76c.
|
 | llvm/CMakeLists.txt |
Commit
9e4f1f926552ac061c79e420933ff9104ceefb87
by kazu: [SystemZ] Remove redundant declaration SystemZMnemonicSpellCheck (NFC)
Note that SystemZMnemonicSpellCheck is defined in SystemZGenAsmMatcher.inc, which SystemZAsmParser.cpp includes.
Identified with readability-redundant-declaration.
|
 | llvm/lib/Target/SystemZ/AsmParser/SystemZAsmParser.cpp |
Commit
70391b3468b8a4a07b49df88d7fa88c9644cda77
by quinn.pham: [PowerPC] FP compare and test XL compat builtins.
This patch is in a series of patches to provide builtins for compatibility with the XL compiler. This patch adds builtins for compare-exponent and test-data-class operations on floating point values.
Reviewed By: #powerpc, lei
Differential Revision: https://reviews.llvm.org/D109437
|
 | clang/test/CodeGen/builtins-ppc-xlcompat-test.c |
 | llvm/test/CodeGen/PowerPC/builtins-ppc-xlcompat-test.ll |
 | clang/lib/CodeGen/CGBuiltin.cpp |
 | llvm/include/llvm/IR/IntrinsicsPowerPC.td |
 | llvm/lib/Target/PowerPC/PPCISelLowering.cpp |
 | clang/lib/Sema/SemaChecking.cpp |
 | clang/test/CodeGen/builtins-ppc-xlcompat-pwr9-error.c |
 | clang/include/clang/Basic/BuiltinsPPC.def |
 | clang/lib/Basic/Targets/PPC.cpp |
 | clang/include/clang/Basic/DiagnosticSemaKinds.td |
Commit
091c16f76ba1e6341afd717445323f8396b7772f
by wlei: [llvm-profgen] On-demand symbolization
Previously we did symbolization for all the functions, but we only need the symbols that are hit by the samples.
This can significantly reduce processing time for large binaries.
Optimization for per-inliner will come along with next patch.
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D110465
|
 | llvm/tools/llvm-profgen/ProfiledBinary.cpp |
 | llvm/tools/llvm-profgen/PerfReader.h |
 | llvm/tools/llvm-profgen/ProfiledBinary.h |
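The idea of symbolizing lazily, resolving only addresses that actually appear in samples and caching the result, can be sketched as a memoized lookup over a start-address-keyed symbol table. All names here (`LazySymbolizer`, `symbolize`, `Resolutions`) are hypothetical, not the llvm-profgen API:

```cpp
#include <cstdint>
#include <iterator>
#include <map>
#include <string>
#include <utility>

// Resolve each sampled address at most once; functions never hit by a
// sample are never symbolized at all.
class LazySymbolizer {
public:
  explicit LazySymbolizer(std::map<uint64_t, std::string> SymTab)
      : SymTab(std::move(SymTab)) {}

  const std::string &symbolize(uint64_t Addr) {
    auto It = Cache.find(Addr);
    if (It != Cache.end())
      return It->second; // already resolved: no repeated work
    ++Resolutions;
    // Find the symbol whose start address covers Addr.
    auto S = SymTab.upper_bound(Addr);
    static const std::string Unknown = "<unknown>";
    const std::string &Name =
        (S == SymTab.begin()) ? Unknown : std::prev(S)->second;
    return Cache.emplace(Addr, Name).first->second;
  }

  unsigned Resolutions = 0; // how many real lookups were performed

private:
  std::map<uint64_t, std::string> SymTab; // function start address -> name
  std::map<uint64_t, std::string> Cache;  // sampled address -> name
};
```

With a large binary and a sparse sample set, the cost becomes proportional to the number of distinct sampled addresses rather than the number of functions in the binary.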
Commit
ce40843a3fe120621bb6e4aa07dc9cbf76b6aa0e
by wlei: [llvm-profgen][CSSPGO] On-demand function size computation for preinliner
Similar to https://reviews.llvm.org/D110465, we can compute function size on demand for the functions that are hit by samples.
Here we leverage the raw range samples' addresses to compute the set of sample-hit functions. Then `BinarySizeContextTracker` works only on those function ranges for the size.
Reviewed By: hoy
Differential Revision: https://reviews.llvm.org/D110466
|
 | llvm/tools/llvm-profgen/ProfileGenerator.h |
 | llvm/tools/llvm-profgen/ProfiledBinary.h |
 | llvm/tools/llvm-profgen/ProfileGenerator.cpp |
 | llvm/tools/llvm-profgen/ProfiledBinary.cpp |
Commit
aa93c55889ec6284e0337d6baf01c71dadd76043
by lebedev.ri: [X86][Costmodel] Load/store i16 Stride=6 VF=2 interleaving costs
The only sched models for CPUs that support AVX2 but not AVX512 are: Haswell, Broadwell, Skylake, and Zen 1-3.
For load we have: https://godbolt.org/z/bhscej4WM - for Intels, `Block RThroughput: =13.0`; for Ryzens, `Block RThroughput: <=7.0`. So pick a cost of `13`.
For store we have: https://godbolt.org/z/Yf4Pfnxbq - for Intels, `Block RThroughput: =10.0`; for Ryzens, `Block RThroughput: <=3.5`. So pick a cost of `10`.
I'm directly using the shuffling asm that llc produced, without any manual fixups that may be needed to ensure sequential execution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D110590
|
 | llvm/test/Analysis/CostModel/X86/interleaved-load-i16-stride-6.ll |
 | llvm/test/Analysis/CostModel/X86/interleaved-store-i16-stride-6.ll |
 | llvm/lib/Target/X86/X86TargetTransformInfo.cpp |
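The cost-selection rule used throughout this series is to take the worst (highest) measured block reciprocal throughput across the covered CPU families and round it up to an integer cost. A sketch of that rule (`pickInterleavingCost` is an illustrative name; the inputs are the measured values from the commit message, not something the cost model computes itself):

```cpp
#include <algorithm>
#include <cmath>
#include <initializer_list>

// Pick the interleaving cost as the worst-case block reciprocal throughput
// over all sched models that can reach this code path (here: CPUs with AVX2
// but not AVX512), rounded up to an integer cost.
unsigned pickInterleavingCost(std::initializer_list<double> RThroughputs) {
  double Worst = 0.0;
  for (double R : RThroughputs)
    Worst = std::max(Worst, R);
  return static_cast<unsigned>(std::ceil(Worst));
}
```

For the VF=2 load case above this yields max(13.0, 7.0) rounded up = 13, matching the chosen cost; the store case gives max(10.0, 3.5) = 10.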
Commit
b3011bcc78926686dd95bd5dbb4c2c66d8be24a2
by lebedev.ri: [X86][Costmodel] Load/store i16 Stride=6 VF=4 interleaving costs
The only sched models for CPUs that support AVX2 but not AVX512 are: Haswell, Broadwell, Skylake, and Zen 1-3.
For load we have: https://godbolt.org/z/1Wcaf9c7T - for Intels, `Block RThroughput: =9.0`; for Ryzens, `Block RThroughput: <=4.5`. So pick a cost of `9`.
For store we have: https://godbolt.org/z/1Wcaf9c7T - for Intels, `Block RThroughput: =15.0`; for Ryzens, `Block RThroughput: <=6.0`. So pick a cost of `15`.
I'm directly using the shuffling asm that llc produced, without any manual fixups that may be needed to ensure sequential execution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D110591
|
 | llvm/test/Analysis/CostModel/X86/interleaved-load-i16-stride-6.ll |
 | llvm/test/Analysis/CostModel/X86/interleaved-store-i16-stride-6.ll |
 | llvm/lib/Target/X86/X86TargetTransformInfo.cpp |
Commit
24e42f7d28e98152484cf9bbf8ee4080f5082da0
by lebedev.ri: [X86][Costmodel] Load/store i16 Stride=6 VF=8 interleaving costs
The only sched models for CPUs that support AVX2 but not AVX512 are: Haswell, Broadwell, Skylake, and Zen 1-3.
For load we have: https://godbolt.org/z/3Tc5s897j - for Intels, `Block RThroughput: =39.0`; for Ryzens, `Block RThroughput: <=13.5`. So pick a cost of `39`.
For store we have: https://godbolt.org/z/fo1h9E67e - for Intels, `Block RThroughput: =21.0`; for Ryzens, `Block RThroughput: <=12.0`. So pick a cost of `21`.
I'm directly using the shuffling asm that llc produced, without any manual fixups that may be needed to ensure sequential execution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D110592
|
 | llvm/test/Analysis/CostModel/X86/interleaved-load-i16-stride-6.ll |
 | llvm/lib/Target/X86/X86TargetTransformInfo.cpp |
 | llvm/test/Analysis/CostModel/X86/interleaved-store-i16-stride-6.ll |
Commit
b6b7860954c677003a5a0b0d4071e88aa44e9081
by lebedev.ri: [X86][Costmodel] Load/store i16 Stride=6 VF=16 interleaving costs
The only sched models for CPUs that support AVX2 but not AVX512 are: Haswell, Broadwell, Skylake, and Zen 1-3.
For this tuple, measurement becomes problematic since there is a lot of spilling going on, but apparently all these memory ops do not affect the worst-case estimate at all here.
For load we have: https://godbolt.org/z/5qGb9odP6 - for Intels, `Block RThroughput: <=106.0`; for Ryzens, `Block RThroughput: <=34.8`. So pick a cost of `106`.
For store we have: https://godbolt.org/z/KrWcv4Ph7 - for Intels, `Block RThroughput: =58.0`; for Ryzens, `Block RThroughput: <=20.5`. So pick a cost of `58`.
I'm directly using the shuffling asm that llc produced, without any manual fixups that may be needed to ensure sequential execution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D110593
|
 | llvm/test/Analysis/CostModel/X86/interleaved-load-i16-stride-6.ll |
 | llvm/lib/Target/X86/X86TargetTransformInfo.cpp |
 | llvm/test/Analysis/CostModel/X86/interleaved-store-i16-stride-6.ll |
Commit
a7d084a18de736e0ed750c67c5851fa6b2a085f5
by arthur.j.odwyer: [libc++] [compare] Rip out more vestiges of *_equality. NFCI.
There's really no reason to even have two different enums here, but *definitely* we shouldn't have *three*, and they don't need so many synonymous enumerator values.
Differential Revision: https://reviews.llvm.org/D110516
|
 | libcxx/include/__compare/ordering.h |