Commit
97b6c92dcd56937bc27de7c4c08381fc71c402e7
by github[include-cleaner] Add missing deps from unittests
|
 | clang-tools-extra/include-cleaner/unittests/CMakeLists.txt |
Commit
3562f855b71e159908806d3152e81ef439d041ca
by llvm-dev[X86] SimplifyDemandedVectorEltsForTargetNode - fold (uniform) shift(0,x) -> 0
|
 | llvm/lib/Target/X86/X86ISelLowering.cpp |
 | llvm/test/CodeGen/X86/hoist-and-by-const-from-lshr-in-eqcmp-zero.ll |
Commit
5fa169335f7d086ef71547c6033a9357a617c1b8
by jay.foad[AMDGPU] Simplify the test case for D124450
|
 | llvm/test/CodeGen/AMDGPU/setcc-multiple-use.ll |
Commit
4cacd22418ceb31ad9277469a2a10d714c651deb
by npopov[InstCombine] Add test for is_alpha check with logical or and nsw (NFC)
The combination of logical or and nsw prevents the fold from happening.
|
 | llvm/test/Transforms/InstCombine/and-or-icmps.ll |
Commit
cacaa445c3a3a2551a6e2aef51414e47def9cc06
by david.spickett Reland "[lldb] Use shutil.which in Shell tests find_executable"
This reverts commit d9247cc84825539d346c74eb1379c6cb948d3a71.
With the Windows tests updated to expect .EXE suffixes. This changed because shutil.which uses PATHEXT which will contain, amongst others, "EXE".
Also I noticed the "." in ".exe" was the wildcard dot not literal dot so I've escaped those.
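The wildcard-versus-literal dot issue mentioned above can be illustrated with a small sketch. The test files themselves use FileCheck patterns rather than Python; the pattern strings and the `matches` helper here are hypothetical and only show the general regex behaviour.

```python
import re

# Unescaped: "." matches any character, so an unrelated name slips through.
loose = re.compile(r"clang.exe", re.IGNORECASE)
# Escaped: only a literal dot is accepted.
strict = re.compile(r"clang\.exe", re.IGNORECASE)


def matches(pattern, text):
    """Return True if the compiled pattern occurs anywhere in text."""
    return pattern.search(text) is not None
```

With these definitions, `matches(loose, "clang-exe")` is true while `matches(strict, "clang-exe")` is false, and both accept `clang.EXE` because the match is case-insensitive.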
|
 | lldb/test/Shell/helper/build.py |
 | lldb/test/Shell/BuildScript/toolchain-clang.test |
 | lldb/test/Shell/BuildScript/toolchain-clang-cl.test |
 | lldb/test/Shell/BuildScript/toolchain-msvc.test |
Commit
7a0b897e8664d11481230a69a88fca2b2ee5f904
by paul.walker[DAGCombiner][SVE] Ensure MGATHER/MSCATTER addressing mode combines preserve index scaling
refineUniformBase and selectGatherScatterAddrMode both attempt the transformation:
base(0) + index(A+splat(B)) => base(B) + index(A)
However, this is only safe when index is not implicitly scaled.
Differential Revision: https://reviews.llvm.org/D123222
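A minimal arithmetic sketch of why the transformation is only safe for unscaled indices, assuming each lane's address is computed as `base + index * scale`. The helper names and the sample values are illustrative and are not taken from the patch.

```python
def addresses(base, index, scale):
    """Per-lane addresses of a gather/scatter: base + index[i] * scale."""
    return [base + i * scale for i in index]


def hoist_splat(index, splat):
    """Rewrite base(0) + index(A + splat(B)) as base(B) + index(A)."""
    return splat, [i - splat for i in index]


A = [0, 1, 2]
B = 4
index = [a + B for a in A]
new_base, new_index = hoist_splat(index, B)

# With scale == 1 the rewrite preserves every address.
unscaled_ok = addresses(0, index, 1) == addresses(new_base, new_index, 1)
# With scale == 8 the hoisted B is no longer multiplied by the scale.
scaled_ok = addresses(0, index, 8) == addresses(new_base, new_index, 8)
```

Here `unscaled_ok` is true and `scaled_ok` is false: once B moves into the base it escapes the implicit multiplication by the scale.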
|
 | llvm/test/CodeGen/AArch64/sve-gather-scatter-addr-opts.ll |
 | llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp |
 | llvm/lib/Target/AArch64/AArch64ISelLowering.cpp |
Commit
61d3a3afe26fd49353d9fd37d0e7817a13c28659
by geek4civic AVRExpandPseudoInsts.cpp: Fix a warning. [-Wunused-but-set-variable]
It has been enabled since llvmorg-15-init-5683-g2af845a6519c, aka D122271.
|
 | llvm/lib/Target/AVR/AVRExpandPseudoInsts.cpp |
Commit
2e6657b340f0f33414a97c79b3d1e37ad947ec7a
by geek4civic llvm/Support/Debug.h: Suppress warnings with -Asserts. [-Wunused-variable]
Re. setCurrentDebugTypes(X,N), the only user is llvm-ml.cpp (exc. DebugTests) since llvmorg-15-init-8355-g82ecf9a0b1b3.
FIXME: X and N are evaluated regardless of NDEBUG. Could we avoid evaluating (but w/o warnings) with NDEBUG?
|
 | llvm/include/llvm/Support/Debug.h |
Commit
f8463da4a329b839cfd01d7f80ae72e18f3c061e
by david.spickett[lldb] Allow EXE or exe in toolchain-msvc.test
I suspect that one of link or cl is found by shutil.which and one isn't, hence the case difference. It doesn't really matter for what the test is looking for.
|
 | lldb/test/Shell/BuildScript/toolchain-msvc.test |
Commit
e66127e69bfabe1b18857c3e3962125a9fe5aa7c
by flo[VPlan] Simplify & adjust code as suggested in D123005.
Improve code as suggested in D123005. Applied separately, because the comments were made on a diff that has not been rebased to current main.
|
 | llvm/lib/Transforms/Vectorize/VPlan.cpp |
Commit
3c2a74a3ae02d16e899e280953c055f92aa6cdaa
by springerm[mlir][linalg][transform] Add TileOp to transform dialect
This commit adds a tiling op to the transform dialect as an external op.
Differential Revision: https://reviews.llvm.org/D124661
|
 | mlir/lib/Dialect/Linalg/TransformOps/LinalgTransformOps.cpp |
 | mlir/include/mlir/Dialect/Linalg/TransformOps/LinalgTransformOps.td |
 | mlir/lib/Dialect/Linalg/Transforms/FusionOnTensors.cpp |
 | mlir/test/Dialect/Linalg/transform-ops.mlir |
 | utils/bazel/llvm-project-overlay/mlir/BUILD.bazel |
 | mlir/include/mlir/Dialect/Linalg/TransformOps/CMakeLists.txt |
 | mlir/lib/Dialect/Linalg/CMakeLists.txt |
 | mlir/lib/Dialect/Linalg/Transforms/Fusion.cpp |
 | mlir/include/mlir/Dialect/Linalg/CMakeLists.txt |
 | mlir/include/mlir/Dialect/Linalg/TransformOps/LinalgTransformOps.h |
 | mlir/include/mlir/InitAllDialects.h |
 | mlir/lib/Dialect/Linalg/TransformOps/CMakeLists.txt |
Commit
982cbed81920b474d41b63760cbde68680b15966
by npopov[InstCombine] Fold logical and/or of range icmps with nowrap flags
This is an edge-case where we don't convert to bitwise and/or based on implies poison reasoning, so explicitly try to perform the fold in logical form. The transform itself is poison-safe, as both icmps are based on the same value and any nowrap flags are discarded as part of the fold (https://alive2.llvm.org/ce/z/aCwC8b for the used example).
|
 | llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp |
 | llvm/test/Transforms/InstCombine/and-or-icmps.ll |
 | llvm/lib/Transforms/InstCombine/InstCombineInternal.h |
 | llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp |
Commit
24a133e16fc50a39e530c7795edabbdf3b2edd2d
by flo[LV] Rename CountRoundDown to VectorTripCount (NFC)
The name CountRoundDown is potentially misleading, as the number of iterations can be rounded up when folding the tail.
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D119681
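The rounding behaviour behind the rename can be sketched as follows. This is an assumption about the arithmetic, not a copy of the LoopVectorize code: `step` stands for the number of scalar iterations covered by one vector iteration, and the function name is hypothetical.

```python
def vector_trip_count(trip_count, step, fold_tail):
    """Scalar iterations covered by the vector loop.

    Without tail folding the count is rounded down to a multiple of step;
    with tail folding it is rounded up, since the masked last vector
    iteration covers the remainder.
    """
    if fold_tail:
        trip_count += step - 1
    return trip_count - trip_count % step
```

For a trip count of 10 and a step of 4 this gives 8 without tail folding and 12 with it, which is why "round down" no longer describes the value in all cases.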
|
 | llvm/lib/Transforms/Vectorize/LoopVectorize.cpp |
Commit
1881711fbb7b0cd1b8d492b3ca4b70ce75824030
by npopov[InstCombine] Remove memset of undef value
This removes memset with undef char. We already do this for stores of undef value.
This comes with the caveat that this optimization is not, strictly speaking, legal for undef values, because we might be overwriting a poison value. However, our entire load/store model currently still operates on undef values, so we need to support undef here as well for internal consistency.
Once https://github.com/llvm/llvm-project/issues/52930 is resolved, these and related folds can be limited to poison -- I've added FIXMEs to that effect.
Differential Revision: https://reviews.llvm.org/D124173
|
 | llvm/test/Transforms/InstCombine/store.ll |
 | llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp |
 | llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp |
 | llvm/test/Transforms/InstCombine/memset.ll |
Commit
2c8cb9acb51e2fa74bf9339ddd0884ef9d921dfc
by jperier[flang] Handle common block with different sizes in same file
Semantics does not prevent a named common block from appearing with different sizes in the same file (a named common block should always have the same storage size (see Fortran 2018 8.10.2.5), but it is a common extension to accept different sizes).
Lowering was not coping with this well, since it just used the first common block appearance, starting with BLOCK DATAs, to define common blocks (this was also an issue with the blank common block, which can legally appear with different sizes in different scoping units).
Semantics also does not prevent a named common block from being initialized outside of a BLOCK DATA, and lowering was dealing badly with this, since it only gave an initial value to a common block's global if the first common block appearance, starting with BLOCK DATAs, had an initial value.
Semantics also allows blank common to be initialized, while lowering was assuming this would never happen, and was never creating an initial value for it.
Lastly, semantics was not complaining if a COMMON block was initialized in several scoping units in the same file, while lowering can only generate one of these initial values.
To fix this, add a structure to keep track of COMMON block properties (biggest size, and initial value if any) at the Program level. Once the size of a common block appearance is known, the common block appearance is checked against this information. It allows semantics to emit an error in case of multiple initializations in different scopes of the same common block, and to warn in case a named common block appears with different sizes. Lastly, this allows lowering to use the Program level info about common blocks to emit the right GlobalOp for a common block, regardless of the order of the COMMON block appearances: it emits a GlobalOp with the biggest size, whose lowest bytes are initialized with the initial value if any is given in a scope where the common block appears.
Lowering is updated to emit the common blocks before anything else so that the related GlobalOps are available when lowering the scopes where common blocks appear. It is also updated to not assume that blank commons are never initialized.
Differential Revision: https://reviews.llvm.org/D124622
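The program-level bookkeeping described above might look roughly like the sketch below. It is a simplified model, not the flang data structure: `CommonBlockInfo`, `record_appearance`, the use of an empty name for blank common, and the diagnostic wording are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CommonBlockInfo:
    size: int                               # biggest size seen so far
    initial_value: Optional[bytes] = None   # initializer, if any scope gave one


def record_appearance(table, name, size, initial_value=None):
    """Check one COMMON block appearance against the program-level table.

    Returns the diagnostics for this appearance. An empty name stands for
    blank common, which may legally appear with different sizes.
    """
    diagnostics = []
    info = table.get(name)
    if info is None:
        table[name] = CommonBlockInfo(size, initial_value)
        return diagnostics
    if name and size != info.size:
        diagnostics.append(f"warning: common block /{name}/ appears with different sizes")
    info.size = max(info.size, size)
    if initial_value is not None:
        if info.initial_value is not None:
            diagnostics.append(f"error: common block /{name}/ initialized in several scopes")
        else:
            info.initial_value = initial_value
    return diagnostics
```

After all appearances are recorded, each table entry holds the biggest size and the single initial value, which is the information a later stage would need to emit one global per common block regardless of appearance order.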
|
 | flang/lib/Lower/PFTBuilder.cpp |
 | flang/test/Semantics/common-blocks-warn.f90 |
 | flang/test/Semantics/resolve42.f90 |
 | flang/test/Lower/module_use.f90 |
 | flang/include/flang/Lower/ConvertVariable.h |
 | flang/lib/Lower/Bridge.cpp |
 | flang/lib/Lower/ConvertVariable.cpp |
 | flang/test/Lower/common-block-2.f90 |
 | flang/docs/Extensions.md |
 | flang/include/flang/Semantics/semantics.h |
 | flang/include/flang/Lower/PFTBuilder.h |
 | flang/test/Semantics/common-blocks.f90 |
 | flang/test/Lower/common-block.f90 |
 | flang/lib/Semantics/compute-offsets.cpp |
 | flang/lib/Semantics/semantics.cpp |
 | flang/test/Lower/module_definition.f90 |
 | flang/test/Lower/pointer-initial-target-2.f90 |
Commit
027c728f29889ea6502030ec3623774d830c2ac3
by npopov[SelectionDAGBuilder] Don't create MGATHER/MSCATTER with Scale != ElemSize
This is an alternative to D124530. In getUniformBase() only create scales that match the gather/scatter element size. If targets also support other scales, then they can produce those scales in target DAG combines. This is what X86 already does (as long as the resulting scale would be 1, 2, 4 or 8).
This essentially restores the pre-opaque-pointer state of things.
Fixes https://github.com/llvm/llvm-project/issues/55021.
Differential Revision: https://reviews.llvm.org/D124605
|
 | llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp |
 | llvm/test/CodeGen/X86/gather-scatter-opaque-ptr-2.ll |
 | llvm/test/CodeGen/X86/gather-scatter-opaque-ptr.ll |
Commit
5b524da42f6817920eb1f1dd30f8a2dc3241d614
by npopov[InstCombine] Add test for unused atomic load from non-constant global (NFC)
|
 | llvm/test/Transforms/InstCombine/atomic.ll |
Commit
643c9b22ef527be8532d7b75ccf64180fa060339
by jhuber6[OpenMP] Make generating offloading entries more generic
This patch moves the logic for generating the offloading entries to the OpenMPIRBuilder. This makes it easier to re-use in other places, such as for OpenMP support in Flang or using the same method for generating offloading entries for other languages like CUDA.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D123460
|
 | llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp |
 | llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h |
 | clang/lib/CodeGen/CGOpenMPRuntime.h |
 | clang/lib/CodeGen/CGOpenMPRuntime.cpp |
 | llvm/include/llvm/Frontend/OpenMP/OMPKinds.def |
Commit
ca6bbe008512c9dc6a1ac242466a9d42288daff8
by jhuber6[OpenMP] Make clang argument handling for the new driver more generic
In preparation for accepting other offloading kinds with the new driver, this patch makes the way we handle offloading actions more generic. A new field to get the associated device action's toolchain is used rather than manually iterating a list. This makes building the arguments easier and makes sure that we don't rely on any implicit ordering.
Reviewed By: yaxunl
Differential Revision: https://reviews.llvm.org/D123313
|
 | clang/lib/Driver/Action.cpp |
 | clang/include/clang/Driver/Action.h |
 | clang/lib/Driver/ToolChains/Clang.cpp |
Commit
4e2b5a6693e299fcb8671d4dbb69c993d181b29f
by jhuber6[Clang] Make enabling the new driver more generic
In preparation for allowing other offloading kinds to use the new driver a new opt-in flag `-foffload-new-driver` is added. This is distinct from the existing `-fopenmp-new-driver` because OpenMP will soon use the new driver by default while the others should not.
Reviewed By: yaxunl, tra
Differential Revision: https://reviews.llvm.org/D123325
|
 | clang/lib/Driver/Driver.cpp |
 | clang/include/clang/Driver/Options.td |
 | clang/lib/Driver/ToolChains/Clang.cpp |
Commit
c5e5b54350fecd4b44c60eb4e982c13de5307aee
by jhuber6[CUDA] Add driver support for compiling CUDA with the new driver
This patch adds the basic support for the clang driver to compile and link CUDA using the new offloading driver. This requires handling the CUDA offloading kind and embedding the generated files into the host. This will allow us to link OpenMP code with CUDA code in the linker wrapper. More support will be required to create functional CUDA / HIP binaries using this method.
Depends on D120270 D120271 D120934
Reviewed By: tra
Differential Revision: https://reviews.llvm.org/D120272
|
 | clang/test/Driver/cuda-openmp-driver.cu |
 | clang/lib/Driver/ToolChains/Clang.cpp |
 | clang/lib/Driver/Driver.cpp |
 | clang/test/Driver/cuda-phases.cu |
 | clang/include/clang/Basic/Cuda.h |
 | clang/include/clang/Basic/DiagnosticDriverKinds.td |
Commit
59588f0a3d47e3e366d675b8f9724c10a6222c0e
by paul.walker[SVE][ISel] Ensure explicit gather/scatter offset extension isn't lost.
getGatherScatterIndexIsExtended currently looks through all SIGN_EXTEND_INREG operations regardless of their input type. This patch restricts the code to only look through i32->i64 extensions, which are the ones supported implicitly by SVE addressing modes.
Differential Revision: https://reviews.llvm.org/D123318
|
 | llvm/lib/Target/AArch64/AArch64ISelLowering.cpp |
 | llvm/test/CodeGen/AArch64/sve-gather-scatter-addr-opts.ll |
Commit
23c509754d4b81e5503f8da5caa3d4c00af85afb
by paul.walker[DAGCombiner] Stop invalid sign conversion in refineIndexType.
When looking through extends of gather/scatter indices it's safe to convert a known positive signed index to unsigned, but unsigned indices must remain unsigned.
Depends On D123318
Differential Revision: https://reviews.llvm.org/D123326
|
 | llvm/test/CodeGen/AArch64/sve-gather-scatter-addr-opts.ll |
 | llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp |
Commit
23c10e8d0f97dc38c9f620541c7f3ffd04bef905
by mboehme[clang] Eliminate TypeProcessingState::trivial.
This flag is redundant -- it's true iff `savedAttrs` is empty.
Querying `savedAttrs.empty()` should not take any more time than querying the `trivial` flag, so this should not have a performance impact either.
I noticed this while working on https://reviews.llvm.org/D111548.
Reviewed By: aaron.ballman
Differential Revision: https://reviews.llvm.org/D123783
|
 | clang/lib/Sema/SemaType.cpp |
Commit
f685bce8080cd817adc67a7c06fb6834fa356139
by stefanp[PowerPC][NFC] Add a function to determine if a call needs to be NOTOC.
Add the isNoTOCCallInstr function to PPCInstrInfo to determine if a call opcode does not need a TOC restore after the call. All call opcodes should be listed in this function. A default unreachable in this function should force future call opcodes to also be added.
This is a follow up patch to D122012
Reviewed By: jsji, shchenz
Differential Revision: https://reviews.llvm.org/D124415
|
 | llvm/lib/Target/PowerPC/PPCInstrInfo.h |
 | llvm/lib/Target/PowerPC/MCTargetDesc/PPCMCCodeEmitter.cpp |
Commit
205246cb64358aa6f03b54d47d73708122d76bbf
by anna[CompileTime] [Passes] Avoid computing unnecessary analyses. NFC
Similar to c515b2f39e77, if there are no loops in the function as seen through LI, we should avoid computing the remaining expensive analyses (such as SCEV, BPI). Reordered the analysis requests and return early if there are no loops.
The logic of avoiding expensive analyses is applied to LoopVectorizer, LoopLoadElimination and LoopUnrollPass, i.e. all function passes which operate on loops.
This is an NFC with compile time improvement.
Differential Revision: https://reviews.llvm.org/D124529
|
 | llvm/test/Transforms/SCCP/preserve-analysis.ll |
 | llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp |
 | llvm/lib/Transforms/Vectorize/LoopVectorize.cpp |
 | llvm/lib/Transforms/Scalar/LoopLoadElimination.cpp |
Commit
9e7c9967c3fd573ef53b145e24e6a1e6ba930c82
by david.candler Additionally set f32 mode with denormal-fp-math
When the denormal-fp-math option is used, this should set the denormal handling mode for all floating point types. However, currently 32-bit float types can ignore this setting as there is a variant of the option, denormal-fp-math-f32, specifically for that type which takes priority when checking the mode based on type and remains at the default of IEEE. From the description, denormal-fp-math would be expected to set the mode for floats unless overridden by the f32 variant, and code in the front end only emits the f32 option if it is different to the general one, so setting just denormal-fp-math should be valid.
This patch changes the denormal-fp-math option to also set the f32 mode. If denormal-fp-math-f32 is also specified, this is then overridden as expected, but if it is absent floats will be set to the mode specified by the former option, rather than remain on the default.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D122589
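The precedence described above can be summarised in a short sketch. This models only the resolution order stated in the message (the general option sets the f32 mode unless the f32 variant is given); the function name, the dictionary layout and the use of `None` for "option absent" are assumptions for illustration.

```python
DEFAULT_MODE = "ieee"


def resolve_denormal_modes(general=None, f32=None):
    """Resolve the effective denormal modes from the two options.

    general: value of denormal-fp-math, or None if absent.
    f32:     value of denormal-fp-math-f32, or None if absent.
    """
    general_mode = general if general is not None else DEFAULT_MODE
    # After the patch, f32 follows the general mode unless explicitly overridden.
    f32_mode = f32 if f32 is not None else general_mode
    return {"general": general_mode, "f32": f32_mode}
```

Under this model, passing only `general="preserve-sign"` yields `preserve-sign` for f32 as well, whereas before the patch f32 would have stayed at the IEEE default.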
|
 | clang/lib/Driver/ToolChains/Clang.cpp |
 | clang/lib/Frontend/CompilerInvocation.cpp |
 | clang/test/CodeGen/denormalfpmode-f32.c |
Commit
a80081763cb3792bf69ee95ee73c8754f1bfe074
by flo[SimplifyCFG] Avoid shifting by a too large exponent.
TI->getBitWidth can be > 64 and in those cases the shift will be UB due to the exponent being too large.
To fix this, cap the shift at 63. I think this should work out fine, because TableSize is itself a 64 bit type and the maximum table size must fit in the type. Also, if we would underestimate the size here, at most we get an extra ZExt.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D124608
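A small model of the capped shift, written in Python where oversized shifts are not undefined, so the out-of-range case is rejected explicitly. The helper names are hypothetical, and the exact expression used in SimplifyCFG.cpp may differ from this sketch.

```python
UINT64_MASK = (1 << 64) - 1


def checked_shl64(value, amount):
    """Model a 64-bit left shift; amounts of 64 or more are undefined in C++."""
    if not 0 <= amount < 64:
        raise ValueError("shift amount out of range for a 64-bit type")
    return (value << amount) & UINT64_MASK


def capped_pow2(bit_width):
    """Compute 1 << bit_width with the shift amount capped at 63."""
    return checked_shl64(1, min(bit_width, 63))
```

With the cap in place, a 128-bit switch condition produces `1 << 63` instead of an invalid shift, which at worst underestimates the size as the message notes.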
|
 | llvm/test/Transforms/SimplifyCFG/X86/switch-to-lookup-large-types.ll |
 | llvm/lib/Transforms/Utils/SimplifyCFG.cpp |
Commit
371412e065a63107d5d79330da6757ff693d91cc
by a.bataev[COST]Fix crash for non-power-2 vector shuffle mask.
Need to normalize the mask to avoid possible crashes during attempts to estimate the cost of very long shuffles with a non-power-2 number of elements in their masks.
|
 | llvm/test/Analysis/CostModel/X86/shuffle-non-pow-2.ll |
 | llvm/lib/Target/X86/X86TargetTransformInfo.cpp |
Commit
b424055b52a52c0a2ae8eb08de1460f7cfb4fb43
by llvm-dev[X86] lowerShuffleAsRepeatedMaskAndLanePermute - move the sublane split code into a lambda helper. NFC.
This is a NFC cleanup as part of the work on #55066 - the idea being that we will be able to check for multiple sub lane scales.
|
 | llvm/lib/Target/X86/X86ISelLowering.cpp |
Commit
b3826192fb6e3f7f05ff21911f5f948ad5eabcdc
by npopov[InstCombine] Add additional tests for gep of minus ptrtoint (NFC)
|
 | llvm/test/Transforms/InstCombine/constant-fold-gep.ll |
Commit
47d66255701a5cfeab6c05e3642a2cccf7a4c09f
by jhuber6[OpenMP] Add options to only compile the host or device when offloading
OpenMP recently moved to the new offloading driver, this had the effect of making it more difficult to inspect intermediate code for the device. This patch adds `-foffload-host-only` and `-foffload-device-only` to control which sides get compiled. This will allow users to more easily inspect output without needing the temp files.
Reviewed By: tra
Differential Revision: https://reviews.llvm.org/D124220
|
 | clang/include/clang/Driver/Options.td |
 | clang/test/Driver/openmp-offload-gpu-new.c |
 | clang/lib/Driver/Driver.cpp |
 | clang/test/Driver/cuda-openmp-driver.cu |
Commit
6aeb2a215ee170a0abbc4a99f75065816fcff744
by npopov[InstCombine] Require LoopInfo in test (NFC)
This test case doesn't show what it was intended to without require<loops>.
|
 | llvm/test/Transforms/InstCombine/constant-fold-gep.ll |
Commit
d9c64d33b98be695fc78a65624242033058ed117
by jhuber6[OpenMP] Allow CUDA to be linked with OpenMP using the new driver
After basic support for embedding and handling CUDA files was added to the new driver, we should be able to call CUDA functions from OpenMP code. This patch makes the necessary changes to successfully link in CUDA programs that were compiled using the new driver. With this patch it should be possible to compile device-only CUDA code (no kernels) and call it from OpenMP as follows:
```
$ clang++ cuda.cu -fopenmp-new-driver -offload-arch=sm_70 -c
$ clang++ openmp.cpp cuda.o -fopenmp-new-driver -fopenmp -fopenmp-targets=nvptx64 -Xopenmp-target=nvptx64 -march=sm_70
```
Currently this requires using a host variant to suppress the generation of a CPU-side fallback call.
Depends on D120272
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D120273
|
 | clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp |
 | clang/test/Driver/linker-wrapper.c |
Commit
5c3837312503b4ef8443951194127c4ba2a03153
by craig.topper[RISCV] Improve constant materialization for cases that can use LUI+ADDI instead of LUI+ADDIW.
It's possible that we have a constant that isn't simm32 so we can't use LUI+ADDIW, but we can use LUI+ADDI. Because ADDI uses a sign extended constant, it's possible that after subtracting it out, we end up with a simm32 that maps to LUI.
This patch detects this case after removing Lo12 and before shifting the value for SLLI.
Reviewed By: luismarques
Differential Revision: https://reviews.llvm.org/D124222
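The case described above can be checked with a small model of the split. This is a sketch under stated assumptions, not the RISCVMatInt code: the helper names are hypothetical, LUI is modelled as loading a sign-extended 32-bit value on RV64, and ADDI as a full 64-bit add of a sign-extended 12-bit immediate.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        value -= 1 << bits
    return value


def is_simm32(value):
    return -(1 << 31) <= value < (1 << 31)


def split_lui_addi(val):
    """Return (imm20, lo12) if val can be built as LUI + ADDI, else None."""
    lo12 = sign_extend(val, 12)
    hi = val - lo12  # low 12 bits are zero by construction
    if not is_simm32(hi):
        return None
    return (hi >> 12) & 0xFFFFF, lo12


def emulate_lui_addi(imm20, lo12):
    """Model LUI (result sign-extended from 32 bits) followed by a 64-bit ADDI."""
    lui = sign_extend(imm20 << 12, 32)
    return sign_extend(lui + lo12, 64)
```

For example, `-0x80000001` is not a simm32, but subtracting its sign-extended low 12 bits (`-1`) leaves `-0x80000000`, which LUI can produce, so the pair `(0x80000, -1)` rebuilds the constant.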
|
 | llvm/test/CodeGen/RISCV/rv64zbs.ll |
 | llvm/test/CodeGen/RISCV/imm.ll |
 | llvm/lib/Target/RISCV/MCTargetDesc/RISCVMatInt.cpp |
 | llvm/test/MC/RISCV/rv64zbs-aliases-valid.s |
 | llvm/test/MC/RISCV/rv64i-aliases-valid.s |
Commit
484fcb98883ffc43d2daab6f29e2569399950936
by a.bataev[SLP][NFC]Fix a comment.
|
 | llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp |
Commit
f927be0df8a599942314f761b672cb5faee69a0f
by preames[RISCV] Extract getAllOnesMask helper [nfc]
|
 | llvm/lib/Target/RISCV/RISCVISelLowering.cpp |
Commit
b481512485a87a5510bf28f63cc512ad26c075a8
by paul.walker[SVE] Move reg+reg gather/scatter addressing optimisations from lowering into DAG combine.
This is essentially a refactoring patch but allows more cases to be caught, hence the output changes to some tests.
Differential Revision: https://reviews.llvm.org/D122994
|
 | llvm/lib/Target/AArch64/AArch64ISelLowering.cpp |
 | llvm/test/CodeGen/AArch64/sve-fixed-length-masked-scatter.ll |
 | llvm/test/CodeGen/AArch64/sve-fixed-length-masked-gather.ll |
 | llvm/test/CodeGen/AArch64/sve-gather-scatter-addr-opts.ll |
Commit
813e521e55b11165138b071f446eda94b14570dc
by Joseph.Nash[AMDGPU] Add gfx11 subtarget ELF definition
This is the first patch of a series to upstream support for the new subtarget.
Contributors: Jay Foad <jay.foad@amd.com> Konstantin Zhuravlyov <kzhuravl_dev@outlook.com>
Patch 1/N for upstreaming AMDGPU gfx11 architectures.
Reviewed By: foad, kzhuravl, #amdgpu
Differential Revision: https://reviews.llvm.org/D124536
|
 | llvm/test/Object/AMDGPU/elf-header-flags-mach.yaml |
 | llvm/tools/llvm-readobj/ELFDumper.cpp |
 | llvm/docs/AMDGPUUsage.rst |
 | llvm/include/llvm/Support/TargetParser.h |
 | clang/test/Misc/target-invalid-cpu-note.c |
 | llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp |
 | llvm/test/tools/llvm-readobj/ELF/amdgpu-elf-headers.test |
 | llvm/lib/ObjectYAML/ELFYAML.cpp |
 | llvm/include/llvm/BinaryFormat/ELF.h |
 | llvm/lib/Support/TargetParser.cpp |
 | llvm/lib/Object/ELFObjectFile.cpp |
Commit
3ea191ed03d40489357c5069aedd3383abb3ad58
by preames[RISCV] Factor repeating code into getMaskTypeFor(VT) [nfc]
|
 | llvm/lib/Target/RISCV/RISCVISelLowering.cpp |
Commit
9c8a88382d86c731db3c5c92d8ecd6ef296329ab
by jhuber6[Clang][Docs] Add new offloading flags to the clang documentation
Summary: Some previous patches introduced the `--offload-new-driver` flag, which is a generic way to enable the new driver, and the `--offload-host-only` and `--offload-device-only` flags which allow users to compile for one side, making it easier to inspect intermediate code for offloading compilations. This patch just documents them in the command line reference.
|
 | clang/docs/ClangCommandLineReference.rst |
Commit
7abfaa0a815a37ef6abd3ad7eb169007bdc36619
by dimitry[lldb] Define LLDB_VERSION_PATCH correctly
In commit ccf1469a4cdb lldb got its own generated Version.inc file, with `LLDB_VERSION` macros. However, it used `LLDB_VERSION_PATCHLEVEL` instead of the actually correct `LLDB_VERSION_PATCH`. Correct this.
Reviewed By: JDevlieghere
Differential Revision: https://reviews.llvm.org/D124672
|
 | lldb/include/lldb/Version/Version.inc.in |
 | llvm/utils/gn/secondary/lldb/include/lldb/Version/BUILD.gn |
Commit
ef87865b98fa25af1d2c045bab1268b2a1503374
by aaron Silence -Wstrict-prototype diagnostics in C2x mode
This also disables the diagnostic when the user passes -fno-knr-functions.
|
 | clang/test/Sema/c2x-warn-strict-prototypes.c |
 | clang/lib/Sema/SemaType.cpp |
 | clang/docs/ReleaseNotes.rst |
Commit
dcb77643e3440e948010ed8ecb4c2f8fe4fadb93
by david.penry Reapply [CodeGen][ARM] Enable Swing Modulo Scheduling for ARM
Fixed "private field is not used" warning when compiled with clang.
original commit: 28d09bbbc3d09c912b54a4d5edb32cab7de32a6f
reverted in: fa49021c68ef7a7adcdf7b8a44b9006506523191
------
This patch permits Swing Modulo Scheduling for ARM targets and turns it on by default for the Cortex-M7. The t2Bcc instruction is recognized as a loop-ending branch.
MachinePipeliner is extended by adding support for "unpipelineable" instructions. These instructions are those which contribute to the loop exit test; in the SMS papers they are removed before creating the dependence graph and then inserted into the final schedule of the kernel and prologues. Support for these instructions was not previously necessary because current targets supporting SMS have only supported it for hardware loop branches, which have no loop-exit-contributing instructions in the loop body.
The current structure of the MachinePipeliner makes it difficult to remove/exclude these instructions from the dependence graph. Therefore, this patch leaves them in the graph, but adds a "normalization" method which moves them in the schedule to stage 0, which causes them to appear properly in kernel and prologues.
It was also necessary to be more careful about boundary nodes when iterating across successors in the dependence graph because the loop exit branch is now a non-artificial successor to instructions in the graph. In addition, schedules with physical use/def pairs in the same cycle should be treated as creating an invalid schedule because the scheduling logic doesn't respect physical register dependence once scheduled to the same cycle.
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D122672
|
 | llvm/lib/Target/ARM/ARMSubtarget.h |
 | llvm/test/CodeGen/Thumb2/swp-exitbranchdir.mir |
 | llvm/lib/Target/ARM/ARM.td |
 | llvm/lib/Target/ARM/ARMTargetMachine.cpp |
 | llvm/lib/Target/ARM/ARMSubtarget.cpp |
 | llvm/include/llvm/CodeGen/ModuloSchedule.h |
 | llvm/lib/CodeGen/ModuloSchedule.cpp |
 | llvm/lib/Target/ARM/ARMBaseInstrInfo.cpp |
 | llvm/include/llvm/CodeGen/MachinePipeliner.h |
 | llvm/test/CodeGen/Thumb2/swp-fixedii.mir |
 | llvm/test/CodeGen/ARM/O3-pipeline.ll |
 | llvm/lib/Target/ARM/ARMBaseInstrInfo.h |
 | llvm/lib/CodeGen/MachinePipeliner.cpp |
Commit
ec6d1a0278dd22606f085c3584e8d3a26a4478c1
by Joseph.Nash Fix sphinx build error in AMDGPUUsage.rst
Corrects error from 813e521e55b11165138b071f446eda94b14570dc
|
 | llvm/docs/AMDGPUUsage.rst |
Commit
6f79700830292d86afec5f3cf5143b00e6f3f1fd
by isanbard[randstruct] Automatically randomize a structure of function pointers
Structures of function pointers are a good surface area for attacks. We should therefore randomize them unless explicitly told not to.
Reviewed By: aaron.ballman, MaskRay
Differential Revision: https://reviews.llvm.org/D123544
|
 | clang/lib/Sema/SemaDecl.cpp |
 | clang/unittests/AST/RandstructTest.cpp |
Commit
dca2bc408186667346ab3bbb951adab44feba5bd
by jingham Add a mutex to the ThreadPlanStackMap class.
We've seen very occasional crashes that we can only explain by simultaneous access to the ThreadPlanStackMap, so I'm adding a mutex to protect it.
Differential Revision: https://reviews.llvm.org/D124029
|
 | lldb/include/lldb/Target/ThreadPlanStack.h |
 | lldb/source/Target/ThreadPlanStack.cpp |
Commit
f735b3a2b0ce348a5e3149622325c2d676703ef0
by Vitaly Buka[mlir] Prevent argStorage relocations
This fixes msan reports like https://reviews.llvm.org/P8285
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D124576
|
 | mlir/lib/Dialect/ControlFlow/IR/ControlFlowOps.cpp |
Commit
aa7470a1b31346f3509ba4b831be39fa1d327772
by jingham Add a paragraph showing how to use container commands.
Differential Revision: https://reviews.llvm.org/D124028
|
 | lldb/docs/use/python-reference.rst |
Commit
5a7936401c0a5a8959273108051b094ffba84b8c
by pklausler[flang] Fix build bot problem
A recent change is eliciting a valid warning from the out-of-tree flang build bot; fix by using a reference in a range-based for().
Differential Revision: https://reviews.llvm.org/D124682
|
 | flang/lib/Lower/ConvertVariable.cpp |
Commit
51e02409f0220c796d34f72b3a4b8ba3d3f34cb9
by Stanislav.Mekhanoshin[AMDGPU] Produce waitcounts for LDS DMA
MUBUF and FLAT LDS DMA operations need a wait on vmcnt before the LDS they write can be accessed. A load from LDS to VMEM does not need a wait.
Differential Revision: https://reviews.llvm.org/D124626
|
 | llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp |
 | llvm/test/CodeGen/AMDGPU/lds-dma-waitcnt.mir |
Commit
8bdfc73f633dca9859123b8596bcb521700c6a7f
by Joseph.Nash[AMDGPU][clang] Definition of gfx11 subtarget
Contributors: Jay Foad <jay.foad@amd.com> Konstantin Zhuravlyov <kzhuravl_dev@outlook.com>
Patch 2/N for upstreaming of AMDGPU gfx11 architecture
Depends on D124536
Reviewed By: foad, kzhuravl, #amdgpu, arsenm
Differential Revision: https://reviews.llvm.org/D124537
|
 | clang/lib/Basic/Targets/AMDGPU.cpp |
 | clang/include/clang/Basic/Cuda.h |
 | clang/test/Driver/amdgpu-mcpu.cl |
 | clang/test/Misc/target-invalid-cpu-note.c |
 | clang/lib/Basic/Cuda.cpp |
 | clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp |
 | clang/test/Driver/amdgpu-macros.cl |
 | clang/test/CodeGenOpenCL/amdgpu-features.cl |
 | clang/lib/Basic/Targets/NVPTX.cpp |
Commit
268089b6ac4bca2c87b272a61d7dcc6ad3e752e4
by gclayton Fix the encoding and decoding of UniqueCStringMap<T> objects when saved to cache files.
A UniqueCStringMap<T> object holds a std::vector<UniqueCStringMap::Entry>, where each Entry contains a ConstString + T. The values in the vector are sorted first by ConstString and then by the T value. ConstString objects are simply uniqued "const char *" values, and when we compare we use the actual string pointer as the value we sort by. This caused a problem when we saved the symbol table name indexes and debug info indexes to disk in one process when they were sorted, and then loaded them into another process when decoding them from the cache files. Why? Because the order in which the ConstString objects were created is now completely different, and the string pointers will no longer be sorted in the new process the cache was loaded into.
The unit tests created for the initial patch didn't catch the encoding and decoding issues of UniqueCStringMap<T> because they were happening in the same process, and encoding and decoding would end up creating sorted UniqueCStringMap<T> objects due to the constant string pool being exactly the same.
This patch does the sort after decoding and also reserves the right number of entries in UniqueCStringMap::m_map prior to adding them all, to avoid doing multiple allocations.
Added a unit test that loads an object file from yaml, and then I created a cache file for the original file and removed the cache file's signature mod time check since we will generate an object file from the YAML, and use that as the object file for the Symtab object. Then we load the cache data from the array of symtab cache bytes so that the ConstString "const char *" values will not match the current process, and verify we can lookup the 4 names from the object file in the symbol table.
Differential Revision: https://reviews.llvm.org/D124572
|
 | lldb/source/Plugins/SymbolFile/DWARF/NameToDIE.cpp |
 | lldb/unittests/Symbol/SymtabTest.cpp |
 | lldb/source/Symbol/Symtab.cpp |
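The pointer-ordering problem described in the UniqueCStringMap commit above can be sketched in a few lines. This is an illustration only, not LLDB's code: `StringPool`, `Entry`, `entryLess`, `decode`, and `lookup` are hypothetical stand-ins for ConstString and UniqueCStringMap, and the "cache" is just a vector of (text, value) pairs. It assumes the fix amounts to reserving space and re-sorting after decoding, as the message states.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical string pool: hands out one stable pointer per distinct string.
class StringPool {
public:
  const char *intern(const std::string &s) {
    // unordered_set is node-based, so stored strings never move.
    return pool_.insert(s).first->c_str();
  }

private:
  std::unordered_set<std::string> pool_;
};

struct Entry {
  const char *name; // interned pointer; ordering uses the address, not the text
  uint32_t value;
};

// Order by pointer address first, then by value.
inline bool entryLess(const Entry &a, const Entry &b) {
  if (a.name != b.name)
    return std::less<const char *>()(a.name, b.name);
  return a.value < b.value;
}

// Rebuild entries from saved (text, value) pairs using this process's pool.
std::vector<Entry>
decode(StringPool &pool,
       const std::vector<std::pair<std::string, uint32_t>> &saved) {
  std::vector<Entry> entries;
  entries.reserve(saved.size()); // single allocation up front
  for (const auto &kv : saved)
    entries.push_back({pool.intern(kv.first), kv.second});
  // Pointer values differ between processes, so the saved order says nothing
  // about pointer order here. Re-sort before any binary search.
  std::sort(entries.begin(), entries.end(), entryLess);
  return entries;
}

// Binary search by interned pointer; only valid on entryLess-sorted input.
const Entry *lookup(const std::vector<Entry> &entries, const char *name) {
  auto it = std::lower_bound(
      entries.begin(), entries.end(), name,
      [](const Entry &e, const char *n) {
        return std::less<const char *>()(e.name, n);
      });
  if (it != entries.end() && it->name == name)
    return &*it;
  return nullptr;
}
```

Without the `std::sort` call in `decode`, `lookup` could miss names that are present, because `std::lower_bound` requires the range to be ordered by the same comparison it searches with.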
Commit
2a84a86184392a7e18a958f36db0b2b3da6ae2bd
by martin[lldb] Fix initialization of LazyBool/bool variables m_overwrite/m_overwrite_lazy. NFCI.
This silences a GCC warning after 1f7b58f2a50461493f083b2ed807b25e036286f6 / D122680:
lldb/source/Commands/CommandObjectCommands.cpp:1650:22: warning: enum constant in boolean context [-Wint-in-bool-context]
 1650 | bool m_overwrite = eLazyBoolCalculate;
      |                    ^~~~~~~~~~~~~~~~~~
Differential Revision: https://reviews.llvm.org/D123204
|
 | lldb/source/Commands/CommandObjectCommands.cpp |
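A minimal sketch of the kind of mismatch behind the warning above. The enum values mirror LLDB's LazyBool as far as I know, but the `Options` struct and the chosen defaults are assumptions for illustration, not the patch itself.

```cpp
#include <cassert>

// Three-state flag: "not decided yet", no, yes.
enum LazyBool { eLazyBoolCalculate = -1, eLazyBoolNo = 0, eLazyBoolYes = 1 };

// Hypothetical options holder with one plain bool and one lazy flag.
struct Options {
  // Problematic form: the enum constant converts to bool. Any nonzero value
  // becomes true, and GCC reports -Wint-in-bool-context:
  //   bool m_overwrite = eLazyBoolCalculate;

  // Each member is initialized with a value of its own type instead.
  bool m_overwrite = false;                       // assumed default
  LazyBool m_overwrite_lazy = eLazyBoolCalculate; // "not decided yet"
};

// Resolve the lazy flag, falling back to a caller-supplied default.
inline bool resolveOverwrite(const Options &o, bool fallback) {
  switch (o.m_overwrite_lazy) {
  case eLazyBoolYes:
    return true;
  case eLazyBoolNo:
    return false;
  case eLazyBoolCalculate:
    break;
  }
  return fallback;
}
```

The commented-out line would silently initialize the bool to `true`, since -1 is nonzero.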
Commit
68ee5ad0082c0a72ff16a6571c9db7054afd6ea3
by aperry[flang] Update Google Doc link for Flang Biweekly Sync call notes
Notes from the Flang Biweekly Sync calls have been merged into the same document as the notes from the Flang Technical calls. This patch updates the link in the GettingInvolved document to point to the new location.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D124689
|
 | flang/docs/GettingInvolved.md |
Commit
6e689cbaf412effbef392acbb3d123ad2c3b8eb5
by tejohnson[memprof] Correct comment in test [NFC]
Correct comment referring incorrectly to address sanitizer (from which the memprof tests were originally forked).
|
 | llvm/test/Instrumentation/HeapProfiler/basic.ll |
Commit
d85eb4e2d62e51645922ec17678a319b3c7d872c
by clattner[AsmParser] Introduce a new "Argument" abstraction + supporting logic
MLIR has a common pattern for "arguments" that uses syntax like `%x : i32 {attrs} loc("sourceloc")` which is implemented in ad hoc ways throughout the codebase. The approach this uses is verbose (because it is implemented with parallel arrays) and inconsistent (e.g. lots of things drop source location info).
Solve this by introducing OpAsmParser::Argument and making addRegion (which sets up BlockArguments for the region) take it. Convert the world to propagating this down. This means that we correctly capture and propagate source location information in a lot more cases (e.g. see the affine.for testcase example), and it also simplifies much code.
Differential Revision: https://reviews.llvm.org/D124649
|
 | mlir/include/mlir/IR/FunctionImplementation.h |
 | mlir/lib/Parser/Parser.cpp |
 | mlir/lib/Dialect/Affine/IR/AffineOps.cpp |
 | mlir/lib/Dialect/PDLInterp/IR/PDLInterp.cpp |
 | mlir/lib/Dialect/Linalg/IR/LinalgOps.cpp |
 | mlir/include/mlir/IR/OpImplementation.h |
 | mlir/test/Dialect/GPU/invalid.mlir |
 | mlir/lib/Dialect/Async/IR/Async.cpp |
 | mlir/lib/Dialect/GPU/IR/GPUDialect.cpp |
 | mlir/lib/Dialect/LLVMIR/IR/LLVMDialect.cpp |
 | flang/lib/Optimizer/Dialect/FIROps.cpp |
 | mlir/test/IR/locations.mlir |
 | mlir/lib/IR/FunctionImplementation.cpp |
 | mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp |
 | mlir/lib/Dialect/SPIRV/IR/SPIRVOps.cpp |
 | mlir/lib/Parser/AttributeParser.cpp |
 | mlir/test/lib/Dialect/Test/TestDialect.cpp |
 | mlir/lib/Dialect/MemRef/IR/MemRefOps.cpp |
 | mlir/lib/Dialect/SCF/SCF.cpp |
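The design change in the AsmParser commit above, replacing parallel arrays with one record per argument, can be illustrated without any MLIR dependency. The sketch below uses plain strings; `ParallelArgs`, `Argument`, and `bundle` are hypothetical names and do not reproduce the real OpAsmParser::Argument definition.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Parallel-array style: callers must keep every vector in step by hand,
// and an optional piece such as the location is easy to leave out.
struct ParallelArgs {
  std::vector<std::string> names; // e.g. "%x"
  std::vector<std::string> types; // e.g. "i32"
  std::vector<std::string> locs;  // may be shorter than names
};

// Record style: everything known about one argument travels together.
struct Argument {
  std::string name;
  std::string type;
  std::string loc; // empty when no source location was given
};

// Convert the parallel form to records, filling missing pieces with "".
inline std::vector<Argument> bundle(const ParallelArgs &p) {
  std::vector<Argument> out;
  out.reserve(p.names.size());
  for (std::size_t i = 0; i < p.names.size(); ++i) {
    Argument a;
    a.name = p.names[i];
    a.type = i < p.types.size() ? p.types[i] : std::string();
    a.loc = i < p.locs.size() ? p.locs[i] : std::string();
    out.push_back(a);
  }
  return out;
}
```

With a single record type, a function that accepts `std::vector<Argument>` cannot receive names without the matching location slot, which is the kind of inconsistency the commit message points at.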
Commit
ac33c335bd9997e80c25c384b5a559cdeac6ba68
by spatel[InstCombine] add tests for FP<->int casts; NFC
This overlaps with at least some existing tests, but the smaller types should be faster for alive2 to verify. We know that at least one of these is currently wrong (miscompile) as shown in #55150.
|
 | llvm/test/Transforms/InstCombine/sitofp.ll |
Commit
c428a3d2a09e2d144911290920b1fa59953d7898
by congzhecao[LoopCacheAnalysis] Enable delinearization of fixed sized arrays
Currently loop cache cost (LCC) cannot analyze fixed-size arrays since it cannot delinearize them. This patch adds the capability to delinearize fixed-size arrays to LCC. Most of the code is ported from DependenceAnalysis.cpp, and some refactoring will be done in a follow-up patch.
Reviewed By: #loopoptwg, Meinersbur
Differential Revision: https://reviews.llvm.org/D122857
|
 | llvm/lib/Analysis/LoopCacheAnalysis.cpp |
 | llvm/include/llvm/Analysis/LoopCacheAnalysis.h |
 | llvm/test/Analysis/LoopCacheAnalysis/PowerPC/LoopnestFixedSize.ll |
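As a rough numeric illustration of what delinearizing a fixed-size array means in the LoopCacheAnalysis commit above: given the known extents of `A[8][16][32]`, a flat element offset can be split back into per-dimension subscripts. LLVM does this symbolically on SCEV expressions; the sketch below works on concrete integers only, and `linearize` and `delinearize` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Flatten subscripts into a row-major element offset.
// dims lists the extents outermost first, e.g. {8, 16, 32} for A[8][16][32].
inline uint64_t linearize(const std::vector<uint64_t> &subs,
                          const std::vector<uint64_t> &dims) {
  assert(subs.size() == dims.size());
  uint64_t offset = 0;
  for (std::size_t i = 0; i < dims.size(); ++i) {
    assert(subs[i] < dims[i]);
    offset = offset * dims[i] + subs[i];
  }
  return offset;
}

// Recover the subscripts from a row-major offset using the fixed extents.
inline std::vector<uint64_t> delinearize(uint64_t offset,
                                         const std::vector<uint64_t> &dims) {
  std::vector<uint64_t> subs(dims.size());
  // Peel off the innermost dimension first.
  for (std::size_t i = dims.size(); i-- > 0;) {
    assert(dims[i] != 0);
    subs[i] = offset % dims[i];
    offset /= dims[i];
  }
  assert(offset == 0 && "offset lies outside the array");
  return subs;
}
```

For `A[8][16][32]`, the subscripts `{3, 5, 7}` flatten to `(3 * 16 + 5) * 32 + 7 = 1703`, and `delinearize(1703, {8, 16, 32})` recovers `{3, 5, 7}`. Per-dimension subscripts are what let an analysis reason about which loop index moves through which array dimension.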