Changes from Git (git http://labmaster3.local/git/llvm-project.git)


  1. [DebugInfo][InstrRef][NFC] Test changes: DBG_VALUE to DBG_INSTR_REF (details)
  2. Update unit test API usage (NFC) (details)
  3. OpenMP: Correctly query location for amdgpu-arch (details)
  4. [DAG] Add tests for fpsti.sat for various architectures. NFC (details)
  5. [DebugInfo][InstrRef] Preserve properties of restored variables (details)
  6. [InstCombine] try to fold 'or' into 'mul' operand (details)
  7. [ELF] --cref: If -Map is specified, print to the map file (details)
  8. [unroll] Split full exact and full bound unroll costing [NFC] (details)
  9. [DebugInfo][InstrRef] Add indirection from dbg.declare in SelectionDAG (details)
  10. [unroll] Reduce scope of UnrollFactor variable in computeUnrollCount [NFC] (details)
  11. [unroll] Use early return in shouldPartialUnroll [nfc] (details)
  12. [DebugInfo][InstrRef][NFC] "Final" x86 test cleanup (details)
  13. [SCEVExpander] Drop poison generating flags when reusing instructions (details)
  14. [CVP] Remove ashr of -1 or 0 (details)
  15. [DebugInfo][InstrRef] Terminate overlapping variable fragments (details)
  16. [clang-tidy] Fix pr48613: "llvm-header-guard uses a reserved identifier" (details)
  17. [openmp][devicertl] Add a missing loader_uninitialized attribute (details)
  18. [lldb][NFC] Format lldb/include/lldb/Symbol/Type.h (details)
  19. [NFC][Regalloc] Split canEvictInterference into hint and general (details)
  20. [Demangle] Add support for D simple single qualified names (details)
  21. [Demangle] Add support for multiple identifiers in D qualified names (details)
  22. [Demangle] Add support for D anonymous symbols (details)
  23. Tests for D112754 (details)
  24. X86: Fold masked-merge when and-not is not available (details)
  25. [mlir][sparse] generalize sparse tensor output implementation (details)
  26. Add missing header (details)
  27. Revert "[lldb][NFC] Format lldb/include/lldb/Symbol/Type.h" (details)
  28. [sanitizer] Add Leb128 encoding/decoding (details)
  29. [NFC] Header comment in referred to Aarch64 (details)
  30. [RISCV] Add a test case to show the bug in RISCVFrameLowering. (details)
  31. [RISCV] Fix a bug in RISCVFrameLowering. (details)
  32. [NFC][sanitizer] Track progress of populating the block (details)
  33. [RISCV] Promote f16 log/pow/exp/sin/cos/etc. to f32 libcalls. (details)
  34. [TwoAddressInstructionPass] Create register mapping for registers with multiple uses in the current MBB (details)
  35. [AMDGPU] Enable copy between VGPR and AGPR classes during regalloc (details)
  36. [DebugInfo] Do not replace existing nodes from DICompileUnit (details)
  37. [mlir][python] Add pyi stub files to enable auto completion. (details)
  38. [mlir][python] Implement more SymbolTable methods. (details)
  39. [mlir][python] Audit and fix a lot of the Python pyi stubs. (details)
  40. [X86][clang] Enable floating-point type for -mno-x87 option on 32-bits (details)
  41. [ELF] Move GOT/PLT relocation code closer. NFC (details)
  42. [clang-tidy] Warn on functional C-style casts (details)
  43. [ARM] create new pseudo t2LDRLIT_ga_pcrel for stack guards (details)
  44. [X86][LoopVectorize] "Fix" `X86TTIImpl::getAddressComputationCost()` (details)
  45. [llvm-profgen] Compute and show profile density (details)
  46. [PR52549][clang-cl] Predefine _MSVC_EXECUTION_CHARACTER_SET (details)
Commit 32815bc907c4bf25866545da93b00b5d6c2ce45f by jeremy.morse
[DebugInfo][InstrRef][NFC] Test changes: DBG_VALUE to DBG_INSTR_REF

This patch contains a bunch of replacements of:

    DBG_VALUE $somereg

with:

    SOMEINST debug-instr-number 1
    DBG_INSTR_REF 1, 0, ...

It's mostly SelectionDAG tests that are making sure that the variable
location assignment is placed in the correct position in the instructions.

To avoid a loss in test coverage of SelectionDAG, which is used by a lot
of different backends, all these tests now have two modes and sets of RUN
lines, one for DBG_VALUE mode, the other for instruction referencing.

Differential Revision:
The file was modified llvm/test/DebugInfo/X86/pr34545.ll
The file was modified llvm/test/DebugInfo/X86/sdag-dbgvalue-phi-use-2.ll
The file was modified llvm/test/DebugInfo/X86/pr40427.ll
The file was modified llvm/test/DebugInfo/X86/sdag-dbgvalue-phi-use-3.ll
The file was modified llvm/test/DebugInfo/X86/sdag-dbgvalue-phi-use-1.ll
The file was modified llvm/test/DebugInfo/X86/sdag-transfer-dbgvalue.ll
The file was modified llvm/test/DebugInfo/X86/sdag-dbgvalue-phi-use-4.ll
The file was modified llvm/test/DebugInfo/X86/sdag-salvage-add.ll
The file was modified llvm/test/DebugInfo/X86/sdag-dangling-dbgvalue.ll
The file was modified llvm/test/DebugInfo/X86/sdag-dbgvalue-ssareg.ll
The file was modified llvm/test/DebugInfo/X86/sdag-ir-salvage.ll
Commit 4f215bfa6ee525b245b81462d75c3e1e47d18f13 by Adrian Prantl
Update unit test API usage (NFC)
The file was modified lldb/unittests/Platform/PlatformAppleSimulatorTest.cpp
Commit 935abeaace123e5f11792a5175079d974d0a0be8 by Matthew.Arsenault
OpenMP: Correctly query location for amdgpu-arch

This was trying to figure out the build path for amdgpu-arch, and
making assumptions about where it is which were not working on my
system. Whether a standalone build or not, we should have a proper
imported target to get the location from.
The file was modified openmp/libomptarget/plugins/amdgpu/CMakeLists.txt
Commit 410d276400a9ee2440387d372db6b0f112853cc0 by
[DAG] Add tests for fpsti.sat for various architectures. NFC
The file was added llvm/test/CodeGen/RISCV/fpclamptosat_vec.ll
The file was added llvm/test/CodeGen/WebAssembly/fpclamptosat.ll
The file was added llvm/test/CodeGen/ARM/fpclamptosat_vec.ll
The file was added llvm/test/CodeGen/RISCV/fpclamptosat.ll
The file was added llvm/test/CodeGen/AArch64/fpclamptosat_vec.ll
The file was added llvm/test/CodeGen/WebAssembly/fpclamptosat_vec.ll
The file was added llvm/test/CodeGen/X86/fpclamptosat.ll
The file was added llvm/test/CodeGen/X86/fpclamptosat_vec.ll
The file was added llvm/test/CodeGen/AArch64/fpclamptosat.ll
Commit 9cf31b8d39d67843eeb314bacf6f78a1c969e1cc by jeremy.morse
[DebugInfo][InstrRef] Preserve properties of restored variables

InstrRefBasedLDV observes when variable locations are clobbered, scans what
values are available in the machine, and re-issues a DBG_VALUE for the
variable if it can find another location. Unfortunately, I hadn't joined up
the Indirectness flag, so if it did this to an Indirect Value, the
indirectness would be dropped.

Fix this, and add a test that if we clobber a variable value (on the stack
in this case), then the recovered variable location keeps the Indirect flag.

Differential Revision:
The file was added llvm/test/DebugInfo/MIR/InstrRef/restore-clobber-with-indirectness.mir
The file was modified llvm/lib/CodeGen/LiveDebugValues/InstrRefBasedImpl.cpp
Commit 99f8b795cc03f9bcda7f9cbd9625c2976ae62bd5 by spatel
[InstCombine] try to fold 'or' into 'mul' operand

or (mul X, Y), X --> mul X, (add Y, 1) (when the multiply has no common bits with X)

We already have this fold if the pattern ends in 'add', but we can miss it if the
'add' becomes 'or' via another no-common-bits transform.
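
The identity behind this fold can be checked by brute force. The following is a minimal sketch, not the InstCombine implementation: it assumes fixed-width wrapping integers, and the helper name `fold_holds` is invented for illustration. When `mul X, Y` shares no set bits with `X`, the `or` behaves like an `add`, so the result equals `mul X, (add Y, 1)`.

```python
def fold_holds(bits: int = 6) -> bool:
    """Exhaustively check or(mul(X, Y), X) == mul(X, add(Y, 1)) for all
    `bits`-wide X and Y where the product shares no set bits with X."""
    mask = (1 << bits) - 1
    for x in range(1 << bits):
        for y in range(1 << bits):
            mul = (x * y) & mask          # wrapping multiply
            if mul & x:                   # common bits: the fold does not apply
                continue
            folded = (x * ((y + 1) & mask)) & mask
            if (mul | x) != folded:
                return False
    return True


print(fold_holds())
```

The check skips every pair where the product and `X` have a bit in common, which mirrors the "no common bits" precondition stated above.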

This is part of fixing:
...but it won't make a difference on that example yet.

Differential Revision:
The file was modified llvm/test/Transforms/InstCombine/or.ll
The file was modified llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
Commit 1ce51a5f355ffba72b01e5e688cda7bbba2aa282 by i
[ELF] --cref: If -Map is specified, print to the map file

PR48282: This behavior matches GNU ld and gold.

Reviewed By: markj

Differential Revision:
The file was modified lld/ELF/Writer.cpp
The file was modified lld/ELF/MapFile.cpp
The file was modified lld/ELF/
The file was modified lld/docs/ReleaseNotes.rst
The file was modified lld/test/ELF/cref.s
The file was modified lld/ELF/MapFile.h
The file was modified lld/docs/ld.lld.1
Commit 829b62adf5db189843b9a9ce626dfef97f76059f by listmail
[unroll] Split full exact and full bound unroll costing [NFC]

This change should be NFC. It's posted for review mostly to make sure others are happy with the names I'm introducing for "exact full unroll" and "bounded full unroll". The motivation here is that our cost model for bounded unrolling is too aggressive - it gives benefits for exits we aren't going to prune - but I also just think the new version of the code is a lot easier to follow.

Differential Revision:
The file was modified llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp
Commit a20987adf4f80e2657eb3032a5a91e13f58106a0 by jeremy.morse
[DebugInfo][InstrRef] Add indirection from dbg.declare in SelectionDAG

Usually dbg.declares get translated into either entries in an MF
side-table, or a DBG_VALUE on entry to the function with IsIndirect set
(including in instruction referencing mode). Much rarer is a dbg.declare
attached to a non-argument value, such as in the test added in this patch
where there's a variable-length-array. Such dbg.declares become SDDbgValue
nodes with IsIndirect=true.

As it happens, we weren't correctly emitting DBG_INSTR_REFs with the
additional indirection. This patch adds the extra indirection, encoded as
adding an additional DW_OP_deref to the expression.

Differential Revision:
The file was added llvm/test/DebugInfo/X86/instr-ref-dbg-declare.ll
The file was modified llvm/lib/CodeGen/SelectionDAG/InstrEmitter.cpp
Commit a655e0f991ba59f34fc24c44d04bbc56ff564c3e by listmail
[unroll] Reduce scope of UnrollFactor variable in computeUnrollCount [NFC]

Suggested in review of D114453, done as a separate change to get all uses at once.
The file was modified llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp
Commit f50207c015df91132efe135fd64c3c5bb36c0909 by listmail
[unroll] Use early return in shouldPartialUnroll [nfc]
The file was modified llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp
Commit fc9dae420c0c7f0f4667e0aa9f3d37d72b2a9906 by jeremy.morse
[DebugInfo][InstrRef][NFC] "Final" x86 test cleanup

These are some final test changes for using instruction referencing on X86:
* Most of these tests just have the flag switched so that they run with
   instr-ref, and just work: these tests were fixed by earlier patches.
* There are some spurious differences in textual outputs.
* A few have different temporary labels in the output because more
   MCSymbols are printed to the output.

Differential Revision:
The file was modified llvm/test/DebugInfo/X86/sdagsplit-1.ll
The file was modified llvm/test/CodeGen/MIR/X86/diexpr-win32.mir
The file was modified llvm/test/DebugInfo/COFF/fpo-stack-protect.ll
The file was modified llvm/test/DebugInfo/X86/sdag-split-arg.ll
The file was modified llvm/test/DebugInfo/MIR/X86/entry-value-of-modified-param.mir
The file was modified llvm/test/CodeGen/X86/2010-05-26-DotDebugLoc.ll
The file was modified llvm/test/DebugInfo/X86/dbg-addr-dse.ll
The file was modified llvm/test/tools/llvm-locstats/locstats.ll
The file was modified llvm/test/DebugInfo/COFF/fpo-shrink-wrap.ll
The file was modified llvm/test/CodeGen/X86/fast-regalloc-live-out-debug-values.mir
The file was modified llvm/test/DebugInfo/MIR/X86/kill-entry-value-after-diamond-bbs.mir
The file was modified llvm/test/DebugInfo/COFF/types-array-advanced.ll
The file was modified llvm/test/DebugInfo/MIR/X86/live-debug-values-restore.mir
Commit 8906a0fe64abf1a9c8641ee51908bba7cbf8ec54 by listmail
[SCEVExpander] Drop poison generating flags when reusing instructions

The basic problem we have is that we're trying to reuse an instruction which is mapped to some SCEV. Since we can have multiple such instructions (potentially with different flags), this is analogous to our need to drop flags when performing CSE. A trivial implementation would simply drop flags on any instruction we decided to reuse, and that would be correct.

This patch is almost that trivial patch except that we preserve flags on the reused instruction when existing users would imply UB on overflow already. Adding new users can, at most, refine this program to one which doesn't execute UB, which is valid.

In practice, this fixes two conceptual problems with the previous code: 1) a binop could have been canonicalized into a form with different opcode or operands, or 2) the inbounds GEP case which was simply unhandled.

On the test changes, most are pretty straightforward. We lose some flags (in some cases, they'd have been dropped on the next CSE pass anyway). The one that took me the longest to understand was the ashr-expansion test. What's happening there is that we're considering reuse of the mul; previously we disallowed it entirely, now we allow it with no flags. The surrounding diffs are all effects of generating the same mul with a different operand order, and then doing simple DCE.

The loss of the inbounds is unfortunate, but even there, we can recover most of those once we actually treat branch-on-poison as immediate UB.

Differential Revision:
The file was modified llvm/test/Transforms/LoopUnroll/runtime-loop-multiple-exits.ll
The file was modified llvm/test/Transforms/IndVarSimplify/lftr-address-space-pointers.ll
The file was modified llvm/test/Transforms/IndVarSimplify/pr24783.ll
The file was modified llvm/test/Transforms/LoopPredication/basic.ll
The file was modified llvm/test/Transforms/IndVarSimplify/promote-iv-to-eliminate-casts.ll
The file was modified llvm/test/CodeGen/PowerPC/common-chain.ll
The file was modified llvm/test/Transforms/IRCE/non-loop-invariant-rhs-instr.ll
The file was modified llvm/test/Transforms/IndVarSimplify/lftr-reuse.ll
The file was modified llvm/test/Transforms/PhaseOrdering/loop-rotation-vs-common-code-hoisting.ll
The file was modified llvm/test/Transforms/IndVarSimplify/ashr-expansion.ll
The file was modified llvm/lib/Transforms/Utils/ScalarEvolutionExpander.cpp
Commit 45ecfed6c636d06f76bca0a44803e945cdae9506 by listmail
[CVP] Remove ashr of -1 or 0

Fixes PR#52190. There is already a check for converting ashr instructions with non-negative left-hand sides into lshr; this patch adds an optimization to remove ashr altogether if the left-hand side is known to be in the range [-1, 1).
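
Why the shift can be removed: the range [-1, 1) contains only -1 and 0, and an arithmetic shift right leaves both unchanged for any shift amount. A minimal sketch, assuming Python's `>>` on integers as a stand-in for `ashr` (the helper names are invented for illustration and are not part of CorrelatedValuePropagation):

```python
def ashr(value: int, amount: int) -> int:
    """Arithmetic shift right; Python's >> on ints preserves the sign."""
    return value >> amount


def ashr_is_identity(width: int = 8) -> bool:
    """Check that ashr leaves -1 and 0 unchanged for every in-range amount."""
    return all(ashr(v, s) == v for v in (-1, 0) for s in range(width))


print(ashr_is_identity())
```

For any other value the shift is observable, for example `ashr(-8, 1)` gives `-4`, which is why the range check is needed.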

Differential Revision:
The file was modified llvm/lib/Transforms/Scalar/CorrelatedValuePropagation.cpp
The file was modified llvm/test/Transforms/CorrelatedValuePropagation/ashr.ll
Commit 0eee844539e406dfa8010a129ea3655d2298ac10 by jeremy.morse
[DebugInfo][InstrRef] Terminate overlapping variable fragments

If we have a variable whose fragments overlap, such as:

    DBG_VALUE $ax, $noreg, !123, !DIExpression(DW_OP_LLVM_fragment, 0, 16)
    DBG_VALUE $eax, $noreg, !123, !DIExpression(DW_OP_LLVM_fragment, 0, 32)

we should only propagate the most recently assigned fragment out of a
block. LiveDebugValues only deals with live-in variable locations, as
overlaps within blocks are DbgEntityHistoryCalculator's domain.

InstrRefBasedLDV has kept the accumulateFragmentMap method from
VarLocBasedLDV, we just need it to recognise DBG_INSTR_REFs. Once it's
produced a mapping of variable / fragments to the overlapped variable /
fragments, VLocTracker uses it to identify when a debug instruction needs
to terminate the other parts it overlaps with. The test is updated for
some standard "InstrRef picks different registers" variation, and the
order of some unrelated DBG_VALUEs changes.
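
The overlap rule can be sketched in a few lines. This is an illustration only, not the VLocTracker code: it models a fragment as a hypothetical `(offset_in_bits, size_in_bits)` pair and a variable's live fragments as a dictionary, with `None` standing for a terminated location.

```python
from typing import Dict, Optional, Tuple

Fragment = Tuple[int, int]  # (offset_in_bits, size_in_bits)


def overlaps(a: Fragment, b: Fragment) -> bool:
    """Two fragments overlap when their bit ranges intersect."""
    return a[0] < b[0] + b[1] and b[0] < a[0] + a[1]


def assign(live: Dict[Fragment, Optional[str]], frag: Fragment, loc: str) -> None:
    """Record a new location for `frag`, terminating any other fragment of
    the same variable that overlaps it."""
    for other in live:
        if other != frag and overlaps(other, frag):
            live[other] = None  # the older, overlapping fragment is terminated
    live[frag] = loc


live: Dict[Fragment, Optional[str]] = {}
assign(live, (0, 16), "$ax")
assign(live, (0, 32), "$eax")
print(live)
```

After the second assignment only the most recently assigned fragment still has a location, which matches the behaviour described for propagation out of a block.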

Differential Revision:
The file was modified llvm/lib/CodeGen/LiveDebugValues/InstrRefBasedImpl.h
The file was modified llvm/unittests/CodeGen/InstrRefLDVTest.cpp
The file was modified llvm/lib/CodeGen/LiveDebugValues/InstrRefBasedImpl.cpp
The file was modified llvm/test/DebugInfo/MIR/X86/live-debug-values-fragments.mir
Commit c7aa358798e6330593fd5cc2ff4caf6bc15ba3c9 by mail
[clang-tidy] Fix pr48613: "llvm-header-guard uses a reserved identifier"


llvm-header-guard is suggesting header guards with leading underscores
if the header file path begins with a '/' or similar special character.
Only reserved identifiers should begin with an underscore.
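
A rough sketch of the idea, assuming a guard is derived by replacing non-alphanumeric characters with underscores and upper-casing. The function name `header_guard` and the exact sanitizing rules are assumptions for illustration, not the clang-tidy implementation:

```python
import re


def header_guard(path: str) -> str:
    """Derive a header guard from a path without a leading underscore."""
    guard = re.sub(r"[^A-Za-z0-9]", "_", path).upper()
    # Strip leading underscores so the guard is not a reserved identifier.
    return guard.lstrip("_")


print(header_guard("/include/llvm/ADT/foo.h"))
```

Without the final `lstrip`, a path starting with `/` would yield a guard starting with `_`, which is the situation the report describes.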

Differential Revision:
The file was modified clang-tools-extra/clang-tidy/utils/HeaderGuard.cpp
The file was modified clang-tools-extra/unittests/clang-tidy/LLVMModuleTest.cpp
The file was modified clang-tools-extra/clang-tidy/utils/HeaderGuard.h
Commit 3ab150f6e44b99dea855024c48d0878eb55ae3d0 by jonathanchesterfield
[openmp][devicertl] Add a missing loader_uninitialized attribute
The file was modified openmp/libomptarget/DeviceRTL/src/Debug.cpp
Commit 6f99e1aa58e3566fcce689bc986b7676e818c038 by contact
[lldb][NFC] Format lldb/include/lldb/Symbol/Type.h

Reviewed By: teemperor, JDevlieghere

Differential Revision:

Signed-off-by: Luís Ferreira <>
The file was modified lldb/source/Symbol/Type.cpp
The file was modified lldb/include/lldb/Symbol/Type.h
Commit e8b8304d76ccefa9880bbb352d9f81f330ef1ea1 by mtrofin
[NFC][Regalloc] Split canEvictInterference into hint and general

There are 2 eviction queries. One is made by tryAssign, when it attempts to
free an interference occupying the hint of the candidate. The other is
during 'regular' interference resolution, where we scan over all
physical registers and try to see if we can evict live ranges in favor
of the candidate. We currently use the same logic in both cases, just
that the former never passes the cost to any subsequent query.
Technically, the 2 decisions could be implemented with different

This patch splits the 2.


Differential Revision:
The file was modified llvm/lib/CodeGen/RegAllocGreedy.cpp
Commit e63c799a767b0f682af62eba9d1d375c59e58627 by dblaikie
[Demangle] Add support for D simple single qualified names

    This patch adds support for simple single qualified names that include
    internal mangled names and normal symbol names.

Differential Revision:
The file was modified llvm/unittests/Demangle/DLangDemangleTest.cpp
The file was modified llvm/lib/Demangle/DLangDemangle.cpp
Commit 6e08abdc256bb9c2158ab5dbfa082a78faa3543a by dblaikie
[Demangle] Add support for multiple identifiers in D qualified names

Reviewed By: dblaikie

Differential Revision:
The file was modified llvm/lib/Demangle/DLangDemangle.cpp
The file was modified llvm/unittests/Demangle/DLangDemangleTest.cpp
Commit b779f02a1cb73bb3885e2059e418dfc1c16d25e2 by dblaikie
[Demangle] Add support for D anonymous symbols

    Anonymous symbols are represented by 0 in the mangled symbol. We should skip
    them in order to represent the demangled name correctly, otherwise demangled
    names like `demangle..anon` can happen.
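
    The effect of skipping anonymous symbols can be illustrated with a
    simplified parser. This is a sketch under stated assumptions, not the
    DLangDemangle code: it handles only a run of length-prefixed identifiers,
    and the function name `parse_qualified` is invented for illustration.

```python
DIGITS = "0123456789"


def parse_qualified(mangled: str) -> str:
    """Join length-prefixed identifiers with '.', skipping anonymous
    symbols, which are encoded as a bare '0'."""
    parts = []
    pos = 0
    while pos < len(mangled) and mangled[pos] in DIGITS:
        if mangled[pos] == "0":       # anonymous symbol: skip it
            pos += 1
            continue
        end = pos
        while end < len(mangled) and mangled[end] in DIGITS:
            end += 1
        length = int(mangled[pos:end])
        name = mangled[end:end + length]
        if len(name) != length:
            raise ValueError("truncated identifier")
        parts.append(name)
        pos = end + length
    return ".".join(parts)


print(parse_qualified("8demangle04anon"))
```

    Emitting an empty component for the `0` instead of skipping it would
    produce the doubled dot mentioned above.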

Reviewed By: dblaikie

Differential Revision:
The file was modified llvm/unittests/Demangle/DLangDemangleTest.cpp
The file was modified llvm/lib/Demangle/DLangDemangle.cpp
Commit 53dfa52546833d4c8443d976e67fef820ff54426 by Matthias Braun
Tests for D112754

Differential Revision:
The file was added llvm/test/CodeGen/X86/fold-masked-merge.ll
Commit 87ba99c263afd4c1c090c17eaf51089b1edbc280 by Matthias Braun
X86: Fold masked-merge when and-not is not available

Differential Revision:
The file was modified llvm/lib/Target/X86/X86ISelLowering.cpp
The file was modified llvm/test/CodeGen/X86/or-lea.ll
The file was modified llvm/test/CodeGen/X86/unfold-masked-merge-scalar-variablemask.ll
The file was modified llvm/test/CodeGen/X86/fold-masked-merge.ll
The file was modified llvm/test/CodeGen/X86/unfold-masked-merge-vector-variablemask.ll
Commit 7d4da4e1ab7f79e51db0d5c2a0f5ef1711122dd7 by ajcbik
[mlir][sparse] generalize sparse tensor output implementation

Moves sparse tensor output support forward by generalizing from injective
insertions only to include reductions. This revision accepts the case with all
parallel outer and all reduction inner loops, since that can be handled with
an injective insertion still. Next revision will allow the inner parallel loop
to move inward (but that will require "access pattern expansion" aka "workspace").

Reviewed By: bixia

Differential Revision:
The file was modified mlir/test/Dialect/SparseTensor/sparse_out.mlir
The file was added mlir/test/Integration/Dialect/SparseTensor/CPU/sparse_out_reduction.mlir
The file was modified mlir/include/mlir/Dialect/SparseTensor/Utils/Merger.h
The file was modified mlir/lib/Dialect/SparseTensor/Utils/Merger.cpp
The file was modified mlir/test/Integration/Dialect/SparseTensor/CPU/sparse_vector_ops.mlir
The file was modified mlir/lib/Dialect/SparseTensor/Transforms/Sparsification.cpp
Commit bd4c6a476fd037fb07a1c484f75d93ee40713d3d by dblaikie
Add missing header
The file was modified llvm/lib/Demangle/DLangDemangle.cpp
Commit 2e5c47eda14a547c21e57d869a1e51ffd9938289 by contact
Revert "[lldb][NFC] Format lldb/include/lldb/Symbol/Type.h"

This reverts commit 6f99e1aa58e3566fcce689bc986b7676e818c038.
The file was modified lldb/include/lldb/Symbol/Type.h
The file was modified lldb/source/Symbol/Type.cpp
Commit 25a7e4b9f7c60883c677a246641287744b0bb479 by Vitaly Buka
[sanitizer] Add Leb128 encoding/decoding
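
For readers unfamiliar with the format, here is a minimal sketch of unsigned LEB128 as commonly defined (seven payload bits per byte, high bit set on every byte except the last). It is a generic illustration, not the code in sanitizer_leb128.h, and the function names are invented for the example.

```python
def encode_uleb128(value: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128."""
    if value < 0:
        raise ValueError("unsigned LEB128 cannot encode negative values")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)   # more bytes follow
        else:
            out.append(byte)          # last byte: high bit clear
            return bytes(out)


def decode_uleb128(data: bytes) -> int:
    """Decode the first unsigned LEB128 value found in `data`."""
    result = 0
    shift = 0
    for byte in data:
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result
        shift += 7
    raise ValueError("truncated LEB128 sequence")


print(encode_uleb128(624485).hex())
```

Small values take a single byte, which is what makes the encoding attractive for compact storage.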

Reviewed By: dvyukov, kstoimenov

Differential Revision:
The file was added compiler-rt/lib/sanitizer_common/sanitizer_leb128.h
The file was modified compiler-rt/lib/sanitizer_common/tests/CMakeLists.txt
The file was modified compiler-rt/lib/sanitizer_common/CMakeLists.txt
The file was added compiler-rt/lib/sanitizer_common/tests/sanitizer_leb128_test.cpp
Commit fde937748b7def9f9d349b85bf9077f07a84b724 by mtrofin
[NFC] Header comment in referred to Aarch64

Differential Revision:
The file was modified llvm/lib/Target/X86/
Commit 4ae2222e143b8541b6567f9852d9600a17cc9426 by
[RISCV] Add a test case to show the bug in RISCVFrameLowering.

If there are too many arguments to pass in registers, the caller needs
stack space to pass the remaining arguments to the callee. There are two
scenarios. One is to reserve the space in the prologue and the other is
to reserve the space before each function call. When we reserve the
stack space before function calls, the stack pointer is adjusted. In
that scenario, we should not use the stack pointer to access the stack
objects. It looks like this:

callseq_start  ->  sp = sp - reserved_space
// We should not use SP to access stack objects in this area.
call @foo
callseq_end    ->  sp = sp + reserved_space

Differential Revision:
The file was added llvm/test/CodeGen/RISCV/rvv/no-reserved-frame.ll
Commit 9a88566537177df75af1fcde69e0626fed2b1145 by
[RISCV] Fix a bug in RISCVFrameLowering.

When we have outgoing arguments passed on the stack and we do not
reserve the stack space in the prologue, use BP to access stack objects
after adjusting the stack pointer before function calls.

callseq_start  ->  sp = sp - reserved_space
// Use FP to access fixed stack objects.
// Use BP to access non-fixed stack objects.
call @foo
callseq_end    ->  sp = sp + reserved_space

Differential Revision:
The file was modified llvm/test/CodeGen/RISCV/rvv/no-reserved-frame.ll
The file was modified llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
Commit a06d3527563503f17794bf119ee471d0ca2669ca by Vitaly Buka
[NFC][sanitizer] Track progress of populating the block

In a multi-threaded application, concurrent StackStore::Store calls may
finish in an order different from their assigned Ids. So we can't assume
that the previous block is done once we switch to writing the next one.

The workaround is to count the exact number of uptr values stored into the
block, including any skipped tail/head that could not fit an entire trace.
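
The counting idea can be sketched as follows. This is an illustration under assumptions, not the sanitizer code: the class `Block`, its capacity parameter, and the use of a lock in place of an atomic counter are all invented for the example.

```python
import threading


class Block:
    """Tracks how many slots of a fixed-size block have been accounted for."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._stored = 0
        self._lock = threading.Lock()

    def add_stored(self, count: int) -> bool:
        """Account for `count` slots (written or deliberately skipped).
        Returns True for the call that completes the block."""
        with self._lock:
            self._stored += count
            return self._stored == self.capacity


block = Block(capacity=8)
print(block.add_stored(3), block.add_stored(5))
```

Because completion depends on the running total rather than on which writer finished last, out-of-order completion of concurrent writers does not matter.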

Depends on D114490.

Reviewed By: morehouse

Differential Revision:
The file was modified compiler-rt/lib/sanitizer_common/sanitizer_stackdepot.cpp
The file was modified compiler-rt/lib/sanitizer_common/sanitizer_stack_store.cpp
The file was modified compiler-rt/lib/sanitizer_common/sanitizer_stack_store.h
The file was modified compiler-rt/lib/sanitizer_common/tests/sanitizer_stack_store_test.cpp
Commit b121d23a9cea711e832505c0b2495de6a51591c1 by craig.topper
[RISCV] Promote f16 log/pow/exp/sin/cos/etc. to f32 libcalls.

Prevents crashes or "cannot select" errors.

Reviewed By: frasercrmck

Differential Revision:
The file was modified llvm/test/CodeGen/RISCV/half-intrinsics.ll
The file was modified llvm/lib/Target/RISCV/RISCVISelLowering.cpp
Commit f1d8345a2ab3c343929212d1c62174cfaa46e71a by carrot
[TwoAddressInstructionPass] Create register mapping for registers with multiple uses in the current MBB

Currently we create register mappings for registers used only once in the
current MBB. For registers with multiple uses, when all the uses are in the
current MBB, we can also create mappings for them similarly, according to the
last use. For example:

    %reg101 = ...
            = ... %reg101
    %reg103 = ADD %reg101, %reg102

We can create a mapping between %reg101 and %reg103.

Differential Revision:
The file was modified llvm/test/CodeGen/X86/vector-fshr-128.ll
The file was modified llvm/test/CodeGen/ARM/usat.ll
The file was modified llvm/test/CodeGen/X86/vector-shift-ashr-sub128.ll
The file was modified llvm/test/CodeGen/X86/bmi2.ll
The file was modified llvm/test/CodeGen/X86/bitreverse.ll
The file was modified llvm/test/CodeGen/X86/setcc-combine.ll
The file was modified llvm/test/CodeGen/X86/vector-shift-lshr-128.ll
The file was modified llvm/test/CodeGen/X86/vector-shuffle-combining-avx512f.ll
The file was modified llvm/test/CodeGen/SystemZ/int-div-04.ll
The file was modified llvm/test/CodeGen/X86/combine-sdiv.ll
The file was modified llvm/test/CodeGen/X86/pmulh.ll
The file was modified llvm/test/CodeGen/X86/vector-trunc-ssat.ll
The file was modified llvm/test/CodeGen/X86/vector-narrow-binop.ll
The file was modified llvm/test/CodeGen/X86/sat-add.ll
The file was modified llvm/test/CodeGen/X86/vec_ctbits.ll
The file was modified llvm/test/CodeGen/X86/vector-reduce-fadd-fast.ll
The file was modified llvm/test/CodeGen/X86/rem.ll
The file was modified llvm/lib/CodeGen/TwoAddressInstructionPass.cpp
The file was modified llvm/test/CodeGen/SystemZ/int-div-01.ll
The file was modified llvm/test/CodeGen/X86/nontemporal-loads.ll
The file was modified llvm/test/CodeGen/X86/vector-popcnt-128.ll
The file was modified llvm/test/CodeGen/ARM/hoist-and-by-const-from-lshr-in-eqcmp-zero.ll
The file was modified llvm/test/CodeGen/X86/divide-by-constant.ll
The file was modified llvm/test/CodeGen/X86/ctpop-combine.ll
The file was modified llvm/test/CodeGen/X86/vector-trunc-packus.ll
The file was modified llvm/test/CodeGen/X86/lzcnt-cmp.ll
The file was modified llvm/test/CodeGen/X86/vector-bitreverse.ll
The file was modified llvm/test/CodeGen/X86/vector-shuffle-combining-avx512bwvl.ll
The file was modified llvm/test/CodeGen/X86/sdiv_fix_sat.ll
The file was modified llvm/test/CodeGen/ARM/hoist-and-by-const-from-shl-in-eqcmp-zero.ll
The file was modified llvm/test/CodeGen/X86/vec_umulo.ll
The file was modified llvm/test/CodeGen/X86/vector-lzcnt-sub128.ll
The file was modified llvm/test/CodeGen/X86/vector-idiv-sdiv-128.ll
The file was modified llvm/test/CodeGen/X86/vector-shuffle-combining-avx512bw.ll
The file was modified llvm/test/CodeGen/X86/64-bit-shift-by-32-minus-y.ll
The file was modified llvm/test/CodeGen/X86/horizontal-sum.ll
The file was modified llvm/test/CodeGen/X86/umul-with-overflow.ll
The file was modified llvm/test/CodeGen/X86/haddsub-3.ll
The file was modified llvm/test/CodeGen/X86/vector-fshl-128.ll
The file was modified llvm/test/CodeGen/X86/vector-trunc-usat.ll
The file was modified llvm/test/CodeGen/X86/vector-reduce-fmul-fast.ll
The file was modified llvm/test/CodeGen/X86/sse3-avx-addsub-2.ll
The file was modified llvm/test/CodeGen/X86/urem-seteq-vec-nonsplat.ll
The file was modified llvm/test/CodeGen/X86/vec-strict-cmp-128.ll
The file was modified llvm/test/CodeGen/X86/srem-seteq-vec-nonsplat.ll
The file was modified llvm/test/CodeGen/X86/haddsub.ll
The file was modified llvm/test/CodeGen/X86/atomic-unordered.ll
The file was modified llvm/test/CodeGen/X86/combine-bitselect.ll
The file was modified llvm/test/CodeGen/ARM/fpclamptosat.ll
The file was modified llvm/test/CodeGen/X86/horizontal-reduce-fadd.ll
The file was modified llvm/test/CodeGen/X86/shift-combine.ll
The file was modified llvm/test/CodeGen/X86/vselect-packss.ll
The file was modified llvm/test/CodeGen/X86/omit-urem-of-power-of-two-or-zero-when-comparing-with-zero.ll
The file was modified llvm/test/CodeGen/X86/umul_fix_sat.ll
The file was modified llvm/test/CodeGen/SystemZ/int-div-03.ll
The file was modified llvm/test/CodeGen/X86/smul_fix_sat.ll
The file was modified llvm/test/CodeGen/X86/vector-tzcnt-128.ll
The file was modified llvm/test/CodeGen/X86/uadd_sat_vec.ll
The file was modified llvm/test/CodeGen/X86/vector-lzcnt-128.ll
The file was modified llvm/test/CodeGen/X86/smul_fix.ll
The file was modified llvm/test/CodeGen/X86/shl-crash-on-legalize.ll
The file was modified llvm/test/CodeGen/X86/uadd_sat.ll
The file was modified llvm/test/CodeGen/X86/bypass-slow-division-32.ll
The file was modified llvm/test/CodeGen/Thumb/srem-seteq-illegal-types.ll
The file was modified llvm/test/CodeGen/X86/fpclamptosat.ll
The file was modified llvm/test/CodeGen/X86/avx512-shuffles/partial_permute.ll
The file was modified llvm/test/CodeGen/X86/umul_fix.ll
The file was modified llvm/test/CodeGen/X86/vector-ext-logic.ll
The file was modified llvm/test/CodeGen/X86/popcnt.ll
The file was modified llvm/test/CodeGen/SystemZ/int-mul-08.ll
The file was modified llvm/test/CodeGen/X86/vector-popcnt-128-ult-ugt.ll
The file was modified llvm/test/CodeGen/X86/vector-shuffle-combining-avx512vbmi.ll
The file was modified llvm/test/CodeGen/X86/pull-binop-through-shift.ll
The file was modified llvm/test/CodeGen/X86/vector-shift-ashr-128.ll
The file was modified llvm/test/CodeGen/ARM/ssat.ll
The file was modified llvm/test/CodeGen/X86/vector-shift-lshr-sub128.ll
The file was modified llvm/test/CodeGen/X86/8bit_cmov_of_trunc_promotion.ll
The file was modified llvm/test/CodeGen/X86/slow-pmulld.ll
The file was modified llvm/test/CodeGen/X86/haddsub-shuf.ll
Commit 5297cbf04532f61fe18570982f4f2a3095d08c13 by Christudasan.Devadasan
[AMDGPU] Enable copy between VGPR and AGPR classes during regalloc

The greedy register allocator prefers to move a constrained
live range into a larger allocatable class over spilling
it. This patch defines the necessary superclasses for
vector registers. For subtargets that support copy between
VGPRs and AGPRs, the vector register spills during regalloc
now become just copies.

Reviewed By: rampitec, arsenm

Differential Revision:
The file was added llvm/test/CodeGen/AMDGPU/partial-regcopy-and-spill-missed-at-regalloc.ll
The file was modified llvm/lib/Target/AMDGPU/SIRegisterInfo.h
The file was modified llvm/test/CodeGen/AMDGPU/extend-phi-subrange-not-in-parent.mir
The file was modified llvm/test/CodeGen/AMDGPU/spill-vgpr-to-agpr.ll
The file was added llvm/test/CodeGen/AMDGPU/vector-spill-restore-to-other-vector-type.mir
The file was added llvm/test/CodeGen/AMDGPU/spill-vector-superclass.ll
The file was modified llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
The file was modified llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
The file was modified llvm/test/CodeGen/AMDGPU/spill-agpr.ll
Commit 0150645bf5ae0d55866e77d2bec5aad4e5226b7c by kyulee
[DebugInfo] Do not replace existing nodes from DICompileUnit

When creating a new DIBuilder with an existing DICompileUnit, load the
DINodes from the current DICompileUnit so they don't get overwritten.
This is done in the MachineOutliner pass, but it didn't change the CU so
the bug never appeared. We need this if we ever want to add DINodes to
the CU after it has been created, e.g., DIGlobalVariables.

Reviewed By: dblaikie

Differential Revision:
The file was modified llvm/unittests/IR/IRBuilderTest.cpp
The file was modified llvm/lib/IR/DIBuilder.cpp
Commit a6e7d024a9ebda1564fd78b829c45169add80864 by stellaraccident
[mlir][python] Add pyi stub files to enable auto completion.

There is no completely automated facility for generating stubs that are both accurate and comprehensive for native modules. After some experimentation, I found that MyPy's stubgen does the best at generating correct stubs with a few caveats that are relatively easy to fix:
  * Some types resolve to cross module symbols incorrectly.
  * staticmethod and classmethod signatures seem to always be completely generic and need to be manually provided.
  * It does not generate an __all__ which, from testing, causes namespace pollution to be visible to IDE code completion.

As a first step, I did the following:
  * Ran `stubgen` for ``, `_mlir.passmanager`, and `_mlirExecutionEngine`.
  * Manually looked for all instances where unnamed arguments were being emitted (i.e. as 'arg0', etc) and updated the C++ side to include names (and re-ran stubgen to get a good initial state).
  * Made/noted a few structural changes to each `pyi` file to make it minimally functional.
  * Added the `pyi` files to the CMake rules so they are installed and visible.

To test, I added a `.env` file to the root of the project with `PYTHONPATH=...` set as per instructions. I then reloaded the developer window (in VS Code) and verified that completion works for various changes to test cases.

There are still a number of overly generic signatures, but I want to check in this low-touch baseline before iterating on more ambiguous changes. This is already a big improvement.

Differential Revision:
The file was modified mlir/lib/Bindings/Python/IRModule.h
The file was modified mlir/lib/Bindings/Python/Pass.cpp
The file was added mlir/python/mlir/_mlir_libs/_mlir/__init__.pyi
The file was added mlir/python/mlir/_mlir_libs/_mlirExecutionEngine.pyi
The file was modified mlir/lib/Bindings/Python/ExecutionEngineModule.cpp
The file was modified mlir/lib/Bindings/Python/MainModule.cpp
The file was modified mlir/lib/Bindings/Python/IRCore.cpp
The file was modified mlir/lib/Bindings/Python/IRAttributes.cpp
The file was modified mlir/python/CMakeLists.txt
The file was added mlir/python/mlir/_mlir_libs/_mlir/ir.pyi
The file was modified mlir/lib/Bindings/Python/IRTypes.cpp
The file was modified mlir/lib/Bindings/Python/IRAffine.cpp
The file was added mlir/python/mlir/_mlir_libs/_mlir/passmanager.pyi
Commit bdc3183742f1e996d58bdf23b91966e64ad5e9a3 by stellaraccident
[mlir][python] Implement more SymbolTable methods.

* set_symbol_name, get_symbol_name, set_visibility, get_visibility, replace_all_symbol_uses, walk_symbol_tables
* In integrations I've been doing, I've been reaching for all of these to do both general IR manipulation and module merging.
* I don't love the replace_all_symbol_uses underlying APIs since they necessitate SYMBOL_COUNT walks and have various sharp edges. I'm hoping that whatever emerges eventually for this can still retain this simple API as a one-shot.

Differential Revision:
The file was modified mlir/lib/Bindings/Python/IRModule.h
The file was modified mlir/test/python/ir/
The file was modified mlir/include/mlir-c/IR.h
The file was modified mlir/lib/Bindings/Python/IRCore.cpp
The file was modified mlir/lib/CAPI/IR/IR.cpp
The file was modified mlir/python/mlir/_mlir_libs/_mlir/ir.pyi
The file was added mlir/test/python/ir/
Commit a88bb5b9fee5aee8c25cabad44a257175e384f52 by stellaraccident
[mlir][python] Audit and fix a lot of the Python pyi stubs.

* Classes that are still todo are marked with "# TODO: Auto-generated. Audit and fix."
* Those without this note have been cross-checked with C++ sources and most have been spot checked by hovering in VsCode.

Differential Revision:
The file was modified mlir/python/mlir/_mlir_libs/_mlir/ir.pyi
The file was modified mlir/python/mlir/_mlir_libs/_mlir/passmanager.pyi
The file was modified mlir/python/mlir/_mlir_libs/_mlir/__init__.pyi
The file was modified mlir/python/mlir/_mlir_libs/_mlirExecutionEngine.pyi
Commit 42c15c7edf174fc7a45131a1b89ee816fada7633 by
[X86][clang] Enable floating-point type for -mno-x87 option on 32-bits

We should match GCC's behavior, which allows floating-point types with the -mno-x87 option on 32-bit targets.

The issues that previously blocked this have been partially fixed by D112143.

Reviewed By: asavonic, nickdesaulniers

Differential Revision:
The file was modified clang/lib/Basic/Targets/X86.cpp
The file was modified clang/test/Sema/x86-no-x87.cpp
Commit 5047e3a3ba92402b60c200201484b422cad8bea6 by i
[ELF] Move GOT/PLT relocation code closer. NFC
The file was modified lld/ELF/Relocations.cpp
Commit 5bbe50148f3b515c170be22209395b72890f5b8c by carlosgalvezp
[clang-tidy] Warn on functional C-style casts

The google-readability-casting check is meant to be on par
with cpplint's readability/casting check, according to the
documentation. However it currently does not diagnose
functional casts, like:

float x = 1.5F;
int y = int(x);

This is detected by cpplint, however, and the guidelines
are clear that such a cast is only allowed when the type
is a class type (constructor call):

> You may use cast formats like `T(x)` only when `T` is a class type.

Therefore, update the clang-tidy check to diagnose such casts.

Differential Revision:
The file was modified clang-tools-extra/docs/ReleaseNotes.rst
The file was modified clang-tools-extra/clang-tidy/google/AvoidCStyleCastsCheck.cpp
The file was modified clang-tools-extra/test/clang-tidy/checkers/google-readability-casting.cpp
Commit 89453ed6f2059b5cec576fc41914def713fe38f7 by ardb
[ARM] create new pseudo t2LDRLIT_ga_pcrel for stack guards

We can't use the existing pseudo ARM::tLDRLIT_ga_pcrel for loading the
stack guard for PIC code that references the GOT, since arm-pseudo may
expand this to the narrow tLDRpci rather than the wider t2LDRpci.

Create a new pseudo, t2LDRLIT_ga_pcrel, and expand it to t2LDRpci.


Reviewed By: ardb

Differential Revision:
The file was modified llvm/lib/Target/ARM/ARMExpandPseudoInsts.cpp
The file was modified llvm/lib/Target/ARM/Thumb2InstrInfo.cpp
The file was added llvm/test/CodeGen/ARM/expand-pseudos.ll
The file was modified llvm/lib/Target/ARM/ARMBaseInstrInfo.cpp
The file was modified llvm/lib/Target/ARM/
Commit 8cd782487fe68082e57d24a576b77f529d77f96c by lebedev.ri
[X86][LoopVectorize] "Fix" `X86TTIImpl::getAddressComputationCost()`

We ask `TTI.getAddressComputationCost()` about the cost of computing a vector address,
and then multiply it by the vector width. This doesn't make any sense:
it implies that we'd do a vector GEP and then scalarize the vector of pointers,
but there is no such thing in the vectorized IR; we perform scalar GEPs.

This is *especially* bad on X86, and was effectively prohibiting any scalarized
vectorization of gathers/scatters, because `X86TTIImpl::getAddressComputationCost()`
says that the cost of vector address computation is `10`, compared to `1` for scalar.
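
To make the arithmetic above concrete, here is a minimal sketch in Python. It assumes the fix amounts to querying the scalar address cost once per lane instead of the vector address cost, which the message implies but does not spell out. The function name and the vector width of 8 are hypothetical; only the costs `10` and `1` come from the text above.

```python
def total_address_cost(cost_per_address: int, vector_width: int) -> int:
    """Cost of computing one address per lane when an access is scalarized."""
    return cost_per_address * vector_width


# Per-address costs quoted in the commit message for X86.
X86_VECTOR_ADDRESS_COST = 10
X86_SCALAR_ADDRESS_COST = 1

# Hypothetical vector width, chosen only for illustration.
VECTOR_WIDTH = 8

# Multiplying the vector address cost by the width inflates the estimate.
cost_with_vector_query = total_address_cost(X86_VECTOR_ADDRESS_COST, VECTOR_WIDTH)

# Assumed behavior after the fix: one scalar GEP per lane.
cost_with_scalar_query = total_address_cost(X86_SCALAR_ADDRESS_COST, VECTOR_WIDTH)

print(cost_with_vector_query, cost_with_scalar_query)
```

Under these assumptions the estimate drops by a factor of ten, which is consistent with the claim that the old costing effectively ruled out scalarized gathers and scatters.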

The computed costs are similar to the ones with D111222+D111220,
but we end up without masked memory intrinsics that we'd then have to
expand later on, without much luck. (D111363)

Differential Revision:
The file was modified llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
The file was modified llvm/test/Analysis/CostModel/X86/masked-interleaved-load-i16.ll
The file was modified llvm/test/Analysis/CostModel/X86/scatter-i64-with-i8-index.ll
The file was modified llvm/test/Analysis/CostModel/X86/masked-scatter-i32-with-i8-index.ll
The file was modified llvm/test/Analysis/CostModel/X86/scatter-i16-with-i8-index.ll
The file was modified llvm/test/Analysis/CostModel/X86/gather-i64-with-i8-index.ll
The file was modified llvm/test/Analysis/CostModel/X86/interleaved-load-i16-stride-5.ll
The file was modified llvm/test/Analysis/CostModel/X86/scatter-i8-with-i8-index.ll
The file was modified llvm/test/Transforms/LoopVectorize/X86/x86-interleaved-store-accesses-with-gaps.ll
The file was modified llvm/test/Transforms/LoopVectorize/X86/gather_scatter.ll
The file was modified llvm/lib/Target/X86/X86TargetTransformInfo.cpp
The file was modified llvm/test/Analysis/CostModel/X86/gather-i8-with-i8-index.ll
The file was modified llvm/test/Analysis/CostModel/X86/masked-interleaved-store-i16.ll
The file was modified llvm/test/Analysis/CostModel/X86/gather-i16-with-i8-index.ll
The file was modified llvm/test/Analysis/CostModel/X86/scatter-i32-with-i8-index.ll
The file was modified llvm/test/Analysis/CostModel/X86/masked-scatter-i64-with-i8-index.ll
The file was modified llvm/test/Analysis/CostModel/X86/gather-i32-with-i8-index.ll
Commit c2e08aba1afd5a69dbe74b03ce6f463d45102222 by wlei
[llvm-profgen] Compute and show profile density

AutoFDO performance is sensitive to profile density, i.e., the number of samples in the profile relative to the program size, because profiles with insufficient samples can be inaccurate due to statistical noise and thus hurt AutoFDO performance. A previous investigation showed that AutoFDO performed better on MySQL with an increased number of samples. Therefore, we implement a profile-density computation feature to give hints about profile density to users and the compiler.

We define the density of a profile Prof as follows:

- For each function A in the profile, density(A) = total_samples(A) / sizeof(A).
- density(Prof) = min(density(A)) for all functions A that are warm (defined below).

A function is considered warm if its total-samples is within the top N percent of the profile. For the implementation, we reuse `ProfileSummaryBuilder::getHotCountThreshold(..)` as the threshold, which can be set by percent (`--profile-summary-cutoff-hot`) or by value (`--profile-summary-hot-count`).
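
The definition above can be sketched in a few lines of Python. This is an illustration rather than the llvm-profgen implementation: the input shape (function name mapped to total samples and size), the helper names, and the reading of "warm" as "total samples at or above the hot-count threshold" are all assumptions made for the example.

```python
from typing import Dict, Optional, Tuple

# Hypothetical input shape: function name -> (total_samples, size).
FunctionStats = Dict[str, Tuple[int, int]]


def function_density(total_samples: int, size: int) -> float:
    """density(A) = total_samples(A) / sizeof(A)."""
    return total_samples / size


def profile_density(stats: FunctionStats, hot_count_threshold: int) -> Optional[float]:
    """density(Prof) = min(density(A)) over all warm functions A.

    Assumption: a function counts as warm when its total samples reach
    hot_count_threshold. Returns None when no function is warm.
    """
    warm = [
        function_density(samples, size)
        for samples, size in stats.values()
        if size > 0 and samples >= hot_count_threshold
    ]
    return min(warm) if warm else None


# Made-up numbers, chosen only to exercise the definition.
example = {
    "main": (1000, 100),         # density 10.0, warm
    "helper": (600, 200),        # density 3.0, warm
    "rarely_called": (5, 500),   # below the threshold, ignored
}
print(profile_density(example, hot_count_threshold=100))  # prints 3.0
```

Taking the minimum over warm functions means a single sparsely sampled hot function is enough to pull the profile density down.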

We also introduce `--hot-function-density-threshold` to set the hot-function density threshold. If the profile density is below it, we print a suggestion to increase the number of samples.

This also applies to CS profiles, with all profiles merged into the base.

Reviewed By: hoy, wenlei

Differential Revision:
The file was added llvm/test/tools/llvm-profgen/Inputs/
The file was added llvm/test/tools/llvm-profgen/profile-density.test
The file was added llvm/test/tools/llvm-profgen/Inputs/
The file was modified llvm/tools/llvm-profgen/ProfileGenerator.cpp
The file was modified llvm/tools/llvm-profgen/ProfileGenerator.h
The file was modified llvm/tools/llvm-profgen/ProfiledBinary.h
Commit 7ba70d32736aef0c640b9d0e7b9081fc208c81c2 by markus.boeck02
[PR52549][clang-cl] Predefine _MSVC_EXECUTION_CHARACTER_SET

Since VS 2022 17.1, MSVC predefines _MSVC_EXECUTION_CHARACTER_SET to inform users of the execution character set defined at compile time. The value the macro expands to is a Windows Code Page Identifier; these are documented here:

As clang currently only supports UTF-8, it is defined as 65001. If clang-cl were to support a different execution character set in the future, we'd have to change the value.


Differential Revision:
The file was modified clang/lib/Basic/Targets/OSTargets.cpp
The file was modified clang/test/Preprocessor/init.c