Changes

Summary

  1. [LoopVectorize] Add vector reduction support for fmuladd intrinsic (details)
  2. [LoopVectorize] Propagate fast-math flags for VPInstruction (details)
  3. [LoopVectorize] Print fast-math flags for VPReductionRecipe (details)
  4. [LoopVectorize][CostModel] Update cost model for fmuladd intrinsic (details)
  5. [lldb/gdb-remote] Remove more non-stop mode remnants (details)
  6. [llvm-reduce] Add parallel chunk processing. (details)
  7. [mlir][linalg][bufferize][NFC] Move tensor interface impl to new build target (details)
  8. [clang-format] NFC - recent changes caused clang-format to no longer be clang-formatted. (details)
  9. [ARM] Add fma and update fadd/fmul predicated select tests. NFC (details)
  10. tsan: extend mmap test (details)
  11. [ARM] Fold floating point select(binop) patterns (details)
  12. [DebugInfo][InstrRef] Avoid crash when values optimised out late in sdag (details)
  13. [NFC] Tidy up SelectionDAGBuilder::visitIntrinsicCall to use existing sdl debug loc (details)
  14. [mlir][linalg][bufferize][NFC] Move vector interface impl to new build target (details)
  15. [ARM] Fold (fadd x, (vselect c, y, -1.0)) into (vselect c, (fadd x, y), x) (details)
  16. [AMDGPU] Only allow implicit WQM in pixel shaders (details)
  17. [LLDB/test] lldbutil check_breakpoint() - check target instance (details)
  18. [AMDGPU] Only select VOP3 forms of VOP2 instructions (details)
  19. [AMDGPU] Implement widening multiplies with v_mad_i64_i32/v_mad_u64_u32 (details)
  20. [DAG] SimplifyDemandedBits - simplify rotl/rotr to shl/srl (details)
  21. [DebugInfo] Check both instr-ref and DBG_VALUE modes of sdag tests (details)
  22. Clean up clang-format tech debt. (details)
  23. sanitizer_common: remove SANITIZER_USE_MALLOC (details)
  24. tsan: add another fork deadlock test (details)
  25. [DebugInfo] Adjust x86 location-list tests for instruction referencing (details)
  26. [PowerPC] Provide XL-compatible vec_round implementation (details)
  27. [llvm-dwarfdump][Statistics] Handle LTO cases with cross CU referencing (details)
  28. [InstSimplify] add tests for xor logic; NFC (details)
  29. [InstSimplify] fold xor logic of 2 variables, part 2 (details)
  30. [X86] Add D113970 test cases for or-lea with no common bits. (details)
  31. [X86] Add BMI test coverage for or-lea with no common bits tests (details)
  32. [LV] Use patterns in some induction tests, to make more robust. (NFC) (details)
  33. Revert "[DAG] SimplifyDemandedBits - simplify rotl/rotr to shl/srl" (details)
  34. [AArch64][SVE] Recognize all ones mask during fixed mask generation (details)
  35. [VPlan] Remove unused VPInstruction constructor. (NFC) (details)
  36. [libc] Fix wrong type for load/store of Repeated elements (details)
Commit c2441b6b89bfe52a16f6c5ed5a0a49c9a02daf2a by rosie.sumpter
[LoopVectorize] Add vector reduction support for fmuladd intrinsic

Enables LoopVectorize to handle reduction patterns involving the
llvm.fmuladd intrinsic.
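
As a rough illustration of the kind of source loop this is about (an assumption, not taken from the patch): a dot-product style accumulation where each iteration multiplies two elements and adds the product to the running sum. Clang commonly emits llvm.fmuladd for such an expression when floating-point contraction is allowed. The function name `dot` is hypothetical.

```cpp
#include <cstddef>

// Hypothetical example of a multiply-add reduction over two arrays.
// The accumulator `sum` is the reduction variable; the product a[i] * b[i]
// is the multiply half of the fused multiply-add.
float dot(const float *a, const float *b, std::size_t n) {
  float sum = 0.0f;
  for (std::size_t i = 0; i < n; ++i)
    sum = a[i] * b[i] + sum; // multiply-add feeding the reduction
  return sum;
}
```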

Differential Revision: https://reviews.llvm.org/D111555
The file was modified llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
The file was modified llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
The file was modified llvm/test/Transforms/LoopVectorize/AArch64/scalable-strict-fadd.ll
The file was modified llvm/lib/Analysis/IVDescriptors.cpp
The file was modified llvm/include/llvm/Analysis/IVDescriptors.h
The file was modified llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
The file was modified llvm/lib/Transforms/Utils/LoopUtils.cpp
The file was modified llvm/test/Transforms/LoopVectorize/AArch64/strict-fadd.ll
The file was modified llvm/test/Transforms/LoopVectorize/reduction-inloop.ll
Commit 991074012a6c9a294c5c64cf51502934a8e9bb36 by rosie.sumpter
[LoopVectorize] Propagate fast-math flags for VPInstruction

In-loop vector reductions which use the llvm.fmuladd intrinsic involve
the creation of two recipes; a VPReductionRecipe for the fadd and a
VPInstruction for the fmul. If the call to llvm.fmuladd has fast-math flags
these should be propagated through to the fmul instruction, so an
interface setFastMathFlags has been added to the VPInstruction class to
enable this.

Differential Revision: https://reviews.llvm.org/D113125
The file was modified llvm/lib/IR/Operator.cpp
The file was modified llvm/include/llvm/IR/Operator.h
The file was modified llvm/lib/IR/AsmWriter.cpp
The file was modified llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
The file was modified llvm/lib/Transforms/Vectorize/VPlan.h
The file was modified llvm/lib/Transforms/Vectorize/VPlan.cpp
The file was modified llvm/test/Transforms/LoopVectorize/vplan-printing.ll
The file was modified llvm/test/Transforms/LoopVectorize/AArch64/scalable-strict-fadd.ll
Commit 2d33327f9d4c154e7454c2d830c1caa8e6850f4f by rosie.sumpter
[LoopVectorize] Print fast-math flags for VPReductionRecipe
The file was modified llvm/test/Transforms/LoopVectorize/vplan-printing.ll
The file was modified llvm/lib/Transforms/Vectorize/VPlan.cpp
Commit df32a39dd0f68383a1685a4571715edb70664969 by rosie.sumpter
[LoopVectorize][CostModel] Update cost model for fmuladd intrinsic

This patch updates the cost model for ordered reductions so that a call
to the llvm.fmuladd intrinsic is modelled as a normal fmul instruction
plus the cost of an ordered fadd reduction.

Differential Revision: https://reviews.llvm.org/D111630
The file was modified llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
The file was modified llvm/test/Transforms/LoopVectorize/AArch64/strict-fadd-cost.ll
Commit 6f82264dbb02028d4ec4940aeb6d716dded6e879 by pavel
[lldb/gdb-remote] Remove more non-stop mode remnants

The read thread handling is completely dead code now that non-stop mode
no longer exists.
The file was modified lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunication.h
The file was modified lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemote.h
The file was modified lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunication.cpp
The file was modified lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemote.cpp
Commit 8ef460fc5137816c0bcc9ec471e7d60d3cc2f5ee by flo
[llvm-reduce] Add parallel chunk processing.

This patch adds parallel processing of chunks. When reducing very large
inputs, e.g. functions with 500k basic blocks, processing chunks in
parallel can significantly speed up the reduction.

To allow modifying clones of the original module in parallel, each clone
needs its own LLVMContext object. To achieve this, each job parses the
input module with its own LLVMContext. If a job successfully
reduces the input, it serializes the result module as bitcode into a
result array.

To ensure parallel reduction produces the same results as serial
reduction, only the first successfully reduced result is used, and
results of other successful jobs are dropped. Processing resumes after
the chunk that was successfully reduced.

The number of threads to use can be configured using the -j option.
It defaults to 1, which means serial processing.
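
The "first successful result wins" rule described above can be sketched roughly as follows. This is not the code from Delta.cpp: `TryChunk`, `Accepted` and `first_success` are hypothetical names, and a plain string stands in for the serialized bitcode. A batch of jobs runs in parallel, but only the lowest-numbered successful chunk is accepted, which matches what a serial left-to-right scan would pick.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <future>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for one reduction attempt: returns the serialized
// result if removing the chunk kept the input interesting, else nullopt.
using TryChunk = std::function<std::optional<std::string>(std::size_t)>;

struct Accepted {
  std::size_t chunk;      // index of the chunk whose result was accepted
  std::string serialized; // result produced by that job
};

// Tries chunks [begin, num_chunks) in batches of up to num_jobs parallel jobs.
// Results of later successful jobs in the same batch are dropped.
std::optional<Accepted> first_success(std::size_t begin, std::size_t num_chunks,
                                      std::size_t num_jobs,
                                      const TryChunk &try_chunk) {
  num_jobs = std::max<std::size_t>(num_jobs, 1);
  for (std::size_t base = begin; base < num_chunks; base += num_jobs) {
    const std::size_t end = std::min(base + num_jobs, num_chunks);
    std::vector<std::future<std::optional<std::string>>> jobs;
    for (std::size_t i = base; i < end; ++i)
      jobs.push_back(std::async(std::launch::async,
                                [&try_chunk, i] { return try_chunk(i); }));
    std::optional<Accepted> accepted;
    for (std::size_t k = 0; k < jobs.size(); ++k) {
      // Wait for every job, but keep only the first success in chunk order.
      std::optional<std::string> result = jobs[k].get();
      if (result && !accepted)
        accepted = Accepted{base + k, std::move(*result)};
    }
    if (accepted)
      return accepted;
  }
  return std::nullopt;
}
```

A caller would resume with `begin` set to the accepted chunk index plus one, mirroring "processing resumes after the chunk that was successfully reduced".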

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D113857
The file was modified llvm/test/tools/llvm-reduce/operands-skip.ll
The file was modified llvm/tools/llvm-reduce/deltas/Delta.cpp
The file was modified llvm/tools/llvm-reduce/CMakeLists.txt
Commit bb273a35a02a00dbba8549e858df310f4b6a32b1 by springerm
[mlir][linalg][bufferize][NFC] Move tensor interface impl to new build target

This makes ComprehensiveBufferize entirely independent of the tensor dialect.

Differential Revision: https://reviews.llvm.org/D114217
The file was modified mlir/lib/Dialect/Linalg/ComprehensiveBufferize/BufferizableOpInterface.cpp
The file was added mlir/lib/Dialect/Linalg/ComprehensiveBufferize/TensorInterfaceImpl.cpp
The file was modified mlir/lib/Dialect/Linalg/ComprehensiveBufferize/CMakeLists.txt
The file was modified mlir/lib/Dialect/Linalg/Transforms/CMakeLists.txt
The file was modified mlir/include/mlir/Dialect/Linalg/ComprehensiveBufferize/LinalgInterfaceImpl.h
The file was added mlir/include/mlir/Dialect/Linalg/ComprehensiveBufferize/TensorInterfaceImpl.h
The file was modified mlir/include/mlir/Dialect/Linalg/ComprehensiveBufferize/BufferizableOpInterface.h
The file was modified utils/bazel/llvm-project-overlay/mlir/BUILD.bazel
The file was modified mlir/lib/Dialect/Linalg/ComprehensiveBufferize/ComprehensiveBufferize.cpp
The file was modified mlir/lib/Dialect/Linalg/Transforms/ComprehensiveBufferizePass.cpp
Commit 93fc91610f427f42b77fa36a65a70b9b86225c37 by mydeveloperday
[clang-format] NFC - recent changes caused clang-format to no longer be clang-formatted.

The following 2 commits caused files in clang-format to no longer be clang-formatted.

If this were left unfixed, clang-format would lose its "clean" status: https://releases.llvm.org/13.0.0/tools/clang/docs/ClangFormattedStatus.html

c2271926a4fc  - Make clang-format fuzz through Lexing with asserts enabled (https://github.com/llvm/llvm-project/commit/c2271926a4fc )

84bf5e328664 - Fix various problems found by fuzzing. (https://github.com/llvm/llvm-project/commit/84bf5e328664)

Reviewed By: HazardyKnusperkeks, owenpan

Differential Revision: https://reviews.llvm.org/D114430
The file was modified clang/lib/Format/Format.cpp
The file was modified clang/lib/Format/TokenAnalyzer.cpp
The file was modified clang/lib/Format/TokenAnnotator.cpp
The file was modified clang/lib/Format/WhitespaceManager.cpp
The file was modified clang/lib/Format/SortJavaScriptImports.cpp
Commit 734e2386ffb34e5ab5fbdc1063fd11e6a2a632ce by david.green
[ARM] Add fma and update fadd/fmul predicated select tests. NFC
The file was modified llvm/test/CodeGen/Thumb2/mve-pred-selectop2.ll
The file was modified llvm/test/CodeGen/Thumb2/mve-pred-selectop3.ll
Commit 764b35d89f57a9052d84898422a865dc2e08edca by dvyukov
tsan: extend mmap test

Test size larger than clear_shadow_mmap_threshold,
which is handled differently.

Depends on D114348.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D114366
The file was modified compiler-rt/test/tsan/ignored-interceptors-mmap.cpp
Commit d9af9c2c5a53c9ba6aa0255240a2a40e8bea27aa by david.green
[ARM] Fold floating point select(binop) patterns

Similar to D84091 which added extra predicated folds for integer operations
using the identity element of the operation, this adds them for floating
point operations for the form `BinOp(x, select(p, y, Identity))`. They are
folded back to predicated versions of the operator, with fadd having the
identity -0.0, fsub using the identity 0.0 and fmul using 1.0.
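
A scalar sketch of why -0.0, rather than +0.0, is the identity for fadd may help here. The function names are hypothetical and model a single lane; this is not the tablegen pattern itself. With +0.0 as the "else" value, an input of -0.0 would come out as +0.0, so the unselected lane would not keep x exactly.

```cpp
#include <cmath>

// Unpredicated form from the message: BinOp(x, select(p, y, Identity)).
float fadd_select(float x, bool p, float y) { return x + (p ? y : -0.0f); }

// Predicated form: lanes where p is false keep x unchanged.
float fadd_predicated(float x, bool p, float y) { return p ? x + y : x; }

// Equality that also compares the sign bit, so +0.0 and -0.0 differ.
bool same_value(float a, float b) {
  return a == b && std::signbit(a) == std::signbit(b);
}
```

For example, `same_value(fadd_select(-0.0f, false, 1.0f), fadd_predicated(-0.0f, false, 1.0f))` holds, whereas `-0.0f + 0.0f` yields +0.0 under the default rounding mode and so would not preserve x.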

Differential Revision: https://reviews.llvm.org/D113574
The file was modified llvm/lib/Target/ARM/ARMInstrMVE.td
The file was modified llvm/test/CodeGen/Thumb2/mve-pred-selectop3.ll
Commit b8f68ad9cdb11d585acc6c38ad124b32efb6178a by jeremy.morse
[DebugInfo][InstrRef] Avoid crash when values optimised out late in sdag

It appears that we can emit all the instructions for a function, including
debug instructions, and then optimise some of the values out late.
Specifically, in the attached test case, an argument gets optimised out
after DBG_VALUE / DBG_INSTR_REFs are created. This confuses
MachineFunction::finalizeDebugInstrRefs, which expects to be able to find a
defining instruction, and crashes instead.

Fix this by identifying when there's no defining instruction, and
translating that instead into a DBG_VALUE $noreg.

Differential Revision: https://reviews.llvm.org/D114476
The file was added llvm/test/DebugInfo/X86/instr-ref-sdag-empty-vreg.ll
The file was modified llvm/lib/CodeGen/MachineFunction.cpp
Commit cf40ca026f9193c46c3db1f3cb2ac0dff5f2b695 by david.sherwood
[NFC] Tidy up SelectionDAGBuilder::visitIntrinsicCall to use existing sdl debug loc

In quite a few places we were calling getCurSDLoc() to get the debug
location, but this is already a local variable `sdl`.

Differential Revision: https://reviews.llvm.org/D114447
The file was modified llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
Commit ca9d149e07551257813a7c9913efdfbbc23774a1 by springerm
[mlir][linalg][bufferize][NFC] Move vector interface impl to new build target

This makes ComprehensiveBufferize entirely independent of the vector dialect.

Differential Revision: https://reviews.llvm.org/D114218
The file was modified mlir/lib/Dialect/Linalg/ComprehensiveBufferize/CMakeLists.txt
The file was modified mlir/lib/Dialect/Linalg/Transforms/CMakeLists.txt
The file was modified mlir/lib/Dialect/Linalg/Transforms/ComprehensiveBufferizePass.cpp
The file was modified mlir/lib/Dialect/Linalg/ComprehensiveBufferize/ComprehensiveBufferize.cpp
The file was added mlir/lib/Dialect/Linalg/ComprehensiveBufferize/VectorInterfaceImpl.cpp
The file was modified utils/bazel/llvm-project-overlay/mlir/BUILD.bazel
The file was added mlir/include/mlir/Dialect/Linalg/ComprehensiveBufferize/VectorInterfaceImpl.h
Commit 581f837355b9523bd3217fb05eed3d577d51b95d by david.green
[ARM] Fold (fadd x, (vselect c, y, -1.0)) into (vselect c, (fadd x, y), x)

This is similar to D113574, but as a DAG combine, not tablegen patterns.
Doing the fold as a DAG combine allows the fadd to be folded with a
fmul, finally producing a predicated vfma. It performs the same fold of
fadd(x, vselect(p, y, -0.0)) to vselect(p, fadd(x, y), x) using -0.0 as
the identity value of a fadd.

Differential Revision: https://reviews.llvm.org/D113584
The file was modified llvm/test/CodeGen/Thumb2/mve-pred-selectop3.ll
The file was modified llvm/lib/Target/ARM/ARMISelLowering.cpp
Commit 976f3b3c9eba0835d5ab7d191bd2e88ceda86ebe by carl.ritson
[AMDGPU] Only allow implicit WQM in pixel shaders

Implicit derivatives are only valid in pixel shaders,
hence only implicitly enable WQM for pixel shaders.
This avoids unintended WQM in other shader types (e.g. compute)
when image sampling instructions are used.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D114414
The file was modified llvm/lib/Target/AMDGPU/SIWholeQuadMode.cpp
The file was modified llvm/test/CodeGen/AMDGPU/memory_clause.ll
The file was modified llvm/test/CodeGen/AMDGPU/wqm.mir
The file was modified llvm/test/CodeGen/AMDGPU/vgpr-tuple-allocation.ll
Commit c52ff0cfcbf46b81d3df868a259d94910cd15c9a by georgiev
[LLDB/test] lldbutil check_breakpoint() - check target instance

Check test.target instance type before we attempt to get the breakpoint.
This fix is suggested by 'clayborg'.
Ref: https://reviews.llvm.org/D111899#inline-1090156
The file was modified lldb/packages/Python/lldbsuite/test/lldbutil.py
Commit 8a52bd82e36855b3ad842f2535d0c78a97db55dc by jay.foad
[AMDGPU] Only select VOP3 forms of VOP2 instructions

Change VOP_PAT_GEN to default to not generating an instruction selection
pattern for the VOP2 (e32) form of an instruction, only for the VOP3
(e64) form. This allows SIFoldOperands maximum freedom to fold copies
into the operands of an instruction, before SIShrinkInstructions tries
to shrink it back to the smaller encoding.

This affects the following VOP2 instructions:
v_min_i32
v_max_i32
v_min_u32
v_max_u32
v_and_b32
v_or_b32
v_xor_b32
v_lshr_b32
v_ashr_i32
v_lshl_b32

A further cleanup could simplify or remove VOP_PAT_GEN, since its
optional second argument is never used.

Differential Revision: https://reviews.llvm.org/D114252
The file was modified llvm/test/CodeGen/AMDGPU/llvm.amdgcn.wqm.demote.ll
The file was modified llvm/lib/Target/AMDGPU/SIInstrInfo.td
The file was modified llvm/test/CodeGen/AMDGPU/sdwa-peephole.ll
The file was modified llvm/test/CodeGen/AMDGPU/sext-in-reg.ll
The file was modified llvm/test/CodeGen/AMDGPU/extract-lowbits.ll
The file was modified llvm/test/CodeGen/AMDGPU/ctpop16.ll
The file was modified llvm/test/CodeGen/AMDGPU/shl.v2i16.ll
The file was modified llvm/test/CodeGen/AMDGPU/lshr.v2i16.ll
The file was modified llvm/test/CodeGen/AMDGPU/stack-realign.ll
The file was modified llvm/test/CodeGen/AMDGPU/shl.ll
The file was modified llvm/test/CodeGen/AMDGPU/insert_vector_elt.v2i16.ll
The file was modified llvm/test/CodeGen/AMDGPU/ssubsat.ll
The file was modified llvm/test/CodeGen/AMDGPU/inline-asm.ll
The file was modified llvm/test/CodeGen/AMDGPU/flat-scratch.ll
The file was modified llvm/test/CodeGen/AMDGPU/wwm-reserved-spill.ll
The file was modified llvm/test/CodeGen/AMDGPU/commute-shifts.ll
The file was modified llvm/test/CodeGen/AMDGPU/idot8u.ll
The file was modified llvm/test/CodeGen/AMDGPU/idot8s.ll
The file was modified llvm/test/CodeGen/AMDGPU/select-constant-xor.ll
The file was modified llvm/test/CodeGen/AMDGPU/bfe-patterns.ll
The file was modified llvm/test/CodeGen/AMDGPU/ashr.v2i16.ll
Commit d7e03df719464354b20a845b7853be57da863924 by jay.foad
[AMDGPU] Implement widening multiplies with v_mad_i64_i32/v_mad_u64_u32

Select SelectionDAG ops smul_lohi/umul_lohi to
v_mad_i64_i32/v_mad_u64_u32 respectively, with an addend of 0.
v_mul_lo, v_mul_hi and v_mad_i64/u64 are all quarter-rate instructions
so it is better to use one instruction than two.
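
The arithmetic behind this can be modelled in a few lines. This is only a reference model of the values involved, not the instruction selection code; `LoHi`, `split` and the lower-case helper names are hypothetical, and any carry-out of the real instructions is ignored. A 32x32 multiply widened to 64 bits with an addend of 0 yields both halves that the lohi nodes need.

```cpp
#include <cstdint>

struct LoHi {
  std::uint32_t lo; // low 32 bits of the 64-bit product
  std::uint32_t hi; // high 32 bits of the 64-bit product
};

// Reference model: 64-bit result of a * b + c with 32-bit unsigned factors.
std::uint64_t mad_u64_u32(std::uint32_t a, std::uint32_t b, std::uint64_t c) {
  return static_cast<std::uint64_t>(a) * b + c;
}

// Reference model: 64-bit result of a * b + c with 32-bit signed factors.
std::int64_t mad_i64_i32(std::int32_t a, std::int32_t b, std::int64_t c) {
  return static_cast<std::int64_t>(a) * b + c;
}

LoHi split(std::uint64_t p) {
  return {static_cast<std::uint32_t>(p), static_cast<std::uint32_t>(p >> 32)};
}

// Both halves of the widening multiply come from one mad with addend 0.
LoHi umul_lohi(std::uint32_t a, std::uint32_t b) {
  return split(mad_u64_u32(a, b, 0));
}

LoHi smul_lohi(std::int32_t a, std::int32_t b) {
  return split(static_cast<std::uint64_t>(mad_i64_i32(a, b, 0)));
}
```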

Further improvements are possible to make better use of the addend
operand, but this is already a strict improvement over what we have
now.

Differential Revision: https://reviews.llvm.org/D113986
The file was modified llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
The file was modified llvm/lib/Target/AMDGPU/AMDGPUISelLowering.h
The file was modified llvm/lib/Target/AMDGPU/SIISelLowering.h
The file was modified llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
The file was modified llvm/lib/Target/AMDGPU/SIISelLowering.cpp
The file was modified llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.h
The file was modified llvm/test/CodeGen/AMDGPU/mul.ll
The file was modified llvm/test/CodeGen/AMDGPU/wwm-reserved-spill.ll
The file was modified llvm/test/CodeGen/AMDGPU/mul_int24.ll
The file was modified llvm/test/CodeGen/AMDGPU/udiv.ll
The file was modified llvm/test/CodeGen/AMDGPU/atomic_optimizations_local_pointer.ll
The file was modified llvm/test/CodeGen/AMDGPU/llvm.mulo.ll
The file was modified llvm/test/CodeGen/AMDGPU/mul_uint24-amdgcn.ll
The file was modified llvm/test/CodeGen/AMDGPU/bypass-div.ll
The file was modified llvm/test/CodeGen/AMDGPU/atomic_optimizations_global_pointer.ll
The file was modified llvm/test/CodeGen/AMDGPU/mad_64_32.ll
The file was modified llvm/test/CodeGen/AMDGPU/wwm-reserved.ll
Commit 3cf4a2c6203b5777d56f0c04fb743b85a041d6f9 by llvm-dev
[DAG] SimplifyDemandedBits - simplify rotl/rotr to shl/srl

If we only demand bits from one half of a rotation pattern, see if we can simplify to a logical shift.

For the ARM rev16 patterns, I had to drop a fold to prevent srl(bswap()) -> rotr(bswap) -> srl(bswap) infinite loops. I've replaced this with an isel PatFrag which should do the same task.

https://alive2.llvm.org/ce/z/iroxki (rol -> shl by amt iff demanded bits has at least as many trailing zeros as the shift amount)
https://alive2.llvm.org/ce/z/4ez_U- (ror -> shl by revamt iff demanded bits has at least as many trailing zeros as the reverse shift amount)
https://alive2.llvm.org/ce/z/cD7dR- (ror -> lshr by amt iff demanded bits has at least as many leading zeros as the shift amount)
https://alive2.llvm.org/ce/z/_XGHtQ (rol -> lshr by revamt iff demanded bits has at least as many leading zeros as the reverse shift amount)
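
Two of the four conditions listed above (rol -> shl by amt, ror -> lshr by amt) can be checked exhaustively on 8-bit values with a small sketch. This is independent of the SelectionDAG code; all helper names are hypothetical, and `mask` plays the role of the demanded bits.

```cpp
#include <cstdint>

std::uint8_t rotl8(std::uint8_t x, unsigned amt) {
  amt &= 7;
  return static_cast<std::uint8_t>((x << amt) | (x >> ((8 - amt) & 7)));
}

std::uint8_t rotr8(std::uint8_t x, unsigned amt) {
  return rotl8(x, (8 - (amt & 7)) & 7);
}

unsigned trailing_zeros8(std::uint8_t m) {
  unsigned n = 0;
  while (n < 8 && !((m >> n) & 1)) ++n;
  return n;
}

unsigned leading_zeros8(std::uint8_t m) {
  unsigned n = 0;
  while (n < 8 && !((m << n) & 0x80)) ++n;
  return n;
}

// rol -> shl: valid when the low `amt` bits are not demanded.
bool rotl_as_shl_holds() {
  for (unsigned x = 0; x < 256; ++x)
    for (unsigned amt = 0; amt < 8; ++amt)
      for (unsigned mask = 0; mask < 256; ++mask)
        if (trailing_zeros8(static_cast<std::uint8_t>(mask)) >= amt &&
            (rotl8(static_cast<std::uint8_t>(x), amt) & mask) !=
                ((x << amt) & mask))
          return false;
  return true;
}

// ror -> lshr: valid when the high `amt` bits are not demanded.
bool rotr_as_lshr_holds() {
  for (unsigned x = 0; x < 256; ++x)
    for (unsigned amt = 0; amt < 8; ++amt)
      for (unsigned mask = 0; mask < 256; ++mask)
        if (leading_zeros8(static_cast<std::uint8_t>(mask)) >= amt &&
            (rotr8(static_cast<std::uint8_t>(x), amt) & mask) !=
                ((x >> amt) & mask))
          return false;
  return true;
}
```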

Differential Revision: https://reviews.llvm.org/D114354
The file was modified llvm/lib/Target/ARM/ARMInstrThumb2.td
The file was modified llvm/lib/Target/ARM/ARMInstrThumb.td
The file was modified llvm/lib/Target/ARM/ARMISelLowering.cpp
The file was modified llvm/test/CodeGen/X86/vector-rotate-512.ll
The file was modified llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
The file was modified llvm/test/CodeGen/X86/vector-rotate-128.ll
The file was modified llvm/test/CodeGen/X86/rotate_vec.ll
The file was modified llvm/lib/Target/ARM/ARMInstrInfo.td
The file was modified llvm/test/CodeGen/X86/vector-rotate-256.ll
Commit 2191d502a8576c5d46f8771d5f0d353eb525b9b9 by jeremy.morse
[DebugInfo] Check both instr-ref and DBG_VALUE modes of sdag tests

In these test updates for instruction referencing, I've added specific
instr-ref RUN lines, and kept the DBG_VALUE-based variable location check
lines too. This is because argument handling is really fiddly, and I figure
it's worth duplicating the testing to ensure it's definitely correct.

There's also dbg-value-superreg-copy2.mir, a test for where variable
locations go when virtual registers are coalesced together. I don't think
there's an instruction referencing specific test for this, so I have
duplicated that for instruction referencing.

Differential Revision: https://reviews.llvm.org/D114262
The file was modified llvm/test/DebugInfo/X86/dbg-value-funcarg.ll
The file was modified llvm/test/DebugInfo/X86/dbg-value-funcarg3.ll
The file was modified llvm/test/DebugInfo/X86/dbg-value-arg-movement.ll
The file was added llvm/test/CodeGen/X86/dbg-value-superreg-copy2.mir
The file was modified llvm/test/DebugInfo/X86/dbg-value-funcarg2.ll
Commit 1b5a43ac3f1113cd0512752e021fc70740726698 by klimek
Clean up clang-format tech debt.

Make all code go through FormatTokenSource instead of going around it, which
makes changes to TokenSource brittle.

Add LLVM_DEBUG in FormatTokenSource to be able to follow the token stream.
The file was modified clang/lib/Format/UnwrappedLineParser.cpp
Commit 06677d6a9faef9f57c3b3c79906e4bba18ebee8a by dvyukov
sanitizer_common: remove SANITIZER_USE_MALLOC

It was introduced in:
9cffc9550b75 tsan: allow to force use of __libc_malloc in sanitizer_common
and used in:
512a18e51819 tsan: add standalone deadlock detector
and later used for Go support.
But now both uses are gone. Nothing defines SANITIZER_USE_MALLOC.

Reviewed By: melver

Differential Revision: https://reviews.llvm.org/D114514
The file was modified compiler-rt/lib/sanitizer_common/sanitizer_allocator.cpp
Commit a68b52e0a33382f241a250c1d67bed1141727447 by dvyukov
tsan: add another fork deadlock test

The test tries to provoke internal allocator to be locked during fork
and then force the child process to use the internal allocator.
This test sometimes deadlocks with the new tsan runtime.

Depends on D114514.

Reviewed By: melver

Differential Revision: https://reviews.llvm.org/D114515
The file was added compiler-rt/test/tsan/Linux/fork_multithreaded4.cpp
Commit f911c397dc9ee5849dff265b872b8df7458bf1e0 by jeremy.morse
[DebugInfo] Adjust x86 location-list tests for instruction referencing

This patch updates location lists in various x86 tests to reflect what
instruction referencing produces. There are two flavours of change:
* Not following a register copy immediately, because instruction
   referencing can make some slightly smarter decisions,
* Extended ranges, due to having additional information.

The register changes aren't that interesting, it's just a choice between
equally legitimate registers that instr-ref does differently. The extended
ranges are largely due to following stack restores better.

Differential Revision: https://reviews.llvm.org/D114362
The file was modified llvm/test/DebugInfo/COFF/register-variables.ll
The file was modified llvm/test/DebugInfo/X86/basic-block-sections-debug-loclist-4.ll
The file was modified llvm/test/DebugInfo/X86/sdag-combine.ll
The file was modified llvm/test/DebugInfo/X86/pieces-3.ll
The file was modified llvm/test/DebugInfo/X86/spill-nospill.ll
The file was modified llvm/test/DebugInfo/X86/live-debug-values-remove-range.ll
The file was modified llvm/test/DebugInfo/COFF/pieces.ll
The file was modified llvm/test/DebugInfo/X86/basic-block-sections-debug-loclist-5.ll
Commit b7bf937bbee38c2db0c0640176ef618d9c746538 by nemanja.i.ibm
[PowerPC] Provide XL-compatible vec_round implementation

The XL implementation of vec_round for vector double uses
"round-to-nearest, ties to even" just as the vector float
version does. However, clang and gcc use "round-to-nearest-away"
for vector double and "round-to-nearest, ties to even"
for vector float.
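
The difference between the two rounding behaviours only shows up on exact halfway values. A scalar sketch, using standard library functions rather than the AltiVec builtins, and assuming the default floating-point rounding mode is in effect for `std::nearbyint` (the wrapper names are hypothetical):

```cpp
#include <cmath>

// Round to nearest, ties away from zero (std::round semantics).
double round_ties_away(double x) { return std::round(x); }

// Round to nearest, ties to even. This relies on the current rounding mode
// being the default round-to-nearest mode (an assumption of this sketch).
double round_ties_even(double x) { return std::nearbyint(x); }
```

For an input of 2.5, the first returns 3.0 and the second returns 2.0; for inputs that are not exact ties, such as 2.4 or 2.6, both agree.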

The XL behaviour is implemented under the __XL_COMPAT_ALTIVEC__
macro similarly to other instances of incompatibility.

Differential revision: https://reviews.llvm.org/D113642
The file was modified clang/test/CodeGen/builtins-ppc-xlcompat.c
The file was modified clang/lib/Headers/altivec.h
The file was modified llvm/test/CodeGen/PowerPC/read-set-flm.ll
The file was modified llvm/lib/Target/PowerPC/PPCISelLowering.cpp
The file was modified clang/test/CodeGen/builtins-ppc-vsx.c
Commit e3d8ebe158562fb945d473319f4f5c2de25a9a02 by djordje.todorovic
[llvm-dwarfdump][Statistics] Handle LTO cases with cross CU referencing

With link-time optimizations enabled, the resulting DWARF may end up containing
cross CU references (through the DW_AT_abstract_origin attribute).
Consider the following example:

// sum.c
__attribute__((always_inline)) int sum(int a, int b)
{
     return a + b;
}
// main.c
extern int sum(int, int);
int main()
{
     int a = 5, b = 10, c = sum(a, b);
     return 0;
}

Compiled as follows:

$ clang -g -flto -fuse-ld=lld main.c sum.c -o main

Results in the following DWARF:

-- sum.c CU: abstract instance tree
...
0x000000b0:   DW_TAG_subprogram
                DW_AT_name ("sum")
                DW_AT_decl_file ("sum.c")
                DW_AT_decl_line (1)
                DW_AT_prototyped (true)
                DW_AT_type (0x000000d3 "int")
                DW_AT_external (true)
                DW_AT_inline (DW_INL_inlined)

0x000000bc:     DW_TAG_formal_parameter
                  DW_AT_name ("a")
                  DW_AT_decl_file ("sum.c")
                  DW_AT_decl_line (1)
                  DW_AT_type (0x000000d3 "int")

0x000000c7:     DW_TAG_formal_parameter
                  DW_AT_name ("b")
                  DW_AT_decl_file ("sum.c")
                  DW_AT_decl_line (1)
                  DW_AT_type (0x000000d3 "int")
...
-- main.c CU: concrete inlined instance tree
...
0x0000006d:     DW_TAG_inlined_subroutine
                  DW_AT_abstract_origin (0x00000000000000b0 "sum")
                  DW_AT_low_pc (0x00000000002016ef)
                  DW_AT_high_pc (0x00000000002016f1)
                  DW_AT_call_file ("main.c")
                  DW_AT_call_line (5)
                  DW_AT_call_column (0x19)

0x00000081:       DW_TAG_formal_parameter
                    DW_AT_location (DW_OP_reg0 RAX)
                    DW_AT_abstract_origin (0x00000000000000bc "a")

0x00000088:       DW_TAG_formal_parameter
                    DW_AT_location (DW_OP_reg2 RCX)
                    DW_AT_abstract_origin (0x00000000000000c7 "b")
...

Note that each entry within the concrete inlined instance tree in
the main.c CU has a DW_AT_abstract_origin attribute which
refers to a corresponding entry within the abstract instance
tree in the sum.c CU.
llvm-dwarfdump --statistics did not properly report
DW_TAG_formal_parameters/DW_TAG_variables from concrete inlined
instance trees which had 0% location coverage and which
referred to a different CU, mainly because information about abstract
instance trees and their parameters/variables was stored
locally - just for the currently processed CU,
rather than globally - for all CUs.
In particular, if the concrete inlined instance tree from
the example above were to look like this
(i.e. parameter b has 0% location coverage, which is why it is missing):

0x0000006d:     DW_TAG_inlined_subroutine
                  DW_AT_abstract_origin (0x00000000000000b0 "sum")
                  DW_AT_low_pc (0x00000000002016ef)
                  DW_AT_high_pc (0x00000000002016f1)
                  DW_AT_call_file ("main.c")
                  DW_AT_call_line (5)
                  DW_AT_call_column (0x19)

0x00000081:       DW_TAG_formal_parameter
                    DW_AT_location (DW_OP_reg0 RAX)
                    DW_AT_abstract_origin (0x00000000000000bc "a")

llvm-dwarfdump --statistics would not have reported b as such.

Patch by Dimitrije Milosevic.

Differential revision: https://reviews.llvm.org/D113465
The file was modified llvm/tools/llvm-dwarfdump/Statistics.cpp
The file was added llvm/test/tools/llvm-dwarfdump/X86/LTO_CCU_zero_loc_cov.ll
Commit 823fc8aa0681a861d1b74790ba77fd1f591c90b5 by spatel
[InstSimplify] add tests for xor logic; NFC
The file was modified llvm/test/Transforms/InstSimplify/xor.ll
Commit b326c058146fbd5d89f7c8ce9fb932b3851200d7 by spatel
[InstSimplify] fold xor logic of 2 variables, part 2

(~a & b) ^ (a | b) --> a

This is the swapped and/or (Demorgan?) sibling fold for
the fold added with D114462 ( 892648b18a8c ).

This case is easier to specify because we are returning
a root value, not a 'not':
https://alive2.llvm.org/ce/z/SRzj4f
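
Independently of the Alive2 proof linked above, the identity can be sanity-checked by brute force over all 8-bit inputs. This sketch is not part of the patch and the function names are hypothetical.

```cpp
#include <cstdint>

// Left-hand side of the fold: (~a & b) ^ (a | b).
std::uint8_t xor_logic(std::uint8_t a, std::uint8_t b) {
  return static_cast<std::uint8_t>((~a & b) ^ (a | b));
}

// Returns true if the expression equals `a` for every pair of 8-bit values.
bool fold_holds_for_all_i8() {
  for (unsigned a = 0; a < 256; ++a)
    for (unsigned b = 0; b < 256; ++b)
      if (xor_logic(static_cast<std::uint8_t>(a),
                    static_cast<std::uint8_t>(b)) != a)
        return false;
  return true;
}
```

Because the operations are bitwise, checking one bit position per input combination would already suffice; the 8-bit loop just makes that explicit.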
The file was modified llvm/test/Transforms/InstSimplify/xor.ll
The file was modified llvm/lib/Analysis/InstructionSimplify.cpp
Commit 19be7f9702547c35224960f1a846e344576f8e31 by llvm-dev
[X86] Add D113970 test cases for or-lea with no common bits.

Added tests are permutations of the pattern: (X & ~M) or (Y & M).
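
The "no common bits" property of this pattern can be illustrated with a small brute-force check: the two operands of the `or` never have a bit set in the same position, so the `or` produces the same value as an add. How D113970 makes use of this is not shown here; the sketch below only checks the arithmetic fact, and its function names are hypothetical.

```cpp
#include <cstdint>

// The tested pattern: (X & ~M) | (Y & M).
std::uint8_t masked_merge_or(std::uint8_t x, std::uint8_t y, std::uint8_t m) {
  return static_cast<std::uint8_t>((x & ~m) | (y & m));
}

// The same operands combined with an add instead of an or.
std::uint8_t masked_merge_add(std::uint8_t x, std::uint8_t y, std::uint8_t m) {
  return static_cast<std::uint8_t>((x & ~m) + (y & m));
}

// Returns true if both forms agree for every 8-bit X, Y and M.
bool or_equals_add_for_all_i8() {
  for (unsigned x = 0; x < 256; ++x)
    for (unsigned y = 0; y < 256; ++y)
      for (unsigned m = 0; m < 256; ++m)
        if (masked_merge_or(static_cast<std::uint8_t>(x),
                            static_cast<std::uint8_t>(y),
                            static_cast<std::uint8_t>(m)) !=
            masked_merge_add(static_cast<std::uint8_t>(x),
                             static_cast<std::uint8_t>(y),
                             static_cast<std::uint8_t>(m)))
          return false;
  return true;
}
```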

Differential Revision: https://reviews.llvm.org/D114078
The file was modified llvm/test/CodeGen/X86/or-lea.ll
Commit 73fd36963cc62931d695c9fda2026664962df754 by llvm-dev
[X86] Add BMI test coverage for or-lea with no common bits tests

Ensure D113970 handles andnot patterns as well.
The file was modified llvm/test/CodeGen/X86/or-lea.ll
Commit a7648eb2aaf848e903dca46bb9efb75809570ef1 by flo
[LV] Use patterns in some induction tests, to make more robust. (NFC)
The file was modified llvm/test/Transforms/LoopVectorize/induction_plus.ll
The file was modified llvm/test/Transforms/LoopVectorize/vplan-printing.ll
The file was modified llvm/test/Transforms/LoopVectorize/induction.ll
Commit d32787230d52af709d67a0583a15727054231a0a by benny.kra
Revert "[DAG] SimplifyDemandedBits - simplify rotl/rotr to shl/srl"

This reverts commit 3cf4a2c6203b5777d56f0c04fb743b85a041d6f9.

It makes llc hang on the following test case.
```
target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
target triple = "aarch64-unknown-linux-gnu"

define dso_local void @_PyUnicode_EncodeUTF16() local_unnamed_addr #0 {
entry:
  br label %while.body117.i

while.body117.i:                                  ; preds = %cleanup149.i, %entry
  %out.6269.i = phi i16* [ undef, %cleanup149.i ], [ undef, %entry ]
  %0 = load i16, i16* undef, align 2
  %1 = icmp eq i16 undef, -10240
  br i1 %1, label %fail.i, label %cleanup149.i

cleanup149.i:                                     ; preds = %while.body117.i
  %or130.i = call i16 @llvm.bswap.i16(i16 %0) #2
  store i16 %or130.i, i16* %out.6269.i, align 2
  br label %while.body117.i

fail.i:                                           ; preds = %while.body117.i
  ret void
}

; Function Attrs: nofree nosync nounwind readnone speculatable willreturn
declare i16 @llvm.bswap.i16(i16) #1

attributes #0 = { "target-features"="+neon,+v8a" }
attributes #1 = { nofree nosync nounwind readnone speculatable willreturn }
attributes #2 = { mustprogress nofree norecurse nosync nounwind readnone uwtable willreturn "frame-pointer"="non-leaf" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="generic" "target-features"="+neon,+v8a" }
```
The file was modified llvm/lib/Target/ARM/ARMInstrInfo.td
The file was modified llvm/test/CodeGen/X86/vector-rotate-128.ll
The file was modified llvm/lib/Target/ARM/ARMInstrThumb.td
The file was modified llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
The file was modified llvm/test/CodeGen/X86/vector-rotate-256.ll
The file was modified llvm/lib/Target/ARM/ARMInstrThumb2.td
The file was modified llvm/test/CodeGen/X86/vector-rotate-512.ll
The file was modified llvm/test/CodeGen/X86/rotate_vec.ll
The file was modified llvm/lib/Target/ARM/ARMISelLowering.cpp
Commit 080ef0b6a698e41bbc356f5d96eb550f248642e2 by bradley.smith
[AArch64][SVE] Recognize all ones mask during fixed mask generation

Differential Revision: https://reviews.llvm.org/D114431
The file was modified llvm/test/CodeGen/AArch64/sve-fixed-length-masked-scatter.ll
The file was modified llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
The file was added llvm/test/CodeGen/AArch64/sve-fixed-length-mask-opt.ll
The file was modified llvm/test/CodeGen/AArch64/sve-fixed-length-masked-gather.ll
Commit 8b86752c60f1d18f367fdd29f47bc3bd8646bbd2 by flo
[VPlan] Remove unused VPInstruction constructor. (NFC)

VPInstruction inherits from VPValue, so the constructor taking
ArrayRef<VPValue*> covers all cases that would be covered by the removed
constructor.
The file was modified llvm/lib/Transforms/Vectorize/VPlan.h
Commit 408c0cc4eb6099d06cbd51fd4e205c2b39b2c8af by gchatelet
[libc] Fix wrong type for load/store of Repeated elements
The file was modified libc/src/string/memory_utils/elements.h