Success

Changes

Summary

  1. [libc][NFC] Instead of erroring, skip math targets with missing implementations.
  2. [llvm-nm] Support the -V option, print that the tool is compatible with GNU nm
  3. [mlir][NFC] Add helper for common pattern of replaceAllUsesExcept
  4. [mlir][tosa] Add tosa.div integer lowering to linalg.generic.
  5. [OpenMP] Prevent Attributor from deleting functions in OpenMPOptCGSCC pass
  6. [CMake][ELF] Link libLLVM.so and libclang-cpp.so with -Bsymbolic-functions
  7. [libc] Enable fmaf and fma on x86_64.
  8. [mlir][tosa] Add lowering to tosa.abs for integer cases
  9. [NFC][X86][MCA] AMD Zen 3: add same-reg SSE XMM XORPS test
  10. [X86] AMD Zen 3: same-reg SSE XMM XORPS is a 1-cycle(!) dep-breaking one-idiom
  11. Revert "[X86] X86TTIImpl::getInterleavedMemoryOpCostAVX2(): use getMemoryOpCost()"
  12. Revert "[X86][CostModel] X86TTIImpl::getMemoryOpCost(): rewrite vector handling again"
  13. [AA] Use isIdentifiedFunctionLocal() (NFC)
  14. [CaptureTracking] Use isIdentifiedFunctionLocal() (NFC)
  15. [clang-repl] Temporarily disable the execute.cpp test on ppc64.
  16. [docs] Add page on opaque pointer types
  17. Don't run MachineVerifier on sjlj-unwind-inline-asm test because of known issue (PR39439)
  18. [Clang][OpenMP] Allow unified_shared_memory for Pascal-generation GPUs.
  19. [IR] Introduce the opaque pointer type
  20. Widen `name` stencil to support `TypeLoc` nodes.
  21. [mlir][Linalg] Add ComprehensiveBufferize for functions(step 1/n)
  22. [mlir][Linalg] Add support for vector.transfer ops to comprehensive bufferization (2/n).
  23. AMDGPU/GlobalISel: Implement tail calls
  24. AMDGPU/GlobalISel: Don't hardcode stack alignment in assert message
  25. [gn] Don't pass -fprofile-instr-generate to linker on Windows
Commit 7deb5ef44f2865f1b34cff98176a1a723107cf08 by sivachandra
[libc][NFC] Instead of erroring, skip math targets with missing implementations.

Fixes the AArch64 bot.
The file was modified libc/src/math/CMakeLists.txt
Commit b42fb6811e25322f7e55d3f76fe13a6829202219 by martin
[llvm-nm] Support the -V option, print that the tool is compatible with GNU nm

This unlocks some codepaths in libtool.

Differential Revision: https://reviews.llvm.org/D102321
The file was modified llvm/tools/llvm-nm/llvm-nm.cpp
The file was modified llvm/docs/CommandGuide/llvm-nm.rst
The file was added llvm/test/tools/llvm-nm/libtool-version.test
Commit 12874e93a15219ccfaff42a0536b2b5368c6f304 by silvasean
[mlir][NFC] Add helper for common pattern of replaceAllUsesExcept

This covers the extremely common case of replacing all uses of a Value
with a new op that is itself a user of the original Value.

This should also be a little bit more efficient than the
`SmallPtrSet<Operation *, 1>{op}` idiom that was being used before.

Differential Revision: https://reviews.llvm.org/D102373
The file was modified mlir/lib/IR/Value.cpp
The file was modified mlir/lib/Dialect/Affine/Transforms/AffineLoopNormalize.cpp
The file was modified mlir/lib/Dialect/Linalg/Transforms/Tiling.cpp
The file was modified mlir/lib/Dialect/Linalg/Transforms/Fusion.cpp
The file was modified mlir/lib/Dialect/SCF/Transforms/ParallelLoopTiling.cpp
The file was modified mlir/include/mlir/IR/Value.h
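To illustrate the pattern this helper covers, here is a minimal sketch on a hypothetical toy IR (the `Op` and `Value` structs and the `addOperand` helper are invented for illustration and are NOT MLIR's classes or the helper's actual signature). Every use of the old value is redirected to the new op's result, except the use held by the new op itself, which must keep reading the original value.

```cpp
#include <cassert>
#include <vector>

// Hypothetical toy IR for illustration only; these are NOT MLIR's classes.
struct Op;

struct Value {
  std::vector<Op *> users;  // one entry per use of this value
};

struct Op {
  std::vector<Value *> operands;
  Value result;  // the value this op defines
};

// Record that `user` takes `v` as an operand.
void addOperand(Op &user, Value &v) {
  user.operands.push_back(&v);
  v.users.push_back(&user);
}

// Redirect every use of `from` to `to`, except the uses owned by `exempt`.
// Typical case: `exempt` is the new op that defines `to` and itself reads
// `from`, so redirecting its operand would make it consume its own result.
void replaceAllUsesExcept(Value &from, Value &to, Op *exempt) {
  assert(&from != &to && "replacing a value with itself is a no-op");
  std::vector<Op *> remaining;
  for (Op *user : from.users) {
    if (user == exempt) {
      remaining.push_back(user);
      continue;
    }
    for (Value *&operand : user->operands) {
      if (operand == &from) {
        operand = &to;
        to.users.push_back(user);
      }
    }
  }
  from.users = remaining;
}
```

Passing the single exempt op directly, rather than building a set containing one element, is the kind of simplification the commit message describes.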
Commit 0831793ed962105b51057c02df413abef4767e7c by rob.suderman
[mlir][tosa] Add tosa.div integer lowering to linalg.generic.

Lower the elementwise div op to the linalg dialect. Since TOSA only supports integer division, that is the only variant implemented so far.

Reviewed By: rsuderman

Differential Revision: https://reviews.llvm.org/D102430
The file was modified mlir/lib/Conversion/TosaToLinalg/TosaToLinalg.cpp
The file was modified mlir/test/Conversion/TosaToLinalg/tosa-to-linalg.mlir
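As a reference for what the lowered loop computes, here is a sketch of elementwise integer division over two equally shaped, flattened tensors. It assumes truncation toward zero (what C++ `/` does for signed integers) and does not model broadcasting; the function name `elementwiseDiv` is hypothetical and this is not the code the lowering generates.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reference semantics for an elementwise signed integer divide.
// Assumes truncation toward zero; broadcasting is not modeled.
std::vector<int32_t> elementwiseDiv(const std::vector<int32_t> &lhs,
                                    const std::vector<int32_t> &rhs) {
  assert(lhs.size() == rhs.size() && "operands must have the same shape");
  std::vector<int32_t> out(lhs.size());
  for (std::size_t i = 0; i < lhs.size(); ++i) {
    assert(rhs[i] != 0 && "division by zero is undefined");
    assert(!(lhs[i] == INT32_MIN && rhs[i] == -1) && "result would overflow");
    out[i] = lhs[i] / rhs[i];  // the scalar op applied to each element
  }
  return out;
}
```

For example, dividing `{7, -7, 9}` by `{2, 2, -4}` yields `{3, -3, -2}`, since each quotient is truncated toward zero.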
Commit 8b57ed09bd238b00b5d096f9be7ef9f831428044 by huberjn
[OpenMP] Prevent Attributor from deleting functions in OpenMPOptCGSCC pass

Summary:
This patch prevents the Attributor instances created in the CGSCC pass from
deleting functions, so that the Attributor does not change the call graph
while OpenMPOpt is working with it.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D102363
The file was modified llvm/lib/Transforms/IPO/OpenMPOpt.cpp
Commit 4f05f4c8e66bc76b1d94f5283494404382e3bacd by i
[CMake][ELF] Link libLLVM.so and libclang-cpp.so with -Bsymbolic-functions

llvm-dev message: https://lists.llvm.org/pipermail/llvm-dev/2021-May/150465.html

In an ELF shared object, a defined symbol with default visibility is preemptible
by default, which causes some missed optimization opportunities.
-Bsymbolic-functions is more aggressive than our current -fvisibility-inlines-hidden
(present since 2012) because it applies to all function definitions. It can:

* avoid the PLT for cross-TU function calls and reduce dynamic symbol lookup
* reduce dynamic symbol lookup when taking function addresses, and optimize out GOT/TOC on x86-64/ppc64

In a -DLLVM_TARGETS_TO_BUILD=X86 build, the number of JUMP_SLOT relocations decreases from 12716 to 1628, and the number of GLOB_DAT relocations decreases from 1918 to 1313.
A clang built with `-DLLVM_LINK_LLVM_DYLIB=on -DCLANG_LINK_CLANG_DYLIB=on` is significantly faster;
see the Linux kernel build result at https://bugs.archlinux.org/task/70697.

Note: the performance of libLLVM.so and libclang-cpp.so built with
-fno-semantic-interposition and -Bsymbolic-functions is close to that of a PIE
binary linking against `libLLVM*.a` and `libclang*.a`. When the host compiler
is Clang, -Bsymbolic-functions is the major contributor. On x86-64 (with
GOTPCRELX) and ppc64 ELFv2, the GOT/TOC relocations can be optimized.

Some implications:

Interposing a subset of functions is no longer supported.
(This is fragile on ELF and not supported at all on Mach-O. For Mach-O we don't
use `ld -interpose` or `-flat_namespace`.)

Compiling a program which takes the address of any LLVM function with
`{gcc,clang} -fno-pic` and expects the address to equal the address taken
from libLLVM.so or libclang-cpp.so is unsupported. I am fairly confident that
llvm-project shouldn't behave differently depending on such pointer
equality (we have long used -fvisibility-inlines-hidden, which applies to
inline functions), but if we accidentally do, users should be aware that they
should not make assumptions about pointer equality in `-fno-pic` mode.

See more at https://maskray.me/blog/2021-05-09-fno-semantic-interposition

Reviewed By: phosek

Differential Revision: https://reviews.llvm.org/D102090
The file was modified clang/tools/clang-shlib/CMakeLists.txt
The file was modified llvm/tools/llvm-shlib/CMakeLists.txt
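The relocation counts quoted in this commit message correspond to the following relative reductions (plain arithmetic on the numbers given above, nothing more):

```latex
\begin{aligned}
\text{JUMP\_SLOT:}\quad & \frac{12716 - 1628}{12716} = \frac{11088}{12716} \approx 87.2\%\ \text{fewer relocations} \\
\text{GLOB\_DAT:}\quad & \frac{1918 - 1313}{1918} = \frac{605}{1918} \approx 31.5\%\ \text{fewer relocations}
\end{aligned}
```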
Commit b47539a14dc8a40e8710aef2fb75d2486f271dba by sivachandra
[libc] Enable fmaf and fma on x86_64.

They require clang-11 or above to build and hence had to be disabled
because the bots did not have clang-11 or higher. The bots have now been
upgraded, so these functions can be enabled.
The file was modified libc/config/linux/x86_64/entrypoints.txt
Commit f97d970a49fb1f95cd3ac599369e53325129f769 by rob.suderman
[mlir][tosa] Add lowering to tosa.abs for integer cases

The integer case requires decomposing into simple LLVM operations.

Differential Revision: https://reviews.llvm.org/D101809
The file was modified mlir/lib/Conversion/TosaToLinalg/TosaToLinalg.cpp
The file was modified mlir/test/Conversion/TosaToLinalg/tosa-to-linalg.mlir
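One common way to decompose integer abs into simple operations is to compare against zero, compute `0 - x`, and select between the two. The sketch below shows that decomposition as scalar C++. It is an illustration under that assumption, not necessarily the exact op sequence the lowering emits, and the name `absViaSelect` is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Integer absolute value built only from compare, subtract and select.
int32_t absViaSelect(int32_t x) {
  assert(x != INT32_MIN && "0 - INT32_MIN is not representable in int32_t");
  const bool isPositive = x > 0;    // compare: signed x > 0
  const int32_t negated = 0 - x;    // subtract: 0 - x
  return isPositive ? x : negated;  // select
}
```

Using a select rather than a branch keeps the computation straight-line, which suits an elementwise loop body.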
Commit 6c4596793d43703923552e791716a3d511e28fe0 by lebedev.ri
[NFC][X86][MCA] AMD Zen 3: add same-reg SSE XMM XORPS test
The file was added llvm/test/tools/llvm-mca/X86/Znver3/zero-idioms-sse-xmm.s
Commit aa0dcb3ba4b93e4499208def080ced98f3a89ad5 by lebedev.ri
[X86] AMD Zen 3: same-reg SSE XMM XORPS is a 1-cycle(!) dep-breaking one-idiom

Although both the SOG (AMD's Software Optimization Guide) and Agner Fog's
tables state that it is zero-cycle, I cannot confirm that claim. While it
clearly breaks the dependency, I cannot come up with a snippet or measurement
approach that ends up with an IPC greater than 4, which, to me, means that it
actually consumes an FP unit's execution resources for one cycle.
The file was modified llvm/lib/Target/X86/X86ScheduleZnver3.td
The file was modified llvm/test/tools/llvm-mca/X86/Znver3/zero-idioms-sse-xmm.s
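The inference in this commit message can be phrased as a throughput bound. As a sketch, let N be the number of FP pipes that can execute a same-register XORPS and c the number of cycles each instance occupies a pipe. The value N = 4 is an assumption chosen to match the observed ceiling, not a figure stated in the commit:

```latex
\mathrm{IPC}_{\mathrm{XORPS}} \le \frac{N}{c}, \qquad c = 1,\ N = 4 \;\Longrightarrow\; \mathrm{IPC}_{\mathrm{XORPS}} \le 4
```

A truly zero-cycle idiom, resolved at register renaming without using an execution pipe, would have c = 0, so this bound would not apply. A hard IPC ceiling of 4 is therefore consistent with the instruction occupying an FP pipe for one cycle.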
Commit 6b95fd199d96e3ba5c28a23b17b74203522bdaa8 by lebedev.ri
Revert "[X86] X86TTIImpl::getInterleavedMemoryOpCostAVX2(): use getMemoryOpCost()"

Depends on a commit that is about to be reverted.

This reverts commit 69ed93a4355123a45c1d7216aea7cd53d07a361b.
The file was modified llvm/test/Analysis/CostModel/X86/interleaved-store-i8.ll
The file was modified llvm/test/Analysis/CostModel/X86/interleaved-load-i8.ll
The file was modified llvm/lib/Target/X86/X86TargetTransformInfo.cpp
Commit 5fddc3312bad7e62493f1605385fad5e589e6450 by lebedev.ri
Revert "[X86][CostModel] X86TTIImpl::getMemoryOpCost(): rewrite vector handling again"

As reported in post-commit feedback, this has issues with e.g. <16 x i1>:
https://llvm.godbolt.org/z/jxPvdGEW4

This reverts commit c02476f3158f2908ef0a6f628210b5380bd33695.
The file was modified llvm/lib/Target/X86/X86TargetTransformInfo.cpp
The file was modified llvm/test/Analysis/CostModel/X86/load_store.ll
The file was modified llvm/test/Analysis/CostModel/X86/interleaved-load-i16-stride-3.ll
The file was modified llvm/test/Analysis/CostModel/X86/interleaved-store-i16-stride-3.ll
Commit dce158c58d851747c2e1c188dcf4baa4620b5516 by nikita.ppv
[AA] Use isIdentifiedFunctionLocal() (NFC)

This condition is equivalent to isIdentifiedFunctionLocal(),
and this is also what we semantically want to check here.
The file was modified llvm/lib/Analysis/AliasAnalysis.cpp
Commit 425781bce01f2f1d5f553d3b2bf9ebbd6e15068c by nikita.ppv
[CaptureTracking] Use isIdentifiedFunctionLocal() (NFC)

These conditions together exactly match isIdentifiedFunctionLocal(),
and this is also what we logically want to check for here.
The file was modified llvm/lib/Analysis/CaptureTracking.cpp
Commit 71a0609a2b533dbcd6826ad774b6bee5e9818644 by Lang Hames
[clang-repl] Temporarily disable the execute.cpp test on ppc64.

This test is failing on some builders (see [1]) with the following error:

error: Added modules have incompatible data layouts:
  e-m:e-i64:64-n32:64-S128-v256:256:256-v512:512:512 (module) vs
  E-m:a-i64:64-n32:64-S128-v256:256:256-v512:512:512 (jit)

The JIT layout is correct, but some IR module added to the JIT is using a
little-endian layout instead.

This commit disables the test on ppc64 until we can investigate further and
fix the bug.

[1] https://lab.llvm.org/staging/#/builders/126/builds/371
The file was modified clang/test/Interpreter/execute.cpp
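In an LLVM data layout string, a leading `e` spec means little-endian and `E` means big-endian, while `m:e` selects ELF-style and `m:a` XCOFF-style name mangling. The sketch below splits the two layouts from the error above on `-` and reports which specs differ. The helper names are hypothetical, and this is not LLVM's `DataLayout` parser.

```cpp
#include <algorithm>
#include <sstream>
#include <string>
#include <vector>

// Split an LLVM-style data layout string into its '-'-separated specs.
std::vector<std::string> splitLayout(const std::string &layout) {
  std::vector<std::string> specs;
  std::istringstream stream(layout);
  std::string spec;
  while (std::getline(stream, spec, '-'))
    specs.push_back(spec);
  return specs;
}

// Specs present in `a` but absent from `b` (spec order is ignored).
std::vector<std::string> specsOnlyIn(const std::string &a, const std::string &b) {
  const std::vector<std::string> specsB = splitLayout(b);
  std::vector<std::string> diff;
  for (const std::string &spec : splitLayout(a))
    if (std::find(specsB.begin(), specsB.end(), spec) == specsB.end())
      diff.push_back(spec);
  return diff;
}

// True if the layout requests big-endian byte order ("E").
// With "e" or no endianness spec at all, LLVM assumes little-endian.
bool isBigEndian(const std::string &layout) {
  const std::vector<std::string> specs = splitLayout(layout);
  return std::find(specs.begin(), specs.end(), "E") != specs.end();
}
```

Applied to the module and JIT layouts in the error message, `specsOnlyIn` reports `e` and `m:e`. Both byte order and mangling mode differ, which is consistent with the module having been produced for a little-endian ELF target.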
Commit 772bdef6afb661dfd67b3e5c77befa56d249bff8 by aeubanks
[docs] Add page on opaque pointer types

Reviewed By: dblaikie, dexonsmith

Differential Revision: https://reviews.llvm.org/D102292
The file was modified llvm/docs/UserGuides.rst
The file was added llvm/docs/OpaquePointers.rst
Commit 93d56922fabaf52eec8d1d4e28e04fa47eb1c797 by amanieu
Don't run MachineVerifier on sjlj-unwind-inline-asm test because of known issue (PR39439)

Fixes buildbot failure (https://lab.llvm.org/buildbot/#/builders/16/builds/10825).

Reviewed By: Amanieu

Differential Revision: https://reviews.llvm.org/D102433
The file was modified llvm/test/CodeGen/X86/sjlj-unwind-inline-asm-codegen.ll
Commit 83ff0ff46337422171fb36f934bd56c2bc1be15c by llvm-project
[Clang][OpenMP] Allow unified_shared_memory for Pascal-generation GPUs.

The Pascal architecture supports the page migration engine required for
unified_shared_memory, as indicated by NVIDIA:
* https://developer.nvidia.com/blog/unified-memory-cuda-beginners/
* https://developer.nvidia.com/blog/beyond-gpu-memory-limits-unified-memory-pascal/
* https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#um-requirements

The limitation was introduced in D54493, which justified the cut-off by
the requirement for unified addressing. However, Unified Virtual
Addressing (UVA) is already available from sm_20 onward (Fermi, Kepler,
Maxwell):
* https://docs.nvidia.com/cuda/gpudirect-rdma/index.html#basics-of-uva-cuda-memory-management

Unified shared memory might even be possible on these, but with
migration of entire allocations at kernel startup.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D101595
The file was modified clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
The file was modified clang/test/OpenMP/requires_codegen.cpp
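Pascal GPUs have compute capability 6.x (sm_60, sm_61, sm_62), so the relaxed rule amounts to accepting sm_60 and newer. The sketch below expresses that cut-off over "sm_XX" strings. The function `supportsUnifiedSharedMemory` and the string-based interface are hypothetical; clang's actual check lives in CGOpenMPRuntimeGPU.cpp and is not reproduced here.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: accept "sm_XX" targets from Pascal (sm_60) onward,
// i.e. GPUs with the page migration engine used by unified_shared_memory.
// Returns false for anything not of the form "sm_<digits>".
bool supportsUnifiedSharedMemory(const std::string &arch) {
  const std::string prefix = "sm_";
  if (arch.compare(0, prefix.size(), prefix) != 0)
    return false;
  const std::size_t digits = arch.size() - prefix.size();
  if (digits == 0 || digits > 4)  // reject "sm_" and implausibly long versions
    return false;
  int version = 0;
  for (std::size_t i = prefix.size(); i < arch.size(); ++i) {
    const char c = arch[i];
    if (c < '0' || c > '9')
      return false;
    version = version * 10 + (c - '0');
  }
  return version >= 60;  // Pascal (compute capability 6.x) and newer
}
```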
Commit 2155dc51d700c9fb5f29d79eaacf5e1470e4d8ca by aeubanks
[IR] Introduce the opaque pointer type

The opaque pointer type is essentially just a normal pointer type with a
null pointee type.

This also adds support for the opaque pointer type to the bitcode
reader/writer, as well as to textual IR.

To avoid confusion with existing pointer types, we disallow creating a
pointer to an opaque pointer.

Opaque pointer types should not be widely used at this point since many
parts of LLVM still do not support them. The next steps are to add some
very simple use cases of opaque pointers to make sure they work, then
start pretending that all pointers are opaque pointers and see what
breaks.

https://lists.llvm.org/pipermail/llvm-dev/2021-May/150359.html

Reviewed By: dblaikie, dexonsmith, pcc

Differential Revision: https://reviews.llvm.org/D101704
The file was modified llvm/lib/AsmParser/LLLexer.cpp
The file was modified llvm/lib/IR/Type.cpp
The file was modified llvm/docs/ReleaseNotes.rst
The file was modified llvm/lib/IR/LLVMContextImpl.h
The file was modified llvm/include/llvm/Bitcode/LLVMBitCodes.h
The file was modified llvm/include/llvm/IR/DerivedTypes.h
The file was added llvm/test/Assembler/invalid-opaque-ptr.ll
The file was modified llvm/docs/LangRef.rst
The file was added llvm/test/Assembler/opaque-ptr.ll
The file was modified llvm/lib/Bitcode/Reader/BitcodeReader.cpp
The file was modified llvm/lib/Bitcode/Writer/BitcodeWriter.cpp
The file was modified llvm/lib/IR/AsmWriter.cpp
The file was modified llvm/lib/AsmParser/LLParser.cpp
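The two rules stated in this commit message can be modeled in a few lines: an opaque pointer is a pointer type whose pointee is null, and requesting a pointer whose pointee is an opaque pointer is rejected. The sketch below uses a hypothetical toy type system (`Type`, `IntegerType`, `PointerType`, `TypeContext` are invented here and are NOT LLVM's classes); rejecting by returning `nullptr` is an assumption made for illustration.

```cpp
#include <memory>
#include <utility>
#include <vector>

// Hypothetical toy type system; these are NOT LLVM's Type classes.
struct Type {
  virtual ~Type() = default;
};

struct IntegerType : Type {
  explicit IntegerType(unsigned width) : width(width) {}
  unsigned width;
};

struct PointerType : Type {
  explicit PointerType(const Type *pointee) : pointee(pointee) {}
  const Type *pointee;  // nullptr encodes the opaque pointer
  bool isOpaque() const { return pointee == nullptr; }
};

// Owns pointer types, loosely in the spirit of a context object.
class TypeContext {
public:
  // The single opaque pointer type: a pointer with a null pointee.
  const PointerType *getOpaquePointer() {
    if (!opaque)
      opaque = create(nullptr);
    return opaque;
  }

  // A typed pointer to `pointee`, or nullptr if `pointee` is itself an
  // opaque pointer (the "no pointer to an opaque pointer" rule).
  // LLVM uniques types; this sketch does not unique typed pointers.
  const PointerType *getPointerTo(const Type &pointee) {
    const auto *ptr = dynamic_cast<const PointerType *>(&pointee);
    if (ptr && ptr->isOpaque())
      return nullptr;
    return create(&pointee);
  }

private:
  const PointerType *create(const Type *pointee) {
    auto type = std::make_unique<PointerType>(pointee);
    const PointerType *raw = type.get();
    owned.push_back(std::move(type));
    return raw;
  }

  std::vector<std::unique_ptr<Type>> owned;
  const PointerType *opaque = nullptr;
};
```

Encoding "opaque" as a null pointee lets the opaque pointer reuse the existing pointer type machinery, which matches the description of it as "essentially just a normal pointer type with a null pointee type".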
Commit be5c7c5d8230428f024bd656beb48ef8462985ff by steveire
Widen `name` stencil to support `TypeLoc` nodes.

Differential Revision: https://reviews.llvm.org/D102185
The file was modified clang/include/clang/Tooling/Transformer/RangeSelector.h
The file was modified clang/lib/Tooling/Transformer/RangeSelector.cpp
The file was modified clang/unittests/Tooling/RangeSelectorTest.cpp
Commit 1e01a8919f8d0fdc8c2f5f679fcc541b61381b0f by nicolas.vasilache
[mlir][Linalg] Add ComprehensiveBufferize for functions(step 1/n)

This is the first step towards upstreaming comprehensive bufferization, following the
Discourse post: https://llvm.discourse.group/t/rfc-linalg-on-tensors-update-and-comprehensive-bufferization-rfc/3373/6.

This first commit introduces a basic pass for bufferizing within function boundaries,
assuming that the inplaceable function boundaries have been marked as such.

Differential revision: https://reviews.llvm.org/D101693
The file was modified mlir/lib/Dialect/Linalg/IR/LinalgTypes.cpp
The file was added mlir/test/Dialect/Linalg/comprehensive-func-bufferize.mlir
The file was modified mlir/include/mlir/Dialect/Linalg/Passes.td
The file was added mlir/lib/Dialect/Linalg/Transforms/ComprehensiveBufferize.cpp
The file was modified mlir/include/mlir/Dialect/Linalg/IR/LinalgBase.td
The file was modified mlir/lib/Dialect/Linalg/Transforms/CMakeLists.txt
The file was modified mlir/include/mlir/Dialect/Linalg/Passes.h
Commit bebf5d56bff75cd5b74b58cbdcb965885a82916f by nicolas.vasilache
[mlir][Linalg] Add support for vector.transfer ops to comprehensive bufferization (2/n).

Differential revision: https://reviews.llvm.org/D102395
The file was modified mlir/test/Dialect/Linalg/comprehensive-func-bufferize.mlir
The file was modified mlir/lib/Dialect/Linalg/Transforms/ComprehensiveBufferize.cpp
Commit 6a70874d27c73cf8b55a568449fd92f97b5bb7b3 by Matthew.Arsenault
AMDGPU/GlobalISel: Implement tail calls

Or at least the sibling call cases which the DAG already handles.
The file was modified llvm/lib/Target/AMDGPU/AMDGPUCallLowering.h
The file was added llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-tail-call.ll
The file was added llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-sibling-call.ll
The file was modified llvm/test/CodeGen/AMDGPU/tail-call-amdgpu-gfx.ll
The file was modified llvm/test/CodeGen/AMDGPU/call-constant.ll
The file was modified llvm/lib/Target/AMDGPU/AMDGPUCallLowering.cpp
Commit 85394d9ed71bea61000a502abc1f89a3981bee59 by Matthew.Arsenault
AMDGPU/GlobalISel: Don't hardcode stack alignment in assert message
The file was modified llvm/lib/Target/AMDGPU/AMDGPUCallLowering.cpp
Commit 5ba4a0e890c8106a10d32b0d34af1b214b9bdf3a by rnk
[gn] Don't pass -fprofile-instr-generate to linker on Windows

Avoids a warning from the linker. The user still has to put the resource
directory on the linker search path, and I can't find a clean way to do
that automatically in gn.
The file was modified llvm/utils/gn/build/BUILD.gn