FailedChanges

Summary

  1. [Clang] remove text extension from diag::err_drv_invalid_value_with_suggestion
  2. Make clangd CompletionModel not depend on directory layout.
  3. [lld-macho] Have --reproduce account for path rerooting
  4. [lld-macho] Preliminary support for ARM_RELOC_BR24
  5. [hwasan] Fix missing synchronization in AllocThread.
  6. [libomptarget] Initial documentation on amdgpu offload
  7. [WebAssembly] Set alignment to 1 for SIMD memory intrinsics
  8. [libc++] NFC: Remove stray semicolon in from-scratch config files
  9. [libcxx] [ci] Add a Windows CI configuration for a statically linked libc++
  10. [lld-macho] Try to unbreak build
  11. Add fuzzer for Rust demangler
  12. [WebAssembly] Update narrowing builtin function operand types
  13. [WebAssembly] Fix constness of pointer params to load intrinsics
  14. [libc++] Move <__sso_allocator> out of include/ into src/. NFCI.
  15. [libc++] [LIBCXX-DEBUG-FIXME] Fix an iterator-invalidation issue in string::assign.
  16. [libc++] [LIBCXX-DEBUG-FIXME] Iterating a string::iterator "off the end" is UB.
  17. [libc++] [LIBCXX-DEBUG-FIXME] Our `__debug_less` breaks some complexity guarantees.
  18. [libc++] [LIBCXX-DEBUG-FIXME] std::advance shouldn't use ADL `>=` on the _Distance type.
  19. [libc++] [LIBCXX-DEBUG-FIXME] Stop using invalid iterators to insert into sets/maps.
  20. [scudo] Align objects with alignas
  21. [mlir][tosa] Add tosa.depthwise lowering to existing linalg.depthwise_conv
  22. [lld] Convert LLVM_CMAKE_PATH to a CMake path
  23. [WebAssembly] Add SIMD const_splat intrinsics
  24. [NFC][X86][Codegen] Add some tests for 64-bit shift by (32-x)
  25. Preserve metadata on masked intrinsics in auto-upgrade
  26. [Utils][NFC] Rename replace-function-regex in update_cc_test_checks
  27. [MachineCSE][NFC]: Refactor and comment on preventing CSE for isConvergent instrs
  28. [mlir] Add polynomial approximation for math::ExpM1
  29. GlobalISel: Use DAG call lowering infrastructure in a more compatible way
  30. X86/GlobalISel: Use generic version of splitToValueTypes
  31. AMDGPU/GlobalISel: Remove unnecessary override
  32. GlobalISel: Update documentation
  33. [clangd] Split CC and refs limit and increase refs limit to 1000
  34. [AMDGPU] Improve global SADDR selection
  35. When performing template argument deduction to select a partial
  36. ARM/GlobalISel: Don't store a MachineInstrBuilder reference
  37. AMDGPU: Add a few more tail call tests
Commit aefbfbcbd776f5549b18cd6083d6408f661efacc by ndesaulniers
[Clang] remove text extension from diag::err_drv_invalid_value_with_suggestion

This hinders translations, as per:
https://clang.llvm.org/docs/InternalsManual.html#the-format-string

Reviewed By: MaskRay, xbolva00

Differential Revision: https://reviews.llvm.org/D101387
The file was modified flang/lib/Frontend/CompilerInvocation.cpp
The file was modified clang/test/Driver/stack-protector-guard.c
The file was modified clang/include/clang/Basic/DiagnosticDriverKinds.td
The file was modified clang/lib/Driver/ToolChains/Clang.cpp
The file was modified flang/test/Driver/fixed-line-length.f90
Commit 7907c46fe6195728fafd843b8c0fb19a3e68e9ad by harald
Make clangd CompletionModel not depend on directory layout.

The current code accounts for two possible layouts, but there is at
least a third supported layout: clang-tools-extra may also be checked
out as clang/tools/extra with the releases, which was not yet handled.
Rather than treating that as a special case, use the location of
CompletionModel.cmake to handle all three cases. This should address the
problems that prompted D96787 and the problems that prompted the
proposed revert D100625.

Reviewed By: usaxena95

Differential Revision: https://reviews.llvm.org/D101851
The file was modified clang-tools-extra/clangd/quality/CompletionModel.cmake
Commit 20f51ffe67d12ab72c917dc4b371b55c80321393 by jezng
[lld-macho] Have --reproduce account for path rerooting

We need to account for path rerooting when generating the response
file. We could either reroot the paths before generating the file, or pass
through the original filenames and change just the syslibroot. I've opted for
the latter, in order that the reproduction run more closely mirrors the
original.

We must also be careful *not* to make an absolute path relative if it is
shadowed by a rerooted path. See repro6.tar in reroot-path.s for
details.

I've moved the call to `createResponseFile()` after the initialization of
`config->systemLibraryRoots`, since it now needs to know what those roots are.

Reviewed By: #lld-macho, oontvoo

Differential Revision: https://reviews.llvm.org/D101224
The file was modified lld/MachO/DriverUtils.cpp
The file was modified lld/test/MachO/reroot-path.s
The file was modified lld/MachO/Driver.cpp
The file was modified lld/MachO/Driver.h
Commit 8806df4778349dac4e8744b9ae50f43b80eedda3 by jezng
[lld-macho] Preliminary support for ARM_RELOC_BR24

ARM_RELOC_BR24 is used for BL/BLX instructions from within ARM (i.e. not
Thumb) code. This diff just handles the basic case: branches from ARM to
ARM, or from ARM to Thumb where no shimming is required. (See comments
in ARM.cpp for why shims are required.)

Note: I will likely be deprioritizing ARM work for the near future to
focus on other parts of LLD. Apologies for the half-done state of this;
I'm just trying to wrap up what I've already worked on.

Reviewed By: #lld-macho, alexshap

Differential Revision: https://reviews.llvm.org/D101814
The file was added lld/test/MachO/arm-branch-relocs.s
The file was modified lld/MachO/Arch/ARM.cpp
Commit 18959a6a094c6469fc2fd5cc167fda7cbe3f163b by eugenis
[hwasan] Fix missing synchronization in AllocThread.

The problem was introduced in D100348.

It's really hard to trigger the bug in a stress test - the race is just too
narrow - but the new checks in Thread::Init should at least provide a usable
diagnostic if the problem ever returns.

Differential Revision: https://reviews.llvm.org/D101881
The file was modified compiler-rt/lib/hwasan/hwasan_thread.cpp
The file was modified compiler-rt/lib/hwasan/hwasan_thread_list.h
Commit 25fe17d3c1041de7e2dc5df865d7f65fd074f9a6 by jonathanchesterfield
[libomptarget] Initial documentation on amdgpu offload

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D101927
The file was modified openmp/docs/SupportAndFAQ.rst
Commit 89333b35a7a909d29ae53fddcfb4792d87223b96 by tlively
[WebAssembly] Set alignment to 1 for SIMD memory intrinsics

The WebAssembly SIMD intrinsics in wasm_simd128.h generally try not to require
any particular alignment for memory operations to be maximally flexible. For
builtin memory access functions and their corresponding LLVM IR intrinsics,
there's no way to set the expected alignment, so the best we can do is set the
alignment to 1 in the backend. This change means that the alignment hints in the
emitted code will no longer be incorrect when users use the intrinsics to access
unaligned data.

Differential Revision: https://reviews.llvm.org/D101850
The file was modified llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
The file was modified llvm/test/CodeGen/WebAssembly/simd-load-lane-offset.ll
The file was modified llvm/test/CodeGen/WebAssembly/simd-load-zero-offset.ll
Commit 7fbc7bfdfddd85e12156556e3c074f6dcef865df by Louis Dionne
[libc++] NFC: Remove stray semicolon in from-scratch config files
The file was modified libcxx/test/configs/libcxx-trunk-static.cfg.in
The file was modified libcxx/test/configs/libcxx-trunk-shared.cfg.in
Commit 9b24ff9cd2efe1d8319af023ffb69efdaf4cd5ce by martin
[libcxx] [ci] Add a Windows CI configuration for a statically linked libc++

On Windows, static vs DLL linking affects details in quite a few
cases, so it's good to have coverage for both cases.

Testing with static linking also increases coverage for a number of
cases and individual checks that have had to be waived for the DLL
case, and allows testing libc++experimental, increasing the number
of test cases actually executed by 180 (176 new tests from
libc++experimental and 4 ones that are XFAIL windows-dll).

Also drop the "generic-" prefix from these configuration names, as
they're perhaps not what the "generic" prefix intended originally
in the other generic-posix configurations.

Differential Revision: https://reviews.llvm.org/D101565
The file was modified libcxx/test/libcxx/experimental/memory/memory.resource.adaptor/memory.resource.adaptor.mem/db_deallocate.pass.cpp
The file was modified libcxx/utils/ci/run-buildbot
The file was modified libcxx/utils/ci/buildkite-pipeline.yml
Commit 75ba35130080f91494b0ceb90c0501b50787b1cc by jezng
[lld-macho] Try to unbreak build

Looks like the PointerUnion casting cares about const-ness...
The file was modified lld/MachO/Arch/ARM.cpp
Commit 0e7c2aeaa8c0fe25178f7fc4c61cd92321cdde76 by dblaikie
Add fuzzer for Rust demangler

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D101823
The file was added llvm/tools/llvm-rust-demangle-fuzzer/DummyDemanglerFuzzer.cpp
The file was added llvm/tools/llvm-rust-demangle-fuzzer/llvm-rust-demangle-fuzzer.cpp
The file was added llvm/tools/llvm-rust-demangle-fuzzer/CMakeLists.txt
Commit 627a52695537dd2bea068630887431febbf06856 by tlively
[WebAssembly] Update narrowing builtin function operand types

Make the inputs to all narrowing builtins signed, which is how they are
interpreted by the underlying instructions (only the result changes sign
between instructions).

Differential Revision: https://reviews.llvm.org/D101883
The file was modified clang/test/CodeGen/builtins-wasm.c
The file was modified clang/include/clang/Basic/BuiltinsWebAssembly.def
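
The point about narrowing inputs always being signed can be illustrated with a small model. The sketch below is not the builtin itself: it is a plain-Python approximation of the saturating narrowing semantics described in the WebAssembly SIMD proposal, and the function names (`saturate`, `narrow_i32x4_to_i16x8`) are hypothetical. Both variants read the input lanes as signed 32-bit integers; only the saturation range of the result differs.

```python
def saturate(value, lo, hi):
    """Clamp value into the inclusive range [lo, hi]."""
    return max(lo, min(hi, value))


def narrow_i32x4_to_i16x8(a, b, signed_result):
    """Model of narrowing two 4-lane i32 vectors into one 8-lane i16 vector.

    The input lanes are always interpreted as signed integers (assumed to be
    given here as Python ints in the signed 32-bit range). Only the result
    range depends on the signedness of the operation.
    """
    lo, hi = (-32768, 32767) if signed_result else (0, 65535)
    return [saturate(lane, lo, hi) for lane in a + b]


a = [70000, -5, 1, -40000]
b = [0, 0, 0, 0]
signed_lanes = narrow_i32x4_to_i16x8(a, b, signed_result=True)
unsigned_lanes = narrow_i32x4_to_i16x8(a, b, signed_result=False)
```

In this model a negative input such as -5 saturates to 0 in the unsigned variant, which only makes sense if the input is read as signed.
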
Commit 602f318cfdac999a8604f1588159326b1a1a1a23 by tlively
[WebAssembly] Fix constness of pointer params to load intrinsics

Update the SIMD builtin load functions to take pointers to const data and update
the intrinsics themselves to not cast away constness.

Differential Revision: https://reviews.llvm.org/D101884
The file was modified clang/lib/Headers/wasm_simd128.h
The file was modified clang/test/Headers/wasm.c
The file was modified clang/test/CodeGen/builtins-wasm.c
The file was modified clang/include/clang/Basic/BuiltinsWebAssembly.def
Commit 0b10bb7ddd3c92465ef12d52e88614e6b4c5ef27 by arthur.j.odwyer
[libc++] Move <__sso_allocator> out of include/ into src/. NFCI.

This allocator is not intended for libc++'s users to use;
it's strictly an implementation detail of `src/locale.cpp`.
So, move it to the `src/include/` directory.

Drive-by const-qualify its comparison operators.

For consistency with `__hidden_allocator` (defined in `src/thread.cpp`),
do *not* remove it from "libcxx/lib/libc++unexp.exp",
"libcxx/utils/symcheck-blacklists/linux_blacklist.txt", etc.

Differential Revision: https://reviews.llvm.org/D101293
The file was modified libcxx/src/CMakeLists.txt
The file was modified libcxx/src/locale.cpp
The file was modified libcxx/include/module.modulemap
The file was removed libcxx/include/__sso_allocator
The file was added libcxx/src/include/sso_allocator.h
The file was modified libcxx/include/CMakeLists.txt
Commit db9425cb060bd076fcdcbb5a37bfd992deff2086 by arthur.j.odwyer
[libc++] [LIBCXX-DEBUG-FIXME] Fix an iterator-invalidation issue in string::assign.

This appears to be a bug in our string::assign: when assigning into
a longer string, from a shorter snippet of itself, we invalidate
iterators before doing the copy. We should invalidate them afterward.
Also drive-by improve the formatting of a function header.

Differential Revision: https://reviews.llvm.org/D101675
The file was modified libcxx/test/std/strings/basic.string/string.modifiers/string_assign/iterator.pass.cpp
The file was modified libcxx/include/string
Commit 12dd9cdf1a8267e0c5db4f191f2598648de02619 by arthur.j.odwyer
[libc++] [LIBCXX-DEBUG-FIXME] Iterating a string::iterator "off the end" is UB.

The range of char pointers [data, data+size] is a valid closed range,
but the range [begin, end) is valid only half-open.

Differential Revision: https://reviews.llvm.org/D101676
The file was modified libcxx/test/std/input.output/filesystems/class.path/path.nonmember/path.factory.pass.cpp
Commit 165ad89947e8ef6c08c80eb067d85b4fa9074904 by arthur.j.odwyer
[libc++] [LIBCXX-DEBUG-FIXME] Our `__debug_less` breaks some complexity guarantees.

`__debug_less` ends up running the comparator up-to-twice per comparison,
because whenever `(x < y)` it goes on to verify that `!(y < x)`.
This breaks the strict "Complexity" guarantees of algorithms like
`inplace_merge`, which we test in the test suite. So, just skip the
complexity assertions in debug mode.

Differential Revision: https://reviews.llvm.org/D101677
The file was modified libcxx/docs/DesignDocs/DebugMode.rst
The file was modified libcxx/test/std/algorithms/alg.sorting/alg.merge/inplace_merge_comp.pass.cpp
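
The doubling effect described for `__debug_less` can be reproduced with a small model. This is a Python sketch of the idea only, not libc++'s implementation; `CountingLess` and `debug_less` are hypothetical names. The wrapper re-runs the comparator with swapped arguments whenever the first call returns true, so a test that counts comparator invocations sees more calls than the algorithm's stated bound.

```python
class CountingLess:
    """A less-than comparator that records how many times it is invoked."""

    def __init__(self):
        self.calls = 0

    def __call__(self, x, y):
        self.calls += 1
        return x < y


def debug_less(comp):
    """Wrap comp so that every 'true' answer is checked for asymmetry."""

    def checked(x, y):
        result = comp(x, y)
        if result:
            # Second comparator call: this is what breaks strict call counts.
            assert not comp(y, x), "comparator is not a strict weak ordering"
        return result

    return checked


plain = CountingLess()
plain(1, 2)
plain(2, 1)

wrapped_counter = CountingLess()
wrapped = debug_less(wrapped_counter)
wrapped(1, 2)  # returns True, so the comparator runs twice
wrapped(2, 1)  # returns False, so the comparator runs once
```

Two logical comparisons cost two calls with the plain comparator but three with the wrapped one, which is why complexity assertions are skipped in debug mode.
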
Commit 9571b8f238f97bce01bcf3c84a4f87cfb1c00dbf by arthur.j.odwyer
[libc++] [LIBCXX-DEBUG-FIXME] std::advance shouldn't use ADL `>=` on the _Distance type.

Convert to a primitive type first; then use primitive `>=` on that value.

Differential Revision: https://reviews.llvm.org/D101678
The file was modified libcxx/include/iterator
The file was modified libcxx/test/std/iterators/iterator.primitives/iterator.operations/robust_against_adl.pass.cpp
Commit 9ea2db2c513534aa63acc087b8dc744c37119d02 by arthur.j.odwyer
[libc++] [LIBCXX-DEBUG-FIXME] Stop using invalid iterators to insert into sets/maps.

This simply applies Howard's commit 4c80bfbd53caf consistently
across all the associative and unordered container tests.

"unord.set/insert_hint_const_lvalue.pass.cpp" failed with `-D_LIBCPP_DEBUG=1`
before this patch; it was the only one that incorrectly reused
invalid iterator `e`. The others already used valid iterators
(generally `c.end()`); I'm just making them all match the same pattern
of usage: "e, then r, then c.end() for the rest."

Differential Revision: https://reviews.llvm.org/D101679
The file was modified libcxx/test/std/containers/unord/unord.map/unord.map.modifiers/insert_hint_rvalue.pass.cpp
The file was modified libcxx/test/std/containers/unord/unord.set/insert_hint_const_lvalue.pass.cpp
The file was modified libcxx/test/std/containers/unord/unord.set/insert_hint_rvalue.pass.cpp
The file was modified libcxx/test/std/containers/unord/unord.map/unord.map.modifiers/insert_hint_const_lvalue.pass.cpp
The file was modified libcxx/test/std/containers/unord/unord.multiset/insert_hint_const_lvalue.pass.cpp
The file was modified libcxx/test/std/containers/unord/unord.multimap/unord.multimap.modifiers/insert_hint_const_lvalue.pass.cpp
Commit 1d767b13bfad806bf584e0b054eb7d00a494591d by Vitaly Buka
[scudo] Align objects with alignas

Operator new must align allocations for types with large alignment.

Before C++17, the behavior was implementation-defined, and both clang and g++
before 11 ignored alignment. Misaligned objects mysteriously crashed
tests on Ubuntu 14.

The alternatives are to compile with -std=c++17 or -faligned-new, but they were
discarded as less portable.

Reviewed By: hctim

Differential Revision: https://reviews.llvm.org/D101874
The file was modified compiler-rt/lib/scudo/standalone/tests/combined_test.cpp
The file was modified compiler-rt/lib/scudo/standalone/tests/primary_test.cpp
Commit 7abb56c78ba7bb9e2a91f61a65bb8feb69a92865 by rob.suderman
[mlir][tosa] Add tosa.depthwise lowering to existing linalg.depthwise_conv

Implements support for undilated depthwise convolution using the existing
depthwise convolution operation. Once convolutions migrate to YAML-defined
versions, we can rewrite this for a cleaner implementation.

Reviewed By: mravishankar

Differential Revision: https://reviews.llvm.org/D101579
The file was modified mlir/test/Conversion/TosaToLinalg/tosa-to-linalg.mlir
The file was modified mlir/lib/Conversion/TosaToLinalg/TosaToLinalg.cpp
Commit 662a58fa0534508c2c37b22425bfdf16b9d985a8 by isuruf
[lld] Convert LLVM_CMAKE_PATH to a CMake path

Otherwise I get the following error on Windows:
```
CMake Error at D:/bld/lld_1569206597988/work/build/CMakeFiles/CMakeTmp/CMakeLists.txt:2 (set):
  Syntax error in cmake code at

    D:/bld/lld_1569206597988/work/build/CMakeFiles/CMakeTmp/CMakeLists.txt:2

  when parsing string

    D:\bld\lld_1569206597988\_h_env\Library\lib\cmake\llvm

  Invalid character escape '\b'.

CMake Error at D:/bld/lld_1569206597988/_build_env/Library/share/cmake-3.15/Modules/CheckSymbolExists.cmake:100 (try_compile):
  Failed to configure test project build system.
Call Stack (most recent call first):
  D:/bld/lld_1569206597988/_build_env/Library/share/cmake-3.15/Modules/CheckSymbolExists.cmake:57 (__CHECK_SYMBOL_EXISTS_IMPL)
  D:/bld/lld_1569206597988/_h_env/Library/lib/cmake/llvm/HandleLLVMOptions.cmake:943 (check_symbol_exists)
  CMakeLists.txt:56 (include)
```

Reviewed By: sbc100

Differential Revision: https://reviews.llvm.org/D68158
The file was modified lld/CMakeLists.txt
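
The error quoted in this commit message arises because CMake treats a backslash inside a string as the start of an escape sequence, so a native Windows path containing `\b` fails to parse. CMake provides `file(TO_CMAKE_PATH ...)` to convert native paths to forward-slash form; whether this commit uses exactly that command is an assumption here. The Python sketch below only illustrates the transformation on the path from the log, and the helper name `to_forward_slashes` is hypothetical.

```python
from pathlib import PureWindowsPath


def to_forward_slashes(windows_path: str) -> str:
    """Return the path with backslash separators replaced by forward slashes."""
    return PureWindowsPath(windows_path).as_posix()


# The path from the quoted error, which contains the problematic "\b".
native = r"D:\bld\lld_1569206597988\_h_env\Library\lib\cmake\llvm"
converted = to_forward_slashes(native)
print(converted)
```

`PureWindowsPath` performs no filesystem access, so the sketch runs on any platform.
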
Commit 81fce29d6e1f0a83e8a4170c7f24cdd93869d55a by tlively
[WebAssembly] Add SIMD const_splat intrinsics

These intrinsics do not correspond to their own underlying instruction, but are
a convenience for the common case of materializing a constant vector that has
the same value in each lane.

Differential Revision: https://reviews.llvm.org/D101885
The file was modified clang/test/Headers/wasm.c
The file was modified clang/lib/Headers/wasm_simd128.h
Commit 40147c33d17eca98d186628272a076a1bb3e6868 by lebedev.ri
[NFC][X86][Codegen] Add some tests for 64-bit shift by (32-x)
The file was added llvm/test/CodeGen/X86/64-bit-shift-by-32-minus-y.ll
Commit 1817dae1924144c19b9caec196f574c51d6d9957 by kparzysz
Preserve metadata on masked intrinsics in auto-upgrade

When auto-upgrade was replacing a call to a masked intrinsic, it would
not copy the metadata from the original call.

If an intrinsic had metadata, but did not need any updates, the metadata
would stay, but if an update was needed, the metadata would end up being removed.
A similar effect could be observed with masked_expandload and
masked_compressstore, which at the moment are not handled by auto-upgrade:
the metadata remained untouched.

Differential Revision: https://reviews.llvm.org/D101201
The file was modified llvm/lib/IR/AutoUpgrade.cpp
The file was added llvm/test/Bitcode/upgrade-masked-keep-metadata.ll
Commit 78a7d8c4dd1076dccfde2c48fc924d8f5529f4d1 by georgakoudis1
[Utils][NFC] Rename replace-function-regex in update_cc_test_checks

This patch renames the replace-function-regex to replace-value-regex to indicate that the existing regex replacement functionality can replace any IR value besides functions.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D101934
The file was modified clang/test/OpenMP/nvptx_target_teams_codegen.cpp
The file was modified clang/test/OpenMP/nvptx_teams_reduction_codegen.cpp
The file was modified llvm/utils/update_llc_test_checks.py
The file was modified clang/test/OpenMP/target_parallel_for_debug_codegen.cpp
The file was modified clang/test/OpenMP/nvptx_multi_target_parallel_codegen.cpp
The file was modified llvm/utils/UpdateTestChecks/common.py
The file was modified clang/test/OpenMP/nvptx_target_parallel_codegen.cpp
The file was modified clang/test/OpenMP/nvptx_data_sharing.cpp
The file was modified clang/test/OpenMP/nvptx_distribute_parallel_generic_mode_codegen.cpp
The file was modified clang/test/OpenMP/nvptx_target_codegen.cpp
The file was modified clang/test/OpenMP/nvptx_lambda_capturing.cpp
The file was modified clang/test/utils/update_cc_test_checks/Inputs/generated-funcs-regex.c
The file was modified clang/test/OpenMP/nvptx_target_teams_distribute_parallel_for_generic_mode_codegen.cpp
The file was modified clang/test/OpenMP/nvptx_parallel_codegen.cpp
The file was modified clang/test/OpenMP/nvptx_parallel_for_codegen.cpp
The file was modified clang/test/OpenMP/nvptx_target_teams_distribute_parallel_for_simd_codegen.cpp
The file was modified clang/test/utils/update_cc_test_checks/Inputs/generated-funcs-regex.c.expected
The file was modified clang/test/OpenMP/nvptx_nested_parallel_codegen.cpp
The file was modified clang/test/OpenMP/nvptx_target_teams_distribute_parallel_for_codegen.cpp
The file was modified clang/test/OpenMP/nvptx_target_teams_distribute_codegen.cpp
The file was modified clang/test/OpenMP/nvptx_target_parallel_num_threads_codegen.cpp
The file was modified clang/test/OpenMP/target_parallel_debug_codegen.cpp
The file was modified clang/test/OpenMP/nvptx_allocate_codegen.cpp
The file was modified clang/test/utils/update_cc_test_checks/generated-funcs-regex.test
The file was modified llvm/utils/update_analyze_test_checks.py
Commit a11489ae3e36063c64921439cbab89d1f3280f4a by mkitzan
[MachineCSE][NFC]: Refactor and comment on preventing CSE for isConvergent instrs

- Move the code preventing CSE of `isConvergent` instrs into
  `ProcessBlockCSE` (from `isProfitableToCSE`)
- Add comments explaining why `isConvergent` is used to prevent
  CSE of non-local instrs in MachineCSE and the new test
The file was modified llvm/lib/CodeGen/MachineCSE.cpp
The file was modified llvm/test/CodeGen/AMDGPU/GlobalISel/no-cse-nonlocal-convergent-instrs.mir
Commit 0edc4bc84aa246ee1f156982e19a1b8b5fbecf4c by ezhulenev
[mlir] Add polynomial approximation for math::ExpM1

This approximation matches the one in Eigen.

```
name                      old cpu/op  new cpu/op  delta
BM_mlir_Expm1_f32/10      90.9ns ± 4%  52.2ns ± 4%  -42.60%    (p=0.000 n=74+87)
BM_mlir_Expm1_f32/100      837ns ± 3%   231ns ± 4%  -72.43%    (p=0.000 n=79+69)
BM_mlir_Expm1_f32/1k      8.43µs ± 3%  1.58µs ± 5%  -81.30%    (p=0.000 n=77+83)
BM_mlir_Expm1_f32/10k     83.8µs ± 3%  15.4µs ± 5%  -81.65%    (p=0.000 n=83+69)
BM_eigen_s_Expm1_f32/10   68.8ns ±17%  72.5ns ±14%   +5.40%  (p=0.000 n=118+115)
BM_eigen_s_Expm1_f32/100   694ns ±11%   717ns ± 2%   +3.34%   (p=0.000 n=120+75)
BM_eigen_s_Expm1_f32/1k   7.69µs ± 2%  7.97µs ±11%   +3.56%   (p=0.000 n=95+117)
BM_eigen_s_Expm1_f32/10k  88.0µs ± 1%  89.3µs ± 6%   +1.45%   (p=0.000 n=74+106)
BM_eigen_v_Expm1_f32/10   44.3ns ± 6%  45.0ns ± 8%   +1.45%   (p=0.018 n=81+111)
BM_eigen_v_Expm1_f32/100   351ns ± 1%   360ns ± 9%   +2.58%    (p=0.000 n=73+99)
BM_eigen_v_Expm1_f32/1k   3.31µs ± 1%  3.42µs ± 9%   +3.37%   (p=0.000 n=71+100)
BM_eigen_v_Expm1_f32/10k  33.7µs ± 8%  34.1µs ± 9%   +1.04%    (p=0.007 n=99+98)
```

Reviewed By: ezhulenev

Differential Revision: https://reviews.llvm.org/D101852
The file was modified mlir/lib/Dialect/Math/Transforms/PolynomialApproximation.cpp
The file was modified mlir/test/Dialect/Math/polynomial-approximation.mlir
The file was modified mlir/test/mlir-cpu-runner/math_polynomial_approx.mlir
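
For context on the ExpM1 commit above: the operation computes e^x - 1, and a dedicated implementation matters because the naive formula loses most of its significant digits for small x. The sketch below shows that effect with Python's standard `math` module. It does not reproduce the polynomial used by Eigen or MLIR; the reference value is a truncated Taylor series, which is an assumption that holds only for very small x.

```python
import math


def relative_error(approx, exact):
    """Relative error of approx with respect to a nonzero exact value."""
    return abs(approx - exact) / abs(exact)


x = 1e-10

# Truncated Taylor series x + x^2/2; the omitted x^3/6 term is far below
# double precision at this magnitude.
reference = x + x * x / 2

naive = math.exp(x) - 1.0   # cancellation: exp(x) is rounded near 1.0 first
stable = math.expm1(x)      # dedicated routine avoids the cancellation

print(relative_error(naive, reference))
print(relative_error(stable, reference))
```
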
Commit fa0b93b5a0866aad3ce517daab6cd91cc67823ad by Matthew.Arsenault
GlobalISel: Use DAG call lowering infrastructure in a more compatible way

Unfortunately the current call lowering code is built on top of the
legacy MVT/DAG based code. However, GlobalISel was not using it the
same way. In short, the DAG passes legalized types to the assignment
function, and GlobalISel was passing the original raw type if it was
simple.

I do believe the DAG lowering is conceptually broken since it requires
picking a type up front before knowing how/where the value will be
passed. This ends up being a problem for AArch64, which wants to pass
i1/i8/i16 values as a different size if passed on the stack or in
registers.

The argument type decision is split across 3 different places which is
hard to follow. SelectionDAG builder uses
getRegisterTypeForCallingConv to pick a legal type, tablegen gives the
illusion of controlling the type, and the target may have additional
hacks in the C++ part of the call lowering. AArch64 hacks around this
by not using the standard AnalyzeFormalArguments and special casing
i1/i8/i16 by looking at the underlying type of the original IR
argument.

I believe people have generally assumed the calling convention code is
processing the original types, and I've discovered a number of dead
paths in several targets.

x86 actually relies on the opposite behavior from AArch64, and relies
on x86_32 and x86_64 sharing calling convention code where the 64-bit
cases implicitly do not work on x86_32 due to using the pre-legalized
types.

AMDGPU targets without legal i16/f16 have always used a broken ABI
that promotes to i32/f32. GlobalISel accidentally fixed this to be the
ABI we should have, but this fixes it so we're using the worse ABI
that is compatible with the DAG. Ideally we would fix the DAG to match
the old GlobalISel behavior, but I don't wish to fight that battle.

A new native GlobalISel call lowering framework should let the target
process the incoming types directly.

CCValAssigns select a "ValVT" and "LocVT" but the meanings of these
aren't entirely clear. Different targets don't use them consistently,
even within their own call lowering code. My current belief is the
intent was "ValVT" is supposed to be the legalized value type to use
in the end, and LocVT was supposed to be the ABI passed type
(which is also legalized).

With the default CCState::Analyze functions always passing the same
type for these arguments, these only differ when the TableGen part of
the lowering decides to promote the type from one legal type to
another. AArch64's i1/i8/i16 hack ends up inverting the meanings of
these values, so I had to add an additional hack to let the target
interpret how large the argument memory is.

Since targets don't consistently interpret ValVT and LocVT, this
doesn't produce quite equivalent code to the initial DAG
lowerings. I've opted to consistently interpret LocVT as the in-memory
size for stack passed values, and ValVT as the register type to assign
from that memory. We therefore produce extending loads directly out of
the IRTranslator, whereas the DAG would emit regular loads of smaller
values. This will also produce loads/stores that are wider than the
argument value if the allocated stack slot is larger (and there will
be undef padding bytes). If we had the optimizations to reduce
load/stores based on truncated values, this wouldn't produce a
different end result.

Since ValVT/LocVT are more consistently interpreted, we now will emit
more G_BITCASTS as requested by the CCAssignFn. For example AArch64
was directly assigning types to some physical vector registers which
according to the tablegen spec should have been cast to a vector
with a different element type.

This also moves the responsibility for inserting
G_ASSERT_SEXT/G_ASSERT_ZEXT from the target ValueHandlers into the
generic code, which is closer to how SelectionDAGBuilder works.

I had to xfail an x86 test since I don't see a quick way to fix it
right now (I filed bug 50035 for this). It's broken independently of
this change, and only triggers since now we end up with more ands
which hit the improperly handled selection pattern.

I also observed that FP arguments that need promotion (e.g. f16 passed
as f32) are broken, and use regular G_TRUNC and G_ANYEXT.

TLDR; the current call lowering infrastructure is bad and nobody has
ever understood how it chooses types.
The file was modified llvm/test/CodeGen/AMDGPU/GlobalISel/roundeven.ll
The file was modified llvm/lib/Target/AArch64/GISel/AArch64CallLowering.cpp
The file was modified llvm/lib/CodeGen/GlobalISel/CallLowering.cpp
The file was modified llvm/test/CodeGen/AMDGPU/GlobalISel/fpow.ll
The file was modified llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-call.ll
The file was modified llvm/test/CodeGen/AArch64/GlobalISel/call-translator.ll
The file was modified llvm/test/CodeGen/X86/GlobalISel/irtranslator-callingconv.ll
The file was modified llvm/test/CodeGen/AArch64/GlobalISel/arm64-callingconv.ll
The file was modified llvm/include/llvm/CodeGen/GlobalISel/CallLowering.h
The file was modified llvm/test/CodeGen/AMDGPU/GlobalISel/fma.ll
The file was modified llvm/test/CodeGen/ARM/GlobalISel/arm-legalize-vfp4.mir
The file was modified llvm/test/CodeGen/AMDGPU/GlobalISel/usubsat.ll
The file was modified llvm/test/CodeGen/AArch64/GlobalISel/arm64-callingconv-ios.ll
The file was modified llvm/test/CodeGen/AMDGPU/GlobalISel/xnor.ll
The file was modified llvm/test/CodeGen/X86/GlobalISel/add-scalar.ll
The file was modified llvm/test/CodeGen/AMDGPU/GlobalISel/andn2.ll
The file was modified llvm/test/CodeGen/AMDGPU/GlobalISel/ashr.ll
The file was modified llvm/test/CodeGen/ARM/GlobalISel/arm-param-lowering.ll
The file was modified llvm/test/CodeGen/X86/GlobalISel/ext.ll
The file was modified llvm/test/CodeGen/X86/GlobalISel/callingconv.ll
The file was modified llvm/test/CodeGen/ARM/GlobalISel/arm-irtranslator.ll
The file was modified llvm/lib/Target/ARM/ARMCallLowering.cpp
The file was modified llvm/test/CodeGen/AMDGPU/GlobalISel/bswap.ll
The file was modified llvm/test/CodeGen/ARM/GlobalISel/arm-unsupported.ll
The file was modified llvm/test/CodeGen/AMDGPU/GlobalISel/orn2.ll
The file was modified llvm/test/CodeGen/AMDGPU/GlobalISel/saddsat.ll
The file was modified llvm/test/CodeGen/AMDGPU/GlobalISel/shl.ll
The file was modified llvm/test/CodeGen/AArch64/GlobalISel/irtranslator-reductions.ll
The file was modified llvm/test/CodeGen/AMDGPU/GlobalISel/uaddsat.ll
The file was modified llvm/test/CodeGen/AArch64/GlobalISel/arm64-irtranslator.ll
The file was modified llvm/lib/Target/X86/X86CallLowering.cpp
The file was modified llvm/test/CodeGen/AMDGPU/GlobalISel/dummy-target.ll
The file was modified llvm/test/CodeGen/AMDGPU/GlobalISel/fshr.ll
The file was modified llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-function-args.ll
The file was modified llvm/test/CodeGen/ARM/GlobalISel/arm-isel.ll
The file was modified llvm/test/CodeGen/AMDGPU/GlobalISel/lshr.ll
The file was modified llvm/lib/Target/AMDGPU/AMDGPUCallLowering.cpp
The file was modified llvm/test/CodeGen/AMDGPU/GlobalISel/fdiv.f16.ll
The file was modified llvm/test/CodeGen/AMDGPU/GlobalISel/ssubsat.ll
The file was modified llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll
The file was modified llvm/test/CodeGen/X86/GlobalISel/memop-scalar-x32.ll
The file was modified llvm/test/CodeGen/AMDGPU/GlobalISel/shl-ext-reduce.ll
Commit 23ae35e858da37c753b8efaac965046358ec3818 by Matthew.Arsenault
X86/GlobalISel: Use generic version of splitToValueTypes

The custom insert of an unmerge and the callback weirdness should be
unnecessary. Since handleAssignments should now use
getRegisterTypeForCallingConv as SelectionDAG builder would, this
should now just be able to use the generic code. X86-32 relies on the
generated CCAssignFns not seeing illegal types and sharing code with
x86_64, so i64 values would incorrectly be assigned to 64-bit
registers.
The file was modified llvm/lib/Target/X86/X86CallLowering.h
The file was modified llvm/lib/Target/X86/X86CallLowering.cpp
The file was modified llvm/test/CodeGen/X86/GlobalISel/irtranslator-callingconv.ll
Commit 8fc4eb9e732006b3b4f0b224c79ab097f3026f85 by Matthew.Arsenault
AMDGPU/GlobalISel: Remove unnecessary override

This is the same as the default implementation
The file was modified llvm/lib/Target/AMDGPU/AMDGPUCallLowering.cpp
Commit e723b511e6e951444d2a646a23fc2e9cf4faecd4 by Matthew.Arsenault
GlobalISel: Update documentation
The file was modified llvm/docs/GlobalISel/IRTranslator.rst
Commit e623ce6188d698422d4ead24065056d6a869e6f8 by kbobyrev
[clangd] Split CC and refs limit and increase refs limit to 1000

Related discussion: https://github.com/clangd/clangd/discussions/761

Reviewed By: kadircet

Differential Revision: https://reviews.llvm.org/D101902
The file was modified clang-tools-extra/clangd/tool/ClangdMain.cpp
The file was modified clang-tools-extra/clangd/ClangdLSPServer.h
The file was modified clang-tools-extra/clangd/ClangdLSPServer.cpp
Commit 909a5ccf3be7868b24320aaaf0e588b56ba6e3f3 by Stanislav.Mekhanoshin
[AMDGPU] Improve global SADDR selection

An address can be a uniform sum of two 64-bit values.
That regularly happens in a loop where index is an induction
variable promoted to 64 bit by the LSR. We can materialize
zero in a VGPR and still use SADDR form of the load.

Differential Revision: https://reviews.llvm.org/D101591
The file was modified llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-load-global-saddr.mir
The file was modified llvm/test/CodeGen/AMDGPU/global_atomics_i64.ll
The file was modified llvm/test/CodeGen/AMDGPU/offset-split-global.ll
The file was modified llvm/test/CodeGen/AMDGPU/global_atomics.ll
The file was modified llvm/test/CodeGen/AMDGPU/global-saddr-load.ll
The file was modified llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
The file was modified llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
Commit 6bbfa0fd408e81055c360c2e059554dd76fd7f09 by richard
When performing template argument deduction to select a partial
specialization while substituting a partial template parameter pack,
don't try to extend the existing deduction.

This caused us to select the wrong partial specialization in some rare
cases. A recent change to libc++ caused this to happen in practice for
code using std::conjunction.
The file was modified clang/lib/Sema/SemaTemplateDeduction.cpp
The file was modified clang/test/SemaTemplate/partial-spec-instantiate.cpp
Commit 6e88539ab16de1cbe1b6b0a7f2922fd5e710cab9 by Matthew.Arsenault
ARM/GlobalISel: Don't store a MachineInstrBuilder reference

This is basically a pointer anyway
The file was modified llvm/lib/Target/ARM/ARMCallLowering.cpp
Commit ef5f0adecd02d92cbb1a713ac7316f6768269412 by Matthew.Arsenault
AMDGPU: Add a few more tail call tests

Add some cases I noticed were missing when porting to GlobalISel. The
cases that required any argument splitting did not work at first.
The file was modified llvm/test/CodeGen/AMDGPU/sibling-call.ll