Changes

Summary

  1. [AMDGPU][IndirectCalls] Fix register usage propagation for indirect/external calls
  2. Revert "[X86FixupLEAs] Sub register usage of LEA dest should block LEA/SUB optimization"
  3. Revert "[X86FixupLEAs] Transform the sequence LEA/SUB to SUB/SUB"
  4. [lit] Attempt for fix tests failing because of 'warning: non-portable path to file'
  5. Revert "Allow signposts to take advantage of deferred string substitution"
  6. [MLIR] Simplify affine.if ops with trivial conditions
  7. [VPlan] Add more sinking/merging tests with predicated loads/stores.
  8. [clang] NRVO: Improvements and handling of more cases.
  9. Revert "Revert "DirectoryWatcher: add an implementation for Windows""
  10. [X86] Add ISD::FREEZE and ISD::AssertAlign to the list of opcodes that don't guarantee upper 32 bits are zero.
  11. [CHR] Don't run ControlHeightReduction if any BB has address taken
  12. [llvm-objcopy] Exclude empty sections in IHexWriter output
  13. Use dyn_cast_or_null instead of dyn_cast in FunctionLike::verifyTrait (NFC)
  14. [NFC][X86][Codegen] Add shuffle test that would benefit from sorting in reduceBuildVecToShuffle()
  15. Simplify getArgAttrDict/getResultAttrDict by removing unnecessary checks
  16. [ORC-RT] Split Simple-Packed-Serialization code into its own header.
  17. [X86] Check immediate before get it.
  18. llvm-objcopy: fix section size truncation/extension when dumping sections
  19. [runtimes] Fix umbrella component targets
Commit c27e8141b3d1265d2ab1cb951c4330b961fab9ee by Madhur.Amilkanthwar
[AMDGPU][IndirectCalls] Fix register usage propagation for indirect/external calls

This patch computes the maximum number of SGPRs and VGPRs used by the
module in the presence of indirect calls and makes that the
register requirement for functions/kernels
which make indirect calls.

This patch also refactors code in AMDGPUSubtarget.cpp,
adding a "base" variant of getMaxNumSGPRs which
is used by the MachineFunction version and the new Function version.
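
The propagation described above can be pictured with a minimal sketch. The `FunctionUsage` record and the `propagateIndirectCallUsage` helper are hypothetical names chosen for illustration; they are not taken from the patch, and the backend's real bookkeeping is more involved.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical per-function record; not the backend's actual data structure.
struct FunctionUsage {
  uint32_t NumSGPRs;
  uint32_t NumVGPRs;
  bool HasIndirectCall;
};

// Compute the module-wide maximum register usage, then apply it to every
// function that makes an indirect call, since its callee is not known.
void propagateIndirectCallUsage(std::vector<FunctionUsage> &Module) {
  uint32_t MaxSGPRs = 0;
  uint32_t MaxVGPRs = 0;
  for (const FunctionUsage &F : Module) {
    MaxSGPRs = std::max(MaxSGPRs, F.NumSGPRs);
    MaxVGPRs = std::max(MaxVGPRs, F.NumVGPRs);
  }
  for (FunctionUsage &F : Module) {
    if (!F.HasIndirectCall)
      continue; // direct-call-only functions keep their own counts
    F.NumSGPRs = MaxSGPRs;
    F.NumVGPRs = MaxVGPRs;
  }
}
```

The conservative choice here is that any function in the module might be the target of the indirect call, so the caller is assumed to need as many registers as the most demanding function.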

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D103636
The file was modified llvm/test/CodeGen/AMDGPU/amdpal-callable.ll
The file was modified llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp
The file was modified llvm/test/CodeGen/AMDGPU/indirect-call.ll
The file was modified llvm/test/CodeGen/AMDGPU/call-graph-register-usage.ll
The file was modified llvm/lib/Target/AMDGPU/GCNSubtarget.h
The file was modified llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
The file was modified llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.h
The file was modified llvm/test/CodeGen/AMDGPU/agpr-register-count.ll
Commit e087b4f14986f8330336b005c6ebb41f2bdaea63 by flo
Revert "[X86FixupLEAs] Sub register usage of LEA dest should block LEA/SUB optimization"

This reverts commit f35bcea1d4748889b8240defdf00cb7a71cbe070 because it
depends on 1b748faf2bae246e2fc77d88420df13c2e60f4df, which breaks
building the llvm-test-suite with -verify-machineinstrs on X86.

See 154adc0f135cff3f8a8861c335d2b88c8049d098 for more details.
The file was modified llvm/test/CodeGen/X86/lea-opt2.ll
The file was modified llvm/lib/Target/X86/X86FixupLEAs.cpp
Commit 5cd66420ccb196d2af2abfb8e27c74b0e5721718 by flo
Revert "[X86FixupLEAs] Transform the sequence LEA/SUB to SUB/SUB"

This reverts commit 1b748faf2bae246e2fc77d88420df13c2e60f4df because it
breaks building the llvm-test-suite with -verify-machineinstrs on X86:
http://green.lab.llvm.org/green/job/test-suite-verify-machineinstrs-x86_64-O3/9585/

Running llc -verify-machineinstrs on X86 crashes on the IR below:

    target datalayout = "e-m:o-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"

    %struct.widget = type { i32, i32, i32, i32, i32*, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, [16 x [16 x i16]], [6 x [32 x i32]], [16 x [16 x i32]], [4 x [12 x [4 x [4 x i32]]]], [16 x i32], i8**, i32*, i32***, i32**, i32, i32, i32, i32, %struct.baz*, %struct.wobble.1*, i32, i32, i32, i32, i32, i32, %struct.quux.2*, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, [3 x i32], i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32***, i32***, i32****, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, [3 x [2 x i32]], [3 x [2 x i32]], i32, i32, i64, i64, %struct.zot.3, %struct.zot.3, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32 }
    %struct.baz = type { i32, i32, i32, i32, i32, i32, i32, i32, i32, %struct.snork*, %struct.wombat.0*, %struct.wobble*, i32, i32*, i32*, i32*, i32, i32*, i32*, i32*, i32 (%struct.widget*, %struct.eggs*)*, i32, i32, i32, i32 }
    %struct.snork = type { %struct.spam*, %struct.zot, i32 (%struct.wombat*, %struct.widget*, %struct.snork*)* }
    %struct.spam = type { i32, i32, i32, i32, i8*, i32 }
    %struct.zot = type { i32, i32, i32, i32, i32, i8*, i32* }
    %struct.wombat = type { i32, i32, i32, i32, i32, i32, i32, i32, void (i32, i32, i32*, i32*)*, void (%struct.wombat*, %struct.widget*, %struct.zot*)* }
    %struct.wombat.0 = type { [4 x [11 x %struct.quux]], [2 x [9 x %struct.quux]], [2 x [10 x %struct.quux]], [2 x [6 x %struct.quux]], [4 x %struct.quux], [4 x %struct.quux], [3 x %struct.quux] }
    %struct.quux = type { i16, i8 }
    %struct.wobble = type { [2 x %struct.quux], [4 x %struct.quux], [3 x [4 x %struct.quux]], [10 x [4 x %struct.quux]], [10 x [15 x %struct.quux]], [10 x [15 x %struct.quux]], [10 x [5 x %struct.quux]], [10 x [5 x %struct.quux]], [10 x [15 x %struct.quux]], [10 x [15 x %struct.quux]] }
    %struct.eggs = type { [1000 x i8], [1000 x i8], [1000 x i8], i32, i32, i32, i32, i32, i32, i32, i32 }
    %struct.wobble.1 = type { i32, [2 x i32], i32, i32, %struct.wobble.1*, %struct.wobble.1*, i32, [2 x [4 x [4 x [2 x i32]]]], i32, i64, i64, i32, i32, [4 x i8], [4 x i8], i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32 }
    %struct.quux.2 = type { i32, i32, i32, i32, i32, %struct.quux.2* }
    %struct.zot.3 = type { i64, i16, i16, i16 }

    define void @blam(%struct.widget* %arg, i32 %arg1) local_unnamed_addr {
    bb:
      %tmp = load i32, i32* undef, align 4
      %tmp2 = sdiv i32 %tmp, 6
      %tmp3 = sdiv i32 undef, 6
      %tmp4 = load i32, i32* undef, align 4
      %tmp5 = icmp eq i32 %tmp4, 4
      %tmp6 = select i1 %tmp5, i32 %tmp3, i32 %tmp2
      %tmp7 = getelementptr inbounds [4 x [4 x i32]], [4 x [4 x i32]]* undef, i64 0, i64 0, i64 0
      %tmp8 = zext i16 undef to i32
      %tmp9 = zext i16 undef to i32
      %tmp10 = load i16, i16* undef, align 2
      %tmp11 = zext i16 %tmp10 to i32
      %tmp12 = zext i16 undef to i32
      %tmp13 = zext i16 undef to i32
      %tmp14 = zext i16 undef to i32
      %tmp15 = load i16, i16* undef, align 2
      %tmp16 = zext i16 %tmp15 to i32
      %tmp17 = zext i16 undef to i32
      %tmp18 = sub nsw i32 %tmp8, %tmp9
      %tmp19 = shl nsw i32 undef, 1
      %tmp20 = add nsw i32 %tmp19, %tmp18
      %tmp21 = sub nsw i32 %tmp11, %tmp12
      %tmp22 = shl nsw i32 undef, 1
      %tmp23 = add nsw i32 %tmp22, %tmp21
      %tmp24 = sub nsw i32 %tmp13, %tmp14
      %tmp25 = shl nsw i32 undef, 1
      %tmp26 = add nsw i32 %tmp25, %tmp24
      %tmp27 = sub nsw i32 %tmp16, %tmp17
      %tmp28 = shl nsw i32 undef, 1
      %tmp29 = add nsw i32 %tmp28, %tmp27
      %tmp30 = sub nsw i32 %tmp20, %tmp29
      %tmp31 = sub nsw i32 %tmp23, %tmp26
      %tmp32 = shl nsw i32 %tmp30, 1
      %tmp33 = add nsw i32 %tmp32, %tmp31
      store i32 %tmp33, i32* undef, align 4
      %tmp34 = mul nsw i32 %tmp31, -2
      %tmp35 = add nsw i32 %tmp34, %tmp30
      store i32 %tmp35, i32* undef, align 4
      %tmp36 = select i1 %tmp5, i32 undef, i32 undef
      br label %bb37

    bb37:                                             ; preds = %bb
      %tmp38 = load i32, i32* undef, align 4
      %tmp39 = ashr i32 %tmp38, %tmp6
      %tmp40 = load i32, i32* undef, align 4
      %tmp41 = sdiv i32 %tmp39, %tmp40
      store i32 %tmp41, i32* undef, align 4
      ret void
    }
The file was modified llvm/include/llvm/CodeGen/TargetInstrInfo.h
The file was modified llvm/test/CodeGen/X86/2009-03-23-MultiUseSched.ll
The file was modified llvm/lib/Target/X86/X86InstrInfo.h
The file was modified llvm/test/CodeGen/X86/vp2intersect_multiple_pairs.ll
The file was modified llvm/test/CodeGen/X86/lea-opt2.ll
The file was modified llvm/lib/CodeGen/TwoAddressInstructionPass.cpp
The file was modified llvm/lib/Target/X86/X86FixupLEAs.cpp
The file was modified llvm/lib/Target/X86/X86InstrInfo.cpp
Commit 8e62797963875e0cf93fcabda9e18bc0eff5da11 by kbessonova
[lit] Attempt for fix tests failing because of 'warning: non-portable path to file'

This is an attempt to fix clang test failures due to 'nonportable-include-path'
warnings on Windows when a path to llvm-project's base directory contains some
uppercase letters (excluding a drive letter).

The issue originates from two problems:
* discovery.py loads the site config in lower case, causing all the paths
based on __file__ and requested within the config file to be in lowercase as well,
* neither os.path.abspath() nor os.path.realpath() (both used to obtain paths of
config files, sources, object directories, etc.) returns paths in the correct
case on Windows (at least not consistently across Python versions).

As the os.path library doesn't seem to provide any reliable way to restore
the case for paths on Windows, this patch proposes to use pathlib.Path.resolve().
pathlib has been part of Python since 3.4, while LLVM lit requires Python 3.6.

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D103014
The file was modified llvm/utils/lit/lit/discovery.py
The file was modified llvm/cmake/modules/AddLLVM.cmake
Commit b4583a5ad73b633c3eac5ffbad93f2405e1418ab by flo
Revert "Allow signposts to take advantage of deferred string substitution"

This reverts commit 4fc93a3a1f95ef5a0a57750fc621f2411ea445a8 because it
breaks LLDB builds on certain macOS platform & SDK combinations, e.g.
http://green.lab.llvm.org/green/job/lldb-cmake-standalone/3288/consoleFull#-195476041949ba4694-19c4-4d7e-bec5-911270d8a58c
The file was modified llvm/lib/Support/Signposts.cpp
The file was modified llvm/include/llvm/Support/Signposts.h
The file was modified lldb/include/lldb/Utility/Timer.h
The file was modified llvm/lib/Support/Timer.cpp
The file was modified lldb/source/Utility/Timer.cpp
Commit 466e5aba6495644eb8ba84c3e7f07bf802ff84f0 by uday
[MLIR] Simplify affine.if ops with trivial conditions

This commit simplifies affine.if ops:
an affine.if operation is removed if its condition is universally true or false, and the corresponding then/else block is merged into the parent block.

Signed-off-by: Shashij Gupta <shashij.gupta@polymagelabs.com>

Reviewed By: bondhugula, pr4tgpt

Differential Revision: https://reviews.llvm.org/D104015
The file was modified mlir/lib/Dialect/Affine/IR/AffineOps.cpp
The file was modified mlir/test/Dialect/Affine/simplify-affine-structures.mlir
The file was modified mlir/test/Dialect/Affine/loop-unswitch.mlir
Commit 0d9e8f5f4b68252c6caa1ef81a30777b2f5d7242 by flo
[VPlan] Add more sinking/merging tests with predicated loads/stores.
The file was modified llvm/test/Transforms/LoopVectorize/vplan-sink-scalars-and-merge.ll
Commit 1e50c3d785f4563873ab1ce86559f2a1285b5678 by mizvekov
[clang] NRVO: Improvements and handling of more cases.

This expands NRVO propagation for more cases:

Parse analysis improvement:
* Lambdas and Blocks with dependent return type can have their variables
  marked as NRVO Candidates.

Variable instantiation improvements:
* Fixes crash when instantiating NRVO variables in Blocks.
* Functions, Lambdas, and Blocks which have auto return type have their
  variables' NRVO status propagated. For Blocks with non-auto return type,
  as a limitation, this propagation does not consider the actual return
  type.

This also implements exclusion of VarDecls which are references to
dependent types.

Signed-off-by: Matheus Izvekov <mizvekov@gmail.com>

Reviewed By: Quuxplusone

Differential Revision: https://reviews.llvm.org/D99696
The file was modified clang/lib/Sema/Sema.cpp
The file was modified clang/test/CodeGen/nrvo-tracking.cpp
The file was modified clang/lib/Sema/SemaCoroutine.cpp
The file was modified clang/lib/Sema/SemaStmt.cpp
The file was modified clang/lib/Sema/SemaExprCXX.cpp
The file was modified clang/include/clang/Sema/Sema.h
The file was modified clang/lib/Sema/SemaTemplateInstantiateDecl.cpp
Commit 76f1baa7875acd88bdd4b431eed6e2d2decfc0fe by Saleem Abdulrasool
Revert "Revert "DirectoryWatcher: add an implementation for Windows""

This reverts commit 0ec1cf13f2a4e31aa2c5ccc665c5fbdcd3a94577.

Restore the implementation with some minor tweaks:
- Use std::unique_ptr for the path instead of std::vector
  * Stylistic improvement as the buffer is already heap allocated, this
    just makes it clearer.
- Correct the notification buffer allocation size
  * Memory usage fix: we were allocating 4x the computed size
- Correct the passing of the buffer size to RDC
  * Memory usage fix: we were reporting 1/4th of the size
- Convert the operation event to auto-reset
  * Bug Fix: we never reset the event
- Remove `FILE_NOTIFY_CHANGE_LAST_ACCESS` from RDC events
  * Memory usage fix: we never needed this notification
- Fold events for the notification action
  * Stylistic improvement to be clear how the events map
- Update comment
  * Stylistic improvement to be clear what the RAII controls
- Fix the race condition that was uncovered previously
  * We would return from the construction before the watcher thread
    began execution.  The test would then proceed to begin execution,
    and we would miss the initial notifications.  We now ensure that the
    watcher thread is initialized before we return.  This ensures that
    we do not miss the initial notifications.

Running the test on an SSD was able to uncover the access pattern.  This
now seems to pass reliably where it was previously flaky locally.
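
The race fix described above (not returning from construction until the watcher thread is initialized) can be sketched in portable C++ as follows. This is only an illustration under stated assumptions: the class name `StartupSyncedWatcher` and its members are hypothetical, and it uses `std::promise`/`std::future` rather than whatever synchronization the Windows implementation actually uses.

```cpp
#include <atomic>
#include <future>
#include <thread>
#include <utility>

// Hypothetical watcher whose constructor blocks until the worker thread
// has finished its setup, so no early notifications can be missed.
class StartupSyncedWatcher {
public:
  StartupSyncedWatcher() {
    std::promise<void> Ready;
    std::future<void> ReadyFuture = Ready.get_future();
    Worker = std::thread([this, Signal = std::move(Ready),
                          Stop = StopSignal.get_future()]() mutable {
      Initialized.store(true); // stands in for arming the first directory read
      Signal.set_value();      // unblock the constructor
      Stop.wait();             // a real watcher would process events here
    });
    ReadyFuture.wait(); // do not return until the worker is set up
  }

  ~StartupSyncedWatcher() {
    StopSignal.set_value(); // ask the worker to exit
    Worker.join();
  }

  bool isInitialized() const { return Initialized.load(); }

private:
  std::atomic<bool> Initialized{false};
  std::promise<void> StopSignal;
  std::thread Worker;
};
```

With this shape, code that runs immediately after construction can rely on the worker already being set up, which is the property the test needed.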
The file was modified clang/unittests/DirectoryWatcher/CMakeLists.txt
The file was modified clang/lib/DirectoryWatcher/windows/DirectoryWatcher-windows.cpp
Commit c997867dc084a1bcf631816f964b3ff49a297ba3 by craig.topper
[X86] Add ISD::FREEZE and ISD::AssertAlign to the list of opcodes that don't guarantee upper 32 bits are zero.

The freeze issue was reported here
https://llvm.discourse.group/t/bug-or-feature-freeze-instruction/3639

I don't have a test for AssertAlign. I just noticed it was missing
and assume it should be similar to the other two Asserts.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D104178
The file was modified llvm/lib/Target/X86/X86InstrCompiler.td
The file was modified llvm/test/CodeGen/X86/freeze.ll
Commit fae7debadcea335d4aaddee82406a8d10426e730 by lxfind
[CHR] Don't run ControlHeightReduction if any BB has address taken

This patch is to address https://bugs.llvm.org/show_bug.cgi?id=50610.
In the computed goto pattern, there is usually a list of basic blocks that are all targets of an indirectbr instruction, and each basic block also has its address taken and stored in a variable.
The CHR pass could potentially clone these basic blocks, which would generate a cloned version of the indirectbr and cloned versions of all basic blocks in the list.
However, these cloned basic blocks will not have their addresses taken and stored anywhere, so a later SimplifyCFG pass will simply remove all of them, resulting in incorrect code.
To fix this, when searching for scopes, we skip scopes that contain BBs with addresses taken.
Added a few test cases.

Reviewed By: aeubanks, wenlei, hoy

Differential Revision: https://reviews.llvm.org/D103867
The file was modified llvm/test/Transforms/PGOProfile/chr.ll
The file was modified llvm/lib/Transforms/Instrumentation/ControlHeightReduction.cpp
Commit 5899278758b66a5e72c1ae9695651c2aef7406c4 by i
[llvm-objcopy] Exclude empty sections in IHexWriter output

IHexWriter was evaluating a section's physical address when deciding if
that section should be written to an output. This approach does not
account for a zero-sized section that has the same physical address as a
sized section. The behavior varies from GNU objcopy, and may result in a
HEX file that does not include all program sections.

The IHexWriter now excludes zero-sized sections when deciding what
should be written to the output. This affects the contents of the
writer's `Sections` collection; we will not try to insert multiple
sections that could have the same physical address. The behavior seems
consistent with GNU objcopy, which always excludes empty sections,
no matter the address.

The new test case evaluates the IHexWriter behavior when provided a
variety of empty sections that overlap or append a filled section. See
the input file's comments for more information. Given that test input,
and the change to the IHexWriter, GNU objcopy and llvm-objcopy produce
the same output.
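
The filtering rule described above can be sketched as follows. The `Section` record, the `selectSections` helper, and the address-keyed map are assumptions made for illustration; they model the idea of a collection in which two sections with the same physical address collide, and are not the writer's actual types.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical section record; the real writer works on ELF section objects.
struct Section {
  std::string Name;
  uint64_t PhysAddr;
  uint64_t Size;
};

// Select sections for output, keyed by physical address. Skipping zero-sized
// sections first means an empty section can no longer occupy the address
// slot of a filled section that starts at the same address.
std::map<uint64_t, Section> selectSections(const std::vector<Section> &In) {
  std::map<uint64_t, Section> Out;
  for (const Section &S : In) {
    if (S.Size == 0)
      continue; // excluded regardless of address
    Out.emplace(S.PhysAddr, S); // keeps the first section seen at an address
  }
  return Out;
}
```

In this model, an empty section listed before a filled section at the same address would have won the slot without the size check, which is one way a program section could go missing from the output.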

Reviewed By: jhenderson, MaskRay, evgeny777

Differential Revision: https://reviews.llvm.org/D101332
The file was modified llvm/test/tools/llvm-objcopy/ELF/Inputs/ihex-elf-sections2.yaml
The file was added llvm/test/tools/llvm-objcopy/ELF/ihex-writer-empty-sections.test
The file was modified llvm/tools/llvm-objcopy/ELF/Object.cpp
Commit 8bc1ce0f61da0b2d0c3a17aec898df7001336ea5 by joker.eph
Use dyn_cast_or_null instead of dyn_cast in FunctionLike::verifyTrait (NFC)

This is making the verifier more tolerant to cases where a "null"
Attribute would be inserted in the array of func arguments/results
attributes.
The file was modified mlir/include/mlir/IR/FunctionSupport.h
Commit 2db64e199aa3b48d3495da3620c8cdb045da94f3 by lebedev.ri
[NFC][X86][Codegen] Add shuffle test that would benefit from sorting in reduceBuildVecToShuffle()
The file was modified llvm/test/CodeGen/X86/oddshuffles.ll
Commit 152c9871e6ac7ba2a14dcc64e812b79193421846 by joker.eph
Simplify getArgAttrDict/getResultAttrDict by removing unnecessary checks

There is a slight change in behavior: if the arg dictionary is empty
then we return this empty dictionary instead of a null attribute.
This is more consistent with accessing it through:

  ArrayAttr args_attr = func_op.getAllArgAttrs();
  args_attr[num].cast<DictionaryAttr>() ...

Differential Revision: https://reviews.llvm.org/D104189
The file was modified mlir/lib/IR/FunctionSupport.cpp
Commit 49f4a58d53c72abee6da3443028fff6bb2d8fe35 by Lang Hames
[ORC-RT] Split Simple-Packed-Serialization code into its own header.

This will simplify integration of this code into LLVM -- The
Simple-Packed-Serialization code can be copied near-verbatim, but
WrapperFunctionResult will require more adaptation.
The file was modified compiler-rt/lib/orc/wrapper_function_utils.h
The file was modified compiler-rt/lib/orc/unittests/CMakeLists.txt
The file was added compiler-rt/lib/orc/simple_packed_serialization.h
The file was added compiler-rt/lib/orc/unittests/simple_packed_serialization_test.cpp
The file was modified compiler-rt/lib/orc/common.h
The file was modified compiler-rt/lib/orc/CMakeLists.txt
The file was modified compiler-rt/lib/orc/unittests/wrapper_function_utils_test.cpp
Commit 9eb2f723c24523194b833779d20b027bf89a4f55 by yuanke.luo
[X86] Check immediate before get it.

For a CMP imm instruction, operand 1 may be a symbol address, so we should
check that it is an immediate before reading it. Here is the example code:
`CMP64mi32 $noreg, 8, killed renamable $rcx, @d, $noreg, @a, implicit-def
$eflags`
Many thanks to Craig Topper for the test case to reproduce this issue.

Differential Revision: https://reviews.llvm.org/D104037
The file was added llvm/test/CodeGen/X86/unfoldMemoryOperand.mir
The file was modified llvm/lib/Target/X86/X86InstrInfo.cpp
Commit 02c718301b305dff87aa4b204b7b3e6fc647999d by dblaikie
llvm-objcopy: fix section size truncation/extension when dumping sections

Since this only comes up with inputs containing sections at least 4GB
large, I've skipped testing this, unless there's a nice way to test it
without needing 4GB inputs or output files. (I guess I could use a bzero
section or something, so the input file doesn't have to be 4GB, but even
then the output file would have to be 4GB, right?)

The subtlety here is demonstrated by this code:

    #include <cstdint>
    #include <type_traits>
    #include <utility>
    struct t { operator uint64_t(); };
    static_assert(std::is_same_v<int, decltype(std::declval<bool>() ? 0 : std::declval<t>())>);
    static_assert(std::is_same_v<uint64_t, decltype(std::declval<bool>() ? 0 : std::declval<uint64_t>())>);

Because of this difference, the original source code was getting an int
type (truncating the actual size) and then extending it again, resulting
in bogus values. (I haven't thought through this hard enough to explain
why the resulting value was 0xffff...; sign extension, possibly UB, but
in any case it's the wrong answer.) In the particular case I was looking
at, that resulted in a size so large that we couldn't open a file large
enough to write to, and we ended up with a rather vague:

error: 'file_name.o': Invalid argument
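
The truncate-then-extend effect described above can be reproduced in isolation. The sketch below is not the llvm-objcopy code: `SizeLike`, `truncatingSize`, and `preservingSize` are hypothetical names, and the wrapper type merely stands in for any class with an implicit conversion to `uint64_t`.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical wrapper with an implicit conversion to uint64_t.
struct SizeLike {
  uint64_t Value;
  operator uint64_t() const { return Value; }
};

// The conditional expression has type int: SizeLike can be converted to
// int (via uint64_t), but int cannot be converted to SizeLike.
static_assert(std::is_same_v<int, decltype(true ? SizeLike{} : 0)>);

// The value is narrowed to int by the conditional, then widened again to
// uint64_t by the return statement.
uint64_t truncatingSize(bool Present, SizeLike S) {
  return Present ? S : 0;
}

// Converting explicitly first makes both arms unify as uint64_t.
uint64_t preservingSize(bool Present, SizeLike S) {
  return Present ? static_cast<uint64_t>(S) : 0;
}
```

On common two's-complement targets, `truncatingSize` turns a size of 0xFFFFFFFF into an all-ones 64-bit value (the int is -1 and gets sign-extended) and turns 0x100000000 into 0, while `preservingSize` returns both unchanged. Whether this is exactly what produced the bogus value in the reported case is an assumption.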
The file was modified llvm/tools/llvm-objcopy/ELF/Object.cpp
Commit aa93603ff6a4bfd92c64e4f75d60ae9c84d5cdf5 by smeenai
[runtimes] Fix umbrella component targets

When we're building the runtimes for multiple platform targets, we
create umbrella build targets for each distribution component, but those
targets didn't have any dependencies and were just no-ops. Make the
umbrella target depend on the sub-targets for each platform to fix this,
which is consistent with the behavior of the umbrella targets for each
runtime, and also consistent with the behavior when we've only specified
the default target.
The file was modified llvm/runtimes/CMakeLists.txt