Changes

Changes from Git (git http://labmaster3.local/git/llvm-project.git)

Summary

  1. [Flang][test] Fix Windows buildbot. (details)
  2. A post-processing for BFI inference (details)
  3. [AMDGPU][IndirectCalls] Fix register usage propagation for indirect/external calls (details)
  4. Revert "[X86FixupLEAs] Sub register usage of LEA dest should block LEA/SUB optimization" (details)
  5. Revert "[X86FixupLEAs] Transform the sequence LEA/SUB to SUB/SUB" (details)
  6. [lit] Attempt for fix tests failing because of 'warning: non-portable path to file' (details)
  7. Revert "Allow signposts to take advantage of deferred string substitution" (details)
  8. [MLIR] Simplify affine.if ops with trivial conditions (details)
  9. [VPlan] Add more sinking/merging tests with predicated loads/stores. (details)
  10. [clang] NRVO: Improvements and handling of more cases. (details)
Commit dbc262968f8ed4b25281d5855eda8f5a699b95c6 by llvm-project
[Flang][test] Fix Windows buildbot.

Commit 1b241b9b400bdfc5b8e0d157f0f46436677927b8 /
patch https://reviews.llvm.org/D104130 introduced an new test which
calls a UNIX shell script. Add
REQUIRES: shell
to not run it on Windows.
The file was modifiedflang/test/Semantics/modfile41.f90
Commit 0a0800c4d10c250ffb152b5f059d6f9a19ed8efe by aktoon
A post-processing for BFI inference

The current implementation for computing relative block frequencies does
not handle correctly control-flow graphs containing irreducible loops. This
results in suboptimally generated binaries, whose perf can be up to 5%
worse than optimal.

To resolve the problem, we apply a post-processing step, which iteratively
updates block frequencies based on the frequencies of their predesessors.
This corresponds to finding the stationary point of the Markov chain by
an iterative method aka "PageRank computation". The algorithm takes at
most O(|E| * IterativeBFIMaxIterations) steps but typically converges faster.

It is turned on by passing option `use-iterative-bfi-inference`
and applied only for functions containing profile data and irreducible loops.

Tested on SPEC06/17, where it is helping to get correct profile counts for one of
the binaries (403.gcc). In prod binaries, we've seen a speedup of up to 2%-5%
for binaries containing functions with hot irreducible loops.

Reviewed By: hoy, wenlei, davidxl

Differential Revision: https://reviews.llvm.org/D103289
The file was addedllvm/test/Transforms/SampleProfile/Inputs/profile-correlation-irreducible-loops.prof
The file was modifiedllvm/include/llvm/Analysis/BlockFrequencyInfoImpl.h
The file was addedllvm/test/Transforms/SampleProfile/profile-correlation-irreducible-loops.ll
The file was modifiedllvm/lib/Analysis/BlockFrequencyInfoImpl.cpp
Commit c27e8141b3d1265d2ab1cb951c4330b961fab9ee by Madhur.Amilkanthwar
[AMDGPU][IndirectCalls] Fix register usage propagation for indirect/external calls

This patch computes max SGPRs and VGPRs used by module
in presence of indirect calls and makes that
as register requirement for functions/kernels
which makes indirect calls.

This patch also refactors code AMDGPUSubTarget.cpp
which add a "base" variants of getMaxNumSGPRs which
is used by MachineFunction and new Function version.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D103636
The file was modifiedllvm/test/CodeGen/AMDGPU/indirect-call.ll
The file was modifiedllvm/test/CodeGen/AMDGPU/agpr-register-count.ll
The file was modifiedllvm/test/CodeGen/AMDGPU/amdpal-callable.ll
The file was modifiedllvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
The file was modifiedllvm/test/CodeGen/AMDGPU/call-graph-register-usage.ll
The file was modifiedllvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp
The file was modifiedllvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.h
The file was modifiedllvm/lib/Target/AMDGPU/GCNSubtarget.h
Commit e087b4f14986f8330336b005c6ebb41f2bdaea63 by flo
Revert "[X86FixupLEAs] Sub register usage of LEA dest should block LEA/SUB optimization"

This reverts commit f35bcea1d4748889b8240defdf00cb7a71cbe070 because it
depends on 1b748faf2bae246e2fc77d88420df13c2e60f4df, which breaks
building the llvm-test-suite with -verify-machineinstrs on X86.

See 154adc0f135cff3f8a8861c335d2b88c8049d098 for more details.
The file was modifiedllvm/test/CodeGen/X86/lea-opt2.ll
The file was modifiedllvm/lib/Target/X86/X86FixupLEAs.cpp
Commit 5cd66420ccb196d2af2abfb8e27c74b0e5721718 by flo
Revert "[X86FixupLEAs] Transform the sequence LEA/SUB to SUB/SUB"

This reverts commit 1b748faf2bae246e2fc77d88420df13c2e60f4df because it
breaks building the llvm-test-suite with -verify-machineinstrs on X86:
http://green.lab.llvm.org/green/job/test-suite-verify-machineinstrs-x86_64-O3/9585/

Running llc -verify-machineinstr on X86 crashes on the IR below:

    target datalayout = "e-m:o-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"

    %struct.widget = type { i32, i32, i32, i32, i32*, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, [16 x [16 x i16]], [6 x [32 x i32]], [16 x [16 x i32]], [4 x [12 x [4 x [4 x i32]]]], [16 x i32], i8**, i32*, i32***, i32**, i32, i32, i32, i32, %struct.baz*, %struct.wobble.1*, i32, i32, i32, i32, i32, i32, %struct.quux.2*, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, [3 x i32], i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32***, i32***, i32****, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, [3 x [2 x i32]], [3 x [2 x i32]], i32, i32, i64, i64, %struct.zot.3, %struct.zot.3, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32 }
    %struct.baz = type { i32, i32, i32, i32, i32, i32, i32, i32, i32, %struct.snork*, %struct.wombat.0*, %struct.wobble*, i32, i32*, i32*, i32*, i32, i32*, i32*, i32*, i32 (%struct.widget*, %struct.eggs*)*, i32, i32, i32, i32 }
    %struct.snork = type { %struct.spam*, %struct.zot, i32 (%struct.wombat*, %struct.widget*, %struct.snork*)* }
    %struct.spam = type { i32, i32, i32, i32, i8*, i32 }
    %struct.zot = type { i32, i32, i32, i32, i32, i8*, i32* }
    %struct.wombat = type { i32, i32, i32, i32, i32, i32, i32, i32, void (i32, i32, i32*, i32*)*, void (%struct.wombat*, %struct.widget*, %struct.zot*)* }
    %struct.wombat.0 = type { [4 x [11 x %struct.quux]], [2 x [9 x %struct.quux]], [2 x [10 x %struct.quux]], [2 x [6 x %struct.quux]], [4 x %struct.quux], [4 x %struct.quux], [3 x %struct.quux] }
    %struct.quux = type { i16, i8 }
    %struct.wobble = type { [2 x %struct.quux], [4 x %struct.quux], [3 x [4 x %struct.quux]], [10 x [4 x %struct.quux]], [10 x [15 x %struct.quux]], [10 x [15 x %struct.quux]], [10 x [5 x %struct.quux]], [10 x [5 x %struct.quux]], [10 x [15 x %struct.quux]], [10 x [15 x %struct.quux]] }
    %struct.eggs = type { [1000 x i8], [1000 x i8], [1000 x i8], i32, i32, i32, i32, i32, i32, i32, i32 }
    %struct.wobble.1 = type { i32, [2 x i32], i32, i32, %struct.wobble.1*, %struct.wobble.1*, i32, [2 x [4 x [4 x [2 x i32]]]], i32, i64, i64, i32, i32, [4 x i8], [4 x i8], i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32 }
    %struct.quux.2 = type { i32, i32, i32, i32, i32, %struct.quux.2* }
    %struct.zot.3 = type { i64, i16, i16, i16 }

    define void @blam(%struct.widget* %arg, i32 %arg1) local_unnamed_addr {
    bb:
      %tmp = load i32, i32* undef, align 4
      %tmp2 = sdiv i32 %tmp, 6
      %tmp3 = sdiv i32 undef, 6
      %tmp4 = load i32, i32* undef, align 4
      %tmp5 = icmp eq i32 %tmp4, 4
      %tmp6 = select i1 %tmp5, i32 %tmp3, i32 %tmp2
      %tmp7 = getelementptr inbounds [4 x [4 x i32]], [4 x [4 x i32]]* undef, i64 0, i64 0, i64 0
      %tmp8 = zext i16 undef to i32
      %tmp9 = zext i16 undef to i32
      %tmp10 = load i16, i16* undef, align 2
      %tmp11 = zext i16 %tmp10 to i32
      %tmp12 = zext i16 undef to i32
      %tmp13 = zext i16 undef to i32
      %tmp14 = zext i16 undef to i32
      %tmp15 = load i16, i16* undef, align 2
      %tmp16 = zext i16 %tmp15 to i32
      %tmp17 = zext i16 undef to i32
      %tmp18 = sub nsw i32 %tmp8, %tmp9
      %tmp19 = shl nsw i32 undef, 1
      %tmp20 = add nsw i32 %tmp19, %tmp18
      %tmp21 = sub nsw i32 %tmp11, %tmp12
      %tmp22 = shl nsw i32 undef, 1
      %tmp23 = add nsw i32 %tmp22, %tmp21
      %tmp24 = sub nsw i32 %tmp13, %tmp14
      %tmp25 = shl nsw i32 undef, 1
      %tmp26 = add nsw i32 %tmp25, %tmp24
      %tmp27 = sub nsw i32 %tmp16, %tmp17
      %tmp28 = shl nsw i32 undef, 1
      %tmp29 = add nsw i32 %tmp28, %tmp27
      %tmp30 = sub nsw i32 %tmp20, %tmp29
      %tmp31 = sub nsw i32 %tmp23, %tmp26
      %tmp32 = shl nsw i32 %tmp30, 1
      %tmp33 = add nsw i32 %tmp32, %tmp31
      store i32 %tmp33, i32* undef, align 4
      %tmp34 = mul nsw i32 %tmp31, -2
      %tmp35 = add nsw i32 %tmp34, %tmp30
      store i32 %tmp35, i32* undef, align 4
      %tmp36 = select i1 %tmp5, i32 undef, i32 undef
      br label %bb37

    bb37:                                             ; preds = %bb
      %tmp38 = load i32, i32* undef, align 4
      %tmp39 = ashr i32 %tmp38, %tmp6
      %tmp40 = load i32, i32* undef, align 4
      %tmp41 = sdiv i32 %tmp39, %tmp40
      store i32 %tmp41, i32* undef, align 4
      ret void
    }
The file was modifiedllvm/lib/CodeGen/TwoAddressInstructionPass.cpp
The file was modifiedllvm/lib/Target/X86/X86InstrInfo.h
The file was modifiedllvm/test/CodeGen/X86/2009-03-23-MultiUseSched.ll
The file was modifiedllvm/lib/Target/X86/X86InstrInfo.cpp
The file was modifiedllvm/test/CodeGen/X86/lea-opt2.ll
The file was modifiedllvm/include/llvm/CodeGen/TargetInstrInfo.h
The file was modifiedllvm/test/CodeGen/X86/vp2intersect_multiple_pairs.ll
The file was modifiedllvm/lib/Target/X86/X86FixupLEAs.cpp
Commit 8e62797963875e0cf93fcabda9e18bc0eff5da11 by kbessonova
[lit] Attempt for fix tests failing because of 'warning: non-portable path to file'

This is an attempt to fix clang test failures due to 'nonportable-include-path'
warnings on Windows when a path to llvm-project's base directory contains some
uppercase letters (excluding a drive letter).

The issue originates from 2 problems:
* discovery.py loads site config in lower case causing all the paths
based on __file__ and requested within the config file to be in lowercase as well,
* neither os.path.abspath() nor os.path.realpath() (both used to obtain paths of
config files, sources, object directories, etc) do not return paths in the correct
case for Windows (at least consistently for all python versions).

As os.path library doesn't seem to provide any relaible way to restore
the case for paths on Windows, this patch proposes to use pathlib.resolve().
pathlib is a part of Python 3.4 while llvm lit requires Python 3.6.

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D103014
The file was modifiedllvm/utils/lit/lit/discovery.py
The file was modifiedllvm/cmake/modules/AddLLVM.cmake
Commit b4583a5ad73b633c3eac5ffbad93f2405e1418ab by flo
Revert "Allow signposts to take advantage of deferred string substitution"

This reverts commit 4fc93a3a1f95ef5a0a57750fc621f2411ea445a8 because it
breaks LLDB builds on certain macOS platform & SDK combinations, e.g.
http://green.lab.llvm.org/green/job/lldb-cmake-standalone/3288/consoleFull#-195476041949ba4694-19c4-4d7e-bec5-911270d8a58c
The file was modifiedlldb/source/Utility/Timer.cpp
The file was modifiedllvm/lib/Support/Signposts.cpp
The file was modifiedllvm/include/llvm/Support/Signposts.h
The file was modifiedllvm/lib/Support/Timer.cpp
The file was modifiedlldb/include/lldb/Utility/Timer.h
Commit 466e5aba6495644eb8ba84c3e7f07bf802ff84f0 by uday
[MLIR] Simplify affine.if ops with trivial conditions

The commit simplifies affine.if ops :
The affine if operation gets removed if the condition is universally true or false and then/else block is merged with the parent block.

Signed-off-by: Shashij Gupta shashij.gupta@polymagelabs.com

Reviewed By: bondhugula, pr4tgpt

Differential Revision: https://reviews.llvm.org/D104015
The file was modifiedmlir/test/Dialect/Affine/loop-unswitch.mlir
The file was modifiedmlir/lib/Dialect/Affine/IR/AffineOps.cpp
The file was modifiedmlir/test/Dialect/Affine/simplify-affine-structures.mlir
Commit 0d9e8f5f4b68252c6caa1ef81a30777b2f5d7242 by flo
[VPlan] Add more sinking/merging tests with predicated loads/stores.
The file was modifiedllvm/test/Transforms/LoopVectorize/vplan-sink-scalars-and-merge.ll
Commit 1e50c3d785f4563873ab1ce86559f2a1285b5678 by mizvekov
[clang] NRVO: Improvements and handling of more cases.

This expands NRVO propagation for more cases:

Parse analysis improvement:
* Lambdas and Blocks with dependent return type can have their variables
  marked as NRVO Candidates.

Variable instantiation improvements:
* Fixes crash when instantiating NRVO variables in Blocks.
* Functions, Lambdas, and Blocks which have auto return type have their
  variables' NRVO status propagated. For Blocks with non-auto return type,
  as a limitation, this propagation does not consider the actual return
  type.

This also implements exclusion of VarDecls which are references to
dependent types.

Signed-off-by: Matheus Izvekov <mizvekov@gmail.com>

Reviewed By: Quuxplusone

Differential Revision: https://reviews.llvm.org/D99696
The file was modifiedclang/lib/Sema/Sema.cpp
The file was modifiedclang/include/clang/Sema/Sema.h
The file was modifiedclang/lib/Sema/SemaExprCXX.cpp
The file was modifiedclang/lib/Sema/SemaCoroutine.cpp
The file was modifiedclang/lib/Sema/SemaTemplateInstantiateDecl.cpp
The file was modifiedclang/lib/Sema/SemaStmt.cpp
The file was modifiedclang/test/CodeGen/nrvo-tracking.cpp