  1. [Flang][test] Fix Windows buildbot.
  2. A post-processing for BFI inference
  3. [AMDGPU][IndirectCalls] Fix register usage propagation for indirect/external calls
Commit dbc262968f8ed4b25281d5855eda8f5a699b95c6 by llvm-project
[Flang][test] Fix Windows buildbot.

Commit 1b241b9b400bdfc5b8e0d157f0f46436677927b8 /
patch introduced a new test that
calls a UNIX shell script. Mark the test
so that it does not run on Windows.
The file was modified flang/test/Semantics/modfile41.f90
Commit 0a0800c4d10c250ffb152b5f059d6f9a19ed8efe by aktoon
A post-processing for BFI inference

The current implementation for computing relative block frequencies does
not correctly handle control-flow graphs containing irreducible loops. This
results in suboptimally generated binaries, whose performance can be up to 5%
worse than optimal.

To resolve the problem, we apply a post-processing step, which iteratively
updates block frequencies based on the frequencies of their predecessors.
This corresponds to finding the stationary point of the Markov chain by
an iterative method, also known as "PageRank computation". The algorithm takes at
most O(|E| * IterativeBFIMaxIterations) steps but typically converges faster.
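
The iteration described above can be illustrated with a minimal sketch. This is
not the code from the patch: the function name `iterative_frequencies`, the
`max_iterations` and `tol` parameters, and the toy CFG are all assumptions made
for illustration, and the real implementation in BlockFrequencyInfoImpl.cpp
may differ in details such as update order and convergence checks.

```python
def iterative_frequencies(succs, entry, max_iterations=1000, tol=1e-12):
    """Estimate relative block frequencies by repeated sweeps over the CFG.

    succs maps every block to a list of (successor, branch_probability)
    pairs; blocks without successors (exits) map to an empty list.
    All names here are hypothetical, not LLVM API.
    """
    # Build predecessor lists once so that each sweep costs O(|E|).
    preds = {block: [] for block in succs}
    for block, edges in succs.items():
        for succ, prob in edges:
            preds[succ].append((block, prob))

    freq = {block: 0.0 for block in succs}
    freq[entry] = 1.0

    for _ in range(max_iterations):
        max_change = 0.0
        for block in succs:
            # A block's frequency is the flow arriving from its predecessors;
            # the entry block additionally receives one unit of flow.
            inflow = sum(freq[pred] * prob for pred, prob in preds[block])
            new_freq = inflow + (1.0 if block == entry else 0.0)
            max_change = max(max_change, abs(new_freq - freq[block]))
            freq[block] = new_freq
        if max_change < tol:  # stop early once the values have stabilized
            break
    return freq


# Toy CFG with an irreducible loop: the cycle {A, B} can be entered
# through either A or B.
cfg = {
    "entry": [("A", 0.5), ("B", 0.5)],
    "A": [("B", 1.0)],
    "B": [("A", 0.5), ("exit", 0.5)],
    "exit": [],
}
freq = iterative_frequencies(cfg, "entry")
# Solving the flow equations by hand gives A = 1.5, B = 2.0, exit = 1.0.
```

Each sweep visits every edge once, so the total work is bounded by the number
of edges times the iteration cap, which matches the bound stated above.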

The step is turned on by passing the option `use-iterative-bfi-inference`
and is applied only to functions containing both profile data and irreducible loops.

Tested on SPEC06/17, where it helps to get correct profile counts for one of
the binaries (403.gcc). In production binaries, we've seen speedups of 2%-5%
for binaries containing functions with hot irreducible loops.

Reviewed By: hoy, wenlei, davidxl

Differential Revision:
The file was added llvm/test/Transforms/SampleProfile/profile-correlation-irreducible-loops.ll
The file was modified llvm/include/llvm/Analysis/BlockFrequencyInfoImpl.h
The file was added llvm/test/Transforms/SampleProfile/Inputs/
The file was modified llvm/lib/Analysis/BlockFrequencyInfoImpl.cpp
Commit c27e8141b3d1265d2ab1cb951c4330b961fab9ee by Madhur.Amilkanthwar
[AMDGPU][IndirectCalls] Fix register usage propagation for indirect/external calls

This patch computes the maximum number of SGPRs and VGPRs used by
the module in the presence of indirect calls and uses that
as the register requirement for functions/kernels
that make indirect calls.
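
The propagation rule can be sketched as follows. This is only an illustrative
model, not the code from the patch: the `FunctionInfo` record, the
`propagate_register_usage` helper, and the sample register counts are all
hypothetical, and the sketch assumes the per-function register counts are
already known.

```python
from dataclasses import dataclass


@dataclass
class FunctionInfo:
    """Hypothetical per-function record; not an LLVM data structure."""
    sgprs: int
    vgprs: int
    has_indirect_call: bool


def propagate_register_usage(functions):
    """Return the (SGPR, VGPR) requirement for every function in a module.

    A function that makes an indirect call could reach any function in the
    module, so it conservatively takes the module-wide maximum of each
    register class. Other functions keep their own counts.
    """
    max_sgprs = max((f.sgprs for f in functions.values()), default=0)
    max_vgprs = max((f.vgprs for f in functions.values()), default=0)

    requirements = {}
    for name, info in functions.items():
        if info.has_indirect_call:
            requirements[name] = (max_sgprs, max_vgprs)
        else:
            requirements[name] = (info.sgprs, info.vgprs)
    return requirements


# Made-up module: only `kernel` makes an indirect call.
module = {
    "kernel": FunctionInfo(sgprs=16, vgprs=8, has_indirect_call=True),
    "helper_a": FunctionInfo(sgprs=40, vgprs=12, has_indirect_call=False),
    "helper_b": FunctionInfo(sgprs=24, vgprs=64, has_indirect_call=False),
}
requirements = propagate_register_usage(module)
```

Taking the maximum per register class independently is the conservative
choice here, since the callee of an indirect call is not known statically.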

This patch also refactors code in AMDGPUSubtarget.cpp,
adding "base" variants of getMaxNumSGPRs that
are used by both the MachineFunction version and the new Function version.

Reviewed By: arsenm

Differential Revision:
The file was modified llvm/test/CodeGen/AMDGPU/amdpal-callable.ll
The file was modified llvm/test/CodeGen/AMDGPU/call-graph-register-usage.ll
The file was modified llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
The file was modified llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.h
The file was modified llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp
The file was modified llvm/test/CodeGen/AMDGPU/agpr-register-count.ll
The file was modified llvm/lib/Target/AMDGPU/GCNSubtarget.h
The file was modified llvm/test/CodeGen/AMDGPU/indirect-call.ll