FailedChanges

Summary

  1. [clang-scan-deps] strip the --serialize-diagnostics argument This ensures that clang-scan-deps won't write out diagnostics when scanning dependencies.
  2. [GlobalISel] Defer setting HasCalls on MachineFrameInfo to selection time. We currently always set the HasCalls on MFI during translation and legalization if we're handling a call or legalizing to a libcall. However, if that call is later optimized to a tail call then we don't need the flag. The flag being set to true causes frame lowering to always save and restore FP/LR, which adds unnecessary code. This change does the same thing as SelectionDAG and ports over some code that scans instructions after selection, using TargetInstrInfo to determine if target opcodes are known calls. Code size geomean improvements on CTMark: -O0 : 0.1% -Os : 0.3% Differential Revision: https://reviews.llvm.org/D67868
  3. [Inliner] Remove incorrect early exit during switch cost computation Summary: The CallAnalyzer::visitSwitchInst has an early exit when the estimated lower bound of the switch cost will put the overall cost of the inline above the threshold. However, this code is not correctly estimating the lower bound for switches that can be transformed into bit tests, leading to unnecessary lost inlines, and also differing behavior with optimization remarks enabled. First, the early exit is controlled by whether ComputeFullInlineCost is enabled or not, and that in turn is disabled by default but enabled when enabling -pass-remarks=missed. This by itself wouldn't lead to a problem, except that as described below, the lower bound can be above the real lower bound, so we can sometimes get different inline decisions with inline remarks enabled, which is problematic. The early exit was added in along with a new switch cost model in D31085. The reason why this early exit was added is due to a concern one reviewer raised about compile time for large switches: https://reviews.llvm.org/D31085?id=94559#inline-276200 However, the code just below there calls getEstimatedNumberOfCaseClusters, which in turn immediately calls BasicTTIImpl getEstimatedNumberOfCaseClusters, which in the worst case does a linear scan of the cases to get the high and low values. The bit test handling in particular is guarded by whether the number of cases fits into the max bit width. There is no suggestion that anyone measured a compile time issue, it appears to be theoretical. The problem is that the reviewer's comment about the lower bound calculation is incorrect, specifically in the case of a switch that can be lowered to a bit test. This isn't followed up on the comment thread, but the author does add a FIXME to that effect above the early exit added when they subsequently revised the patch. As a result, we were incorrectly early exiting and not inlining functions with switch statements that would be lowered to bit tests in cases where we were nearing the threshold. Combined with the fact that this early exit was skipped with opt remarks enabled, this caused different inlining decisions to be made when -pass-remarks=missed is enabled to debug the missing inline. Remove the early exit for the above reasons. I also copied over an existing AArch64 inlining test to X86, and adjusted the threshold so that the bit test inline only occurs with the fix in this patch. Reviewers: davidxl Subscribers: eraman, kristof.beyls, haicheng, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67716
Revision 372444 by arphaman:
[clang-scan-deps] strip the --serialize-diagnostics argument

This ensures that clang-scan-deps won't write out diagnostics when
scanning dependencies.
Change TypePath in RepositoryPath in Workspace
The file was modified/cfe/trunk/include/clang/Tooling/ArgumentsAdjusters.h (diff)clang.src/include/clang/Tooling/ArgumentsAdjusters.h
The file was modified/cfe/trunk/lib/Tooling/ArgumentsAdjusters.cpp (diff)clang.src/lib/Tooling/ArgumentsAdjusters.cpp
The file was added/cfe/trunk/test/ClangScanDeps/Inputs/strip_diag_serialize.jsonclang.src/test/ClangScanDeps/Inputs/strip_diag_serialize.json
The file was added/cfe/trunk/test/ClangScanDeps/strip_diag_serialize.cppclang.src/test/ClangScanDeps/strip_diag_serialize.cpp
The file was modified/cfe/trunk/tools/clang-scan-deps/ClangScanDeps.cpp (diff)clang.src/tools/clang-scan-deps/ClangScanDeps.cpp
Revision 372443 by aemerson:
[GlobalISel] Defer setting HasCalls on MachineFrameInfo to selection time.

We currently always set the HasCalls on MFI during translation and legalization if
we're handling a call or legalizing to a libcall. However, if that call is later
optimized to a tail call then we don't need the flag. The flag being set to true
causes frame lowering to always save and restore FP/LR, which adds unnecessary code.

This change does the same thing as SelectionDAG and ports over some code that scans
instructions after selection, using TargetInstrInfo to determine if target opcodes
are known calls.

Code size geomean improvements on CTMark:
-O0 : 0.1%
-Os : 0.3%

Differential Revision: https://reviews.llvm.org/D67868
Change TypePath in RepositoryPath in Workspace
The file was modified/llvm/trunk/lib/CodeGen/GlobalISel/IRTranslator.cpp (diff)llvm.src/lib/CodeGen/GlobalISel/IRTranslator.cpp
The file was modified/llvm/trunk/lib/CodeGen/GlobalISel/InstructionSelect.cpp (diff)llvm.src/lib/CodeGen/GlobalISel/InstructionSelect.cpp
The file was modified/llvm/trunk/lib/CodeGen/GlobalISel/LegalizerHelper.cpp (diff)llvm.src/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
The file was modified/llvm/trunk/test/CodeGen/AArch64/GlobalISel/call-translator.ll (diff)llvm.src/test/CodeGen/AArch64/GlobalISel/call-translator.ll
The file was added/llvm/trunk/test/CodeGen/AArch64/GlobalISel/tail-call-no-save-fp-lr.llllvm.src/test/CodeGen/AArch64/GlobalISel/tail-call-no-save-fp-lr.ll
Revision 372440 by tejohnson:
[Inliner] Remove incorrect early exit during switch cost computation

Summary:
The CallAnalyzer::visitSwitchInst has an early exit when the estimated
lower bound of the switch cost will put the overall cost of the inline
above the threshold. However, this code is not correctly estimating the
lower bound for switches that can be transformed into bit tests, leading
to unnecessary lost inlines, and also differing behavior with
optimization remarks enabled.

First, the early exit is controlled by whether ComputeFullInlineCost is
enabled or not, and that in turn is disabled by default but enabled when
enabling -pass-remarks=missed. This by itself wouldn't lead to a
problem, except that as described below, the lower bound can be above
the real lower bound, so we can sometimes get different inline decisions
with inline remarks enabled, which is problematic.

The early exit was added in along with a new switch cost model in D31085.
The reason why this early exit was added is due to a concern one reviewer
raised about compile time for large switches:
https://reviews.llvm.org/D31085?id=94559#inline-276200

However, the code just below there calls
getEstimatedNumberOfCaseClusters, which in turn immediately calls
BasicTTIImpl getEstimatedNumberOfCaseClusters, which in the worst case
does a linear scan of the cases to get the high and low values. The
bit test handling in particular is guarded by whether the number of
cases fits into the max bit width. There is no suggestion that anyone
measured a compile time issue, it appears to be theoretical.

The problem is that the reviewer's comment about the lower bound
calculation is incorrect, specifically in the case of a switch that can
be lowered to a bit test. This isn't followed up on the comment
thread, but the author does add a FIXME to that effect above the early
exit added when they subsequently revised the patch.

As a result, we were incorrectly early exiting and not inlining
functions with switch statements that would be lowered to bit tests in
cases where we were nearing the threshold. Combined with the fact that
this early exit was skipped with opt remarks enabled, this caused
different inlining decisions to be made when -pass-remarks=missed is
enabled to debug the missing inline.

Remove the early exit for the above reasons.

I also copied over an existing AArch64 inlining test to X86, and
adjusted the threshold so that the bit test inline only occurs with the
fix in this patch.

Reviewers: davidxl

Subscribers: eraman, kristof.beyls, haicheng, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67716
Change TypePath in RepositoryPath in Workspace
The file was modified/llvm/trunk/lib/Analysis/InlineCost.cpp (diff)llvm.src/lib/Analysis/InlineCost.cpp
The file was added/llvm/trunk/test/Transforms/Inline/X86/switch.llllvm.src/test/Transforms/Inline/X86/switch.ll