1. [AArch64][GlobalISel] Refactor + improve CMN, ADDS, and ADD emit functions
2. [LICM] Make Loop ICM profile aware again
3. SVML support for log10, sqrt
Commit ffe9986de4297fdeddcd0b0b9bac2a28c45f661b by Jessica Paquette
[AArch64][GlobalISel] Refactor + improve CMN, ADDS, and ADD emit functions

These functions were extremely similar:

- `emitADD`
- `emitADDS`
- `emitCMN`

Refactor them a little, introducing a more generic `emitInstr` function to
do most of the work.

Also add support for the immediate + shifted register addressing modes in each
of them.
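
As a rough illustration of the refactor described above, the following is a toy model only, not the code in AArch64InstructionSelector.cpp. The `Operand`, `OpcodeTable`, `Emitted`, and `emitInstrSketch` names are hypothetical. The sketch assumes the shared helper tries the immediate form first, then the shifted-register form, then falls back to register-register, with each caller supplying only its opcode table.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical operand model for illustration; not LLVM's MachineOperand.
struct Operand {
  std::optional<int64_t> Constant;   // set when the RHS is a known constant
  std::optional<unsigned> ShiftAmt;  // set when the RHS is (reg << ShiftAmt)
};

enum class AddrMode { Immediate, ShiftedReg, RegReg };

struct Emitted {
  std::string Opcode;
  AddrMode Mode;
};

// One opcode per addressing mode; each emit function supplies its own table.
struct OpcodeTable {
  const char *Imm;
  const char *Shifted;
  const char *Reg;
};

// Shared helper (assumed ordering): pick the most specific form the RHS allows.
Emitted emitInstrSketch(const OpcodeTable &T, const Operand &RHS) {
  if (RHS.Constant) {
    int64_t C = *RHS.Constant;
    // AArch64 arithmetic immediates: 12 bits, optionally shifted left by 12.
    bool Fits = C >= 0 && ((C & ~0xfffLL) == 0 || (C & ~0xfff000LL) == 0);
    if (Fits)
      return {T.Imm, AddrMode::Immediate};
  }
  if (RHS.ShiftAmt) {
    assert(*RHS.ShiftAmt < 64 && "shift amount out of range for a 64-bit reg");
    return {T.Shifted, AddrMode::ShiftedReg};
  }
  return {T.Reg, AddrMode::RegReg};
}

// The formerly near-duplicate functions now differ only in their tables.
Emitted emitADD(const Operand &RHS) {
  static const OpcodeTable T{"ADDXri", "ADDXrs", "ADDXrr"};
  return emitInstrSketch(T, RHS);
}

Emitted emitADDS(const Operand &RHS) {
  static const OpcodeTable T{"ADDSXri", "ADDSXrs", "ADDSXrr"};
  return emitInstrSketch(T, RHS);
}
```

In this model, a constant RHS that does not fit the immediate encoding and carries no shift falls through to the register-register form.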

Update select-uaddo.mir to show that selecting ADDS now supports folding
immediates + shifts. (I don't think this can impact CMN, because the CMN checks
require a G_SUB with a non-constant on the RHS.)

This is around a 0.02% code size improvement on CTMark at -O3.

Differential Revision:
The file was modified llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp
The file was modified llvm/test/CodeGen/AArch64/GlobalISel/select-uaddo.mir
Commit 2c391a5a14aeb34e970aba85c5aa540656fe47ca by aktoon
[LICM] Make Loop ICM profile aware again

D65060 was reverted because it introduced non-determinism by using BFI counts from already freed blocks. The parent of this revision fixes that by using a VH callback on blocks to prevent this from happening and makes sure BFI data is passed correctly in LoopStandardAnalysisResults.

This re-introduces the previous optimization of using BFI data to prevent LICM from hoisting/sinking an instruction if it would end up in a hotter block than the one it currently lives in.
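
A minimal sketch of the idea, matching the hot pre-header case this commit message describes: skip the move when the destination block is estimated to execute more often than the source block. The `BlockFreqMap` type and `isProfitableToMove` function are hypothetical stand-ins for BFI queries and are not taken from LICM.cpp. The real check may also apply a threshold, which is omitted here.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for block frequency info: block name -> estimated
// execution frequency.
using BlockFreqMap = std::unordered_map<std::string, uint64_t>;

// Returns false when moving an instruction from Src to Dst would place it in
// a block that runs more often than where it currently lives.
bool isProfitableToMove(const BlockFreqMap &Freq, const std::string &Src,
                        const std::string &Dst) {
  auto S = Freq.find(Src);
  auto D = Freq.find(Dst);
  // Without frequency data for both blocks, keep the default behavior
  // (allow the move), as a non-profile-guided build would.
  if (S == Freq.end() || D == Freq.end())
    return true;
  return D->second <= S->second;
}
```

For example, with a pre-header frequency of 1000 and a rarely taken loop block at 10, hoisting from the cold block into the pre-header is rejected, while the reverse direction is allowed.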

Internally at Facebook this change results in a ~7% win in a CPU-related metric in one of our big services by preventing cold code from being hoisted into a hot pre-header, as the added test case demonstrates.

ninja check

Reviewed By: asbirlea

Differential Revision:
The file was modified llvm/include/llvm/Transforms/Utils/LoopUtils.h
The file was modified llvm/test/Transforms/LICM/sink.ll
The file was modified llvm/lib/Passes/PassBuilder.cpp
The file was modified llvm/lib/Transforms/Scalar/LICM.cpp
The file was added llvm/test/Transforms/LICM/no-hoist-prof.ll
The file was added llvm/test/Transforms/LICM/Inputs/
Commit 056534dc2b15ed1d276bead76f054cc7ac9d2bf1 by aktoon
SVML support for log10, sqrt

Although LLVM supports vectorization of loops containing log10/sqrt, it did not support using the SVML implementations of them. This adds support so that when clang is invoked with -fveclib=SVML, the appropriate SVML library log10/sqrt implementation is invoked.

Follow up on:

Added unit tests to svml-calls.ll and svml-calls-finite.ll. They can be run with llvm-lit.
Created a simple C++ file that tests log10/sqrt, built it with clang++, and inspected the final assembly output.
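
The test file itself is not part of this commit. A hypothetical file of that kind might look like the sketch below. The function names `log10All` and `sqrtAll` are made up for illustration. Whether the loops are actually vectorized into SVML calls depends on the optimization level, target features, and math flags in use.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Element-wise log10 over a buffer; a simple counted loop like this is a
// candidate for the loop vectorizer.
std::vector<double> log10All(const std::vector<double> &In) {
  std::vector<double> Out(In.size());
  for (std::size_t I = 0; I < In.size(); ++I)
    Out[I] = std::log10(In[I]);
  return Out;
}

// Element-wise sqrt over a buffer, same loop shape.
std::vector<double> sqrtAll(const std::vector<double> &In) {
  std::vector<double> Out(In.size());
  for (std::size_t I = 0; I < In.size(); ++I)
    Out[I] = std::sqrt(In[I]);
  return Out;
}
```

Compiling such a file with clang++ at an optimizing level with -fveclib=SVML and emitting assembly (-S) is one way to check whether vector library calls appear in the output.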

Reviewed By: craig.topper

Differential Revision:
The file was modified llvm/test/Transforms/Util/add-TLI-mappings.ll
The file was modified llvm/include/llvm/Analysis/VecFuncs.def
The file was modified llvm/test/Transforms/LoopVectorize/X86/svml-calls-finite.ll
The file was modified llvm/test/Transforms/LoopVectorize/X86/svml-calls.ll