Changes

Summary

  1. [Sanitizers][PGO] missing return statement
  2. [X86][Atom] Add missing UOps override to AtomWriteResPair multiclass
  3. [X86][Atom] MUL/DIV instructions require both ports, not either.
  4. [libc++][doc] Update format status.
  5. [Sanitizers] netbsd build fix due to wordexp interception.
  6. [mlir] Async: check awaited operand error state after sync await
  7. [X86][SLM] Fix MUL uops, latency and throughput
  8. [X86][SLM] RMW instructions don't require an extra uop
  9. [X86][SLM] WriteVecIMul instructions only take 1uop
  10. [NFC] Added tests for PR51565
  11. Revert rG994da657076900f5ad7fe593c3b5e5f89ab3d53d "[X86][SLM] WriteVecIMul instructions only take 1uop"
Commit 08c3cdb8b8423c6bab14670bca5fd6b19a1fdfce by David CARLIER
[Sanitizers][PGO] missing return statement
The file was modified compiler-rt/lib/profile/InstrProfilingUtil.c
Commit 0d0f39b0f3ee3b7d41a6caca6896a90d1a316437 by llvm-dev
[X86][Atom] Add missing UOps override to AtomWriteResPair multiclass

Make it easier to describe microcoded instructions.
The file was modified llvm/lib/Target/X86/X86ScheduleAtom.td
Commit 7d062d2c478b713ba7ac31729259d747fdd7d0b3 by llvm-dev
[X86][Atom] MUL/DIV instructions require both ports, not either.

Noticed while trying to improve multiplication costs for vectorization via the D103695 helper script. Confirmed with Intel AoM.
The file was modified llvm/test/tools/llvm-mca/X86/Atom/resources-x86_64.s
The file was modified llvm/lib/Target/X86/X86ScheduleAtom.td
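
As a rough illustration of why "both ports, not either" matters to a scheduling model, here is a toy sketch. It is not the actual X86ScheduleAtom.td definition: the function name, the two-port setup, and the 6-cycle occupancy are hypothetical values chosen only for the example.

```python
def reciprocal_throughput(busy_cycles, num_ports, needs_all_ports):
    """Steady-state cycles per instruction in a toy port model.

    busy_cycles: how long one instruction occupies the port(s) it uses.
    num_ports: number of ports that could execute the instruction.
    needs_all_ports: True if the instruction ties up every port at once.
    """
    if needs_all_ports:
        # All ports are busy for the whole duration, so back-to-back
        # instructions must fully serialize.
        return float(busy_cycles)
    # Independent instructions can be spread evenly across the ports.
    return busy_cycles / num_ports


# Hypothetical instruction that is busy for 6 cycles on a 2-port core.
either_port = reciprocal_throughput(6, 2, needs_all_ports=False)
both_ports = reciprocal_throughput(6, 2, needs_all_ports=True)
print(either_port, both_ports)
```

Under these assumptions, modelling the instruction as "either port" makes it look twice as cheap as "both ports", which is the kind of discrepancy a cost-estimation script would pick up.
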
Commit fea130cec9525ede6f1b2f0c161e7f436df5f49d by koraq
[libc++][doc] Update format status.

Marked the entries solely depending on D103357 or D96664 as complete.
Initial work on implementing P2216 has started.
The file was modified libcxx/docs/Status/FormatPaper.csv
The file was modified libcxx/docs/Status/FormatIssues.csv
Commit 2833a2edac7d21965e7a27707dba2ef4bc37a4d2 by David CARLIER
[Sanitizers] netbsd build fix due to wordexp interception.
The file was modified compiler-rt/lib/sanitizer_common/sanitizer_platform_limits_netbsd.cpp
The file was modified compiler-rt/lib/sanitizer_common/sanitizer_platform_limits_netbsd.h
Commit fd52b4357a6eb718c2c7f9cfe1d8f55ef195edb1 by ezhulenev
[mlir] Async: check awaited operand error state after sync await

Previously, only an await inside an async function (a coroutine after lowering to the async runtime) would check the error state.

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D109229
The file was modified mlir/test/Dialect/Async/async-to-async-runtime.mlir
The file was modified mlir/test/Dialect/Async/async-runtime-policy-based-ref-counting.mlir
The file was modified mlir/lib/Dialect/Async/Transforms/AsyncRuntimeRefCounting.cpp
The file was modified mlir/lib/Dialect/Async/Transforms/AsyncToAsyncRuntime.cpp
The file was modified mlir/test/Conversion/AsyncToLLVM/convert-to-llvm.mlir
Commit da965a77d566b9295a5928ca4c989650131bfc0b by llvm-dev
[X86][SLM] Fix MUL uops, latency and throughput

These were all set to the same best-case mul i32 values (which seems to be the only version of MUL that SLM actually performs well with).

Noticed while trying to improve multiplication costs for vectorization via the D103695 helper script. Confirmed with Intel AoM / Agner / InstLatX64.
The file was modified llvm/lib/Target/X86/X86ScheduleSLM.td
The file was modified llvm/test/tools/llvm-mca/X86/SLM/resources-x86_64.s
Commit c6371020a801f1da327ec3dcdfa0818fbd6f657a by llvm-dev
[X86][SLM] RMW instructions don't require an extra uop

For RMW instructions, the load and store hold the MEC for an extra cycle, but within the same single uop. This is alluded to in the Intel AOM:

"The MEC also owns the MEC RSV, which is responsible for scheduling of all loads and stores. Load and
store instructions go through addresses generation phase in program order to avoid on-the-fly memory
ordering later in the pipeline. Therefore, an unknown address will stall younger memory instructions."

Noticed while trying to get a cheap SLM test box up and running with llvm-exegesis: RMW arithmetic is always 1 uop, which matches what Agner / InstLatX64 report as well.
The file was modified llvm/test/tools/llvm-mca/X86/SLM/resources-x86_64.s
The file was modified llvm/lib/Target/X86/X86ScheduleSLM.td
Commit 994da657076900f5ad7fe593c3b5e5f89ab3d53d by llvm-dev
[X86][SLM] WriteVecIMul instructions only take 1uop

The xmm variants have half the throughput (and +1cy latency) of the mmx variants, but are still 1 uop.

I still need to do more thorough testing of SLM on test-suite before fixing the obvious bad numbers for WritePMULLD.

But this helps the D103695 helper script get to more accurate numbers for vXi32 multiplies of extended operands (i.e. we can use PMADDWD, PMULLW/PMULHW etc). Matches what Intel AoM / Agner / llvm-exegesis reports.
The file was modified llvm/lib/Target/X86/X86ScheduleSLM.td
The file was modified llvm/test/tools/llvm-mca/X86/SLM/resources-sse41.s
The file was modified llvm/test/tools/llvm-mca/X86/SLM/resources-sse2.s
The file was modified llvm/test/tools/llvm-mca/X86/SLM/resources-ssse3.s
Commit 73e1ba62158992e273fd4875cd11e07f64c81844 by Dávid Bolvanský
[NFC] Added tests for PR51565
The file was added llvm/test/Transforms/InstCombine/icmp-rotate.ll
Commit ac51d69208719a5d0b8609f46c793240ed9ff6bd by llvm-dev
Revert rG994da657076900f5ad7fe593c3b5e5f89ab3d53d "[X86][SLM] WriteVecIMul instructions only take 1uop"

This changed some codegen tests that I forgot about in my rebase; I'll recommit shortly with a fix.
The file was modified llvm/test/tools/llvm-mca/X86/SLM/resources-sse2.s
The file was modified llvm/test/tools/llvm-mca/X86/SLM/resources-sse41.s
The file was modified llvm/test/tools/llvm-mca/X86/SLM/resources-ssse3.s
The file was modified llvm/lib/Target/X86/X86ScheduleSLM.td