Commit
fea130cec9525ede6f1b2f0c161e7f436df5f49d
by koraq
[libc++][doc] Update format status.
Marked the entries solely depending on D103357 or D96664 as complete. Initial work on implementing P2216 has started.
 | libcxx/docs/Status/FormatPaper.csv |
 | libcxx/docs/Status/FormatIssues.csv |
Commit
2833a2edac7d21965e7a27707dba2ef4bc37a4d2
by David CARLIER
[Sanitizers] netbsd build fix due to wordexp interception.
 | compiler-rt/lib/sanitizer_common/sanitizer_platform_limits_netbsd.cpp |
 | compiler-rt/lib/sanitizer_common/sanitizer_platform_limits_netbsd.h |
Commit
fd52b4357a6eb718c2c7f9cfe1d8f55ef195edb1
by ezhulenev
[mlir] Async: check awaited operand error state after sync await
Previously, only an await inside the async function (a coroutine after lowering to the async runtime) would check the error state.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D109229
 | mlir/lib/Dialect/Async/Transforms/AsyncToAsyncRuntime.cpp |
 | mlir/test/Dialect/Async/async-to-async-runtime.mlir |
 | mlir/test/Dialect/Async/async-runtime-policy-based-ref-counting.mlir |
 | mlir/lib/Dialect/Async/Transforms/AsyncRuntimeRefCounting.cpp |
 | mlir/test/Conversion/AsyncToLLVM/convert-to-llvm.mlir |
Commit
da965a77d566b9295a5928ca4c989650131bfc0b
by llvm-dev
[X86][SLM] Fix MUL uops, latency and throughput
These were all set to the same best-case mul i32 values (which seems to be the only version of MUL that SLM actually performs well with).
Noticed while trying to improve multiplication costs for vectorization via the D103695 helper script. Confirmed with Intel AoM / Agner / InstLatX64.
 | llvm/lib/Target/X86/X86ScheduleSLM.td |
 | llvm/test/tools/llvm-mca/X86/SLM/resources-x86_64.s |
Commit
c6371020a801f1da327ec3dcdfa0818fbd6f657a
by llvm-dev
[X86][SLM] RMW instructions don't require an extra uop
For RMW instructions, the load and store hold the MEC for an extra cycle, but within the same single uop. This is alluded to in the Intel AOM:
"The MEC also owns the MEC RSV, which is responsible for scheduling of all loads and stores. Load and store instructions go through addresses generation phase in program order to avoid on-the-fly memory ordering later in the pipeline. Therefore, an unknown address will stall younger memory instructions."
Noticed while trying to get a cheap SLM test box up and running with llvm-exegesis - RMW arithmetic is always 1uop - and matches what Agner / InstLatX64 report as well.
 | llvm/lib/Target/X86/X86ScheduleSLM.td |
 | llvm/test/tools/llvm-mca/X86/SLM/resources-x86_64.s |
Commit
994da657076900f5ad7fe593c3b5e5f89ab3d53d
by llvm-dev
[X86][SLM] WriteVecIMul instructions only take 1uop
The xmm variants have half the throughput (and +1cy latency) of the mmx variants, but are still 1uop.
I still need to do more thorough testing of SLM on test-suite before fixing the obvious bad numbers for WritePMULLD.
But this helps the D103695 helper script get to more accurate numbers for vXi32 multiplies of extended operands (i.e. we can use PMADDWD, PMULLW/PMULHW etc). Matches what Intel AoM / Agner / llvm-exegesis reports.
 | llvm/test/tools/llvm-mca/X86/SLM/resources-ssse3.s |
 | llvm/lib/Target/X86/X86ScheduleSLM.td |
 | llvm/test/tools/llvm-mca/X86/SLM/resources-sse2.s |
 | llvm/test/tools/llvm-mca/X86/SLM/resources-sse41.s |
Commit
73e1ba62158992e273fd4875cd11e07f64c81844
by Dávid Bolvanský
[NFC] Added tests for PR51565
 | llvm/test/Transforms/InstCombine/icmp-rotate.ll |
Commit
ac51d69208719a5d0b8609f46c793240ed9ff6bd
by llvm-dev
Revert rG994da657076900f5ad7fe593c3b5e5f89ab3d53d "[X86][SLM] WriteVecIMul instructions only take 1uop"
This changed some codegen tests that I forgot about in my rebase; I'll recommit shortly with a fix.
 | llvm/lib/Target/X86/X86ScheduleSLM.td |
 | llvm/test/tools/llvm-mca/X86/SLM/resources-sse41.s |
 | llvm/test/tools/llvm-mca/X86/SLM/resources-ssse3.s |
 | llvm/test/tools/llvm-mca/X86/SLM/resources-sse2.s |
Commit
2005ae15a66dd5d8a9845f3652192a70bd36d921
by llvm-dev
[X86][SLM] WriteVecIMul instructions only take 1uop (REAPPLIED)
The xmm variants have half the throughput (and +1cy latency) of the mmx variants, but are still 1uop.
I still need to do more thorough testing of SLM on test-suite before fixing the obvious bad numbers for WritePMULLD.
But this helps the D103695 helper script get to more accurate numbers for vXi32 multiplies of extended operands (i.e. we can use PMADDWD, PMULLW/PMULHW etc). Matches what Intel AoM / Agner / llvm-exegesis reports.
 | llvm/test/CodeGen/X86/slow-pmulld.ll |
 | llvm/test/tools/llvm-mca/X86/SLM/resources-sse2.s |
 | llvm/lib/Target/X86/X86ScheduleSLM.td |
 | llvm/test/tools/llvm-mca/X86/SLM/resources-ssse3.s |
 | llvm/test/tools/llvm-mca/X86/SLM/resources-sse41.s |
Commit
cb8d96e72f4c41b86738a5347a5c15e98037f358
by llvm-dev
Fix Wdocumentation unknown parameter warning. NFCI.
 | llvm/lib/ProfileData/SampleProfReader.cpp |