1. [ARM] Increase MVE gather/scatter cost by MVECostFactor.
2. [llvm/Object] - Make dyn_cast<XCOFFObjectFile> work as it should.
3. [mlir][PDL] Add a PDL Interpreter Dialect
4. [Scheduling] Implement a new way to cluster loads/stores
5. [DWARFYAML] Make the unit_length and header_length fields optional.
Commit 677c1590c03474c8238fbc21b9c0dae9b5e5f4d2 by
[ARM] Increase MVE gather/scatter cost by MVECostFactor.

MVE gather/scatter code generation is looking a lot better than it used
to, but still has some issues. We currently model the instructions as 1
cycle per element, which is a bit low for some cases. Increasing the
cost by the MVECostFactor brings them in line with our other instruction
costs. This will have the effect of only generating them when the extra
benefit is more likely to overcome some of the issues, notably running
out of registers and vectorizing loops that could otherwise be SLP
vectorized.

In the short term, whilst we look at other ways of dealing with those
issues more directly, we can increase the costs of gathers to make them
more likely to be beneficial when created.
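
As a rough illustration of the scaling described above, the sketch below multiplies a one-cycle-per-element gather/scatter cost by a cost factor. The function name `gatherScatterCost` and its parameters are hypothetical, and the real logic in ARMTargetTransformInfo.cpp likely considers more cases than this.

```cpp
#include <cassert>

// Hypothetical sketch: model a gather/scatter as one cycle per element,
// then scale it by a target-wide cost factor so that it is comparable
// with other vector instruction costs.
unsigned gatherScatterCost(unsigned NumElements, unsigned CostFactor) {
  assert(CostFactor >= 1 && "a cost factor below 1 would reduce the cost");
  const unsigned CyclesPerElement = 1; // the per-element model from the text
  return NumElements * CyclesPerElement * CostFactor;
}
```

With a factor of 2, a 4-element gather would cost 8 instead of 4 in this model, so the vectorizer would need a larger benefit before choosing it.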

Differential Revision:
The file was modified llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp
The file was modified llvm/test/Transforms/LoopVectorize/ARM/pointer_iv.ll
The file was modified llvm/test/Analysis/CostModel/ARM/mve-gather-scatter-cost.ll
Commit 92c527e5a2b49fb1213ceda97738d4caf414666a by grimar
[llvm/Object] - Make dyn_cast<XCOFFObjectFile> work as it should.

Currently, `dyn_cast<XCOFFObjectFile>` always performs the cast and returns a
pointer, even when we pass `ELF`/`Wasm`/`Mach-O` or `COFF` instead of `XCOFF`.

This happens because the `XCOFFObjectFile` class does not implement `classof`.
I've fixed it and added a unit test.
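
To show why a missing `classof` matters, here is a minimal, self-contained sketch of classof-based casting. It does not use the LLVM headers: `Binary`, `XCOFFFile`, `ELFFile`, `Kind`, and `dynCast` are hypothetical stand-ins, and `dynCast` is only a simplified imitation of `llvm::dyn_cast`.

```cpp
#include <cassert>

// Hypothetical miniature of classof-based RTTI; not the llvm::object classes.
enum class Kind { ELF, COFF, XCOFF };

class Binary {
public:
  explicit Binary(Kind K) : TheKind(K) {}
  virtual ~Binary() = default;
  Kind getKind() const { return TheKind; }

private:
  Kind TheKind;
};

class XCOFFFile : public Binary {
public:
  XCOFFFile() : Binary(Kind::XCOFF) {}
  // Without this predicate a checked cast cannot reject other kinds.
  static bool classof(const Binary *B) { return B->getKind() == Kind::XCOFF; }
};

class ELFFile : public Binary {
public:
  ELFFile() : Binary(Kind::ELF) {}
  static bool classof(const Binary *B) { return B->getKind() == Kind::ELF; }
};

// Simplified stand-in for llvm::dyn_cast: returns nullptr on a kind mismatch.
template <typename To, typename From> To *dynCast(From *Val) {
  assert(Val && "dynCast on a null pointer");
  return To::classof(Val) ? static_cast<To *>(Val) : nullptr;
}
```

In this sketch, `dynCast<XCOFFFile>` on an `ELFFile` yields a null pointer. If `XCOFFFile` omitted `classof` and inherited a more permissive one from a base class, the same call would succeed, which matches the behaviour the commit describes.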

Differential revision:
The file was modified llvm/include/llvm/Object/XCOFFObjectFile.h
The file was modified llvm/unittests/Object/XCOFFObjectFileTest.cpp
Commit d289a97f91443177b605926668512479c2cee37b by riddleriver
[mlir][PDL] Add a PDL Interpreter Dialect

The PDL Interpreter dialect provides a lower-level abstraction compared to the PDL dialect, and is targeted towards low-level optimization and interpreter code generation. The dialect operations encapsulate low-level pattern match and rewrite "primitives", such as navigating the IR (Operation::getOperand), creating new operations (OpBuilder::create), etc. Many of the operations within this dialect also fuse branching control flow with some form of a predicate comparison operation. This type of fusion reduces the amount of work that an interpreter must do when executing.

An example of this representation is shown below:

// The following high level PDL pattern:
pdl.pattern : benefit(1) {
  %resultType = pdl.type
  %inputOperand = pdl.input
  %root, %results = pdl.operation "foo.op"(%inputOperand) -> %resultType
  pdl.rewrite %root {
    pdl.replace %root with (%inputOperand)
  }
}

// May be represented in the interpreter dialect as follows:
module {
  func @matcher(%arg0: !pdl.operation) {
    pdl_interp.check_operation_name of %arg0 is "foo.op" -> ^bb2, ^bb1
  ^bb1:
    pdl_interp.finalize
  ^bb2:
    pdl_interp.check_operand_count of %arg0 is 1 -> ^bb3, ^bb1
  ^bb3:
    pdl_interp.check_result_count of %arg0 is 1 -> ^bb4, ^bb1
  ^bb4:
    %0 = pdl_interp.get_operand 0 of %arg0
    pdl_interp.is_not_null %0 : !pdl.value -> ^bb5, ^bb1
  ^bb5:
    %1 = pdl_interp.get_result 0 of %arg0
    pdl_interp.is_not_null %1 : !pdl.value -> ^bb6, ^bb1
  ^bb6:
    pdl_interp.record_match @rewriters::@rewriter(%0, %arg0 : !pdl.value, !pdl.operation) : benefit(1), loc([%arg0]), root("foo.op") -> ^bb1
  }
  module @rewriters {
    func @rewriter(%arg0: !pdl.value, %arg1: !pdl.operation) {
      pdl_interp.replace %arg1 with(%arg0)
      pdl_interp.finalize
    }
  }
}

Differential Revision:
The file was modified mlir/lib/IR/Builders.cpp
The file was modified mlir/include/mlir/IR/Attributes.h
The file was added mlir/include/mlir/Dialect/PDLInterp/IR/PDLInterp.h
The file was modified mlir/tools/mlir-tblgen/OpFormatGen.cpp
The file was added mlir/include/mlir/Dialect/PDLInterp/IR/
The file was modified mlir/lib/Dialect/CMakeLists.txt
The file was modified mlir/include/mlir/Dialect/CMakeLists.txt
The file was modified mlir/include/mlir/IR/Builders.h
The file was added mlir/lib/Dialect/PDLInterp/IR/CMakeLists.txt
The file was modified mlir/test/Dialect/PDL/ops.mlir
The file was modified mlir/include/mlir/Dialect/PDL/IR/
The file was modified mlir/lib/Parser/Parser.h
The file was added mlir/test/Dialect/PDLInterp/ops.mlir
The file was added mlir/lib/Dialect/PDLInterp/IR/PDLInterp.cpp
The file was modified mlir/lib/Parser/AttributeParser.cpp
The file was modified mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
The file was modified mlir/include/mlir/IR/OpImplementation.h
The file was modified mlir/include/mlir/InitAllDialects.h
The file was modified mlir/test/Dialect/PDL/invalid.mlir
The file was modified mlir/lib/Dialect/PDL/IR/PDL.cpp
The file was modified mlir/lib/Parser/Parser.cpp
The file was added mlir/include/mlir/Dialect/PDLInterp/IR/CMakeLists.txt
The file was added mlir/include/mlir/Dialect/PDLInterp/CMakeLists.txt
The file was added mlir/lib/Dialect/PDLInterp/CMakeLists.txt
The file was modified mlir/include/mlir/Dialect/PDL/IR/
Commit ebf3b188c6edcce7e90ddcacbe7c51c90d95b0ac by qshanz
[Scheduling] Implement a new way to cluster loads/stores

Before calling the target hook to determine if two loads/stores are clusterable,
we put them into different groups to avoid fake clusters due to dependencies.
For now, we put the loads/stores into the same group if they have
the same predecessor. We assume that if two loads/stores have the same
predecessor, they are unlikely to depend on each other.

However, one SUnit might have several predecessors, and for now we just
pick the first predecessor that has a non-data/non-artificial dependency,
which is too arbitrary, and this has proven hard to fix.

So I am proposing a better implementation:
1. Collect all the loads/stores that have memory info first to reduce the complexity.
2. Sort these loads/stores so that we can stop the seeking as early as possible.
3. For each load/store, seek the first non-dependent instruction in the
   sorted order, and check whether they can cluster.
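
The three steps above can be sketched roughly as follows. This is a standalone illustration, not the code in MachineScheduler.cpp: `MemOp`, `clusterMemOps`, and the `DependsOn` and `CanCluster` callbacks are hypothetical, and sorting by (base, offset) is an assumed ordering.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a scheduling unit; not llvm::SUnit.
struct MemOp {
  unsigned Id;     // position in the original instruction order
  bool HasMemInfo; // step 1 keeps only ops with a known base and offset
  unsigned Base;   // base register or object, as an opaque number
  int64_t Offset;  // byte offset from Base
};

using Pair = std::pair<unsigned, unsigned>;
using Pred = std::function<bool(const MemOp &, const MemOp &)>;

std::vector<Pair> clusterMemOps(const std::vector<MemOp> &Ops,
                                const Pred &DependsOn, const Pred &CanCluster) {
  assert(DependsOn && CanCluster && "both callbacks must be set");

  // Step 1: collect only the ops that carry memory info.
  std::vector<MemOp> Sorted;
  for (const MemOp &Op : Ops)
    if (Op.HasMemInfo)
      Sorted.push_back(Op);

  // Step 2: sort by (Base, Offset) so likely partners end up adjacent.
  std::sort(Sorted.begin(), Sorted.end(), [](const MemOp &A, const MemOp &B) {
    if (A.Base != B.Base)
      return A.Base < B.Base;
    if (A.Offset != B.Offset)
      return A.Offset < B.Offset;
    return A.Id < B.Id;
  });

  // Step 3: for each op, seek the first later op it has no dependency with
  // and ask the (hypothetical) target hook whether the two may cluster.
  std::vector<Pair> Clusters;
  for (std::size_t I = 0; I < Sorted.size(); ++I) {
    for (std::size_t J = I + 1; J < Sorted.size(); ++J) {
      if (DependsOn(Sorted[I], Sorted[J]) || DependsOn(Sorted[J], Sorted[I]))
        continue; // dependent: keep seeking
      if (CanCluster(Sorted[I], Sorted[J]))
        Clusters.emplace_back(Sorted[I].Id, Sorted[J].Id);
      break; // stop at the first non-dependent candidate
    }
  }
  return Clusters;
}
```

Because the candidates are sorted first, the inner loop can stop at the first non-dependent candidate rather than comparing every pair.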

Reviewed By: Jay Foad

Differential Revision:
The file was modified llvm/test/CodeGen/AMDGPU/stack-realign.ll
The file was modified llvm/test/CodeGen/AMDGPU/callee-special-input-vgprs.ll
The file was modified llvm/test/CodeGen/AArch64/aarch64-stp-cluster.ll
The file was modified llvm/lib/CodeGen/MachineScheduler.cpp
The file was modified llvm/include/llvm/CodeGen/ScheduleDAGInstrs.h
The file was modified llvm/test/CodeGen/AMDGPU/max.i16.ll
Commit 8daa3264a3329ad34a0b210afdd8699f27d66db2 by Xing
[DWARFYAML] Make the unit_length and header_length fields optional.

This patch makes the unit_length and header_length fields of line tables
optional. yaml2obj is able to infer them for us.
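
One way such inference could work is sketched below. This is a guess at the general idea, not the DWARFYAML code: `LineTableSketch` and both helper functions are hypothetical, and the arithmetic assumes a 32-bit DWARF line table of version 2 to 4, where unit_length counts the 2-byte version, the 4-byte header_length field, the header, and the line number program.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical, heavily simplified line table; not DWARFYAML::LineTable.
struct LineTableSketch {
  std::optional<uint64_t> UnitLength;   // absent means "infer"
  std::optional<uint64_t> HeaderLength; // absent means "infer"
  uint64_t HeaderBytes;  // bytes between header_length and the program
  uint64_t ProgramBytes; // size of the line number program
};

// Use the explicit header_length if given, else the measured header size.
uint64_t effectiveHeaderLength(const LineTableSketch &T) {
  return T.HeaderLength.value_or(T.HeaderBytes);
}

// Use the explicit unit_length if given, else sum everything that follows
// the unit_length field (assumption: DWARF32, version 2 to 4).
uint64_t effectiveUnitLength(const LineTableSketch &T) {
  const uint64_t VersionSize = 2;      // uhalf version
  const uint64_t HeaderLengthSize = 4; // 32-bit header_length field
  return T.UnitLength.value_or(VersionSize + HeaderLengthSize +
                               effectiveHeaderLength(T) + T.ProgramBytes);
}
```

An explicit value still wins in this sketch, which keeps it possible to write deliberately inconsistent lengths in a test input.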

Reviewed By: jhenderson

Differential Revision:
The file was modified llvm/unittests/DebugInfo/DWARF/DWARFDebugInfoTest.cpp
The file was modified llvm/include/llvm/ObjectYAML/DWARFYAML.h
The file was modified llvm/tools/obj2yaml/dwarf2yaml.cpp
The file was modified llvm/lib/ObjectYAML/DWARFEmitter.cpp
The file was modified llvm/test/tools/yaml2obj/ELF/DWARF/debug-line.yaml
The file was modified llvm/lib/ObjectYAML/DWARFYAML.cpp