Getting from linalg to spirv via gpu

Hello,

we are currently trying to get a feeling for how MLIR works and what functionality is already in place.
But we are having issues to translate a linalg.matmul all the way down to spirv with passes already integrated in MLIR.

This is the code we are trying to lower to spirv:

module {
func @matmul_linalg(%A: memref<2048x2048xf64>, %B: memref<2048x2048xf64>, %C: memref<2048x2048xf64>) {
    linalg.matmul ins(%A, %B : memref<2048x2048xf64>, memref<2048x2048xf64>)
        outs(%C: memref<2048x2048xf64>)
    return
}

func @main() {
    %A = memref.alloc() : memref<2048x2048xf64>
    %B = memref.alloc() : memref<2048x2048xf64>
    %C = memref.alloc() : memref<2048x2048xf64>
    
    %cf1 = constant 1.00000e+00 : f64
    
    linalg.fill(%A, %cf1) : memref<2048x2048xf64>, f64
    linalg.fill(%B, %cf1) : memref<2048x2048xf64>, f64
    linalg.fill(%C, %cf1) : memref<2048x2048xf64>, f64
    
    call @matmul_linalg(%A, %B, %C) : (memref<2048x2048xf64>, memref<2048x2048xf64>, memref<2048x2048xf64>) -> ()
    return
}
}

With these passes:

mlir-opt 
    -linalg-tile-to-parallel-loops="linalg-tile-sizes=8,4"
    -convert-linalg-to-parallel-loops
    -test-gpu-greedy-parallel-loop-mapping
    -convert-parallel-loops-to-gpu
    -gpu-kernel-outlining 
    -lower-affine -canonicalize
    -fold-memref-subview-ops
    -convert-std-to-spirv
    -test-spirv-entry-point-abi="workgroup-size=4,8"
    -convert-gpu-to-spirv
  test_code.mlir

We get this error:

test_code.mlir:12:10: error: unhandled allocation type
    %A = memref.alloc() : memref<2048x2048xf64>
         ^
test_code.mlir:12:10: note: see current operation: %4 = "memref.alloc"() {operand_segment_sizes = dense<0> : vector<2xi32>} : () -> memref<2048x2048xf64>
test_code.mlir:13:10: error: unhandled allocation type
    %B = memref.alloc() : memref<2048x2048xf64>
         ^
test_code.mlir:13:10: note: see current operation: %5 = "memref.alloc"() {operand_segment_sizes = dense<0> : vector<2xi32>} : () -> memref<2048x2048xf64>
loops.mlir:14:10: error: unhandled allocation type
    %C = memref.alloc() : memref<2048x2048xf64>
         ^
test_code.mlir:14:10: note: see current operation: %6 = "memref.alloc"() {operand_segment_sizes = dense<0> : vector<2xi32>} : () -> memref<2048x2048xf64>
test_code.mlir:6:5: error: failed to materialize conversion for result #0 of operation 'std.constant' that remained live after conversion
    linalg.matmul ins(%A, %B : memref<2048x2048xf64>, memref<2048x2048xf64>)
    ^
test_code.mlir:6:5: note: see current operation: %c1_0 = "std.constant"() {value = 1 : index} : () -> index
test_code.mlir:6:5: note: see existing live user here: gpu.launch_func  @matmul_linalg_kernel::@matmul_linalg_kernel blocks in (%5, %7, %c1_0) threads in (%c1_0, %c1_0, %c1_0) args(<<UNKNOWN SSA VALUE>> : memref<2048x2048xf64>, <<UNKNOWN SSA VALUE>> : memref<2048x2048xf64>, <<UNKNOWN SSA VALUE>> : memref<2048x2048xf64>)

We then tried to simplify our test-code to the one from [this]{Use MLIR/IREE for GPU CodeGen - #11 by MaheshRavishankar}(#11) posting (essentially deleting the main-function and adding a spv.target_env):

module attributes {
  spv.target_env =
    #spv.target_env<#spv.vce<v1.3, [Shader], [SPV_KHR_storage_buffer_storage_class]>,
    {max_compute_workgroup_invocations = 128 : i32,
     max_compute_workgroup_size = dense<[128, 128, 64]> : vector<3xi32>}>} {
  func @matmul(%A: memref<2048x2048xf64>, %B: memref<2048x2048xf64>, %C: memref<2048x2048xf64>) {
    linalg.matmul ins(%A, %B : memref<2048x2048xf64>, memref<2048x2048xf64>) outs(%C: memref<2048x2048xf64>)
    return
  }
}

The --legalize-std-for-spirv mentioned in the posting was replaced about [2 weeks ago]{[mlir] Move memref.subview patterns to MemRef/Transforms/ · llvm/llvm-project@0deeaac · GitHub}, so we tried to replace it with multiple combinations of --fold-memref-subview-ops, or --convert-scf-to-std and --convert-std-to-spirv.

Replacing --legalize-std-for-spirv with --fold-memref-subview-ops --convert-std-to-spirv yields this error:

spirv_code.mlir:7:5: error: failed to materialize conversion for result #0 of operation 'std.constant' that remained live after conversion
    linalg.matmul ins(%A, %B : memref<2048x2048xf64>, memref<2048x2048xf64>) outs(%C: memref<2048x2048xf64>)
    ^
spirv_code.mlir:7:5: note: see current operation: %c1 = "std.constant"() {value = 1 : index} : () -> index
spirv_code.mlir:7:5: note: see existing live user here: gpu.launch_func  @matmul_kernel::@matmul_kernel blocks in (%c256, %c512, %c1) threads in (%c8, %c4, %c1) args(<<UNKNOWN SSA VALUE>> : memref<2048x2048xf64>, <<UNKNOWN SSA VALUE>> : memref<2048x2048xf64>, <<UNKNOWN SSA VALUE>> : memref<2048x2048xf64>)

So it seems we are missing a puzzle-piece to get from linalg to gpu to spirv. Since we are absolute beginners there might be an easy fix :wink:

Thank you, dasungesagte.