Generate SPIRV binary from MLIR dialect kernels to run it on OCL runtime

Hello everyone,

I am recently working on a project to generate SPIRV binary from MLIR dialect kernels and make the SPIRV binary run on OCL runtime. OCL runtime provides a program builder for SPIRV binary called clCreateProgramWithIL. I am using the clCreateProgramWithIL() function to load the SPIRV binary and hoping to run it. But it failed. It seems that the clCreateProgramWithIL() function cannot recognize the format of SPIRV binary generated by mlir-opt lowering and mlir-vulkan-runner.

I also tried a different way to generate the SPIRV binary, not from MLIR dialects, but from .cl kernels. The generated SPIRV binary can successfully run on OCL runtime using the clCreateProgramWithIL() function.

When I compare the two SPIRV binary files, they look different. The MLIR one looks like this:
\03\02#\07\00\00\01\00\16\00\00\003\00\00\00\00\00\00\00\11\00\02\00\01\00\00\00\0A\00\0B\00SPV_KHR_storage_buffer_storage_class\00\00\00\00\0E\00\03\00\00\00\00\00\01\00\00\00\0F\00\07\00\05\00\00\00\16\00\00\00kernel_addi\00\04\00\00\00\10\00\06\00\16\00\00\00\11\00\00\00\01\00\00\00\01\00\00\00\01\00\00\00\05\00\09\00\04\00\00\00__builtin_var_WorkgroupId__\00\05\00\07\00\09\00\00\00kernel_addi_arg_0\00\00\00\05\00\07\00\0E\00\00\00kernel_addi_arg_1\00\00\00\05\00\07\00\13\00\00\00kernel_addi_arg_2\00\00\00\05\00\05\00\16\00\00\00kernel_addi\00G\00\04\00\04\00\00\00\0B\00\00\00\1A\00\00\00G\00\04\00\07\00\00\00\06\00\00\00\04\00\00\00H\00\05\00\06\00\00\00\00\00\00\00#\00\00\00\00\00\00\00G\00\03\00\06\00\00\00\02\00\00\00G\00\04\00\09\00\00\00!\00\00\00\00\00\00\00G\00\04\00\09\00\00\00\22\00\00\00\00\00\00\00G\00\04\00\0C\00\00\00\06\00\00\00\04\00\00\00H\00\05\00\0B\00\00\00\00\00\00\00#\00\00\00\00\00\00\00G\00\03\00\0B\00\00\00\02\00\00\00G\00\04\00\0E\00\00\00!\00\00\00\01\00\00\00G\00\04\00\0E\00\00\00\22\00\00\00\00\00\00\00G\00\04\00\11\00\00\00\06\00\00\00\04\00\00\00H\00\05\00\10\00\00\00\00\00\00\00#\00\00\00\00\00\00\00G\00\03\00\10\00\00\00\02\00\00\00G\00\04\00\13\00\00\00!\00\00\00\02\00\00\00G\00\04\00\13\00\00\00\22\00\00\00\00\00\00\00\15\00\04\00\03\00\00\00 \00\00\00\00\00\00\00\17\00\04\00\02\00\00\00\03\00\00\00\03\00\00\00 \00\04\00\01\00\00\00\01\00\00\00\02\00\00\00;\00\04\00\01\00\00\00\04\00\00\00\01\00\00\00+\00\04\00\03\00\00\00\08\00\00\00\08\00\00\00\1C\00\04\00\07\00\00\00\03\00\00\00\08\00\00\00\1E\00\03\00\06\00\00\00\07\00\00\00 \00\04\00\05\00\00\00\0C\00\00\00\06\00\00\00;\00\04\00\05\00\00\00\09\00\00\00\0C\00\00\00+\00\04\00\03\00\00\00\0D\00\00\00@\00\00\00\1C\00\04\00\0C\00\00\00\03\00\00\00\0D\00\00\00\1E\00\03\00\0B\00\00\00\0C\00\00\00 \00\04\00\0A\00\00\00\0C\00\00\00\0B\00\00\00;\00\04\00\0A\00\00\00\0E\00\00\00\0C\00\00\00+\00\04\00\03\00\00\00\12\00\00\00\00\02\00\00\1C\00\04\00\11\00\00\00\03\00\00\00\12\00\00\00\1E\00\03\00\10\00\00\00\11\00\00\00 \00\04\00\0F\00\00\00\0C\00\00\00\10\00\00\00;\00\04\00\0F\00\00\00\13\00\00\00\0C\00\00\00\13\00\02\00\15\00\00\00!\00\03\00\14\00\00\00\15\00\00\00+\00\04\00\03\00\00\00\1E\00\00\00\00\00\00\00+\00\04\00\03\00\00\00\1F\00\00\00\01\00\00\00 \00\04\00\22\00\00\00\0C\00\00\00\03\00\00\006\00\05\00\15\00\00\00\16\00\00\00\00\00\00\00\14\00\00\00\F8\00\02\00\17\00\00\00=\00\04\00\02\00\00\00\18\00\00\00\04\00\00\00Q\00\05\00\03\00\00\00\19\00\00\00\18\00\00\00\00\00\00\00=\00\04\00\02\00\00\00\1A\00\00\00\04\00\00\00Q\00\05\00\03\00\00\00\1B\00\00\00\1A\00\00\00\01\00\00\00=\00\04\00\02\00\00\00\1C\00\00\00\04\00\00\00Q\00\05\00\03\00\00\00\1D\00\00\00\1C\00\00\00\02\00\00\00\84\00\05\00\03\00\00\00 \00\00\00\1F\00\00\00\19\00\00\00\80\00\05\00\03\00\00\00!\00\00\00\1E\00\00\00 \00\00\00A\00\06\00\22\00\00\00#\00\00\00\09\00\00\00\1E\00\00\00!\00\00\00=\00\04\00\03\00\00\00$\00\00\00#\00\00\00\84\00\05\00\03\00\00\00%\00\00\00\08\00\00\00\1B\00\00\00\80\00\05\00\03\00\00\00&\00\00\00\1E\00\00\00%\00\00\00\84\00\05\00\03\00\00\00'\00\00\00\1F\00\00\00\19\00\00\00\80\00\05\00\03\00\00\00(\00\00\00&\00\00\00'\00\00\00A\00\06\00\22\00\00\00)\00\00\00\0E\00\00\00\1E\00\00\00(\00\00\00=\00\04\00\03\00\00\00*\00\00\00)\00\00\00\80\00\05\00\03\00\00\00+\00\00\00$\00\00\00*\00\00\00\84\00\05\00\03\00\00\00,\00\00\00\0D\00\00\00\1D\00\00\00\80\00\05\00\03\00\00\00-\00\00\00\1E\00\00\00,\00\00\00\84\00\05\00\03\00\00\00.\00\00\00\08\00\00\00\1B\00\00\00\80\00\05\00\03\00\00\00/\00\00\00-\00\00\00.\00\00\00\84\00\05\00\03\00\00\000\00\00\00\1F\00\00\00\19\00\00\00\80\00\05\00\03\00\00\001\00\00\00/\00\00\000\00\00\00A\00\06\00\22\00\00\002\00\00\00\13\00\00\00\1E\00\00\001\00\00\00>\00\03\002\00\00\00+\00\00\00\FD\00\01\008\00\01\00

While the .cl one looks like this:
^C^B#^G^@^@^A^@^N^@^F^@4^@^@^@^@^@^@^@^Q^@^B^@^D^@^@^@^Q^@^B^@^E^@^@^@^Q^@^B^@^F^@^@^@^Q^@^B^@^K^@^@^@^K^@^E^@^A^@^@^@OpenCL.std^@^@^N^@^C^@^B^@^@^@^B^@^@^@^O^@^F^@^F^@^@^@^K^@^@^@gemmN^@^@^@^E^@^@^@^G^@^N^@3^@^@^@kernel_arg_type.gemmN.float*,float*,int,float*,^@^C^@^C^@^C^@^@^@p~N^A^@^E^@^K^@^E^@^@^@__spirv_BuiltInGlobalInvocationId^@^@^@^E^@^C^@^L^@^@^@A^@^@^@^E^@^C^@^M^@^@^@B^@^@^@^E^@^C^@^N^@^@^@N^@^@^@^E^@^C^@^O^@^@^@C^@^@^@^E^@^D^@^P^@^@^@entry^@^@^@^E^@^G^@^Q^@^@^@for.cond.cleanup^@^@^@^@^E^@^E^@^R^@^@^@for.body^@^@^@^@^E^@^D^@^T^@^@^@call^@^@^@^@^E^@^D^@^U^@^@^@conv^@^@^@^@^E^@^D^@^W^@^@^@call1^@^@^@^E^@^D^@^X^@^@^@conv2^@^@^@^E^@^D^@^[^@^@^@cmp26^@^@^@^E^@^C^@^^@^@^@mul^@^E^@^E^@^_^@^@^@acc.0.lcssa^@^E^@^D^@ ^@^@^@add10^@^@^@^E^@^E^@!^@^@^@idxprom11^@^@^@^E^@^E^@"^@^@^@arrayidx12^@^@^E^@^C^@#^@^@^@inc^@^E^@^D^@^@^@^@k.028^@^@^@^E^@^D^@%^@^@^@acc.027^@^E^@^C^@&^@^@^@add^@^E^@^D^@'^@^@^@idxprom^@^E^@^E^@(^@^@^@arrayidx^@^@^@^@^E^@^D^@*^@^@^@mul4^@^@^@^@^E^@^D^@+^@^@^@add5^@^@^@^@^E^@^E^@,^@^@^@idxprom6^@^@^@^@^E^@^E^@-^@^@^@arrayidx7^@^@^@^E^@^C^@2^@^@^@cmp^@G^@^D^@^E^@^@^@^K^@^@^@^\^@^@^@G^@^C^@^E^@^@^@^V^@^@^@G^@^D^@^L^@^@^@&^@^@^@^E^@^@^@G^@^D^@^M^@^@^@&^@^@^@^E^@^@^@G^@^D^@^O^@^@^@&^@^@^@^E^@^@^@G^@^D^@^L^@^@^@&^@^@^@^F^@^@^@G^@^D^@^M^@^@^@&^@^@^@^F^@^@^@G^@^M^@^E^@^@^@)^@^@^@__spirv_BuiltInGlobalInvocationId^@^@^@^A^@^@^@^U^@^D^@^B^@^@^@@^@^@^@^@^@^@^@^U^@^D^@ ^@^@^@ ^@^@^@^@^@^@^@+^@^D^@ ^@^@^@^Y^@^@^@^@^@^@^@+^@^D^@ ^@^@^@0^@^@^@^A^@^@^@^W^@^D^@^C^@^@^@^B^@^@^@^C^@^@^@ ^@^D^@^D^@^@^@^A^@^@^@^C^@^@^@^S^@^B^@^F^@^@^@^V^@^C^@^G^@^@^@ ^@^@^@ ^@^D^@^H^@^@^@^E^@^@^@^G^@^@^@!^@^G^@ ^@^@^@^F^@^@^@^H^@^@^@^H^@^@^@ ^@^@^@^H^@^@^@^T^@^B^@^Z^@^@^@;^@^D^@^D^@^@^@^E^@^@^@^A^@^@^@+^@^D^@^G^@^@^@^]^@^@^@^@^@^@^@6^@^E^@^F^@^@^@^K^@^@^@^@^@^@^@ ^@^@^@7^@^C^@^H^@^@^@^L^@^@^@7^@^C^@^H^@^@^@^M^@^@^@7^@^C^@ ^@^@^@^N^@^@^@7^@^C^@^H^@^@^@^O^@^@^@?^@^B^@^P^@^@^@=^@^F^@^C^@^@^@^S^@^@^@^E^@^@^@^B^@^@^@ ^@^@^@Q^@^E^@^B^@^@^@^T^@^@^@^S^@^@^@^@^@^@^@q^@^D^@ ^@^@^@^U^@^@^@^T^@^@^@=^@^F^@^C^@^@^@^V^@^@^@^E^@^@^@^B^@^@^@ ^@^@^@Q^@^E^@^B^@^@^@^W^@^@^@^V^@^@^@^A^@^@^@q^@^D^@ ^@^@^@^X^@^@^@^W^@^@^@?^@^E^@^Z^@^@^@^[^@^@^@^N^@^@^@^Y^@^@^@~D^@^E^@ ^@^@^@^\^@^@^@^U^@^@^@^N^@^@^@?^@^D^@^[^@^@^@^R^@^@^@^Q^@^@^@?^@^B^@^Q^@^@^@?^@^G^@^G^@^@^@^_^@^@^@^]^@^@^@^P^@^@^@^^^@^@^@^R^@^@^@~@^@^E^@ ^@^@^@ ^@^@^@^\^@^@^@^X^@^@^@r^@^D^@^B^@^@^@!^@^@^@ ^@^@^@F^@^E^@^H^@^@^@"^@^@^@^O^@^@^@!^@^@^@>^@^E^@"^@^@^@^_^@^@^@^B^@^@^@^D^@^@^@?^@^A^@?^@^B^@^R^@^@^@?^@^G^@ ^@^@^@^@^@^@#^@^@^@^R^@^@^@^Y^@^@^@^P^@^@^@?^@^G^@^G^@^@^@%^@^@^@^^^@^@^@^R^@^@^@^]^@^@^@^P^@^@^@~@^@^E^@ ^@^@^@&^@^@^@^@^@^@^\^@^@^@r^@^D^@^B^@^@^@'^@^@^@&^@^@^@F^@^E^@^H^@^@^@(^@^@^@^L^@^@^@'^@^@^@=^@^F^@^G^@^@^@)^@^@^@(^@^@^@^B^@^@^@^D^@^@^@~D^@^E^@ ^@^@^@*^@^@^@^@^@^@^N^@^@^@~@^@^E^@ ^@^@^@+^@^@^@^@^@^@^X^@^@^@r^@^D^@^B^@^@^@,^@^@^@+^@^@^@F^@^E^@^H^@^@^@-^@^@^@^M^@^@^@,^@^@^@=^@^F^@^G^@^@^@.^@^@^@-^@^@^@^B^@^@^@^D^@^@^@^L^@^H^@^G^@^@^@^^^@^@^@^A^@^@^@^@^@^@)^@^@^@.^@^@^@%^@^@^@~@^@^E^@ ^@^@^@#^@^@^@$^@^@^@0^@^@^@?^@^E^@^Z^@^@^@2^@^@^@#^@^@^@^N^@^@^@?^@^D^@2^@^@^@^R^@^@^@^Q^@^@^@8^@^A^@

I am looking for your help or any suggestions on this. The project needs me to generate the SPIRV binary from MLIR dialect kernels and make the generated SPIRV binary run on OCL runtime. I am not sure if the way I generate SPIRV binary from MLIR dialect is correct or not.

Thank you very much for your help.

Here is what I did to generate the two SPIRV binary files in the two cases.

(1) Generate the SPIRV binary from MLIR dialect

I used addi.mlir from MLIR vulkan examples

$ cat addi.mlir
// RUN: mlir-vulkan-runner %s --shared-libs=%vulkan_wrapper_library_dir/libvulkan-runtime-wrappers%shlibext,%linalg_test_lib_dir/libmlir_runner_utils%shlibext --entry-point-result=void | FileCheck %s

// CHECK-COUNT-64: [3, 3, 3, 3, 3, 3, 3, 3]
module attributes {
  gpu.container_module,
  spv.target_env = #spv.target_env<
	#spv.vce<v1.0, [Shader], [SPV_KHR_storage_buffer_storage_class]>, {}>
} {
  gpu.module @kernels {
	gpu.func @kernel_addi(%arg0 : memref<8xi32>, %arg1 : memref<8x8xi32>, %arg2 : memref<8x8x8xi32>)
  	kernel attributes { spv.entry_point_abi = {local_size = dense<[1, 1, 1]>: vector<3xi32>}} {
  	%x = "gpu.block_id"() {dimension = "x"} : () -> index
  	%y = "gpu.block_id"() {dimension = "y"} : () -> index
  	%z = "gpu.block_id"() {dimension = "z"} : () -> index
  	%0 = load %arg0[%x] : memref<8xi32>
  	%1 = load %arg1[%y, %x] : memref<8x8xi32>
  	%2 = addi %0, %1 : i32
  	store %2, %arg2[%z, %y, %x] : memref<8x8x8xi32>
  	gpu.return
	}
  }
 
  func @main() {
	%arg0 = alloc() : memref<8xi32>
	%arg1 = alloc() : memref<8x8xi32>
	%arg2 = alloc() : memref<8x8x8xi32>
	%value0 = constant 0 : i32
	%value1 = constant 1 : i32
	%value2 = constant 2 : i32
	%arg3 = memref_cast %arg0 : memref<8xi32> to memref<?xi32>
	%arg4 = memref_cast %arg1 : memref<8x8xi32> to memref<?x?xi32>
	%arg5 = memref_cast %arg2 : memref<8x8x8xi32> to memref<?x?x?xi32>
	call @fillResource1DInt(%arg3, %value1) : (memref<?xi32>, i32) -> ()
	call @fillResource2DInt(%arg4, %value2) : (memref<?x?xi32>, i32) -> ()
	call @fillResource3DInt(%arg5, %value0) : (memref<?x?x?xi32>, i32) -> ()
 
	%cst1 = constant 1 : index
	%cst8 = constant 8 : index
	gpu.launch_func @kernels::@kernel_addi
    	blocks in (%cst8, %cst8, %cst8) threads in (%cst1, %cst1, %cst1)
    	args(%arg0 : memref<8xi32>, %arg1 : memref<8x8xi32>, %arg2 : memref<8x8x8xi32>)
	%arg6 = memref_cast %arg5 : memref<?x?x?xi32> to memref<*xi32>
	call @print_memref_i32(%arg6) : (memref<*xi32>) -> ()
	return
  }
  func @fillResource1DInt(%0 : memref<?xi32>, %1 : i32)
  func @fillResource2DInt(%0 : memref<?x?xi32>, %1 : i32)
  func @fillResource3DInt(%0 : memref<?x?x?xi32>, %1 : i32)
  func @print_memref_i32(%ptr : memref<*xi32>)
}

$ mlir-opt -convert-linalg-to-parallel-loops -test-gpu-greedy-parallel-loop-mapping -convert-parallel-loops-to-gpu -gpu-kernel-outlining -canonicalize -legalize-std-for-spirv -test-spirv-entry-point-abi addi.mlir &> addi_before_spirv.mlir

$ mlir-vulkan-runner addi_before_spirv.mlir --shared-libs=…/…/lib/libvulkan-runtime-wrappers.dylib,…/…/lib/libmlir_runner_utils.dylib --entry-point-result=void -print-ir-after-all

...
 
// *** IR Dump After ConvertVulkanLaunchFuncToVulkanCalls ***
 
 
module attributes {gpu.container_module, llvm.data_layout = "", spv.target_env = #spv.target_env<#spv.vce<v1.0, [Shader], [SPV_KHR_storage_buffer_storage_class]>, {}>} {
  llvm.mlir.global internal constant @kernel_addi_spv_entry_point_name("kernel_addi\00")
  llvm.mlir.global internal constant @SPIRV_BIN("\03\02#\07\00\00\01\00\16\00\00\003\00\00\00\00\00\00\00\11\00\02\00\01\00\00\00\0A\00\0B\00SPV_KHR_storage_buffer_storage_class\00\00\00\00\0E\00\03\00\00\00\00\00\01\00\00\00\0F\00\07\00\05\00\00\00\16\00\00\00kernel_addi\00\04\00\00\00\10\00\06\00\16\00\00\00\11\00\00\00\01\00\00\00\01\00\00\00\01\00\00\00\05\00\09\00\04\00\00\00__builtin_var_WorkgroupId__\00\05\00\07\00\09\00\00\00kernel_addi_arg_0\00\00\00\05\00\07\00\0E\00\00\00kernel_addi_arg_1\00\00\00\05\00\07\00\13\00\00\00kernel_addi_arg_2\00\00\00\05\00\05\00\16\00\00\00kernel_addi\00G\00\04\00\04\00\00\00\0B\00\00\00\1A\00\00\00G\00\04\00\07\00\00\00\06\00\00\00\04\00\00\00H\00\05\00\06\00\00\00\00\00\00\00#\00\00\00\00\00\00\00G\00\03\00\06\00\00\00\02\00\00\00G\00\04\00\09\00\00\00!\00\00\00\00\00\00\00G\00\04\00\09\00\00\00\22\00\00\00\00\00\00\00G\00\04\00\0C\00\00\00\06\00\00\00\04\00\00\00H\00\05\00\0B\00\00\00\00\00\00\00#\00\00\00\00\00\00\00G\00\03\00\0B\00\00\00\02\00\00\00G\00\04\00\0E\00\00\00!\00\00\00\01\00\00\00G\00\04\00\0E\00\00\00\22\00\00\00\00\00\00\00G\00\04\00\11\00\00\00\06\00\00\00\04\00\00\00H\00\05\00\10\00\00\00\00\00\00\00#\00\00\00\00\00\00\00G\00\03\00\10\00\00\00\02\00\00\00G\00\04\00\13\00\00\00!\00\00\00\02\00\00\00G\00\04\00\13\00\00\00\22\00\00\00\00\00\00\00\15\00\04\00\03\00\00\00 \00\00\00\00\00\00\00\17\00\04\00\02\00\00\00\03\00\00\00\03\00\00\00 \00\04\00\01\00\00\00\01\00\00\00\02\00\00\00;\00\04\00\01\00\00\00\04\00\00\00\01\00\00\00+\00\04\00\03\00\00\00\08\00\00\00\08\00\00\00\1C\00\04\00\07\00\00\00\03\00\00\00\08\00\00\00\1E\00\03\00\06\00\00\00\07\00\00\00 \00\04\00\05\00\00\00\0C\00\00\00\06\00\00\00;\00\04\00\05\00\00\00\09\00\00\00\0C\00\00\00+\00\04\00\03\00\00\00\0D\00\00\00@\00\00\00\1C\00\04\00\0C\00\00\00\03\00\00\00\0D\00\00\00\1E\00\03\00\0B\00\00\00\0C\00\00\00 \00\04\00\0A\00\00\00\0C\00\00\00\0B\00\00\00;\00\04\00\0A\00\00\00\0E\00\00\00\0C\00\00\00+\00\04\00\03\00\00\00\12\00\00\00\00\02\00\00\1C\00\04\00\11\00\00\00\03\00\00\00\12\00\00\00\1E\00\03\00\10\00\00\00\11\00\00\00 \00\04\00\0F\00\00\00\0C\00\00\00\10\00\00\00;\00\04\00\0F\00\00\00\13\00\00\00\0C\00\00\00\13\00\02\00\15\00\00\00!\00\03\00\14\00\00\00\15\00\00\00+\00\04\00\03\00\00\00\1E\00\00\00\00\00\00\00+\00\04\00\03\00\00\00\1F\00\00\00\01\00\00\00 \00\04\00\22\00\00\00\0C\00\00\00\03\00\00\006\00\05\00\15\00\00\00\16\00\00\00\00\00\00\00\14\00\00\00\F8\00\02\00\17\00\00\00=\00\04\00\02\00\00\00\18\00\00\00\04\00\00\00Q\00\05\00\03\00\00\00\19\00\00\00\18\00\00\00\00\00\00\00=\00\04\00\02\00\00\00\1A\00\00\00\04\00\00\00Q\00\05\00\03\00\00\00\1B\00\00\00\1A\00\00\00\01\00\00\00=\00\04\00\02\00\00\00\1C\00\00\00\04\00\00\00Q\00\05\00\03\00\00\00\1D\00\00\00\1C\00\00\00\02\00\00\00\84\00\05\00\03\00\00\00 \00\00\00\1F\00\00\00\19\00\00\00\80\00\05\00\03\00\00\00!\00\00\00\1E\00\00\00 \00\00\00A\00\06\00\22\00\00\00#\00\00\00\09\00\00\00\1E\00\00\00!\00\00\00=\00\04\00\03\00\00\00$\00\00\00#\00\00\00\84\00\05\00\03\00\00\00%\00\00\00\08\00\00\00\1B\00\00\00\80\00\05\00\03\00\00\00&\00\00\00\1E\00\00\00%\00\00\00\84\00\05\00\03\00\00\00'\00\00\00\1F\00\00\00\19\00\00\00\80\00\05\00\03\00\00\00(\00\00\00&\00\00\00'\00\00\00A\00\06\00\22\00\00\00)\00\00\00\0E\00\00\00\1E\00\00\00(\00\00\00=\00\04\00\03\00\00\00*\00\00\00)\00\00\00\80\00\05\00\03\00\00\00+\00\00\00$\00\00\00*\00\00\00\84\00\05\00\03\00\00\00,\00\00\00\0D\00\00\00\1D\00\00\00\80\00\05\00\03\00\00\00-\00\00\00\1E\00\00\00,\00\00\00\84\00\05\00\03\00\00\00.\00\00\00\08\00\00\00\1B\00\00\00\80\00\05\00\03\00\00\00/\00\00\00-\00\00\00.\00\00\00\84\00\05\00\03\00\00\000\00\00\00\1F\00\00\00\19\00\00\00\80\00\05\00\03\00\00\001\00\00\00/\00\00\000\00\00\00A\00\06\00\22\00\00\002\00\00\00\13\00\00\00\1E\00\00\001\00\00\00>\00\03\002\00\00\00+\00\00\00\FD\00\01\008\00\01\00")
  llvm.func @malloc(!llvm.i64) -> !llvm.ptr<i8>
  llvm.func @main() {
	%0 = llvm.mlir.constant(0 : i32) : !llvm.i32
	%1 = llvm.mlir.constant(1 : i32) : !llvm.i32
	%2 = llvm.mlir.constant(2 : i32) : !llvm.i32
	%3 = llvm.mlir.constant(1 : index) : !llvm.i64
	%4 = llvm.mlir.constant(8 : index) : !llvm.i64
	%5 = llvm.mlir.constant(8 : index) : !llvm.i64
	%6 = llvm.mlir.null : !llvm.ptr<i32>
	%7 = llvm.mlir.constant(1 : index) : !llvm.i64
	%8 = llvm.getelementptr %6[%7] : (!llvm.ptr<i32>, !llvm.i64) -> !llvm.ptr<i32>
	%9 = llvm.ptrtoint %8 : !llvm.ptr<i32> to !llvm.i64
	%10 = llvm.mul %5, %9 : !llvm.i64
	%11 = llvm.call @malloc(%10) : (!llvm.i64) -> !llvm.ptr<i8>
...

I took the binary code between the double quotations in @SPIRV_BIN(“”) out and used it as the SPIRV binary.

(2) Generate SPIRV binary from .cl kernel

I used the gemmN.cl kernel from shoc opencl benchmark suite.

I followed the example use-case provided by Ben Ashbaugh at https://github.com/bashbaug/SimpleOpenCLSamples/tree/master/samples/05_spirvkernelfromfile to generate the SPIRV binary, and run it on OCL runtime.

cat gemmN.cl

#ifdef DOUBLE_PRECISION
#define FPTYPE double
#else
#define FPTYPE float
#endif
 
__kernel void gemmN(const __global FPTYPE* A,
                	const __global FPTYPE* B, int N,
                	__global FPTYPE* C)
{
   
	// Thread identifiers
	const int globalRow = get_global_id(0); // Row ID of C (0..N)
	const int globalCol = get_global_id(1); // Col ID of C (0..N)
 
	// Compute a single element (loop over K)
	FPTYPE acc = 0.0f;
	for (int k=0; k<N; k++) {
       	acc += A[globalRow*N + k] * B[k*N + globalCol];
	}
 
	// Store the result
	C[globalRow*N + globalCol] = acc;

clang -c -cl-std=CL1.2 -target spir64 -emit-llvm -Xclang -finclude-default-header -O3 gemmN.cl -o gemmN64.ll

llvm-spirv gemmN64.ll -o gemmN64.spv

vim gemmN64.spv

^C^B#^G^@^@^A^@^N^@^F^@4^@^@^@^@^@^@^@^Q^@^B^@^D^@^@^@^Q^@^B^@^E^@^@^@^Q^@^B^@^F^@^@^@^Q^@^B^@^K^@^@^@^K^@^E^@^A^@^@^@OpenCL.std^@^@^N^@^C^@^B^@^@^@^B^@^@^@^O^@^F^@^F^@^@^@^K^@^@^@gemmN^@^@^@^E^@^@^@^G^@^N^@3^@^@^@kernel_arg_type.gemmN.float*,float*,int,float*,^@^C^@^C^@^C^@^@^@p~N^A^@^E^@^K^@^E^@^@^@__spirv_BuiltInGlobalInvocationId^@^@^@^E^@^C^@^L^@^@^@A^@^@^@^E^@^C^@^M^@^@^@B^@^@^@^E^@^C^@^N^@^@^@N^@^@^@^E^@^C^@^O^@^@^@C^@^@^@^E^@^D^@^P^@^@^@entry^@^@^@^E^@^G^@^Q^@^@^@for.cond.cleanup^@^@^@^@^E^@^E^@^R^@^@^@for.body^@^@^@^@^E^@^D^@^T^@^@^@call^@^@^@^@^E^@^D^@^U^@^@^@conv^@^@^@^@^E^@^D^@^W^@^@^@call1^@^@^@^E^@^D^@^X^@^@^@conv2^@^@^@^E^@^D^@^[^@^@^@cmp26^@^@^@^E^@^C^@^\^@^@^@mul^@^E^@^E^@^_^@^@^@acc.0.lcssa^@^E^@^D^@ ^@^@^@add10^@^@^@^E^@^E^@!^@^@^@idxprom11^@^@^@^E^@^E^@"^@^@^@arrayidx12^@^@^E^@^C^@#^@^@^@inc^@^E^@^D^@$^@^@^@k.028^@^@^@^E^@^D^@%^@^@^@acc.027^@^E^@^C^@&^@^@^@add^@^E^@^D^@'^@^@^@idxprom^@^E^@^E^@(^@^@^@arrayidx^@^@^@^@^E^@^D^@*^@^@^@mul4^@^@^@^@^E^@^D^@+^@^@^@add5^@^@^@^@^E^@^E^@,^@^@^@idxprom6^@^@^@^@^E^@^E^@-^@^@^@arrayidx7^@^@^@^E^@^C^@2^@^@^@cmp^@G^@^D^@^E^@^@^@^K^@^@^@^\^@^@^@G^@^C^@^E^@^@^@^V^@^@^@G^@^D^@^L^@^@^@&^@^@^@^E^@^@^@G^@^D^@^M^@^@^@&^@^@^@^E^@^@^@G^@^D^@^O^@^@^@&^@^@^@^E^@^@^@G^@^D^@^L^@^@^@&^@^@^@^F^@^@^@G^@^D^@^M^@^@^@&^@^@^@^F^@^@^@G^@^M^@^E^@^@^@)^@^@^@__spirv_BuiltInGlobalInvocationId^@^@^@^A^@^@^@^U^@^D^@^B^@^@^@@^@^@^@^@^@^@^@^U^@^D^@	^@^@^@ ^@^@^@^@^@^@^@+^@^D^@	^@^@^@^Y^@^@^@^@^@^@^@+^@^D^@   ^@^@^@0^@^@^@^A^@^@^@^W^@^D^@^C^@^@^@^B^@^@^@^C^@^@^@ ^@^D^@^D^@^@^@^A^@^@^@^C^@^@^@^S^@^B^@^F^@^@^@^V^@^C^@^G^@^@^@ ^@^@^@ ^@^D^@^H^@^@^@^E^@^@^@^G^@^@^@!^@^G^@
^@^@^@^F^@^@^@^H^@^@^@^H^@^@^@  ^@^@^@^H^@^@^@^T^@^B^@^Z^@^@^@;^@^D^@^D^@^@^@^E^@^@^@^A^@^@^@+^@^D^@^G^@^@^@^]^@^@^@^@^@^@^@6^@^E^@^F^@^@^@^K^@^@^@^@^@^@^@
^@^@^@7^@^C^@^H^@^@^@^L^@^@^@7^@^C^@^H^@^@^@^M^@^@^@7^@^C^@ 	^@^@^@^N^@^@^@7^@^C^@^H^@^@^@^O^@^@^@?^@^B^@^P^@^@^@=^@^F^@^C^@^@^@^S^@^@^@^E^@^@^@^B^@^@^@ ^@^@^@Q^@^E^@^B^@^@^@^T^@^@^@^S^@^@^@^@^@^@^@q^@^D^@    	^@^@^@^U^@^@^@^T^@^@^@=^@^F^@^C^@^@^@^V^@^@^@^E^@^@^@^B^@^@^@ ^@^@^@Q^@^E^@^B^@^@^@^W^@^@^@^V^@^@^@^A^@^@^@q^@^D^@  	^@^@^@^X^@^@^@^W^@^@^@?^@^E^@^Z^@^@^@^[^@^@^@^N^@^@^@^Y^@^@^@~D^@^E^@   ^@^@^@^\^@^@^@^U^@^@^@^N^@^@^@?^@^D^@^[^@^@^@^R^@^@^@^Q^@^@^@?^@^B^@^Q^@^@^@?^@^G^@^G^@^@^@^_^@^@^@^]^@^@^@^P^@^@^@^^^@^@^@^R^@^@^@~@^@^E^@ 	^@^@^@ ^@^@^@^\^@^@^@^X^@^@^@r^@^D^@^B^@^@^@!^@^@^@ ^@^@^@F^@^E^@^H^@^@^@"^@^@^@^O^@^@^@!^@^@^@>^@^E^@"^@^@^@^_^@^@^@^B^@^@^@^D^@^@^@?^@^A^@?^@^B^@^R^@^@^@?^@^G^@  	^@^@^@$^@^@^@#^@^@^@^R^@^@^@^Y^@^@^@^P^@^@^@?^@^G^@^G^@^@^@%^@^@^@^^^@^@^@^R^@^@^@^]^@^@^@^P^@^@^@~@^@^E^@  	^@^@^@&^@^@^@$^@^@^@^\^@^@^@r^@^D^@^B^@^@^@'^@^@^@&^@^@^@F^@^E^@^H^@^@^@(^@^@^@^L^@^@^@'^@^@^@=^@^F^@^G^@^@^@)^@^@^@(^@^@^@^B^@^@^@^D^@^@^@~D^@^E^@ 	^@^@^@*^@^@^@$^@^@^@^N^@^@^@~@^@^E^@	^@^@^@+^@^@^@*^@^@^@^X^@^@^@r^@^D^@^B^@^@^@,^@^@^@+^@^@^@F^@^E^@^H^@^@^@-^@^@^@^M^@^@^@,^@^@^@=^@^F^@^G^@^@^@.^@^@^@-^@^@^@^B^@^@^@^D^@^@^@^L^@^H^@^G^@^@^@^^^@^@^@^A^@^@^@*^@^@^@)^@^@^@.^@^@^@%^@^@^@~@^@^E^@ ^@^@^@#^@^@^@$^@^@^@0^@^@^@?^@^E^@^Z^@^@^@2^@^@^@#^@^@^@^N^@^@^@?^@^D^@2^@^@^@^R^@^@^@^Q^@^@^@8^@^A^@

spirv-dis gemmN64.spv

; SPIR-V
; Version: 1.0
; Generator: Khronos LLVM/SPIR-V Translator; 14
; Bound: 52
; Schema: 0
           	OpCapability Addresses
           	OpCapability Linkage
           	OpCapability Kernel
           	OpCapability Int64
      	%1 = OpExtInstImport "OpenCL.std"
           	OpMemoryModel Physical64 OpenCL
           	OpEntryPoint Kernel %11 "gemmN" %__spirv_BuiltInGlobalInvocationId
     	%51 = OpString "kernel_arg_type.gemmN.float*,float*,int,float*,"
           	OpSource OpenCL_C 102000
           	OpName %__spirv_BuiltInGlobalInvocationId "__spirv_BuiltInGlobalInvocationId"
           	OpName %A "A"
           	OpName %B "B"
           	OpName %N "N"
           	OpName %C "C"
           	OpName %entry "entry"
           	OpName %for_cond_cleanup "for.cond.cleanup"
           	OpName %for_body "for.body"
           	OpName %call "call"
           	OpName %conv "conv"
 	          OpName %call1 "call1"
           	OpName %conv2 "conv2"
           	OpName %cmp26 "cmp26"
           	OpName %mul "mul"
           	OpName %acc_0_lcssa "acc.0.lcssa"
           	OpName %add10 "add10"
           	OpName %idxprom11 "idxprom11"
           	OpName %arrayidx12 "arrayidx12"
           	OpName %inc "inc"
           	OpName %k_028 "k.028"
           	OpName %acc_027 "acc.027"
           	OpName %add "add"
           	OpName %idxprom "idxprom"
           	OpName %arrayidx "arrayidx"
           	OpName %mul4 "mul4"
           	OpName %add5 "add5"
           	OpName %idxprom6 "idxprom6"
           	OpName %arrayidx7 "arrayidx7"
           	OpName %cmp "cmp"
           	OpDecorate %__spirv_BuiltInGlobalInvocationId BuiltIn GlobalInvocationId
           	OpDecorate %__spirv_BuiltInGlobalInvocationId Constant
           	OpDecorate %A FuncParamAttr NoCapture
           	OpDecorate %B FuncParamAttr NoCapture
           	OpDecorate %C FuncParamAttr NoCapture
           	OpDecorate %A FuncParamAttr NoWrite
           	OpDecorate %B FuncParamAttr NoWrite
           	OpDecorate %__spirv_BuiltInGlobalInvocationId LinkageAttributes "__spirv_BuiltInGlobalInvocationId" Import
  	%ulong = OpTypeInt 64 0
   	%uint = OpTypeInt 32 0
 	%uint_0 = OpConstant %uint 0
 	%uint_1 = OpConstant %uint 1
	%v3ulong = OpTypeVector %ulong 3
%_ptr_Input_v3ulong = OpTypePointer Input %v3ulong
   	%void = OpTypeVoid
  	%float = OpTypeFloat 32
%_ptr_CrossWorkgroup_float = OpTypePointer CrossWorkgroup %float
     	%10 = OpTypeFunction %void %_ptr_CrossWorkgroup_float %_ptr_CrossWorkgroup_float %uint %_ptr_CrossWorkgroup_float
   	%bool = OpTypeBool
%__spirv_BuiltInGlobalInvocationId = OpVariable %_ptr_Input_v3ulong Input
	%float_0 = OpConstant %float 0
     	%11 = OpFunction %void None %10
      	%A = OpFunctionParameter %_ptr_CrossWorkgroup_float
      	%B = OpFunctionParameter %_ptr_CrossWorkgroup_float
      	%N = OpFunctionParameter %uint
      	%C = OpFunctionParameter %_ptr_CrossWorkgroup_float
  	%entry = OpLabel
     	%19 = OpLoad %v3ulong %__spirv_BuiltInGlobalInvocationId Aligned 32
   	%call = OpCompositeExtract %ulong %19 0
   	%conv = OpUConvert %uint %call
     	%22 = OpLoad %v3ulong %__spirv_BuiltInGlobalInvocationId Aligned 32
  	%call1 = OpCompositeExtract %ulong %22 1
  	%conv2 = OpUConvert %uint %call1
  	%cmp26 = OpSGreaterThan %bool %N %uint_0
    	%mul = OpIMul %uint %conv %N
           	OpBranchConditional %cmp26 %for_body %for_cond_cleanup
%for_cond_cleanup = OpLabel
%acc_0_lcssa = OpPhi %float %float_0 %entry %30 %for_body
  	%add10 = OpIAdd %uint %mul %conv2
  %idxprom11 = OpSConvert %ulong %add10
 %arrayidx12 = OpInBoundsPtrAccessChain %_ptr_CrossWorkgroup_float %C %idxprom11
           	OpStore %arrayidx12 %acc_0_lcssa Aligned 4
           	OpReturn
   %for_body = OpLabel
  	%k_028 = OpPhi %uint %inc %for_body %uint_0 %entry
	%acc_027 = OpPhi %float %30 %for_body %float_0 %entry
    	%add = OpIAdd %uint %k_028 %mul
	%idxprom = OpSConvert %ulong %add
   %arrayidx = OpInBoundsPtrAccessChain %_ptr_CrossWorkgroup_float %A %idxprom
     	%41 = OpLoad %float %arrayidx Aligned 4
   	%mul4 = OpIMul %uint %k_028 %N
   	%add5 = OpIAdd %uint %mul4 %conv2
   %idxprom6 = OpSConvert %ulong %add5
  %arrayidx7 = OpInBoundsPtrAccessChain %_ptr_CrossWorkgroup_float %B %idxprom6
     	%46 = OpLoad %float %arrayidx7 Aligned 4
     	%30 = OpExtInst %float %1 mad %41 %46 %acc_027
    	%inc = OpIAdd %uint %k_028 %uint_1
    	%cmp = OpSLessThan %bool %inc %N
           	OpBranchConditional %cmp %for_body %for_cond_cleanup
           	OpFunctionEnd

-Lenny

Hi @lenny_guo, sorry for the late reply. Somehow I didn’t receive an email for this thread…

Keep in mind that SPIR-V is just a common binary language used by multiple Khronos APIs. Each API has further constraints on the subset of SPIR-V it accepts. This is typically expressed in the API’s SPIR-V environments, e.g., Vulkan, OpenCL. So you’ll need to make sure the generated SPIR-V blob follows the constraints listed there.

Here you are using a SPIR-V blob that is meant for Vulkan (with the Shader capability). So it won’t be accepted by OpenCL (supports the Kernel capability). That’s why you are seeing the failure I believe.

At the moment the SPIR-V dialect and various conversions to it support Vulkan the best. To enable support OpenCL, there shouldn’t be a huge amount of additional work because lots of the stuff can just be shared. (You can read my previous comment here.) There are quite a few contributions landed previously to push on that. So we have tests like this, this, and this for OpenCL. But right now we don’t have an in-tree mlir-opencl-runner like mlir-vulkan-runner to demonstrate the whole follow to you. It would be quite awesome if that can be built up for sure. For the specific task you are facing and just to turn the light on, you might want to play with this IR snippet by plugging in the stuff you want to see how it goes.

1 Like

Hi Lei, thank you very much for your kind help :smiley:. I will think about this problem and get back to you if I have further questions. I am also considering if I want to take the task to implement the mlir-opencl-runner. I will let you know if I decide to do that as part of my project.

No problem! If you’d like to contribute to bring up more OpenCL related features that would be truly awesome; but certainly no rush! Let me know if you have more questions. Happy to help.

Good morning Lei, I decided to proceed with the mlir-opencl-runner after talking to my advisor :smile:. I will work on this every Wednesday and Thursday. I want to let you know where I am. I started getting to know everything related to MLIR two months ago. This means that I do not have too much experience. However, I have been reading the vulkan runner code for a while, and I understand there are about two steps. The first step is to pinpoint the compilation/lowering process from MLIR to the OpenCL version SPIR-V dialect and binary. The second step is to make the SPIR-V binary run on the OCL runtime, which includes a runtime and a wrapper (maybe). I am looking forward to more information and details about each step to make it work. I will work on the lowering process first. Is the lowering process available now? I am not sure, and will need to try a few examples you shared with me. Also, could you please give me more information for the lowering process? Thank you very much. Please let me know if you have any comments. :smiley:

That is fantastic! Thanks for your help in advance! No worries about just getting started on MLIR and stuff; actually adding support for OpenCL might be a nice hands-on experience to get yourself more familiar with the stack. :slight_smile:

The steps you described in the above are correct. There are two major components: 1) lowering for device kernels and 2) lowering/runtime for host scheduling. The fundamental bits for lowering to device kernels are mostly there but likely certain things are not properly connected. The test files I pointed out in the above are good places to start with. In the test file you can find the command to run to actually convert the IR–please feel free to give it a shot. But I assume you are not very familiar with SPIR-V and its OpenCL flavor based on the previous discussions. (That is entirely fine!) You may want to check out OpenCL’s SPIR-V execution environment to learn more. It’s quite different from Vulkan’s, which you can use as a contrast. I think a good old first step is to get a kernel with spv.FMul working. I don’t expect too much work for it actually. For host side scheduling, the way to do it in MLIR core is to provide a lowering path from GPU dialect’s host launch logic to some op that represents OpenCL API calls. They are eventually converted into LLVM external function calls that are backed by C++ implementations. The Vulkan runner might be a bit verbose for you to understand the logic given Vulkan’s low-level nature. OpenCL is more akin to CUDA/ROCm so you can probably get more direct understanding by reading mlir-cuda-runner. I’d suggest you to follow the logic in the main file of mlir-cuda-runner to understand what each step is doing. The meat is in its runMLIRPasses function. Those passes are worth taking a look. You may want to just run the compiled mlir-cuda-runner (if you have a NVIDIA GPU) on some test IR with -print-ir-after-all to really see the stuff after each step. Those passes convert the input GPU launch logic gradually into LLVM external function calls that are linked at JITing time into C++ implementations wrapping API calls.

One thing worth noting is that various MLIR runners are based on LLVM JIT support. This may or may not be a problem, depending on your final goal or usage environment. But nonetheless, it’s certainly a valuable contribution that we’d really appreciate! :slight_smile:

Hi Lei, many thanks for your detailed instructions. I have successfully run the test examples you suggested above. I have a few questions here.

  1. Is this command (mlir-opt -allow-unregistered-dialect -convert-gpu-to-spirv -verify-diagnostics module-structure-opencl.mlir -o -) mentioned in here to generate the OpenCL style SPIR-V dialect from a GPU dialect?
    I used to generate the Vulkan style SPIR-V by “mlir-opt test-tensor-matmul.mlir -linalg-bufferize --convert-vector-to-scf --convert-linalg-to-parallel-loops -test-gpu-greedy-parallel-loop-mapping -convert-parallel-loops-to-gpu -gpu-kernel-outlining -canonicalize -legalize-std-for-spirv -test-spirv-entry-point-abi --convert-gpu-to-spirv”

  2. I believe this is the fundamental bits that lower to OpenCL device kernels. Am I correct?

I am currently reading the source code of “mlir-translate” and “mlir-opt” to understand the functionality of flags “-test-spirv-roundtrip”, “-spirv-lower-abi-attrs”, “-verify-diagnostics”, and “-allow-unregistered-dialect -convert-gpu-to-spirv -verify-diagnostics”.

Please let me know if I am not on the right direction. :slightly_smiling_face: Thank you very much.

| lenny_guo
January 20 |

  • | - |

Hi Lei, many thanks for your detailed instructions. I have successfully run the test examples you suggested above.

Great!

I have a few questions here.

  1. Is this command (mlir-opt -allow-unregistered-dialect -convert-gpu-to-spirv -verify-diagnostics module-structure-opencl.mlir -o -) mentioned in here to generate the OpenCL style SPIR-V dialect from a GPU dialect?

Yep. What’s important is the spv.target_env there. It controls the target environment. Kernel there means OpenCL. (Shader means Vulkan.)
Generally common math ops are available to both OpenCL and Vulkan. Ops specific to one execution environment will depend on either Kernel or Shader, if you read the SPIR-V spec.
(But the above example does nothing: it just directly returns. So it does not reveal much. :slight_smile:

  1. I used to generate the Vulkan style SPIR-V by “mlir-opt test-tensor-matmul.mlir -linalg-bufferize --convert-vector-to-scf --convert-linalg-to-parallel-loops -test-gpu-greedy-parallel-loop-mapping -convert-parallel-loops-to-gpu -gpu-kernel-outlining -canonicalize -legalize-std-for-spirv -test-spirv-entry-point-abi --convert-gpu-to-spirv”

  2. I believe this is the fundamental bits that lower to OpenCL device kernels. Am I correct?

That’s the right flow if you want to go from something at Linalg level. MLIR conversion is progressive and composble. That’s why you see such a long list of command-line options: they are performing one step or one chunk of IR conversion. Various runners are just programmatically calling into these passes.

I am currently reading the source code of “mlir-translate” and “mlir-opt” to understand the functionality of flags “-test-spirv-roundtrip”, “-spirv-lower-abi-attrs”, “-verify-diagnostics”, and “-allow-unregistered-dialect -convert-gpu-to-spirv -verify-diagnostics”.

Those can be useful. Another thing I’d suggest you to look into is run mlir-translate -serialize-spirv on a piece of SPIR-V IR and run spirv-dis from Vulkan SDK to dump the raw bits in a SPIR-V blob to get familiar with it. That’s what gotten sent to the drivers and you can clearly see stuff like OpCapability Kernel and others.

Hi Lei,

Thank you very much. I have successfully generated the SPIR-V binary from a OpenCL MLIR, and was able to disassemble the generated SPIR-V binary using spirv-dis :smile:. Also, I have tried to compose new OpenCL MLIR kernels myself, thanks to the OpenCL/Vulkan documents you referred me to read. Furthermore, I have tested that the generated SPIR-V binary can be taken by the clCreateProgramWithIL() function, which is a good news for our project. I am still reading the OpenCL/Vulkan documents. I think I am ready to move on to the mlir opencl runner :slightly_smiling_face:. Many thanks!