LLVM Discussion Forums

How to print memref value on a RISC-V backend?

Hi,

I’m trying to run a matrix multiplication function on a RISC-V FPGA Evaluation Kit:

func @main() {
  // Allocate space for %A, %B, and %C.
  %A = alloc() : memref<2x2xi32>
  %B = alloc() : memref<2x2xi32>
  %C = alloc() : memref<2x2xi32>
  // Define constants used to initialize %A, %B, and %C.
  %c0 = constant 0 : i32
  %c1 = constant 2 : i32
  // Assign %c1 to %A and %B.
  linalg.fill(%A, %c1) : memref<2x2xi32>, i32
  linalg.fill(%B, %c1) : memref<2x2xi32>, i32
  
  // Assign %c0 to %C.
  linalg.fill(%C, %c0) : memref<2x2xi32>, i32
  call @matmul(%A, %B, %C) : (memref<2x2xi32>, memref<2x2xi32>, memref<2x2xi32>) -> ()

  return
}

func @matmul(%A: memref<2x2xi32>, %B: memref<2x2xi32>, %C: memref<2x2xi32>) {
  affine.for %arg3 = 0 to 2 {
    affine.for %arg4 = 0 to 2 {
      affine.for %arg5 = 0 to 2 {
        %a = affine.load %A[%arg3, %arg5] : memref<2x2xi32>
        %b = affine.load %B[%arg5, %arg4] : memref<2x2xi32>
        %ci = affine.load %C[%arg3, %arg4] : memref<2x2xi32>
        %p = muli %a, %b : i32
        %co = addi %ci, %p : i32
        affine.store %co, %C[%arg3, %arg4] : memref<2x2xi32>
      }
    }
  }
  return
}

After uploading the code to the RISC-V board, I use gdb to verify that the innermost loop body is executed 8 times (2 × 2 × 2 iterations), which suggests it runs correctly. Now I want to print the result of the matrix multiplication. I can think of two ways to do this:

  • Cross-compile the mlir_runner_utils library for RISC-V.
  • Call the matmul MLIR function from C code.
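It helps to know what the board should print before any printing is wired up. The plain-C sketch below replays the same fills and loop nest. The function name `reference_matmul_2x2` is hypothetical and not part of MLIR. The innermost body runs 2 × 2 × 2 = 8 times. Every element of %C is 2·2 + 2·2, so each entry should be 8.

```c
#include <stdint.h>

/* Hypothetical plain-C reference for the MLIR test: %A and %B are filled
   with 2 (the value held by %c1), %C with 0, then the same i/j/k loop nest
   as @matmul runs. Returns how many times the innermost body executed. */
int reference_matmul_2x2(int32_t C[2][2]) {
  int32_t A[2][2], B[2][2];
  int iterations = 0;
  for (int i = 0; i < 2; ++i)
    for (int j = 0; j < 2; ++j) {
      A[i][j] = 2; /* linalg.fill(%A, %c1) */
      B[i][j] = 2; /* linalg.fill(%B, %c1) */
      C[i][j] = 0; /* linalg.fill(%C, %c0) */
    }
  for (int i = 0; i < 2; ++i)       /* %arg3 */
    for (int j = 0; j < 2; ++j)     /* %arg4 */
      for (int k = 0; k < 2; ++k) { /* %arg5 */
        C[i][j] += A[i][k] * B[k][j];
        ++iterations;
      }
  return iterations;
}
```

The iteration count only shows that the loop nest ran. The values (all 8) are what the printed memref should be compared against.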

IMO, calling the function from C code is more convenient than cross-compiling. But when I lower the MLIR to LLVM IR, I find that each memref parameter in MLIR expands into 7 parameters in LLVM IR, so the matmul function has 21 parameters in total:

define void @matmul(i32* %0, i32* %1, i64 %2, i64 %3, i64 %4, i64 %5, i64 %6, i32* %7, i32* %8, i64 %9, i64 %10, i64 %11, i64 %12, i64 %13, i32* %14, i32* %15, i64 %16, i64 %17, i64 %18, i64 %19, i64 %20) !dbg !109 {
  %22 = insertvalue { i32*, i32*, i64, [2 x i64], [2 x i64] } undef, i32* %0, 0, !dbg !110
  %23 = insertvalue { i32*, i32*, i64, [2 x i64], [2 x i64] } %22, i32* %1, 1, !dbg !112
  %24 = insertvalue { i32*, i32*, i64, [2 x i64], [2 x i64] } %23, i64 %2, 2, !dbg !113
  %25 = insertvalue { i32*, i32*, i64, [2 x i64], [2 x i64] } %24, i64 %3, 3, 0, !dbg !114
  ... ...

It seems that matmul needs a struct { i32*, i32*, i64, [2 x i64], [2 x i64] } for each memref parameter. In this case, I don’t know how to call matmul from C code. Should I construct a data structure like { i32*, i32*, i64, [2 x i64], [2 x i64] } on the C side, or could the MLIR C API help? Is there a better way to print memref values on a RISC-V backend?

Thanks!

Hongbin

See https://mlir.llvm.org/docs/ConversionToLLVMDialect/#calling-convention-for-ranked-memref and https://mlir.llvm.org/docs/Tutorials/Toy/Ch-6/

Basically, you’re on the right track.
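The calling convention linked above passes each ranked memref as its descriptor fields, one scalar each and in field order: allocated pointer, aligned pointer, offset, the sizes, then the strides. That accounts for 7 scalars per 2-D memref. The sketch below is a C mirror of the descriptor plus a prototype for the expanded 21-parameter `@matmul`. It also includes a helper that does in C what the `insertvalue` sequence at the top of the function does. The names `MemRef2DI32` and `pack_memref2d` are hypothetical. `matmul` is only declared here and would come from the object file built from the MLIR module.

```c
#include <stdint.h>

/* Hypothetical C mirror of { i32*, i32*, i64, [2 x i64], [2 x i64] }. */
typedef struct {
  int32_t *allocated; /* field 0: pointer returned by the allocation */
  int32_t *aligned;   /* field 1: aligned pointer used for loads/stores */
  int64_t offset;     /* field 2: offset in elements from aligned */
  int64_t sizes[2];   /* field 3: dimension sizes */
  int64_t strides[2]; /* field 4: strides in elements */
} MemRef2DI32;

/* Expanded @matmul: 7 scalars per memref (allocated, aligned, offset,
   size0, size1, stride0, stride1) for A, then B, then C. Declared only. */
void matmul(int32_t *, int32_t *, int64_t, int64_t, int64_t, int64_t, int64_t,
            int32_t *, int32_t *, int64_t, int64_t, int64_t, int64_t, int64_t,
            int32_t *, int32_t *, int64_t, int64_t, int64_t, int64_t, int64_t);

/* Regroups the 7 scalars of one memref argument into a descriptor,
   like the insertvalue chain in the lowered function. */
MemRef2DI32 pack_memref2d(int32_t *allocated, int32_t *aligned, int64_t offset,
                          int64_t size0, int64_t size1,
                          int64_t stride0, int64_t stride1) {
  MemRef2DI32 d = {allocated, aligned, offset, {size0, size1}, {stride0, stride1}};
  return d;
}
```

Calling the expanded function directly means spelling out every field, e.g. `matmul(A.allocated, A.aligned, A.offset, A.sizes[0], A.sizes[1], A.strides[0], A.strides[1], ...)` followed by the same seven for B and C. A wrapper that takes one descriptor pointer per memref avoids this.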

Thanks a lot for the links! Now I can call the matmul MLIR function from C code, and the matrix multiplication runs correctly on my RISC-V FPGA kit.

I use the --emit-c-wrappers option to generate the C interface of the function, and I define a C struct to pass as the argument:


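#include <stdint.h>

/* The C wrapper takes a pointer to the descriptor for each memref. */
typedef struct MemRef_descriptor_ *MemRef_descriptor;

/* Mirrors { i32*, i32*, i64, [2 x i64], [2 x i64] } for memref<2x2xi32>.
   int64_t matches the i64 fields even on RV32, where intptr_t is 32-bit. */
struct MemRef_descriptor_ {
  int32_t *allocated;  /* pointer returned by the allocation */
  int32_t *aligned;    /* aligned pointer used for element access */
  int64_t offset;
  int64_t sizes[2];
  int64_t strides[2];
};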
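A sketch of the C side for printing follows. It fills in a descriptor for a plain row-major buffer (identity layout: offset 0, strides {cols, 1}) and reads elements through the general strided formula. It then prints the result. `_mlir_ciface_matmul` is the wrapper name the C-interface generation uses, and it is only declared here. `MemRef2DI32`, `memref2d_from_buffer`, `memref2d_get` and `memref2d_print` are hypothetical helpers, and the sketch assumes `printf` is available on the board.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical name; same layout as MemRef_descriptor_ above, with
   int64_t fields to match the i64 entries in the LLVM IR. */
typedef struct {
  int32_t *allocated;
  int32_t *aligned;
  int64_t offset;
  int64_t sizes[2];
  int64_t strides[2];
} MemRef2DI32;

/* C interface for @matmul; defined by the lowered MLIR module. */
void _mlir_ciface_matmul(MemRef2DI32 *A, MemRef2DI32 *B, MemRef2DI32 *C);

/* Describe a contiguous row-major rows x cols buffer (identity layout). */
MemRef2DI32 memref2d_from_buffer(int32_t *buf, int64_t rows, int64_t cols) {
  MemRef2DI32 d = {buf, buf, 0, {rows, cols}, {cols, 1}};
  return d;
}

/* Element (i, j) using offset and strides, so views also work. */
int32_t memref2d_get(const MemRef2DI32 *d, int64_t i, int64_t j) {
  return d->aligned[d->offset + i * d->strides[0] + j * d->strides[1]];
}

/* Print one row per line. */
void memref2d_print(const MemRef2DI32 *d) {
  for (int64_t i = 0; i < d->sizes[0]; ++i) {
    for (int64_t j = 0; j < d->sizes[1]; ++j)
      printf("%d ", (int)memref2d_get(d, i, j));
    printf("\n");
  }
}
```

Fill two 4-element buffers `a` and `b` with 2 and zero a third buffer `c`. Then build `A`, `B` and `C` with `memref2d_from_buffer(..., 2, 2)`, call `_mlir_ciface_matmul(&A, &B, &C)` and finally `memref2d_print(&C)`. Every entry printed should be 8.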

I checked the LLVM IR and found that only the aligned data pointer (field 1) is extracted from the { i32*, i32*, i64, [2 x i64], [2 x i64] } struct:

%55 = extractvalue { i32*, i32*, i64, [2 x i64], [2 x i64] } %28, 1, !dbg !145
... ...
%62 = extractvalue { i32*, i32*, i64, [2 x i64], [2 x i64] } %35, 1, !dbg !152
... ...
%69 = extractvalue { i32*, i32*, i64, [2 x i64], [2 x i64] } %42, 1, !dbg !159
... ...
%78 = extractvalue { i32*, i32*, i64, [2 x i64], [2 x i64] } %42, 1, !dbg !168

I’d like to confirm: is only the aligned data field used when the memref has constant dimensions?
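Only field 1 is extracted here most likely because `memref<2x2xi32>` has the default identity layout. Its offset (0) and strides ({2, 1}) are then known at compile time and can be emitted as constants, so only the aligned pointer is needed at run time. The sketch below contrasts the general strided address computation with the folded one. Both function names are hypothetical.

```c
#include <stdint.h>

/* General strided access: offset and strides come from descriptor fields. */
int32_t load_general(const int32_t *aligned, int64_t offset,
                     const int64_t strides[2], int64_t i, int64_t j) {
  return aligned[offset + i * strides[0] + j * strides[1]];
}

/* Identity layout for a static 2x2 shape: offset 0 and strides {2, 1}
   are constants, so only the aligned pointer is read. */
int32_t load_static_2x2(const int32_t *aligned, int64_t i, int64_t j) {
  return aligned[i * 2 + j];
}
```

The two agree whenever the descriptor really holds offset 0 and strides {2, 1}. They diverge as soon as either is dynamic, which is the case described in the replies below.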

The aligned data pointer is the base address used to index the current memref. However, this memref can be a view into a larger one. In that case, I believe the data pointer still points to the beginning of the allocation, and the view’s starting position is expressed through the offset field.

Even if the dimension sizes are constant, there could be other symbols in the layout map (strides, offset) that are bound to dynamic values, in which case the other fields would be used. Constant dimension sizes with non-constant strides or offset are not typical, but they have use cases in practice.
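Here is a concrete case of constant sizes with a nonzero offset and a non-trivial stride: a 2x2 window starting at (1, 1) of a 4x4 row-major buffer. The sketch below models a unit-step view. It reuses the parent's allocated and aligned pointers and strides, puts the window's start into `offset`, and changes only the sizes. `memref2d_view` and `MemRef2DI32` are hypothetical names. The assumption is that the view keeps the source's pointers and accumulates the offset this way.

```c
#include <stdint.h>

typedef struct {
  int32_t *allocated;
  int32_t *aligned;
  int64_t offset;
  int64_t sizes[2];
  int64_t strides[2];
} MemRef2DI32;

/* Hypothetical unit-step view of size0 x size1 starting at (row, col). */
MemRef2DI32 memref2d_view(const MemRef2DI32 *parent, int64_t row, int64_t col,
                          int64_t size0, int64_t size1) {
  MemRef2DI32 v = *parent; /* same allocated/aligned pointers and strides */
  v.offset = parent->offset + row * parent->strides[0] + col * parent->strides[1];
  v.sizes[0] = size0;
  v.sizes[1] = size1;
  return v;
}

/* Element (i, j) through the descriptor. */
int32_t memref2d_get(const MemRef2DI32 *d, int64_t i, int64_t j) {
  return d->aligned[d->offset + i * d->strides[0] + j * d->strides[1]];
}
```

The view still has a static 2x2 shape, but its offset is 5 and its row stride is 4. Reading it with the folded `aligned[i * 2 + j]` formula would return the wrong elements.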

The aligned data pointer is aligned to the memref element size boundary, and it is where the real data starts, so it is the one always used for access. For example, when the memref elements are of a large vector type (> 16 bytes), it would often differ from the allocated pointer.

The “data pointer” means the aligned data pointer, and the “beginning of the allocation” is the address held by the allocated pointer, right? I’m wondering: if the data pointer always points to the beginning of the allocation, will the offset always be 0? Or, following what Uday said:

There will be a gap between the aligned data pointer and the beginning of the allocated buffer when the memref element is larger than 16 bytes. Is my understanding correct?

It’s not determined by whether the memref element size exceeds 16 bytes, but by whether the allocation happens to be aligned to the element size boundary. (I said “often” and 16 bytes because malloc on most systems aligns to 16-byte boundaries.)
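Here is a small illustration of the point above. It assumes the aligned pointer is obtained by rounding the allocated address up to the element's alignment, a power of two. `align_up` is a hypothetical helper. Take a 32-byte element such as an 8 × i32 vector. A 16-byte-aligned malloc result of 0x1010 is rounded up to 0x1020, which opens a 16-byte gap. If malloc happens to return 0x1020, nothing changes. For 16-byte elements, 0x1010 is already aligned. This gap lives between the allocated and aligned pointers. It is separate from the descriptor's `offset` field, which stays 0 for a fresh allocation.

```c
#include <stdint.h>

/* Round addr up to the next multiple of alignment (a power of two). */
uintptr_t align_up(uintptr_t addr, uintptr_t alignment) {
  return (addr + alignment - 1) & ~(alignment - 1);
}
```

An allocator that works this way has to request `bytes + alignment` so the rounded-up pointer still has room for all elements.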