Bufferization error related to `memref.clone`

Hello everybody,

We ran into problems with some of our older code that no longer compiles, and it’s not clear to us whether this is an MLIR compiler bug or just us not calling the magical optimization step in the right place.

The MLIR code fragment (simplified from a larger example) is the following:

func private @process(%i:tensor<512xf32>,%j:tensor<512xf32>)->(tensor<512xf32>,tensor<512xf32>)

func @myfun()->(tensor<512xf32>,tensor<512xf32>) {
  %1   = constant 1 : index
  %0   = constant 0 : index
  %512 = constant 512 : index
  %zero = constant dense <0.0> : tensor<512xf32>
  
  %o1,%o2 =
    scf.for %idx = %0 to %512 step %1
       iter_args(%acc1=%zero,%acc2=%zero)->(tensor<512xf32>,tensor<512xf32>) {
       
       %acc1_out,%acc2_out = call @process(%acc1,%acc2)
         :(tensor<512xf32>,tensor<512xf32>)->(tensor<512xf32>,tensor<512xf32>)
	
       scf.yield %acc1_out,%acc2_out:tensor<512xf32>,tensor<512xf32>
    }
  
  return %o1,%o2: tensor<512xf32>,tensor<512xf32>
}

On this code we apply the following compilation command:

mlir-opt debug.mlir --tensor-bufferize --tensor-constant-bufferize  --scf-bufferize --func-bufferize --buffer-results-to-out-params --finalizing-bufferize --buffer-deallocation | 
mlir-opt --convert-linalg-to-affine-loops --lower-affine --convert-scf-to-std | 
mlir-opt --canonicalize --convert-memref-to-llvm --canonicalize --convert-std-to-llvm --canonicalize --reconcile-unrealized-casts

The whole pipeline fails at --reconcile-unrealized-casts because builtin.unrealized_conversion_cast operations remain in the code. These casts are probably left over because some memref.clone operations are still present, which neither --convert-memref-to-llvm nor --canonicalize has converted.

Funnily enough, if we use separate tensor constants to initialize the two iteration arguments of the scf.for loop, the need for a memref.clone disappears, and so does the problem. But if the input is not a constant but another tensor variable, this method does not work, so our problem remains (not to mention the fact that we should be able to compile correct code).
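
For reference, the workaround can be sketched as follows. This is only an illustration of the change described above: the names %zero1 and %zero2 are ours, and everything else is the same as in the original fragment.

```mlir
func private @process(%i:tensor<512xf32>,%j:tensor<512xf32>)->(tensor<512xf32>,tensor<512xf32>)

func @myfun()->(tensor<512xf32>,tensor<512xf32>) {
  %1   = constant 1 : index
  %0   = constant 0 : index
  %512 = constant 512 : index
  // One constant per iteration argument (illustrative names),
  // instead of sharing a single %zero between both.
  %zero1 = constant dense <0.0> : tensor<512xf32>
  %zero2 = constant dense <0.0> : tensor<512xf32>

  %o1,%o2 =
    scf.for %idx = %0 to %512 step %1
       iter_args(%acc1=%zero1,%acc2=%zero2)->(tensor<512xf32>,tensor<512xf32>) {
       %acc1_out,%acc2_out = call @process(%acc1,%acc2)
         :(tensor<512xf32>,tensor<512xf32>)->(tensor<512xf32>,tensor<512xf32>)
       scf.yield %acc1_out,%acc2_out:tensor<512xf32>,tensor<512xf32>
    }

  return %o1,%o2: tensor<512xf32>,tensor<512xf32>
}
```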

Our question is the following: Is it possible to compile our fragment by using other compilation options (and if so, how?), or did we just stumble on some bug/limitation of mlir-opt?

Our MLIR commit version is d9e46beace3120fbc4810dda5c3ed88f93e862a4. We would like, if possible, to remain on it, in order not to break other things…

Best regards,
Dumitru

I had a closer look at the bufferization pipeline, and I think the clones emitted by the buffer-deallocation pass are correct here. We introduce them because the buffers are used as iteration arguments in a loop, and we need to ensure that the program remains correct across all loop iterations. Currently, clones are usually collapsed by a canonicalization pass. This works fine for most of the emitted clones, but not in this special case, where we still need an additional buffer.
We can introduce an additional pass that converts all remaining clone operations into alloc + copy operations, which can then be handled by later passes (e.g. memref to llvm).
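
As a rough illustration of that rewrite (a hand-written sketch, not the output of an actual pass), a remaining clone would be replaced by an explicit allocation followed by a copy. The function names @before and @after are hypothetical, and whether memref.copy is available depends on the MLIR version in use.

```mlir
// Before: a clone that canonicalization could not fold away.
func @before(%src: memref<512xf32>) -> memref<512xf32> {
  %copy = memref.clone %src : memref<512xf32> to memref<512xf32>
  return %copy : memref<512xf32>
}

// After: the same effect expressed as an allocation plus a copy,
// both of which later lowering passes can handle.
func @after(%src: memref<512xf32>) -> memref<512xf32> {
  %copy = memref.alloc() : memref<512xf32>
  memref.copy %src, %copy : memref<512xf32> to memref<512xf32>
  return %copy : memref<512xf32>
}
```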

What do you think?


Yes, converting all remaining memref.clone operations into memref.alloc and memref.copy seems like the right solution. I noticed that this is the approach proposed by a new patch. Thanks a lot!