Linalg-bufferize pass for tensor.insert_slice

Hello,

I am hitting a verifier error when the linalg-bufferize pass processes a rank-reducing tensor.insert_slice operation. Running

mlir-opt --linalg-bufferize <input file>

with the following <input file>:

func @rank_reducing_insert_slice_canonicalize(%arg0 : tensor<?x?xf32>, %arg1 : index,
    %arg2 : index, %arg3 : tensor<?x?x?xf32>) -> tensor<?x?x?xf32>
{
  %c0 = arith.constant 0 : index
  %c1 = arith.constant 1 : index
  %c4 = arith.constant 4 : index
  %0 = tensor.insert_slice %arg0 into %arg3[%c0, %arg1, %c1] [%c4, 1, %arg2] [%c1, %c1, %c1] : tensor<?x?xf32> into tensor<?x?x?xf32>
  return %0 : tensor<?x?x?xf32>
}

produces the following error:

/home/sflab/mlir-tv-testsuite/opts/canonicalize/canonicalize_00129.src.mlir:9:8: error: 'linalg.copy' op expected indexing_map #1 to have 2 dim(s) to match the number of loops
  %0 = tensor.insert_slice %arg0 into %arg3[%c0, %arg1, %c1] [%c4, 1, %arg2] [%c1, %c1, %c1] : tensor<?x?xf32> into tensor<?x?x?xf32>
       ^
/home/sflab/mlir-tv-testsuite/opts/canonicalize/canonicalize_00129.src.mlir:9:8: note: see current operation: "linalg.copy"(%3, %12) ( {
^bb0(%arg4: f32, %arg5: f32):  // no predecessors
  "linalg.yield"(%arg4) : (f32) -> ()
}) : (memref<?x?xf32>, memref<?x1x?xf32, affine_map<(d0, d1, d2)[s0, s1, s2, s3] -> (d0 * s1 + s0 + d1 * s2 + d2 * s3)>>) -> ()

The test case is taken from llvm-project/mlir/test/Dialect/Tensor/canonicalize.mlir.

From the error, the generated linalg.copy has a rank-2 source (memref<?x?xf32>) and a rank-3 destination (memref<?x1x?xf32, ...>), so it looks as if the unit dimension is not dropped during bufferization. Is there a pass that is supposed to run before linalg-bufferize and rewrite tensor.insert_slice into other operations? If so, it would help to know which pass that is and how it rewrites the op, so that I can better understand the semantics of tensor.insert_slice.
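
To make the question concrete, here is a rough Python sketch of how I currently read the rank-reducing semantics. It is only my assumption, not taken from the MLIR documentation: the helper name insert_slice_rank_reducing and the concrete shapes are made up for illustration, tensors are modeled as nested lists, and the size-1 middle dimension is assumed to be the one that is dropped.

```python
import copy


def insert_slice_rank_reducing(source, dest, offsets, sizes, strides):
    """Hypothetical model: insert a 2-D source into a 3-D dest.

    Assumes the middle size is 1, so the source has no middle dimension.
    Returns a new value and leaves dest untouched (tensor value semantics).
    """
    assert sizes[1] == 1, "this sketch only handles a dropped middle unit dim"
    result = copy.deepcopy(dest)
    off0, off1, off2 = offsets
    st0, _, st2 = strides
    for i in range(sizes[0]):
        for k in range(sizes[2]):
            # source[i][k] lands at dest[off0 + i*st0][off1][off2 + k*st2]
            result[off0 + i * st0][off1][off2 + k * st2] = source[i][k]
    return result


# Made-up shapes: a 5x2x4 destination of zeros and a 4x3 source.
dest = [[[0.0] * 4 for _ in range(2)] for _ in range(5)]
source = [[float(10 * i + k) for k in range(3)] for i in range(4)]

# Mirrors [%c0, %arg1, %c1] [%c4, 1, %arg2] [%c1, %c1, %c1] with %arg1 = 1, %arg2 = 3.
out = insert_slice_rank_reducing(
    source, dest, offsets=(0, 1, 1), sizes=(4, 1, 3), strides=(1, 1, 1)
)
```

If this reading is right, the copy on the buffer side would need a rank-2 view of the destination rather than the rank-3 subview shown in the error.
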
Thank you!