Bug in --deallocate-buffers

Another bug, this time in --deallocate-buffers. The input is the following function (toy fft).

  func @fft(%arg0: memref<?xf32>) -> memref<?xf32> {
    %c0 = constant 0 : index
    %c1 = constant 1 : index
    %0 = dim %arg0, %c0 : memref<?xf32>
    %1 = and %0, %c1 : index
    %2 = cmpi "eq", %1, %c0 : index
    assert %2, "@fft: size of input data is not power of 2."
    %3 = shift_right_signed %0, %c1 : index
    %4 = call @subtensor(%arg0, %c0, %3) : (memref<?xf32>, index, index) -> memref<?xf32>
    %5 = call @subtensor(%arg0, %3, %3) : (memref<?xf32>, index, index) -> memref<?xf32>
    %6:2 = call @bf(%3, %4, %5) : (index, memref<?xf32>, memref<?xf32>) -> (memref<?xf32>, memref<?xf32>)
    %7 = cmpi "sgt", %3, %c1 : index
    cond_br %7, ^bb1, ^bb2(%6#0, %6#1 : memref<?xf32>, memref<?xf32>)
  ^bb1:  // pred: ^bb0
    %8 = call @fft(%6#0) : (memref<?xf32>) -> memref<?xf32>
    %9 = call @fft(%6#1) : (memref<?xf32>) -> memref<?xf32>
    br ^bb2(%8, %9 : memref<?xf32>, memref<?xf32>)
  ^bb2(%10: memref<?xf32>, %11: memref<?xf32>):  // 2 preds: ^bb0, ^bb1
    %12 = call @tensor_concat(%10, %11) : (memref<?xf32>, memref<?xf32>) -> memref<?xf32>
    %13 = call @bitrev(%12) : (memref<?xf32>) -> memref<?xf32>
    %14 = index_cast %0 : index to i32
    %15 = sitofp %14 : i32 to f32
    %16 = alloc(%0) : memref<?xf32>
    %c0_0 = constant 0 : index
    %c1_1 = constant 1 : index
    scf.parallel (%arg1) = (%c0_0) to (%0) step (%c1_1) {
      %17 = load %13[%arg1] : memref<?xf32>
      %18 = divf %17, %15 : f32
      store %18, %16[%arg1] : memref<?xf32>
    return %16 : memref<?xf32>

The output of mlir-opt --buffer-deallocation is unchanged. I believe this is incorrect. For instance, variables %8, %9 should be deleted (they were created by a function call, and must be deallocated by the caller, under the given calling convention).

The buffer deallocation pass does not handle cross-function lifetime of values. This would require a program-global analysis or a contract that function results never alias and are safe to free. As this is not what we wanted/needed, it is currently unimplemented.

@dfki-mako FYI.

It would be great if we could bail out (signalPassFailure()) in these situations instead of miscompiling.

1 Like

I agree in general. I would not call this a miscompile, though, as the program remains correct. It may leak memory and you would need an additional pass that takes care of cross-function memory dealloction.

Also, how do you envision this to be implemented? We could blacklist this specific call operation but users of this pass would be free to add their own call operations, which would break the blacklisting approach, again.

We probably need to document more clearly the contract of the pass. It’s not clear to me what it is currently, such that users can understand that this is intended behavior, or write additional passes that interact with this one in an understandable way such as deallocating things that this pass intentionally doesn’t deallocate.

From my point of view as a user, I also think this is the main issue: the absence of a clear contract for the bufferization passes.

I volunteer to help. You can either use me (and my PhD) as guinea pigs to read your proposals and give feedback, or we can provide our understanding as users. We can also show you what we managed to to with the existing bufferization algorithms, and how we think they can be improved (in order to improve the performance of our code).

Best regards,

Currently there is a PR which improves the documentation of the surroundings of the bufferization pass. We think that this PR adds to the understanding and maybe you could review it from a user’s perspective. Additional feedback from your side regarding the clear contract would be greatly appreciated. :slightly_smiling_face:

This would be interesting to hear about. We could talk about this at a future ODM. Or would you prefer a different format?