Remove tight coupling of the BufferDeallocation pass to std and linalg operations

We would like to remove the tight coupling of the BufferDeallocation pass to the std and linalg dialects. Currently, this pass creates std.alloc, std.dealloc and linalg.copy operations in certain cases to create temporary copies and to free allocated buffers.

In order to remove this coupling, there are several different ways we could achieve this:

  • Extend the notion of “allocation resources” so that each resource can also create its associated copy and free operations.
  • Provide an abstract interface that is passed to the BufferDeallocation pass constructor in order to create the appropriate operations for each case.
  • An additional option that we are not aware of :smiley:

There are different pros and cons related to these approaches.

Extended Allocation Resources

The advantage of this approach is that each specialized allocation resource can provide its own methods to create copy and free operations without touching the operations it is associated with. This would have the additional benefit that other std.alloc-like operations in different dialects could essentially inherit the same copy and free behavior. Furthermore, this solution would be accessible from the whole “MLIR universe”.

One major question that arises is how to handle cases in which we must create temporary copies from multiple allocation resources:

%0 = "my_alloc"(...) : memref<...>
br ^bb1(%0 : ...)
...
%1 = "other_alloc"(...) : memref<...>
br ^bb1(%1 : ...)
...
^bb1(%2 : memref<...>):
...
// %2 could now point to memory locations provided by two different alloc
// operations that might leverage different allocation resources.

However, as far as I remember, @River707 pointed out that these resources are just a temporary solution and might be replaced in the future…

Pass Interface

This possibility would avoid any interference with allocation resources or operations in the MLIR project. Instead, the BufferDeallocation pass could create the appropriate operations for each case. This would also cover the example above: a specific function could be invoked on this purpose-built interface to specify which operation(s) should be generated. The downside of this approach is the obvious limitation to the BufferDeallocation pass, which would block us from reusing these features in other passes/parts of the MLIR project.

What do you think about these options? Are there alternative solutions?

1 Like

Can we just add a std.copy? Then clients can lower std.copy, std.alloc, std.dealloc to whatever makes sense for their dialect. Maybe this is a good time to create a memref dialect with memref.copy and also move alloc/dealloc there as well.

Can you elaborate on this “allocation resources” concept? What use cases need this? If necessary, we can add attributes to those ops for users to plug in custom resources e.g. alloca could be alloc() {resource = "std.automatic"}, but it could be extended to alloc() {resource = "gpu.on_device"}. The attribute would be required for correctness – passes that lower these ops must only lower resources that they know how to handle.
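To illustrate, here is a minimal sketch of what such attributes might look like (the resource names and op spellings are hypothetical, not an existing scheme):

```mlir
// Hypothetical resource attributes distinguishing allocation kinds.
%0 = alloc() {resource = "std.automatic"} : memref<16xf32>   // stack-like, no dealloc needed
%1 = alloc() {resource = "gpu.on_device"} : memref<16xf32>   // device memory
// A lowering pass would only rewrite allocs whose resource it knows how to
// handle, and would have to reject (not silently drop) unknown resources:
dealloc %1 : memref<16xf32>
```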

There have been multiple discussions about turning linalg.copy into std.copy, like other ops before it.

As highlighted by @bondhugula here, BufferResultsToOutParams would also benefit from graduating linalg.copy to a more neutral (not linalg-specific) location.

@_sean_silva - perhaps my message wasn’t clear. You don’t need to, and shouldn’t need to, move linalg.copy somewhere else to reuse the “move return values to output parameters” transformation. BufferizeFuncOpConverter already achieved that without such a move, with a templated CopyOpTy (this worked with lmhlo and linalg without the pattern depending on any dialect), and now there is also the CopyOpInterface.

It’s not obvious to me that C++ templates are the right abstraction mechanism for this particular problem.

Multiple ops from the lmhlo dialect have moved into std (such as various memref-related casts). I don’t see why we cannot move lmhlo.copy (which is identical to linalg.copy) to std as well, which eliminates this entire problem.

Do you see a strong reason for keeping a distinct linalg.copy and lmhlo.copy? Or to keep lmhlo.copy when we have an std.copy?

This proposal goes beyond linalg.copy vs. lmhlo.copy or std.copy; it is about allowing different copies for different allocation kinds. Specifying which copy to use is just one side-effect of making this configurable (using callbacks like in type conversion, or via an interface).

For instance, we have currently hard-coded the fact that we do not use free for alloca-allocated memory (and maybe also do not copy it). Instead, one could configure that this particular allocation operation has no-ops for copy and free.
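Under such a configuration, the two allocation kinds would bufferize differently. A sketch (std op spellings as of this discussion):

```mlir
// std.alloc is heap-like: the pass inserts a matching dealloc at the end of
// the buffer's lifetime.
%0 = alloc() : memref<128xf32>
...
dealloc %0 : memref<128xf32>

// std.alloca is stack-like: its configured copy/free are no-ops, so the
// pass emits nothing for it.
%1 = alloca() : memref<128xf32>
...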

To support reference counting, one could have an rc.alloc with rc.increment and rc.decrement operations.

I agree that this can be done by lowering alloc and copy differently but that approach does not allow mixing reference counted allocations with normal ones.

If we can agree to have a std.copy, then using allocation resources seems a good way to model this. We could then go ahead and have a reference counted resource.
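A reference-counted resource could then let the pass emit count adjustments instead of plain copies and frees. A rough sketch (the rc.* ops are made up for illustration; they do not exist upstream):

```mlir
// "Copy" for a reference-counted buffer just bumps the count instead of
// materializing a new buffer...
%0 = "rc.alloc"() : () -> memref<32xf32>
"rc.increment"(%0) : (memref<32xf32>) -> ()
...
// ...and "free" decrements it; the memory is released once the count
// reaches zero.
"rc.decrement"(%0) : (memref<32xf32>) -> ()
"rc.decrement"(%0) : (memref<32xf32>) -> ()
```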

1 Like

I’m running into the need for this as well, but it goes a step further: this pass should not have any dependencies on the type of the memory being dealt with. Even if linalg.copy were changed to std.copy, that is still something for memrefs, and std.alloc/std.dealloc/etc. are also memref-only. memref is a pretty specific thing, which really limits the usefulness of these kinds of passes.

Having a type-interface-like thing that provides memory-like functions (alloc, dealloc, copy, set, etc.) would be useful here in this pass, as well as in many others that just want to pass through what they are working on with minimal coupling.

I’m not sure there’s anything like a type interface today, though one could probably implement it with a dialect interface that was queried through the type (type -> parent dialect -> buffer interface) - interested in hearing other ideas. Definitely can’t do this with C++ templates (as then if you wanted to mix memrefs and other buffer-like things you’d have to stamp out this pass a whole bunch).

1 Like

We have this now. Attributes and types can implement interfaces just like operations. Using this facility to try to re-envision the ShapedType hierarchy has been brought up several times. I suspect the actual answer to that is not to s/ShapedType/ShapedTypeInterface/ but instead to do what you are suggesting and realize the specific abstractions that things need to operate generically on the types.

The interfaces doc is updated to describe the mechanics.

Aside from some design work, I think these generalizations are mostly in the “just needs someone to care and do it” category.

1 Like

We have this now.

Of course we do, MLIR rocks :heart:

Actually, we think that such a query could be useful, but it is still difficult to implement. However, considering the case that there are several types of allocs (my_alloc and other_alloc, e.g.), we cannot derive operations from such an interface, since we cannot distinguish between both types, as described in the example above.

Maybe it would be better to extend the allocation resources as described above.

What do you think?

So we want to generalize this across two dimensions:

  1. Be able to work on different types beyond memref.
  2. Be able to have different alloc/free operations even for the same type.

If we wanted to go with AllocationResource as the central abstraction, then types that are different than memrefs would need to allocate from a different resource. Would this be sufficient for your needs, @benvanik?

If we model this via type interfaces as the sole means, then we can no longer allocate memref from different resources. This is currently needed to differentiate alloc from alloca but I can also envision using this to separate memrefs that are reference counted from ones that are not by using a different allocation resource for them.

In the end, this boils down to the question of how one wants to compose things. I’d like to be able to use differently allocated memref values in the same context. The current approach of using the MemoryEffect interface to identify allocations works well for this.

I read the description, but I fail to see the motivation here?

Ref-counting schemes are fine, but this begs for different types than memref to model it though: it isn’t clear to me why should we have different allocation operation for the same type when we can extend the type system.

Sorry for the late reply. This fell off my radar.

I totally agree with @herhut that generalizing this to non-memref types is a little bit different from different alloc/free ops for the same type. Not sure how we want to disentangle that. I agree that wanting std.alloca and std.alloc to both return memref seems desirable and at odds with only relying on a type interface.

Given that we don’t currently have a requirement for non-memref types (or anybody actively pushing on that), I think we should consider that as a soft requirement at this point, unless we find a really obvious way to handle it.

This seems promising. You mean something like std.alloc() {resource = "gpu.on_device"} as in my original reply?

@joker-eph The basic motivation was to enable the emission of different copy operations than linalg.copy (and potentially different dialect-specific alloc and dealloc ops?). Currently, this tight coupling does not allow the BufferDeallocation pass to be applied to arbitrary dialect-specific scenarios. However, our intention was to make it as “generic as possible”.

As @_sean_silva mentions, I understand why we’d want to generate different alloc/dealloc/copy for different types, but it seems like you’re going beyond this?

This looks like a very welcome change – to provide the flexibility to use dialect-specific copy operations and custom alloc/free. Could we restart this discussion? The fact that the bufferization pass depends on linalg simply due to linalg.copy is already a layering problem.

We discussed this a bit more recently and came to the conclusion that the best way forward could be to add a new bufferize dialect that would contain a copy operation (with implicit allocation) and two cast operations, from tensor to memref and back, to allow for gradual bufferization.

Users of this could then implement additional patterns to lower the copy operation to whatever target operation they prefer.

Also, by having the cast operations in the bufferize dialect, we can give them semantics that model their use more closely. In particular, they would no longer need to have any side-effects, given their constrained use.
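To make the proposal concrete, a partially bufferized function might look roughly like this (the op names are a sketch of the proposed dialect, not a final spelling):

```mlir
func @partial(%arg0: tensor<4xf32>) -> tensor<4xf32> {
  // Cast into the memref world for the already-bufferized part of the program...
  %m = "bufferize.to_memref"(%arg0) : (tensor<4xf32>) -> memref<4xf32>
  // ...copy with implicit allocation of the result buffer; users lower this
  // to their preferred target copy operation...
  %c = "bufferize.copy"(%m) : (memref<4xf32>) -> memref<4xf32>
  // ...and cast back for consumers that still operate on tensors.
  %t = "bufferize.to_tensor"(%c) : (memref<4xf32>) -> tensor<4xf32>
  return %t : tensor<4xf32>
}
```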

Would this address your use-case, too, @bondhugula?

@dfki-mako @dfki-jugr

1 Like

While adding a memref copy operation sounds fine, do you really need a new dialect for just these three ops that are going to be interspersed among numerous other ops from another dialect? The copy operation could be added to the same dialect where memref casting, allocation and load/store ops live.

Also, what are the benefits of a copy with an implicit allocation instead of the standard pattern of alloc + copy? The latter form is already well handled by various patterns, utilities and passes, and is consistent with other patterns. For example, adding a zero memref is a copy; you shouldn’t have to fold the alloc into it.

I’d still have concerns with “fake” ops that we can’t reason about properly using the usual tools (side-effects, etc.) and rely instead on “magic” properties.