[RFC] Dialect for bufferization-related ops

Background

TensorLoadOp and BufferCastOp used to belong to the std dialect. When std was split, they ended up in the memref dialect.

Dialect Scope and Goal

The bufferize dialect is intended to hold operations used by bufferization.

The following ops will be moved to BufferizeOps.td:

  • memref.tensor_load → bufferize.memref_to_tensor
  • memref.buffer_cast → bufferize.tensor_to_memref

Optionally, we can also move memref.clone → bufferize.clone. I think this op was created mostly for bufferization.
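As a sketch (the bufferize.* spellings are the proposed names from the list above, not ops that exist today), a round-trip through the two ops would read:

```mlir
// Current spelling, with both ops in the memref dialect:
%buf = memref.buffer_cast %t : memref<4xf32>
%t2  = memref.tensor_load %buf : memref<4xf32>

// Proposed spelling, with the same ops in a bufferize dialect:
%buf = bufferize.tensor_to_memref %t : memref<4xf32>
%t2  = bufferize.memref_to_tensor %buf : memref<4xf32>
```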

Introducing a new dialect would have some advantages:

  • Readability of partially-bufferized IR would improve.
  • The memref dialect would not depend on the tensor dialect.
  • Confused users would stop trying to “load” a tensor from a buffer using memref.tensor_load.

@herhut, @ftynse, @nicolasvasilache, @_sean_silva, @dfki-mako, @dfki-jugr

This is borderline, but these reasons appear to be too “thin” to warrant creating yet another dialect for just two ops. Are there plans for additional ops here?

Was there some layering/cyclic dependency issue that this also helped with?

I’m generally ok to have this.

I am all for it.

I would also move the clone operation, it is meant for bufferization transformations.

This is essentially a dialect that is used by a couple of optimization passes during gradual bufferization. Having a separate dialect would make this more apparent, which is a good thing.

These ops are all coming with major foot-guns (undefined behavior in non-obvious situations, mostly related to buffer mutation): so I’m all in favor of segregating from the “well behaved” memref ops.
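One such foot-gun, sketched for illustration (the precise semantics are exactly what is under-specified, which is the point): mutating a buffer after it has been “loaded” as a tensor leaves the tensor value unclear.

```mlir
// Is %t a snapshot of %buf, or an alias of it?
%t = memref.tensor_load %buf : memref<4xf32>
// Whether %t observes this later store is not obvious from the IR,
// which makes buffer mutation around tensor_load a trap.
memref.store %v, %buf[%i] : memref<4xf32>
```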

I have a few concerns / questions:

  • From a dependence graph perspective, a new dialect that depends on both memref and tensor and adds a bridge between them makes conceptual sense. In practice though, these ops don’t depend on a tensor dialect, they only use the tensor type, which is builtin. Is there a practical benefit to moving these out of memref?

  • What is your opinion on the name of the dialect here? Typically dialect names are more “nouns” than “verbs”. “buffer” vs “bufferize”?

  • There are many “bufferize” passes in MLIR (though only one that claims to be “Comprehensive”). Is this dialect something that unifies all of them, or not? What is our plan to pull these all together into a framework?

  • How does this help with “confused users would stop trying to ‘load’ a tensor from a buffer using memref.tensor_load”?

That said, I agree with Mehdi that these are not super well curated ops. Getting them out of memref would be a good cleanup. What do you think about moving them to a dustbin somewhere, instead of trying to define what “bufferization” is?

All “bufferize” passes except for the “Comprehensive” one are based on the framework described in slides, recording. It was our first attempt at bufferization, and it has a number of issues, in particular requiring overly conservative handling of in-place calculations. The design goal there was an ecosystem of interoperating bufferizations (you can see how many dialects are affected by the number of passes) – the talk explains in more detail the many benefits of doing it this way – it’s analogous to how we shard conversion to the LLVM dialect into multiple passes.
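In that framework, each pass bufferizes only its own ops, and materializations bridge the not-yet-bufferized ones. A hypothetical intermediate state (the some_dialect ops are placeholders, not real ops) might look like:

```mlir
// Already bufferized: operates on memrefs.
%m  = memref.buffer_cast %t : memref<4xf32>
%r  = some_dialect.bufferized_op %m : memref<4xf32>
// Not yet bufferized: still operates on tensors, bridged by tensor_load.
%t2 = memref.tensor_load %r : memref<4xf32>
%u  = some_dialect.tensor_op %t2 : tensor<4xf32>
```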

The “comprehensive” in ComprehensiveBufferization refers to the fact that it does a whole-program bufferization atomically, enabling a sophisticated whole-program aliasing/reuse analysis for the in-place support (analogous to regalloc: it inserts copies to make tied operands assignable to the same register). The “comprehensive” bufferization handles a closed ecosystem of ops fairly specific to linalg, and in particular does more in the framework to support in-place-guaranteed ops (such as linalg ops with their “tied operands” (think SSA MachineInstr) via the outs operands). I believe they recently extended it to use an OpInterface, though I don’t know the details. In the general case though, the bufferization pattern for an op involves ops from various other dialects (e.g. tensor.cast → memref.cast, tensor constant → memref.global), so no matter how you interface it, you are still likely to end up with a per-dialect “Transforms/Bufferize.cpp” or similar to isolate that dependency from the main dialect itself, if you want an open ecosystem.
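For instance, the tensor.cast → memref.cast pattern mentioned above rewrites (sketched):

```mlir
// On tensors, before bufferization:
%c = tensor.cast %t : tensor<?xf32> to tensor<4xf32>
// On buffers, after bufferization:
%m = memref.cast %b : memref<?xf32> to memref<4xf32>
```

The pattern produces an op from a different dialect than the one it matches, which is why the dependency has to live somewhere outside both dialects.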

There are also other approaches to bufferization which are neither of those, such as the one that IREE uses. Within “dispatch regions” (which correspond to linalg ops / small clusters of linalg ops), IREE uses the Comprehensive one (or will soon; it currently has a different hand-rolled bufferization that was roughly the “version minus one” of the Comprehensive bufferization).

In the context of this discussion though, “bufferization” only refers to that first category of bufferization transforms based on the talk I linked, since ComprehensiveBufferization never materializes IR in a partially converted tensor/memref state, and in IREE’s approach “tensors” have been lowered to other lower-level types, so these ops, which are specific to !builtin.tensor, don’t apply.

I don’t think there is any one grand unified solution to bufferization that suits all users. It is a fairly nuanced problem with deep ramifications into runtime design (refcounting, asynchronicity, etc.), what types you are using to model things, and intersects a lot with your op set going into the bufferization process. For example, in IREE, by the time that we are doing bufferization, the top-level of the program that is being bufferized consists of a closed ecosystem of ops with really well-constrained interfaces.

Interesting, that makes sense. That said, this highlights a significant problem we have in MLIR: we don’t have a good nursery to bake out these ideas, and they end up taking global names like “bufferize” early on, even if they aren’t the fully baked scheme. This sends a mixed signal to people who are working on other things, and to people who don’t know any better and assume it is as baked as something like CSE.

The point about multipass lowering is a good one. Here’s a teaser for the llvmdev meeting talk I’m working on which is related :slight_smile:

-Chris

I’ve spent no small amount of effort trying to figure a way out of this state and have not been super successful. Some of this stuff takes significant iteration and refinement/debate over the idea before it really should be promoted to take a “core” name. Sometimes, too, it would be wise to just let the clock run a bit and be in a position to pick winners, versus latching on to early good intentions. Too much occupation of these “global names” without enough windshield time has bothered me for a while.

The problem we run into is deciding what nurseries there should be, situating them so they aren’t just collections of half-implemented parts, and putting them in a place where downstreams can get some mileage on them.

While just a local decision for things in my sphere, we have been actively working to prove out more of these things in a sandbox repo that is dependency-clean and forces a certain amount of full-thought/e2e assembly before upstreaming. This comes with some overhead and needs to be maintained, but it gives us a place to copy code in/out of and to validate that it holds together on its own. It would be a significant workflow/efficiency improvement if this weren’t just a local exercise but more part of the way we develop MLIR.

At the same time, we shouldn’t overweight things like a “dialect name” in the repo: in the grand scheme of things it really does not carry much impact other than some basic initial hints. Users get used to it fairly quickly, and other than a shorter prefix there is really nothing “privileged” here.
Also, names are very much overloaded across the various domains in which the project is reused: the same dialect name, taken in the context of a compiler for a chemistry DSL, a compiler for HW, or a compiler for ML, would be misleading to a user coming from another of these contexts, making it impossible to pick a name that works universally.

I would see most of the improvement on the documentation side of all of this though.

This is also why I support moving these into a separate dialect here (maybe bufferization_internal would be more explicit): it will allow us to have a dedicated documentation page for the dialect that makes its context very clear. Passes that work within the expectations of this dialect will be prefixed with the dialect name as well, providing a better grouping, which again helps with navigating all of this within the project.

Yes, linalg::TensorExpand/CollapseShapeOp cannot be moved to tensor dialect because of this dependency. More about that in this thread.

What about calling it partial_buffer or modular_buffer dialect, since one of the main features of this bufferization is that it can be applied partially?

Discussed that with @matthias-springer. He is working on making ComprehensiveBufferization interface-based. The transformations and the interface itself need a home. I think it makes sense to just call it “bufferize” and keep all ops, transforms, and interfaces for both bufferization styles there, especially since we might use both in some cases.

@clattner I feel like there is already a “buffer” dialect, and we call it “memref” :slight_smile:

This seems to answer Chris’s question about unifying the different bufferizations in the longer run in the affirmative. Nice!

I am also supportive of moving these ops to a separate dialect. It aligns well with the overall guiding principles of std splitting (I think we decided to put these operations there as a means of postponing this RFC), and makes the memref dialect entirely lowerable to LLVM (this was mentioned as a concern in the splitting discussion).

My two cents on the naming discussion: renaming dialects isn’t that hard if we need to do it. It’s daunting, mind-numbing, and fairly monotonous, but largely automatable. So we can go with “bufferize” and rename if something better comes up.

It’s not clear to me that that is a desirable end state. We added ComprehensiveBufferize as a self-contained, non-generalized thing deliberately to bypass some design discussions, which, it seems, now need to happen. If we are generalizing it, then we need to reconsider the tradeoffs. In particular, should we generalize it with the aim of deleting the older bufferization system? Is there some fundamental technical reason that that isn’t doable or desirable?

Ok, yes, I agree that the cost of having short names for dialects isn’t high. I don’t think we want to get into !com.sifive.firrtl.SInt flavored nonsense! Maybe the concern could be handled with some sort of status matrix. We tried (not particularly effectively) to do this with LLVM backends for example, but that isn’t super well maintained (perhaps due to low visibility).

On splitting these ops out, I still don’t understand the layering issue. The builtin tensor type is built in to MLIR, so these ops don’t add any dependency on a tensor dialect etc. Is there a cost to keeping them in the memref dialect (which, I agree, is a buffer dialect)? The point upthread about their semantics not being really well specified seems orthogonal to where they live.

-Chris

Correct. The tensor type is built in. The dependency on tensor dialect is due to the canonicalization patterns for these ops in MemRefOps.cpp. Also these operations are not lowerable to LLVM as @ftynse mentioned. It would be nice to clean this up.
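One such pattern is folding the buffer_cast/tensor_load round-trip (sketched; the actual patterns live in MemRefOps.cpp):

```mlir
%m  = memref.buffer_cast %t : memref<4xf32>
%t2 = memref.tensor_load %m : memref<4xf32>
// Canonicalization replaces uses of %t2 with %t directly; related
// patterns in the same file produce tensor-dialect ops, which is
// what drags the tensor dialect dependency into memref.
```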

@matthias-springer needs a home for his interfaces/passes in any case. It would be nice to have a dedicated “bufferize” dialect for all of that.

Right. At the same time, if we want to make it visible and obvious to users that there are two styles of bufferization in MLIR, I think it would be better to keep them in one place with a self-explanatory name like “bufferize”. After both bufferization approaches are actually working and well documented, we might want to reconsider the tradeoffs or maybe come up with another approach.

It seems like the context for this should be part of the discussion of creating a “bufferization” dialect, but the OP only mentions operations in the memref dialect. I think this also ties in with Sean’s point that there are a lot of different “bufferization” efforts going on, but there isn’t a clear consensus on whether all of those should exist (or even what all is being built). I’m a bit wary of this missing context, especially if the dialect is going to be filled with stuff (a lot of which I assume already exists) not necessarily discussed during the dialect’s creation.

– River

The creation of a bufferization dialect seems like a perfect time for that discussion though, right? I’m a bit concerned that the full scope of what people intend to put in this dialect is not being discussed here, and is kind of just implicit.

– River
