All “bufferize” passes except for the “Comprehensive” one are based on the framework described in the linked slides and recording. It was our first attempt at bufferization, and it has a number of issues, in particular an overly conservative handling of in-place computations. The design goal there was an ecosystem of interoperating bufferizations (you can see how many dialects are involved by the number of passes); the talk explains the many benefits of doing it this way in more detail. It’s analogous to how we shard conversion to the LLVM dialect into multiple passes.
The “comprehensive” in ComprehensiveBufferization refers to the fact that it does a whole-program bufferization atomically, enabling a sophisticated whole-program aliasing/reuse analysis for the in-place support (analogous to regalloc, which inserts copies to make tied operands assignable to the same register). The “comprehensive” bufferization handles a closed ecosystem of ops fairly specific to linalg, and in particular does more in the framework to support in-place-guaranteed ops, such as linalg ops with their “tied operands” (think of an SSA MachineInstr with tied operands) expressed via the outs operands. I believe they recently extended it to use an OpInterface, though I don’t know the details. In the general case, though, the bufferization pattern for an op involves ops from various other dialects (e.g. tensor.cast → memref.cast, tensor constant → memref.global), so no matter how you interface it, you are still likely to end up with a per-dialect “Transforms/Bufferize.cpp” or similar to isolate that dependency from the main dialect itself, if you want an open ecosystem.
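To make the “tied operands” point concrete, here is roughly what a unary linalg op looks like before and after in-place bufferization. This is an illustrative sketch, not output from any particular pass: the exact op spellings (linalg.init_tensor, arith.addf, etc.) have shifted across MLIR versions, and the SSA names are made up.

```mlir
// Tensor form: %init is the tied "outs" operand. The op semantically
// produces a fresh tensor, but the tie expresses "this result may
// reuse this operand's storage" (like a tied MachineInstr operand).
%init = linalg.init_tensor [4] : tensor<4xf32>
%res = linalg.generic
         {indexing_maps = [affine_map<(d0) -> (d0)>,
                           affine_map<(d0) -> (d0)>],
          iterator_types = ["parallel"]}
         ins(%t : tensor<4xf32>) outs(%init : tensor<4xf32>) {
^bb0(%in: f32, %out: f32):
  %sum = arith.addf %in, %in : f32
  linalg.yield %sum : f32
} -> tensor<4xf32>

// Buffer form after in-place bufferization: the result "became" the
// outs buffer. If the analysis finds a conflicting use of %init, a
// copy is inserted instead (the regalloc-style copies noted above).
%buf = memref.alloc() : memref<4xf32>
linalg.generic
  {indexing_maps = [affine_map<(d0) -> (d0)>,
                    affine_map<(d0) -> (d0)>],
   iterator_types = ["parallel"]}
  ins(%m : memref<4xf32>) outs(%buf : memref<4xf32>) {
^bb0(%in: f32, %out: f32):
  %sum = arith.addf %in, %in : f32
  linalg.yield %sum : f32
}
```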
There are also other approaches to bufferization that are neither of those, such as the one IREE uses. Within “dispatch regions” (which correspond to linalg ops or small clusters of linalg ops), IREE uses the Comprehensive one (or will soon; it currently has a different hand-rolled bufferization that was, roughly, the “version minus one” of the Comprehensive bufferization).
In the context of this discussion though, “bufferization” refers only to that first category of bufferization transforms based on the talk I linked: ComprehensiveBufferization never materializes IR in a partially converted tensor/memref state, and in IREE’s approach “tensors” have already been lowered to other lower-level types, so these ops, which are specific to !builtin.tensor, don’t apply.
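For contrast, here is a sketch of the partially converted state that the dialect-conversion-based passes do produce (and ComprehensiveBufferization never does). The materialization op names are version-dependent (they have been spelled memref.tensor_load/memref.buffer_cast, and later to_tensor/to_memref in a dedicated bufferization dialect), and the “already.*”/“still.*” ops are made-up placeholders:

```mlir
// Mixed state mid-bufferization: converted ops operate on memrefs,
// not-yet-converted ops still operate on tensors, and materialization
// casts bridge between the two worlds until all passes have run.
%m  = memref.buffer_cast %t : memref<4xf32>        // tensor -> memref
"already.bufferized_op"(%m) : (memref<4xf32>) -> ()
%t2 = memref.tensor_load %m : memref<4xf32>        // memref -> tensor
%t3 = "still.tensor_op"(%t2) : (tensor<4xf32>) -> tensor<4xf32>
```

Because these casts are how the partially converted state is stitched together, the ops in question only make sense when !builtin.tensor is still present in the IR.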
I don’t think there is any one grand unified solution to bufferization that suits all users. It is a fairly nuanced problem with deep ramifications into runtime design (refcounting, asynchronicity, etc.), what types you are using to model things, and intersects a lot with your op set going into the bufferization process. For example, in IREE, by the time that we are doing bufferization, the top-level of the program that is being bufferized consists of a closed ecosystem of ops with really well-constrained interfaces.