What is the strategy for tensor->memref conversion? (bufferization)

Yes, there is a fair amount of history in the naming that is in need of cleaning up. The team at DFKI is also working on splitting the insertion of deallocations and the optimization of alloc placement into separate passes, so that one does not have to use the pipeline exactly as the hlo-to-lhlo lowering does.

I quite like the bufferize name (and have also started using it within kernel generator). So I would be all for standardizing on that.

@dfki-mako can you prioritize the cleanup, so that new uses can directly use better naming?

For patterns that implement the tensor to memref transformation, owning dialects can move them to a bufferize.cc and maybe expose a populateBufferizationPatterns so that the patterns can be mixed into a larger bufferization pass.

Now that I understand it better, maybe nothing is missing 🙂

One small thing that still seems to be missing is being able to split the bufferization process into multiple passes for clarity (I really don’t like having mega-conversion passes; it’s harder to debug and test). The key thing needed for that is source/target TypeConverter materializations. Unfortunately, MLIR core doesn’t have suitable ops for that. tensor_load is a suitable source materialization, but tensor_store is not suitable for use as a target materialization as it doesn’t create a new SSA value of memref type (I guess you could insert a bunch of std.dim ops + std.alloc + tensor_store, but that’s bulky and doesn’t naturally fold away at the end of the conversion process).
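
For illustration, here is roughly what that bulky workaround would look like for a dynamically shaped value %t : tensor<?xf32> (a hand-written sketch):

%c0 = constant 0 : index
%d0 = dim %t, %c0 : tensor<?xf32>
%m = alloc(%d0) : memref<?xf32>
tensor_store %t, %m : memref<?xf32>
// already-converted ops then use %m

None of this disappears by simple folding once the producer of %t is itself converted, whereas a single cast-like op would.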

In npcomp, I locally created tensor_to_memref and memref_to_tensor ops that I use for the materializations (example). I would love to be able to upstream these, but we don’t really have a good place to put them. Since I can easily keep these locally, it’s not a huge problem though.
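
As a minimal sketch of how these get used (op names as above; the exact syntax here is illustrative), the conversion framework inserts them at the boundary between converted and not-yet-converted code:

// target materialization: hand a memref to already-converted ops
%m = tensor_to_memref %t : memref<4xf32>
// source materialization: hand a tensor back to not-yet-converted ops
%t2 = memref_to_tensor %m : tensor<4xf32>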

To be precise, we would define “bufferize” as “the process of converting to buffer types”, and hence mostly refers to a set of conversion patterns that do type conversion (which might involve inserting allocs).

This is then distinct from buffer optimizations and insertion of deallocations, etc. Is that what you were thinking?
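
For example, bufferizing extract_element is pure type conversion - a sketch, with the exact pattern details subject to change:

// before bufferization:
%e = extract_element %t[%idx] : tensor<4xf32>
// after: the operand has been type-converted to a memref,
// and the op becomes a plain load
%e2 = load %m[%idx] : memref<4xf32>

A pattern for an op that computes a fresh tensor would additionally insert an alloc for its result.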

Quick question: are deallocation insertion and buffer optimizations expected to run before and/or after linalg tile-and-fuse on buffers?

Also, what is the expected ordering of deallocation-insertion with respect to buffer optimizations?

Def +1 from me on the tensor_to_memref and back.

Thanks. Could you expand out the op descriptions there: https://github.com/llvm/mlir-npcomp/blob/master/include/npcomp/Dialect/Refback/IR/RefbackOps.td#L45 ? Are those ops functionally equivalent to tensor_load and alloc + tensor_store, respectively, but keeping all the relevant IR together? I’ve always thought that memref-to-tensor conversion could be done by using a mem2reg equivalent (treating the entire buffer as one thing), which we may need at some point anyway and which would be control-flow aware across ops with regions.

Correct. They could be lowered to tensor_load and alloc+tensor_store. In the case of npcomp, we convert all tensors to memrefs eventually, so they all get eliminated by folding tensor_to_memref(memref_to_tensor(m)) -> m.
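
Concretely, once everything is converted, the only remaining bridging pairs look like this and fold away (sketch):

%t = memref_to_tensor %m : tensor<4xf32>
%m2 = tensor_to_memref %t : memref<4xf32>
// folds: all uses of %m2 are replaced by %m, and both ops die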

I would think they would run after linalg tile (and fuse) on buffers: basically, do a straightforward allocation, then tile + fuse; at that point some buffers are unnecessary, so do buffer optimizations on the remaining buffers. Doing it before might lead to some weird dependencies that are not fundamental to the computation.

Also, if you do linalg tile + fuse on tensors, this whole point is moot.

Ok, I’ve been reading the code and have come up with a laundry list of refactorings that I’ll do over the next few working days (@dfki-mako, don’t worry about it). Overall this looks pretty nice, modulo the confusing naming.

  • Transforms/BufferPlacement.h -> Transforms/Bufferize.h and clearly split out those components from BufferPlacement.cpp which will just have the buffer placement pass itself (unrelated to the conversion infra stuff)
  • BufferAssignmentTypeConverter -> BufferizeTypeConverter
  • move bufferization patterns/passes consistently to lib…MyDialect/Transforms/Bufferize.cpp and include…MyDialect/Transforms/Bufferize.h and test…MyDialect/bufferize.mlir
  • add tensor_to_memref ops and upstream the source/target materialization from npcomp into BufferizeTypeConverter. This makes the various bufferization passes (currently added only for testing the bufferization patterns) composable with each other in production pipelines.
  • upstream scf.if/scf.for type conversion from npcomp (not tied to bufferization per se, but it is a drop-in pattern for those conversions, and bufferization is a nice test pass that also happens to be useful for composing into pipelines); see the sketch after this list
  • upstream the extract_element pattern and some other miscellaneous bufferization patterns.
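
For the scf.for part of bullet 5, the structural type conversion just rewrites the loop-carried types, with materializations at the boundary. A hand-written sketch of the intended before/after (details may differ from the actual npcomp patterns):

// before: tensor loop-carried value
%r = scf.for %iv = %lb to %ub step %c1 iter_args(%acc = %t0) -> (tensor<4xf32>) {
  scf.yield %acc : tensor<4xf32>
}
// after: iter_arg, yield, and result types become memref,
// with cast-like materializations bridging the boundary
%m0 = tensor_to_memref %t0 : memref<4xf32>
%rm = scf.for %iv = %lb to %ub step %c1 iter_args(%acc = %m0) -> (memref<4xf32>) {
  scf.yield %acc : memref<4xf32>
}
%r2 = memref_to_tensor %rm : tensor<4xf32>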

All of this sounds good to me. But for the third bullet: which dialect’s transforms are you moving them to? These passes already depend on (or in the future would depend on) interfaces used by region-holding ops, and the Std dialect won’t depend on such interfaces. Why can’t they stay in lib/Transforms?

Regarding bullet 5: could the type conversion be implemented on LoopLikeOpInterface instead of on scf.for? Both affine.for and scf.for are equivalent in that regard; they both support return values. This is, though, a more general question I had about whether conversions and op rewrite patterns can be implemented on op interfaces - I think there are no examples of this.

See the last 2 commits in Linalg.

I’d like to move to a model where each dialect is “self-sufficient” w.r.t. bufferization. I was referring to e.g. https://reviews.llvm.org/D88083, which will be moved to Dialect/Shape/Transforms/Bufferize.cpp

Interesting idea. I’ll look into it.

Thanks for cleaning this up.

There is work underway to separate the bufferization into

  • the patterns (which would be the “bufferize” part)
  • a set of optimizations (loop hoisting, dominance-based hoisting to avoid frees)
  • deallocation insertion

See https://reviews.llvm.org/D87756 for the WIP change.

Our approach so far has been to combine the patterns from various dialects into a single lowering step to buffers. Not arguing against the materialization approach you put forward, just mentioning this so that both variants remain available. In essence, this requires access to the registered patterns.

Nice. @dfki-mako FYI to avoid duplicating work.

Generally the idea is that you can bufferize (also partially) early (if you want), then keep optimizing the IR on buffers, which includes optimizing allocations but can also mean tiling. The placement of deallocations should be relatively late, as it finalizes lifetimes of buffers and makes further optimizations harder.

The buffer optimizations are designed to run before inserting deallocations.

Newbie question here: I’d very much like to use the features shown in the presentation you mention, in particular the conversion patterns BufferAssignmentFuncOpConverter, BufferAssignmentReturnOpConverter, and BufferAssignmentCallOpConverter. They seem to be in the file BufferPlacement.cpp, but this file does not exist in the main branch (just verified). Which branch includes it?

Dumitru

@dpotop note that @_sean_silva did a bunch of refactorings and improvements to make this more progressive and composable.

See the invocation of the test-tensor-matmul.mlir in https://reviews.llvm.org/D90953 for usage.
Basically it’s a bunch of conversion patterns that need to be applied.

I’m not sure I understand. Do you mean that the patch you mentioned (which does not include the file BufferPlacement.cpp) supersedes the previous work, which does include BufferPlacement.cpp? That would be nice, because I could not find a patch for the previous work. BTW: is this patch able to handle function signature and return op conversion?

Also, if this is true: how can I install the patch you mention? I just have the up-to-date llvm-project repository. How can I automate the patch application process, especially if the patch has dependencies? Is there a page explaining it?

You asked about the conversion patterns; the patterns are in each of the conversion passes (xxx-bufferize, e.g. func-bufferize, implemented in createFuncBufferizePass). They do not live in BufferPlacement.cpp as of today.

BufferPlacement became BufferOptimization a few weeks back; see the commit history.

Depending on how you want to “use the features used in the presentation”, you can:

  1. create a new pass, populate the conversion patterns you need.
  2. call the passes in order as is done in https://reviews.llvm.org/D90953
  3. something else

BufferOptimization seems relatively independent from the patterns. It seems the “buffer finalization” is what performs full conversion (https://github.com/llvm/llvm-project/blob/f7bc56826616814a656866fd50e90a35a8e461eb/mlir/test/lib/Transforms/TestFinalizingBufferize.cpp). FuncBufferize seems to be an implementation of that.

@_sean_silva mentioned on Friday that he was going to do a presentation of the refactored pieces. In the meantime, the documentation of each xxx-bufferize pass seems relevant. You can just run mlir-opt --help | grep -A10 bufferize to see what exists and under what name.

Thanks a lot for the previous reply. I have a final, very practical question. Assume I have the following function:

func @myfun1(%i: tensor<10xf32>) -> (tensor<10xf32>) {
  %o = absf %i : tensor<10xf32>
  return %o : tensor<10xf32>
}

How can I bufferize it? mlir-opt --std-bufferize --func-bufferize won’t handle it.
I even thought of adding an explicit map operation around absf (instead of relying on the implicit one in the semantics of absf), but I cannot find a suitable map operation that can be automatically bufferized.

As Nicolas mentioned, I’ve been doing some significant refactoring here. If you wait a week, this will all be much easier. I will present at ODM soon (just signed up for Nov 19 ODM “Type conversions the not-so-hard way: MLIR’s new composable Bufferize passes”)

The thing that isn’t bufferized there is the absf op, because there is no buffer equivalent. I’m waiting on review of some patches that make it work. The short answer is that you need https://reviews.llvm.org/D90731 and https://reviews.llvm.org/D90354 and run -convert-elementwise-to-linalg -linalg-bufferize -func-bufferize. See mlir/integration_test/Dialect/Linalg/CPU/test-elementwise.mlir in https://reviews.llvm.org/D90354
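
For reference, here is roughly what the bufferized result should look like after those three passes - a hand-written sketch, not the actual output of the in-flight patches:

func @myfun1(%i: memref<10xf32>) -> memref<10xf32> {
  // a fresh buffer for the result (inserted by linalg bufferization)
  %o = alloc() : memref<10xf32>
  // the elementwise absf, now expressed as a linalg.generic on buffers
  linalg.generic {indexing_maps = [affine_map<(d0) -> (d0)>,
                                   affine_map<(d0) -> (d0)>],
                  iterator_types = ["parallel"]}
      ins(%i : memref<10xf32>) outs(%o : memref<10xf32>) {
  ^bb0(%in: f32, %out: f32):
    %abs = absf %in : f32
    linalg.yield %abs : f32
  }
  return %o : memref<10xf32>
}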

Excellent! Thank you!
Dumitru