Moving Operations from one Block into another Operation/Block

Hi all,

I have written a utility that I call clustering out of existing MLIR API calls. I admit the naming may not be the best. The API is as follows:

unsigned Cluster(std::set<Operation > &ops,
Block &block,
cluster_op instruction, /
Enum indicating what type the clustering operation is */
mlir::MLIRContext *context);

The aim of the above function is to take a set of operations (usually a subgraph) present within a block, create a new OP (cluster op) within the block, that has a region and blocks within it, and then move the operations into the new OP.

While moving the ops into the new OP, we also need to re-connect ops that used to connect to the moving ops, correctly as to maintain the connectivity of the IR.

I have the above implemented and working in a number of places in our compiler. I feel I may be missing some terminology here, but is there a inbuilt mechanism within the MLIR core that can perform the above, without me having to write the above implementation ? It feels to me like this could be generic functionality. If not do we see this as useful functionality we could generalise and add to MLIR ?


Infer from your wording (my apologies if I misconstrue): there is a splice call in LLVM with an API that is really nasty to use but does what you want.

Alternatively, the RewritePattern has a mergeBlock and a mergeBlockBefore.
@dcaballe was asking for something similar but without a RewritePattern and I believe he is working on something.

Thanks for the ping, Nicolas.

I think creating a new region op that encloses a range of existing ops is a common pattern. In addition to what Nicolas mentioned, inlineRegionBefore is another utility that can be used to move ops across regions but it falls a bit short for more generic cases and it’s part of the pattern-rewrite infra. Having a more generic and flexible utility would be super useful!

I haven’t thought too much about the implementation of such a utility yet but in my use case I’m trying to go from something like:

  parentOp {

to something like:

  parentOp {
    NewOp2 {

If you already have a utility working for something similar I’ll be happy to take a look. Indeed, it’s not a trivial transformation. Maybe we can generalize it and contribute it upstream!

Hi Diego

What you describe in the example above, is exactly the kind of functionality I have implemented.

I am more than happy to share the implementation, but I need a couple of days to try and clean out some dependencies I have that would prevent from making it generic, and able to compile outside our out of tree project.

I will try to post a patch around the middle of next week.


Related to this, I was fighting the rewriter today as there is no RAUW equivalent.
In my particular case, I was moving a region into another but I was left with some RAUW needs: a value defined outside both ops needed to be replaced by a BBArg of the new op.

As a consequence I had to do a post-processing that explicitly clones with bvm and had fun tracking the self-use that I needed to filter out.

If one of you also has a non-cloning, non BBArgs-replacing RAUW for RewritePattern, I’d be happy to review to plug this hole.

I will try to post a patch around the middle of next week.

Great! Looking forward to it!

If one of you also has a non-cloning, non bbArgs-replacing RAUW for RewritePattern, I’d be happy to review to plug this hole.

I have the impression that we should be able to decompose the transformation into multiple utilities. One of them could be what you need here. We’ll see.

Are the ops contiguous in the block? (This goes towards Nicolas’s suggestion, but I wasn’t sure if that is the case). And is it required that all ops are top most in block? (could one, instead of passing in, use the parent block of head op?) [I’d probably use a predicate function rather than std::set to allow different sets to be used and different clustering approaches, such as clustering based on attribute without populating a set]

@aardeb, just for your reference, I implemented something quickly off-tree to unblock my work here. You can take a look at moveOperationsBefore, getDefsWithExternalUses and the code around their invocations to better understand my use case. Hopefully that helps!

Hi all, we (GrAI Matter Labs) have a very similar problem (and implementation) as well. First, we cluster related ops based on MLIR’s slice analysis. Something like:

llvm::SetVector<Operation *> slice;
                      [&states](Operation *op) -> bool {
                          return somePredicate(op);
llvm::SetVector<Operation *> backwardRoots(slice.begin(), slice.end());
for (auto *root: backwardRoots) {
    llvm::SetVector<Operation *> backwardSlice;
            root, &backwardSlice, [&states, &root](Operation *op) -> bool {
                return someOtherPredicate(op);
    slice.insert(backwardSlice.begin(), backwardSlice.end());

return topologicalSort(slice);

In our case the ops need not be contiguous in a block. Afterwards these get moved into some new op region:

template <typename RegionOpTy>
RegionOpTy Cluster::makeRegionOp(OpBuilder &builder, const Cluster& cluster, BlockAndValueMapping &map) {
    Region body;
    mlir::IRRewriter bodyBuilder(builder.getContext());
    prepareBuilderForFloatingRegion(bodyBuilder, body);

    for (auto op : cluster.ops) bodyBuilder.clone(*op, map);
    auto mappedValues = cluster.mappedValuesUsedOutside(map).takeVector();
    auto returnOp = bodyBuilder.create<SomeTerminator>(
        UnknownLoc::get(builder.getContext()), llvm::makeArrayRef(mappedValues));

    auto regionOp = builder.create<RegionOpTy>(UnknownLoc::get(builder.getContext()),
    return regionOp;

and a bunch of updating ops in the original region to use result values of this new op.

If something like that is available builtin we would be very happy to use that as well.

Hi all

I pushed a draft patch of my code here:

The interface is:

template <class TerminatorOp, class ClusterOp>
Operation *Cluster(std::vector<Operation *> &ops,
llvm::iplist::iterator insertClusterOp,
Block &block, mlir::MLIRContext *context)

  • The std::vector<Operation *> &ops do not need to be a linear range but need to be topologically sorted and form a connected subgraph
  • llvm::iplist::iterator insertClusterOp, is the position in the block where the new op (where the ops will be moved) is inserted into the block.

The code I have is a bit convoluted but works. Happy to receive feedback on how to improve it.

@dcabelle/@jschuhmacher looking through your code it seems that a lot of the complexity in my code may be because I do not assume a linear range of ops and take subgraph, since you code seems to get away with trying to find the external op connections and such. I may also be re-implementing some MLIR internal code leading to a more complex solution.