Improving canonicalization hooks

Hi all,

In another random flight of fancy, I’ve been itching to make it easier and prettier to implement canonicalization hooks for the “simple case”. Right now, you have to do something like this in TableGen:

def AttachOp : FIRRTLOp<"attach"> {
...
  let hasCanonicalizer = 1;
}

and then implement it like this:

void AttachOp::getCanonicalizationPatterns(OwningRewritePatternList &results,
                                           MLIRContext *context) {
  struct Folder final : public OpRewritePattern<AttachOp> {
    using OpRewritePattern::OpRewritePattern;

    LogicalResult matchAndRewrite(AttachOp op,
                                  PatternRewriter &rewriter) const override {
      // Single operand attaches are a noop.
      if (op.getNumOperands() <= 1) {
        rewriter.eraseOp(op);
        return success();
      }
      // ...other stuff...
      return failure();
    }
  };
  results.insert<Folder>(context);
}

The problem is that this requires a ton of boilerplate: a nested class, deeply indented logic, etc. There are other variants of these (declaring the struct outside of the method, then optionally declaring matchAndRewrite out of line), but they are all a ton of boilerplate.

I think we can get to something much better (for the simple matchAndRewrite case). I think we can get to:

def AttachOp : FIRRTLOp<"attach"> {
...
  let hasSimpleCanonicalizer = 1;
}

and then implement it like this, which eliminates all the boilerplate [1]:

LogicalResult AttachOp::canonicalize(AttachOp op, PatternRewriter &rewriter) {
  // Single operand attaches are a noop.
  if (op.getNumOperands() <= 1) {
    rewriter.eraseOp(op);
    return success();
  }

  // Other stuff.

  return failure();
}

I prototyped this (without tablegen support) in this patch, which has both a closure approach (not as nice) and the static method approach.

Does this seem like a promising thing to work on, or does anyone have any specific concerns? This wouldn’t remove any of the existing support for the general case.

-Chris

[1] In case you’re curious, this is implemented as a static method taking an ‘op’ instead of an instance method because the hook needs to be able to erase the op (effectively deleting ‘this’), and touching a deleted ‘this’ from an instance method would be UB.
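
As a standalone illustration of the footnote, here is a minimal C++ sketch with mock types (these are stand-ins for illustration, not MLIR’s actual classes) of why the static form is safe where the instance form is not:

```cpp
// Mock op type for illustration only; not MLIR's actual classes.
struct Op {
  int numOperands = 1;

  // Instance-method version: once the op erases itself, `this` dangles,
  // and any subsequent member access is UB:
  // bool canonicalize() { delete this; return numOperands <= 1; /* UB! */ }

  // Static-method version: the op is an ordinary parameter, so the hook
  // can read what it needs, erase the op, and return safely.
  static bool canonicalize(Op *op) {
    bool isNoop = op->numOperands <= 1;  // read before erasing
    if (isNoop)
      delete op;  // stand-in for rewriter.eraseOp(op)
    return isNoop;
  }
};
```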

Having typed the complicated version many times for what could have been a simple case, I’m +1 on this.

I’ve prototyped something similar (but different) in the past, which is to just add sugar support to OwningRewritePatternList:

void AttachOp::getCanonicalizationPatterns(OwningRewritePatternList &results,
                                           MLIRContext *context) {
  results.insert([](AttachOp op, PatternRewriter &rewriter) {
      // Single operand attaches are a noop.
      if (op.getNumOperands() <= 1) {
        rewriter.eraseOp(op);
        return success();
      }
      // ...other stuff...
      return failure();
  });
}
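
For the curious, that sugar can be built with a small trait that deduces the op type from the callable’s first parameter. A self-contained sketch, using mock stand-ins for the MLIR types (PatternList, PatternRewriter, etc. here are assumptions for illustration, not the real API):

```cpp
#include <memory>
#include <utility>
#include <vector>

// Mock stand-ins for MLIR types; names are assumptions for illustration.
struct PatternRewriter {};
enum class LogicalResult { Failure, Success };
inline LogicalResult success() { return LogicalResult::Success; }
inline LogicalResult failure() { return LogicalResult::Failure; }

struct PatternBase {
  virtual ~PatternBase() = default;
  virtual LogicalResult matchAndRewrite(void *op, PatternRewriter &r) const = 0;
};

// Deduce the op type from a callable's first parameter.
template <typename Fn> struct OpTypeOf;
template <typename C, typename R, typename Op>
struct OpTypeOf<R (C::*)(Op, PatternRewriter &) const> { using type = Op; };

struct PatternList {
  // The sugar: accept any callable (OpT, PatternRewriter &) -> LogicalResult
  // and wrap it in a pattern object behind the scenes.
  template <typename Fn> void insert(Fn fn) {
    using OpT = typename OpTypeOf<decltype(&Fn::operator())>::type;
    struct Wrapper final : PatternBase {
      Fn fn;
      explicit Wrapper(Fn f) : fn(std::move(f)) {}
      LogicalResult matchAndRewrite(void *op,
                                    PatternRewriter &r) const override {
        return fn(*static_cast<OpT *>(op), r);
      }
    };
    patterns.push_back(std::make_unique<Wrapper>(std::move(fn)));
  }
  std::vector<std::unique_ptr<PatternBase>> patterns;
};

// A mock op to exercise the sugar with:
struct AttachOp { int numOperands; };
```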

IMO, something like ^ ends up being cleaner and generalizes to more scenarios. WDYT?

– River

Hey River,

In the patch above, you can see that I prototyped basically the same thing. The problem is that you still get way too many levels of indent, shoving the body of the code over to the right. Also, as you’ve written it, you’re depending on clang inferring the result type of the closure, which has a non-zero impact on compile time.

I agree that the method should be added to OwningRewritePatternList, though; there is no reason for it to be a free-standing function. That just made it simple for me to play with out of tree.

-Chris

Also, just to be clear, if you look at the patch, my expectation is that this:

void AttachOp::getCanonicalizationPatterns(OwningRewritePatternList &results,
                                           MLIRContext *context) {
  // should be a method on `results` of course.
  addCanonicalizer<AttachOp>(results, context, canonicalize);
}

and:

  let extraClassDeclaration = [{
    static LogicalResult canonicalize(AttachOp op, PatternRewriter &rewriter);
  }];

will be handled by tblgen, which is why I focused on making it be mechanical.

-Chris

It’s always bugged me that the canonicalizer is in separate code, rather than being directly in the tablegen (like the verifier).

Could it be:
let simpleCanonicalizer = [{ ... }];

Right, I didn’t really read the patch that well :wink:

With a lambda the indent is generally one more to the right than usual, which depends on the complexity of the code of course, but usually isn’t that bad.

While true, if I (and, I suspect, many others as lazy as me) wanted to quickly add a pattern, I would lean towards a lambda (or a callable function, in the overwhelming majority of cases) because the activation energy is smaller.

I personally am more interested in the general case, given that if any of these ops end up with another canonicalization, the user is going to (a prediction that may or may not be true) lean heavily towards shoving everything into one pattern instead of adding multiple. That makes it harder to debug, composability is more difficult, etc. Having an easy transition story there is a necessary step for me to be +1 on making the one-pattern case as simple as in the original post (which I’m not against).

– River

Having C++ code inside of tablegen files (*.td) is just pretty painful for the purposes of in-editor code completion, compilation, and navigation! (and even auto-formatting, I think)

Agreed, but since this is somewhat inherent in tablegen files generating C++ code, and since C++ code is already endemic in tablegen, maybe it’s something we should embrace more completely? From my perspective, we either need to believe that Tablegen solves an important problem, or determine that it is not worth the pain for the benefit it provides and deprecate it.

I think there is give and take here. I look at tablegen like I look at C++ template metaprogramming: I can use it in a constrained way to solve the problems I’m interested in, but I by no means want to completely embrace it in all of my C++ code. Tablegen makes the process of generating C++ easier (by allowing for a more declarative, DSL-like approach), but it is by no means an effective environment for developing C++. This is why we have generally taken the approach (at least upstream) of trying to keep the amount of inline C++ to a minimum, because it is painful to debug/analyze/code-search/<insert anything to do with C++ code here>.

– River

Just wanted to make it clear that I’m +1 in general (this includes that tablegen part). There is no good reason why defining a canonicalization should be that much more difficult than a fold in the simple cases. Just want to make sure that the path from simple canonicalization to non-simple is easy enough to prevent a single mega-canonicalization pattern. (Also happy to just discuss this on the patch itself when it is posted for review)

– River

+1 to making this more lightweight on the specification side.

While this is being discussed and the system is under change: I have always wondered why canonicalizations are strictly tied to operations. In that sense, the definition of what canonical IR is gets hardwired. Would it be reasonable to make this more configurable? Maybe by adding a registry, so that users can add their own canonicalization patterns independent of specifying an operation?

One use case I have for this is around shape dialect. Depending on what the final IR of the lowering pipeline looks like, it can be beneficial to canonicalize shape computations to descriptor accesses (if the target has a runtime encoding of these) or to scalar computations (if the target uses an explicit encoding). So in that sense the canonical IR is different.

A registry would also make it easier to specify additional canonicalizations out of tree without having to register a fake operation for it.

Would it be welcome to change the system in this regard or is this considered too much flexibility?
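
Concretely, the kind of registry I have in mind could look something like this. This is an entirely hypothetical sketch (none of these names exist in MLIR, and the “IR” is mocked as a list of op names): patterns self-register from any translation unit, and the canonicalizer pass collects them all without a fake op to hang them on:

```cpp
#include <algorithm>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical op-independent canonicalization registry; names and types
// are illustrative only. The "IR" is mocked as a list of op names.
using Pattern = std::function<void(std::vector<std::string> &ir)>;

std::vector<Pattern> &canonicalizationRegistry() {
  static std::vector<Pattern> registry;  // function-local: safe init order
  return registry;
}

struct RegisterCanonicalization {
  explicit RegisterCanonicalization(Pattern p) {
    canonicalizationRegistry().push_back(std::move(p));
  }
};

// An out-of-tree user registers a cross-dialect cleanup with no op attached:
static RegisterCanonicalization dropNoops([](std::vector<std::string> &ir) {
  ir.erase(std::remove(ir.begin(), ir.end(), "noop"), ir.end());
});

// The canonicalizer pass simply runs everything that was registered.
inline void runRegisteredCanonicalizations(std::vector<std::string> &ir) {
  for (auto &pattern : canonicalizationRegistry())
    pattern(ir);
}
```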

Actually, it was my mistake - I refactored the patch several times along the way and deleted the variant that I was referring to. It basically ends up looking like this:

void AttachOp::getCanonicalizationPatterns(OwningRewritePatternList &results,
                                           MLIRContext *context) {
  results.insert<AttachOp>(
      context, [&](AttachOp op, PatternRewriter &rewriter) -> LogicalResult {
        // Single operand attaches are a noop.
        if (op.getNumOperands() <= 1) {
          rewriter.eraseOp(op);
          return success();
        }

        for (auto operand : op.getOperands()) {
          // Check to see if any of our operands has other attaches to it:
          //    attach x, y
          //      ...
          //    attach x, z
          // If so, we can merge these into "attach x, y, z".
          if (auto attach = getDominatingAttachUser(operand, op)) {
            SmallVector<Value> newOperands(op.getOperands());
            for (auto newOperand : attach.getOperands())
              if (newOperand != operand) // Don't add operand twice.
                newOperands.push_back(newOperand);
            rewriter.create<AttachOp>(op->getLoc(), newOperands);
            rewriter.eraseOp(attach);
            rewriter.eraseOp(op);
            return success();
          }
        }
        //...
        return failure();
      });
}

As a consequence of the long type names and the arguments that need to be passed into the closure, clang-format pushes the code over by 8 levels of indent instead of 2. This is a big deal to me. I tried other variants (e.g. static free functions, as shown in the patch) to reduce nesting, but that leads to other boilerplate and (with tblgen support) would force you to put the implementation of the folder logic into a specific .cpp file.

In contrast, the “method” variant doesn’t have these problems. You can declare the simple thing with one level of indent:

LogicalResult AttachOp::canonicalize(AttachOp op, PatternRewriter &rewriter) {
  // Single operand attaches are a noop.
  if (op.getNumOperands() <= 1) {
    rewriter.eraseOp(op);
    return success();
  }

  for (auto operand : op.getOperands()) {
    // Check to see if any of our operands has other attaches to it:
    //    attach x, y
    //      ...
    //    attach x, z
    // If so, we can merge these into "attach x, y, z".
    if (auto attach = getDominatingAttachUser(operand, op)) {
      SmallVector<Value> newOperands(op.getOperands());
      for (auto newOperand : attach.getOperands())
        if (newOperand != operand) // Don't add operand twice.
          newOperands.push_back(newOperand);
      rewriter.create<AttachOp>(op->getLoc(), newOperands);
      rewriter.eraseOp(attach);
      rewriter.eraseOp(op);
      return success();
    }
  }
  return failure();
}

This can be put in any .cpp file, all the boilerplate is generated, and it covers by far the most common case without reducing generality.

I think it is a pretty nice mix of tradeoffs.

-Chris

Unrelatedly, I think it would be worth making OwningRewritePatternList carry an MLIRContext to simplify the signature of all the things that take it.

-Chris

The original idea (not sure whether this is currently happening) is that this is important for scalability. Consider a compiler for something like “swift-when-llvm-gets-rewritten-to-use-mlir-pervasively”. You’d have 5 or 6 major phases with different predominant dialects (including machine instructions) and canonicalization happening many times on multiple different levels of abstraction. Each of these major phases will have many other optional dialects that are occasionally involved (e.g. an OpenMP dialect or whatever).

You don’t want canonicalization for the “SIL” level or the “LLVM IR” level to have to juggle all the patterns for the dialects that aren’t active, just because those dialects are linked into the process. Having this be op-driven means the canonicalizer can lazily instantiate the patterns that it wants based on what actually occurs in the IR. This makes it more scalable without requiring the compiler author to manually configure the canonicalizer.
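
As an illustration of the lazy, op-driven instantiation (a hypothetical mock, not the actual canonicalizer implementation; op names stand in for registered op types), the idea amounts to something like:

```cpp
#include <functional>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Mock sketch: per-op pattern factories, run lazily. In the real
// canonicalizer this would key on registered op types, not strings.
using Pattern = std::string;  // stand-in for a rewrite pattern
using PatternFactory = std::function<std::vector<Pattern>()>;

struct LazyCanonicalizer {
  std::map<std::string, PatternFactory> factories;
  std::map<std::string, std::vector<Pattern>> instantiated;
  std::set<std::string> factoriesRun;  // which factories actually fired

  void registerOp(const std::string &name, PatternFactory f) {
    factories[name] = std::move(f);
  }

  // Called per distinct op name encountered while walking the IR; a
  // dialect that never appears in the IR never pays for its patterns.
  const std::vector<Pattern> &patternsFor(const std::string &opName) {
    auto it = instantiated.find(opName);
    if (it == instantiated.end()) {
      factoriesRun.insert(opName);
      it = instantiated.emplace(opName, factories.at(opName)()).first;
    }
    return it->second;
  }
};
```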

-Chris

I’m not exactly sure what you mean here, but this sounds more like a lowering problem than a canonicalize problem to me.

-Chris

To me this is kind of in contradiction with the principle of canonicalization. The “canonical form” of the IR should be well defined for a dialect; it is really about the op semantics and what form is the easiest to analyze and transform, reducing the complexity of transformations by sparing them from having to support multiple equivalent forms of the IR. Some ref: Canonicalization | sunfishcode.github.io

Basically I agree with Chris that this looks more like a lowering question than a canonicalization one.

This is pretty neat!

Something that is relied on in some cases, and that we probably shouldn’t lose, is the ability for an operation A to define a canonicalization pattern “rooted” at operation B. This works fairly naturally right now, with getCanonicalizationPatterns for operation A including a pattern defined to match operation B.
That allows, for example, the producer of a type used by the dim operation to provide some rewriting when matching a dim.

The canonical form of the IR is not the same as the canonical form of a single dialect. If I compose dialects, for example the mhlo dialect with the shape dialect, I also need canonicalization patterns across the two dialects. Currently we place these in the mhlo dialect as we assume it is always composed with the shape dialect, so that works.

In the example I gave, I am composing the shape dialect with the memref dialect, which means I want to canonicalize certain shape expressions to just a dim op, because that is the canonical form to express the concept when using memrefs. Where do I place these patterns? In the shape dialect, assuming it is always composed with the memref dialect? Staying with @clattner’s argument, this would be inefficient, as it adds more patterns than are needed.

This all gets a bit complicated currently, as the memref dialect is also used in places (IREE, I believe) that would much rather have just a pointer and the shape computations on the side. We do not have a dialect for this, so everyone needs to use memref, and my canonical form would no longer work for others. I agree that is a modelling issue, but it is something we have to live with in the meantime.

In summary: I don’t want to change the general system that canonicalizations are tied to ops and only tried if the given operation is in play. But I’d like this to be more composable, so that I as a user of dialect X and Y can write patterns to canonicalize their interplay.

I completely agree. I don’t plan to remove the ability to define getCanonicalizationPatterns to handle the fully general case.

-Chris

+1 on accepting any function with a matchAndRewrite-like signature in place of a pattern.

I’m slightly against the AttachOp::canonicalize(AttachOp op, PatternRewriter &rewriter) variant:

  1. It means that there are two ways to write canonicalizers – more cognitive overhead.

  2. I think it will tend to result in multiple canonicalizations being written in the same function, which gets confusing.

  3. My gut is that enough ops have >1 canonicalization pattern that the “simple” canonicalization variant doesn’t cover enough cases to justify its existence (maybe somebody can spot-check the numbers on the % of cases that are “simple” vs. not).
