[RFC] Allowing to set a default dialect to be used for printing/parsing regions

MLIR already allows dialect authors to customize the printing/parsing of their IR with “custom assembly”. This makes the IR more readable in the context of a specific dialect, even though it can make it harder for someone unfamiliar with the dialect to read it. This exists to various degrees in the affine, linalg, or tf (TensorFlow) dialects.
This can be fairly intrusive: for example, the tf_executor dialect has an island operation that wraps a region:

    %0 = tf_executor.island {
      %1 = "tf.Const"() {value = dense<1> : tensor<i32>} : () -> tensor<i32>
      tf_executor.yield %1
    }

However, for the special case where the nested region contains a single operation, we print it as a one-liner:

    %output, %control = tf_executor.island wraps "tf.Const"() {value = dense<1> : tensor<i32>} : () -> tensor<i32>

Similarly, we’re looking into pattern-matching MHLO reductions to collapse the region into a one-liner; right now MLIR displays:

%106 = "mhlo.reduce"(%105, %93) ( {
^bb0(%arg14: tensor<f32>, %arg15: tensor<f32>): // no predecessors
  %107 = mhlo.add %arg14, %arg15 : tensor<f32>
  "mhlo.return"(%107) : (tensor<f32>) -> ()
}) {dimensions = dense<0> : tensor<1xi64>} : (tensor<512x128xf32>, tensor<f32>) -> tensor<128xf32>

while XLA HLO displays it much more nicely as:

reduce(f32[512,128] input, f32[] 0), dimensions={0}, to_apply=add

When developing a closed system like TensorFlow or XLA with MLIR, it makes sense to allow the author of such a closed system to have a syntax that makes the most sense in the context of that system, even though it can raise the bar for someone unfamiliar with the dialect.
In any case, MLIR keeps the “escape-hatch” of the generic printer which is always available.

When MLIR started, most of the IR was written with operations from the standard dialect. It was considered that it wouldn’t be nice to have to repeat std. in front of every single operation: since it was the main dialect, it should be pretty obvious. By the same logic, when working on a “closed” TensorFlow IR, I have the same motivation not to repeat the dialect prefix!
XLA developers enjoy reading reduce(f32[512,128] input, f32[] 0), dimensions={0}, to_apply=add, and I’d like MLIR to offer the same power to compiler authors who adopt MLIR: we shouldn’t force them to choose between a subpar experience there or not using MLIR at all!

The patch in ⚙ D107236 “Add a new interface allowing to set a default dialect to be used for printing/parsing regions” makes the default prefix (currently std) configurable.
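To make the mechanics concrete, here is a toy Python sketch of the idea (this is not the actual C++ interface from the patch; the `default_dialect` field and the `Op` class are hypothetical stand-ins): a printer consults the immediately enclosing op for a declared default dialect and elides a matching prefix on the ops nested in its region.

```python
# Toy model of default-dialect elision when printing ops nested in a region.
# NOT the MLIR implementation: `default_dialect` is a hypothetical attribute
# standing in for the interface added by the patch.

class Op:
    def __init__(self, name, default_dialect=None, body=()):
        self.name = name                        # fully qualified, e.g. "tfg.AddV2"
        self.default_dialect = default_dialect  # set by ops implementing the interface
        self.body = list(body)                  # ops in the (single) nested region

def print_op(op, enclosing_default=None, indent=0):
    """Print `op`, eliding its dialect prefix when it matches the default
    dialect declared by the immediately enclosing region's parent op."""
    dialect, _, short = op.name.partition(".")
    shown = short if dialect == enclosing_default else op.name
    lines = ["  " * indent + shown]
    # The default for the nested region comes from *this* op only; it is
    # not inherited from further out.
    for child in op.body:
        lines += print_op(child, op.default_dialect, indent + 1)
    return lines

graph = Op("tfg.graph", default_dialect="tfg",
           body=[Op("tfg.Placeholder"), Op("tfg.AddV2")])
print("\n".join(print_op(graph)))
# tfg.graph
#   Placeholder
#   AddV2
```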

This will allow a tfg.graph (which models GraphDef 1:1 and only allows TensorFlow operations) to be expressed without repeating tfg. in front of every single operation:

tfg.graph #tf_type.version<producer = 42, min_consumer = 33> {
  %placeholder, %ctl = Placeholder : () -> (tensor<*xi32>)
  %AddV2, %ctl_0 = AddV2(%placeholder, %placeholder_1) : (tensor<*xi32>, tensor<*xi32>) -> (tensor<*xi32>)
  %placeholder_1, %ctl_2 = Placeholder : () -> (tensor<*xi32>)
}

I don’t anticipate any change for dialects in-tree right now with this new feature: they aren’t the target here.

The only candidate might be the SPIRV dialect, which is already “closed” (no ops other than SPIRV ones are allowed in a SPIRV module). It could look like:

spv.module Logical GLSL450 {
  func @do_nothing() -> () "None" {
    Return
  }
  EntryPoint "GLCompute" @do_nothing
  ExecutionMode @do_nothing "ContractionOff"
}

Note: I’m not proposing to change the SPIRV dialect here, just for the MLIR parser/printer to have an interface allowing other compilers to have slightly more control here, so that we’re not facing side-by-side comparisons between something like MHLO and XLA-HLO where MLIR is unnecessarily more verbose.
I also believe that this extra customization is much less intrusive and less confusing for users than anything we already allow (and do) with the custom assembly.

Something else that can be done as part of this transition is to remove the hard-coded std. dialect dispatch in the parser, putting the std. dialect (for what still remains in it for now) on an equal footing with all the other dialects.


We’ve (Mehdi and I) discussed this in the past a few times, and I am +1 in general. Such a feature is quite useful when you are building out a domain specific compiler, especially when (as you mention), there won’t be inter-mixing of a wide variety of dialects.

I would actually like to consider using this for PDL/PDLInterp, which is similar to SPIRV (and the other dialects you discuss) in that no other dialects can be intermixed. I’ve been staring at a lot of pdl dumps and writing a lot of pdl while prototyping out-of-tree frontends, and having something like this would make that quite a bit easier/more fun.

Big +1. I’d love to kill that.

– River


In the examples above, you use f32 without a prefix. What if my dialect primarily uses the types of another dialect: can I then skip prefixing there too? Or do I need to choose between skipping the prefix of my dialect’s ops and skipping the prefix of the dialect whose types I use? And what happens if they conflict, e.g., what if TFG had an f32 type?

How do error reporting and dumps look in this case? A concern mentioned was ambiguous ops, which is more of an issue with errors than with full dumps (as with errors we don’t have the surrounding context).

This feature would only be for operations. More specifically, we currently have no support for eliding namespaces from anything other than operations, and we would need to make fundamental changes to how attributes/types work to change that.

– River

The elision only affects the textual form (when printing in the custom/pretty form, and only inside of a region that explicitly elided the namespace), so it shouldn’t be any different than how error dumps look today for std/builtin operations (i.e. you should still see the full name in every place that matters). It also only applies to the textual form in specific contexts, where the ambiguity is limited/non-existent (if you print the operation by itself, outside of this context, it will still print with the full name).

I look at this as a pure refinement and elevation of what we already do today for builtin/standard operations.

– River


+1 on this. This also provides more incentive for folks to define their own “module” and “func” ops which is a direction we are headed anyway.

QQ: does this apply recursively? I.e.

foo {
  bar {
    // If bar does not implement the default dialect prefix interface,
    // but foo does, then within bar does the default dialect "reset" or does it
    // "inherit" from foo?
  }
}

IMO, no. We should only elide in immediately nested regions. We don’t want the elision to carry any further than explicitly specified, and having things be implicitly recursive is what helps breed ambiguity and confusion.
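To illustrate with a toy sketch (hypothetical helper, not the real implementation): the default used when printing a nested op comes only from the immediate parent op, so an op that does not declare one resets the default rather than inheriting it.

```python
def elide(full_name, default_dialect):
    """Elide the dialect prefix only when it matches the region's
    declared default dialect (toy model, not the MLIR implementation)."""
    dialect, _, short = full_name.partition(".")
    return short if dialect == default_dialect else full_name

# Hypothetical setup: `foo` declares "foo" as the default for its region,
# but `bar` (nested inside) does not implement the interface, so within
# bar's region there is no default; it does not inherit "foo".
region_default = {"foo": "foo", "bar": None}

# Inside foo's region: foo ops print without the prefix.
assert elide("foo.bar", region_default["foo"]) == "bar"
# Inside bar's region: the default has reset, so the prefix comes back.
assert elide("foo.baz", region_default["bar"]) == "foo.baz"
```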

– River


+1 Looks nice! Thanks Mehdi! This indeed could be a great cleanup for ops nested inside a spv.module! I’ll adopt this afterwards. :slight_smile: (And I can probably change to use the more proper spirv prefix given there is less duplication!)


Yes right now I’ve been only looking at operations, types/attributes are another can of worms entirely.
Right now we still have builtin as the default for types, and changing this would make it so that everywhere we’d read !builtin.f32. But maybe this is just something we should get used to. That’ll be another RFC, for another time… :wink:

It’s pretty common for standard, math, and memref dialect ops to be carried in large regions/bodies. I’ve been a -1 on this change, given that eliding the dialect prefix based on the surrounding op makes the IR harder to read: it wouldn’t be clear from the file itself which dialect’s op is being referred to, and the regions can be large. Customizing away the dialect prefix this way also isn’t good for uniformity, I think. Having the textual form display one thing in a specific context (elide the dialect prefix when printing along with the parent op) and another thing in a different context (dump the op by itself) doesn’t sound great either. Is the only benefit that it makes it a bit easier to hand-write MLIR, saving a prefix? That’s a relatively small savings: you still have the operands, types, etc. to type out, and the IR is verbose for other good reasons anyway.

I don’t yet see the benefit this brings for the complexity, context-sensitivity, and non-uniformity it adds to the ecosystem when looking at things globally (developing across several dialects). For example, this change would make it possible (it’s a choice) to custom-print IR like the following (which I think we should disallow by design, just like the parser prevents certain syntax customization):

scf.for
  for
    for 
      scf.if (..)
        scf.for 
          if (..)

or

scf.for
  affine.for
    for .. {
      scf.execute_region {
        for ...
      }
    }
...

@_sean_silva I didn’t understand what you meant here. This is a print elision for the custom form; how would it impact a decision on creating a func/module op for a dialect? In fact, it’d be pretty misleading for a reader to see func/module appearing in the IR without knowing whether it’s a builtin.func op or a some_dialect.func op, because unless you know the context (look above the op) and know from the codebase (or prior knowledge) that the dialect chose to elide prefixes, you won’t know which func op is being referred to. This is the reason I’m against this proposal for now. As for module itself, since it’s typically going to be the root op of your dialect, I don’t think you can elide the dialect prefix at all, even with the proposal here; @mehdi_amini can clarify.

Your own custom module/func ops would let you opt in to this bonus readability benefit in the context of self-contained systems where this makes sense (well, I consider that this would be a readability benefit where it makes sense; it seems you disagree on that point).

I think this depends on the context. We all come into this discussion with our own biases based on the IR we “typically” read. I’m pretty convinced by Mehdi’s motivating example here, and my own experience with the npcomp torch dialect and various IREE dialects leans me towards thinking this would be useful to have in general. (why should IREE’s vm dialect or the llvm dialect, which tend to have very short ops and are very self-contained, need to prefix when analogous std ops do not?)

I think the coherent alternative here is to entirely prohibit elision of the dialect prefix for all ops, which seems to introduce an unnecessary degree of second-classness to compilers built with the MLIR system. If we are going to keep the dialect-prefix-omission feature at all, this proposal seems like a good way to control it.

Also, I think we’re seeing a trend towards more dialects as self-contained layers (including their own func and module-like ops) with reasonably strong invariants about which ops their regions may contain, away from the “grab-bag mix of dialects at any point in the IR” cambrian explosion we are emerging from. In that world we’re trending towards, it’s less confusing to omit the prefix, and also more useful.

Doesn’t a similar thing exist for the builtin dialect as well, and would it be similarly removed there too? If yes, won’t it force a module -> builtin.module everywhere one currently has module? But func would stay the same if it’s immediately nested in a builtin.module.

I personally don’t see how eliding your custom dialect prefix on your dialect’s module and func ops, which are going to be a handful in the common case, improves readability. If anything, it hurts readability by not distinguishing them from a builtin.func. Also, IIUC, you can’t elide the dialect prefix on the root op of the IR (it’s only in the immediately nested block of a region-holding dialect op). It further hurts when your dialect reuses any “core” region-holding op abstraction from one of the in-tree dialects like std, builtin, or scf: you won’t get your dialect’s prefix elision within such ops, because the immediately surrounding parent op has changed. I have a feeling you are mostly focusing on the scenario of staying fully contained in a single dialect, as opposed to a well-defined but still small mix of dialects. (I added an example in the post above intermixing scf and affine ops, which is a realistic use case as well.)

I don’t see anything bad in eliding the prefix just for the builtin and std dialects. Earlier, the latter had ops that were widely and pervasively used on lowering paths: most CPU and GPU code generation went through them, because nearly all of them have had a lowering to the LLVM dialect. These dialects’ op names also do not require a prefix for context: this is perhaps also true for all the math dialect ops, but not for the memref dialect ops, as you already know. Depending on how std changes, there is perhaps also the option of either forcing the std prefix on the remaining ops or merging them into the builtin dialect.

Yes, but even with those invariants on what ops regions will have, you could have ops from a mix of dialects: e.g., math, memref, and std dialect ops could all appear in one dialect’s region-holding op. I don’t see how that’s an argument in favor of elision: you still run into poorer readability and ambiguity. (I’m not referring to any unconstrained arbitrary mixing of dialects or any fuzz-testing scenarios.)

Yeah, this is the main situation where this feature is useful. No one so far has suggested using this feature in any other case (AFAIK), where readability problems may occur. Any use of this feature should still have the same amount of scrutiny that is placed on other features, but the use cases people in this thread have in mind do not suffer from any of the problems you mention.

It feels like you are anchoring very strongly on one very specific workflow and use case, which is not the point of this feature. In all of the use cases that have been described here, the problems you are describing don’t exist. For example, in PDL it isn’t possible to use any dialect aside from PDL inside of a PDL region. Everything is contained in such a way that the pdl parts don’t inter-mix with anything else. There is no possible point of confusion, because there is no possible way to use anything from any other dialect. When working on a PDL compiler (inside of pdl regions), developers don’t need to think about the “what if” situations, because they aren’t possible by design.

This is part of the previous point. The use of std is only pervasive in specific workflows; many never use anything from std. For example, a pdl compiler and workflow won’t ever see or interact with an operation from std (or the related “core” dialects).

This is the point that potential users in this thread have been trying to make, but for other dialects than the ones you mention (the dialects you mention, e.g. std/math/etc., are the ones where this feature generally doesn’t have an obvious win or benefit). From the perspective of specific compilers, the dialect of the operations is immediately obvious from context. Inside of a pdl region, you already know that you can’t have any other dialect.

– River

Can you clarify how this is relevant to River’s comment? For example, where would there be confusion with respect to standard, math, and/or memref dialect ops if we were to elide the pdl prefix in a PDL pattern?

Also, even if they are carried in large regions, their prefix would be explicit, so I’m puzzled why this feature would interact with this at all.

MLIR is already non-uniform because of custom assembly printers, and I see custom assembly as much more intrusive in how it changes the syntax than eliding the prefix.

I argue that the parser is already extremely liberal in what it allows, and you can implement a terrible syntax for scf, affine, etc.
If your main angle of argument is that “someone may do something not good with the feature”, then we may not be able to reach an agreement here: I don’t see this as a valid line of argumentation, and it is almost contrary to what MLIR allows (hey, we let people design their own IR, but they may make terrible choices).

If the for inside the scf.for is an scf.for, then it means that scf.for elides the scf prefix, which would elide it for the scf.if as well. If scf.if would also elide the scf prefix, you would instead have:

scf.for
  for
    for 
      if (..)
        for 
          if (..)

Assuming this is the case, then why wouldn’t we allow the optimal solution for such cases? Why wouldn’t we allow a TensorFlow GraphDef that won’t include anything other than TensorFlow ops to elide the prefix? What issue do you see with a PDL pattern eliding the pdl prefix?
You bring up a lot of possible bad designs to show how this feature can be misused, but that’s not an argument against how useful it can be!


I think you are missing the analogy I was making w.r.t. the parser. For example, cf. these doc comments from upstream (in include/mlir/IR/DialectImplementation.h and include/mlir/IR/OpImplementation.h):

    /// The DialectAsmParser has methods for interacting with the asm parser:
    /// parsing things from it, emitting errors etc.  It has an intentionally
    /// high-level API that is designed to reduce/constrain syntax innovation in
    /// individual attributes or types.

    /// The OpAsmParser has methods for interacting with the asm parser: parsing
    /// things from it, emitting errors etc.  It has an intentionally high-level API
    /// that is designed to reduce/constrain syntax innovation in individual
    /// operations.

Similarly, I felt we should avoid custom ASM print innovation in the interest of simplicity, readability, and avoiding confusion. Doesn’t this analogy make sense?

The parser has had this comment from early on (July 2018!!), but I don’t think this is as much of a limit as originally thought when it was written: it has evolved and really offers a significant amount of “syntax innovation” in its current form (and we have gained experience with this, as shown in some of the examples around here).
The real strong limit the parser has left is that new SSA values must be on the LHS of an assignment before the op name. (I was in favor of lifting this early on, by the way, allowing custom dialects to define their DSL with something like %a op %b -> %result if they wished.)

Anyway, to come back to my comment:

Do you not believe that the freedom currently offered by the ASM parser allows exotic and potentially terrible syntax?
Do you think that people reading the tf_executor.island single-line syntax obviously know that there is a nested region with a terminator? What do you think of the people who complained that the syntax of linalg or affine required more effort from them to get started with the project than a uniform generic syntax would?

And there is the additional argument of whether scf should even opt in to this feature in the first place. I think, given how it is typically used, we would probably not apply this feature to scf ops, so this example is fairly moot. Overall, I don’t think we would apply this feature to any of the ops that Uday mentions as problematic (indeed, it would be fairly problematic for them).


Yeah, I didn’t mention it, but Sean is right: this was very theoretical, as this feature is not intended to be used by scf.for and in similar situations.

To get back to the “contextual” aspect of the IR that seems to be an issue for Uday, I’d like to demonstrate that this ship already sailed with custom assembly. Two more examples:

  %0 = ...
  test.isolated_region %0 {
    "foo.consumer"(%0) : (index) -> ()
  }

The two uses of %0 in this IR example are not the same SSA value. The first one is unsurprisingly the visible definition %0 = ..., but the second one is actually a hidden block argument of the inner region.

Another example is the tfg dialect which defines its own function operation:

tfg.func generic @gfoo(%arg0 : !tf_type.tensor, %arg1 : !tf_type.tensor) {

This function has one particularity: it has twice as many block arguments as you would expect (so 4 block args in this example), because for each argument it defines an extra argument to express possible control dependencies on that argument. Inside the function body, this extra argument can be referred to by suffixing the argument name with .ctl, for example:

  %add, %ctl = tfg.AddV2(%arg0, %arg1) [%arg1.ctl] : (!tf_type.tensor, !tf_type.tensor) -> (!tf_type.tensor)

Here %arg0 refers to the first block argument, %arg1 to the third block argument, and [%arg1.ctl] to the fourth.
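To make the mapping concrete, here is a toy sketch (hypothetical helper, not the actual tfg parser) of how such suffixed names could resolve: each declared argument i owns block-argument slots 2*i (the value) and 2*i + 1 (its control token), and the .ctl suffix selects the odd slot.

```python
def resolve_block_arg(name, declared_args):
    """Map an SSA name like '%arg1' or '%arg1.ctl' to a block-argument
    index, where each declared argument i owns slots 2*i (value) and
    2*i + 1 (control). Toy model of the tfg.func convention."""
    base, _, suffix = name.partition(".")
    i = declared_args.index(base)
    return 2 * i + (1 if suffix == "ctl" else 0)

args = ["%arg0", "%arg1"]
assert resolve_block_arg("%arg0", args) == 0      # first block argument
assert resolve_block_arg("%arg1", args) == 2      # third block argument
assert resolve_block_arg("%arg1.ctl", args) == 3  # fourth block argument
```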
