[ODS] Support dialect specific content emission via hooks

Hi everyone,

Thus far op classes generated from ODS are closed: we can only have the same set of methods even for operations in different dialects. This is in contradiction to the openness of MLIR in general and is problematic for dialects that want to generate additional operation class methods programmatically, e.g., a special builder method or attribute getter method. Apparently we cannot update the OpDefinitionsGen backend every time when such a need arises.

So I think we should introduce a hook into the OpDefinitionsGen backend to allow dialects to emit additional methods and traits to operation classes according to their needs.

Specifically, introduce the following definitions:

using DialectEmitFunction =
    std::function<void(const Operator &srcOp, OpClass &emitClass)>;

// ODSDialectHookRegistration provides a global initializer that registers a
// dialect specific content emission function.
struct ODSDialectHookRegistration {
  ODSDialectHookRegistration(llvm::StringRef dialectName,
                             DialectEmitFunction emitFn);

Then in OpDefinitionsGen we can have

static llvm::ManagedStatic<llvm::StringMap<DialectEmitFunction>> dialectHooks;

ODSDialectHookRegistration::ODSDialectHookRegistration(
    StringRef dialectName, DialectEmitFunction emitFn) {
  bool inserted = dialectHooks->try_emplace(dialectName, emitFn).second;
  assert(inserted && "Multiple ODS hooks for the same dialect!");
  (void)inserted;
}

Using ManagedStatic for hooks to make the system extensible is already a commonly used approach in MLIR, so hopefully this should not be too surprising. I have a prototype at https://reviews.llvm.org/D72514. Please feel free to share your thoughts! :slight_smile:

What is the use case for this? Can you provide examples that would motivate this before we add this?

Also, this is intended to inject code in the mlir-tblgen but code we don’t want in the compiler otherwise, how is a dialect implementor supposed to achieve this?

What is the use case for this? Can you provide examples that would motivate this before we add this?

This feature allows us to inject traits and methods to the generated C++ op class according to a dialect’s specific logic.

@nicolasvasilache has one use case for the linalg dialect and he probably can comment more on it. My impression is that Nicolas wants to divide linalg ops into multiple categories and based on which category an op is, emit additional class methods pertain to the specific category.

I have one use case for the spv dialect, where certain ops will have minimal version requirement. I’d like to have those ops to have QueryMinVersionInterface::Trait and the interface implementations. I intend to leverage this feature to register an ODS hook to probe an op’s list<Availability> field to see if there is MinVersion<...> inside it. If so then put QueryMinVersionInterface::Trait into the parent class list and generate the implementation.

this is intended to inject code in the mlir-tblgen but code we don’t want in the compiler otherwise, how is a dialect implementor supposed to achieve this?

I don’t quite get what you mean here. Could you elaborate a bit? The hook will be invoked at compilation-time by mlir-tblgen and the C++ code are generated into operation classes.

Isn’t it something we could achieve purely on the ODS side? For example could the ops inherit from a base class (in TableGen) that could add the traits and/or the extra class declarations?

The question is not how the dialectHooks is invoked, but how is it populated?

Isn’t it something we could achieve purely on the ODS side? For example could the ops inherit from a base class (in TableGen) that could add the traits and/or the extra class declarations?

That means we will have lots of duplicated information in ODS for an op. For example, for a SPIR-V op I can have four availability controls: MinVersion, MaxVersion, Capability, Extension. Instead of having an op definition like:

def SPV_GroupNonUniformBallotOp : SPV_Op<"GroupNonUniformBallot", []> {
  let availability = [
    MinVersion<SPV_V_1_3>,
    Capability<[SPV_C_GroupNonUniformBallot]>
  ];
};

Where the availability interface traits and implementation can be inferred from availability field, one now need to write

def QueryMinVersionInterface : OpInterface<"QueryMinVersionInterface"> { ... }
def QueryMaxVersionInterface : OpInterface<"QueryMaxVersionInterface"> { ... }
def QueryExtensionInterface : OpInterface<"QueryExtensionInterface"> { ... }
def QueryCapabilityInterface : OpInterface<"QueryCapabilityInterface"> { ... }

def SPV_GroupNonUniformBallotOp : SPV_Op<"GroupNonUniformBallot", [
         DeclareOpInterfaceMethods<QueryMinVersionInterface>,
         DeclareOpInterfaceMethods<QueryCapabilityInterface>
]> {
  let availability = [
    MinVersion<SPV_V_1_3>,
    Capability<[SPV_C_GroupNonUniformBallot]>
  ];
};

Other ops may have different combinations of these four availability controls. So having a base class for each possible combination to “simplify” the final op definition also incurs lots of duplication (2^4 cases). I’d prefer to keep the ODS clean and having single source of truth and auto-generate as much as possible.

The question is not how the dialectHooks is invoked, but how is it populated?

Similar to other ManagedStatic registration, it relies on declaring a global static ODSDialectHookRegistration in a file compiled together with OpDefinitionsGen.cpp into the final *-tblgen binary.

For example I can declare such a hook in SPIRVUtilsGen.cpp and got it populated into dialectHooks via

static mlir::tblgen::ODSDialectHookRegistration
    genOpAvailabilityImpl("gen-spirv-avail-impls",
                          emitSPIRVAvailability("spv", emitAvailabilityImpl);

Other than the DeclareOpInterfaceMethods this seems identical in terms of information isn’t it?
And so injecting the DeclareOpInterfaceMethods in the base class (SPV_Op) or just leaving this as is here seems fine to me.

Right, back to my question: how do you get this compiled in the tblgen binary is still what I’m after.

The tblgen binary is standalone and does not link any dialect. An out-of-tree user does not need to modify the MLIR tblgen binary today to use ODS.

You could have SPV_Op add those in ODS, that is why we have base classes for the ops, and so have class SPV_Op adding those or using “mix in” trait. Given the lazy evaluation semantics, one could even modify the traits of the op inside the class (that may make it more complicated, one would actually need to prototype it to see the cost/complexity). The point is not about specifics, but rather the general that this need not make adding ops much more complicated and that that complexity could be captured in the base class for the ops.

If we do look at the specifics, isn’t an example of something we want a general interface for and this is just a prototype for SpirV to get experience with approach before a general one is created? General versioning is the goal. So the complexity in the base class would only persist until then. Now of course, different folks might have different requirements, and in which case this amounts to two traits extra per op defined.

I think its good to highlight what could be done with this extension rather than custom TableGen backend (as we use for exporting tflite flatbuffers for example, we didn’t modify mlir-tblgen). I’d say my main concern would be that ODS’s C++ classes are not stable, I wouldn’t even call them an unstable API. So we might change 90% of them and this seems to open that more/provide less structure. And then it is less clear when they should use this vs create a new backend.

Other than the DeclareOpInterfaceMethods this seems identical in terms of information isn’t it?
And so injecting the DeclareOpInterfaceMethods in the base class (SPV_Op) or just leaving this as is here seems fine to me.

You could have SPV_Op add those in ODS, that is why we have base classes for the ops, and so have class SPV_Op adding those or using “mix in” trait. Given the lazy evaluation semantics, one could even modify the traits of the op inside the class (that may make it more complicated, one would actually need to prototype it to see the cost/complexity). The point is not about specifics, but rather the general that this need not make adding ops much more complicated and that that complexity could be captured in the base class for the ops.

Injecting it at base SPV_Op class means all ops are forced to implement all these interfaces. A large portion of ops (e.g., arithmetic ops) simply do not have any version/extension/capability requirement at all. That would be unfortunate to still generate all the interface implementation for all of them. It bloats the library and makes availability probing slower compared to just check isa<*Interface>.

Having DeclareOpInterfaceMethods on each of the op (or letting each op to override the base one, which is the same) is duplicating information. It would be tolerable if there is only a handful of such cases, but many ops are gated by versions/capabilities/extensions and cases can be quite diverse: op A may have max-version & capability requirement, op B may have min-version & extension requirement, etc. That means a large amount of duplication. I don’t think that goes with the general philosophy of ODS of trying to have single source of truth.

Besides, the availability field directly in a SPIR-V op definition is just for the opcode itself. If I have an opcode itself is available from, say, version 1.1 and the op has an enum attribute that has a value whose availability requires, say, an capability Shader, then the op should declare two interface, one for the min version, one for the capability. Now it’s all manual: whoever adding the op needs to write the argument list, and also check all argument’s availability requirements and figure out what additional interfaces are. It’s more tedious and error prone than letting an ODS hook to look at the op and its argument list to figure out what traits to add programmatically.

Right, back to my question: how do you get this compiled in the tblgen binary is still what I’m after.
The tblgen binary is standalone and does not link any dialect. An out-of-tree user does not need to modify the MLIR tblgen binary today to use ODS.

IIUC, the mlir-* tools are just binaries for MLIR core’s own use and we said other projects are free to define their own ones wherever suitable. For example, we have tf-opt in TensorFlow to use instead of mlir-opt.

My understanding is that, one can define its own my-tblgen like

add_tablegen(my-tblgen
MyOwnGen.cpp
${MLIR_SOURCE_DIR}/[tools/mlir-tblgen/OpDefinitionsGen.cpp](http://tools/mlir-tblgen/OpDefinitionsGen.cpp)
MyODSHook.cpp
)

and then use my-tblgen to generate stuff.

If we do look at the specifics, isn’t an example of something we want a general interface for and this is just a prototype for SpirV to get experience with approach before a general one is created? General versioning is the goal. So the complexity in the base class would only persist until then. Now of course, different folks might have different requirements, and in which case this amounts to two traits extra per op defined.

This is a good point. If the op availability mechanism is integrated directly into base Op and other definitions, I guess my use case can be solved too. The problem is mainly that the generated C++ class for the op is closed so adding traits pragmatically is not possible, if not touching existing OpDefintionsGen.

I think its good to highlight what could be done with this extension rather than custom TableGen backend (as we use for exporting tflite flatbuffers for example, we didn’t modify mlir-tblgen). I’d say my main concern would be that ODS’s C++ classes are not stable, I wouldn’t even call them an unstable API. So we might change 90% of them and this seems to open that more/provide less structure. And then it is less clear when they should use this vs create a new backend

A custom TableGen backend definitely takes full control of everything. But I think still we cannot easily have additional contents plus all the ODS generated contents for op classes, unless we want to duplicate all the logic already existing in ODS and modify the code in the new TableGen backend. I think that should not be a favorable approach.

I proposed this because @nicolasvasilache told me that he also has a use case. So I think this can be a general useful feature to have. But maybe Nicolas also has his own workarounds for things. :slight_smile:

I don’t believe this is correct. Some tools like mlir-opt are intended to be testing tool for the libraries.
mlir-tbgen is not a testing tool and is a standalone utility, as far as I know ODS was entirely based on the TableGen input file and didn’t involve a custom binary: so you’re introducing quite a paradigm change in the tool here.
At this point I rather see this change reverted and discussed first.

Sure, it is reverted now.