Custom/extended builders for Python bindings

@ftynse @joker-eph

I’m starting to run up against the limitations of the default builder that ODS generates for the Python dialect wrappers, and before spending any time on it, I wanted to check whether we already had a design in mind.

I feel like to get any real generality, we are going to need to go all in and implement a C++-side generic builder that can match the builder signatures properly. I could see this being exposed to the generated Python code as something like:

```python
@_ods_cext.add_opview_builder(
  _ods_cext.def_builder( ... need to figure out what goes here ... ),
  _ods_cext.def_builder( ... need to figure out what goes here ... )
)
class BatchMatmulOp(_ods_ir.OpView):
  OPERATION_NAME = "linalg.batch_matmul"
```

The `add_opview_builder` decorator would install an appropriate `__init__` method on the class and could also augment the documentation systematically.
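To make the proposed shape concrete, here is a pure-Python mock of what such a decorator could do. The proposal is to do the matching natively in C++; this sketch only illustrates the behavior. `add_opview_builder`, `def_builder`, `BuilderSpec`, the `Value` and `OpView` stand-ins, and the builder parameters are all hypothetical (the post deliberately leaves the `def_builder` arguments open). Signature strings are computed once at decoration time, in the spirit of paying costs at load time rather than per op instantiation.

```python
from dataclasses import dataclass
from typing import Callable


class Value:
    """Stand-in for an SSA value handle (hypothetical)."""
    def __init__(self, name: str):
        self.name = name


class OpView:
    """Stand-in for _ods_ir.OpView."""


@dataclass(frozen=True)
class BuilderSpec:
    params: tuple                 # ((param_name, python_type), ...)
    build: Callable[..., dict]    # maps matched arguments to op state


def def_builder(*params, build):
    """Hypothetical: describe one builder overload."""
    return BuilderSpec(tuple(params), build)


def add_opview_builder(*specs):
    """Hypothetical decorator: installs an __init__ that dispatches over specs
    and appends the builder signatures to the class docstring."""
    # Computed once, when the class is decorated, not per construction.
    sigs = ["(" + ", ".join(f"{n}: {t.__name__}" for n, t in s.params) + ")"
            for s in specs]

    def decorate(cls):
        def __init__(self, *args):
            for spec in specs:
                if len(args) == len(spec.params) and all(
                        isinstance(a, t) for a, (_, t) in zip(args, spec.params)):
                    self.state = spec.build(*args)
                    return
            got = ", ".join(type(a).__name__ for a in args)
            raise TypeError(f"no {cls.__name__} builder accepts ({got}); "
                            f"candidates: {'; '.join(sigs)}")

        cls.__init__ = __init__
        cls.__doc__ = (cls.__doc__ or "") + "\nBuilders:\n" + "\n".join(
            "  " + s for s in sigs)
        return cls

    return decorate


@add_opview_builder(
    def_builder(("lhs", Value), ("rhs", Value),
                build=lambda lhs, rhs: {"operands": [lhs, rhs]}),
    def_builder(("lhs", Value), ("rhs", Value), ("init", Value),
                build=lambda lhs, rhs, init: {"operands": [lhs, rhs],
                                              "outs": [init]}),
)
class BatchMatmulOp(OpView):
    OPERATION_NAME = "linalg.batch_matmul"
```

With this sketch, `BatchMatmulOp(Value("a"), Value("b"))` selects the first builder, while `BatchMatmulOp(1)` raises a `TypeError` that lists both candidate signatures, which is the kind of error message a native implementation could also produce.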

Right now, we generate some relatively… twisty… Python code for materializing the default builder. I feel like in order to have a hope of efficiency (relatively – this is still Python) and good error messages, we just need to implement this facility natively. That way we can build a pretty efficient pattern matcher that uses normal C++ types and avoids a lot of dynamic dispatch (paying the cost at dialect load time instead of per op instantiation). Also, I really don’t want to be TableGen-generating a non-trivial search over a builder pattern-match tree in Python :)

Also, I was hoping that someone had an opinion about how we should map region-based op builders. In Python, I’ve found it moderately nice to just instantiate the regions on the op and then access them by a symbolic name to keep building (as opposed to the C++ style of passing a callback).

For reference, I have generated the Python wrappers for a handful of core dialects: MLIR Python Dialect Bindings (on GitHub).

If there’s more than one way of initializing an op, you’d still have to dispatch somewhere. Are you thinking of doing it on pybind11::object in C++ code?

For simple cases, I would agree with a named accessor to the region plus the current insertion-point context manager mechanism, maybe with an extra mechanism for creating blocks in the region.

There may be more complex cases where the (non-default) builder creates a region, initializes its entry block, and passes block arguments in some semantically meaningful way to the callback, or where the builder expects the region to have specific properties in order to complete, e.g., having a terminator from which to infer the op results. The former can still be exposed as op properties for delayed region construction, e.g. loop.induction_var and loop.carried_args, and used as normal values. I’m not sure we can get away without a callback for the latter.
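The two cases can be sketched with stand-in classes (all names hypothetical): a loop-like op whose builder creates the entry block eagerly and exposes `induction_var` and `carried_args` as properties, and an op that infers its result types from its terminator and therefore has to take a body callback.

```python
class Block:
    def __init__(self, arg_types):
        self.arguments = [f"%arg{i}: {t}" for i, t in enumerate(arg_types)]
        self.operations = []          # (op_name, operand_types) pairs


class ForOp:
    """Stand-in loop op: the builder creates the entry block eagerly and
    exposes its arguments as properties, so no callback is needed."""
    def __init__(self, iter_arg_types):
        # Entry block: induction variable first, then loop-carried values.
        self.body = Block(["index", *iter_arg_types])

    @property
    def induction_var(self):
        return self.body.arguments[0]

    @property
    def carried_args(self):
        return self.body.arguments[1:]


class InferFromTerminatorOp:
    """Stand-in op whose result types come from its terminator, so the
    builder must run the body callback before the op is complete."""
    def __init__(self, body_builder):
        self.body = Block([])
        body_builder(self.body)
        if not self.body.operations or self.body.operations[-1][0] != "yield":
            raise ValueError("body must end with a yield to infer result types")
        self.result_types = list(self.body.operations[-1][1])
```

For example, `ForOp(["f32"]).induction_var` is `"%arg0: index"` and can be used like any other value, whereas `InferFromTerminatorOp` cannot know its `result_types` until the callback has produced the terminator.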

I wasn’t sure yet. It is similar to the way that pybind11 does overload resolution, but probably not close enough to reuse the exact mechanism (at least not without a lot of template-fu). So maybe we implement a similar mechanism on the C++ side.

My meta question was more: do we want to go there and support the non-default builders as overloads?

I suppose one issue is that the current builders carry no qualification saying which ones suit which language, so it might not be easy to map only the ones that make sense for a given language.

Related question, since you’ve spent the most time looking at this: do you think we’re going to get there without some language-specific ODS builder declarations? Or should the current builder ODS be sufficient for what we want to do?

From what I understand (I spent 10 minutes tops staring at the code), overload resolution in pybind11 boils down to dynamic dispatch on the actual Python argument types using the Python C API. Here are the actual “cast” calls in pybind11/cast.h (pybind/pybind11 at commit 5469c238c878193bbfce1a67ab28db2508ef5a41; these get called through a bunch of variadic template dark magic), and here is the driver in pybind11/pybind11.h (at commit 98f1bbb8004f654ba9e26717bdf5912fb899b05a). So far, it doesn’t look like there is much optimization happening there, just a linear search over all possible overloads.
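For intuition, a linear-search dispatcher of the kind described can be written in a few lines of Python. This is a simplified illustration, not pybind11’s actual algorithm: pybind11 also makes a first pass without implicit conversions before a second pass that allows them, whereas this sketch performs no conversions at all. `overloaded` and `make_attr` are hypothetical names.

```python
import inspect


def overloaded(*impls):
    """Minimal pybind11-style dispatcher: try each overload in registration
    order and call the first whose annotated parameter types all accept
    the arguments. Cost is a linear scan per call."""
    candidates = [(f, list(inspect.signature(f).parameters.values()))
                  for f in impls]

    def dispatch(*args):
        for f, params in candidates:
            if len(params) == len(args) and all(
                    isinstance(a, p.annotation) for a, p in zip(args, params)):
                return f(*args)
        names = ", ".join(type(a).__name__ for a in args)
        raise TypeError(f"no overload accepts ({names})")

    return dispatch


def _from_int(value: int):
    return ("int", value)


def _from_str(value: str):
    return ("str", value)


make_attr = overloaded(_from_int, _from_str)
```

Here `make_attr(3)` returns `("int", 3)` and `make_attr(1.5)` raises `TypeError` only after every candidate has been tried.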

I don’t think I have a strong preference or incentive for going one way or another. There is undoubtedly an issue of having to respect all the op invariants in the generic builder, which are usually taken care of by the custom builders. There is also a trend toward delegating more and more to ODS instead of writing custom builders (result type inference, region callbacks), which can help us generate easier-to-use default builders. I am also annoyed by the lack of common patterns in custom builders (every dialect does a different thing) and frustrated that our best way of creating operations is looking at ODS in a separate window and guessing which argument types are necessary. So I am secretly hoping to get a better interface in Python and push that back to C++.

There aren’t that many ops with regions, and we already have some qualification with SingleBlockRegion/SingleBlockImplicitTerminator. That being said, because there aren’t that many ops, we can also special-case the relevant ones.
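As an example of how such a qualification could drive a special case, here is a hypothetical helper for single-block ops with an implicit terminator: the caller fills the body and the terminator is appended automatically. The helper name and the default `scf.yield` terminator are assumptions for illustration.

```python
import contextlib


class Block:
    def __init__(self):
        self.operations = []          # op names, in insertion order


@contextlib.contextmanager
def implicit_terminator_body(block, terminator="scf.yield"):
    """Hypothetical helper for ops declared with SingleBlockImplicitTerminator:
    the caller fills the block, and the terminator is appended on a normal
    exit unless the caller already added it."""
    yield block
    # Not reached if the with-body raised, so a failed build adds nothing.
    if not block.operations or block.operations[-1] != terminator:
        block.operations.append(terminator)
```

Usage is `with implicit_terminator_body(op_body): ...`; after the `with` block the body always ends in exactly one terminator.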

I have been worried about using C++ types directly: we’ll need some way of defining the mapping between those types and language-specific types (which we already do in llvm-project/ at main), and we can get clashes. For example, if we had a builder that takes a float and a builder that takes a double, it’s unclear which one to call with a Python float. Other than that, it sounds feasible, with some minor annoyances like recognizing that DenseSet<int> & and ::llvm::DenseSet<int>& are the same type.
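Both problems can be checked mechanically. The sketch below assumes a hypothetical C++-to-Python type table, canonicalizes type spellings with a deliberately incomplete normalizer (whitespace around punctuation, a leading `::`, and `llvm::` qualifiers only), and reports builders whose parameters collapse to the same Python signature, such as `float` versus `double`. All function names are hypothetical.

```python
import re

# Hypothetical mapping from canonical C++ parameter types to Python types.
CPP_TO_PY = {"float": float, "double": float, "int64_t": int, "bool": bool}


def normalize_cpp_type(spelling):
    """Canonicalize a C++ type spelling: drop whitespace around punctuation,
    a leading global '::', and 'llvm::' qualifiers. Deliberately incomplete."""
    s = re.sub(r"\s*([^\w\s])\s*", r"\1", spelling.strip())
    s = re.sub(r"\s+", " ", s)            # keep 'unsigned int' intact
    s = re.sub(r"(?<![\w:])::", "", s)    # '::' not preceded by a name
    return re.sub(r"\bllvm::", "", s)


def find_ambiguities(builders):
    """builders maps a builder name to its C++ parameter types. Returns the
    groups of builders that collapse to the same Python signature."""
    by_py_sig = {}
    for name, params in builders.items():
        canon = [normalize_cpp_type(p) for p in params]
        py_sig = tuple(CPP_TO_PY.get(p, p) for p in canon)
        by_py_sig.setdefault(py_sig, []).append(name)
    return [sorted(names) for names in by_py_sig.values() if len(names) > 1]
```

For instance, `find_ambiguities({"fromFloat": ["float"], "fromDouble": ["double"], "fromInt": ["int64_t"]})` flags `fromDouble` and `fromFloat` as indistinguishable from Python, which is where a human would have to pick one.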

This has been my intuition as well. Given that, do you have a preference for how we special-case them? I’m partial to having a place where someone can write a blob of Python code to cover corner cases like this. Should we add such a blob to the main ODS (i.e., a python_decls field on the main op class in TableGen), or create a new record type in the binding-specific .td file that “extends” ops with Python-specific extensions and add it there (i.e., a PythonOpExtension TableGen record for any ops that need special care)?

I suspect the latter is the way to go, but I really only know enough about TableGen to put those words together, and I am looking for some confirmation that I’m looking at this the right way before spending time on it.

Yeah. To be honest, I’ve been relatively happy with most of the default builders. What is there for custom builders is tempting to latch on to, but it’s really C++-specific at the end of the day: since custom builders exist partly for C++ ergonomics and partly to make things more correct by construction in that setting, blindly applying them to Python seems unlikely to yield something that is nice to use.

Ok, scratch that.

I don’t think it’s reasonable to put Python-specific blobs in the main ODS, although keeping them separate may cause issues if the main ODS is updated and the bindings are not. At scale, I wouldn’t expect anybody making upstream changes to also update the bindings for all the languages we might eventually have. My preference would be to just write Python functions that call the default builders or Operation.create, and inject those into the specific op class when loading a module. This could also be done with an additional layer of TableGen that ties ODS defs to Python blobs, but I am not a fan of writing executable code inside strings.

Yeah, I was thinking more about making optional / default-empty things truly optional through keyword arguments and that kind of stuff. How about we make it possible to define custom builders in Python and see whether more than 10% of ops end up needing them, at which point we can consider generating them automatically? By then we will also have examples of what is actually needed from custom builders.
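A sketch of what a generated builder with truly optional keyword arguments might look like, assuming required operands stay positional and optional attributes become keyword-only parameters defaulting to `None`. `conv_builder` and its parameters are hypothetical.

```python
def conv_builder(image, kernel, *, strides=None, dilations=None, loc=None):
    """Hypothetical generated builder: required operands are positional,
    optional attributes are keyword-only and omitted when left as None."""
    attributes = {}
    if strides is not None:
        attributes["strides"] = list(strides)
    if dilations is not None:
        attributes["dilations"] = list(dilations)
    return {"operands": [image, kernel], "attributes": attributes, "loc": loc}
```

Making the optional parameters keyword-only means `conv_builder(x, w, (2, 2))` is rejected with a `TypeError` instead of silently binding the tuple to whichever optional parameter happens to come first.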