[MLIR] How do I link an external C++ function for an operation in an MLIR file?

I need to be able to call an external C++ library function for every operation encountered in an MLIR file. Specifically, I have an MLIR (mhlo) file with conv2D operations and I need to call the corresponding tf function. I did see some code snippets in which the functions were defined using C++ “extern” method. But I’m confused as to how to get the function names for the corresponding op in MLIR file and how the linking is done.

Any help would be greatly appreciated, thanks!

Hello - your question cross cuts several areas so apologies if this isn’t the tack you are looking for.

In general, in MLIR, ops can be anything and they get their meaning from what you do with them (i.e. how they are transformed). Generally, this is referred to as “lowering”. So while some dialects (collections of MLIR ops and types) are designed to be backed in a fairly 1:1 fashion by library calls of some kind, mhlo primarily exists as a gateway to emitting machine code (i.e. codegen) for an entire function consisting of mhlo ops (i.e. it emits machine code directly for performing the linear algebra).

There are multiple lowering paths from mhlo to corresponding machine code (linalg, affine, others) and the relevant pass pipelines for these are generally in the Transforms directory. Note that some of those paths may have facilities for stopping short of emitting machine code, electing to call library functions instead (i.e. BLAS routines or equivalent), but that is a matter for the level below mhlo (and it sounds like it isn’t what you are after).

It is possible to write transforms from mhlo to things backed by library calls of some kind, and I know of at least one example that does so, albeit primarily for testing/validation. Typical mhlo programs executed in such a way may not be the most efficient: naively mapping mhlo ops to function calls results in quite a bit of extra materialization of data, because the op set was designed to be implemented in terms of something that can do loop fusion and elide the redundancies (whereas a naive translation would have a hard time eliding them).
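To make the materialization point concrete, here is a minimal sketch in plain C++ (not MLIR). The functions lib_add and lib_mul are hypothetical stand-ins for per-op library kernels; the fused version is roughly what a codegen path with loop fusion could produce for (a + b) * c.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-op "library kernels": each one allocates and returns
// a full result buffer.
std::vector<float> lib_add(const std::vector<float>& a,
                           const std::vector<float>& b) {
  assert(a.size() == b.size());
  std::vector<float> out(a.size());
  for (std::size_t i = 0; i < a.size(); ++i) out[i] = a[i] + b[i];
  return out;
}

std::vector<float> lib_mul(const std::vector<float>& a,
                           const std::vector<float>& b) {
  assert(a.size() == b.size());
  std::vector<float> out(a.size());
  for (std::size_t i = 0; i < a.size(); ++i) out[i] = a[i] * b[i];
  return out;
}

// Op-by-op mapping: the intermediate (a + b) is materialized in memory
// before the multiply reads it back.
std::vector<float> add_mul_by_calls(const std::vector<float>& a,
                                    const std::vector<float>& b,
                                    const std::vector<float>& c) {
  std::vector<float> tmp = lib_add(a, b);
  return lib_mul(tmp, c);
}

// Fused loop: same result, no intermediate buffer.
std::vector<float> add_mul_fused(const std::vector<float>& a,
                                 const std::vector<float>& b,
                                 const std::vector<float>& c) {
  assert(a.size() == b.size() && b.size() == c.size());
  std::vector<float> out(a.size());
  for (std::size_t i = 0; i < a.size(); ++i) out[i] = (a[i] + b[i]) * c[i];
  return out;
}
```

Both functions compute the same values; the difference is the temporary buffer, which grows with every additional op in a chain of library calls.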

It sounds like what you are actually after may be a higher level op set that is isomorphic with a runtime library such as tensorflow’s tf dialect (which is 1:1 with tensorflow kernels) or npcomp’s aten dialect, which is 1:1 with aten entry points.

Perhaps you could describe more what you’re trying to accomplish versus the mechanism?

It has been a while since this question was asked. Has there been any progress? I have been working on this for a week, but I still need a demo showing how to write a pass that converts ops from another dialect into calls to an external function.

Sorry for the delay, I had not seen this when originally posted.
TL;DR is that there is no progress in core and no progress planned in the short-term that I know of.

As far as traditional ML operations are concerned, the topic has also been discussed here: Lowering optional attributes in Linalg StructuredOps to Standard dialect.

The most important issue is the ABI one: the C++ and MLIR representation of the data type have to agree in consistent ways in all cases. This is one of the main reasons the MemRefDescriptor exists.
The second big topic is MLIR attribute <-> MLIR struct <-> C++ struct.
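As a rough illustration of the ABI point, here is a sketch of the C++ side of a ranked memref descriptor: allocated pointer, aligned pointer, offset, sizes and strides. MLIR ships its own version of this struct (StridedMemRefType in the runner utilities); the struct below is a hand-written mirror for illustration, and my_sum_2d is a hypothetical library entry point. The sketch assumes the descriptor is passed by pointer.

```cpp
#include <cassert>
#include <cstdint>

// Hand-written mirror of a ranked memref descriptor (illustrative only).
// Field order matters: it has to match what the MLIR side emits.
template <typename T, int N>
struct MemRefDescriptor {
  T* allocated;        // pointer returned by the allocator
  T* aligned;          // aligned pointer actually used for indexing
  int64_t offset;      // element offset from `aligned`
  int64_t sizes[N];    // extent of each dimension
  int64_t strides[N];  // stride of each dimension, in elements
};

// Hypothetical external function that MLIR-generated code could call.
// extern "C" keeps the symbol name unmangled so it can be found by name.
extern "C" float my_sum_2d(const MemRefDescriptor<float, 2>* m) {
  assert(m != nullptr);
  float total = 0.0f;
  for (int64_t i = 0; i < m->sizes[0]; ++i)
    for (int64_t j = 0; j < m->sizes[1]; ++j)
      total += m->aligned[m->offset + i * m->strides[0] + j * m->strides[1]];
  return total;
}
```

Indexing through offset and strides rather than assuming a dense buffer is what lets the same function accept views (e.g. a sub-block of a larger matrix) without copying.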

MLIR core does not yet provide any support for this; we are starting to touch IDL land. The project that does this consistently and that I am aware of is this one. Maybe at some point, @whchung and colleagues should present their work at an ODM and consider upstreaming some of their project.

Linalg has a simple rewrite pattern to lower an op to a library call; it does not support attributes at the moment. For the specific question of function names, there is generally a name mangling procedure. For Linalg it is here.
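To give an idea of what a name mangling procedure can look like, here is a small sketch. The scheme is hypothetical (it is not the exact Linalg one): it replaces the dots in the op name with underscores and appends one suffix per operand type, so that each op/type combination maps to a distinct C symbol. The type strings are made up for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical mangling scheme: "<op with '.' -> '_'>_<type>_<type>...".
// The result is meant to be usable as an extern "C" symbol name.
std::string mangle_library_call(std::string op_name,
                                const std::vector<std::string>& operand_types) {
  assert(!op_name.empty());
  std::replace(op_name.begin(), op_name.end(), '.', '_');
  std::string name = op_name;
  for (const std::string& type : operand_types) {
    name += '_';
    name += type;
  }
  return name;
}
```

For instance, mangle_library_call("mhlo.convolution", {"4d_f32", "4d_f32", "4d_f32"}) yields "mhlo_convolution_4d_f32_4d_f32_4d_f32". The lowering pass would emit a call to that symbol, and the library would export a function with the same name.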

Regarding examples, here is one non-trivial interop example that prints a memref of 2-D vectors using print_memref_vector_4x4xf32, whose definition lives here. You can see that at runtime this uses the option -shared-libs=%linalg_test_lib_dir/libmlir_runner_utils%shlibext, which does some variation of dlopen: you can just reuse one of the mlir-xxx-runner binaries or build your own.

This is just a simple example to test that the ABI and “C++ calling MLIR calling C++” work.
You will likely want to spell out the name mangling, the C++ shim that connects to your library implementation, and the ABI (e.g. unranked memref): this all depends on where you want to put the switches that inject static information (e.g. fixed size along some dimension, data type, rank, others).
There are many ways to evolve all of this and spell it out properly in an extensible fashion that will also be more mindful of future data types etc.
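As one possible spelling of the shim mentioned above, here is a sketch where the rank is a runtime switch. It assumes the unranked memref is passed as a rank plus an opaque pointer to the ranked descriptor, that the element type is fixed to f32 by the function name, and that the data is contiguous. shim_sum_f32 and lib_sum are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Assumed unranked memref layout: rank plus pointer to a ranked descriptor.
struct UnrankedMemRef {
  int64_t rank;
  void* descriptor;
};

// Ranked f32 descriptor of rank N (same field order as the ranked ABI).
template <int N>
struct RankedF32 {
  float* allocated;
  float* aligned;
  int64_t offset;
  int64_t sizes[N];
  int64_t strides[N];
};

// Hypothetical library implementation: works on a flat contiguous buffer.
static float lib_sum(const float* data, int64_t count) {
  float total = 0.0f;
  for (int64_t i = 0; i < count; ++i) total += data[i];
  return total;
}

// Number of elements in a ranked descriptor.
template <int N>
static int64_t num_elements(const RankedF32<N>& m) {
  int64_t count = 1;
  for (int i = 0; i < N; ++i) count *= m.sizes[i];
  return count;
}

// Shim: recovers the static rank from the runtime value, then forwards to
// the library. Assumes contiguous data (strides are ignored).
extern "C" float shim_sum_f32(const UnrankedMemRef* m) {
  assert(m != nullptr);
  switch (m->rank) {
    case 1: {
      auto* r = static_cast<RankedF32<1>*>(m->descriptor);
      return lib_sum(r->aligned + r->offset, num_elements(*r));
    }
    case 2: {
      auto* r = static_cast<RankedF32<2>*>(m->descriptor);
      return lib_sum(r->aligned + r->offset, num_elements(*r));
    }
    default:
      assert(false && "unsupported rank in this sketch");
      return 0.0f;
  }
}
```

Here the data type is injected statically through the symbol name while the rank is dispatched at runtime; moving the rank into the name as well (one symbol per rank) is the other obvious design point.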

However, this is quite low-priority for us in the grander codegen vision (i.e. for the few things we will really need it will be easy to build one-off solutions with small variations on top of the existing mechanisms).
