Julia C API bindings - simple op dump segfault

Hi! I’m writing a Julia wrapper library for the MLIR C API, but I’m running into issues dumping MLIR. I’d like to ask more experienced users for advice, or for anything that immediately stands out as wrong in my reasoning.

test4 = () -> begin
    println("\n---- TEST 4 ----\n")
    ctx = MLIR.IR.create_context()
    loc = MLIR.IR.create_unknown_location(ctx)
    func_state = MLIR.IR.OperationState("func", loc)
    func = MLIR.IR.Operation(func_state)
    MLIR.IR.dump(func)
end

I have a small test here which creates a context through the API call mlirContextCreate(), then sets up a “blank” operation state through mlirOperationStateGet and creates an operation with mlirOperationCreate.

I would expect mlirOperationDump to work on this operation, but it segfaults.

Is this something I’m doing wrong perhaps? Any ideas?

My initial suspicion is that there are “default” things I need to initialize in the state (e.g. operands, results, attributes) before calling dump, but I’m not sure.

Edit: for the interested, bindings package is here.

Dump, and printing in general, rely on the operation being valid, as does most of the functionality. The op you are creating is not valid: FuncOp requires several things (symbol name and visibility attributes, a type, etc.) which the code does not add to the state.

It is still possible to print invalid ops in the generic form by configuring MlirOpPrintingFlags accordingly (https://github.com/llvm/llvm-project/blob/3bf7d47a977d463940f558259d24d43d76d50e6f/mlir/include/mlir-c/IR.h#L282) and by calling *Print rather than *Dump.

As a side note, the C API is intended for bindings to provide higher-level language-specific IR building constructs, in particular by leveraging ODS. End users should not be concerned with OperationState.


@ftynse Hi Alex, thanks for the comments! I ended up getting this working. One deeper issue I eventually ran into was that I was working with a version of the API which used StringRef in place of Identifier when constructing NamedAttribute instances.

If I understand correctly, the C API actually supports the construction of operations which are not registered to a particular dialect, but this is not the recommended usage of the C API; instead, a user should use an ODS dialect backend.

If I tried to construct a dialect through the API directly, would that be possible? I suspect that it would be difficult to support pattern rewrite rules and conversion through the API, but I haven’t investigated this fully. I imagine that this is a difficult use case to support, but I am curious as to what can be controlled through the high-level wrappers.

Registration is mostly orthogonal to construction. When a dialect is registered in a context, it becomes possible, in this context, to call operation hooks (printing, parsing, verification, etc.) opaquely. That is, Operation::print will be dispatched to MyOp::print by the infrastructure. Without registration, it is still possible to call MyOp::print directly, and it may still assert/segfault if the invariants of the op are not maintained. The same holds for builders: it is possible to call MyOp::build() to populate the OperationState without registration. It is also possible to populate OperationState manually and call Operation::create. This is not specific to the C API.
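To make the distinction between registration and construction concrete, here is a minimal toy sketch. It does not use any MLIR code: ToyOp, ToyContext, printToyFunc and the "sym_name" attribute are hypothetical stand-ins chosen for illustration. It only shows the idea that registration enables opaque dispatch to an op-specific hook, while the hook itself can be called directly either way and still relies on the op's invariants.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Toy stand-in for an operation: a name plus string attributes.
// This is NOT mlir::Operation, only an illustration.
struct ToyOp {
  std::string name;
  std::map<std::string, std::string> attrs;
};

// Op-specific printer, playing the role of MyOp::print. It assumes the op's
// invariant holds (a "sym_name" attribute exists) and asserts otherwise.
std::string printToyFunc(const ToyOp &op) {
  auto it = op.attrs.find("sym_name");
  assert(it != op.attrs.end() && "toy func requires sym_name");
  return "func @" + it->second;
}

// Toy context: registration only records which hook belongs to which op name.
struct ToyContext {
  std::map<std::string, std::function<std::string(const ToyOp &)>> printers;

  void registerPrinter(const std::string &name,
                       std::function<std::string(const ToyOp &)> hook) {
    printers[name] = std::move(hook);
  }

  // Opaque printing: dispatch to the registered hook if there is one,
  // otherwise fall back to a generic, quoted form.
  std::string print(const ToyOp &op) const {
    auto it = printers.find(op.name);
    if (it != printers.end())
      return it->second(op);
    return "\"" + op.name + "\"()";
  }
};
```

In this sketch, constructing a ToyOp never consults the context. Calling printToyFunc directly works with or without registration, and it trips its assertion on an op that lacks the expected attribute in both cases.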

That being said, C API does not expose MyOp::build equivalents for individual ops. The idea so far is for each language to generate the code similar to what we do in C++ given the ODS definition.
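As a rough illustration of what such generated code could look like, the toy sketch below shows a per-op helper that fills in a state object, so that the end user never touches the state directly. ToyState and buildToyFunc are hypothetical names, and the attribute names are assumptions made for the example; nothing here is actual ODS output or part of the C API.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for OperationState: an op name plus named attributes.
// This is NOT the real MLIR type, only an illustration.
struct ToyState {
  std::string name;
  std::vector<std::pair<std::string, std::string>> attributes;
};

// The kind of helper a language binding could generate from an op definition:
// it takes typed, named arguments and populates everything the op requires.
// The attribute names below are illustrative assumptions.
ToyState buildToyFunc(const std::string &symName, const std::string &type,
                      const std::string &visibility = "public") {
  assert(!symName.empty() && "a symbol needs a non-empty name");
  ToyState state{"func", {}};
  state.attributes.emplace_back("sym_name", symName);
  state.attributes.emplace_back("type", type);
  state.attributes.emplace_back("sym_visibility", visibility);
  return state;
}
```

The point of the design is that the required pieces become function parameters, so forgetting one is a compile-time error in the host language rather than an invalid op at run time.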

Patterns are a different story. I see essentially two kinds of patterns: those defined in DRR (tablegen) that can be generated and reasoned about, and those involving lots of C++ and interaction with PatternRewriter. The latter is quite complex and I am not convinced that we want to expose it to the bindings. There’s also PDL that lets you express patterns as IR, which you can parse and execute using mostly the parts of the API that are available in the C API.

That being said, C API does not expose MyOp::build equivalents for individual ops. The idea so far is for each language to generate the code similar to what we do in C++ given the ODS definition.

Just to be sure I understand: does this comment mean that each higher-level language package wrapping the C API would take over the ODS -> C++ templates pipeline? (By this I mean not literally generating C++ templates, but something to the same effect.)

I think I understand your earlier comment better now: end users of the package should not be exposed to OperationState. I agree.

There’s also PDL that lets you express patterns as IR, which you can parse and execute using mostly the parts of the API that are available in the C API.

Is there somewhere I can read about PDL? I found this thread, PDL: Dialects for Representing and Transforming Pattern Rewrites, but was curious whether there is documentation.

Yep.

PDL is still a work in progress. https://mlir.llvm.org/docs/Dialects/PDLOps/ reflects the current state, and you can click the links in the post you found for more information.

The approach taken in the Python bindings is to always verify before printing and to automatically use the generic form when verification fails, which avoids the segfault in such cases: https://github.com/llvm/llvm-project/commit/1c2159494d07
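The verify-then-print idea can be sketched with a small toy model. This is not the code from the linked commit and it does not call into MLIR; ToyOp, verifyToyFunc, printCustom, printGeneric and safePrint are hypothetical names, and the single invariant (a "sym_name" attribute) is an assumption made for the example.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy stand-in for an operation: a name plus string attributes.
struct ToyOp {
  std::string name;
  std::map<std::string, std::string> attrs;
};

// Toy verifier: the only invariant here is that "sym_name" is present.
bool verifyToyFunc(const ToyOp &op) { return op.attrs.count("sym_name") != 0; }

// Custom printer: only safe to call on ops that pass verification.
std::string printCustom(const ToyOp &op) {
  assert(verifyToyFunc(op) && "custom printing requires a valid op");
  return "func @" + op.attrs.at("sym_name");
}

// Generic printer: makes no assumptions, so it works on invalid ops too.
std::string printGeneric(const ToyOp &op) {
  std::string out = "\"" + op.name + "\"() {";
  bool first = true;
  for (const auto &kv : op.attrs) {
    if (!first)
      out += ", ";
    out += kv.first + " = \"" + kv.second + "\"";
    first = false;
  }
  return out + "}";
}

// Verify first; use the custom form only when the op is valid.
std::string safePrint(const ToyOp &op) {
  return verifyToyFunc(op) ? printCustom(op) : printGeneric(op);
}
```

With this shape, printing an op that is missing its required attribute yields the generic form instead of hitting the assertion in the custom printer.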

I was skeptical of this approach when it went in, but fwiw, it saved me a lot of time debugging once I actually used it. It works well for default printing cases, where something is better than nothing. We may also want a hook at some point to raise an error if the verification fails (i.e., for tools and such that should fail eagerly).

Thanks for the comments here. I will create an issue on the bindings repo to track the correct usage of verify.

@joker-eph I can also describe in a bit more detail some of the Julia community’s motivations for playing with MLIR:

One high-level motivation for exploring these bindings: there are a number of researchers in the Julia community who are engaging in “soft” compiler research design (e.g. high-level AD or probabilistic programming IR design) but want a little more room to move than the standard compilation pipeline provides. (See e.g. Brutus for Julia with an MLIR phase.)

There exist mechanisms to manipulate the Julia IR from stable non-compiler dev user land, but they are not fully exposed out of the compiler yet. In addition, the Julia IR is not designed to be extensible or to model semantics which are not intended by the compiler team.

I’m a probabilistic programming researcher - a number of my recent experiments have centered around constructing abstract interpretations to statically verify properties of models and inference algorithms. I’ve found that it’s difficult to express or verify the correctness of the transformations I write on Julia’s SSA form IR and would prefer to express a domain-specific IR which faithfully models the primitives of my languages. Hence, I’m exploring MLIR to determine if it’s suitable for these purposes (and a few others, which I’ll save for another discussion).

There are other users who are interested in designing domain-specific IRs for quantum computing. Similar philosophy and motivations - existing tools are not necessarily sufficient, it would be nice to explore MLIR as an elegant solution.

Despite my rambling, I think this gives a list of requirements which I should check off:

  1. C API allows the usage of dialect ops (dialect can be expressed and expanded through TableGen).
  2. C API allows control over lowering to LLVM IR (conversions possibly written in C++; does this interact with the rewrite system?).
  3. C API allows the application of passes (controlled by the pass manager, also from the C API).

From the above conversations, it seems like these are supported by the current API - is my understanding correct?

This is roughly correct, with some caveats. In particular, the C API is a work in progress, so while this is the intent, not everything is necessarily ready yet.

I’ve lost count of the number of times I had to restart the debugging session because I tried to print an invalid op from within the debugger (especially with extensions that print some common MLIR entities automatically)… It would also be nice to make it possible to skip the error messages that the verifier prints.

I would like to hear more about these. Would you be interested in discussing these at the open design meeting? :)

I might be out of date because of the end-of-year pause, but there was no support for translating MLIR to LLVM IR from the C API as of two weeks ago. The conversion from the Standard to the LLVM dialect is possible, it’s just a pass. (Note that we use translation to refer to transforming MLIR to another IR, and conversion to refer to transforming one MLIR dialect into another; conversions may or may not be lowerings).

The rewrite system is a completely orthogonal issue. Translations are not based on any rewrite system (MLIR-to-LLVMIR is basically a huge if/else chain). Most conversion passes are based on the pattern rewriting infra, but so are some other passes. There is currently no plan to expose the internals of the infra to the C API, but it is still possible to write patterns with it in C++ or DRR. PDL is supposed to be the next-generation tool for this, where patterns are written as another dialect and the rewriting process is yet another pass.


@ftynse I’ve discussed the possibility of presenting on compiler design topics with a few people in the Julia community (specifically, Valentin Churavy (who I think you know) and Roger Luo (who’s working on compilers for quantum computing)). I think there are enough people who are interested to discuss some of the design points for ongoing projects at a design meeting!

Might I suggest early February? There are a few more people I want to chat with.