[RFC] Rebooting C APIs for core IR

Authors: @ftynse, @stellaraccident, @mehdi_amini

Context: [RFC] Starting in-tree development of python bindings

Introduction

Following up on several discussions, we propose to reboot the in-tree development of C APIs to access core MLIR functionality (generic IR creation, inspection and modification, Pass management). The APIs will focus on low-level “generic” components and will not be concerned with dialect-specific operations, types or attributes. They are intended as building blocks for further bindings in different languages, which are expected to use these core components and provide wrappers around them following language-specific idioms. APIs for dialect-specific IR components can be based, e.g., on custom ODS backends and will be subject to separate proposals.

Scope

Initial implementation will expose creation and inspection methods for Attribute, Block, MLIRContext, Operation, Region, Type, as well as for parsing and printing. Attribute and Type will be created by parsing from strings. Operations will be created using an equivalent of OperationState, e.g. a full list of operands, results, attributes, successors and regions. Further extension will include creation and inspection of Standard types and attributes, as well as guidance for exposing dialect-specific constructs.

Connection to ODS is out of scope and will be discussed separately.

Type Model

Core types will be exposed as type-punned opaque structures, i.e. empty named structures declared in headers and defined in a source files (hence the opacity) that are only transmitted by-pointer. They will actually point to the corresponding C++ object. Since the users of the C APIs don’t have access to the type definition, they will be unable to dereference the pointer and violate the strict aliasing rules.

Memory and Ownership Model

All IR objects are allocated and deallocated in C++, using appropriate mechanisms. C types come with two flavors: owning and non-owning, identified by their name. Objects of owning C types are produced by creation functions and are expected to either be transferred to another owned object, or destroyed from C. They cannot be inspected, but can be passed to the “getter” method that returns an equivalent inspectable non-owning object. Calls to any function that accepts an owning type except the “getter” function are assumed to transfer ownership of the object to the callee. It is impossible to construct an owning object from a non-owning object.

Example:

// Create two blocks that the caller owns.
mlirOwningBlock ownedBlock1 = mlirBlockCreate();
mlirOwningBlock ownedBlock2 = mlirBlockCreate();
// Get a non-owning view of the block for inspection,
// the caller retains ownership of the block. Cannot
// use ownedBlock1 with mlirBlockSize.
mlirBlock block1 = mlirBlockGet(ownedBlock1);
printf("%d", mlirBlockSize(block1));

// Attach block to a region. The ownership is transferred
// to the callee. Cannot use block1 here.
mlirRegionAppendBlock(region, ownedBlock1);

// Destroy the block we still own. No need to destroy
// ownedBlock1 because we don't own it anymore.
mlirBlockDestroy(ownedBlock2);

Note that C++ classes have implicit ownership semantics, except for constructing an operation with regions that takes unique_ptr<Region>.

Iterable Objects

Non-owning iterable objects that can be indexed can be accessed through a combination of two functions: “size” and “at(index)”, with names following the C++ names.

Example:

unsigned i, n = mlirOperationGetNumOperands(operation);
for (i = 0; i < n; ++i)
  mlirOperationGetOperand(operation, i);

Non-owning iterable objects based on linked lists can be accessed through a combination of “getFirst” and “next” functions, which return NULL to indicate no further objects are available in the list.

Example:

mlirOperation op = mlirBlockGetFirstOperation(block);
for ( ; op; op = mlirOperationGetNextInBlock(op))
  // use op.

Open questions

Naming scheme (bikeshed!)

Current proposal is to prefix all function and type names with mlir, leading to camelBack being used for both types and operations. Functions that logically correspond to methods on a given object are prefixed with the object’s type and take this object as the first argument, except for creation functions.

Examples:

  • Type: mlirOperation
  • Owning counterpart: mlirOwningOperation
  • Function taking an object of this type as first argument: mlirOperationGetNumOperands

Directory Tree and Build

Current proposal is to put C APIs under {mlir/include,mlir/lib,mlir/test}/mlir-c and compile as a separate target.

Rationale: by placing API implementations in a separate library, we make sure they remain non-intrusive to core C++ APIs (e.g., we don’t expect files in lib/IR to include headers from include/mlir-c) and can reuse common implementation details.

C API library can be built and tested by default since the C++ compiler can usually compile C and is already configured in the build.

Existing include/mlir-c

We are unaware of any users of this code, so it will be removed.

1 Like

This has been prototyped in https://reviews.llvm.org/D83310 with intentionally different naming scheme and directory tree to force us to take a decision.

I’m fine with the naming as you have it. An alternative might be to follow the scheme I set out (Bindings/CAPI), but the C-API is likely to be so core that giving it a top-level directory seems fine.

The awkward part comes in with respect to CMake ordering of includes. Since, in general, any of these bindings might depend on “anything”, they should be included after everything else in the lib directory. If we had one exclusion for such things by nesting all such things under lib/Bindings, it might be structurally cleaner as there would only be one special case at the top level.

When you say “cmake ordering of includes”, I think you’re talking about ordering of subdirectories in the cmake files… In general, this shouldn’t matter, unless you’re referring to variables such as MLIR_DIALECT_LIBRARIES. As long as you are only referring to targets, then order of add_subdirectory() calls shouldn’t matter. Or maybe you are talking about some other ordering?

Correct for targets. I’m referring to the need for this layer of API to do things like:

get_property(dialect_libs GLOBAL PROPERTY MLIR_DIALECT_LIBS)

And that is add_subdirectory() order dependent.

This is really great Alex. I’d recommend that this eventually become a document in the docs directory. Some questions and specification suggestions below:

Thank you for explaining this, I am super +1 on a very clear naming convention for Get/Create this is great.

However, I don’t understand the second half: how can the behavior of a callee depend on the provenance of a pointer passed into them? mlirBlockSize should always be a non-consuming operation, and almost all methods end up being non-consuming. I think it would make sense to standardize the naming convention for things-that-receive pointers to include “Destroy if they consume” or not if they don’t.

Does this make sense, or am I missing something?

I think this has a similar conceptual issue: you are describing this in terms of a reference counting system or std::unique_ptr like system with moves - in these, when you pass off ownership your pointer get null’d out.

In reality, LLVM and MLIR don’t work like this. You need to keep track of whether you own something, but transferring logical ownership doesn’t invalidate your pointer. You can still pass it to something like mlirBlockSize.

Another naming question comes up in mlirBlockSize itself. Here you are omitting the Get, on the one hand, this makes sense because it doesn’t return a pointer and then you can use Get as part of the pointer management system. On the other hand, it is unclear what that function does without the Get. This is a minor point, but should be well described so we can ensure consistency at scale.

Just in terms of the nomenclature, I’d recommend separating these concepts into “Iterable and Indexed Objects”. That would make it more clear that these are two logically distinct things in the API, but of course you can iterate over an indexable thing.

This came up in a separate thread about coding standards, but what is the return type of mlirOperationGetNumOperands going to be? It shouldn’t be unsigned despite the bad practices I have historically promulgated - should it be size_t or ssize_t?

+1. I think that consistency here is far more important than beauty. These things can be wrapped by people who don’t like them.

+1 to your other points as well. Great job on this!

-Chris

I’d be surprised if the proposal would need to depend on any Dialects? I assume (perhaps I’m wrong) that the focus is on exporting APIs from lib/IR primarily?

@ftynse currently has this in his patch. I agree that in the near term, this likely does not need the dialect libs (or conversion libs) dependencies. However, we will likely need a C API to “register all dialects” and “register all conversions” before we get too far.

Makes sense. I thought we already had that mechanism, but mlir::registerAllDialects() is defined in a header so that the linkage problem doesn’t happen and it’s in a namespace so it would need a tweak to work for C.

Let me rephrase this as follows. The type of the argument indicates whether or not the ownership is transferred. A function that takes ownership will take a mlirOwning<Type> object, a function that does not will take a mlir<Type> object. Thus the ownership transfer is visible from the function signature. The only exception is mlir<Type>Get. Maybe we should call it mlirOwning<Type>Get.

So the behavior doesn’t depend on the provenance, but on the type. And since there’s no overloading in C, there is only one behavior per function.

It is non-consuming, as indicated by its signature: void mlirBlockSize(mlirBlock). So are other functions that take mlirBlock. Because of type punning, it is actually impossible to pass an object of type mlirOwningBlock to mlirBlockSize. One has to explicitly “get” the non-owning block. Functions that want to consume the block will look like this: void mlirBlockDestroy(mlirOwningBlock), with a different type.

It is intentionally resembling unique_ptr, but C won’t let you make the pointer null on the callers side. Well, unless you pass a pointer to pointer, but that looks unnecessarily clunky, especially for lists of such things, e.g. lists of regions to attach to an op.

MLIR is a bit less consistent. One example that even the prototype touches is constructing an op with regions that takes unique_ptr<Region>. Another is OwningModuleRef, and there are more. This ownership model will have to be reflected in the bindings, so I just went all the way through, instead of saying “module and region have unique-ptr+raw-ptr model, operation and block need manual tracking”. It would be interesting to reconsider that in MLIR proper.

Not if it has a different type. mlirBlockSize won’t take an owning pointer.

Only mlirBlockGet(mlirOwningBlock), literally that spelling, does not take ownership of its argument. Any other function, regardless of the name, takes ownership of its mlirOwning<Type> arguments and does not take ownership of its mlir<Type> arguments. Consider void mlirRegionAppendBlock(mlirRegion, mlirOwningBlock). It does not take ownership of the region, but it does take ownership of the block, and transfers it to the region. It sounds awkward to encode such things in the name, ultimately you’d need as many “ownership tokens” as the function has pointer arguments and a way to unambiguously map them. mlirRegionKeepApendBlockTake? mlirRegionAppendBlock(__mlir_keep mlirRegion, __mlir_take mlirBlock) ?

Thanks, I was looking for proper terms!

I would argue that it should be whatever MLIR gives. Technically, casting unsigned to size_t should be fine, the inverse can be problematic.

The proposal does not, at least not with this scope. The tests do, and I intentionally put them in the wrong place to force the directory tree discussion :slight_smile:

However, we will eventually consider some dialect-specific bindings, at least for the Standard dialect.

MLIR appears to have followed LLVM’s bad practice here - using unsigned is bad for code size (of the compiler) performance (of the compiler) and generality on 64-bit hosts. It would be far better to use size_t or ssize_t here. @River707 do you have any opinions or thoughts on this?

-Chris

I’d be pro ssize_t here (given I’m also pro [llvm-dev] [RFC] Coding Standards: "prefer `int` for regular arithmetic, use `unsigned` only for bitmask and when you intend to rely on wrapping behavior." :slight_smile: ) and changing MLIR to correspond SGTM.

IMO we should use whatever is most recommended/best. For the user facing API, I would say this is either ssize_t or size_t. For some internal storage, we need to use unsigned for storing some values due to bit packing and other storage saving techniques. As long as we are pragmatic about what is best for each use case, instead of “always use int over unsigned” and vice versa.

– River

Online discussion yesterday resulted in an alternative proposal: reflect ownership transfer in the naming convention instead of the type system. It can reduce the number of types exposed in the API, form 12 to 8 in the current prototype, and remove one function per owning/non-owning pair, that is 4 functions in the current prototype. This will not be consistent with the C++ API, in particular there will be no visible counterpart for OwningModuleRef and std::unique_ptr<Region>, that will become plain pointers instead. Note that the current proposal is also inconsistent with the C++ API as the latter doesn’t have owning types for Operation and Block.

The naming scheme could resemble the following:

  • a function that returns an object that the caller will own must include Create after it mentions the object, usually its type: mlirContextCreate, mlirBlockCreate; the keyword is used as suffix to preserve the convention of the functions related to mlir<Obj> starting with mlir<Obj>;
  • a function that takes ownership of its argument must include Owned before it mentions the object it intends to take ownership of: mlirRegionAppendOwnedBlock(mlirRegion, mlirBlock) does not take ownership of the Region argument but does take ownership of the Block argument;
  • any other function does not take or transfer ownership.

This does not allow the compiler to catch some ownership transfer issues, e.g. attempting to transfer the ownership twice would not have been caught by the initial proposal either but attempting to transfer ownership of a non-owned pointer would have, but makes the API slightly simpler. This is also slightly more in line with what many existing C APIs provide.

Any further thoughts or suggestions on this subject?


In the same call, we seemed to reach consensus that the API headers should be located in mlir/include/mlir-c wrt to the LLVM monorepo. We did not have the time to discuss the library/implementation layout though. The following alternatives were mentioned previously:

  • mlir/lib/CAPI
  • mlir/lib/Bindings/C
  • distributed across relevant files under mlir/lib/IR/*

I am less favorable to the last option as it makes C APIs less hermetic.

Any opinions?

I’m interested to see maybe some code samples of the two scheme for the API (with/without separate types for making the ownership explicit).
While we can’t make C “safe”, I’d look into what is the easiest to read/write in practice: it isn’t clear to me that the extra types are hurting / not helping so far.

⚙ D83310 [mlir] Initial version of C APIs now contains the API without separate owning types;
⚙ D84019 [WIP] C APIs (with ownership) - with separate owning types;
⚙ D84020 [WIP] differences between C APIs without and with ownership contains the latter rebased on the former to show the difference.

These diffs intentionally don’t touch anything other than ownership.

I prefer this one. I personally don’t think of such a C API as a binding.

I prefer the option without ownership encoded in the types. I see what you’re trying to accomplish, but I just don’t think that, aside from careful naming and a consistent pattern, any C-language features go far enough to actually be less of a foot-gun in the common case.

Virtually every time I’ve used a C API like this from a higher level language, I’ve constructed some kind of smart pointer-esque mechanism in that language to be explicit about create/retain/release/destroy semantics, which is where additional safety comes in. From that perspective, I prefer fewer types and less encoded in them since it makes such machinery easier to maintain.

I don’t think any of these mechanisms will significantly help haphazard, ad-hoc use of the API, and they will hinder more structured use. It’s C: it’s not going to be safe.

I like the code structure overall, though, and I would love for us to decide and get to work.

Interestingly I was expecting the opposite: it’d be less error-prone and easier to build/maintain such wrappers with Types to help distinguish.

It depends. If building a JNI binding, for example, it is such a pain to mirror the type hierarchy that you typically erase everything as much as possible and do the best you can with semantics that match the target. And in all of the cases, I have to read the comments and follow the naming convention carefully to map the semantics properly. The extra types are probably just something I would code around versus using as intended.

I just wouldn’t prioritize adding the complexity to the C API itself because it can’t really go far enough and now you have semantics spread across two places to cover it all.

Don’t get me wrong: I’d use either one and both are nice. I’d pick the simpler and minimize the type overhead, personally.