LLVM Support types in C API

In the MLIR C API, we want to expose several functions that accept or return objects of types defined in the LLVM Support library, namely StringRef, APInt and APFloat. LLVM itself does not seem to expose these types in its C API.

The high-level design question is whether we want to provide a C interface to these types locally in MLIR, or provide it in LLVM and create a dependence between the C API libraries.

Regardless of the high-level design, some ideas for the actual representation are:

  • StringRef can be unpacked to the struct { const char *; size_t; } that it actually contains;
  • APInt can be exposed as an opaque pointer, at the cost of an additional allocation per APInt, or unpacked to the struct { union { uint64_t; uint64_t *; } data; size_t bitwidth; } that it actually contains.

Both structures are relatively simple and can be returned by-value from C API functions. APFloat is a bit trickier because it would also require exposing FltSemantics, so I am tempted to just keep conversions to float and double in the short term.
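To make the two unpacked representations concrete, here is a minimal sketch in C. All names (ExampleStringRef, ExampleAPInt and the helper functions) are hypothetical and only illustrate the shape of the structs; they are not an existing MLIR or LLVM API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical name: a counted, non-owning byte range mirroring StringRef. */
typedef struct {
  const char *data; /* not necessarily null-terminated */
  size_t length;    /* number of bytes in data */
} ExampleStringRef;

/* Hypothetical name: unpacked view of an arbitrary-precision integer. */
typedef struct {
  union {
    uint64_t value;        /* used when bitwidth <= 64 */
    const uint64_t *words; /* used when bitwidth > 64 */
  } data;
  size_t bitwidth;
} ExampleAPInt;

/* Wraps a null-terminated C string without copying it. */
static ExampleStringRef exampleStringRefFromCString(const char *str) {
  ExampleStringRef ref = {str, strlen(str)};
  return ref;
}

/* Builds a small (<= 64 bit) integer; returned by value, no allocation. */
static ExampleAPInt exampleAPIntFromU64(uint64_t value, size_t bitwidth) {
  assert(bitwidth >= 1 && bitwidth <= 64);
  ExampleAPInt result;
  result.data.value = value;
  result.bitwidth = bitwidth;
  return result;
}
```

Both structs fit in two machine words on typical 64-bit targets, which is what makes returning them by value cheap. Who owns the storage behind `words` in the wide case is left open by this sketch.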

For all classes, we are likely going to expose only conversion and construction functions for interfacing purposes, leaving users to provide operations specific to their languages and libraries (e.g., Python has built-in support for arbitrary-precision integer arithmetic, C can have it through GMP or imath, and there are plenty of libraries for string operations).

Thoughts?

For StringRef I’m +1 on using a locally defined struct. Counted byte ranges are ubiquitous for these kinds of boundary cases, and imo do not pass the bar to justify taking a dep to reuse such a fundamental struct definition.

I have less of an opinion on APInt and APFloat but do agree that they are different. For APInt, it would be nice to expose it as a struct that could be easily interchanged with corresponding arbitrary precision integer libraries (gmp, python, etc). I’d be inclined towards a local struct for that as well. Your definition seems reasonable.

For APFloat, deferring and just providing float/double conversions seems ok for now.

I agree with Stella here.

There should be very few things that take and return APFloats. I’d recommend that we provide them in the form of something that takes double plus a C enum specifying the format. This allows us to extend it over time, and works reasonably for any IEEE format that is fp64 or smaller.

If/when this is not enough (e.g. someone wants PPC double double or FP80), analogous APIs can be provided that take a more fancy format.
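A rough sketch of the "double plus a C enum" shape, assuming a hypothetical ExampleFloatFormat enum with only two entries. The names are made up for illustration and the real set of formats would be decided by the API; the helper just shows how a double can carry a value of a narrower IEEE format.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical enum naming the target format; more entries could be added. */
typedef enum {
  ExampleFloatFormatF32,
  ExampleFloatFormatF64
} ExampleFloatFormat;

/* Rounds value to the given format and returns the result widened back to
 * double. The volatile temporary forces the narrowing to actually happen on
 * targets that evaluate float expressions in extra precision. */
static double exampleRoundToFormat(double value, ExampleFloatFormat format) {
  switch (format) {
  case ExampleFloatFormatF32: {
    volatile float narrowed = (float)value;
    return (double)narrowed;
  }
  case ExampleFloatFormatF64:
    return value;
  }
  return value;
}

/* True if value survives a round trip through the format unchanged.
 * NaN compares unequal to itself, so this returns false for NaN. */
static bool exampleIsExactInFormat(double value, ExampleFloatFormat format) {
  return exampleRoundToFormat(value, format) == value;
}
```

For example, 0.5 is exact in both formats, while the double nearest to 0.1 is not representable as an f32, so a caller could use such a check to detect lossy construction.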

After a more detailed look, the allocation for APInt appears unavoidable: for bitwidth>64, the APInt object owns the underlying data and provides read-only access through its API at best. So we would need to allocate and copy the data to the unpacked struct for long integers. At this point, I am inclined to just always allocate an APInt * and move-construct it from the C++ object to avoid copying the internal data. In the common case of bitwidth<=64, we can expose the int64_t <...>getInt() functions in addition to APInt <...>getValue(). Since APInt is mostly used in attributes, which have a type that allows the user to query the bitwidth before the value, this should be sufficient: it avoids unnecessary allocations for small values and allocates only a small amount of additional memory for big values.
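Here is how the two access paths could look from the C side, as a self-contained mock-up. Everything in it is hypothetical: ExampleIntegerAttr stands in for an attribute, ExampleBigInt for the heap-allocated handle, and the handle copies its words because plain C has no equivalent of move-constructing from the C++ object. The fast path returns an unsigned value and ignores signedness for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* A real header would expose only this forward declaration. */
typedef struct ExampleBigInt ExampleBigInt;

struct ExampleBigInt {
  unsigned bitwidth;
  size_t numWords;
  uint64_t *words; /* owned by the handle */
};

/* Stand-in for an integer attribute whose type reports the bitwidth. */
typedef struct {
  unsigned bitwidth;
  const uint64_t *words;
} ExampleIntegerAttr;

static size_t exampleWordsFor(unsigned bitwidth) {
  return ((size_t)bitwidth + 63) / 64;
}

/* Fast path for bitwidth <= 64: plain value, no allocation. */
static uint64_t exampleAttrGetUInt(ExampleIntegerAttr attr) {
  assert(attr.bitwidth >= 1 && attr.bitwidth <= 64);
  return attr.words[0];
}

/* General path: allocates a handle the caller must destroy. */
static ExampleBigInt *exampleAttrGetValue(ExampleIntegerAttr attr) {
  assert(attr.bitwidth >= 1);
  size_t numWords = exampleWordsFor(attr.bitwidth);
  ExampleBigInt *result = malloc(sizeof(*result));
  if (!result)
    return NULL;
  result->words = malloc(numWords * sizeof(uint64_t));
  if (!result->words) {
    free(result);
    return NULL;
  }
  memcpy(result->words, attr.words, numWords * sizeof(uint64_t));
  result->bitwidth = attr.bitwidth;
  result->numWords = numWords;
  return result;
}

static unsigned exampleBigIntGetBitWidth(const ExampleBigInt *value) {
  return value->bitwidth;
}

/* Read-only access to one 64-bit word of the stored value. */
static uint64_t exampleBigIntGetWord(const ExampleBigInt *value, size_t i) {
  assert(i < value->numWords);
  return value->words[i];
}

static void exampleBigIntDestroy(ExampleBigInt *value) {
  if (!value)
    return;
  free(value->words);
  free(value);
}
```

A caller would query the bitwidth first, take the allocation-free path when it is at most 64, and fall back to the handle (plus a matching destroy call) otherwise.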

An alternative to this would be to allow “detaching” the allocated data in llvm::APInt API and using that to populate the unpacked struct in the C API.