Just to throw more use cases into the mix, here’s a list of random stuff that I pulled from an old email on a related topic (emitting C code for targeting low-resource DSPs):
Also, thinking of C generation as a general building block that will be reused in many places, one might want to consider such things as:
- generating `#include` statements
- putting `#ifdef`s around regions of code
- using some opaque datatype (such as a “Status” data type) as a return value of a function, where all you know is the name of the data type, not its actual ABI layout (which might even be highly target dependent or only known internally to the compiler, such as `jmp_buf` or `ucontext_t`). Maybe all you’re trying to do is emit a series of function calls like `CHECK_OK(Foo(...)); CHECK_OK(Bar(...))` and you don’t care about the layout of Status (e.g. you are coming from a dialect where there is no layout).
- using the symbol `stdout`, which can be defined in many different ways, one of which is as a macro.
- generating inline asm blobs (e.g. this might be a building block for generating Ruy kernels)
- generating weird target-specific attributes on the functions (e.g. “on this one platform, I need to output `__attribute__((address(0x1234)))` on each global variable instead of using a linker script”)
- generating a header just containing some structs, typedefs, and function declarations (and an include guard, of course) to accompany some other generated file.
- generating C++ code instead of C
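To make a few of these items concrete, here is a rough sketch of generated C that only knows `Status` by name. Everything in the first half is a hypothetical stand-in for what a target header would provide: the `Status` struct, `OkStatus`, `StatusIsOk`, this particular `CHECK_OK` definition, and the toy `Foo`/`Bar`. The `MODEL_ENABLE_LOGGING` flag, `MODEL_LOG`, and `RunModel` are made-up names too.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* ---- Hypothetical stand-in for a target-provided header. ----
 * A real emitter would only emit an #include for it and would never
 * look inside Status. */
typedef struct { int code; } Status;
static Status OkStatus(void) { Status s = {0}; return s; }
static int StatusIsOk(Status s) { return s.code == 0; }

/* Return the first failing Status to the caller. */
#define CHECK_OK(expr)                   \
  do {                                   \
    Status check_ok_status_ = (expr);    \
    if (!StatusIsOk(check_ok_status_)) { \
      return check_ok_status_;           \
    }                                    \
  } while (0)

static Status Foo(int x, int *out) {
  assert(out != NULL);
  if (x < 0) { Status err = {1}; return err; }
  *out = x + 1;
  return OkStatus();
}

static Status Bar(int x, int *out) {
  assert(out != NULL);
  *out = x * 2;
  return OkStatus();
}

/* ---- What the generated code itself might look like. ---- */
#ifdef MODEL_ENABLE_LOGGING
/* stdout may itself be a macro; the emitter only needs its spelling. */
#define MODEL_LOG(msg) fputs((msg), stdout)
#else
#define MODEL_LOG(msg) ((void)0)
#endif

/* Names Status as a return type but never touches its fields. */
Status RunModel(int x, int *result) {
  int tmp = 0;
  CHECK_OK(Foo(x, &tmp));
  CHECK_OK(Bar(tmp, result));
  MODEL_LOG("RunModel finished\n");
  return OkStatus();
}
```

Note that `RunModel` compiles without the emitter ever needing to know how `Status` is laid out; only the included header does.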
While for this patch the approach doesn’t matter much, some of these use cases could favor one approach over the other. I would like to know more about how we expect this to evolve and what the goals are.
Also, one of the key things that folks will want to do with this is to emit arithmetic expressions. Would like to see how that layers into this code. I guess we could emit things like `std.addi` as a call, and then somehow have a `std.addi` function (or rather functions, since it could operate on i64 or i32 or i16 or i8).
Based on my experience writing TCIE (my Tiny C Inference Engine, which I believe Jacques talked about in some talk at some conference; Jacques, is there a link for interested folks?), I believe that the C emission process has two totally distinct subproblems:
1. emitting “structural” code like function declarations, `#include`s, structs, typedefs, etc.
   a. For this, we need lots of flexibility because, let’s be honest, the use cases just get weird (as I described above). Errors here are mainly going to result in syntax errors at the C++ compiler level, so string munging is not a worry. Yeah, it might be annoying, but it won’t result in miscompiles.
2. emitting the “bodies of functions”.
   a. For this, the key difficulty is general and correct program emission, rather than pure syntactic concerns. That is, if I say I want an i16 add with unsigned wrapping, the code emitter had better make sure that it uses the right datatypes in the emitted C so that the numbers come out right, and this is fairly tricky to do with a purely syntactic approach. You also want to make sure that control flow is handled in a fully general and correct way, and that there are appropriate hooks for defining how an MLIR type (like `tensor`) should be materialized at the C/C++ level.
For 1., the visitor approach to me seems like the clear winner.
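As a rough illustration of what visitor-style emission of structural code could look like, here is a minimal sketch that dispatches on a declaration kind and prints text. The `Decl` mini-IR, `Buffer`, `appendf`, `emit_decl`, and `emit_header` are all hypothetical names invented for this example; a real emitter would walk MLIR operations rather than this toy list.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical mini-IR for "structural" declarations. */
typedef enum { DECL_INCLUDE, DECL_TYPEDEF, DECL_FUNCTION } DeclKind;
typedef struct { DeclKind kind; const char *text; } Decl;

/* Fixed-capacity output buffer; output is truncated if it does not fit. */
typedef struct { char *data; size_t cap; size_t len; } Buffer;

static void appendf(Buffer *b, const char *fmt, ...) {
  va_list args;
  int n;
  assert(b != NULL && b->data != NULL && b->cap > 0);
  if (b->len + 1 >= b->cap) return; /* no room left */
  va_start(args, fmt);
  n = vsnprintf(b->data + b->len, b->cap - b->len, fmt, args);
  va_end(args);
  if (n > 0) {
    size_t room = b->cap - b->len - 1;
    b->len += ((size_t)n < room) ? (size_t)n : room;
  }
}

/* One case per declaration kind: this is the "visit" step. */
static void emit_decl(Buffer *b, const Decl *d) {
  switch (d->kind) {
    case DECL_INCLUDE:  appendf(b, "#include %s\n", d->text); break;
    case DECL_TYPEDEF:  appendf(b, "typedef %s;\n", d->text); break;
    case DECL_FUNCTION: appendf(b, "%s;\n", d->text); break;
  }
}

/* Wraps the declarations in an include guard. */
static void emit_header(Buffer *b, const char *guard,
                        const Decl *decls, size_t count) {
  size_t i;
  assert(b != NULL && b->data != NULL && b->cap > 0);
  b->len = 0;
  b->data[0] = '\0';
  appendf(b, "#ifndef %s\n#define %s\n", guard, guard);
  for (i = 0; i < count; ++i) emit_decl(b, &decls[i]);
  appendf(b, "#endif /* %s */\n", guard);
}
```

Adding a new kind of structural output (say, a target-specific attribute) is then one more enum value and one more `case`, which is the flexibility this subproblem seems to need.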
For 2., the benefits of a proper “C dialect” start to become more pronounced. For example, to properly emit an i16 add with unsigned wrap we would need to cast to `uint16_t` to make sure that the C “+” operator does the right thing. Using a dialect formalizes the notion of what we know we can safely emit. For example, we can say that we can only emit `i64`. On the other hand, we could handle this by defining a legalization target with dynamically legal `std.addi`. Regardless of whether we use a visitor approach or a “C dialect” here, I think the critical thing is defining a legalization target that people can lower to programmatically.
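To illustrate the casting concern, here is a small sketch of what an emitted function body for a wrapping add might look like. The function names and the `v0`/`v1`/`v2` value naming are assumptions made up for this example, as is the choice to carry values as unsigned bit patterns.

```c
#include <stdint.h>

/* Hypothetical emitted body for an i16 add with wraparound semantics.
 * Values are carried as uint16_t bit patterns. */
static uint16_t emitted_addi_i16(uint16_t v0, uint16_t v1) {
  /* After the usual promotions, v0 + v1 is computed in int (or in
   * unsigned int on targets with 16-bit int), so the sum is not yet
   * reduced; the explicit cast truncates it modulo 2^16. */
  uint16_t v2 = (uint16_t)(v0 + v1);
  return v2;
}

/* Same idea for i32: adding as uint32_t avoids the undefined behavior
 * that signed overflow on int32_t would have. */
static uint32_t emitted_addi_i32(uint32_t v0, uint32_t v1) {
  uint32_t v2 = (uint32_t)(v0 + v1);
  return v2;
}
```

The point is that the correct C depends on the bit width and on C's promotion rules, which is hard to get right if the emitter only pastes a `+` between two operand names.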
Another way of approaching this is to just treat `std.addi` opaquely, as a C call to a function `std_addi`. But that’s actually not so simple, since you would need a side channel to emit the corresponding function declarations for every possible bit width. That seems to entail a grossness that my intuition says just won’t scale.
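For a sense of what that side channel would have to produce, here is a sketch of a prelude that defines one helper per bit width. The `DEFINE_STD_ADDI` macro and the `_i8`/`_i16`/`_i32`/`_i64` suffix scheme are assumptions for illustration; the only name taken from the discussion is the `std_addi` stem.

```c
#include <limits.h>
#include <stdint.h>

/* Exact-width types such as uint8_t do not exist on every target
 * (some DSPs have CHAR_BIT == 16), so state the assumption up front. */
#if CHAR_BIT != 8
#error "this sketch assumes 8-bit bytes"
#endif

/* One helper per width; the cast reduces the promoted sum modulo 2^BITS. */
#define DEFINE_STD_ADDI(BITS)                                       \
  static inline uint##BITS##_t std_addi_i##BITS(uint##BITS##_t lhs, \
                                                uint##BITS##_t rhs) { \
    return (uint##BITS##_t)(lhs + rhs);                             \
  }

DEFINE_STD_ADDI(8)
DEFINE_STD_ADDI(16)
DEFINE_STD_ADDI(32)
DEFINE_STD_ADDI(64)
```

A call site would then read something like `uint16_t v2 = std_addi_i16(v0, v1);`. Even in this tiny form, the prelude has to be emitted somewhere, kept in sync with the widths actually used, and repeated for every other arithmetic op.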