Simple C++ emitter rev

This is a follow-up to the previous discussion, with the review at https://reviews.llvm.org/D76571. The C++ emitter backend here is a simple one that provides some basic structure while deferring most emission to dialect emitters.

As mentioned in the rev: this is useful for cases where you have a C++ compiler but no access to the codegen (e.g., for using MLIR optimizations along with legacy or proprietary systems) and for prototyping/debugging (e.g., I found this useful for playing with shape functions).

It is very simple and, as mentioned in the previous discussion on the mailing list, does not add a C/C++ dialect. A C/C++ dialect would be useful, but it is outside of what I currently have planned.

One point in the rev is that I’m not using an interface but instead take a map from dialect to emitters as input. This is mostly because I do not think of the emitter as core to the ops, so I don’t want to change the op definitions for this, and I also want it to be possible to use multiple different lowerings for the same op/dialect (at different times).

– Jacques

I’m not necessarily opposed to this going in as is, but let me elaborate on the source of my questions:

I would love to see a centralized C/C++ dialect where the syntactic issues can be isolated. I think starting such a project based on the needs of this one could grow into something more broadly useful. I certainly don’t think we should prevent this from landing to wait for a fully round-trippable C++ dialect. It’s a personal pet peeve of mine every time I start seeing code strings going to an output stream… It always starts simple and becomes complicated and unreadable.

Your point about interfaces is a good one. I don’t know of a good solution to the fact that the emission needs to be customized. I’m hesitant to add yet more mechanisms to MLIR, however… It is already a relatively rich framework with lots of stuff that fits together in unusual ways. In this case, I might lean on the side of doing something even simpler, without planning for the complexity of “people might want to do this in lots of different ways”. This starts to get at the crux of the product complexity of dialects. We want to enable a (seemingly) arbitrary set of transformations on an arbitrary set of dialects.

I don’t disagree with you on that. I think that is a very valid objection.

One could probably be built using different linker targets.

Well, I’m one of those people :slight_smile: I have two different ways in which I want to lower the TF dialect. Perhaps I should not hedge: either have only the registry and have different targets link in different emitters, or don’t have a registry and require it to be explicitly passed.

(Part of me also sometimes looks at something being added on the ODS/DRR side and thinks … “mmm, if that were an op, it could be represented as a verification on the op” / “we could have fused X and Y if we didn’t just splat out immediately, so ideally we would subsume it with something better w.r.t. C++ emission” …)

Made an example EmitC dialect in https://reviews.llvm.org/D76571 for discussion (I should maybe have used a different rev … and I need to update the description). This dialect’s goal is to make it easy/trivial to translate to code rather than to model anything beyond that.

Hello, I’m quite new here, but rather interested in C output. Can you tell me:

  1. how I can get to test this
  2. why C++ and not plain C - what are the features that would take advantage of the C++ extensions (classes, templates, RTTI)?

Hey,

It doesn’t exist beyond the code under review, so it is very early and not usable beyond emitting some functions/function calls. The dialect currently being discussed would be syntactic and have a trivial mapping to C++. Now, if you restrict how you lower to this dialect, you could emit C instead; but that would be a function of the dialect conversion and not using C++ constructs (e.g., if you convert to EmitC with class ops, then you won’t get plain C out of the translation). For just function calls, and if you don’t use multi-result ops, you would get C out with the rev as posted. But yes, one of the applications does require generating templates.

  1. Is there some tutorial on how to test this “code under review”?
  2. Can you give me an example of the need for templates?
  3. My immediate interest here would be to be able to go from affine to C, but of course I understand the rationale you set up in your first post.
  1. Download the raw diff and patch it into your client (potentially check out the head revision there to avoid needless merge conflicts).

  2. Sure, an example is when you need to integrate the generated code with an existing system that uses templates. E.g., you previously had:

template <...>
tensor foo(...) { ... }

and you now want to use MLIR to optimize foo before you emit it. For example, you want a loop nest emitted that can be specialized for multiple different int/float types. You could of course do that by just duplicating & specializing in MLIR already, if you knew all the types you wanted at the time the generated function is compiled.
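
For example, roughly (a hedged sketch; the element type, signature, and loop body here are made up for illustration):

template <typename T>
void foo(const T* in, T* out, int n) {
  // A loop nest the C++ compiler can specialize for every int/float type
  // the template gets instantiated with.
  for (int i = 0; i < n; ++i)
    out[i] = in[i] * in[i];
}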

  3. Sounds good. The dialect would not help with that in its current version (you’d probably need to add for constructs and many other parts); the current rev just gives function calls.

I like the idea of the “emitc” dialect. In some scenarios we will probably want to emit raw C, and so it should be possible to swap out all the C++ tuple stuff with struct tuple0, struct tuple1_i32, struct tuple2_i64_f64, struct tuple2_f32_f32, etc.
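
As a minimal sketch (the field names are hypothetical), those plain-C replacements for std::tuple might look like:

#include <stdint.h>

/* One struct per arity/element-type combination, usable from plain C. */
struct tuple0 { char unused_; };                  /* zero results; C requires a member */
struct tuple1_i32 { int32_t v0; };                /* one i32 result */
struct tuple2_i64_f64 { int64_t v0; double v1; }; /* i64 and f64 results */
struct tuple2_f32_f32 { float v0; float v1; };    /* two f32 results */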

I also like the idea of a dialect to emit C or C++ code. I will shortly be starting to work on serializing IREE’s VM dialect to C/C++ (IREE#1173). This could be realized with, or profit from, the “emitc” dialect.

A visitor approach to emit C++ (possibly with an interface) seemed like a good first approach to me. The update with a new dialect does not seem clearly motivated: I don’t understand why a dialect is useful/desirable here. What kind of analysis, transformation, manipulation, etc. would we do on this “dialect”?

From what I understand, lowering to a “dialect” is more a matter of convenience, such as converting generic ops to function calls and such. This way a verifier could later be tacked onto a generic black-box call op (for example, one with knowledge of the available libraries that checks whether all ops are supported).

(I’m playing devil’s advocate here and will try to argue both sides :slight_smile: )

String emission

  • :white_check_mark: This is very flexible and we can do arbitrarily complex things simply (e.g., there is no difference between emitting a macro that creates a templated class and emitting a function, in terms of the effort/structure needed).
  • :negative_squared_cross_mark: There is no structure or verification until the compiler is finally invoked.
    • Helper functions could be added as we did for ODS

Dialect

  • :white_check_mark: Common constructs can be emitted more safely
    • Doesn’t avoid all footguns though (e.g., unless we model the C++ type system completely/expect all types to be defined along with the program, you can use a type that can’t be lowered)
  • :white_check_mark: Textual emission is trivial (e.g., there is a 1:1 mapping; decisions about which constructs types should get lowered to have already been made)
  • :negative_squared_cross_mark: More restricted, and one needs to add an op for each syntactic feature (e.g., if you want a macro, you’d need a macro op)
  • :negative_squared_cross_mark: Few transformations/analyses on this form (as it is syntactic, it is more of a textual templating engine with more structure [constructs/verification] specific to C/C++)

I see little analysis on this dialect, and some transformation (you could do variable renaming, automatic comment generation, and generation of forward declarations [including autogenerating all the tuple types you would need C-side for multi-results that have C primitive types]). I do see verification playing in here (and increasing over time), but initially it would just be simple checking that a value being consumed was actually produced (rather than hoping that the string referenced in an assign actually matches some variable).

As Sean mentioned, the original approach had a problem there in that how you emit a multi-result output is fixed. In the EmitC dialect there would be no multi-result outputs; those would have already been lowered in some way, so you’d have a single-result op that could produce a std::tuple or a struct tuple_i64_f64, and the emission of the dialect doesn’t really care.
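
Concretely (a sketch; producer and the result types are hypothetical), the same pre-lowered single-result op could be emitted either way:

#include <cstdint>
#include <tuple>

// C++ flavor: a single result of type std::tuple<int64_t, double>.
std::tuple<int64_t, double> producer_cpp();

// Plain-C flavor: the same multi-result, already lowered to a named struct.
struct tuple_i64_f64 { int64_t v0; double v1; };
tuple_i64_f64 producer_c();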

As mentioned, parts of what I like about the dialect you could also get from helper functions. Consider all the classes on the ODS and DRR C++ side: those model some C++ constructs (we have a C++ structure that corresponds to the C++ class that will get emitted). We could generalize those and expose them as an API to use here (note: I consider them internal to ODS/DRR at the moment, made for those use cases). Now, perhaps we do want ODS and DRR to be dialects (or some parts of them :slight_smile: ) and to do optimizations and verifications on those dialects before emitting; then it becomes dialect conversion & trivial emission [and yes, we probably can’t use ODS to define the dialects required for ODS without either a multistage process or not using ODS for them].

OK, I did a terrible job playing devil’s advocate here … I think it boils down to this: we can make both work. With the string one we have more helper functions & flexibility, but less structure. The dialect one has more structure (and dialects are pretty cheap to define), but less flexibility.

Just to throw more use cases into the mix, here’s a list of random stuff that I pulled from an old email on a related topic (emitting C code for targeting low-resource DSPs):

Also, thinking of C generation as a general building block that will be reused in many places, one might want to consider such things as:

  • generating #include statements

  • putting #ifdef’s around regions of code

  • using some opaque datatype (such as a “Status” data type) as a return value of a function, where all you know is the name of the data type, not its actual ABI layout (which might even be highly target dependent or only known internally to the compiler, such as jmp_buf or ucontext_t). Maybe all you’re trying to do is emit a series of function calls like CHECK_OK(Foo(...)); CHECK_OK(Bar(...)) and you don’t care about the layout of Status (e.g. you are coming from a dialect where there is no layout).

  • using the symbols errno/stdin/stdout, which can be defined in many different ways, one of which is as a macro.

  • generating inline asm blobs (e.g. this might be a building block for generating Ruy kernels)

  • generating weird target-specific attributes on the functions (e.g. “on this one platform, I need to output __attribute__((address(0x1234))) on each global variable instead of using a linker script”)

  • generating a header just containing some structs, typedefs, and function declarations (and an include guard, of course) to accompany some other generated file (see the sketch after this list).

  • generating C++ code instead of C
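
To make the header and opaque-Status items concrete, here is a rough sketch of such a generated header (all names are hypothetical; Status stays opaque, exactly as described above):

#ifndef GENERATED_MODEL_H_ /* include guard */
#define GENERATED_MODEL_H_

#include <stdint.h>

/* Opaque status type: only the name is known, not its ABI layout. */
typedef struct Status Status;

/* Structs/typedefs accompanying the generated implementation file. */
struct Tensor2xf32 { float data[2]; };

/* Declarations of the generated entry points. */
Status* Foo(struct Tensor2xf32* input);
Status* Bar(struct Tensor2xf32* output);

#endif /* GENERATED_MODEL_H_ */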

While for this patch the approach doesn’t matter much, some of these use cases could lend themselves one way or the other. I would like to know more about how we expect this to evolve and what the goals are.

Also, one of the key things that folks will want to do with this is to emit arithmetic expressions. I would like to see how that layers into this code. I guess we could emit things like std.addi as a call, and then somehow have a std.addi function (or rather functions, since it could operate on i64 or i32 or i16 or i8).
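
A sketch of what that family of functions might look like (the std_addi_* names are hypothetical, one per bit width; the emitter would then turn an i16 addi into a call to std_addi_i16):

#include <cstdint>

// One declaration per bit width that std.addi can operate on; some side
// channel would need to emit these alongside the generated code.
int8_t  std_addi_i8(int8_t a, int8_t b);
int16_t std_addi_i16(int16_t a, int16_t b);
int32_t std_addi_i32(int32_t a, int32_t b);
int64_t std_addi_i64(int64_t a, int64_t b);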

Based on my experience writing TCIE (my Tiny C Inference Engine that I believe Jacques talked about in some talk at some conference; Jacques, is there a link for interested folks?), I believe that the C emission process actually has two totally distinct subproblems:

  1. emitting “structural” code like function declarations, #include’s, structs, typedefs, etc.
    a. for this, we need lots of flexibility because, let’s be honest, the use cases just get weird (as I described above). Errors here are mainly going to result in syntax errors at the C++ compiler level, so string munging is not a worry. Yeah, it might be annoying, but it won’t result in miscompiles.
  2. emitting the “bodies of functions”.
    a. For this, the key difficulty is general and correct program emission, rather than pure syntactic concerns. That is, if I say I want an i16 add with unsigned wrapping, the code emitter had better make sure that it uses the right datatypes in the emitted C so that the numbers come out right, and this is fairly tricky to do with a purely syntactic approach. You also want to make sure that control flow is handled in a fully general and correct way, and that there are appropriate hooks for defining how an MLIR type (like tensor) should be materialized at the C/C++ level.

For 1., the visitor approach to me seems like the clear winner.

For 2., the benefits of a proper “C dialect” start to become more pronounced. For example, to properly emit an i16 add with unsigned wrap we would need to cast to uint16_t to make sure that the C “+” operator does the right thing. Using a dialect formalizes the notion of what we know we can safely emit. For example, we can say that we can only emit i8, i16, i32 and i64. On the other hand, we could handle this by defining a legalization target with a dynamically legal std.addi. Regardless of whether we use a visitor approach or a “C dialect” here, I think the critical thing is defining a legalization target that people can lower to programmatically.
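
For example (a sketch of the emitted C++, with hypothetical names), the cast dance for an i16 add with wrapping semantics:

#include <cstdint>

int16_t addi_wrap_i16(int16_t a, int16_t b) {
  // Cast to uint16_t first: signed overflow is undefined behavior in C/C++,
  // while unsigned arithmetic plus the conversion back gives the expected
  // two's-complement wraparound.
  return static_cast<int16_t>(static_cast<uint16_t>(a) +
                              static_cast<uint16_t>(b));
}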

Another way of approaching this is to just treat std.addi opaquely as a C call to a function std_addi. But that’s actually not so simple, since you would need a side channel to emit the corresponding function declarations for every possible bit width. That seems to entail a grossness that my intuition says just won’t scale.

What I am missing today is a better description of the dialect: the kinds of types and operations it would have beyond the emitC operation, what kind of verification would be done, etc.

In the context of what I was saying, I don’t see a place for an “emitc” dialect. What I’m imagining is something more like a “c” dialect that has a set of primitive math ops (and maybe some stuff related to pointers). In fact, something like what River was working on for clang IRGen could potentially be a good candidate for this?

Yes, the C4ML 2020 talk is linked on the website: Talks - MLIR

You mention it produces syntax errors, not miscompiles. What do you consider miscompiles?

The verification that the types match is in the dialect being lowered from, and as long as the lowering is consistent/sound this isn’t an issue. That is true for the string approach too. A difference arises if your input program/“source” dialect is not strict and during legalization/emission the wrong thing happens. But whether I want unsigned wrap or not is still a function of the legalization rule in all three, and if your lowering is wrong and so chooses a wrong lowering, then none of these would save you (as the output is still valid code).

Because of the flexibility it does have, yes. Although you say structural, it doesn’t give you any structure for functions, function calls, or for loops, so you’d need to supplement it with helper functions for reuse, as mentioned above.

This is true even with the string munging approach (the dialect emitter in the previous rev could be made to just return failure for all other types). Even if we consider restrictions for a given op (e.g., std.subi can only handle i16), the string, syntactic, and “proper C” dialect approaches could all ensure that safety.

Doesn’t that go against the flexibility argument above? But that aside, I don’t see how it is an argument for any of the suggestions.

Are you assuming that your input function does not have explicit types? E.g., if my function only has i16, why do I need any side channel? If a bit width is used that isn’t supported, it just won’t compile (similar to the syntactic error above), or at worst won’t link, but you’ll know that offline (e.g., while invoking the compiler on the generated code).

Types:

  • To start with, types would be effectively opaque (e.g., !emitc<"StatusOr<Foo>">) plus the standard scalar ones (e.g., i8, i16, i32, i64, f32, …). It is more about consistency of the types.
  • Adding structured types could improve verification; but even without them, for example, call_member_function would have an object and a symbol referring to a function in a class operation, and that could be verified.
  • Might need something like !emitc.template_arg<"T"> to enable verifying that the type referenced is nested within the template op.

Operations:

  • call, call_member_function, for/while/if, namespace, include, class (which would have multiple regions), access_specifier (which would have a region), template, comment, ifdef (which has multiple regions), macro (although it could be assert instead, as the first one of interest), scope (single-block region, no yield) [I haven’t thought about lambda yet; I used that to emit an op with a region, but it didn’t turn out that useful for the case I had, same with asm]
  • Could reuse std.return and std.function (they may not suffice given we want templated functions, but I think it should work to have the function nested inside template and then use the template_arg type …).

Verification:

  • The type & dominance verification (these two give some nice benefit :slight_smile: )

Transformation:

  • Specialize template block for N types (see the sketch after this list)
  • Insert forward declarations
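
As a sketch of the specialization transformation (the function and the chosen types are hypothetical), a single template block could be specialized for N = 2 concrete types via explicit instantiations:

#include <cstdint>

// Before: one templated block in the emitted C++ ...
template <typename T>
T scale(T x) { return x * 2; }

// ... after the transformation: explicit instantiations for each chosen type.
template int32_t scale<int32_t>(int32_t);
template float scale<float>(float);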

This would make prototyping and some of the use cases (e.g., subsuming ODS custom C++ emitters) more difficult. While appealing for verification, it would not be as easy to use.

@_sean_silva It seems to me like your use case actually advocates for emitC as a separate dialect. There should be a C dialect with the goal of syntactically representing C code and being round-trippable, and a separate dialect (which might lower into the first) which is useful for sugaring the code generation process and ensuring that things are syntactically correct. Or maybe they are the same dialect but with some additional semantic properties which must hold to round-trip semantically valid C code. The dialect would have an internal lowering which would ensure these semantic properties (perhaps by inserting casts). This is similar to what the LLVMIR dialect seems to be converging on.

In this context, I consider a miscompile something that will execute (after being further compiled by a C compiler) but produce incorrect output. A syntax error in the generated C code is not a miscompile in this sense, since the program never reaches the point where it executes.

By analogy, if a C compiler produces assembly that can’t be assembled by the assembler, is it a miscompile or a failure to compile? The definition above considers it a failure to compile, which is intuitively correct IMO (it fails the user’s build, just as a syntax error, compiler crash, etc. would, and those are failures to compile).