LLVM Discussion Forums

Simple C++ emitter rev

This is a follow-up to the previous discussion, with the review at https://reviews.llvm.org/D76571. The C++ emitter backend here is a simple one that provides some basic structure while deferring most emission to dialect emitters.

As mentioned in the rev: this is useful for cases where you have a C++ compiler but no access to the codegen (e.g., for using MLIR optimizations along with legacy or proprietary systems) and for prototyping/debugging (e.g., I found this useful for playing with shape functions).

It is very simple and, as mentioned in the previous discussion on the mailing list, does not add a C/C++ dialect. A C/C++ dialect would be useful instead, but it is outside of what I currently have planned.

One point in the rev is that I’m not using an interface but instead take a map from dialect to emitters as input. This is mostly because I do not think of the emitter as core to the ops, so I don’t want to change the op definitions for it. I also want it to be possible to use multiple different lowerings for the same op/dialect (at different times).
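To make the map-based design concrete, here is a minimal standalone sketch. It does not use the MLIR API or the code in the rev: Op, DialectEmitter, EmitterMap, emitOp and emitAsCall are hypothetical names invented for illustration, and the dialect is assumed to be the part of the op name before the first '.'.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for an operation; not the MLIR Operation class.
struct Op {
  std::string name;                   // e.g. "tf.Add"
  std::vector<std::string> operands;  // names of already-emitted values
  std::string result;                 // name of the produced value
};

// A dialect emitter writes C++ text for one op and reports success.
using DialectEmitter = std::function<bool(const Op &, std::ostream &)>;
// The map is passed in by the caller instead of being attached to the op
// definitions, so the same dialect can be lowered differently at other times.
using EmitterMap = std::map<std::string, DialectEmitter>;

// Dispatch on the dialect prefix (the text before the first '.').
bool emitOp(const Op &op, const EmitterMap &emitters, std::ostream &os) {
  assert(!op.name.empty() && "op must have a name");
  std::string dialect = op.name.substr(0, op.name.find('.'));
  auto it = emitters.find(dialect);
  if (it == emitters.end())
    return false;  // no emitter was supplied for this dialect
  return it->second(op, os);
}

// One possible lowering: every op becomes a call to a same-named function.
bool emitAsCall(const Op &op, std::ostream &os) {
  std::string callee = op.name;
  for (char &c : callee)
    if (c == '.')
      c = '_';
  os << "auto " << op.result << " = " << callee << "(";
  for (std::size_t i = 0; i < op.operands.size(); ++i)
    os << (i ? ", " : "") << op.operands[i];
  os << ");\n";
  return true;
}
```

A caller that wants a different lowering for the same dialect would build a second EmitterMap with another function under the same key, without touching the op definitions.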

– Jacques


I’m not necessarily opposed to this going in as is, but let me elaborate on the source of my questions:

I would love to see a centralized c/c++ dialect where the syntactic issues can be isolated. I think starting such a project based on the needs of this could grow into something more broadly useful. I certainly don’t think we should prevent this landing to wait for a full c++ round-trippable dialect. It’s a personal pet peeve of mine every time I start seeing code strings going to an output stream… It always starts simple and becomes complicated and unreadable.

Your point about interfaces is a good one. I don’t know of a good solution to the fact that the emission needs to be customized. I’m hesitant to add yet more mechanisms to MLIR, however… It is already a relatively rich framework with lots of stuff that fits together in unusual ways. In this case, I might lean on the side of doing something even simpler, without planning for the complexity of “people might want to do this in lots of different ways”. This starts to get at the crux of the product complexity of dialects. We want to enable a (seemingly) arbitrary set of transformations on an arbitrary set of dialects.

I don’t disagree with you on that. I think that is a very valid objection.

One could be built using different linker targets probably.

Well I’m one of those people :slight_smile: I have 2 different ways in which I want to lower TF dialect. Perhaps I should not hedge: either have only the registry and have different targets link in different emitters, or don’t have a registry and require it to be explicitly passed.

(Part of me also sometimes looks at something being added ODS/DRR side and thinks … “mmm, if that was an op, it could be represented as a verification of the op” / “we could have fused X and Y if we didn’t just splat out immediately”, so ideally we would subsume it with something better wrt C++ emission …)

Made an example EmitC dialect in https://reviews.llvm.org/D76571 for discussion (should maybe have used different rev … and need to update description). This dialect’s goal is to make it easy/trivial to translate to code rather than model anything beyond that.

Hello, I’m quite new here, but rather interested in C output. Can you tell me:

  1. how to get to test this
  2. why C++, and not plain C - what are the features that would take advantage of the C++ extensions (classes, templates, RTTI?).

Hey,

It doesn’t exist beyond the code under review, so it is very early and not usable beyond emitting some functions/function calls. The dialect currently being discussed would be syntactic and have a trivial mapping to C++. Now if you restrict how you lower to this dialect you could emit C instead, but that would be a function of the dialect conversion and of not using C++ constructs (e.g. if you convert to EmitC with class Ops, then you won’t get plain C out of the translation). For just function calls, and if you don’t use multi-result ops, you would get C out with the rev posted. But yes, one of the applications does require generating templates.

  1. Is there some tutorial on how to test this “code under review”?
  2. Can you give me an example of the need for templates?
  3. My immediate interest here would be to be able to go from affine to C, but of course I understand the rationale you set up in your first post.

  1. Download the raw diff and patch it into your client (potentially check for the head revisions there to avoid needless merge conflicts).

  2. Sure, an example is when you need to integrate the generated code with an existing system that uses templates. E.g., you previously had:

template <...>
tensor foo(...) { ... }

and you want to now instead use MLIR to optimize foo before you emit it. For example, you want a loop nest emitted that can be specialized for multiple different int/float types. You could of course do that by just duplicating & specializing in MLIR already if you knew all the types you wanted at the generated function compile time.

  3. Sounds good. The dialect would not help with that in the current version (you’d probably need to add for-loop constructs and many other parts); the current rev just gives function calls.
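As an illustration of the template case in item 2, the following is a hypothetical sketch of what an emitted loop nest might look like when the element type is left as a template parameter for the existing system to specialize. The function scale_rows and its signature are invented for this example; they are not output of the rev.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical emitted code: the loop nest is generated once and the element
// type T is chosen later by the surrounding (templated) system.
template <typename T>
std::vector<T> scale_rows(const std::vector<T> &in, std::size_t rows,
                          std::size_t cols, const std::vector<T> &row_scale) {
  assert(in.size() == rows * cols && row_scale.size() == rows);
  std::vector<T> out(in.size());
  for (std::size_t i = 0; i < rows; ++i)
    for (std::size_t j = 0; j < cols; ++j)
      out[i * cols + j] = row_scale[i] * in[i * cols + j];
  return out;
}
```

The same text can then be instantiated as scale_rows<int>, scale_rows<float> and so on by the host code, which is the case where duplicating and specializing inside MLIR is not possible because the types are not known when the function is generated.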

I like the idea of the “emitc” dialect. In some scenarios we probably will want to emit raw C, so it should be possible to swap out all the C++ tuple stuff with struct tuple0, struct tuple1_i32, struct tuple2_i64_f64, struct tuple2_f32_f32, etc.
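A small sketch of the two flavors, side by side. The function names, the field names v0/v1, and the example computation are assumptions made for illustration; only the struct naming scheme comes from the post above.

```cpp
#include <cassert>
#include <cstdint>
#include <tuple>

// C++ flavor: a two-result op lowered to one function returning std::tuple.
std::tuple<int64_t, double> div_and_ratio_cpp(int64_t a, int64_t b) {
  assert(b != 0);
  return std::make_tuple(a / b, static_cast<double>(a) / static_cast<double>(b));
}

// Raw-C flavor: the same two results packed into a named struct. The struct
// and the function below use only constructs that are also valid C.
struct tuple2_i64_f64 {
  int64_t v0;  // hypothetical field names
  double v1;
};

struct tuple2_i64_f64 div_and_ratio_c(int64_t a, int64_t b) {
  struct tuple2_i64_f64 result;
  assert(b != 0);
  result.v0 = a / b;
  result.v1 = (double)a / (double)b;
  return result;
}
```

In both cases the caller sees a single returned value, so the choice between them can be made when lowering multi-result ops rather than when printing text.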

I also like the idea of a dialect to emit C or C++ code. I will shortly be starting to work on serializing IREE’s VM dialect to C/C++ (IREE#1173). This could be realized with, or profit from, the “emitc” dialect.

A visitor approach to emit C++ (possibly with an interface) was a good first approach to me. The update with a new dialect does not seem clearly motivated: I don’t understand why a dialect is useful / desirable here: what kind of analysis, transformation, manipulation, etc. would we do on this “dialect”?

From what I understand, lowering to a “Dialect” is more a matter of convenience, such as converting generic ops to function calls and the like. This way a verifier could later be tacked onto a generic black-box call op (for example, one that uses knowledge of the available libraries to check whether all ops are supported).

(I’m playing devil’s advocate here and will try to argue both sides :slight_smile: )

String emits

  • :white_check_mark: This is very flexible and we can do arbitrarily complex things simply (e.g., there is no difference from emitting a macro that creates a templated class vs a function in terms of effort/structure needed).
  • :negative_squared_cross_mark: There is no structure or verification until the compiler is finally invoked.
    • Helper functions could be added as we did for ODS

Dialect

  • :white_check_mark: Common constructs can be emitted more safely
    • Doesn’t avoid all footguns though (e.g., unless we model C++ type system completely/expect all types to be defined along with the program, you can use a type that can’t be lowered)
  • :white_check_mark: Textual emission is trivial (e.g., there is a 1:1 mapping, decisions about what types constructs should get lowered to have already been handled)
  • :negative_squared_cross_mark: More restricted and need to add an op for a syntactic feature (e.g., if you want a macro, you’d need a macro op)
  • :negative_squared_cross_mark: Few transformations and little analysis on this form (as it is syntactic, it is more a textual templating engine with more structure [constructs/verification] specific to C/C++)

I see little analysis on this dialect and some transformation (you could do variable renaming, automatic comment generation, generate forward declarations [including autogenning all the tuple types you would need C side for multi-results that have C primitive types]). I do see verification playing in here (and that increasing over time), but initially it would just be simple checking that a value being consumed was actually produced (rather than hoping the string referenced in an assign actually matches some variable).
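The “consumed value was actually produced” check can be sketched without any MLIR machinery. Everything here (CallOp, firstUndefinedUse, emit) is a hypothetical simplification for illustration; in a real dialect the SSA use-def structure would give most of this for free.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a single-result call op.
struct CallOp {
  std::string result;
  std::string callee;
  std::vector<std::string> operands;
};

// Returns the first operand that is consumed before a function argument or an
// earlier op produced it, or an empty string if every use is defined.
std::string firstUndefinedUse(const std::vector<std::string> &args,
                              const std::vector<CallOp> &body) {
  std::set<std::string> defined(args.begin(), args.end());
  for (const CallOp &op : body) {
    for (const std::string &operand : op.operands)
      if (!defined.count(operand))
        return operand;
    defined.insert(op.result);
  }
  return "";
}

// Once the body is verified, emission is a 1:1 textual mapping.
std::string emit(const CallOp &op) {
  assert(!op.callee.empty());
  std::string text = "auto " + op.result + " = " + op.callee + "(";
  for (std::size_t i = 0; i < op.operands.size(); ++i)
    text += (i ? ", " : "") + op.operands[i];
  return text + ");";
}
```

With plain string emission the same mistake (referring to a name that nothing defines) would only surface when the C++ compiler runs on the generated file.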

As Sean mentioned, the original approach had a problem in that how you emit a multi-result output is fixed. In the EmitC dialect there would be no multi-result outputs; those would have already been lowered in some way. So you’d have a single-result op that could produce a std::tuple or a struct tuple_i64_f64, and the emission of the dialect doesn’t really care.

As mentioned, parts of what I like about the dialect you could also get from helper functions. Consider all the classes in ODS and DRR C++ side: those model some C++ constructs (we have a C++ structure that corresponds to the C++ class that will get emitted). We could generalize those and expose them as an API to use here (note: I consider it internal to ODS/DRR at the moment and made for those use cases). Now we do perhaps want ODS and DRR to be dialects (or some parts of them :slight_smile: ) and do optimizations and verifications on those dialects before emitting; then it becomes dialect conversion & trivial emit [and yes we probably can’t use ODS to define the dialects required for ODS without either a multistage process or not using ODS for them].

OK, I did a terrible job of playing devil’s advocate here … I think it boils down to this: we can make both work. With the string one we have more helper functions & flexibility, but less structure. The dialect one has more structure, and dialects are pretty cheap to define, but it has less flexibility.

Just to throw more use cases into the mix, here’s a list of random stuff that I pulled from an old email on a related topic (emitting C code for targeting low-resource DSPs):

Also, thinking of C generation as a general building block that will be reused in many places, one might want to consider such things as:

  • generating #include statements

  • putting #ifdef’s around regions of code

  • using some opaque datatype (such as a “Status” data type) as a return value of a function, and all you know is the name of the data type, not its actual ABI layout (which might even be highly target dependent or only known internally to the compiler, such as jmp_buf or ucontext_t). Maybe all you’re trying to do is emit a series of function calls like CHECK_OK(Foo(...)); CHECK_OK(Bar(...)) and you don’t care about the layout of Status (e.g. you are coming from a dialect where there is no layout).

  • using the symbols errno/stdin/stdout, which can be defined in many different ways, one of which is as a macro.

  • generating inline asm blobs (e.g. this might be a building block for generating Ruy kernels)

  • generating weird target-specific attributes on the functions (e.g. “on this one platform, I need to output __attribute__((address(0x1234))) on each global variable instead of using a linker script”)

  • generating a header just containing some structs, typedefs, and function declarations (and an include guard, of course) to accompany some other generated file.

  • generating C++ code instead of C
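A couple of the items above (the include guard, the #include lines, and an opaque Status type known only by name) can be sketched with a plain string helper. The function emitHeader and its parameters are hypothetical and only meant to show how little structure such “structural” emission needs.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Emits a C header with an include guard, a list of system includes, and a
// list of declarations (each given without its trailing semicolon).
std::string emitHeader(const std::string &guard,
                       const std::vector<std::string> &includes,
                       const std::vector<std::string> &decls) {
  assert(!guard.empty());
  std::ostringstream os;
  os << "#ifndef " << guard << "\n#define " << guard << "\n\n";
  for (const std::string &inc : includes)
    os << "#include <" << inc << ">\n";
  os << "\n";
  for (const std::string &decl : decls)
    os << decl << ";\n";
  os << "\n#endif  // " << guard << "\n";
  return os.str();
}
```

For example, passing the guard FOO_H, the include stdint.h, and the declarations "struct Status" and "struct Status Foo(int32_t x)" yields a header that forward-declares Status without ever stating its layout.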

While for this patch the approach doesn’t matter much, some of these use cases could lend themselves one way or the other. Would like to know more about how we expect this to evolve and what the goals are.

Also, one of the key things that folks will want to do with this is to emit arithmetic expressions. Would like to see how that layers into this code. I guess we could emit things like std.addi as a call, and then somehow have a std.addi function (or rather functions, since it could operate on i64 or i32 or i16 or i8).

Based on my experience writing TCIE (my Tiny C Inference Engine that I believe Jacques talked about in some talk at some conference; Jacques, is there a link for interested folks?), I believe that the C emission process has two totally distinct subproblems:

  1. emitting “structural” code like function declarations, #include's, structs, typedefs, etc.
    a. for this, we need lots of flexibility because let’s be honest the use cases just get weird (as I described above). Errors here are mainly going to result in syntax errors at the C++ compiler level, so string munging is not a worry. Yeah, it might be annoying, but won’t result in miscompiles.
  2. emitting the “bodies of functions”.
    a. For this, the key difficulty is general and correct program emission, rather than pure syntactic concerns. That is, if I say I want an i16 add with unsigned wrapping, the code emitter better make sure that it uses the right datatypes in the emitted C to make sure that the numbers come out right, and this is fairly tricky to do with a purely syntactic approach. You also want to make sure that control flow is handled in a fully general and correct way, and that there are appropriate hooks for defining how an MLIR type (like tensor) should be materialized at the C/C++ level.

For 1., the visitor approach to me seems like the clear winner.

For 2. the benefits of a proper “C dialect” start to become more pronounced. For example, to properly emit an i16 add with unsigned wrap we would need to cast to uint16_t to make sure that the C “+” operator does the right thing. Using a dialect formalizes the notion of what we know we can safely emit. For example, we can say that we can only emit i8, i16, i32 and i64. On the other hand, we could handle this by defining a legalization target with dynamically legal std.addi. Regardless of whether we use a visitor approach or a “C dialect” here, I think the critical thing is defining a legalization target that people can lower to programmatically.
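To spell out the i16 case, here is a sketch of what correct emitted code could look like. It assumes i16/i32 values are materialized as uint16_t/uint32_t bit patterns and that int is at least 32 bits wide; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// i16 add with wrapping. Both operands are promoted to int before '+', so the
// sum itself cannot overflow; the cast back to uint16_t reduces it mod 2^16.
inline uint16_t add_i16_wrap(uint16_t a, uint16_t b) {
  return static_cast<uint16_t>(a + b);
}

// i16 multiply with wrapping. Writing 'a * b' directly would promote both
// operands to (signed) int, and 65535 * 65535 overflows a 32-bit int, which
// is undefined behavior. Widening to an unsigned type first avoids that.
inline uint16_t mul_i16_wrap(uint16_t a, uint16_t b) {
  return static_cast<uint16_t>(static_cast<uint32_t>(a) *
                               static_cast<uint32_t>(b));
}

// i32 add with wrapping: unsigned arithmetic wraps, signed overflow is UB.
inline uint32_t add_i32_wrap(uint32_t a, uint32_t b) {
  return static_cast<uint32_t>(a + b);
}

// Small self-check of the wrapping behavior.
inline void check_wrapping() {
  assert(add_i16_wrap(65535, 1) == 0);
  assert(mul_i16_wrap(65535, 65535) == 1);
  assert(add_i32_wrap(0xFFFFFFFFu, 2u) == 1u);
}
```

The multiply case shows why a purely syntactic mapping of an op to the C operator is risky: the emitted text compiles either way, but only one version is well defined for all inputs.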

Another way of approaching this is to just emit std.addi opaquely as a C call to a function std_addi. But that’s actually not so simple since you would need a side channel to emit the corresponding function declarations for every possible bit width. That seems to entail a grossness that my intuition says just won’t scale.
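A rough sketch of that side channel, assuming only the widths 8, 16, 32 and 64 are supported and using a made-up naming scheme (std_addi_i16 and so on). C has no overloading, so each width needs its own function name and declaration.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Emits one declaration of a hypothetical std_addi helper per requested bit
// width. The caller has to collect the widths actually used in the program.
std::string emitAddiDecls(const std::vector<int> &widths) {
  std::ostringstream os;
  for (int w : widths) {
    assert(w == 8 || w == 16 || w == 32 || w == 64);
    os << "uint" << w << "_t std_addi_i" << w << "(uint" << w << "_t lhs, uint"
       << w << "_t rhs);\n";
  }
  return os.str();
}
```

Every other arithmetic op would need the same treatment, which gives a sense of how quickly the set of declarations grows.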

What I am missing today is a better description of the dialect, the kind of types and operation it would have beyond the emitC operation, what kind of verification would be made, etc?

In the context of what I was saying, I don’t see a place for an “emitc” dialect. What I’m imagining is something more like a “c” dialect that has a set of primitive math ops (and maybe some stuff related to pointers). In fact, something like what River was working on for clang IRGen could potentially be a good candidate for this?