TOSA reference model from MLIR using EmitC

Opening this thread to discuss technical details around the development of a path to perform TOSA functional verification through the MLIR EmitC dialect.

This was briefly discussed on Discord, starting with a request from @marbre, but I am bringing it here for broader visibility.

Present

The TOSA reference model (reference_model.git) consumes TOSA in flatbuffers form. It takes the model and the network inputs and emits the functional network output. It is aligned to the TOSA specification.

The TOSA dialect in MLIR can be converted to flatbuffers using the following MLIR pass: tosa_mlir_translator.git. It can be integrated into projects as a CMake submodule, though we also integrate it into TensorFlow Bazel builds using a custom build rule. Details are in [RFC] Tosa import/export tool - #33 by sjarus.

Proposal

Drive the TOSA reference model by generating C API calls to the reference model from the TOSA MLIR form using the EmitC dialect. The reference model would then be a library rather than a binary.
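To make the idea of "C API calls into a library" concrete, here is a minimal sketch of what such an interface and the emitted caller could look like. Everything in it is an assumption for illustration: `ref_status`, `ref_tensor_i32`, `ref_add_i32` and `forward` are hypothetical names, not part of reference_model.git or of EmitC, and the add is restricted to identical shapes.

```cpp
#include <cassert>
#include <stddef.h>
#include <stdint.h>

extern "C" {

// Hypothetical status codes; not part of reference_model.git.
typedef enum { REF_OK = 0, REF_SHAPE_MISMATCH = 1 } ref_status;

// Hypothetical non-owning view of an int32 tensor.
typedef struct {
  const int64_t *shape; // `rank` dimension sizes
  size_t rank;
  int32_t *data;        // row-major elements
} ref_tensor_i32;

// Elementwise add; this sketch requires identical shapes (no broadcasting).
ref_status ref_add_i32(const ref_tensor_i32 *lhs, const ref_tensor_i32 *rhs,
                       ref_tensor_i32 *out) {
  assert(lhs && rhs && out);
  if (lhs->rank != rhs->rank || lhs->rank != out->rank)
    return REF_SHAPE_MISMATCH;
  size_t count = 1;
  for (size_t i = 0; i < lhs->rank; ++i) {
    if (lhs->shape[i] != rhs->shape[i] || lhs->shape[i] != out->shape[i])
      return REF_SHAPE_MISMATCH;
    count *= static_cast<size_t>(lhs->shape[i]);
  }
  for (size_t i = 0; i < count; ++i)
    out->data[i] = lhs->data[i] + rhs->data[i];
  return REF_OK;
}

} // extern "C"

// Roughly the shape of code a lowering could print for a function that
// contains a single add op: one library call per op.
ref_status forward(const ref_tensor_i32 *arg0, const ref_tensor_i32 *arg1,
                   ref_tensor_i32 *result) {
  return ref_add_i32(arg0, arg1, result);
}
```

A caller would wrap its buffers in `ref_tensor_i32` views and check the returned status. A real interface would also need to cover datatypes, attributes and broadcasting, which this sketch leaves out.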

Conversation So Far

On our part, when we implemented the reference model, we considered offering a mechanism like this, but there was no immediate use case then. Adding @jsmolens and @eric-k here as additional involved folks at Arm.

The reference model isn’t a performance-focused runtime. Its focus is on precision and bit accuracy for comparison to frontend reference output, and it serves as a critical part of HW/SW co-design efforts.

There are some design considerations around this proposal, e.g.

  • Passing parameter and datatype information in a manner easily parseable on the reference model side.
  • How to properly convey the optional quantization information to the reference model interface? There is a cleanup to the dialect interface planned for the next TOSA minor version update (v0.24) that should significantly simplify this, but this update is only intended to happen in January.
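
As one possible reading of the two points above, datatype, shape and optional quantization information could travel in small plain structs that are trivial to inspect on the library side. This is only a sketch: `DType`, `TensorDesc`, `QuantInfo` and the helper functions are hypothetical names, the element sizes are assumed in-memory sizes, and the zero-point pair is just one example of quantization parameters.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

namespace sketch {

// Hypothetical datatype tag; a real interface would follow the TOSA spec.
enum class DType { Bool, Int8, Int16, Int32, Fp32 };

// Optional quantization parameters; `present == false` means "not quantized".
struct QuantInfo {
  bool present;
  std::int32_t input_zp;
  std::int32_t output_zp;
};

// Datatype plus shape, enough to size and interpret a raw buffer.
struct TensorDesc {
  DType dtype;
  std::vector<std::int64_t> shape;
};

// Assumed storage size per element, in bytes.
inline std::size_t elementSize(DType dtype) {
  switch (dtype) {
  case DType::Bool:
  case DType::Int8:
    return 1;
  case DType::Int16:
    return 2;
  case DType::Int32:
  case DType::Fp32:
    return 4;
  }
  return 0;
}

// Total buffer size implied by a descriptor.
inline std::size_t byteSize(const TensorDesc &desc) {
  std::size_t bytes = elementSize(desc.dtype);
  for (std::int64_t dim : desc.shape) {
    assert(dim >= 0 && "sketch handles static, non-negative dims only");
    bytes *= static_cast<std::size_t>(dim);
  }
  return bytes;
}

// Absent quantization info is treated as a zero point of 0.
inline std::int32_t effectiveInputZp(const QuantInfo &quant) {
  return quant.present ? quant.input_zp : 0;
}

} // namespace sketch
```

For example, a descriptor with `DType::Int16` and shape `{2, 3}` yields a byte size of 12, and an op without quantization info falls back to a zero point of 0.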

Thanks @sjarus for starting a followup. Let me provide some additional information.

Current State

Prior to upstreaming EmitC to the MLIR core repository, we implemented a TOSA to EmitC pass. This pass can convert all TOSA ops included in MobileNetV2 to EmitC (mlir-emitc/tosa-op-coverage.md at main · iml130/mlir-emitc · GitHub). The generated C++ code relies on operations implemented (header-only) in mlir-emitc/emitc_tosa.h at main · iml130/mlir-emitc · GitHub and some additional header files. This conversion/translation pass is end-to-end tested in our CI pipeline.
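
For readers unfamiliar with this setup, the following sketch shows the general pattern of emitted C++ calling header-only op templates that depend only on the standard library. It is not the actual mlir-emitc API: `sketch::Tensor1D`, `sketch::tosa::add`, `sketch::tosa::clamp` and `forward` are hypothetical stand-ins, limited to one-dimensional tensors.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>

namespace sketch {

// Hypothetical statically shaped 1-D tensor (stand-in for a real tensor type).
template <typename T, std::size_t N>
struct Tensor1D {
  std::array<T, N> data;
};

namespace tosa {

// Elementwise add of two equally sized tensors.
template <typename T, std::size_t N>
Tensor1D<T, N> add(const Tensor1D<T, N> &x, const Tensor1D<T, N> &y) {
  Tensor1D<T, N> out{};
  for (std::size_t i = 0; i < N; ++i)
    out.data[i] = x.data[i] + y.data[i];
  return out;
}

// Elementwise clamp to the closed interval [lo, hi].
template <typename T, std::size_t N>
Tensor1D<T, N> clamp(const Tensor1D<T, N> &x, T lo, T hi) {
  assert(lo <= hi);
  Tensor1D<T, N> out{};
  for (std::size_t i = 0; i < N; ++i)
    out.data[i] = std::min(std::max(x.data[i], lo), hi);
  return out;
}

} // namespace tosa
} // namespace sketch

// The style of function a C++ emitter could print: one call per op,
// with SSA-like temporaries.
sketch::Tensor1D<float, 4> forward(sketch::Tensor1D<float, 4> v1,
                                   sketch::Tensor1D<float, 4> v2) {
  sketch::Tensor1D<float, 4> v3 = sketch::tosa::add(v1, v2);
  sketch::Tensor1D<float, 4> v4 = sketch::tosa::clamp(v3, 0.0f, 6.0f);
  return v4;
}
```

Because the ops are plain templates in a header, the emitted translation unit only needs a C++ compiler and the standard library to build and run.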

Proposal

If the community thinks this is a useful thing, we are open to upstreaming what we have. Like the TOSA reference model, our header-only implementation is not targeting performance. However, we are currently evaluating the option of making use of Eigen (the implementation currently only depends on the standard library).
Side note: We also have an MHLO to EmitC pass that relies on our header-only implementations.

In addition, we would like to discuss whether it would be useful for the community if we refactored so that the TOSA reference model is used instead of our own header-only implementation. This would also require a more library-friendly version of the TOSA reference model. Therefore, I would love to hear the opinion of the community first.


That’s quite a lot of pieces you already have in place to interface with the TOSA reference model! We’re pleased to see you’ve played with the reference model (which uses Eigen) too.

There are some mechanical questions here around how a library form of the reference model would interface with the MLIR ‘runtime’. Do you already have a proposal for how that would be driven, @marbre? Right now, the reference model has a graph traverser along with the Eigen-based functional implementation of the ops themselves.

So far, I only have some initial thoughts but not yet a concrete proposal.

  1. The main issue is indeed the tight coupling between the serializer, the graph traverser and the ops. From what we have so far with TOSA → EmitC → EmitC C++ Reference Implementation (by EmitC C++ reference implementation I refer to the emitc_*.h files in mlir-emitc/include/emitc at main · iml130/mlir-emitc · GitHub), it would be straightforward for us to replace the EmitC C++ Reference Implementation with the Eigen-based ops implemented in the TOSA reference model.
    However, this would require decoupling the op implementations and making them available in a library.
  2. Ops in the TOSA reference model are stateful. It is probably necessary to separate handling the states and the computation itself. My colleague @david_ronnenberg has looked into it and could give a more detailed description of what would be needed.
  3. TOSA supports some datatypes for which we don’t have any support on the EmitC side so far.
  4. The emitted tensors target emitc::Tensor. We would need an efficient conversion to TOSA/Eigen tensors or an option to directly emit those via the Cpp emitter, respectively.
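
To illustrate item 2, one way to separate state handling from computation is to keep the numerical kernel as a pure function and let a thin stateful node, of the kind a graph traverser would hold, delegate to it. This is a sketch under assumptions, not how the TOSA reference model is structured: `negateKernel` and `NegateNode` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace sketch {

// Pure computation: output depends only on the argument, no hidden state.
// Emitted code could call a function like this directly.
inline std::vector<float> negateKernel(const std::vector<float> &input) {
  std::vector<float> out(input.size());
  for (std::size_t i = 0; i < input.size(); ++i)
    out[i] = -input[i];
  return out;
}

// Stateful wrapper: tracks the bound input, the evaluation status and the
// stored output, but leaves the arithmetic to the pure kernel.
class NegateNode {
public:
  void setInput(const std::vector<float> *input) { input_ = input; }

  // Returns false if no input has been bound yet.
  bool eval() {
    if (input_ == nullptr)
      return false;
    output_ = negateKernel(*input_);
    evaluated_ = true;
    return true;
  }

  bool evaluated() const { return evaluated_; }
  const std::vector<float> &output() const { return output_; }

private:
  const std::vector<float> *input_ = nullptr;
  std::vector<float> output_;
  bool evaluated_ = false;
};

} // namespace sketch
```

With this split, a traverser-driven path and an emitted-code path can share the same kernels, and only the wrapper layer differs.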

So there are definitely mechanical questions that need to be solved. We’re willing to push this forward, but as mentioned before, would like to hear if this is of interest to the community.

@david_ronnenberg Please feel free to comment, especially if I have missed something that we have already discussed internally.

A quick heads-up related to this - we have implemented the Statefulness Support for TOSA - TOSA - Discourse proposal as a prototype for the purpose of expressing RNNs; it comprises the TOSA dialect utility ops, the serialization lib and reference model updates, and it works. We’ll be releasing it in the near future.

This will add some complexity around a few things, e.g. a simple memory model for maintaining persistent state contents, and interfacing those utility ops, since this is still in MLIR whereas the serialized form translates these into tensors with a special is_variable bit set.

I think that makes sense out of tree. I’d rather not have a dependency on Eigen in core (too many moving parts and I don’t think C++ standard support lines up). In the MHLO repo that could be fine given TF’s dependency on Eigen. Utilizing a pure BLAS (or some such reasonably standard) interface and being able to switch in different implementations would be more appealing.

Having two versions also makes sense, so you have one for correctness and the other as a baseline for comparing against codegen/where you only have a C/C++ compiler available as backend. But Eigen is a large dependency and I’d much rather have pure reference implementation + codegen in core than an optimized library implementation with more dependencies and introduce a support & maintenance requirement for it.

Thanks a lot for your feedback. A pure reference implementation is what we initially had in mind and our header-only implementation therefore has no dependency except the STL. Hence, I can of course think of upstreaming the reference implementation we have so far to the MLIR core. To us the main question really is if there is interest in adding such a reference implementation to the core (or to some other repo).

I therefore agree that an Eigen-based implementation makes more sense as part of an out-of-tree project. Basically, I am trying to figure out which implementation might be interesting for which user base.

We’ll definitely take a look at the proposal. When you say you’ll be releasing it in the near future, is a date already scheduled? Referring to @jpienaar’s comment, what location would you suggest for a reference implementation that lowers TOSA to EmitC to something tbd?

When we implemented the TOSA reference model and the MLIR pass to serialize TOSA form to drive the reference model, we faced similar issues with both the Eigen and Flatbuffers dependencies, and we could not work out how to fit them within the MLIR core. Ultimately we left them as originally done - the reference model as a standalone flatbuffers/JSON consuming binary, and the MLIR pass as a standalone repo that could easily be linked with an MLIR pass manager.

To get a better understanding of the requirements, what is the EmitC path intended for? What networks would be emitted - single-op unit test cases, for example, or full networks in TOSA MLIR?

To run full networks, any parallel reference implementation would need to implement graph building and traversal, the basic memory model concepts, file I/O… which all amount to duplicating the existing reference model functionality.

Conceptually it seems more straightforward to have EmitC have a functional verification mode where it emits calls to construct a full MLIR TOSA form, invoke the graph builder in the existing reference model, invoke its traverser and get an actual functional bit level output.
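
As a rough picture of that flow, emitted code would make a sequence of builder calls followed by a single traversal. The sketch below invents a tiny graph builder purely for illustration: `sketch::Graph`, `addInput`, `addOp`, `run`, `addKernel` and `runNetwork` are hypothetical and do not reflect the reference model's actual graph builder or traverser, and ops are simply executed in insertion order.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

namespace sketch {

using Tensor = std::vector<float>;
using Kernel = std::function<Tensor(const std::vector<const Tensor *> &)>;

// Hypothetical minimal graph: named tensors plus ops run in insertion order.
class Graph {
public:
  void addInput(const std::string &name, Tensor value) {
    tensors_[name] = std::move(value);
  }

  void addOp(std::vector<std::string> inputs, std::string output,
             Kernel kernel) {
    ops_.push_back(Op{std::move(inputs), std::move(output), std::move(kernel)});
  }

  // Traverses all ops; returns false if an op reads an undefined tensor.
  bool run() {
    for (const Op &op : ops_) {
      std::vector<const Tensor *> args;
      for (const std::string &name : op.inputs) {
        auto it = tensors_.find(name);
        if (it == tensors_.end())
          return false;
        args.push_back(&it->second);
      }
      Tensor result = op.kernel(args);
      tensors_[op.output] = std::move(result);
    }
    return true;
  }

  // Returns nullptr if no tensor with that name exists.
  const Tensor *get(const std::string &name) const {
    auto it = tensors_.find(name);
    return it == tensors_.end() ? nullptr : &it->second;
  }

private:
  struct Op {
    std::vector<std::string> inputs;
    std::string output;
    Kernel kernel;
  };
  std::map<std::string, Tensor> tensors_;
  std::vector<Op> ops_;
};

// Elementwise add over two equally sized tensors.
inline Tensor addKernel(const std::vector<const Tensor *> &args) {
  assert(args.size() == 2 && args[0]->size() == args[1]->size());
  Tensor out(args[0]->size());
  for (std::size_t i = 0; i < out.size(); ++i)
    out[i] = (*args[0])[i] + (*args[1])[i];
  return out;
}

} // namespace sketch

// The kind of driver an emitter could print: build the graph, then traverse.
sketch::Tensor runNetwork(const sketch::Tensor &a, const sketch::Tensor &b) {
  sketch::Graph graph;
  graph.addInput("a", a);
  graph.addInput("b", b);
  graph.addOp({"a", "b"}, "sum", sketch::addKernel);
  bool ok = graph.run();
  assert(ok);
  (void)ok;
  return *graph.get("sum");
}
```

The point of the design is that the emitted code contains no arithmetic of its own; graph construction, memory handling and evaluation all stay inside the library.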

This would be an out-of-tree path since the dependency on the external reference model and its own Eigen dependency would remain; the flatbuffers dependency would be absent since this is an independent path to construct and drive the reference model.

Having discussed this internally with @jsmolens and @eric-k we think this is feasible, though the interfaces would need to be defined.

No, we’re still trying to close out internal and external feedback loops on this, as is normal with new proposals on the TOSA discourse.

I’m surprised: I’m pretty sure I mentioned a path for this back then (basically a CMake flag to conditionally enable this). You shouldn’t hesitate to bring up these kinds of questions as soon as they occur.

Ultimately, I believe it’ll be incredibly valuable to build as much as possible of end-to-end flow upstream.
I’ve been meaning to invest more into this for a while, and we’ll get there. The duplication may be unfortunate, but what’s the alternative? Would you be willing to upstream your work instead?