[RFC] Tosa import/export tool

Thanks @stellaraccident for beating me to this RFC! I’m traveling today so apologies for typos from a mobile device.

A couple of additional points:

  • the serializer is currently an MLIR pass that we now overlay upon the existing open sourced code. It expects the MLIR form to be purely TOSA IR, though this was slightly relaxed before (e.g. std dialect constants were accepted). @kevincheng, is this still the case?
  • as @jsmolens mentioned, there’s only a serialization path, but deserialization can also be accomplished; we just haven’t needed it yet.
  • I looked at the SPIRV serialize/deserialize implementation for inspiration. To answer @jpienaar asking about this, the same approach might work; the fundamental problem is that compiling this requires the Flatbuffers standard headers to be included; it does not appear that any other LLVM/MLIR module has such a dependency, and introducing this dependency was what we were unsure about.
  • it isn’t really hard to switch from a pass at present to invoking the serialization/deserialization interfaces; really the only showstopper here is ‘how do we build as part of LLVM with an optional dependency on the external flatbuffers.h?’
  • it is expected that as the spec evolves, the tosa.fbs will be updated to follow, and will be consumed by the serialization API.

Partially yes. std.return is currently the only operator allowed.

I would expect it to be pretty mechanical, and there may be simplification options with judicious use of MLIR features (like an appropriate op interface). Certainly of a much lower level of complexity than the Torch/MLIR converter because these are actually isomorphic representations.

The translation is fairly mechanical. Ops with similar operand/attribute signatures are handled through macros (could be refactored to use templates), while ops with unique signatures have their own functions to write out the Flatbuffer fields in the right order.
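As a rough illustration of the template refactoring mentioned above, the sketch below handles every op that shares the "one input, one output, no attributes" signature with a single function template, while an op with a unique signature gets its own overload. All names here (`Record`, `AbsOp`, `NegateOp`, `ClampOp`, `serializeUnary`, `serializeOp`) are hypothetical stand-ins invented for the example; they are not the MLIR TOSA op classes or the serialization_lib API.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for one serialized operator record.
struct Record {
  std::string opName;
  std::vector<std::string> inputs;
  std::vector<std::string> outputs;
  std::vector<int> attrs;
};

// Toy op types (assumptions for illustration only).
struct AbsOp {
  static const char *name() { return "ABS"; }
  std::string in, out;
};
struct NegateOp {
  static const char *name() { return "NEGATE"; }
  std::string in, out;
};
struct ClampOp {
  static const char *name() { return "CLAMP"; }
  std::string in, out;
  int minVal, maxVal;
};

// One template covers every op with a single input, a single output and no
// attributes, which is the role the macros play in the description above.
template <typename UnaryOpT>
Record serializeUnary(const UnaryOpT &op) {
  Record r;
  r.opName = UnaryOpT::name();
  r.inputs.push_back(op.in);
  r.outputs.push_back(op.out);
  return r;
}

Record serializeOp(const AbsOp &op) { return serializeUnary(op); }
Record serializeOp(const NegateOp &op) { return serializeUnary(op); }

// An op with a unique signature writes its fields out explicitly, in order.
Record serializeOp(const ClampOp &op) {
  assert(op.minVal <= op.maxVal && "clamp range must be ordered");
  Record r;
  r.opName = ClampOp::name();
  r.inputs.push_back(op.in);
  r.outputs.push_back(op.out);
  r.attrs.push_back(op.minVal);
  r.attrs.push_back(op.maxVal);
  return r;
}
```

Adding another op with the shared signature then only needs a one-line `serializeOp` overload, and a mismatch between an op type and the template shows up as a compile error rather than inside a macro expansion.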

Whether in C++ or Python, the same total number of fields need to be extracted and put into Flatbuffers form, so neither language has an obvious advantage from that standpoint. I feel the generated C++ interface is more mature because it has proper enums (and string tables for enums), while the flatc doesn’t create those in the Python version. These things make debugging easier when adding new ops.

Reviving this thread after a long time. There’s been significant development of the underlying bits in recent months. To summarize:

  1. The actual flatbuffer schema, headers and flatc-generated forms (C++ and Python) are in https://review.mlplatform.org/tosa/serialization_lib .
  2. This is included by the TOSA reference model (reference_model.git) as a third party dependency in order to parse and consume flatbuffers content.
  3. We have an as yet unreleased MLIR pass that takes fully TOSA form and serializes it. This pass also consumes the same serialization_lib and generates the flatbuffer form after legalization to TOSA. This pass has both CMake and Bazel build files, the latter permitting it to be included within the TensorFlow build.
  4. The serialization_lib stays synchronous with updates to the spec and reference model.

Third Party Dependencies:

  • The serialization_lib content depends on flatbuffers.
  • The serialize_tosa pass depends on serialization_lib

There are a couple of options here:

  1. The translator sits in a standalone repository, e.g. a github project. Would build a tosa-serialize-opt or similarly named binary.
  2. Follow previously mentioned suggestion - make it a translator within llvm-project MLIR, gated by -DMLIR_ENABLE_TOSA_FLATBUFFER_TRANSLATION=ON .

The second option ties the serialization to the dialect implementation in-tree and aligns to the idea of MLIR serialize/deserialize APIs, but presents two problems:

  • How to define the existence of submodules in the form of the serialization_lib it would include? The llvm-project does not appear to use submodules.
  • Synchronization issue between the definition in serialization_lib and the MLIR implementation, unless a submodule hash or similar means can help resolve it.

I did look briefly at implementing tosa_serialize in Python too, but this approach isn’t currently our plan of record. It’s an interesting option, but would need some documentation on how to load the .mlir files, from which the mechanical serialization can be done in Python.

Any thoughts and suggestions would be welcome.

My understanding at the time was that everything would be in-tree except the dependency on the flatbuffer library itself. For this we’d use the regular CMake mechanism for finding flatbuffer, leaving it up to the user to install it separately.

Thanks! Yes this part seems relatively straightforward. Guarding the dependencies seems to be just a CMake mechanical matter that I’ve mostly gotten sorted out.

However there’s an interface problem here. I tried to use TranslateFromMLIRRegistration which in turn defines TranslateFromMLIRFunction which takes a ModuleOp and raw_ostream as parameters.

This is a little problematic. Historically the TOSA flatbuffers form emits a flatbuffers file and a collection of npy files for each layer weights - it is not a single monolithic file with everything embedded in, like .tflite. We’d like to be able to define command-line parameters for an output directory and optionally an output file name for the flatbuffers file.

What would be a good approach here? Keeping it as a pass as it is now would be easiest to tie together, but it’s technically a translator from MLIR to another format; the MLIR translation interface is just a little restrictive here.

Ah that’s interesting!

One way to do it would be to have the flatbuffer file be emitted through the stream, and add another command line option (and function parameter) specifying an empty directory where to write the other files.

Yeah that might work - it does add the redundancy of having something like -o path_to_output/test.fb --tosa-flatbuffer-dirname=path_to_output . The serializer is already presented with the opened output file to stream to, so -o gets the full path anyway.
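To make the shape of that suggestion concrete, here is a minimal sketch of an export entry point that writes the main serialized graph to an already-open stream and places the per-layer side files in a separately supplied directory. `SideFile` and `exportWithSideFiles` are hypothetical names, the byte contents are opaque placeholders rather than real flatbuffer or npy data, and the directory is assumed to exist already.

```cpp
#include <cassert>
#include <fstream>
#include <ostream>
#include <string>
#include <vector>

// Hypothetical stand-in for one side file (e.g. the weights of one layer).
struct SideFile {
  std::string fileName; // name relative to the side-file directory
  std::string bytes;    // already-encoded file contents
};

// Writes the main graph to `graphStream` (what -o would provide) and every
// side file into `sideDir` (what a separate directory option would provide).
// Assumption: `sideDir` already exists. Returns false on any I/O failure.
bool exportWithSideFiles(const std::string &graphBytes,
                         const std::vector<SideFile> &sideFiles,
                         std::ostream &graphStream,
                         const std::string &sideDir) {
  assert(!sideDir.empty() && "side-file directory must be given");

  graphStream.write(graphBytes.data(),
                    static_cast<std::streamsize>(graphBytes.size()));
  if (!graphStream)
    return false;

  for (const SideFile &f : sideFiles) {
    std::ofstream out((sideDir + "/" + f.fileName).c_str(), std::ios::binary);
    if (!out)
      return false;
    out.write(f.bytes.data(), static_cast<std::streamsize>(f.bytes.size()));
    if (!out)
      return false;
  }
  return true;
}
```

The redundancy discussed above is visible in the signature: the stream carries no path information, so the directory has to be passed again even when it matches the parent of the -o file.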

Another thought - how does the MLIR translation API integrate with an external project? For example, I can invoke any builtin MLIR TOSA passes directly from within tf-opt as we do for the TF legalizations. But how does an external project like that access the translator from TOSA to flatbuffers?

As a pass, this is easily done, but is there such a concept as being able to access a translation to/from MLIR from within an external project that has MLIR as an out-of-tree dependency? Like MlirOptMain, is there some kind of translation manager that a developer can construct, which registers translations defined elsewhere and makes them visible from that tool?

I’m not sure I follow the question, but I’ll try to answer as I can: all the command line tools are really “testing and development” tools. A project like TensorFlow would use the C++ API directly and not the translation API in any way, I think.

The MLIR translation interface has never been my favorite API – it is more of a “starter API” for building a compiler. You’ll notice that TensorFlow SavedModel import, which also has the characteristic of not being a single file (it is a directory) is wedged directly into the main() function, bypassing the registered translations entirely.

In IREE, the primary compiler backend (which compiles from MLIR-input → various outputs) fits the translate contract and is implemented in terms of it (i.e. iree-translate). We still debate when is the right time to upgrade this to a “real” tool not based on the pluggable translations.

To contrast, IREE’s various importers are just regular tools, not built on the translate APIs (TF Example, XLA Example). This tends to allow some better control over the command line and lets you craft it exactly as you want for a user-facing tool. Both of the above examples had to integrate with various non-LLVM-standard IO mechanisms, and just directly coding that is the cleanest and provides the most control.

I’d recommend focusing on getting the TOSA import/export C++ API established and then create a dedicated tosa-flatbuffer-convert tool to go with it that you will use to write your tests (and can just serve as a “real” developer tool as well). Alternatively, you could start by just creating such a tool and having it work directly to serialize/deserialize. Then as a second step, extract the API from the tool. Either way gets you where you want to go. The tool-first approach has the benefit of you just writing some un-opinionated code/build support to get started without having to get the API factored perfectly in one step. That would give folks something to look at and might unblock progress.


Yeah I agree with @stellaraccident : the translation API does not bring much value, other than reducing the number of binaries we build and link in-tree.
But at the same time, if it does not fit the API, it isn’t that much overhead to add one binary to the build and it’ll be more ergonomic.

Especially since this binary will be conditioned on a CMake setting to enable the flatbuffer dependency, it fits together a bit more cleanly if it is its own thing. That way, you can just do CMake level if() decisions for including/excluding the tool in its entirety. Everything in mlir-translate now is a non-optional dependency, and changing that would require C macro level conditioning that I don’t think is worth it, especially given the less than optimal fit.

Thanks for the feedback! Just to offer further insight for @mehdi_amini, this tool works at the point of statically shape resolved TOSA form. It could handle dynamic shapes, but right now the focus is on emitting flatbuffer or JSON content to drive the TOSA reference model.

This enables bit-accuracy conformance testing, which enables the development or updating of current legalizations, a topic covered in the related thread - TOSA shape inference and dynamic shape support - #4 by sjarus .

This pass is the last remaining piece to get out in order to enable open sourcing the TOSA legalization unit test infrastructure, which can be run any time the TF/TFLite to TOSA legalizations are updated. It would validate that the legalizations aren’t breaking anything in terms of functionality / bit accuracy.

The C++ interfaces here (and even the Python ones) are pretty stable, and have been so since back when we open sourced the reference model back in Nov 2020. The same serialization_lib repo is a dependency of both this translator (TOSA producer) and reference_model (TOSA consumer).

The tricky part here really is that this tool ought to be either standalone or pluggable into Bazel or CMake based flows we have. As an MLIR pass, this works well. For example, we’ve trivially added this into tf-opt. It’s been added to other internal flows targeting other frontends and backends.

It would also be valuable within any TOSA-consuming flow including IREE, which could trigger the flatbuffers generation and call the reference model to emit output to validate against codegen result, for example.

Given the conversation here, it probably makes better sense to wrap the pass into its own binary for the LLVM integration, but also optionally enable it to be integrated as a pass into any other flow. I’ll step back and take a stab at that now.

Thanks for the explanation - I think the need is sound. Mainly just working out approach, I think.

If you’ve got something to see, it might be a bit easier to visualize – I don’t have a clear idea of what you mean by having implemented it with a pass. Are you able to dump what you have somewhere we can look at? (Mainly: I’d hate for you to spend a lot of time polishing if we’re still discussing approach)

As an integrator, a combination of C++ API and tool command line binary has proven to be a versatile thing to plug in and use.

If you get stuck on build system stuff, I or others can help with that if you get the basic bones of what you are trying to do.

This is an MLIR pass right now. It could very trivially combine with an MlirOptMain blob to constitute its own binary, e.g. tosa-serialize-opt maybe.

The original plan was in fact to independently open source this on mlplatform or github. Then I tried to figure out the MLIR translator API but that’s a dead end here. It makes sense to keep this piece close to the TOSA dialect, since it’s a translation of the dialect form to a serialized form, and that’s why we considered putting this in LLVM somehow.

While nothing’s open sourced, the mechanics are simple to visualize:
a) Legalize to TOSA form. Shapes must be fixed… somehow. This is the input.
b) This pass takes a ModuleOp of the TOSA form, and walks it.
c) On every TOSA op encountered, emit its flatbuffer formatted content, leveraging the serialization_lib API.
d) This lib is also a build dependency of the TOSA-consuming reference_model.
e) When updates are made to TOSA, these pieces are updated in lockstep.
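Steps b) and c) can be sketched with a toy IR in plain C++. Everything below is an assumption made for illustration: `ToyOp`, `ToyFunc`, `ToyModule` and `ToySerializer` are invented stand-ins, and the text the serializer accumulates is only a placeholder for the flatbuffer content that the real pass emits through serialization_lib.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy IR (hypothetical): a module holds functions, a function holds ops.
struct ToyOp {
  std::string name;                  // e.g. "tosa.add"
  std::vector<std::string> operands; // e.g. {"%a", "%b"}
  std::string result;                // e.g. "%0"
};
struct ToyFunc {
  std::string name;
  std::vector<ToyOp> ops;
};
struct ToyModule {
  std::vector<ToyFunc> funcs;
};

// Hypothetical stand-in for the serialization handler; it accumulates text
// where the real library would build flatbuffer tables.
class ToySerializer {
public:
  void beginBlock(const std::string &name) { out_ += "block " + name + "\n"; }
  void addOperator(const ToyOp &op) {
    assert(!op.name.empty() && "operator must have a name");
    out_ += "  " + op.result + " = " + op.name + "(";
    for (std::size_t i = 0; i < op.operands.size(); ++i) {
      if (i != 0)
        out_ += ", ";
      out_ += op.operands[i];
    }
    out_ += ")\n";
  }
  const std::string &str() const { return out_; }

private:
  std::string out_;
};

// Walks every function and emits one record per TOSA op. "std.return" is
// skipped, following the remark earlier in the thread that it is the only
// non-TOSA operator allowed; any other foreign op makes the walk fail.
bool serializeModule(const ToyModule &module, ToySerializer &serializer) {
  for (const ToyFunc &func : module.funcs) {
    serializer.beginBlock(func.name);
    for (const ToyOp &op : func.ops) {
      if (op.name == "std.return")
        continue;
      if (op.name.rfind("tosa.", 0) != 0) // name does not start with "tosa."
        return false;
      serializer.addOperator(op);
    }
  }
  return true;
}
```

Rejecting foreign ops up front keeps the emitted content in step with what the consumer understands, which is the property items d) and e) rely on.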

The emitted content gets consumed by the reference model, which takes the same reference input, the entire flatbuffer format content, and executes it functionally, emitting the actual NN output. This is how the legalization regression suite works (Test Infrastructure for TOSA - TOSA - Discourse). We also have a large suite of model conditioned real world networks with fixed shapes, that feeds this infrastructure - all the typical networks like Inception, MobileNets, BERT, EDSR, DeepSpeech…

So all these pieces are synchronized together, and have been a stable construct for over a year. It’s a critical piece of the regression infrastructure - without this piece generating flatbuffer/JSON, there’s nothing to drive the reference_model.

Maintenance-wise, this piece will be updated along with any updates to the TOSA dialect, together with updates to the reference_model and serialization_lib . This enables us - internally for now - to run ~15-20k unit tests and about 50 full networks on any iteration of dialect or legalization update we push. It’s also enabled us to catch things like TFLite kernel behavior changes.

Thanks. No argument on what you are trying to do or the utility of it – just iterating on how it will be done.

Emits it to where? I’m guessing the pass takes a parameter like output_dir?

Is there also an inverse operation, to go from a tosa flatbuffer → ModuleOp? (Or: should there be while we’re in here?)

I don’t generally see I/O passes like this, but maybe it’s OK (definitely fine to do if practical in an external project, but expect more scrutiny in the llvm repo). Still feels like this is more the territory of a tool, which would be more of the usual way of integrating this kind of thing across llvm and works fine with build/testing/etc. (as Mehdi says, the translate API is mostly a means to have fewer tools in an MLIR build, not an end in itself).

Regardless of tool vs pass, this seems slightly monolithic: there should probably also be an API with the bulk of the implementation like:

  • LogicalResult exportTosaFuncToFlatbuffer(FuncOp, Serializer&)
  • LogicalResult importTosaFuncFromFlatbuffer(Serializer&, OpBuilder&)

Then the tool/translation/I/O passes would be trivial users of these APIs. If you had bidirectional support, I imagine your unit tests of the API would mostly be of the roundtrip variety. E.g.:

// RUN: mlir-tosa-convert -export-to-flatbuffer %s | mlir-tosa-convert -import-from-flatbuffer

(A similar pipeline would work with an opt/pass based approach)
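The roundtrip idea can also be sketched at the API level. This is only a toy under stated assumptions: `ToyOp`, `ToyFunc`, `exportToyFunc`, `importToyFunc` and `roundTrips` are hypothetical names, and a line-based text format stands in for the flatbuffer encoding so that the example stays self-contained.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Toy function representation (hypothetical). Names and results are assumed
// to be non-empty and free of whitespace.
struct ToyOp {
  std::string name;
  std::string result;
  bool operator==(const ToyOp &other) const {
    return name == other.name && result == other.result;
  }
};
struct ToyFunc {
  std::string name;
  std::vector<ToyOp> ops;
};

// Export direction: function name on the first line, then one op per line.
bool exportToyFunc(const ToyFunc &func, std::ostream &os) {
  assert(!func.name.empty() && "function must have a name");
  os << func.name << '\n';
  for (const ToyOp &op : func.ops)
    os << op.name << ' ' << op.result << '\n';
  return static_cast<bool>(os);
}

// Import direction: the inverse of exportToyFunc. Fails on malformed input.
bool importToyFunc(std::istream &is, ToyFunc &func) {
  func = ToyFunc();
  if (!std::getline(is, func.name) || func.name.empty())
    return false;
  std::string line;
  while (std::getline(is, line)) {
    std::istringstream fields(line);
    ToyOp op;
    if (!(fields >> op.name >> op.result))
      return false;
    func.ops.push_back(op);
  }
  return true;
}

// The roundtrip property a test would check: export, import, compare.
bool roundTrips(const ToyFunc &func) {
  std::stringstream buffer;
  ToyFunc back;
  return exportToyFunc(func, buffer) && importToyFunc(buffer, back) &&
         back.name == func.name && back.ops == func.ops;
}
```

With both directions available, the comparison happens on the in-memory form, so a test does not need to know anything about the byte layout in between.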

If the tool is not bidirectional, we’d need to come up with another mechanism to test it. If history is a guide, just making it bidirectional can be easier (and have utility on its own) than other approaches, but if it is truly one-way, there are things that can be done there (i.e. have a flag to dump the flatbuffer as JSON, which you then pattern match).

Generally, you want to make sure that however you set up your tool, it supports testing of the API as a first order thing. This also tends to make it a reasonably useful thing for folks to use as well. We’d need to fiddle with it to figure the best way to also test the side npy files but it seems like we can follow-up with that. There are some examples of multi output tools/tests we could look at.

In terms of integration, something like IREE or TF would likely just integrate the C++ API directly in the appropriate production tool (IREE doesn’t ship opt binaries to users generally, because that is a dev tool). They can also pretty trivially build and re-export the tool binary if it is of use to their final product.

If you’d like to post a draft patch with your proposed approach that could be a next step – versus answering all questions of approach here. I’d just advise that based on the discussion here so far, there are likely to be structural comments that necessitate rework, and if you biased towards a rough draft that you haven’t spent much time polishing for initial comment (and note that it is for design feedback in the description), that can help limit the overall cycles and wasted time. For this kind of structure thing, I could also look at an llvm fork if you had a branch somewhere with these changes.


I don’t have much to add to what @stellaraccident wrote just here!

I would take it from the other way around: write a utility and use it in a binary, which leaves the possibility of wrapping the tool in a pass if needed, instead of wrapping a pass to expose a utility somehow.
I see the concept of a “pass” as itself a “wrapper” for the purpose of integrating in a pass manager, but I’m not sure why you would need to have the export inserted in the pass manager itself? The whole point of the pass manager is to “chain” passes one after another, while here you don’t really produce any different IR.

I suspect we can get mileage without using a pass to begin with, and consider a pass wrapping later if it justifies itself.

With a serializer API like Stella mentioned above, I expect any user to be able to do something akin to:

PassManager pm;
// build pass pipeline...
if (succeeded(pm.run(module))) {
  // TOSA serialization
  exportTosaFuncToFlatbuffer(module, serialization_options);
}

Yes, we pass a parameter defining the output location.

This should be reasonably straightforward, but we just haven’t needed this so far. Having an analogous interface does make sense though - it could be used to consume TOSA content produced from a non-MLIR environment.

Yes, the LLVM bar is one reason why we’d probably keep this out of tree. That’s how it sits now - it gets picked up by custom modifications to workspace.bzl in an internal branch, that cause it to get added to tf-opt .

We realize that generally MLIR serializers are tied to ‘terminal dialects’ like SPIRV or LLVM. TOSA is not terminal but has a mature functional model and regression infrastructure and this serializer to emit the form the model consumes.

Yes we have an API that looks almost exactly like this. It currently gets called from runOnFunction() for the pass.

The APIs currently emit flatbuffers or JSON, the latter for the readability reasons mentioned. Internally they are --tosa-serialize and --tosa-serialize-json passes in tf-opt when we tie it up with TF/TFL legalization. We didn’t quite depend on implementing bidirectionality to test this - we depended on the TF/TF vs TOSA reference model output difference. However, there exist stubs of code that could quickly constitute a deserializer - it’s just not actively part of the plan right now.

We’ve been experimenting with trying to make the serialized output a monolithic form, at least in the case of flatbuffers - similar to the .tflite form content in TFLite.

Here are the actively used integrations/plans we have:

  • For TF repo, this permits running the legalization infrastructure prior to a push to the TF/TFL legalizations to TOSA. Without this piece, it’s not possible to set it up externally.
  • For an e2e compiler (like IREE or our own), the reference model output would be a useful golden reference to validate final codegen result against.
  • Similarly, TOSA-compliant hardware verif can compare RTL output to the reference model result as a golden reference.

It is useful for random regressions driving a prototype design effort: generate random frontend sequences as unit tests and emit flatbuffers forms to generate the functional output for that sequence out of the reference model, then lower the TOSA form through the compiler and optionally drive hardware verification and compare that output. In this flow, the TOSA serialization is an intermediate pass emitting flatbuffers/JSON, even as other MLIR passes continue from TOSA targeting backend codegen and RTL verif.

Yes I’ll step back and think about the hands on experience with the multiple approaches here and consider what seems a good draft approach to release this serialization mechanism. I’ve a couple of other things on hand at the moment so I’d probably get back to this start of next week.

I would still probably split the pass manager in two phases: 1) frontend->TOSA and 2) TOSA->lower. The IR after 1 can be intercepted by the tool and exported freely.
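A minimal sketch of that two-phase split, using toy types in place of MLIR: the IR is a list of op names, a pass is any callable that may rewrite it, and the exporter only receives a read-only view between the phases. `ToyIR`, `ToyPass`, `runPipeline` and `compileWithExport` are hypothetical names chosen for the example.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Toy stand-ins (assumptions): the IR is a list of op names and a pass is a
// callable that may rewrite it, returning false on failure.
using ToyIR = std::vector<std::string>;
using ToyPass = std::function<bool(ToyIR &)>;
using ToyExporter = std::function<bool(const ToyIR &)>;

// Runs the passes in order and stops at the first failure.
bool runPipeline(const std::vector<ToyPass> &passes, ToyIR &ir) {
  for (const ToyPass &pass : passes) {
    if (!pass(ir))
      return false;
  }
  return true;
}

// Phase 1 lowers the frontend form to TOSA, the exporter then sees the IR
// through a const reference (so it cannot modify it), and phase 2 continues
// lowering from the same IR.
bool compileWithExport(ToyIR &ir, const std::vector<ToyPass> &frontendToTosa,
                       const ToyExporter &exportTosa,
                       const std::vector<ToyPass> &tosaToBackend) {
  assert(exportTosa && "an exporter callback is required");
  if (!runPipeline(frontendToTosa, ir))
    return false;
  if (!exportTosa(ir))
    return false;
  return runPipeline(tosaToBackend, ir);
}
```

Because the exporter takes a const reference, it cannot produce different IR, which matches the earlier observation that the export step does not need to live inside the pass manager.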

An update on this after some time: we have open sourced the MLIR pass in its current form: tosa_mlir_translator.git. It’s purely a pass with no manager and thus does not build standalone. It has a submodule dependency on the TOSA serialization_lib, which is the flatbuffers interface also used by the TOSA reference model (reference_model.git).

It enables every existing legalization from frontend->TOSA to be validated for bit accuracy against the reference result from the framework itself. Generation of the flatbuffers form enables the TOSA model to be run by the reference model, which currently consumes content in that format.

This currently works internally as part of both TensorFlow and Torch-MLIR repository paths using Bazel and CMake integrations respectively. In combination with (to be open sourced) unit test generators and a full network test harness in the case of the former, this exercises corner cases for each legalization.

For TensorFlow to TOSA legalizations, it emits approx 20K unit tests plus the full network test infrastructure, which consumes flatbuffers/protobuf networks together with their reference inputs, runs through legalization and the TOSA reference model to generate bit accurate output to compare to reference output. For Torch-MLIR, a similar but more early stage unit test infrastructure does the same thing.
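For readers unfamiliar with what a bit-accurate comparison implies, here is a small sketch for float outputs. It is not taken from the TOSA test infrastructure; `bitExact` is a hypothetical helper, and the actual harness may compare results differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "this sketch assumes 32-bit floats");

// Returns true only if both buffers have the same length and every element
// has the same bit pattern. Unlike operator==, this distinguishes +0.0 from
// -0.0 and treats two NaNs with identical payloads as equal.
bool bitExact(const std::vector<float> &expected,
              const std::vector<float> &actual) {
  if (expected.size() != actual.size())
    return false;
  for (std::size_t i = 0; i < expected.size(); ++i) {
    std::uint32_t e = 0;
    std::uint32_t a = 0;
    std::memcpy(&e, &expected[i], sizeof e); // well-defined way to read bits
    std::memcpy(&a, &actual[i], sizeof a);
    if (e != a)
      return false;
  }
  return true;
}
```

A tolerance-based comparison would accept small numerical drift; a bitwise check like this one flags any change in the produced values, which is the stricter property described above.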

We’d be very happy to help integrate these into testing or CI paths of these repositories if there’s interest in doing so.
