[RFC] Tosa import/export tool

There exists a small amount of code (currently maintained as an external patch) which imports/exports between the Tosa reference model flatbuffer format and the corresponding dialect in MLIR. We would like to decide on the right place to put this tool and the form it should take, and then contribute it to the codebase.

Motivation

As a specification, Tosa provides a reference model and regression test suite for conforming programs. This model is explicitly built to be as simple as possible, to favor correctness over performance, and not for production use (now or in the future). It exists solely as a vehicle for verifying conformance with the spec, of which MLIR’s Tosa dialect and lowerings (TBD) are an implementation. It has a Flatbuffer serialization interface, which is versioned with the specification.

We would like there to exist a tool to translate between programs in the MLIR Tosa dialect and the corresponding reference model serialization. Ideally, since MLIR is an implementor of the spec, this tool would live in the MLIR repository under a path like mlir/tools/tosa-serializer. This would give us a convenient mechanism to:

  • Import programs from the regression test suite for development and execution in the MLIR-based compiler (e.g. run with mlir-cpu-runner).
  • Export Tosa-dialect programs generated through other means (e.g. via the TFLite->Tosa importer) for evaluation under the reference model. This would form the basis for broader regression test tooling between systems (not maintained in MLIR).

Non goals

This is not production code and should not become production code or a general-purpose serialization mechanism for transporting program fragments for non-testing purposes. The reference model itself was written to favor clarity and simplicity over performance, and this tooling should fall into the same category.

Why should this go in the MLIR repository?

Ideally the specification and related reference model continue to treat LLVM/MLIR as a Tosa implementation and keep it at arm’s length. While such a serialization tool could go in the reference_model repository, it would invert the relationship, making the reference model have a dependency on the implementation that is meant to conform.

Practically as well, while the spec is versioned and expected to evolve in an explicit fashion over time, the corresponding MLIR dialect is not (by design). As MLIR evolves and IR constructs change, how Tosa is represented in MLIR will change in the details. As such, it presents a moving target, and it makes the most sense to co-locate the serialization tool with MLIR and version them together.

Technical options

Presently, the tool is C++ code which uses MLIR’s C++ API to read a module/function containing Tosa ops and writes out a corresponding flatbuffer by using generated flatbuffer serialization code (plus the inverse). While it would be nice to not have to rewrite this, the implementation decisions here are not load-bearing. Specifically, as a testing tool, this should bias towards simplicity (both of the code and of the integration), and a direct coupling of C++ APIs may be more engineering than we strictly want to maintain.

Options:

  1. Adapt the existing C++ code and land it into mlir/tools/tosa-serializer as an optional tool. It would use find_library to take an optional dep on the flatbuffers C++ library. As part of this option, we would likely include a snapshot of tosa.fbs or the generated code in the tools directory itself (vs taking a cross-repo dependency on the reference model).
  2. From the tosa-serializer tool, emit some lighter weight variant of the flatbuffer representation. After a fashion, flatbuffers does interop with JSON, as an example (albeit in a way that is not the most approachable).
  3. Rewrite the tool in Python, introducing no further hard dependencies. In this approach, the reference tosa.fbs build upstream would be extended to also emit generated Python sources. We would snapshot the resulting tosa_generated.py module into mlir/tools/tosa-serializer and then implement mlir-tosa-serializer.py itself to import it and use a flatbuffers package installed on the host system (via pip install flatbuffers). The tool would use the MLIR Python API for building and reading the Tosa-containing MLIR modules. For testing the tool itself, the corresponding lit directory could be enabled only if the Python flatbuffers module was available.
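For option #3, the lit gating could be as small as the following. This is a minimal sketch, assuming the module installed by pip install flatbuffers is importable as `flatbuffers`; the feature name `tosa-flatbuffers` and both helper names are hypothetical.

```python
import importlib.util


def python_module_available(name: str) -> bool:
    """Return True if `name` can be imported on this host, without importing it."""
    return importlib.util.find_spec(name) is not None


def tosa_serializer_lit_features(available=python_module_available) -> set:
    """Compute lit feature names for the (hypothetical) tosa-serializer tests.

    In a lit.cfg.py the result would be added to config.available_features,
    and each test would carry `REQUIRES: tosa-flatbuffers`, so the tests are
    reported as unsupported on hosts without the Python flatbuffers package.
    """
    features = set()
    if available("flatbuffers"):  # module provided by `pip install flatbuffers`
        features.add("tosa-flatbuffers")
    return features
```

In lit.cfg.py this would be used as `config.available_features.update(tosa_serializer_lit_features())`. Passing the availability check as a parameter just keeps the logic testable without touching the host environment.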

Of these options, based on my experience, #2 has a lot of negatives and should be eliminated. #1 preserves the most existing code, but at the expense of cross-project C++ build/dep complexity. #3 would involve writing something new but would introduce the minimum coupling, in my opinion (and also has the side benefit of it “never” being confused with production code).

For completeness, there is also a possible #4: Standardize a binary serialization format and stable API for MLIR and get it to a point of maturity such that we would feel fine having the reference model introduce a dependency on LLVM for it. While I would generally love such a thing to exist, I think the level of production engineering involved is misaligned with the goals here. Even if such a thing existed, it would invert the relationship of the projects, and I would bias towards the simplest possible thing for a testing case like this.

Opinions? We would like to make progress on this first thing in the new year.

Thanks.

@sjarus @jsmolens @rsuderman

This sounds like a translation (as in mlir-translate). Why would it not fit in with the other similar translations instead of being a separate binary?

Good point: it can. My primary reticence is dependencies (specifically, adding a dep on the flatbuffer C++ library to a central tool). If we are comfortable with that, it can go into translate. If not (and I am myself torn on this, given that this is a test-only dependency and it may not rise to the level of us wanting to add the complexity to the project), then biasing towards isolation makes more sense.

Minor correction:

  • The existing TF patch only exports to a TOSA serialized format from MLIR at the moment. It does not import back into MLIR, but I don’t see any technical constraints preventing this from happening.

A few comments:

  • The home of the definitive, versioned tosa.fbs file is currently in the reference model repository on mlplatform.org. I don’t believe anything in this RFC would change this arrangement. Having said that, we do need to make sure all tools are diligent about setting and/or checking version information.
  • For option (2), the reference model supports reading both JSON and binary flatbuffer formats of the serialized network. In practice, we have done most of our development with the JSON version because it is human-readable. It’s worth noting that only the C++ Flatbuffers library can read and write JSON. More specifically, the JSON option is not available in Python, although serialized networks can be converted with ‘flatc’ on the command line.
  • For option (3) the TOSA reference model repository includes a Python implementation of the flatbuffer writer in verif/tosa_serializer.py. However, Python code to read a TOSA .mlir file would need to be written and maintained. We weren’t aware of the MLIR Python API - sounds interesting!
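On the version point, a tool can refuse to read a serialized network whose schema version it does not understand. A minimal sketch, assuming the version is available as a (major, minor, patch) triple and assuming a policy of "same major, minor not newer than the tool"; the real fields and compatibility rules are defined by tosa.fbs and the spec, not by this example.

```python
from typing import NamedTuple


class SchemaVersion(NamedTuple):
    """Hypothetical (major, minor, patch) version carried by a serialized network."""
    major: int
    minor: int
    patch: int


def check_compatible(file_version: SchemaVersion, tool_version: SchemaVersion) -> None:
    """Raise ValueError if the file's schema version is outside what the tool supports.

    Assumed policy: major versions must match exactly, and the file's minor
    version must not be newer than the tool's. Patch differences are accepted.
    """
    if file_version.major != tool_version.major:
        raise ValueError(
            f"major version mismatch: file {tuple(file_version)}, "
            f"tool {tuple(tool_version)}")
    if file_version.minor > tool_version.minor:
        raise ValueError(
            f"file minor version {file_version.minor} is newer than "
            f"the tool's {tool_version.minor}")
```

Checking on both the import and export paths keeps a stale snapshot of the schema from silently producing or accepting mismatched files.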

Thanks @jsmolens for correcting what I mis-communicated.

Also, the MLIR Python bindings are documented here: https://mlir.llvm.org/docs/Bindings/Python/ They are new but should have enough features to do this. Worth considering, imo, because it is a lot easier to manage these things as runtime deps (e.g. just install the packages and do “import mlir”).


Thanks @stellaraccident for the writeup. I think you and @jsmolens already covered pretty much everything. One minor point to add on:

  1. The JSON generated by the flatbuffers C++ library is actually not standard JSON but a JSON-like format. In my experience, a third-party JSON reader, or reading it through the Python json library, would fail.
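One possible workaround is to ask flatc for strict JSON when converting a binary network. The sketch below only builds the command line. It assumes the flatc options `--json`, `--strict-json` (quote field names), `--raw-binary` (accept buffers without a file_identifier), `-o`, and the `--` separator before binary inputs behave as I understand them; check `flatc --help` for your version. The file names are placeholders. The second helper shows the failure mode described above: Python's json module rejects the unquoted field names found in non-strict output.

```python
import json
from typing import List


def flatc_binary_to_json_argv(schema: str, binary: str, out_dir: str = ".") -> List[str]:
    """Build a flatc command line converting a binary flatbuffer to strict JSON.

    Binary inputs must follow the `--` separator; the schema comes before it.
    """
    return ["flatc", "--json", "--strict-json", "--raw-binary",
            "-o", out_dir, schema, "--", binary]


def parses_as_standard_json(text: str) -> bool:
    """Return True if Python's standard json module accepts `text`."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True
```

The argv list could then be run with `subprocess.run(flatc_binary_to_json_argv("tosa.fbs", "model.tosa"), check=True)`.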

I’m supportive overall.

I don’t quite get this part, what are you referring to here with “a direct coupling of C++ APIs”?

That seems like the most natural solution to me, as long as it is behind a CMake flag: -DMLIR_ENABLE_TOSA_FLATBUFFER_TRANSLATION=ON would make it available to mlir-translate.

I was just referring to the systemic cost of adding new C++ dependencies and wanting to be sensitive to that. Also, serialization formats are an important topic in general and I know that there are strongly held opinions about how flatbuffers do or don’t fit into that which I am sympathetic to but don’t think apply here. Any weirdness of phrasing was me trying to state that I think flatbuffers and interfacing with an upstream serialization approach for conformance testing is fine for this case but not a general license to embrace that broadly in MLIR.

Agreed - if we converge on that option, I think it is the most consistent with the rest of the project.

+1 from me too and making it optional avoids dependency concerns for me.

Agreed that it makes sense to colocate given this constraint. How non-trivial do you expect the translation to become? I don’t think we should put any “real code” in Python. For example, I would not want a person updating the TOSA dialect (or any MLIR dialect) to need to context switch and update Python code. I think it probably makes sense to keep in C++.

I would expect it to be pretty mechanical, and there may be simplification options with judicious use of MLIR features (like an appropriate op interface). Certainly of a much lower level of complexity than the Torch/MLIR converter because these are actually isomorphic representations.

Thanks @stellaraccident for beating me to this RFC! I’m traveling today so apologies for typos from a mobile device.

A couple of additional points:

  • The serializer is currently an MLIR pass that we now overlay upon the existing open-sourced code. It expects the MLIR form to be purely TOSA IR, though this was slightly relaxed before (e.g. std dialect constants were accepted). @kevincheng, is this still the case?
  • As @jsmolens mentioned, there’s only a serialization path, but deserialization can also be accomplished; we just haven’t needed it yet.
  • I looked at the SPIRV serialize/deserialize implementation for inspiration. To answer @jpienaar’s question about this, the same approach might work; the fundamental problem is that compiling it requires the Flatbuffers standard headers to be included. No other LLVM/MLIR module appears to have such a dependency, and introducing one was what we were unsure about.
  • It isn’t really hard to switch from the current pass to invoking the serialization/deserialization interfaces; really the only showstopper here is ‘how do we build as part of LLVM with an optional dependency on the external flatbuffers.h?’
  • It is expected that as the spec evolves, tosa.fbs will be updated to follow and will be consumed by the serialization API.

Partially yes. std.return is currently the only operator allowed.
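That input restriction (TOSA ops only, plus std.return) is easy to check up front so the serializer can report every offending op at once. A minimal sketch over the op names found in the function body; how those names are collected (a C++ walk or the Python bindings) is out of scope here, and the allow-list simply mirrors the answer above.

```python
from typing import Iterable, List

# Non-TOSA ops the serializer currently accepts, per the discussion above.
ALLOWED_NON_TOSA_OPS = frozenset({"std.return"})


def unsupported_ops(op_names: Iterable[str]) -> List[str]:
    """Return the op names the serializer would reject, in first-seen order, without duplicates."""
    rejected: List[str] = []
    for name in op_names:
        if name.startswith("tosa.") or name in ALLOWED_NON_TOSA_OPS:
            continue
        if name not in rejected:
            rejected.append(name)
    return rejected
```

Relaxing the restriction again (e.g. re-admitting std constants) would then be a one-line change to the allow-list.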


The translation is fairly mechanical. Ops with similar operand/attribute signatures are handled through macros (could be refactored to use templates), while ops with unique signatures have their own functions to write out the Flatbuffer fields in the right order.
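The "shared handler for common signatures, dedicated function for unique ones" structure can also be written as a dispatch table. A Python sketch, assuming a toy op representation (a dict with name, operands, results, attrs) and a hypothetical field order; the real field layouts come from tosa.fbs.

```python
from typing import Callable, Dict, List

# Toy op representation (an assumption): {"name", "operands", "results", "attrs"}.
Op = dict
Serializer = Callable[[Op], List[object]]


def serialize_operands_results(op: Op) -> List[object]:
    """Shared handler for ops whose only fields are their operand and result names."""
    return [op["name"], list(op["operands"]), list(op["results"])]


def serialize_clamp(op: Op) -> List[object]:
    """Op-specific handler: appends attributes in a fixed (hypothetical) field order."""
    attrs = op["attrs"]
    return serialize_operands_results(op) + [attrs["min_fp"], attrs["max_fp"]]


# Ops sharing a signature share a handler (the role the C++ macros play);
# ops with unique signatures get their own function.
SERIALIZERS: Dict[str, Serializer] = {
    "tosa.add": serialize_operands_results,
    "tosa.sub": serialize_operands_results,
    "tosa.clamp": serialize_clamp,
}


def serialize_op(op: Op) -> List[object]:
    """Dispatch `op` to its handler, failing loudly for ops not yet supported."""
    handler = SERIALIZERS.get(op["name"])
    if handler is None:
        raise NotImplementedError(f"no serializer registered for {op['name']}")
    return handler(op)
```

Adding an op with an existing signature is then a single table entry, which is roughly what the macros buy in the C++ version.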

Whether in C++ or Python, the same total number of fields needs to be extracted and put into Flatbuffers form, so neither language has an obvious advantage from that standpoint. I feel the generated C++ interface is more mature because it has proper enums (and string tables for enums), while flatc doesn’t generate those for Python. These things make debugging easier when adding new ops.
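On the Python side, a name table for debugging can be recovered with a little reflection. This assumes flatc-generated Python enums are plain classes with integer class attributes; the `Op` class below is a hypothetical, abbreviated stand-in, not the real generated module.

```python
from typing import Dict


class Op(object):
    """Hypothetical stand-in for a flatc-generated Python enum class."""
    UNKNOWN = 0
    ARGMAX = 1
    AVG_POOL2D = 2


def enum_name_table(enum_cls: type) -> Dict[int, str]:
    """Map each integer value of a generated enum class back to its name.

    Dunder attributes are skipped; if two names share a value, the one
    defined later in the class wins.
    """
    return {
        value: name
        for name, value in vars(enum_cls).items()
        if not name.startswith("_") and isinstance(value, int)
    }
```

With this, an error message can print `enum_name_table(Op)[op_code]` instead of a bare integer.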

Reviving this thread after a long time. There’s been significant development of the underlying bits in recent months. To summarize:

  1. The actual flatbuffer schema, headers and flatc-generated forms (C++ and Python) are in https://review.mlplatform.org/tosa/serialization_lib .
  2. This is included by the TOSA reference model (reference_model.git) as a third-party dependency in order to parse and consume flatbuffers content.
  3. We have an as-yet-unreleased MLIR pass that takes IR in fully TOSA form and serializes it. This pass also consumes the same serialization_lib and generates the flatbuffer form after legalization to TOSA. It has both CMake and Bazel build files, the latter permitting it to be included within the TensorFlow build.
  4. The serialization_lib stays in sync with updates to the spec and reference model.

Third Party Dependencies:

  • The serialization_lib content depends on flatbuffers.
  • The serialize_tosa pass depends on serialization_lib

There are a couple of options here:

  1. The translator sits in a standalone repository, e.g. a github project. Would build a tosa-serialize-opt or similarly named binary.
  2. Follow the previously mentioned suggestion: make it a translation within llvm-project MLIR, gated by -DMLIR_ENABLE_TOSA_FLATBUFFER_TRANSLATION=ON.

The second option ties the serialization to the dialect implementation in-tree and aligns to the idea of MLIR serialize/deserialize APIs, but presents two problems:

  • How to express the serialization_lib it would include, which would naturally be a submodule? The llvm-project does not appear to use submodules.
  • Synchronization between the definitions in serialization_lib and the MLIR implementation, unless a submodule hash or similar mechanism can help resolve it.
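Without submodules, one lightweight option is to keep a content hash next to the snapshotted schema: a sync script that copies tosa.fbs from a serialization_lib checkout rewrites the hash in the same commit, and a test fails if the snapshot was edited without resyncing. A sketch with a hypothetical file layout (the digest stored in a sibling file such as tosa.fbs.sha256); it does not by itself detect upstream changes, only local drift from the last sync.

```python
import hashlib
from pathlib import Path


def schema_digest(schema_bytes: bytes) -> str:
    """Return the SHA-256 hex digest of a schema file's contents."""
    return hashlib.sha256(schema_bytes).hexdigest()


def snapshot_is_current(schema_path: Path, recorded_digest_path: Path) -> bool:
    """Check an in-tree tosa.fbs snapshot against the digest recorded at sync time.

    The digest file is assumed to hold one hex digest, optionally followed by
    whitespace (hypothetical layout).
    """
    actual = schema_digest(schema_path.read_bytes())
    recorded = recorded_digest_path.read_text().strip()
    return actual == recorded
```

The same sync script could also record the serialization_lib commit it copied from, which gives reviewers roughly what a submodule hash would.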

I did briefly look at implementing tosa_serialize in Python too, but this approach isn’t currently our plan of record (POR). It’s an interesting option, but it would need some documentation on how to load the .mlir files, after which the mechanical serialization can be done in Python.

Any thoughts and suggestions would be welcome.

My understanding at the time was that everything would be in-tree except the dependency on the flatbuffer library itself. For this we’d use the regular CMake mechanism for finding flatbuffers, leaving it up to the user to install it separately.

Thanks! Yes this part seems relatively straightforward. Guarding the dependencies seems to be just a CMake mechanical matter that I’ve mostly gotten sorted out.

However, there’s an interface problem here. I tried to use TranslateFromMLIRRegistration, which in turn relies on TranslateFromMLIRFunction, a function that takes a ModuleOp and a raw_ostream as parameters.

This is a little problematic. Historically, the TOSA flatbuffers form consists of a flatbuffers file plus a collection of npy files holding each layer’s weights - it is not a single monolithic file with everything embedded, like .tflite. We’d like to be able to define command-line parameters for an output directory and, optionally, an output file name for the flatbuffers file.
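To make that output layout concrete: the main flatbuffer goes to one stream, and each constant goes to its own .npy file in a directory. A stdlib-only sketch that writes float32 arrays in NPY format version 1.0 (with NumPy available, numpy.save would normally do this); naming files after tensors and both helper names are assumptions for illustration.

```python
import struct
from pathlib import Path
from typing import BinaryIO, Dict, List, Sequence, Tuple


def npy_bytes_float32(values: Sequence[float], shape: Tuple[int, ...]) -> bytes:
    """Encode a C-order float32 array in NPY format version 1.0."""
    count = 1
    for dim in shape:
        count *= dim
    if count != len(values):
        raise ValueError(f"shape {shape} needs {count} values, got {len(values)}")
    header = f"{{'descr': '<f4', 'fortran_order': False, 'shape': {shape!r}, }}"
    # Pad with spaces plus a trailing newline so that magic (6 bytes), version
    # (2 bytes), header length (2 bytes) and header total a multiple of 64.
    total = 6 + 2 + 2 + len(header) + 1
    header += " " * (-total % 64) + "\n"
    return (b"\x93NUMPY" + bytes([1, 0]) + struct.pack("<H", len(header))
            + header.encode("latin1") + struct.pack(f"<{count}f", *values))


def write_network(flatbuffer: bytes,
                  constants: Dict[str, Tuple[Sequence[float], Tuple[int, ...]]],
                  stream: BinaryIO, const_dir: Path) -> List[Path]:
    """Write the flatbuffer to `stream` and each constant to `<const_dir>/<name>.npy`."""
    stream.write(flatbuffer)
    const_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name, (values, shape) in sorted(constants.items()):
        path = const_dir / f"{name}.npy"
        path.write_bytes(npy_bytes_float32(values, shape))
        written.append(path)
    return written
```

Only the flatbuffer passes through the stream, which is the part that maps onto the existing raw_ostream parameter; the directory is the extra piece of information the translation needs.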

What would be a good approach here? Keeping it as a pass, as it is now, would be easiest to tie together, but it’s technically a translation from MLIR to another format, and the MLIR translation interface is a little restrictive here.

Ah that’s interesting!

One way to do it would be to have the flatbuffer file be emitted through the stream, and add another command line option (and function parameter) specifying an empty directory in which to write the other files.

Yeah, that might work, though it adds redundancy: something like -o path_to_output/test.fb --tosa-flatbuffer-dirname=path_to_output. The serializer is already handed the opened output file to stream to, so -o provides the full path anyway.
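One way to avoid that redundancy is to make the directory option default to the directory of the -o path. A sketch using argparse, where `--tosa-flatbuffer-dirname` is the option name floated above and `tosa-serialize` is a hypothetical program name; in mlir-translate the equivalent would be an llvm::cl option rather than argparse.

```python
import argparse
import os
from typing import List, Optional


def parse_serializer_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse -o and --tosa-flatbuffer-dirname, defaulting the latter to -o's directory."""
    parser = argparse.ArgumentParser(prog="tosa-serialize")
    parser.add_argument("-o", dest="output", required=True,
                        help="path of the main flatbuffer file")
    parser.add_argument("--tosa-flatbuffer-dirname", default=None,
                        help="directory for the .npy constant files "
                             "(default: the directory containing the -o file)")
    args = parser.parse_args(argv)
    if args.tosa_flatbuffer_dirname is None:
        # os.path.dirname("test.fb") is "", so fall back to the current directory.
        args.tosa_flatbuffer_dirname = os.path.dirname(args.output) or "."
    return args
```

With this, `-o path_to_output/test.fb` alone places the constants in path_to_output, and the explicit option remains available when they should go elsewhere.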

Another thought: how does the MLIR translation API integrate with an external project? For example, I can invoke any built-in MLIR TOSA pass directly from within tf-opt, as we do for the TF legalizations. But how does an external project like that access the translation from TOSA to flatbuffers?

As a pass, this is easily done, but is there a way to access a translation to/from MLIR from within an external project that has MLIR as an out-of-tree dependency? Like MlirOptMain, is there some kind of translation manager that lets a developer construct a tool which registers translations defined elsewhere and makes them visible from that tool?