LLVM Discussion Forums

Any front-end framework support, such as TF or Caffe?

Are there any examples showing how to translate or bridge a network model built in a front-end framework such as TF or Caffe into MLIR for compilation?

Such an e2e case, which takes opinions about frontend frameworks and dialects, is outside the scope of MLIR itself (afaik) and is in the realm of the various frontends or other middleware.

I’m hesitant to elaborate on this forum in a way that is framework specific, but for the purposes of answering your question and providing context on how we at Google are putting the pieces together for things related to TensorFlow, some additional details follow. I’m hopeful that the MLIR project will soon take more of an opinion on some aspects of generic, high-level input dialects suitable for compilation/transformation. But even if it does, your question is more end-to-end than that, and right now the responsibility for such things lies with the frontends themselves as users of MLIR.

TensorFlow-specific information follows. I apologize for not including source links (Discourse says “new users may only post two links”!?!?):

There are various tools for translating graphdefs/savedmodels to corresponding MLIR here:
(Tensorflow git repo)/tensorflow/compiler/mlir/tensorflow (see libraries with names containing “translate”)
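For illustration, here is a minimal sketch of driving the importer from Python via the experimental binding tf.mlir.experimental.convert_graph_def. The exact signature and availability depend on your TensorFlow build, so treat this as a starting point rather than a stable API:

# Minimal sketch: import a traced GraphDef into textual MLIR (tf
# dialect). Assumes a TF build that ships the experimental MLIR
# bindings; the API surface may differ across versions.
import tensorflow as tf

@tf.function
def add(a, b):
    return a + b

# Trace the function to obtain a GraphDef.
concrete = add.get_concrete_function(
    tf.TensorSpec([4], tf.float32), tf.TensorSpec([4], tf.float32))
graph_def = concrete.graph.as_graph_def()

# Translate to MLIR text, running the standard cleanup pipeline.
mlir_text = tf.mlir.experimental.convert_graph_def(
    graph_def, pass_pipeline="tf-standard-pipeline")
print(mlir_text)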

In addition, TensorFlow contains a set of passes for lowering the tensorflow dialect (imported above) to xla_hlo. Generally, this is what we consider the entry point for compilation tasks (at Google, we often refer to this as the “tf to xla bridge” or just “the bridge”):

(Tensorflow git repo): tensorflow/compiler/mlir/xla/transforms
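Concretely, once you have a module in the tf dialect, these bridge passes can be exercised with the tf-opt tool built from the TF repo. A hedged sketch (it assumes tf-opt is on your PATH, and the file names are hypothetical):

# Sketch: apply the "bridge" legalization to a tf-dialect module
# with tf-opt. Assumes tf-opt was built from the TensorFlow repo.
import subprocess

subprocess.run(
    ["tf-opt", "imported.mlir",
     "-xla-legalize-tf",   # lower tf dialect ops to xla_hlo
     "-canonicalize",      # clean up afterwards
     "-o", "legalized.mlir"],
    check=True,
)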

If you’re looking for a quick way to do some conversions without a lot of plumbing, you are welcome to use the high level support that my project has for this. Here is a colab for converting a tf.Module to corresponding xla_hlo (click through for a preview):

And a more complete example:

While still evolving, here is our complete list of passes needed to legalize arbitrary TensorFlow modules to a form that our compiler can accept (another compiler may vary, but this can serve as a guidepost):

TF_IMPORT_PASS_PIPELINE = (
    # Clean up tf_executor and extraneous unused functions.
    "tf-saved-model-delete-unused-funcs",
    "tf-executor-graph-pruning",
    "tf-standard-pipeline",
    "canonicalize",

    # Clean up control flow
    "tf-functional-control-flow-to-cfg",
    "inline",

    # Some further cleanups now that control flow is in better shape.
    "tf-saved-model-delete-unused-funcs",
    "tf-shape-inference",
    "canonicalize",

    # Legalize to XLA
    "xla-legalize-tf{allow-partial-conversion=true}",
    "canonicalize",

    # Now that the IR is starting to look nice, optimize global tensors.
    "tf-saved-model-optimize-global-tensors",

    # Adopt saved_model exports into IREE.
    "iree-tf-saved-model-adopt-exports",
)
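As a usage sketch, the tuple above can be joined into the comma-separated textual pipeline form that mlir-opt-style drivers accept. The exact nesting/option syntax varies across MLIR versions, so this is illustrative only:

# Sketch: feed the pass list above to an mlir-opt-style tool as one
# textual pipeline. Tool name and input file are assumptions; the
# {option=value} syntax in the tuple matches --pass-pipeline's format.
import subprocess

pipeline = ",".join(TF_IMPORT_PASS_PIPELINE)
subprocess.run(
    ["tf-opt", "saved_model.mlir",
     f"--pass-pipeline={pipeline}",
     "-o", "legalized.mlir"],
    check=True,
)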
  • Stella

Thank you, Stella!
Yes, front-end work is not part of MLIR; the front end is a user of MLIR. I just want to see some examples of how to leverage MLIR in a standard, mature e2e workflow: which dialects to choose, and how to bridge between a front-end framework such as TF, Caffe, or PyTorch and MLIR dialects. Although many dialects are shown in the MLIR docs, I hope there will be “standard” dialects that many front-end DL frameworks can all translate into, like Relay in TVM, especially at the high-level abstraction layer of the IR stack.

Yes, I’m hoping for the same kind of primitive dialect in MLIR core. In addition to being a likely thing that multiple frontends could legalize to, it would be useful on its own for upstreaming more of the frontend-oriented infra that is currently project specific (shape inference, quantization, etc. – all of which are hard to do when no ops or opinions have been taken).

I’ve heard talk of a proposal for a new working group to push on this…

Hi @Zeson,

Allow me a little tangent here.
In a previous life, some of us on the MLIR team had experience building Tensor Comprehensions, which allowed creating individual layers, JIT-compiling them, and integrating them into PyTorch and Caffe2. This work was essentially framework-agnostic and runnable directly from C++. Here is an example that used the ATen library for alloc/dealloc etc.

The reason I mention this is that I have a quite similar system internally that builds structured ops using this API. There are more examples of metaprogramming the IR here (e.g. matmul).

If you are comfortable enough with metaprogramming MLIR in C++, I can provide a few pieces of functionality that make these things available with a JIT, similar to what I use internally to accelerate my own work. I am able to build small models and check that things run.

You will be able to emit CPU code that JITs and runs, fully independent of any ML framework, so you can use whatever you prefer. The caveat is that performance will be essentially what you get with clang -O3 on sequential LLVM IR. But you can call whatever MLIR passes you want, see what happens, and experiment further.
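For a flavor of that framework-independent path (this is not Nicolas’s internal API, just the stock MLIR tools it bottoms out on), here is a hedged sketch of lowering a Linalg-level module and JIT-running it. Pass and tool names reflect MLIR at the time of writing and may since have been renamed; file names and the entry point are assumptions:

# Sketch: lower a linalg-on-buffers module to the LLVM dialect and
# JIT-run it with the stock MLIR tools. Pass names are version-
# dependent; file names and the "main" entry point are hypothetical.
import subprocess

# Lower linalg -> loops -> std -> LLVM dialect.
subprocess.run(
    ["mlir-opt", "model.mlir",
     "-convert-linalg-to-loops",
     "-convert-scf-to-std",
     "-convert-std-to-llvm",
     "-o", "model.llvm.mlir"],
    check=True,
)

# JIT-compile and run the entry point. Performance is roughly what
# clang -O3 gives you on the equivalent sequential code.
subprocess.run(
    ["mlir-cpu-runner", "model.llvm.mlir",
     "-e", "main", "-entry-point-result=void"],
    check=True,
)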

If you want something more exotic like GPU support, framework integration out of the box, or performance, then this is probably not what you’ll want, but I thought I’d still put the option out there :)

I expect framework integration to be relatively simple and in line with the work we had done before, but it’s clearly not something that will happen on the MLIR side.

Would that independent C++ metaprogramming path + JIT runner be of interest?

Right. To be exact, MLIR is just a framework for constructing IRs, not an IR itself (it provides the basic IR structure but no concrete operations or types). So we need a concrete, consistent IR layer (dialect) for DL front-end frameworks. Although it’s easy to create different dialects in MLIR, it would be better to have one consistent dialect for the earlier phases of a compiler, to keep them target-independent. As I understand it, different teams currently use many different DL IRs, right? At the entry level of a DL IR, I don’t think there are big differences between them.

If you have more information, you can put links here. Thanks.

Thanks, Nicolas!

Yes, it’s a good example. I just want to see what kinds of dialects a typical project would choose, especially for the entry-level IR. With the front-end phases shared, we could focus only on the later phases in MLIR, such as targeting my own chip. So is there any common, consistent entry-level IR that serves as an interface to many DL front-end frameworks?

It would also be good to know how to integrate GPUs and other target accelerators. I will check your example first. Thanks.

We (IREE) have chosen to accept inputs that are a combination of xla_hlo, our “flow” and “shape” dialects, and a subset of MLIR std ops (for cfg and some other odds and ends).

We also built the saved_model dialect for TensorFlow so that it has an MLIR-modeled way to represent functions and global state, compatible with a familiar module structure (as opposed to the “sea of nodes” and under-typed functions of pure GraphDef modeling).

We then helped build/leveraged the pass pipeline I showed in my earlier message to legalize that to the set of input dialects we support. From there we compile to multi-arch binaries, generating CPU-based host/scheduling/interface code and device binaries (currently for GPU, but meant to expand to other kinds of devices). All of the codegen is done against our HAL layer, which is inspired by Vulkan’s buffer/dispatch/synchronization primitives (and can be implemented directly on top of a Vulkan driver with no further glue). In general, we’re trying to fill in a plausible “rest of the story” from frontend dialects down to binaries targeting devices that can be implemented in terms of certain Vulkan-like primitives in our HAL.

I’d really like to see what Nicolas refers to made publicly available, as it is nice, expressive, and simple to poke at from high-level code. I’d also like to see the community start to tackle the definition of some common-denominator frontend dialects, as you say. What we have is fragmented or weighted strongly towards TensorFlow (by way of XLA HLO).

In general, I’d say that hooking a framework up to an MLIR-based workflow is relatively easy for common algebra ops and gets quite hard/detailed when you get to the level of modeling all features and quirks. Our experience with TensorFlow (which has a high quirk-to-op ratio) is hopefully not typical in terms of difficulty, but I think a high-fidelity implementation of any of them is likely to be a big project.

Hi @Zeson

As of today, there is nothing currently officially adopted by MLIR for the ML domain.

The working group on a Tensor Compute Primitives dialect will be where these discussions happen longer term. Regular meetings have not yet started.

In the meantime, the Linalg dialect (see the rationale) is known to be sufficient to represent a large class of layers. It adopts a representation that extends what was previously demonstrated by prior art in Halide, TVM and Tensor Comprehensions.

I have put the end-to-end ModelBuilder JIT example I mentioned earlier in the IREE repository here if you want to experiment. The rationale above should be clear as to why we think this dialect is a good path to invest in (irrespective of the current implementation details of the C++ JIT or the need of a user-friendly language). Linalg will evolve along with the WorkGroup discussion and help iterate to a common abstraction.

The concrete example usage in IREE/experimental/test demonstrates how to make a 3-MLP model written in MLIR interoperate with C++-allocated buffers (for input and output) as well as MLIR-allocated buffers (for the weights and biases in this example).

The parallel should be easy to draw with the following Tensor Comprehensions example, which was previously integrated with C2 and PyTorch.

For all these frameworks, the integration is essentially a simple API matching effort.

Please let me know if this is useful and whether you’d be interested in seeing (or collaborating towards) a more serious example.

Thanks!

Thank you very much. From reading the Linalg rationale, it is a middle-level IR and cannot handle graph optimization. So is it better to enter from a higher-level IR such as HLO? Or is there a common higher-level IR corresponding to the level of HLO?

The HLO equivalent is the Development of high-level Tensor Compute Primitives dialect(s) and transformations effort; it should layer nicely and reuse as much of Linalg as possible, though.
Do you have specific graph optimizations in mind that Linalg wouldn’t be suitable to support?

This is a bold statement :)

One thing to consider is that Halide has been shown to be very strong at the whole-graph optimization level (https://halide-lang.org/papers/halide_autoscheduler_2019.pdf). Word on the street is that the latest WIP results are far better than existing alternatives. As you can see in the rationale, Linalg learns from Halide and other systems. In fact, Tensor Comprehensions was an attempt to compose Halide transformations with polyhedral transformations, but the impedance mismatch was just too big.

You can view “Linalg + Affine” as “Halide + polyhedral in a common IR” to a first approximation. I (and others) would call the loop/affine/polyhedral level “mid-level”; Halide/Linalg would then be high-level in my book. The level above is functional and operates on tensors (which Linalg has evolved to support).

Now, clearly, Linalg does not have graph-level optimizations implemented. Actually, I’ll retract that: @MaheshRavishankar contributed a nice transformation at the tensor level that fuses arbitrary pointwise + broadcast-y ops into a single generic.

In the Evolution section of the doc we mention the other inspiring work, and TensorRT/TASO and friends are definitely there. This is what Linalg is evolving towards being capable of expressing.

It would be useful to list the transformations that you view as fundamentally “graph optimizations”. That should be the kind of discussion the Development of high-level Tensor Compute Primitives dialect(s) and transformations group starts having, but so far the conversation has been very noisy, focusing on expressiveness and existing implementation artifacts rather than on the core IR principles needed to support transformations.

I would also caution against thinking of Linalg as “just the generic op”. Other ops are necessary for value and memory representations (e.g. linalg.reshape, linalg.concat (NYI), etc.). The key is to keep this set of other ops small, because they have multiplicative effects and would otherwise become unmanageable.

This is why starting from transformations to design this IR is crucial.

I’d like to understand more about what the JIT runner entails. Perhaps you could present on this at the Open design meeting?

Steve

So, regarding graph optimizations: does MLIR have any role inside the green box (in XLA/HLO)?

Because in ONNX there are optimizers:

currently with these contributed passes.

I suppose nGraph has its own as well, and so on for the other frameworks/frontends…

Thanks for surfacing these, @bhack; this is very helpful.
So I looked at a few of those, and from a compiler writer’s perspective I’d say they are simple rewrites on higher-level IR.

From glancing through a bunch of them, I can say with some confidence that they fall within the following types of simple MLIR pattern rewrites (see the sketch after the list):

  1. Constant foldings
  2. Dead code/argument elimination
  3. Simple rewrites in “fused” form / canonicalizations
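For a concrete sense of how lightweight these are in MLIR, the first two categories largely map onto stock passes you can already run today. A small sketch (file names are hypothetical):

# Sketch: constant folding and simple "fused"-form rewrites are largely
# covered by -canonicalize; elimination of dead (unreferenced) functions
# by -symbol-dce. Input/output file names are assumptions.
import subprocess

subprocess.run(
    ["mlir-opt", "model.mlir",
     "-canonicalize",
     "-symbol-dce",
     "-o", "optimized.mlir"],
    check=True,
)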

I think a lot of this happens out of core, on the TensorFlow MLIR dialect; @jpienaar and @joker-eph should be able to point to the right places (I am currently OoO). Each of those should be a pretty trivial, small rewrite in MLIR.

The TF dialect is higher-level than the HLO dialect. ATM there is nothing interesting happening on the MLIR HLO dialect directly, I think. Some of us are using existing XLA transformations and specifying them on HLO so that they are applied later by Linalg.

I have not had time to scan deeper, though, having no computer access, so I’d be interested in feedback on whether there are less trivial transformations in the list @bhack linked.

Thanks!

About full-graph optimization at the next C4ML 2020:

For example, TensorFlow currently contains approximately 53,000 lines of manual optimization rules, while the operator specifications needed by TASO are only 1,400 lines of code.

I don’t know if MLIR is going to impact these 53k lines. But if these 53k lines are outside the MLIR perimeter, I suppose there is still a big missing higher-level common space/representation to be shared between frontends/frameworks.