Which version of MLIR to base my project on?

As a part of a larger research project, we want to build a custom compiler based on MLIR. We will most likely not change the MLIR source code, but implement our own dialect(s), types, operations, transformations etc. in our own codebase using MLIR, as outlined in the tutorials. I’m still relatively new to MLIR and LLVM, and I’m wondering which version of MLIR we should use.

Question 1) It might make sense to use an official release of the LLVM monorepo. However, since the release notes do not mention MLIR, I would like to ask what the relation between LLVM releases and MLIR is. Does a release of the monorepo contain a particularly stable or consistently documented version of MLIR?

Question 2) Given that MLIR still seems to be a young and quickly evolving project, would you instead recommend using the latest commit on the main branch of the LLVM monorepo and upgrading to a newer commit from time to time, e.g. when there are helpful new features or important bug fixes?

I see that this is generally a trade-off between (a) version stability and frequency of changes required to our code to reconcile it with breaking changes in MLIR, and (b) access to the latest improvements in MLIR. Thus, I do not expect a solution for my concrete case. However, if there are any best practices (e.g. to reduce the amount of work to adapt to breaking changes), it would be great if you could share them!

Releases do not affect MLIR's stability or documentation. We try to keep MLIR building and green inside the monorepo continuously, but there is no more exhaustive testing for releases than at any other time. The same goes for documentation, although we do hold the occasional doc sprint, which improves things more than general development does.

I’d recommend staying close to head and upgrading frequently, given how young the project is. I have done both kinds of syncing (syncing to head roughly every month, or doing it daily) and honestly found doing it frequently to be less painful. That way I knew which change broke my build, rather than having to update call sites for several separate commits all in one go (which I found a lot more error-prone and unstructured). Besides, it also makes it easier to contribute changes/improvements/analyses back :slight_smile:

– Jacques

Hi @pdamme. I expect you could learn a lot by following the development process of the mlir-npcomp or circt projects. These are LLVM projects, but they exist outside of the monorepo. Both projects have an informal system for tracking updates to the LLVM source tree, which typically happen on the scale of days or weeks. The current ‘known good’ version of LLVM is stored in the project source tree as a git submodule reference. Generally speaking, MLIR is still changing frequently, so it would probably be better to update LLVM more often than once per release, but this can be tuned to your organization’s own trade-off between stability and update cost on the one hand and regular investment on the other. I will say that, in my personal experience, delaying updates beyond one release tends to increase the deferred maintenance cost to the point where it is hard to manage. Small regular updates (at least once a month?) can be a reasonable trade-off. With some automation, it is possible to check for upstream changes in a nightly build process, but the normal daily churn can make this costly.

I would say that this is true for now, but as MLIR matures I hope that releases will gain more stability and backports of bug fixes, in the same way that LLVM releases do.
I’d also add that if you ultimately rely on plugging in a JIT, then even without any MLIR backports you still get all the testing done on the LLVM optimizer and backends in the release branches.

We’ve been doing what you want to do for a year now. Based on my experience, my suggestion in answer to Question 2 would be to update to the latest version of MLIR regularly, but not constantly. We now do it about once a month on average. Expect to spend some time, every now and then, making your code recompile or re-learning how some piece of a C++ interface or tool works. Technical questions here are answered quite rapidly, but you are expected to do your due diligence in finding answers (again, this is my experience).

The dialects that are part of the MLIR distribution are well documented, and it was quite nice to develop on top of them. On the other hand, as soon as you want to touch things like TensorFlow, the level of support and documentation is quite low.

Hope this helps,

Thanks everyone for the quick and insightful responses! I will include the LLVM monorepo as a git submodule and try to update regularly. Thanks also for sharing your experience regarding the update frequency; I will try to find a good balance here.

Which is funny, as many of the same people involved wrote the previous documentation, processes and support :slightly_smiling_face: This is a bug that needs to be fixed. So far, the focus there has been very much on specific execution needs, and custom entry points have unfortunately been left up to the user. Those are evolving, but a couple of API designs were dropped after review, which made presenting best practices there more difficult, as we wanted more stability first. The current ongoing effort seems more promising than the previous ~3.

I did not mean to criticize. Indeed, the difficulty of dealing with TF is not so much related to MLIR as to the whole ecosystem of tools and transformations around it. The various dialects inherit this ecosystem's lack of documentation and, sometimes, of clarity.