[RFC] PDLL: a new declarative rewrite frontend for MLIR

Hello all,

Before reading, I would strongly recommend watching the recent Open Design Meeting (ODM) to see many of the current language features discussed and to see the language in action.

At a recent ODM, I presented a new declarative rewrite frontend targeting PDL, named PDLL. This frontend is intended to represent a modern take on pattern rewrites for MLIR and fix a lot of the ergonomics issues that our current approaches suffer from. The language inherits heavily from the structure of PDL, which itself inherits strongly from the MLIR assembly format and various other MLIR constructs. Aside from the design, the language is also built with modern language tooling in mind. From the beginning, we intend to support IDE features such as code completion, go-to-definition, error reporting, and much more.

With this RFC, I would like to formally propose that we develop this frontend upstream within MLIR, with the eventual goal of deprecating our current TableGen solution. The main goal of this RFC is to understand whether the community thinks this is the right direction, and to gather those who are interested in contributing to this effort.

Let me know what you think.
– River

The Language itself

I’ve uploaded a temporary Phabricator review that contains a markdown document detailing some of the rationale (most of it is also listed below), as well as the current structure of the language itself (I would have posted it here, but it is a tad too large). I would like to note that the final design of the language is definitely not set in stone; it represents an initial vision of what we have that works and our intuition about where we think the language should go. As part of upstreaming, we intend to separate the different features into chunks where appropriate, and we hope that during the review process the language itself can become more refined as members of the community contribute to its design.

One major goal of the language is to be able to properly and nicely support all of the constructs within MLIR. This includes things such as Regions, Blocks, Successors, optional/variadic components, etc. Not all of these are currently supported by PDLL though, as the initial goal has been to build a feature set that can start replacing TDRR (TableGen DRR). Keeping the initial feature set contained will also allow more design discussions to take place in the community instead of in private (which is how the language has mostly been designed so far).

Why build a new language instead of improving TableGen DRR?

Note: This section assumes familiarity with TDRR; please refer to the relevant documentation before continuing.

TableGen DRR (TDRR), i.e. Table-driven Declarative Rewrite Rules, is a declarative DSL for defining MLIR pattern rewrites within the TableGen language. This infrastructure is currently the main way in which patterns may be defined declaratively within MLIR. TDRR utilizes TableGen’s dag structure to enable defining MLIR patterns that fit nicely within a DAG structure, much as TableGen has been used to define patterns for LLVM’s backend infrastructure (SelectionDAG, GlobalISel, etc.). Unfortunately, the TableGen language is not as amenable to the structure of MLIR patterns as it has been for LLVM.

The issues with TDRR largely stem from the use of TableGen as the host language for the DSL. These issues have arisen from a mismatch in the structure of TableGen compared to the structure of MLIR, and from TableGen having different motivational goals than MLIR. A majority (or all, depending on how stubborn you are) of the issues that we’ve come across with TDRR have been addressable in some form; the sticking point here is that the solutions to these problems have often been more “creative” than we’d like. This is a problem, and it is why we decided not to invest a larger effort into improving TDRR; users generally don’t want “creative” APIs, they want something that is intuitive to read and write. Some of the problems that have popped up include things like multi-result operations, replacing multiple operations, constraint application, etc.

Why not build a DSL in “X”?

Yes! Well yes and no. To understand why, we have to consider what types of users we are trying to serve and what constraints we enforce upon them. The goal of PDLL is to provide a default and effective pattern language for MLIR that all users of MLIR can interact with immediately, regardless of their host environment. This language is available with no extra dependencies and comes “free” along with MLIR. If we were to use an existing host language to build our new DSL, we would need to make compromises that depend on that language. For some, there are questions of how to enforce matching environments (Python 2 (please no) or Python 3? Which version?), performance considerations, integration, etc. As an LLVM project, this could also mean enforcing a new language dependency on the users of MLIR (many of whom may not want or need such a dependency otherwise).

Another issue that comes with any DSL embedded in another language is mitigating the impedance mismatch between what the user expects from the host language and what our “backend” supports. For example, the PDL IR abstraction only contains limited support for control flow. If we were to build a DSL in Python, we would need to ensure that complex control flow is either handled completely or effectively errors out. Even with ideal error handling, not having the expected features available creates user frustration.

In addition to the environment constraints, there is also the issue of language tooling. With PDLL we intend to build a very robust and modern toolset that is designed to cater to the needs of pattern developers, including code completion, signature help, and many more features that are specific to the problem we are solving. Integrating custom language tooling into existing languages can be difficult, and in some cases impossible (as our DSL would merely be a small subset of the existing language).
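To make the control-flow point above a bit more concrete, here is a minimal Python sketch of what "effectively errors out" could look like in an embedded DSL. It is purely illustrative: `SymbolicValue`, `build_pattern`, and `try_build` are hypothetical names invented for this example and are not part of PDL, PDLL, or any MLIR Python API. The assumption is a tracing-style DSL where pattern operands have no concrete value while the pattern is being built.

```python
# Hypothetical sketch: none of these names come from PDL, PDLL, or MLIR.


class SymbolicValue:
    """Stands in for a value that is only known when the pattern is matched."""

    def __init__(self, name):
        self.name = name

    def __bool__(self):
        # A host-language `if` or `while` needs a concrete True/False here,
        # but a symbolic value has none while the pattern is being built.
        raise TypeError(
            f"cannot branch on symbolic value '{self.name}' "
            "while building a pattern"
        )


def build_pattern(operand):
    # Looks like ordinary Python to the user, yet the DSL cannot record
    # this branch, so the best it can do is fail loudly.
    if operand:
        return "rewrite-a"
    return "rewrite-b"


def try_build():
    # Returns the chosen rewrite, or a readable error string on failure.
    try:
        return build_pattern(SymbolicValue("operand"))
    except TypeError as err:
        return f"error: {err}"
```

Even with a clear error like this one, the user wrote what looks like valid Python and was told it is not supported, which is the kind of frustration described above.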

These various points have led us to the initial conclusion that the most effective tool we can provide for our users is a custom tool designed for the problem at hand. With all of that being said, we understand that not all users have the same constraints that we have placed upon ourselves. We absolutely encourage and support the existence of various PDL frontends defined in different languages. This is one of the original motivating factors behind building the PDL IR abstraction in the first place: to enable innovation and flexibility for our users (and in turn their users). Some users, such as those in research and the machine learning space, may already have a certain language (such as Python) heavily integrated into their workflow. For these users, a PDL DSL in their language may be ideal, and we will remain committed to supporting and endorsing that from an infrastructure point of view.


:rocket: :rocket: :rocket:

Thanks for this awesome step forward in MLIR, even by your standards this is deeply impressive!

I am looking forward to misusing this to try and bring more production-ready customizable solutions to the phase-ordering problem.

As someone who has never grokked TDRR, +1 on something that better matches what we’re trying to express