Let me give you one key piece of information that seems missing from your reasoning: MLIR does not have a list of predefined instructions, types or attributes (unlike LLVM). In a sense, MLIR is not an IR itself, it’s an IR constructor. The entire point of MLIR is its extensibility, that is, anybody can add operations and types with (almost) any semantics and they are as good as, e.g., “standard” operations. So the answer to the question “can I do X with MLIR” is always “yes, provided you define the operations/types you need to do X”. The harder question is what these operations/types look like, and how they interact with other operations/types that may exist in the MLIR ecosystem.
You don’t need a language or an AST to produce an IR, MLIR or otherwise. You can just create the appropriate components (operations, blocks, regions, types) using the relevant APIs, e.g. mlir::Builder. Those APIs are explained in the tutorial, and you can call them from any code you have; there is no need to have an AST.
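To make the "no AST needed" point concrete, here is a minimal sketch in plain C++. It is a toy model, not the MLIR API: `Op`, `Block`, `ToyBuilder`, `DecodedInst`, `liftToIR` and the `bin.*` operation names are all made up for illustration. The only thing it is meant to show is the shape of the code: your binary analysis produces decoded instructions, and a builder turns them into operations directly.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Toy stand-ins, NOT the MLIR API. An operation is a name, the ids of the
// values it consumes, the id of the value it produces, and one integer
// "attribute".
struct Op {
  std::string name;           // e.g. "bin.add" (hypothetical dialect "bin")
  std::vector<int> operands;  // ids of the values consumed
  int result;                 // id of the value produced, -1 if none
  std::int64_t attr;          // constant payload, 0 if unused
};

struct Block {
  std::vector<Op> ops;
};

// Appends operations to a block and hands out fresh value ids.
class ToyBuilder {
public:
  explicit ToyBuilder(Block &block) : block_(block) {}

  int create(std::string name, std::vector<int> operands,
             bool hasResult = true, std::int64_t attr = 0) {
    int result = hasResult ? nextValue_++ : -1;
    block_.ops.push_back({std::move(name), std::move(operands), result, attr});
    return result;
  }

private:
  Block &block_;
  int nextValue_ = 0;
};

// Hypothetical output of a binary analysis: one record per instruction.
enum class Opcode { LoadImm, Add, Ret };

struct DecodedInst {
  Opcode opcode;
  int dst;           // destination register
  int src0;          // first source register
  int src1;          // second source register
  std::int64_t imm;  // immediate operand
};

// Builds the IR straight from the decoded instructions. Each register is
// mapped to the id of the value that last defined it.
Block liftToIR(const std::vector<DecodedInst> &insts, int numRegs) {
  Block block;
  ToyBuilder builder(block);
  std::vector<int> regValue(numRegs, -1);
  for (const DecodedInst &inst : insts) {
    switch (inst.opcode) {
    case Opcode::LoadImm:
      regValue[inst.dst] = builder.create("bin.load_imm", {}, true, inst.imm);
      break;
    case Opcode::Add:
      // Both sources must have been defined earlier in the block.
      assert(regValue[inst.src0] >= 0 && regValue[inst.src1] >= 0);
      regValue[inst.dst] = builder.create(
          "bin.add", {regValue[inst.src0], regValue[inst.src1]});
      break;
    case Opcode::Ret:
      assert(regValue[inst.src0] >= 0);
      builder.create("bin.ret", {regValue[inst.src0]}, false);
      break;
    }
  }
  return block;
}
```

With the real infrastructure the loop would look much the same, except that the builder and the operation classes come from MLIR and from the dialect you define.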
Define the dialect that has the semantics you want, and make sure it is expressive enough. Then write your binary analysis code and use mlir::Builder and relevant APIs to construct the IR from your analysis results. Then you can write the lifting to the LLVM dialect using the pattern rewriting infrastructure. It may require some extensions to the LLVM dialect, which is not complete yet. At the LLVM dialect level, you can export to LLVM IR and compile it back to binary using LLVM.
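The conversion step can be pictured roughly as follows. Again, this is a toy model in plain C++ and not the MLIR pattern rewriting API: `Op`, `Pattern`, `renameOp`, `applyPatterns` and the `bin.*` source operations are hypothetical. The target names (`llvm.mlir.constant`, `llvm.add`, `llvm.return`) mirror operation names from the LLVM dialect, but here they are just strings. A real conversion would also have to handle types, attributes and operations that map one-to-many.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Toy stand-in for an operation, NOT the MLIR API.
struct Op {
  std::string name;
  std::vector<int> operands;  // ids of the values consumed
  int result;                 // id of the value produced, -1 if none
};

// A pattern inspects one operation, rewrites it in place if it matches,
// and reports whether it changed anything.
using Pattern = std::function<bool(Op &)>;

// The simplest possible pattern: a one-to-one replacement of the op name.
Pattern renameOp(std::string from, std::string to) {
  assert(from != to && "renaming an op to itself would never converge");
  return [from, to](Op &op) {
    if (op.name != from)
      return false;
    op.name = to;
    return true;
  };
}

// Applies the patterns repeatedly until none of them matches any operation.
// Returns the number of rewrites performed.
int applyPatterns(std::vector<Op> &ops, const std::vector<Pattern> &patterns) {
  int rewrites = 0;
  bool changed = true;
  while (changed) {
    changed = false;
    for (Op &op : ops) {
      for (const Pattern &pattern : patterns) {
        if (pattern(op)) {
          changed = true;
          ++rewrites;
        }
      }
    }
  }
  return rewrites;
}

// Patterns taking the hypothetical "bin" dialect to LLVM-dialect-like names.
std::vector<Pattern> binToLLVMPatterns() {
  return {renameOp("bin.load_imm", "llvm.mlir.constant"),
          renameOp("bin.add", "llvm.add"),
          renameOp("bin.ret", "llvm.return")};
}

// The conversion is complete once only "llvm." operations remain.
bool isFullyConverted(const std::vector<Op> &ops) {
  for (const Op &op : ops)
    if (op.name.rfind("llvm.", 0) != 0)
      return false;
  return true;
}
```

The check at the end matters in practice: if some operation has no pattern, the result is a mix of dialects and cannot be exported to LLVM IR as is.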
But MLIR is a compiler infrastructure and so is LLVM. It is an unfortunately common misconception, perpetuated by some textbooks, that compilers are about lexing and parsing. They are much more than that.
MLIR is a compiler framework that lets you define the abstractions you need. By itself, it is not necessarily “multi-level”; it’s the set of abstractions you use that is. To me, at this point, MLIR is a proper name, the same way LLVM used to stand for Low-Level Virtual Machine, which has not been accurate for quite some time.
What are the different levels of abstractions you would need? Assembly instruction level and LLVM dialect level? Something else?
MLIR is very flexible, so it is definitely feasible. If by “MLIR approach” you mean expressing everything as MLIR operations and transforming them using the pattern rewriting infrastructure, then also yes, it sounds feasible. It will be interesting to see how we should adapt the infrastructure to support such use cases (it was mostly designed for lowering, not raising).
MLIR has the LLVM dialect, which gives you the same abstractions as LLVM IR but is not complete yet. And you must translate the LLVM dialect to LLVM IR if you want to use LLVM proper.
As long as you introduce operations and/or attributes for that purpose.