mlir-opt-like testing tools will likely want to operate on every possible pass and dialect, whereas a production compiler will try to minimize the binary size and the number of dialects loaded in the context, to reduce both the memory footprint and some runtime costs (number of canonicalization patterns to go through, etc.).
As MLIR grows (we maintain almost 60 dialects out-of-tree inside Google), we are also hitting limits in the maintenance of MLIR-based tooling, in terms of registration and dependency management.
I took some time to revisit how we’re handling it.
At the moment there is a global registry of Dialects, and factory functions are registered globally there. When constructing an MLIRContext, all the globally registered dialects are automatically loaded in the context.
This is problematic: the client creating the MLIRContext has to ensure that all the dialects that may be needed are in the global registry before creating it.
The global registry encourages a pattern of using dynamic global constructors to register dialects and force-linking these objects into the final binaries. This complicates the build system configuration and often breaks the modularity of the build: for simplicity, these global constructors are always linked in, pulling all the code for the associated dialects into the binary; worse, we end up loading many more dialects in the Context than needed, increasing the startup time and the memory footprint.
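For illustration, the global-constructor pattern in question looks roughly like the sketch below, using the existing `DialectRegistration` helper (the exact spelling may differ across MLIR revisions):

```cpp
#include "mlir/IR/Dialect.h"
#include "toy/Dialect.h" // hypothetical header for the Toy dialect

// A global constructor runs when this object file is linked in,
// adding ToyDialect to the global registry. Every binary that links
// this object pays for the dialect, whether it uses it or not.
static mlir::DialectRegistration<mlir::toy::ToyDialect> toyRegistration;
```

This is exactly the pattern that forces build systems to force-link dialect libraries and pulls unused dialects into every tool.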
As an example, tf-opt (the TensorFlow flavor of mlir-opt) always registers and loads these dialects in the Context:
affine, avx512, chlo, coo, corert, data, gpu, linalg, llvm, llvm_avx512, lmhlo, mhlo, nvvm, omp, quant, rocdl, scf, sdbm, shape, spv, std, tf, tf_device, tf_executor, tf_saved_model, tfd, tfjs, tfl, tfrt, tfrt_dht, tfrt_fallback, tpurt, ts, vector
The direction I’m taking right now is to move away from relying on the global registry as much as possible, and to not load any dialects in the MLIRContext on construction.
Instead, Dialects will be loaded in the context through three mechanisms:

Explicitly by the client, for example in the Toy tutorial compiler:

```c++
mlir::MLIRContext context;
// Load our Dialect in this MLIR Context.
context.getOrLoadDialect<mlir::toy::ToyDialect>();
```
A compiler like the Toy compiler needs to load the ToyDialect before the frontend emits operations from this dialect. However, we don’t want the compiler to explicitly list all the possible dialects involved in the optimizer; hence the second mechanism below.
As a pass dependency: the other reason Dialects are needed after the frontend emits the IR is that transformations are applied to the IR. As such, it is natural for the individual passes to declare the dialects they intend to produce: a Pass consuming Toy and producing a mix of Linalg and Affine would declare that it depends on the Affine and Linalg dialects (no need to declare Toy, as it is expected in the input).
The PassManager, when starting to process a PassPipeline, collects the required dialects from the list of passes in the pipeline and loads them in the context.
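A pass would declare its produced dialects by overriding `getDependentDialects()`; a minimal sketch (the pass name `ToyToLinalgLoweringPass` is hypothetical, and the dialect class names may vary by MLIR revision):

```cpp
#include "mlir/Pass/Pass.h"
#include "mlir/Dialect/Affine/IR/AffineOps.h"
#include "mlir/Dialect/Linalg/IR/LinalgOps.h"

// A pass consuming Toy and producing Affine and Linalg operations
// declares the dialects it will create, so the PassManager can load
// them in the context before the pipeline runs.
struct ToyToLinalgLoweringPass
    : public mlir::PassWrapper<ToyToLinalgLoweringPass,
                               mlir::OperationPass<mlir::ModuleOp>> {
  void getDependentDialects(mlir::DialectRegistry &registry) const override {
    registry.insert<mlir::AffineDialect, mlir::linalg::LinalgDialect>();
  }
  void runOnOperation() override {
    // ... apply the lowering patterns here ...
  }
};
```

Note that the input dialect (Toy) is not declared: it is already loaded by whoever produced the IR.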
Lazily loading during parsing: production compilers (Flang, TensorFlow, etc.) don’t need to parse arbitrary MLIR and should only rely on the two mechanisms above. For
mlir-opt tools and similar, we still need to load in the context all the dialects we expect to process. For now we will keep using a global registry, pending more refactoring.
This registry, however, won’t trigger the loading of all dialects in the context ahead of time. Instead, the Parser is taught to load dialects from the registry lazily as it encounters Operations/Types/Attributes from an unknown dialect. This setup is intentionally designed to allow testing the pass dependency mechanism described above: when using mlir-opt to run an individual pass, only the dialects present in the input file are loaded in the context, and if a Pass creates operations in a dialect not present in the input, it will fail, which helps ensure that the pass dependencies are correct.
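In terms of API, a tool can associate a registry with the context without eagerly loading anything; a sketch using `DialectRegistry` (the upstream API, which may differ in detail from the prototype revision):

```cpp
#include "mlir/IR/Dialect.h"
#include "mlir/IR/MLIRContext.h"
#include "mlir/Dialect/Affine/IR/AffineOps.h"
#include "mlir/Dialect/StandardOps/IR/Ops.h"

// Registering dialects in a DialectRegistry makes them *available*
// but does not load them in the context: the parser loads a dialect
// only when it encounters one of its operations/types/attributes.
mlir::DialectRegistry registry;
registry.insert<mlir::StandardOpsDialect, mlir::AffineDialect>();

mlir::MLIRContext context;
context.appendDialectRegistry(registry);
// Parsing an input file now loads only the dialects it actually uses.
```

This is what makes the failure mode described above possible: a pass that forgot to declare a dependent dialect will try to create operations in a dialect that was never loaded.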
This is all implemented in a prototype revision here: https://reviews.llvm.org/D85622