See the previous published edition.
Welcome to the sixth issue of the MLIR (bi)Weekly, a newsletter (published on Friday) covering developments in MLIR and related projects in the ecosystem. MLIR (bi)Weekly is brought to you by a collective effort of contributors; we welcome your contributions!
TFRT, a new TensorFlow Runtime, is now open-sourced!
This project was previously presented at an MLIR open-design meeting in March (slides - recording).
- mlir-npcomp is a new project intended to prototype compiling numpy programs using MLIR.
- ::build functions in all ops now take Builder*, allowing them to create new ops within regions of the ops being constructed.
- The MLIRContext now has an option to disable multi-threading, which can increase the performance of single-threaded processing by avoiding locking on context data structures.
- Liveness analysis now also supports nested regions.
- OpBuilder now uses Listeners instead of inheritance for tracking updates, meaning that pattern rewrites can now create additional builders so long as the listener is propagated properly.
- DeclareOpInterfaceMethods can now specify a set of methods that should always be generated, allowing for methods with default implementations to have declarations generated.
- A new MutableOperandRange class was added to allow for mutating a range of operands within an operation, i.e., adding/removing/inserting operands. This class also supports updating operand segment attributes such as AttrSizedOperandSegments.
- The pass manager now supports generating “local” crash reproducers. These are reproducers that include the IR directly before the pass that is failing/crashing.
- A new general SCCP (sparse conditional constant propagation) pass was added that supports propagating constants across unstructured, structured, and call-graph based control flow.
- All operations now have resizable operand lists, at no additional memory cost.
- A new DenseStringElementsAttr attribute was added to support dense storage with string elements.
- Symbol is now an interface. An immediate effect is that symbols can now define when it is safe to discard when used, regardless of visibility.
- EDSC implementation is undergoing significant simplification and aims to converge with core Builder APIs, according to the discussion.
- ODS now generates accessors for mutating operands: <operand-name>Mutable returns a MutableOperandRange, which can be used to add/erase/set operands.
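For readers unfamiliar with the idea behind the SCCP item above, here is a minimal Python sketch of conditional constant propagation on a toy SSA-like CFG. It is not the MLIR pass and uses no MLIR API: the instruction tuples, the block names, and the `sccp` and `join` helpers are all hypothetical, and the loop re-scans every executable block until nothing changes instead of using the sparse use-def worklists of a real implementation. What it keeps is the core of the technique: a three-level lattice (unknown, constant, overdefined) and the rule that only executable edges contribute values.

```python
UNKNOWN = "unknown"          # no value seen yet on any executable path
OVERDEFINED = "overdefined"  # more than one value possible at runtime


def join(a, b):
    """Combine two lattice values: unknown < constant < overdefined."""
    if a == UNKNOWN:
        return b
    if b == UNKNOWN:
        return a
    return a if a == b else OVERDEFINED


def sccp(program, entry="entry"):
    """Return (lattice value per name, set of executable blocks)."""
    values, edges, blocks = {}, set(), {entry}
    changed = True

    def get(name):
        return values.get(name, UNKNOWN)

    def update(name, new):
        nonlocal changed
        merged = join(get(name), new)
        if merged != get(name):
            values[name] = merged
            changed = True

    def mark_edge(pred, succ):
        nonlocal changed
        if (pred, succ) not in edges:
            edges.add((pred, succ))
            blocks.add(succ)
            changed = True

    while changed:  # iterate to a fixpoint over executable blocks only
        changed = False
        for block in list(blocks):
            for inst in program[block]:
                op = inst[0]
                if op == "const":
                    update(inst[1], inst[2])
                elif op == "param":  # runtime input: never a known constant
                    update(inst[1], OVERDEFINED)
                elif op == "add":
                    lhs, rhs = get(inst[2]), get(inst[3])
                    if OVERDEFINED in (lhs, rhs):
                        update(inst[1], OVERDEFINED)
                    elif UNKNOWN not in (lhs, rhs):
                        update(inst[1], lhs + rhs)
                elif op == "phi":  # only executable incoming edges count
                    for pred, src in inst[2].items():
                        if (pred, block) in edges:
                            update(inst[1], get(src))
                elif op == "br":
                    mark_edge(block, inst[1])
                elif op == "condbr":
                    cond = get(inst[1])
                    if cond == OVERDEFINED:
                        mark_edge(block, inst[2])
                        mark_edge(block, inst[3])
                    elif cond != UNKNOWN:
                        mark_edge(block, inst[2] if cond else inst[3])
    return values, blocks


PROGRAM = {
    "entry": [("const", "flag", 1), ("condbr", "flag", "then", "else")],
    "then": [("const", "a", 2), ("br", "exit")],
    "else": [("const", "b", 3), ("br", "exit")],
    "exit": [("phi", "r", {"then": "a", "else": "b"}),
             ("const", "one", 1), ("add", "s", "r", "one"), ("ret", "s")],
}

values, blocks = sccp(PROGRAM)
print(values["s"])     # 3
print(sorted(blocks))  # ['entry', 'exit', 'then']
```

Because the branch condition is a known constant, the "else" block is never marked executable, so the phi sees a single incoming value and the sum folds to a constant. Replacing the first instruction with `("param", "flag")` makes both branches executable and drives `r` and `s` to overdefined.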
Optimizations and Code Generation
- A buffer assignment pass has been upstreamed from TensorFlow.
- Linalg generic operations can now be lowered to parallel loops.
- Named Linalg ops are now exercised via batched matmul op.
- GPU modeling and lowering has been reinforced: gpu.func is now only allowed in gpu.module; a nested symbol-based scheme is used to reference launched kernels; multiple GPU modules can be processed in parallel during the lowering.
- Affine analyses now support arbitrary ops as symbol scopes, not just functions.
- Arbitrary ops can now be marked as automatic allocation scopes.
- Vector dialect now supports masked transfers of nD vectors.
- Multiple ops, including std.load/store and loops, now have memory effects.
- ROCDL dialect is being homogenized with other GPU target-specific dialects.
- mlir-vulkan-runner supports integer buffers now.
- Work is in progress to improve non-32-bit type conversion from the standard dialect.
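As a rough illustration of why structured ops such as a batched matmul can be lowered to parallel loops, the plain-Python sketch below separates the iteration space into parallel dimensions (batch, row, column) and a reduction dimension (k). It is not Linalg or MLIR code; `batch_matmul` and its `reorder` parameter are hypothetical names chosen for this example. Each point of the parallel space writes its own output element, so those points can be visited in any order, while the reduction is accumulated sequentially inside each point.

```python
import itertools
import random


def batch_matmul(a, b, reorder=None):
    """a: [B][M][K], b: [B][K][N] nested lists; returns [B][M][N]."""
    B, M, K, N = len(a), len(a[0]), len(a[0][0]), len(b[0][0])
    out = [[[0] * N for _ in range(M)] for _ in range(B)]
    # Parallel iteration space: one independent point per output element.
    points = list(itertools.product(range(B), range(M), range(N)))
    if reorder is not None:
        reorder(points)  # any visiting order is valid for parallel dims
    for bi, m, n in points:
        acc = 0
        for k in range(K):  # reduction dimension, accumulated in order
            acc += a[bi][m][k] * b[bi][k][n]
        out[bi][m][n] = acc
    return out


A = [[[1, 2], [3, 4]], [[0, 1], [1, 0]]]
B_ = [[[5, 6], [7, 8]], [[2, 3], [4, 5]]]

in_order = batch_matmul(A, B_)
shuffled = batch_matmul(A, B_, reorder=random.Random(0).shuffle)
print(in_order == shuffled)  # True
```

Integer inputs are used so that the comparison is exact; with floating-point data the parallel dimensions would still be order-independent, since only the reduction order affects rounding.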
In the Ecosystem
Flang, the LLVM Fortran Compiler
- FIR (the MLIR-based Fortran IR) is getting integrated in the repo piece-by-piece: it was developed out-of-tree originally and wasn’t in the main flang branch when it got merged in the monorepo.
IREE : An Experimental MLIR Execution Environment
- ResNet50 with small input sizes works on VMLA, LLVM JIT, and SwiftShader Vulkan backend.
- IREE can now do an end-to-end dynamic shaped matmul (and batch matmul) from Python to execution on its VMLA backend. This is the culmination of many recent changes across the upstream shape dialect, TF2XLA bridge, converting the shape dialect to IREE’s shapex dialect, and dynamic-shape aware vmla backend lowering.
- The old HLO to SPIR-V direct lowering path was fully retired and removed from the codebase and now the structured ops (Linalg) lowering path is the default for SPIR-V code generation.
- IREE’s documentation website saw various improvements, including refreshed getting started docs and a new testing guide.
mlir-npcomp: Prototype for compiling numpy programs
- Google has open sourced the repo as an out-of-tree prototype that can have components contributed upstream as applicable (this is meant to drive architecture/layering of the system, not to be a commentary on existing ML systems, which can be layered in at a later date).
- Initial dialect modeling ufunc definition and call
- Python bindings sufficient to generate such IR
- Working towards representation of a simple distance transform for review at an upcoming ODM.
- 2020-04-23: LLHD: A Multi-level Intermediate Representation for Hardware Description slides - additional slides - recording
- The Collection Virtual Machine: An Abstraction for Multi-Frontend Multi-Backend Data Analysis
In this paper, we propose the “Collection Virtual Machine” (or CVM) – an extensible compiler framework designed to keep the specialization process of data analytics systems tractable. It can capture at the same time the essence of a large span of low-level, hardware-specific implementation techniques as well as high-level operations of different types of analyses. At its core lies a language for defining nested, collection-oriented intermediate representations (IRs). Frontends produce programs in their IR flavors defined in that language, which get optimized through a series of rewritings (possibly changing the IR flavor multiple times) until the program is finally expressed in an IR of platform-specific operators.
They cite MLIR as “compiler framework aims to provide tools and abstractions for expressing, transforming, and composing of a wide range of intermediate representations and compilation to a broad range of hardware targets, including ML accelerators, but with a focus on deep learning on GPUs and TPUs” (this last part does not seem accurate).