See the previous published edition
Welcome to the forty-sixth issue of the MLIR (bi)Weekly, a newsletter covering developments in MLIR and related projects in the ecosystem. MLIR (bi)Weekly is brought to you by a collective effort of contributors; we welcome your contributions!
- Multiple RFCs are in-flight:
- [RFC] Dialect for bufferization-related ops
- [RFC] Add a printf op
- [RFC] Arith dialect versions of affine.apply/min/max
- [RFC] Elide Type/Attribute prefixes when using Declarative Assembly
- [RFC] Verification Order In MLIR
- [RFC] Range analysis when evaluating/lowering affine maps
- [RFC] Tosa import/export tool
- Attribute/Type parser/printers have gotten a few touchups:
- ::parse/::print methods should take
- The mnemonic should no longer be added explicitly by the printer.
- The boilerplate Dialect::parse*/print* dispatch can now be autogenerated.
- Attributes and types can now define their assembly formats declaratively!
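The new declarative assembly format for attributes and types can be sketched in TableGen roughly as follows (the dialect and attribute names here are hypothetical, not taken from the newsletter):

```tablegen
// Hypothetical attribute definition. With an assemblyFormat specified,
// the parser and printer are generated instead of written by hand.
def MyDialect_RangeAttr : AttrDef<MyDialect, "Range"> {
  let mnemonic = "range";
  let parameters = (ins "int64_t":$min, "int64_t":$max);
  // Prints/parses as, e.g., #mydialect.range<0, 7>
  let assemblyFormat = "`<` $min `,` $max `>`";
}
```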
- ElementsAttr value access API has been tweaked:
- getValue/getFlattenedValue have been removed in favor of operator access on the range returned by getValues.
- Identifier has been replaced by StringAttr
- Please update your usages! The Identifier alias will be deleted in a few weeks!
- Various performance tweaks: attribute verification, uniquing of ODS constraints.
- Numerous usability improvements: construction of affine expressions, moving operations between blocks and detaching them, optional operands in constructors.
- Split padding and hoist padding out of the tiling pass into a separate pass.
- Improve the buffer size computation for hoist padding to reduce the footprint of the hoisted buffers.
- Use Fourier-Motzkin elimination to compute the size of the padding and of the hoisted buffers (they now share the same code).
- Separate ComprehensiveBufferize from the Linalg dialect, plus various other improvements.
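Fourier-Motzkin elimination, mentioned above for sizing the padded and hoisted buffers, projects a variable out of a system of linear inequalities by pairing each of its lower bounds with each of its upper bounds. A minimal self-contained sketch (plain Python for illustration, not MLIR's actual implementation):

```python
def eliminate(ineqs, k):
    """Project variable x_k out of inequalities.

    Each inequality is a list of integer coefficients [c0, ..., cn, const]
    encoding c0*x0 + ... + cn*xn + const >= 0.
    """
    pos = [q for q in ineqs if q[k] > 0]   # lower bounds on x_k
    neg = [q for q in ineqs if q[k] < 0]   # upper bounds on x_k
    out = [q for q in ineqs if q[k] == 0]  # constraints not involving x_k
    for p in pos:
        for n in neg:
            # Scale so the coefficients of x_k cancel, then add.
            out.append([p[j] * -n[k] + n[j] * p[k] for j in range(len(p))])
    return out

# System over (x, y): 0 <= x, x <= y, y <= 7.
# Eliminating x yields the bounds on y alone: 0 <= y <= 7.
ineqs = [[1, 0, 0],    # x >= 0
         [-1, 1, 0],   # y - x >= 0
         [0, -1, 7]]   # 7 - y >= 0
print(eliminate(ineqs, 0))  # [[0, -1, 7], [0, 1, 0]]
```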
- LLVM type conversion now supports recursive types.
- Rewrite vector.transpose lowering to a single unrolled vector.shuffle.
- Add AVX2-specific lowering patterns; ongoing investigation of non-peak performance.
- Various improvements to convolution lowering and vectorization.
- Ongoing experiments to get to peak single thread CPU performance.
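The vector.transpose rewrite mentioned above expresses a small transpose as a single shuffle on the flattened vector. A sketch for a 2x2 case (illustrative IR; the exact output of the lowering may differ):

```mlir
// Input: %t = vector.transpose %v, [1, 0] : vector<2x2xf32> to vector<2x2xf32>
// Lowered form: flatten, shuffle once, restore the 2-D shape.
%flat = vector.shape_cast %v : vector<2x2xf32> to vector<4xf32>
%shuf = vector.shuffle %flat, %flat [0, 2, 1, 3] : vector<4xf32>, vector<4xf32>
%t    = vector.shape_cast %shuf : vector<4xf32> to vector<2x2xf32>
```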
- Sparse compiler progress:
- Reduction “scalarization” now spans all for-/while-loops over all invariant dimensions. When vectorized, SIMD chains are formed.
- Sparse tensor output supports “injective” cases (without reduction).
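Reduction “scalarization” keeps the running reduction in a scalar across the loop nest instead of round-tripping it through memory each iteration; when vectorized, that scalar chain becomes a SIMD chain. A toy before/after sketch (simplified Python, not the sparse compiler's actual output):

```python
def sum_reduction_naive(out, values):
    # Before: the reduction value is loaded and stored every iteration.
    out[0] = 0.0
    for v in values:
        out[0] = out[0] + v

def sum_reduction_scalarized(out, values):
    # After: the reduction lives in a scalar for the whole loop and is
    # written back once at the end.
    acc = 0.0
    for v in values:
        acc += v
    out[0] = acc

buf = [0.0]
sum_reduction_scalarized(buf, [1.0, 2.0, 3.0])
print(buf[0])  # 6.0
```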
- SPIRVBase.td was regenerated from upstream spec to include new extensions and symbols.
- More atomic ops were defined, and a pattern converting shufflevector was added to the SPIR-V to LLVM conversion.
IREE: An Experimental MLIR Execution Environment
- Making progress towards using the upstream ComprehensiveBufferization in Linalg for dispatch-region code generation. (IREE currently uses a much simpler version of this, special-cased for IREE.)
- The CPU backend is being evolved to perform better on x86. With (this) PR, IREE's x86 backend runs the transformer model in 30 ms (baseline: TF+XLA at 45 ms, as measured by us). Effort is underway to make the x86 backend mirror the sandbox as closely as possible, which is known to reach peak GEMM performance on a range of
- CUDA backend:
- Add support for tensorcore code generation for fp16 and tf32 types
- Basic tuning for tensorcore performance but still bound by copy to shared memory
- Code generation for softmax has landed. In some benchmarks the MLIR-based compiler is up to 6 times faster than Eigen.
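For reference, the operation being compiled here is the standard numerically stable softmax; a plain-Python sketch of the textbook definition (illustrative only, not the generated kernel):

```python
import math

def softmax(row):
    # Subtract the row max before exponentiating so exp() never overflows;
    # this does not change the result.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]

print(softmax([1.0, 2.0, 3.0]))  # probabilities, summing to 1.0
```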
- Vectorization for TensorFlow tensors of boolean type is broken: memrefs of i1 type don’t map directly to vectors of i1 type, because a memref of i1 is actually stored as a memref of byte-sized values. We have a workaround that unblocks experimentation: a pass that performs i1->i8 tensor type conversion early in the pipeline. However, this pass is too broad to use for actual workloads.
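The storage mismatch can be illustrated in plain Python (a hypothetical illustration, not the actual pass): a memref of i1 stores one byte per element, while a vector of i1 implies one bit per lane.

```python
bools = [True, False, True, True]

# memref<4xi1>-style storage: one byte per element, 4 bytes total.
byte_storage = bytes(int(b) for b in bools)

# vector<4xi1>-style storage: one bit per lane, packed into a mask.
packed_bits = sum(int(b) << i for i, b in enumerate(bools))

print(len(byte_storage), bin(packed_bits))  # 4 bytes vs. the mask 0b1101
```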
LLVM Dev Meeting (talks will be on YouTube later):