MLIR News, 26th edition (2/5/2021)

See the previous published edition.

Welcome to the twenty-sixth issue of the MLIR (bi)Weekly, a newsletter covering developments in MLIR and related projects in the ecosystem. MLIR (bi)Weekly is brought to you by a collective effort of contributors; we welcome your contributions!

Highlights

MLIR Core

Infrastructure

  • New IRRewriter and RewriterBase classes were added to enable writing transformations that can be shared between pattern rewrites and non-pattern code.
  • Dialect conversion now better supports rollback of operations that change result or region types.
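To show why a shared base class helps, here is a toy Python model of the idea. It is not MLIR's C++ API: `fold_add_zero`, `RecordingPatternRewriter`, and the tuple-based "ops" are hypothetical. A transformation is written once against a base rewriter interface. It can then be driven directly, or through a rewriter that tracks each change the way a pattern driver might.

```python
from abc import ABC, abstractmethod


class RewriterBase(ABC):
    """Hypothetical stand-in for a shared rewriter interface."""

    @abstractmethod
    def replace_op(self, ops, index, new_op):
        ...


class IRRewriter(RewriterBase):
    """Applies each mutation immediately (direct, non-pattern use)."""

    def replace_op(self, ops, index, new_op):
        ops[index] = new_op


class RecordingPatternRewriter(RewriterBase):
    """Applies each mutation and logs it, as a pattern driver might track changes."""

    def __init__(self):
        self.log = []

    def replace_op(self, ops, index, new_op):
        self.log.append((ops[index], new_op))
        ops[index] = new_op


def fold_add_zero(rewriter, ops):
    """Shared transformation: rewrite ("add", x, 0) into ("copy", x) via any rewriter."""
    for i, op in enumerate(ops):
        if op[0] == "add" and op[2] == 0:
            rewriter.replace_op(ops, i, ("copy", op[1]))
```

The same `fold_add_zero` runs unchanged under both rewriters. Only the bookkeeping differs.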

Optimizations and Code Generation

  • Codegen Dialect Overview (the “diagrams document”) is now available.
  • Connected sparse codegen with a sparse tensor setup implementation (see patch and Discourse discussion) that makes “everything work for now” without yet introducing a first-class sparse tensor type:
    • enables Linalg-to-JIT/AOT runs without hand-modifying code for setup
    • avoids an elaborate codegen module for storage-scheme setup code
    • efficient “one size fits all” sparse storage scheme solution
    • enables future integration tests and benchmarking efforts (already ongoing)
  • Improvements and extensions to Linalg transformations:
    • Padding added as an option to tiling on tensors to inject static shapes at the tensor level.
    • Hoist padding transformation added to lift padding out of loops and create packed tensors.
    • Refactoring started on codegen strategy to make it more generally usable with OpInterfaces.
    • Generic Linalg contraction interface added, which the refactored vectorization now targets.
    • Simple benchmarking added to integration_tests, including mixed precision linalg.matmul_i8_i8_i32.
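The newsletter does not spell out the sparse storage scheme. For illustration only, the sketch below assumes compressed sparse row (CSR), a common format that keeps only the nonzero values plus row pointers and column indices. `to_csr` and `csr_matvec` are hypothetical helpers, not the MLIR sparse runtime.

```python
def to_csr(dense):
    """Pack a dense 2-D list into CSR arrays: row pointers, column indices, values."""
    pointers = [0]
    indices = []
    values = []
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                indices.append(j)
                values.append(v)
        # Row i's nonzeros live in values[pointers[i]:pointers[i + 1]].
        pointers.append(len(values))
    return pointers, indices, values


def csr_matvec(pointers, indices, values, x):
    """Compute y = A @ x, visiting only the stored nonzeros of A."""
    y = []
    for i in range(len(pointers) - 1):
        acc = 0
        for k in range(pointers[i], pointers[i + 1]):
            acc += values[k] * x[indices[k]]
        y.append(acc)
    return y
```

For `[[1, 0, 2], [0, 0, 0], [0, 3, 0]]`, `to_csr` yields pointers `[0, 2, 2, 3]`, indices `[0, 2, 1]` and values `[1, 2, 3]`. The empty middle row costs only one pointer entry.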
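One reading of “padding as an option to tiling”: when a dimension does not divide evenly by the tile size, the last partial tile is padded up to the full tile size, so every tile has the same static shape. Below is a minimal Python sketch of that arithmetic under this assumption. `tile_with_padding` and `pad_tile` are hypothetical, not Linalg's implementation.

```python
def tile_with_padding(n, tile):
    """Split a dimension of size n into tiles of static size `tile`.

    Returns (offset, valid, pad) per tile, with valid + pad == tile for every tile.
    """
    tiles = []
    for offset in range(0, n, tile):
        valid = min(tile, n - offset)  # only the last tile can be partial
        tiles.append((offset, valid, tile - valid))
    return tiles


def pad_tile(data, offset, valid, pad, fill=0):
    """Extract one tile and pad it with `fill` up to the static tile size."""
    return data[offset:offset + valid] + [fill] * pad
```

For a dimension of size 10 and tile size 4, the last tile holds 2 valid elements followed by 2 padding elements.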

Benchmarking codegen

Benchmarked linalg.matmul against existing hand-engineered libraries such as Accelerate, OpenBLAS, MKL, BLIS, RUY and Blasfeo, and against other codegen solutions such as TVM and Halide. When combined with auto-scheduling/search, MLIR can outperform Accelerate, MKL and OpenBLAS for small and medium sizes. Larger sizes will probably require Linalg on tensors support. Results are captured here: https://mmperf.org/
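Matmul comparisons like these are commonly reported in GFLOP/s, using the 2*M*N*K operation count (one multiply and one add per inner-product step). The newsletter does not describe mmperf's exact methodology, so the harness below is an assumption: a naive pure-Python matmul timed with `time.perf_counter`, keeping the best of several runs.

```python
import time


def matmul_gflops(m, n, k, seconds):
    """GFLOP/s for an (m x k) @ (k x n) matmul: each of m*n outputs needs k mults and k adds."""
    return 2 * m * n * k / seconds / 1e9


def naive_matmul(a, b):
    """Reference i-p-j loop nest computing c = a @ b on nested lists."""
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for p in range(k):
            aip = a[i][p]
            for j in range(n):
                c[i][j] += aip * b[p][j]
    return c


def benchmark(m, n, k, repeats=3):
    """Time naive_matmul and report GFLOP/s for the fastest of `repeats` runs."""
    a = [[1.0] * k for _ in range(m)]
    b = [[1.0] * n for _ in range(k)]
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        naive_matmul(a, b)
        best = min(best, time.perf_counter() - start)
    return matmul_gflops(m, n, k, best)
```

Taking the minimum over repeats filters out one-off noise such as cold caches or scheduler interruptions.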

SPIR-V

  • Graphics support sees progress: (de)serialization for image types is now supported.
  • The spv.VectorShuffle op is now defined.
  • A few more patterns were added to convert more vector ops to their SPIR-V counterparts.

In the Ecosystem

Flang, the LLVM Fortran Compiler

Upstreaming of the FIR dialect and associated passes that were developed in a downstream fork is in progress!

mlir-npcomp: Prototype for compiling numpy programs

TensorFlow / MLIR-HLO

  • XLA GPU: Migrated AllReduce, AllGather, AllToAll, ReplicaId, PartitionId, DynamicUpdateSlice, RngGetAndUpdateState, Slice operations to use LMHLO.
  • Kernel codegen:
    • Launched most generated unary kernels into production.
    • More investigation into performance differences for binary operations. The kernel generator uses fewer specializations (9 kernels vs. the 18 used by Eigen). Some of the extra specializations are not needed in the kernel generator because their effect can be modeled with host-side code changes alone. For the others, the performance difference is very small. We plan not to add further kernels, to keep code size in bounds.
    • Generalized the infrastructure for generating MLIR kernels to support the CPU platform. This is an early prototype with no guarantee of performance parity yet.
    • Improved performance of the binary GPU kernels by fixing the CollapseParallelLoops pass: collapsing the loops to 1-D before mapping to the GPU could lead to non-coalesced memory access patterns. This impacted binary kernels only.
    • Modified code generation to use 32-bit IndexType for device code. This is the same as what Eigen does.

TFRT: A New TensorFlow Runtime

Progress on the MLIR CPU JIT:

  • The TFRT JIT can compile nested MLIR modules (modules annotated with the tfrt.compiled attribute) into kernels by lowering from Linalg to the LLVM dialect, and execute them with !t.tensor or !corert.tensorhandle arguments.
  • Currently only bufferized kernels are supported: inputs and outputs of the compiled function must be buffers, and kernel completion is signalled with an !async.token.

Recent Talks

Open Meeting on 2/4/2021: Implicit Attribute Propagation (recording)

Recent Publications

We demonstrate the utility of the Multi-Level Intermediate Representation (MLIR) for quantum computing. Specifically, we extend MLIR with a new quantum dialect that enables the expression and compilation of common quantum assembly languages. The true utility of this dialect is in its ability to be lowered to the LLVM intermediate representation (IR) in a manner that is adherent to the quantum intermediate representation (QIR) specification recently proposed by Microsoft. We leverage a qcor-enabled implementation of the QIR quantum runtime API to enable a retargetable (quantum hardware agnostic) compiler workflow mapping quantum languages to hybrid quantum-classical binary executables and object code. We evaluate and demonstrate this novel compiler workflow with quantum programs written in OpenQASM 2.0. We provide concrete examples detailing the generation of MLIR from OpenQASM source files, the lowering process from MLIR to LLVM IR, and ultimately the generation of executable binaries targeting available quantum processors.