MLIR News, 47th edition (11/13 - 12/10/2021)

See the previous published edition
Welcome to the forty-seventh issue of the MLIR newsletter, covering developments in MLIR and related projects in the ecosystem. This is brought to you by a collective effort of contributors, and we welcome your contributions!

MLIR Core

Infrastructure

Codegen

  • Sparse compiler now fully supports sparse tensor outputs that materialize uninitialized using (1) insertions in pure lexicographic index order if all parallel loops are outermost or (2) one-dimensional access pattern expansion (a.k.a. workspace) where feasible
    • This enables, e.g., SpMSpM with a sparse result as well
  • Improvements and refactoring of Comprehensive Bufferize. Working towards making the code base compatible with Core bufferization.
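
As a rough illustration of the workspace idea mentioned for sparse outputs above, here is a minimal Python sketch of a row-wise SpMSpM that accumulates each output row in a dense one-dimensional workspace and then inserts the nonzeros in lexicographic index order. It is not the code the sparse compiler generates; the function name `spmspm_rows` and the list-of-rows storage format are assumptions made for this example.

```python
def spmspm_rows(a_rows, b_rows, num_cols):
    """Multiply two sparse matrices given as lists of rows.

    Each row is a list of (column, value) pairs sorted by column.
    This storage layout is hypothetical, chosen only for illustration.
    """
    workspace = [0.0] * num_cols    # dense 1-D workspace for one output row
    filled = [False] * num_cols     # marks which workspace slots are in use
    result = []
    for a_row in a_rows:
        touched = []                # columns written while building this row
        for k, a_val in a_row:
            for j, b_val in b_rows[k]:
                if not filled[j]:
                    filled[j] = True
                    touched.append(j)
                workspace[j] += a_val * b_val
        touched.sort()              # restore lexicographic order for insertion
        out_row = []
        for j in touched:
            out_row.append((j, workspace[j]))
            workspace[j] = 0.0      # reset only the slots that were used
            filled[j] = False
        result.append(out_row)
    return result


a = [[(0, 1.0), (2, 2.0)], [], [(1, 3.0)]]
b = [[(1, 4.0)], [(0, 5.0), (2, 6.0)], [(1, 7.0)]]
product = spmspm_rows(a, b, 3)
```

The workspace lets the inner loops scatter into arbitrary columns, while the sparse result is still built by ordered insertions, one row at a time.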

SPIR-V

  • scf.while to SPIR-V conversion is now supported.
  • Math ops to SPIR-V conversion can now generate OpenCL extended instructions.
  • spv.AtomicFAddEXTOp is defined and capability dependency bugs for atomics were fixed.
  • SPIR-V serialization now supports an option to control symbol name generation.
  • A few issues in SPIR-V nested structured control flow (de)serialization were fixed.

TOSA

  • Degenerate cases of Conv2D and DepthwiseConv2D lower to FullyConnected / Mul
  • Dynamic batch dimensions are supported on the majority of MobileNet operations
  • tosa.pad now supports an arbitrary padding value
  • Strided TransposeConv can now lower to a regular Conv2D
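
The newsletter does not say which Conv2D cases count as degenerate. One plausible instance, assumed here, is a 1x1 kernel with stride 1 and no padding, where the convolution reduces to a fully connected layer applied to every pixel. The pure-Python sketch below checks that equivalence; all function names and the nested-list tensor layout are hypothetical and unrelated to the actual TOSA lowering code.

```python
def dot_plus_bias(weights, bias, vec):
    # One fully connected output vector: bias[o] + <weights[o], vec>.
    return [bias[o] + sum(w * x for w, x in zip(weights[o], vec))
            for o in range(len(weights))]


def conv2d_1x1(image, weights, bias):
    # image: [H][W][C_in], weights: [C_out][C_in], bias: [C_out].
    # Direct 1x1 convolution, stride 1, no padding (assumed degenerate case).
    return [[dot_plus_bias(weights, bias, pixel) for pixel in row]
            for row in image]


def fully_connected(rows, weights, bias):
    # rows: [N][C_in] -> [N][C_out]
    return [dot_plus_bias(weights, bias, vec) for vec in rows]


def conv2d_1x1_via_fc(image, weights, bias):
    height, width = len(image), len(image[0])
    # Reshape [H, W, C_in] -> [H*W, C_in], apply FC, reshape back.
    flat = [pixel for row in image for pixel in row]
    out = fully_connected(flat, weights, bias)
    return [out[r * width:(r + 1) * width] for r in range(height)]


image = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
weights = [[1, 0], [0, 1], [1, 1]]
bias = [0, 10, 100]
direct = conv2d_1x1(image, weights, bias)
lowered = conv2d_1x1_via_fc(image, weights, bias)
```

Both paths compute the same values; the lowered form only adds reshapes around a single matrix-style operation.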

In the Ecosystem

Flang, the LLVM Fortran Compiler

IREE : An Experimental MLIR Execution Environment

  • The static library example using an EmitC path has landed. This allows converting the VM dialect to EmitC and skipping the bytecode interpreter in the final executable.
  • IREE moved to use the new Streams dialect. This allows for better overlapping of work, barrier elimination, etc. This resulted in a significant reduction in runtime overheads on Vulkan benchmarks.
  • IREE SPIR-V backend migrated to use the tensors to vectors path (with late bufferization). This effort is part of evaluating fusion of padding with convolution ops on vision models.
  • IREE CPU backend now uses the tile + fuse approach being developed in MLIR core for the x86 and ARM backends. This is part of the work to improve x86 performance in IREE. For the BERT model being tracked, performance is currently better than TensorFlow + XLA (30ms with IREE vs 40ms for TF+XLA).
  • IREE CUDA backend on the BERT model is also being improved for Ampere cards. Part of this work is to:
    • Enable use of tensor core units on GPUs with tf32 types
    • Enable use of asynchronous copy instructions from global to shared memory
  • The CPU backend in IREE is preparing to move to the Tensor Codegen Drivers in IREE-LLVM-Sandbox, as a way to pull the performance that the sandbox is getting on GEMMs, Reductions, Transpose, Convolution, etc. into the IREE CPU backend.

TensorFlow / MLIR-HLO

  • XLA CPU: We have implemented a first prototype that hooks up a purely MLIR based flow to JAX via the existing XLA CPU interface. This will allow us to experiment and identify missing items in our planning before we execute on a purely MLIR based flow next year.
  • Kernel Generator: We have launched some more kernels to bring the GPU dtype coverage closer to parity with coverage on CPU.

mlir-hs

  • Progress towards enabling roundtrip between AST and Native form.