MLIR News, 38th edition (7/10 - 7/23/2021)

See the previous published edition.
Welcome to the thirty-eighth issue of the MLIR (bi)Weekly, a newsletter covering developments in MLIR and related projects in the ecosystem. MLIR (bi)Weekly is brought to you by a collective effort of contributors; we welcome your contributions!

Highlights

MLIR Core

Infrastructure

  • InferShapedTypeOpInterface has been split in two. One part, ReifyRankedShapedTypeOpInterface, handles reifying the shape of a result type in terms of the op's operands when the type is ranked; this was previously done via the reifyReturnTypeShapesPerResultDim method of InferShapedTypeOpInterface. The new interface better matches the needs of compilation lower in the stack, which deals with ranked-shape code generation, while InferShapedTypeOpInterface better matches the needs of compilation higher in the stack, where unranked shapes are resolved. The ResolveShapedTypeResultDims pass was split to reflect this change, with ResolveRankedShapedTypeResultDims using the newly created interface (see the sketch below).
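As a rough sketch of what the new interface enables (illustrative only; op names such as tensor.dim and the std constant follow the MLIR of roughly this period and may differ in detail), ResolveRankedShapedTypeResultDims can fold a dimension query on an op's result into a query on its operands:

```mlir
// Before: the size of the matmul result is queried directly.
%c0 = constant 0 : index
%0 = linalg.matmul ins(%A, %B : tensor<?x?xf32>, tensor<?x?xf32>)
                   outs(%C : tensor<?x?xf32>) -> tensor<?x?xf32>
%d0 = tensor.dim %0, %c0 : tensor<?x?xf32>

// After: ReifyRankedShapedTypeOpInterface expresses the result shape in
// terms of the operands, so the query becomes a query on %A (the number
// of rows of a matmul result equals the number of rows of %A).
%d0 = tensor.dim %A, %c0 : tensor<?x?xf32>
```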

Table-driven Infrastructure

Codegen

  • Sparse compiler progress:
    • Replaced linalg.copy with memref.copy, which fits more nicely with the bufferization improvements (this also removed the dependence on the linalg-to-loops pass); a sketch follows this list.
  • Landed the first version of the software pipelining transformation.
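A minimal before/after sketch of the copy change (SSA names and types are hypothetical; linalg.copy is shown with the syntax of the time):

```mlir
// Before: a copy expressed as a Linalg op, which required the
// linalg-to-loops pass to lower it.
linalg.copy(%src, %dst) : memref<?xf32>, memref<?xf32>

// After: a dedicated memref op that bufferization-related passes can
// reason about directly.
memref.copy %src, %dst : memref<?xf32> to memref<?xf32>
```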

SPIR-V

  • The spv.GLSL.FMix op was defined; a sketch of its use follows.
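For illustration, FMix computes the linear blend x * (1 - a) + y * a. The snippet below is a hypothetical use; the assembly format is assumed from the SPIR-V dialect's conventions:

```mlir
// Blend %x and %y with interpolation weight %a.
%blend = spv.GLSL.FMix %x : f32, %y : f32, %a : f32 -> f32
```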

Build

  • libMLIRPublicAPI.so was removed. It was an early artifact of the Python integration and was never intended to force a shared-library API on everyone. Note that we are not opposed to an aggregate library serving a similar purpose, but the existing mechanism was flawed and needs to be revisited.
  • Python Build Re-engineering. The Python build was re-engineered to directly incorporate downstream packaging needs in the core setup. The design is based on creating static, self-contained packages, as is best practice for Python deployment. As a consequence, it also makes the “TypeID mismatches” that plagued previous versions impossible. Downstreams updated: npcomp, circt, mlir-hlo. We are looking for someone to finish an upstream sample and exercise Windows builds (which should now work across projects).
  • Emit strong definitions for TypeID storage in Op/Type/Attribute definitions (and dialects). We believe that with this patch and the previous one, MLIR-based projects should no longer experience “TypeID mismatches”; please reach out if that is not the case.

In the Ecosystem

IREE: An Experimental MLIR Execution Environment

  • CUDA backend:
    • Many performance improvements across codegen and the HAL, driven by profiling of a BertTraining model; optimized to run in 135ms per iteration.

TensorFlow / MLIR-HLO

Kernel generator

  • Enabled unsigned integer kernels for more TF ops (Cast, LeftShift, RightShift, NotEqual, Equal, BitwiseOr, BitwiseXor, and more).
  • Lowering of AddOp/SubOp from the Complex dialect to Standard (see the sketch after this list).
  • Conversion of math::Exp2Op to NVVM/ROCDL.
  • Infrastructure for JIT-compiled kernels is being added.
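As a rough illustration of the ops involved in the lowering items above (SSA names are hypothetical, and the exact expansion produced by the complex-to-standard lowering may differ):

```mlir
// complex.add / complex.sub, now lowered to Standard by expanding into
// per-component arithmetic on the real and imaginary parts.
%sum  = complex.add %a, %b : complex<f32>
%diff = complex.sub %a, %b : complex<f32>

// math.exp2, now convertible to the corresponding NVVM/ROCDL intrinsics.
%p = math.exp2 %x : f32
```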

Recent Talks

Recent Publications

ScaleHLS: Scalable High-Level Synthesis through MLIR

High-level Synthesis (HLS) has been widely adopted as it significantly improves hardware design productivity and enables efficient design space exploration (DSE). HLS tools can be used to deliver solutions for many different kinds of design problems, which are often better solved at different levels of abstraction. While existing HLS tools are built using compiler infrastructures largely based on a single-level abstraction (e.g., LLVM), we propose ScaleHLS, a next-generation HLS compilation flow, built for the first time on top of a multi-level compiler infrastructure called MLIR. By using an intermediate representation (IR) that can be better tuned to particular algorithms at different representation levels, we are able to build this new HLS tool to be more scalable and customizable towards various applications that come with intrinsic structural or functional hierarchies. ScaleHLS can represent and optimize HLS designs at multiple levels of abstraction and provides an HLS-dedicated transform and analysis library to solve optimization problems at the suitable representation levels. On top of the library, we also build an automated DSE engine to explore the multi-dimensional design space efficiently. In addition, we develop an HLS C front-end and a C/C++ emission back-end to translate HLS designs into and out of MLIR, enabling the end-to-end ScaleHLS flow. Experimental results show that, compared to baseline designs optimized only by Xilinx Vivado HLS, ScaleHLS improves performance with impressive quality of results: up to 768.1x better on computation-kernel-level programs and up to 3825.0x better on neural network models.

Do we have more detail on this to share? I was looking at MLIR on BERT.

There is a list of things that were done (and more to come) to get here. None of them were particularly novel; they were small tweaks and low-hanging fruit that added up to this number. If you want more information on how to try this out, or have your own results to share, please reach out on the IREE Discord channel.