See the previous published edition
Welcome to the forty-third issue of the MLIR (bi)Weekly, a newsletter covering developments in MLIR and related projects in the ecosystem. MLIR (bi)Weekly is brought to you by a collective effort of contributors; we welcome your contributions!
- A proposal has been sent on the PyTorch forums about Torch-MLIR!
- The current OpConversionPattern::matchAndRewrite overloads are deprecated and being removed in favor of the OpAdaptor-based overloads.
- Sparse compiler progress:
- Sparse constants no longer “expand” into a dense iteration space, but are directly converted to sparse tensor storage at runtime (courtesy Bixia)
- Generalized support for reductions beyond just SUM
- Fixed ABI issue on ARM64 in support lib (thanks to Javier for debug help)
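As a sketch of what the direct sparse-constant conversion enables (syntax approximated from the `sparse_tensor` dialect of this period; the encoding and function names are illustrative):

```mlir
// A CSR-like encoding: dense outer dimension, compressed inner dimension.
#CSR = #sparse_tensor.encoding<{ dimLevelType = ["dense", "compressed"] }>

func @sparse_const() -> tensor<2x2xf64, #CSR> {
  // A constant written in dense notation, but mostly zeros.
  %c = arith.constant dense<[[0.0, 1.0], [0.0, 0.0]]> : tensor<2x2xf64>
  // Previously this conversion "expanded" into a dense iteration space;
  // it is now converted to sparse tensor storage directly at runtime.
  %s = sparse_tensor.convert %c : tensor<2x2xf64> to tensor<2x2xf64, #CSR>
  return %s : tensor<2x2xf64, #CSR>
}
```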
- Min and max ops have been added to the std dialect; they capture FP/NaN semantics more precisely.
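A hedged sketch of how these ops might be used (op spellings `maxf`/`minf` are assumptions based on the std dialect of this period):

```mlir
// Clamp %x into the range [%lo, %hi]. Dedicated FP min/max ops carry
// explicit NaN semantics, instead of relying on a compare-plus-select
// lowering whose NaN behavior is ambiguous.
func @clamp(%x: f32, %lo: f32, %hi: f32) -> f32 {
  %0 = maxf %x, %lo : f32
  %1 = minf %0, %hi : f32
  return %1 : f32
}
```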
- Reduction detection has been refactored and improved across dialects. Linalg vectorization now supports more cases including min/max.
- Linalg.pad_tensor gains a “nofold” attribute that prevents folding and keeps the op around, enabling packing even in cases where the sizes divide evenly.
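For illustration, a pad whose sizes divide evenly would normally fold away entirely; the “nofold” marker keeps it alive (a sketch; the attribute placement is approximated):

```mlir
// Pad by zero elements on each side: foldable to a no-op on its own,
// but marked nofold so the pad op survives and packing can still apply.
%cst = arith.constant 0.0 : f32
%p = linalg.pad_tensor %t nofold low[0, 0] high[0, 0] {
^bb0(%i: index, %j: index):
  linalg.yield %cst : f32
} : tensor<8x8xf32> to tensor<8x8xf32>
```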
- Various improvements in progress to Linalg comprehensive bufferization and refactorings to allow better interop with external projects such as IREE.
- Older C++ only ops are being retired in favor of their OpDSL equivalents.
- Codegen strategy refactored to make better use of the pass infrastructure and become more usable.
- New foldings of tensor.insert_slice/tensor.extract_slice have been added.
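One representative fold of this kind (a sketch; the exact patterns that landed may differ): an extract_slice that reads back exactly the window a preceding insert_slice wrote can fold to the inserted value.

```mlir
// %1 reads the same [0, 0] [4, 4] [1, 1] window that %0 wrote into %dst,
// so %1 can fold directly to %src.
%0 = tensor.insert_slice %src into %dst[0, 0] [4, 4] [1, 1]
    : tensor<4x4xf32> into tensor<8x8xf32>
%1 = tensor.extract_slice %0[0, 0] [4, 4] [1, 1]
    : tensor<8x8xf32> to tensor<4x4xf32>
```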
- Some general improvements to quantization
- Relaxing quantized tensor type requirements
- Ranked constraints fixed on quantization builders
- Type verification expanded for basic shape manipulations, which were previously causing crashes during shape inference.
IREE: An Experimental MLIR Execution Environment
- Making progress towards fixing some gaps in the codegen backends
- All ops that are to be executed on the device need to be tiled and distributed. A couple of ops remain; once those are handled, all ops will be parallelized by default on CPU and on GPUs.
- Looking into using the newly added fusion transformations in MLIR core to perform fusion at the vector level.
- CUDA Backend
- Added an option to control tile and workgroup sizes from the IR, enabling search for CUDA using the same mechanism as on CPU
- Integration of IREE into mmperf (GitHub - mmperf/mmperf: MatMul Performance Benchmarks for a Single CPU Core comparing both hand engineered and codegen kernels) to allow continuous comparison of IREE GEMM against cuBLAS and TVM
- Misc bug fixes and configuration tweaks for BERT training
- Implementation of the kernel generator JIT mode is complete. We are now completing the final steps for launch approval in the next TF release.
- We have further optimized the calling convention for unranked results. The memref descriptor is now allocated on the caller’s stack, avoiding heap allocations in the call.
CIRCT: Circuit IR Compilers and Tools, aka ‘MLIR for hardware’
- A lot of progress has been made on lowering the SCF dialect to the Calyx dialect. Woo! Since the Calyx compiler is not completely fleshed out in CIRCT, there is also an emitter to the native Calyx compiler IR (documentation for this is found here). That means we can currently lower:
SCF dialect → Calyx dialect → Calyx native compiler (with spunky optimizations) → SystemVerilog.