Over the past few weeks, @nicolasvasilache and I have been writing a few MLIR case study docs on CPU codegen for the Vector dialect. They follow two principles: (1) build technology bottom-up, i.e. make sure one level works really well before building the next, and (2) keep low-level code generation as architecture-neutral as possible, for example by using generic intrinsics rather than CPU-specific intrinsics or an intermediate, CPU-specific dialect. This lets the LLVM backend generate good code for e.g. x86-64 and AArch64 flavors alike, with only a few simple parameters in the lowering strategies needing to change.
So far we have:
- AVX512 Codegen for the Vector Dialect Ops
- Sparse Matrix Times Vector in the Vector Dialect
- Transfer Operations in the Vector Dialect
- A Simple Retargetable Matmul Strategy
The docs focus on AVX512, although the principles are more widely applicable. They are also simple case studies, not fully worked-out academic papers. Nevertheless, if there is general interest, we can post the docs here on this forum (after some internal cleanup). Please let us know if that is something we should invest time in.