The Vector Dialect document discusses the vector abstractions that MLIR supports and their tradeoffs. One of the layers currently missing in OSS is the Hardware Vector Ops (HWV) level.
I am interested in experimenting in core with an AVX512-specific dialect for the specific purpose of implementing portions of XNNPack in MLIR and benchmarking them.
I am proposing to add a new Dialect/Targets/AVX512 dialect that would directly target the intrinsics needed to implement XNNPack. The first function I am interested in implementing is exp-avx-512.
I think it is time for such dialects because, at the moment, we rely too much on LLVM’s peephole optimizer to generate good code from small shufflevector patterns. We have some intrinsics defined and used in the LLVM dialect, but these are all “portable” intrinsics; I am looking to define a layering that targets the right AVX512 instructions directly.
I think iterating at this level of abstraction in core will be useful scouting work toward getting the abstractions and layering right, and will pave the way for a future ARM SVE dialect and other non-generic CPU dialects. Of course, generic abstractions should be preferred when possible. We also expect to learn more about when HW-specific vs. generic abstractions should be used and how they compose in MLIR.
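To make the HW-specific vs. generic distinction concrete, here is a hedged sketch of what an op in such a dialect could look like. The dialect and op names (`avx512.scalef`) are illustrative assumptions, not an existing dialect; the point is that the op maps one-to-one to a specific AVX512 intrinsic rather than being pattern-matched by LLVM from generic vector ops:

```mlir
// Hypothetical syntax; "avx512.scalef" is an illustrative op name.
func @scale(%a: vector<16xf32>, %b: vector<16xf32>) -> vector<16xf32> {
  // Intended to lower 1-1 to the llvm.x86.avx512.mask.scalef.ps.512
  // intrinsic (vscalefps), a building block for vectorized exp,
  // instead of hoping LLVM's peepholes reconstruct it.
  %0 = avx512.scalef %a, %b : vector<16xf32>
  return %0 : vector<16xf32>
}
```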
Edit: It was pointed out that I should use the template for new dialects, so here goes.
- What is the overall goal of the dialect?
Start filling the void in OSS between target-agnostic and target-specific vector operations.
- What is the first implementation milestone?
vector<16xf32> to AVX512 LLVM intrinsics for the exp-avx-512 function.
- How does it fit into the MLIR dialect ecosystem?
It is the first HWV dialect in OSS (see the Vector Dialect doc).
- Connection: how does it connect to the existing dialects in a compilation pipeline(s)?
VectorOps -> AVX512-MLIR -> AVX512-LLVM -> LLVM
- Consolidation: is there already a dialect with a similar goal or matching abstractions; if so, can it be improved instead of adding a new one?
- Reuse: how does it generalize to similar but slightly different use-cases?
There will be different HW-specific dialects we want to target. The union of all the ops in HW-independent and HW-specific dialects will represent the set of valid ops for a particular Target.
- What is the community of users that it is serving?
CPU users who want performance with AVX512 intrinsics.
- Who are the future contributors/maintainers beyond those who propose the dialect?
Anyone interested in AVX512 and making it a successful target for MLIR.
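To illustrate the compilation pipeline from the Connection item above, here is a hedged sketch of the intended lowering steps. All op and intrinsic spellings are assumptions for the sketch, not a committed design:

```mlir
// VectorOps level: target-agnostic contraction/shuffle ops.
//   ... ops from the vector dialect on vector<16xf32> ...

// AVX512-MLIR level (HWV): hypothetical target-specific op.
%0 = avx512.scalef %a, %b : vector<16xf32>

// AVX512-LLVM level: a direct call to the matching LLVM intrinsic
// in the LLVM dialect, then translation to LLVM IR proper.
%1 = "avx512.intr.mask.scalef.ps.512"(%a, %b, %src, %k, %rc)
       : (vector<16xf32>, vector<16xf32>, vector<16xf32>, i16, i32)
       -> vector<16xf32>
```

Each arrow in `VectorOps -> AVX512-MLIR -> AVX512-LLVM -> LLVM` would be a separate, testable conversion pass.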
Please let me know if you have questions or concerns.