This is more a statement of intent than an RFC.
As the vector dialect is starting to get more fleshed out, @ThomasRaoux, others, and I have been discussing and prototyping Vector -> GPU lowering in the context of SPIR-V. However, we expect most of the lessons and techniques to translate more generally.
Some of the high-order bits are:
- a vector op is a good approximation for a divergence-free block / warp-level chunk of computation that can be distributed across the threads within a block.
- higher-order n-D vector operations such as pointwise ops, vector.contract, and reductions can be lowered to proper warp-synchronized operations along certain dimensions and unrolled along others.
- vector.transfer_read and _write ops have masked + padding semantics that can take advantage of the predicated SIMT model, the massive parallelism, and the dynamic HW memory-latency-hiding properties of GPUs.
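To make the masked + padding semantics concrete, here is a minimal Python model (the function name and buffer are illustrative, not part of any actual lowering): an in-bounds lane reads from memory, while an out-of-bounds lane takes the padding value, which is what lets a predicated SIMT lowering handle boundary tiles without scalar cleanup code.

```python
def masked_transfer_read(buf, offset, vec_len, pad):
    """Model of vector.transfer_read's masked + padding semantics:
    in-bounds lanes read from the buffer, out-of-bounds lanes are
    filled with the padding value."""
    return [buf[offset + i] if offset + i < len(buf) else pad
            for i in range(vec_len)]

# A 4-element read starting at index 6 of an 8-element buffer:
# lanes 0-1 are in bounds, lanes 2-3 take the padding value.
buf = [10, 11, 12, 13, 14, 15, 16, 17]
print(masked_transfer_read(buf, 6, 4, 0))  # [16, 17, 0, 0]
```

In a predicated SIMT mapping, each lane of this read naturally becomes a per-thread predicated load, so the boundary handling falls out of the hardware execution model.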
This could get us to a nice SSA-based vector programming abstraction in MLIR with an unsurprising mapping to the right underlying GPU features, while, by construction, punting for now on the more difficult issues related to thread divergence.
Following early discussions with @ThomasRaoux, the prototypes include:
- mapping simple vector.transfer + pointwise ops to 1 block / many threads with 1-1 multiplicity
- mapping vector.transfer + vector.contract ops to cooperative matrix ops with 1-1 multiplicity
- introducing a “CooperativeCompatible” op interface on relevant vector ops that would generally be ignored, except for the specific purpose of lowering to GPUs with cooperative matrix support.
- in the SPIR-V model, CooperativeCompatible ops require using a specific opaque type. This will likely require slicing between CooperativeCompatible ops and other ops to introduce write-read pairs and bitcast through memory. Maybe this would even be a special cast that has to go through memory at the lower level.
- supporting more general striding and transposes in the vector.transfer -> SPIR-V lowering
- allowing n -> 1 multiplicity and exploring various forms of lowering to vector types and/or unrolling
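The multiplicity experiments above can be sketched with a minimal Python model (the function name is made up for illustration): with 1-1 multiplicity each thread owns one vector element, while with n -> 1 multiplicity each thread owns an n-element slice that can itself be lowered to a hardware vector type or unrolled.

```python
def distribute(vec, num_threads):
    """Split a flat vector across threads. With len(vec) == num_threads
    this models 1-1 multiplicity (one element per thread); with
    len(vec) == n * num_threads it models n -> 1 multiplicity
    (each thread owns an n-element slice)."""
    assert len(vec) % num_threads == 0
    n = len(vec) // num_threads
    return [vec[t * n:(t + 1) * n] for t in range(num_threads)]

# 1-1 multiplicity: 4 elements over 4 threads.
print(distribute([0, 1, 2, 3], 4))    # [[0], [1], [2], [3]]
# 2 -> 1 multiplicity: 8 elements over 4 threads.
print(distribute(list(range(8)), 4))  # [[0, 1], [2, 3], [4, 5], [6, 7]]
```

Whether the per-thread slice then becomes a hardware vector type or gets unrolled into scalars is exactly the design space the last bullet refers to.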
As part of this work, we expect a bunch of other things to surface while connecting the pieces, e.g. vector.transfer canonicalizations, alignment requirements that are currently mostly unimplemented, etc.
For now, things are being prototyped on the IREE side, but we hope to start sending things upstream as we make progress.