Similarly to vector.create_mask, llvm.shufflevector doesn’t accept scalable operands as is. This operation is generated during the lowering of vector.broadcast/splat operations, and there’s no reason for those not to work with scalable vectors.
I’ve created a small patch that enables that lowering ([mlir][LLVM] Allow scalable vectors in ShuffleVectorOp) and seems to be doing its job, but I see a couple of issues that I believe require a design solution.
For background, this is how the lowering process works for a fixed-length vector.broadcast:
%0 = vector.broadcast %arg0 : f32 to vector<4xf32>
mlir-opt -convert-vector-to-llvm
%0 = splat %arg0 : vector<4xf32>
mlir-opt -convert-std-to-llvm
(notice the array of attributes: [0: i32, 0: i32, 0: i32, 0: i32])
%0 = llvm.mlir.undef : vector<4xf32>
%1 = llvm.mlir.constant(0 : i32) : i32
%2 = llvm.insertelement %arg0, %0[%1 : i32] : vector<4xf32>
%3 = llvm.shufflevector %2, %0 [0 : i32, 0 : i32, 0 : i32, 0 : i32] : vector<4xf32>, vector<4xf32>
mlir-translate -mlir-to-llvmir
(notice the array of attributes is translated to zeroinitializer)
%1 = insertelement <4 x float> undef, float %0, i32 0
%2 = shufflevector <4 x float> %1, <4 x float> undef, <4 x i32> zeroinitializer
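To make the role of the all-zero mask concrete, here is a small Python model of the two instructions above. It is only a sketch of the semantics, not MLIR or LLVM code: the function names mirror the instructions, and None is my stand-in for an undef lane.

```python
UNDEF = None  # stand-in for an LLVM undef lane (assumption of this model)


def insertelement(vec, value, index):
    """Return a copy of vec with lane `index` replaced by `value`."""
    out = list(vec)
    out[index] = value
    return out


def shufflevector(v1, v2, mask):
    """Pick lanes from the concatenation of v1 and v2, one per mask entry."""
    concat = list(v1) + list(v2)
    return [concat[i] for i in mask]


def broadcast(value, lanes):
    """Model the fixed-length lowering: seed lane 0, then shuffle with zeros."""
    undef = [UNDEF] * lanes
    seeded = insertelement(undef, value, 0)
    return shufflevector(seeded, undef, [0] * lanes)


print(broadcast(1.5, 4))
```

Every mask entry is 0, so every result lane reads lane 0 of the first operand, which is the only lane that was actually written.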
For scalable vectors, if I simply allow scalable operands, I get a similar lowering:
%0 = vector.broadcast %arg0 : f32 to vector<[4]xf32>
mlir-opt -convert-vector-to-llvm
(so far, so good)
%0 = splat %arg0 : vector<[4]xf32>
mlir-opt -convert-std-to-llvm
(once again: [0: i32, 0: i32, 0: i32, 0: i32])
%0 = llvm.mlir.undef : vector<[4]xf32>
%1 = llvm.mlir.constant(0 : i32) : i32
%2 = llvm.insertelement %arg0, %0[%1 : i32] : vector<[4]xf32>
%3 = llvm.shufflevector %2, %0 [0 : i32, 0 : i32, 0 : i32, 0 : i32] : vector<[4]xf32>, vector<[4]xf32>
That mask vector, applied to scalable vectors, has no meaning. In the scalable case, the number of elements in each operand is a multiple of 4, not necessarily 4, but the mask indexes only 4 elements.
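The mismatch can be illustrated with a toy Python model in which vscale is an ordinary runtime integer. This is a sketch under my own assumptions (the helper names are hypothetical and None stands for an undef lane); it only shows that a literal 4-entry mask describes 4 lanes regardless of how many lanes the operands really have.

```python
def shufflevector(v1, v2, mask):
    """Pick lanes from the concatenation of v1 and v2, one per mask entry."""
    concat = list(v1) + list(v2)
    return [concat[i] for i in mask]


def splat_with_literal_mask(value, min_lanes, vscale):
    """Shuffle vscale * min_lanes wide operands with a min_lanes wide mask.

    Returns the shuffled lanes and the lane count the result type expects.
    """
    lanes = min_lanes * vscale            # real operand width at run time
    seeded = [value] + [None] * (lanes - 1)
    undef = [None] * lanes
    mask = [0] * min_lanes                # what the attribute array spells out
    return shufflevector(seeded, undef, mask), lanes


result, expected = splat_with_literal_mask(1.0, min_lanes=4, vscale=2)
print(len(result), expected)
```

With vscale equal to 1 the two numbers agree; for any larger vscale the literal mask covers only a fraction of the lanes the result type promises.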
mlir-translate -mlir-to-llvmir
(because it’s all zeroes, it translates to zeroinitializer, but with a scalable type)
%1 = insertelement <vscale x 4 x float> undef, float %0, i32 0
%2 = shufflevector <vscale x 4 x float> %1, <vscale x 4 x float> undef, <vscale x 4 x i32> zeroinitializer
Here, everything makes sense again according to the LLVM Language Reference Manual:
For scalable vectors, the only valid mask values at present are zeroinitializer and undef, since we cannot write all indices as literals for a vector with a length unknown at compile time.
It all works out because an “all zero” mask gets translated to zeroinitializer, but, in general, it doesn’t have to.
Left as it is, an assert for the scalable case that checks that the mask is all zeroes would ensure we don’t have invalid shufflevector ops, but I’d rather find a cleaner solution. The obvious one would be to add a type-less way to express that your index mask is all zeroes, the same way LLVM IR does. But I don’t know if adding something like that is desirable. Any idea for a more MLIR-esque way to solve this?
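For reference, the assert-style check I have in mind would look roughly like the Python sketch below. It is not the actual verifier code: verify_shuffle_mask and its parameters are hypothetical names, and the mask is modelled as a plain list of integers.

```python
def verify_shuffle_mask(mask, is_scalable):
    """Reject masks that cannot be expressed for scalable vectors.

    Fixed-length vectors accept any mask in this model. Scalable vectors
    accept only an all-zero mask, since that is the one case that
    translates to zeroinitializer.
    """
    if is_scalable and any(index != 0 for index in mask):
        raise ValueError(
            "shufflevector on scalable vectors requires an all-zero mask"
        )
    return True


# A splat mask passes for both kinds of vectors.
verify_shuffle_mask([0, 0, 0, 0], is_scalable=True)
verify_shuffle_mask([3, 2, 1, 0], is_scalable=False)
```

This keeps invalid ops out, but the mask still carries a length that means nothing for a scalable type, which is why I would prefer a different representation.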
Any ideas will be greatly appreciated. Thanks!