llvm.shufflevector operations for scalable vectors

As with vector.create_mask, llvm.shufflevector doesn’t accept scalable operands as is. This operation is generated during the lowering of vector.broadcast/splat operations, and there’s no reason for those not to work with scalable vectors.

I’ve created a small patch that enables that lowering ([mlir][LLVM] Allow scalable vectors in ShuffleVectorOp). It seems to do its job, but I see a couple of issues that I believe need a design decision.

For background, this is how the lowering process works for a fixed-length vector.broadcast:

    %0 = vector.broadcast %arg0 : f32 to vector<4xf32>

mlir-opt -convert-vector-to-llvm

    %0 = splat %arg0 : vector<4xf32>

mlir-opt -convert-std-to-llvm (notice the array of attributes: [0: i32, 0: i32, 0: i32, 0: i32])

    %0 = llvm.mlir.undef : vector<4xf32>
    %1 = llvm.mlir.constant(0 : i32) : i32
    %2 = llvm.insertelement %arg0, %0[%1 : i32] : vector<4xf32>
    %3 = llvm.shufflevector %2, %0 [0 : i32, 0 : i32, 0 : i32, 0 : i32] : vector<4xf32>, vector<4xf32>

mlir-translate -mlir-to-llvmir (notice the array of attributes is translated to zeroinitializer)

    %1 = insertelement <4 x float> undef, float %0, i32 0
    %2 = shufflevector <4 x float> %1, <4 x float> undef, <4 x i32> zeroinitializer
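
To make the semantics of this insertelement/shufflevector pair concrete, here is a minimal sketch in plain Python. It is only a model of the fixed-length case, not MLIR or LLVM API: the names `insertelement`, `shufflevector`, `splat` and `UNDEF` are hypothetical helpers, vectors are modelled as lists, and undef lanes as `None`.

```python
UNDEF = None  # stand-in for an undef lane (assumption of this model)


def insertelement(vec, value, index):
    """Return a copy of vec with lane `index` replaced by `value`."""
    out = list(vec)
    out[index] = value
    return out


def shufflevector(v1, v2, mask):
    """Each mask entry selects a lane from the concatenation of v1 and v2."""
    concat = list(v1) + list(v2)
    return [concat[i] for i in mask]


def splat(value, lanes):
    """Model of the lowering: insert into lane 0, then shuffle with an all-zero mask."""
    undef = [UNDEF] * lanes
    lane0_set = insertelement(undef, value, 0)
    return shufflevector(lane0_set, undef, [0] * lanes)
```

With an all-zero mask every result lane reads lane 0 of the first operand, which is why this pattern implements a splat.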

For scalable vectors, if I simply allow scalable operands, I get a similar lowering:

    %0 = vector.broadcast %arg0 : f32 to vector<[4]xf32>

mlir-opt -convert-vector-to-llvm (so far, so good)

    %0 = splat %arg0 : vector<[4]xf32>

mlir-opt -convert-std-to-llvm (once again: [0: i32, 0: i32, 0: i32, 0: i32])

    %0 = llvm.mlir.undef : vector<[4]xf32>
    %1 = llvm.mlir.constant(0 : i32) : i32
    %2 = llvm.insertelement %arg0, %0[%1 : i32] : vector<[4]xf32>
    %3 = llvm.shufflevector %2, %0 [0 : i32, 0 : i32, 0 : i32, 0 : i32] : vector<[4]xf32>, vector<[4]xf32>

That mask vector has no meaning when applied to scalable vectors. In the scalable case, the number of elements in each operand is some multiple of 4, not necessarily exactly 4, yet the mask only provides 4 indices.
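
The mismatch can be sketched numerically. This is a toy Python model, assuming the usual reading of a scalable type where the runtime lane count is the minimum lane count times `vscale`; `runtime_lanes` and `mask_covers_result` are hypothetical names used only for illustration.

```python
def runtime_lanes(min_lanes, vscale):
    """Lane count of a scalable vector once vscale is known at run time."""
    return min_lanes * vscale


def mask_covers_result(mask, min_lanes, vscale):
    """True if the literal mask has one index per runtime lane of the result."""
    return len(mask) == runtime_lanes(min_lanes, vscale)


# The four-entry mask produced by the lowering.
mask = [0, 0, 0, 0]
```

Only for `vscale == 1` does the four-entry mask describe every lane; for any larger value it leaves lanes unspecified, which is why a literal index list cannot be taken at face value here.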

mlir-translate -mlir-to-llvmir (because it’s all zeroes, it translates to zeroinitializer but with a scalable type)

    %1 = insertelement <vscale x 4 x float> undef, float %0, i32 0
    %2 = shufflevector <vscale x 4 x float> %1, <vscale x 4 x float> undef, <vscale x 4 x i32> zeroinitializer

Here, everything makes sense again according to the LLVM Language Reference Manual:

For scalable vectors, the only valid mask values at present are zeroinitializer and undef, since we cannot write all indices as literals for a vector with a length unknown at compile time.

It all works out only because an “all zero” mask happens to be translated to zeroinitializer; in general, nothing guarantees that the mask is all zeroes.
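
As a rough illustration of why only the all-zero case survives translation, here is a small Python sketch that spells a mask the way it appears in the LLVM IR snippets in this post. It is not the mlir-translate implementation; `mask_to_llvm_ir` is a hypothetical helper, and it ignores undef mask entries.

```python
def mask_to_llvm_ir(mask, min_lanes, scalable):
    """Spell a shuffle mask as an LLVM IR constant, or fail if it has no spelling."""
    if len(mask) != min_lanes:
        raise ValueError("mask length must match the (minimum) lane count")
    ty = f"<vscale x {min_lanes} x i32>" if scalable else f"<{min_lanes} x i32>"
    if all(index == 0 for index in mask):
        # An all-zero mask has a length-independent spelling.
        return f"{ty} zeroinitializer"
    if scalable:
        # No literal can list the indices of a vector of unknown length.
        raise ValueError("non-zero mask cannot be written for a scalable vector")
    elements = ", ".join(f"i32 {index}" for index in mask)
    return f"{ty} <{elements}>"
```

A fixed-length mask always has a spelling, while a scalable one does only when every index is zero.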

As things stand, an assert for the scalable case that checks the mask is all zeroes would ensure we don’t produce invalid shufflevector ops, but I’d rather find a cleaner solution. The obvious one would be to add a type-less way to express that the index mask is all zeroes, the same way LLVM IR does. But I don’t know whether adding something like that is desirable. Any ideas for a more MLIR-esque way to solve this?

Any ideas will be greatly appreciated. Thanks!

Hi all!

As a compromise, I’ve added a strong restriction on the shufflevector mask in the case of scalable vectors. For scalable vectors, shufflevector only makes sense when we’re performing a splat operation, so enforcing an “all zero” mask, if not entirely aesthetically pleasing, ensures that the operation is valid all the way down to LLVM IR.
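
The rule can be sketched as follows. This is a Python model of the check working on the textual vector types used in this thread, not the actual verifier code in the patch; `parse_vector_type` and `verify_shuffle_mask` are hypothetical names, and only 1-D vector types are handled.

```python
import re

# Matches 1-D types such as vector<4xf32> or vector<[4]xf32> (assumption: 1-D only).
_VECTOR_TYPE = re.compile(r"^vector<(?:\[(\d+)\]|(\d+))x(\w+)>$")


def parse_vector_type(text):
    """Return (lanes, is_scalable, element_type) for a 1-D vector type string."""
    match = _VECTOR_TYPE.match(text)
    if match is None:
        raise ValueError(f"not a 1-D vector type: {text!r}")
    scalable_lanes, fixed_lanes, element_type = match.groups()
    is_scalable = scalable_lanes is not None
    lanes = int(scalable_lanes if is_scalable else fixed_lanes)
    return lanes, is_scalable, element_type


def verify_shuffle_mask(vector_type, mask):
    """Return an error message if the mask is invalid for the type, else None."""
    _lanes, is_scalable, _element_type = parse_vector_type(vector_type)
    if is_scalable and any(index != 0 for index in mask):
        return "scalable vectors only accept an all-zero (splat) mask"
    return None
```

Fixed-length types are left unrestricted in this sketch, so existing lowerings would be unaffected; only the scalable case rejects non-zero indices.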

I’m reluctant to add a zeroinitializer keyword, given that it wouldn’t be needed anywhere else (for now) and that it would require changes to the syntax of the operation, to several lowerings that generate shufflevector ops, and to the translation to LLVM IR itself.

If people are happy with the compromise, this is enough to have a working solution and unlock the stack of patches on top of this one.

As always, all feedback is welcome. Thank you!