Affine loop fusion creating illegal std.alloc

I am trying to run --affine-loop-fusion on this small example:

    func @calc(%arg0: memref<?xf32>, %arg1: memref<?xf32>, %arg2: memref<?xf32>, %len: index) {
      %c1 = constant 1 : index
      %1 = alloc(%len) : memref<?xf32>
      affine.for %arg4 = 1 to 10 {
        %7 = affine.load %arg0[%arg4] : memref<?xf32>
        %8 = affine.load %arg1[%arg4] : memref<?xf32>
        %9 = addf %7, %8 : f32
        affine.store %9, %1[%arg4] : memref<?xf32>
      }
      affine.for %arg4 = 1 to 10 {
        %7 = affine.load %1[%arg4] : memref<?xf32>
        %8 = affine.load %arg1[%arg4] : memref<?xf32>
        %9 = mulf %7, %8 : f32
        affine.store %9, %arg2[%arg4] : memref<?xf32>
      }
      return
    }

After running loop fusion, this generates:

    func @calc(%arg0: memref<?xf32>, %arg1: memref<?xf32>, %arg2: memref<?xf32>, %arg3: index) {
      %c0 = constant 0 : index
      %0 = dim %2, %c0 : memref<?xf32>
      %1 = alloc()[%0] : memref<1xf32>
      %c1 = constant 1 : index
      %2 = alloc(%arg3) : memref<?xf32>
      affine.for %arg4 = 1 to 10 {
        %3 = affine.load %arg0[%arg4] : memref<?xf32>
        %4 = affine.load %arg1[%arg4] : memref<?xf32>
        %5 = addf %3, %4 : f32
        affine.store %5, %2[%arg4] : memref<?xf32>
        %6 = affine.load %arg0[%arg4] : memref<?xf32>
        %7 = affine.load %arg1[%arg4] : memref<?xf32>
        %8 = addf %6, %7 : f32
        affine.store %8, %1[0] : memref<1xf32>
        %9 = affine.load %1[0] : memref<1xf32>
        %10 = affine.load %arg1[%arg4] : memref<?xf32>
        %11 = mulf %9, %10 : f32
        affine.store %11, %arg2[%arg4] : memref<?xf32>
      }
      return
    }

Here the instruction %1 = alloc()[%0] : memref<1xf32> fails the verifier.
error: 'std.alloc' op operand count does not equal dimension plus symbol operand count.

Is there something wrong with the example?

There can’t be anything wrong with your example: it is valid IR, and the output is invalid. This is a bug in the fusion pass, which hasn’t been tested much with dynamically shaped memrefs (mainly because it currently handles constant trip count loops, and so statically shaped memrefs were often used alongside them).

Changing the line that creates the new memref to drop its last argument, so that it reads:

    Value newMemRef = top.create<AllocOp>(forOp.getLoc(), newMemRefType);

fixes the problem. D82409 contains this fix.
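
To see why the extra operand trips the verifier, here is a simplified C++ model of the operand-count rule quoted in the error message. It is only a sketch, not the actual MLIR verifier code: the MemRefShape struct, the use of -1 to mark a dynamic dimension, and both function names are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a memref type (not an MLIR class).
struct MemRefShape {
  std::vector<int64_t> dims;  // assumption: -1 marks a dynamic ('?') dimension
  unsigned numLayoutSymbols;  // symbols in the layout map; 0 for an identity layout
};

// Number of operands an alloc of this type is expected to carry:
// one per dynamic dimension plus one per layout-map symbol.
unsigned expectedAllocOperands(const MemRefShape &type) {
  unsigned numDynamicDims = 0;
  for (int64_t dim : type.dims)
    if (dim == -1)
      ++numDynamicDims;
  return numDynamicDims + type.numLayoutSymbols;
}

// Models the check behind "operand count does not equal dimension plus
// symbol operand count".
bool allocVerifies(const MemRefShape &type, unsigned numOperands) {
  return numOperands == expectedAllocOperands(type);
}
```

Under this model, memref<1xf32> has no dynamic dimensions and (assuming an identity layout) no symbols, so its alloc must take zero operands. alloc()[%0] passes one operand and is rejected, whereas alloc(%arg3) : memref<?xf32> passes one operand for its one dynamic dimension and is accepted. Dropping the last argument in the create<AllocOp> call removes the stray operand.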
