MLIR: Failed to lower affine dialect to gpu dialect

Hello, I’m a beginner with MLIR and the Toy dialect. After following the tutorial, I’m now trying to convert some operations from the affine dialect to the GPU dialect. I’m using the SCFToGPU pass, and I run into a problem when converting a basic addition of two tensors to the GPU dialect.
The affine-dialect IR for the add looks like this:

affine.for %arg0 = 0 to 2 {
  affine.for %arg1 = 0 to 3 {
    %4 = affine.load %3[%arg0, %arg1] : memref<2x3xf64>
    %5 = affine.load %3[%arg0, %arg1] : memref<2x3xf64>
    %6 = addf %4, %5 : f64
    affine.store %6, %2[%arg0, %arg1] : memref<2x3xf64>
  }
}

After adding the SCFToGPU pass, I get this error:

error: 'affine.load' op index must be a dimension or symbol identifier

This looks like a check that belongs to the affine dialect. However, no error is reported when the affine dialect is emitted; the error only occurs in the SCFToGPU pass.
A temporary workaround is to use LoadOp instead of AffineLoadOp when lowering Toy to affine. However, I may want to add more complex operations that use AffineMap in the future, so this does not really solve the problem.
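
For reference, here is a rough sketch of what the loop body looks like with that workaround. It assumes the standard-dialect load/store syntax of the MLIR version used above (newer MLIR releases spell these memref.load and memref.store), and the value names are simply reused from the snippet above. Unlike affine.load, the plain load accepts any index-typed value as a subscript, so the dimension-or-symbol restriction does not apply to it:

affine.for %arg0 = 0 to 2 {
  affine.for %arg1 = 0 to 3 {
    // Non-affine loads and store: indices can be arbitrary index values.
    %4 = load %3[%arg0, %arg1] : memref<2x3xf64>
    %5 = load %3[%arg0, %arg1] : memref<2x3xf64>
    %6 = addf %4, %5 : f64
    store %6, %2[%arg0, %arg1] : memref<2x3xf64>
  }
}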

What should I do to avoid this error, and where is the verify function of affine.load called in the SCFToGPU pass?