Memref.subview, affine_map and symbols

Hi all,
I am trying to understand the relationship between memref.subview, affine_map and symbols. I have this simple MLIR test program:

module  {
  func @func(%arg0:memref<16x16xf32> ) -> f32{
        %c0 = arith.constant 0 : index
        %c8 = arith.constant 8 : index
        %s2= memref.subview %arg0[%c8,%c8][8,8][1,1]  : memref<16x16xf32> to memref<8x8xf32, #map0>
        %2 = memref.load %s2[%c0, %c0] : memref<8x8xf32, #map0>
        return %2 : f32
   }

   func @main(){
        %0 = memref.alloc() : memref<16x16xf32>
        call @func(%0) : (memref<16x16xf32>) ->  f32
        return

   }
}

The idea of func is to get a subview of arg0 at [8,8] of size [8,8] and then read the element [0,0] of the subview.

Let’s focus on func and let’s have a look at the LLVMIR generated with this command:

mlir-opt ./test.mlir  -convert-memref-to-llvm -convert-std-to-llvm -canonicalize -reconcile-unrealized-casts | mlir-translate --mlir-to-llvmir 

Let’s consider different cases for the affine_map

With #map0 = affine_map<(d0, d1)[s0] -> (d0 * 16 + d1)>

define float @func(float* %0, float* %1, i64 %2, i64 %3, i64 %4, i64 %5, i64 %6) !dbg !3 {
  %8 = getelementptr float, float* %1, i64 0, !dbg !7
  %9 = load float, float* %8, align 4, !dbg !9
  ret float %9, !dbg !10
}

This clearly is not correct. I am not moving the pointer in the right position of my input. Why is this happening? Aren’t the dimensions of the memref known at compile time?

With #map0 = affine_map<(d0, d1)[s0] -> (d0 * 16 + d1 + s0)>

define float @func(float* %0, float* %1, i64 %2, i64 %3, i64 %4, i64 %5, i64 %6) !dbg !3 {
  %8 = mul i64 8, %5, !dbg !7
  %9 = add i64 %2, %8, !dbg !9
  %10 = mul i64 8, %6, !dbg !10
  %11 = add i64 %9, %10, !dbg !11
  %12 = add i64 %11, 0, !dbg !12
  %13 = add i64 %12, 0, !dbg !13
  %14 = getelementptr float, float* %1, i64 %13, !dbg !14
  %15 = load float, float* %14, align 4, !dbg !15
  ret float %15, !dbg !16
}

This is correct. s0 is a symbol read from the input memref and then the pointer is shifted (seemingly) in the right way.

With #map0 = affine_map<(d0, d1)[s0] -> (d0 * 16 + d1 + 234*s0)>

define float @func(float* %0, float* %1, i64 %2, i64 %3, i64 %4, i64 %5, i64 %6) !dbg !3 {
  %8 = mul i64 8, %5, !dbg !7
  %9 = add i64 %2, %8, !dbg !9
  %10 = mul i64 8, %6, !dbg !10
  %11 = add i64 %9, %10, !dbg !11
  %12 = add i64 %11, 0, !dbg !12
  %13 = add i64 %12, 0, !dbg !13
  %14 = getelementptr float, float* %1, i64 %13, !dbg !14
  %15 = load float, float* %14, align 4, !dbg !15
  ret float %15, !dbg !16
}

This is exactly the same, although I am multiplying s0 by 234. Why this multiplication does not appear in the IR?

Thank you to whomever can clarify all this confusion :slight_smile:

Giuseppe

CC @stevenvar @chelini

Ok I dug deeper here because I was also very surprised by the behavior and thought “it must be a bug”.
Unfortunately it isn’t: it is correct but surprising behavior.

Case 1:

 #map0 = affine_map<(d0, d1)[s0] -> (d0 * 16 + d1)>
 memref.subview %arg0[%c8,%c8][8,8][1,1]  : memref<16x16xf32> to memref<8x8xf32, #map0>

does not do what you expect: you are telling the compiler that the result has offset 0 when in reality what you meant to say is that the offset is 8*1 + 8*16 = 136.

As a result, the lowering takes the most static information it can access (i.e. the offset 0) and uses that instead of computing the actual value. We could argue it should compute the actual value but really, the IR is wrong.

Part of the solution is likely to improve the verifier for subview to fail with incorrect offset here.

Case 2:

 #map0 = affine_map<(d0, d1)[s0] -> (d0 * 16 + d1 + s0)>
 memref.subview %arg0[%c8,%c8][8,8][1,1]  : memref<16x16xf32> to memref<8x8xf32, #map0>

will work fine because lowering will see an unknown offset and compute it.
Note however that your computation is more dynamic than it could be as the stride value will be the dynamic value (this is what you are telling the compiler: the offset is dynamic).

The proper static value is still “136”.

Case 3:

 #map0 = affine_map<(d0, d1)[s0] -> (d0 * 16 + d1 + 234*s0)>
 memref.subview %arg0[%c8,%c8][8,8][1,1]  : memref<16x16xf32> to memref<8x8xf32, #map0>

the lowering will also see the offset is dynamic here (i.e. 234 * ?) and it will just go grab the value from the descriptor. Similarly we should probably define this IR to be invalid.

The real problems

I’ll try to not go full berserk on details right now but this has to do with how to generally represent a “structured dense strided array type” in MLIR.

Atm we only have a convention and it is far from “good enough”.

The convention is basically that an offset/stride is either a static constant or a ? (the fact that it is represented by an affine map is an unfortunate accident of history and was necessary for progress to happen; it may be time to revisit this @ftynse also based on your layout proposal):

  • When offset/stride is static, the IR has enough information to just use the constant and it will indeed just use it. Note that in your first case, the “computed offset” would be 136 while your “specified offset” is 0. It should be possible to harden the verifier for this case and be less surprising. But really, the issue is the affine_map cruft when really you’d like to see offset: 0, strides: [16, 1]

  • When offset/stride is dynamic, all bets are off: there are infinitely many possibilities to write a ? with e.g. 42 symbols: s0 * 47 - 127 * s1 + ... - ceildiv(s41, 123) * 134746) is technically a possible way of writing an offset (as long as there is no multiplication with any of the dims). Again having offset: ?, strides: [16, 1] would avoid confusions here.

The strided memref type convention, tries to add some sanity here by restricting the dynamic complexity that valid expressions can take but there are still surprising cases like you uncovered.

However, this representation is still less powerful than we’d like.
Ideally we’d want to write offset: %0, strides: [%42, 1] and be able to detect that %0 and %42 are both divisible by say 8, which would allow propagating alignment guarantees. However, that would require significant extensions to MLIR as dependent types are not a thing. Multiple people have complained about that omission in MLIR. Interestingly, affine maps + op semantics can be convolved into represent some of that. But experience and your examples shows it is already super counter-intuitive in the static case so I’m not enclined to push that direction further (tongue in cheek: we could also represent this with limits of Fourier series : meh).

Also note that there isn’t atm a good non-ambiguous way to represent “dynamic rehapes” in MLIR: we cannot properly detect form the IR whether collapsing 2 dimensions requires a copy or not. This is also related to dependent types.

Sorry for the digressions here, premature early anchoring on “affine-only thinking” continues to bite us to this day and I would love some hands-on-deck to seriously clean all this up and ideally help us move towards some way to represent dependent types (@ftynse again re. the layout proposal :slight_smile: ).

In the meantime, I think the first simple static case can still be improved with a better subview op verification. Would you care to send a PR ?

Thanks for reading!

I had in fact brought up this issue at

and has to do with the design of memref.subview op itself. It is encoding information that is available in the result type as attributes and this isn’t the right design. An op should never duplicate information across its attributes (obviously because they may become inconsistent with each other). For the static case, one can just pull out this information from the layout map for pretty printing purposes and you can still have the same form printed.

If your offsets, sizes, and strides are [%c8, %c8], [8, 8], [1, 1] resp., the custom syntax form of the op should allow specifying them in only one way! and not two which is the case above. This was somehow missed both during the design and review of the revisions that added these ops.

It’s much simpler than that. The design for memref.subview/reshape and related ops is completely broken for the static cases as a result of duplicating static information that should be and is already present in the result memref type (what was already in the layout attribute was added as additional integer attribute arrays). This should be fixed. Given a set of offsets, sizes, and strides, there is only one canonical affine map form that represents that layout that can be used to represent and read them back. There appears to be a misunderstanding that one has to get rid of the generality instead of using that canonical form to represent a strict subset.

This isn’t really any simpler than fixing it the right way. You are proposing a verifier to check if the duplicate information carried by the op is consistent with what it’s duplicating. Instead, (1) remove the attributes holding the static sizes, strides, offsets, (2) add an accessor to get those from the layout map itself, and (3) have the custom builders take either the offset/strides/sizes form or the affine map-based one – not both! The result memref type is unambiguous and unique. The custom form should be ideally emitting a parse error if the affine map on the result type doesn’t conform to the offset/size/stride specification.
For eg., (d0, d1)[s0, s1] -> (d0 + s0) + (d1 + s1) over here IIUC.
The verifier only needs to check whether the layout map on the result memref type here is in the expected canonical form (for offset/strides/sizes) and the utility for this already exists. succeeded(mlir::getStridesAndOffset).

Hi @nicolasvasilache , @bondhugula
Thanks a lot for your answers (and sorry for the delayed reply). I went through them, and I still have some questions:

  • #map0 = affine_map<(d0, d1)[s0] -> (d0 * 16 + d1)> ok, this is saying to the compiler to read a symbolic value but I am not using it so the offset disappears
  • #map0 = affine_map<(d0, d1)[s0] -> (d0 * 16 + d1+234*s0)> @nicolasvasilache I am not sure to follow your answer. Are you saying that any symbolic expression containing s0 will result in s0? Why is this? I mean, we say to the compiler to read s0 dynamically and to multiply it by 234. Is there any information that the compiler is missing to do this operation?

Also, @bondhugula , is there a way to get a “static” behaviour ? I tried this:

#map0 = affine_map<(d0, d1) -> (d0 * 16 + d1)>

But it still didn’t read the offset statically (i.e., from the memref’s type). In other words, is there a way for 136 to be present (as a constant) into the LLVMIR generated? Something like:

%8 = getelementptr float, float* %1, i64 136

(I was able to generate this by putting 136 directly inside the affine_map)

I would be happy to get a stab at a PR, but I would first like to understand exactly what is going on. Also, could you point me to the relevant source files where all this magic is happening :slight_smile: ?

Thanks a lot,
Giuseppe

You are making a lot of wrong assumptions, I’ll just answer to the most glaring ones.

No idea what you mean in this particular instance but [%c8, %c8], [8, 8], [1, 1] and [8, 8], [8, 8], [1, 1] are both valid forms, the first one is dynamic, the second is static and both are needed to progressively canonicalize while introducing cast ops.

Avoiding the hard questions does not make them disappear, you just choose to focus on the trivial question.

This is flatly wrong.

In your particular case, %c8 is a dynamic value.
The verifier seems to have a bug because it should have required the ? offset form.

I am looking at doing some refactorings and cleanups, starting with tensor.extract/insert slice.
I’ll get to subview a bit later this week, a lot of this code has been moved around a bunch of times in the past few months and I’ve already found issues with the tensor ops.
One thing I am looking at is whether to allow %c8 to be detected as the static value 8 during verification; there are a bunch of deeper tradeoffs involved and I am leaning towards “no” atm.

Yes, this is one of the difficult areas to unpack completely and touches on dependent types.
For now, take as an approximation that the layout is in the form offset: x, strides: [x, x, x] where x is either a static constant or a ?. The encoding with an affine_map is very unfortunate and misleading on the capabilities of the underlying type system. I’ll also be changing this soon since @ftynse added support to have other types of layouts that affine_map.

Yes, just use memref.subview %arg0[8, 8][8, 8][1, 1] and if that part of the verifier is not broken, it should force you to use the proper offset. Getting that behavior automatically in the verifier from a value %c8 is subject to multiple tradeoffs as I alluded to earlier in my message.

I started digging a bit myself, as I explained a lot of this code has moved around in the past few months.
For this instance it actually is prob. better I get on it and add you as a reviewr so you can see the moving parts.

Hi Nicolas,
Thanks for the reply. While I am trying to understand the first part of your answer (about 234*s0), I wanted just to let you know that I tried:

#map0 = affine_map<(d0, d1) -> (d0 * 16 + d1)>
%s2= memref.subview %arg0[8,8][8,8][1,1]  : memref<16x16xf32> to memref<8x8xf32, #map0>

But it is still giving me 0 offset in LLVMIR (and no errors)

I started digging a bit myself, as I explained a lot of this code has moved around in the past few months.
For this instance it actually is prob. better I get on it and add you as a reviewer so you can see the moving parts.

Sure, I would be happy to read the review!

Thanks,
Giuseppe

This is a verifier bug that I will fix.

So @stevenvar was able to find the winning combination. This:

#map0 = affine_map<(d0, d1)[s0] -> (d0 * 16 + d1+s0)>
%s2= memref.subview %arg0[8,8][8,8][1,1]  : memref<16x16xf32> to memref<8x8xf32, #map0>

Generates the behaviour I wanted:

%8 = getelementptr float, float* %1, i64 136

Now, there are things I still don’t understand. In particular, this:

For now, take as an approximation that the layout is in the form offset: x, strides: [x, x, x] where x is either a static constant or a ? .

This means that basically in the assembly I see a moderate number of madd to calculate the offset: this basically happens every time I try to create a subview with %cX indices, which happens a lot of times in the gemm code. Isn’t it possible to “propagate” the %cX constant to simply be X? I guess this connects with the following point:

One thing I am looking at is whether to allow %c8 to be detected as the static value 8 during verification; there are a bunch of deeper tradeoffs involved and I am leaning towards “no” atm.

Why this is not possible? Please note that LLVMIR detects that %c8 is a constant, and indeed it only reads the stride from the memref definition:

define float @func(float* %0, float* %1, i64 %2, i64 %3, i64 %4, i64 %5, i64 %6) !dbg !3 {
  %8 = mul i64 8, %5
 ....

To be clear, I am only trying to understand what is going on here, there is no urgency in fixing anything at this stage. Although, this might be something that could have a small impact (to thoroughly investigate when we need that 1% performance push :slight_smile: )

Thanks,
Giuseppe