`tensor_to_memref` and its folding

I recently hit an issue w.r.t to tensor_to_memref that triggered a lot of discussion internally. It might be worth just recording the discussion here for everyone’s visibility.

The canonicalizations of LoadOp here folds away LoadOp (TensorToMemrefOp $value), $indices)) to TensorExtractOp $value $indices . Specifically the following sequence of instructions

%1 = tensor_to_memref %0 : tensor<f32>
store %val, %1[] : f32, tensor<f32>
%get_val = load %1[] : tensor<f32>

will get canonicalized away to

%1 = tensor_to_memref %0 : tensor<f32>
store %val, %1[] : f32, tensor<f32>
%get_val = tensor.extract %0[] : tensor<f32>

%get_val is getting the wrong value, i.e. you would expect it to get %val and it doesnt.

One way to interpret this is that the canonicalization is wrong. It is undoing bufferization and the pattern is not considering the intermediate operations that might be updating the buffer leading to incorrect behavior. Therefore it needs to be removed (patch).

An alternative way to reason about it is that the input IR is exhibiting undefined behavior. The semantics of tensor_to_memref is that it is illegal to write to the memref that is the result of a tensor_to_memref. The instructions is meant only for use for necessary type-legalizations when going from tensors to buffers, and that any use of it outside of that is inherently undefined behavior.

  • The caveat here though is that there is no way today to enforce that a memref has to be read only. So for a given memref it is impossible to reason about locally if the memref came from a tensor_to_memref and therefore cannot be written to.
  • There is not too much difference between a read-only memref and a tensor. So there should not be a tensor_to_memref instruction at all in the first-place. This would also imply that any bufferization pass (or sequence of passes) is not expected to convert all tensor operations to operations on memrefs and that all backends (LLVM, and SPIR-V) must be able to handle tensors that are not converted during bufferization. This typically means that backends need to handle std.constant natively.
    • The LLVM backend does not handle any tensor operations. Instead there is a pass that converts std.constants to global memref objects which can be used to lower std.constants to LLVM.
    • The SPIR-V backends natively handles std.constants.

The IR with the store after a tensor_to_memref just exhibits undefined behavior right now.

But the general theme of adding a store to a memref is not specific to tensor_to_memref I believe: you can’t introduce a write to a memref when you can’t prove this is possible in the first place. In general memref has pointer semantics, and adding write to any buffer that you don’t have full knowledge of (some sort of oracle telling you about it, or some buffer you introduce from alloc to dealloc and all the uses) isn’t possible in general (hard to prove the legality of it).
A memref coming as a function argument for example would require to inspect all the call sites (possible only with private functions) to make sure it does not come from a global constant.
It can also alias with others memref, and you’d need to prove that the data in the memref aren’t live outs at the point where you intend to introduce a store.

Thats a lot of restrictions. I guess if you “know what you are doing” you can introduce stores to it? If that is the case, if I have an oracle that told me that it is safe to store to a memref (even if it comes from a tensor_to_memref) then the canonicalization I pointed to above is undoing what an oracle told me is OK to do. Thats kind of the point I was trying to make. The canonicalization does not have this oracle knowledge to decide if it is a valid thing to do. There might be cases where it is valid to do that, but it needs to be outside of any canonicalization pass that does not have such context.

That’s a fair point, except I’d dispute the correctness of such an oracle since it was an important point discussed when we introduced the tensor_to_memref that it can’t allow write to the resulting memref.
The reason is that the tensor domain is supposed to be pure values, and exposing the possibility of writing to a tensor breaks any reasoning at this level. As an example, considering:

  %val = ... : tensor<f32>
  print(%val) : (tensor<f32>) -> () // print the tensor value
  %m = tensor_to_memref %val : tensor<f32> to memref<f32>
  .... // whatever here
  print(%val) : (tensor<f32>) -> () // print the tensor value

The tensor domain should guarantee that the two print(%val) statement should print the same content. This is only possible as far as I can tell with the restriction that either tensor_to_memref does not allow a write to the produced memref (preserve the immutability aspect of the tensor value), or that tensor_to_memref has “malloc + copy” semantics (it produces a new memref that does not change the value of the tensor in the tensor domain). These were considered when introducing tensor_to_memref and we went with the former because the latter comes with other pessimization for the bufferization.
The drawback is that it makes tensor_to_memref quite coupled to the current bufferization expectation of course.

I discussed an almost identical complication with this canonicalization during the sparse compiler presentation (slide 45), where I actually exploited the “flaw” to get fast bufferization (since the end result works as expected).

Thanks Aart, but that is scary. It is violating tensor semantics and is actually an example of the issue I posted with tensor_to_memref. If you ran canonicalization on it

  • The load will get folded with the tensor_to_memref to become a tensor.extract.
  • You happen to have a case where no element of the result of tensor_to_memref is read after the store, i.e. your loop is parallel. If that was not the case and you had a load -> store -> load sequence on the same element, with the canonicalization pattern, the value of the store-ed would not be the one load-ed in the second load.

@MaheshRavishankar Yes, but that was exactly the point I was trying to make in the presentation! By default, the sparse compiler does the right thing and bufferizes with an alloc+copy (slide 43/44). However, this introduces an O(n) component in a potentially just O(nnz) operation. In order to get the O(nnz) solution, I had to use a questionable IR (under an experimental flag, slide 45/46). I discussed this with several bufferization folk, and there is no satisfactory solution right now. I am just hoping what whatever we come up with will enable me to get the second, optimal solution, using valid IR.

The semantics of tensor_to_memref, which now is called buffer_cast, is that it produces as its result a memref that corresponds to the tensor operand value. That memref is immutable and mutating it leads to undefined behavior. That is a fundamental assumption of the current bufferize pass to allow for partial bufferization.

If you want to get some in-place update behavior, you can

a) implement a bufferization pattern or pass that creates this specific pattern. So for the linalg.generic example with the init tensor that @aartbik had, you would write a pattern that bufferizes the combination into something that does update in place.

b) you rely on buffer optimizations to identify the case where potential copies are not needed.

The example we discussed back then was (with now likely wrong syntax)

  func @mul(%arga: tensor<32x32xf64>, %argb: tensor<32xf64>, %argx: tensor<32xf64>) -> tensor<32xf64> {
    %0 = linalg.generic #trait
      ins(%arga, %argb: tensor<32x32xf64>, tensor<32xf64>)
      outs(%argx: tensor<32xf64>) {   
      ^bb(%a: f64, %b: f64, %x: f64):
        %0 = mulf %a, %b : f64
        %1 = addf %x, %0 : f64
        linalg.yield %1 : f64
   } -> tensor<32xf64>
   return %0 : tensor<32xf64>           
  }

and in this example, whether you can reuse argx depends on your calling convention. If the buffer is owned by the caller, reuse is actually not possible. That is why the default bufferization (which only has local knowledge) would need to insert a copy here. In the common case where argx is locally defined and maybe even a constant, you could build something clever right away by having a special bufferization pattern.

If you bufferize everything in one go, you can also build a system that tracks reuse and in-place information as you go and produces a reuse instead of a copy right away. I think this works well for a closed system but does not compose well with other dialects. It would still compose with the current bufferize approach if it transforms only “islands” within the code and uses buffer_cast and tensor_load on the boundaries.

For the current bufferization approach, you would need a transformation that removes copies if it is safe to do so. This is on the list of things we want to add (beyond the simple cleanup pass we have for bufferization that is not powerful enough for this).

And, as a final remark, I would also like to be able to model immutable memrefs in the type system (also for constants) but that is a big change that opens another can of worms. I hope we get there eventually.

1 Like

Thanks all for the responses. More than anything, just wanted to call out that using tensor_to_memref in code outside of bufferization that exists in core can end up with subtle issues. Since currently there is no way to represent read-only memrefs this falls into the undefined behavior category.

This is extremely scary! Is it intended to be a temporary hack? There was (or perhaps still is?) the tensor_store operation that stores the tensor value (as if it was stored in a register) into a memref. And one can then freely mutate the memref. You can’t be creating the memref if you want to say that it can’t be mutated because the memref is memory you can load/store from. If it’s the way you describe it, the abstraction just looks wrong or has gone in the wrong direction?! In the presence of such ops, the defining op of a memref Value is determining whether one can store to it or not - what happens when some of its users get outlined to or are in another function which is also called from elsewhere?

This is intended to be an operation that is transient and model some intermediate steps of the bufferization. You don’t have an allocated memref just yet, so instead of creating a temporary that you have to find and recover the state of the bufferization across the various steps, this models more directly where the algorithm is at.

Not really: first in general if you don’t have visibility on the entire lifetime of a memref you can’t introduce new writes to it. In a function taking a memref as parameter, you can’t know that the data isn’t reused by the caller for example.
Even saving and restoring the data to reuse the memref (not that I see a use case for this) into something like save = a[0]; a[0] = foo(); .... ; a[0] = save; isn’t obviously correct to me (assuming a[0] wasn’t already written to).

You have to be conservative with memref, always. That’s not new, we have had for a long time in the standard dialect this operation: 'memref' Dialect - MLIR which has this description:

If the global variable is marked constant, writing to the result memref (such as through a memref.store operation) is undefined.

I dont think I have full context of this, but if this is a restriction, I am not sure how this can stand the test of time. Isnt this saying that all functions have to treat memref as const * . So if you have a function that takes a memref argument then you can never store to it. This function can never be the compiled down to LLVM and shipped as part of a library cause it can never write to it and do anything meaningful.
To me the specification should be reversed. If your memref is supposed to be read-only, and it escapes the context, say gets passed to a function, then that is undefined behavior, cause the function is free to update it. Then if and when we have some way of propogating read-only semantics through ABI boundaries then you have some more guarantees.
Basically, I see memref as equivalent to a multi-dimensional pointer, but the spec here seems to say that it is a multi-dimensional const *. I am sure I am not the only one who sees it this way. There is no way we can enforce the semantics of const * right now, and saying that it is undefined behavior I think is a bit of a shortcut.

I really don’t quite get your point of view, I see this working just like C++ actually (and any pointer-based compiler I worked with), which we compile it down to llvm successfully.

For example when you write in C++:

int foo(int *p) {
  return *p;
}

The compiler cannot introduce a write to p I believe. Can you expand on why would it be different for memref here?
I.e. isn’t memref just a “fat pointer” (a pointer + metadata on the logical->physical indexing conceptually)? At least that is my mental model when I look at memref, if not it is worth clarifying I think.

Maybe I misunderstood what you said. Here is where I am coming from. Take the call from BLAS

void cblas_sgemm(const enum CBLAS_ORDER Order, const enum CBLAS_TRANSPOSE TransA,
                 const enum CBLAS_TRANSPOSE TransB, const int M, const int N,
                 const int K, const float alpha, const float *A,
                 const int lda, const float *B, const int ldb,
                 const float beta, float *C, const int ldc);

The A and B are "input"s and C is the output, and this contract is enforced by the const * on A and B and just float * on C.

This in MLIR would presumable be.

func @cblas_gemm(%arg0 : memref<?x?xf32>, %arg1 : memref<?x?xf32>, %arg2 : memref<?x?xf32>) {

}

So obviously we need to “write” to %arg2. So I wasnt sure how this would be compatible with “you cannot write to a memref unless you have visibility of its lifetime” cause the visibility spans the function, and there is no way to specify “read-only memref”.

Even the function you mentioned, leaving aside what the compiler can/cannot do, if I were writing a the function foo with a int *p argument, Its valid to write to *p (i.e. code will compile), but if it was const int *p, it would be compile time error for me to write to *p. But you are probably talking about a different thing. I agree that the c++ compiler should not introduce loads/stores to *p in your example.

MLIR is at a higher level than a C++ compiler. It is meant to generate the implementation of the cblas_sgemm call above. You can start from the memref world and generate the code. (I might be transplanting what you said in a different context).

Of course you might not be starting at this level in MLIR. You might start with

func @cblas_gemm(%arg0 : tensor<?x?xf32>, %arg1 : tensor<?x?xf32>) -> tensor<?x?xf32>) {

}

This is a bit more easy to fit within the semantics that you mentioned cause you will be explicitly introducing the memref.

This might have been a bit of a digression. My point here is take this

func @do_write(%val : f32, %buffer : memref<f32>) {
   store %val, %buffer 
}

func @foo(%t : tensor<f32>) {
  %0 = tensor_to_memref %t
  call do_write(..., %0)
}

Where is there a violation? I am saying passing the result of %0 to do_write is illegal (and is verifiable). You are saying that do_write is doing something illegal. I am not sure how to verify that. If that is illegal, how do you say that the func @cblas_gemm(%arg0 : memref<?x?xf32>, %arg1 : memref<?x?xf32>, %arg2 : memref<?x?xf32>) is legal?

The difference here is that this is not a compiler transformation, it is the frontend that emits the store because this is what is in the original program.
Note that it does not really change the constraint, because you can do something does not mean it is directly enforced, for example:

const float C;
int main() {
   cblas_sgemm(....., (float*)C, ...);
}

but it would be incorrect.

The IR does not carry const/non-const, we would emit it as expected

The violation here is the same as the C++ sample above, the program is just incorrect, whoever created this did it incorrectly.

I look at it from a program transformation point of view, i.e. to be able to reason about this you need to start from a “correct” state and then we can talk about what you can or cannot do with it in the compiler.
This is really the critical aspect of the system as I see it: for example the C++ compiler “assumes” that the input is correct according to the C++ spec and operates accordingly. If the user writes an incorrect C++ program (a program that exhibits UB) then it’s too late to try to reason about it.

This is why it is a bit hard for me to answer your examples in general, you show me an “incorrect” state of the IR which leads me to ask you about “how you got there”, because we cannot start discussing about invalid IR and reason about it.
The “tensor_to_memref” isn’t written by the user here, the compiler is the one who create this situation, so there is one step along the way where “a store to a memref was introduced” and we can look into it (or a step that “created a tensor_to_memref and replaced an existing memref with it but this memref is used with a write”.
One of these steps is incorrect, from a program transformation point of view, but to be able to understand it we’d need to see an example starting from a “correct state” and discuss how we get into an “incorrect state”.

I agree this code is wrong. My question is what is wrong and that is where I think I see it differently. If someone introduced a tensor_to_memref the pass introducing it should/can know that passing the result memref to a function call is illegal. This is verifiable in a local context (look at all uses of the tensor_to_memref and only a few defined uses are allowed). Whether the code is from the user or was generated by a compiler pass would make no difference.

The alternative of saying its ok to pass a result of tensor_to_memref to a function call cause semantics guarantees the function cannot store to it is a very heavy hammer. You need a global context to verify this, i.e. is this user input or is this compiler pass output. By construction all bufferization passes are introducing stores to memref. So would all of them be exhibiting undefined behavior? I am having hard time reasoning about this cause its OK to store to memref sometimes, but not OK sometimes. To me it seems more well defined to say stores to memref is fine, and you assume that if your memref is escaping context, it can be stored to. So if you are introducing a transformation/writing input MLIR that is resulting in a memref that is supposed to be read-only escaping, then you are playing with fire and thats undefined behavior. Later on if and when we ever have read-only memrefs, then you have some more guarantees.

My question is what is wrong and that is where I think I see it differently. If someone introduced a tensor_to_memref the pass introducing it should/can know that passing the result memref to a function call is illegal.

Yes, I agree, if you introduce a tensor_to_memref and as such you are creating a new memref (this is a new SSA value right?), you cannot also pass this memref to an opaque function if you don’t know it is safe.
But that shouldn’t be a problem I think: if you create a memref and pass it to a function you likely know what’s going on (you have a specific purpose to do that, it can’t be opaque right?).

I am having hard time reasoning about this cause its OK to store to memref sometimes, but not OK sometimes.

For people who have LLVM experience, is is no different than having an i8*, it may come from a global constant as well. The IR does not carry the const/non-const as part of the type.

If it helps, you could mock-up some IR where you annotate the memref type with the const annotation, it is just painful to have to cast in all directions (calling a function taking a const_memref with a memref requires casting memref->const_memref first). You can think of the current IR as just eliminating all these cast and dropping the “const” annotation.
One could define a const_memref type, this allows some static type system verification, but that does not really change anything fundamentally to the reasoning with respect to the examples we visited.
I guess if we forbid to cast from const_memref to memref it may prevent from creating this invalid IR situation in the first place? But helping to catch these issues does not really provide you with the “oracle” on how to make the right decision (you still need the global analysis, etc.).

By construction all bufferization passes are introducing stores to memref. So would all of them be exhibiting undefined behavior?

If they do so on a memref created with tensor_to_memref, then yes these transforms are incorrect.

Hey folks, this seems to be a hot topic, so I’m starting work to fix this.

I think we all fundamentally agree on what constitutes a correct compiler. I think the main tension here is that the current state’s “correct” behavior is fairly painful for folks dealing with it – we need a few new concepts to make it workable in practice.

The core issue here is the jump from value-semantic tensors to reference-semantic memrefs that need to be mutated in place – the highly local nature of today’s bufferization means that information is lost or not exploited (or cannot even be annotated in the first place) – for example, bufferization patterns today in upstream cannot write to converted memrefs, which makes them constantly allocate/copy memrefs just to get memrefs they can write to. This type of overly conservative behavior (which is “unacceptable in practice”) is needed for correctness but can be eliminated with a better system.

Working on fixing this…

– Sean

2 Likes

@MaheshRavishankar Can you share an “end to end” example that demonstrates the issue? I.e. an IR snippet that operates on tensors (and does not have memrefs) that would bufferize to the example in the first email?

Like Mehdi I too am failing to see the fundamental problem here – in some cases memrefs allow us to write programs with UB, so what? I assume even OOB writes are UB.

1 Like

I am not sure that I can give a silver bullet example here. If I write an end-to-end system that includes buffer allocation then it would be fair to say “that buffer allocation is wrong”. I think thats not my concern.

If I had to boil this down, my concern here is:

  1. My main concern is that canonicalization of a tensor_to_memrefload => tensor_extract seems ill formed, or at least valid in very specific situations. So its should not happen always. That is assuming something about the usage of the memref.
  2. The justification for the canonicalization is that you are not supposed to write to memref that you dont have full visiblity over. Except that if I did have full visibility over it, and I wrote (or generated, but lets say wrote) the tensor_to_memrefload the canonicalization is going to undo it. So either writes to a memref from tensor_to_memref should be illegal, or the canonicalization should be. We dont have “read-only” memref types. So you cant enfores the first part of that statement.
    That leads to a justification using “dont write to memrefs” with the caveat that if writes to memref is “input” to the compiler, then its OK.

I understand the reasoning comes from LLVM. I am not debating that compiler passes cannot introduce load/stores to memrefs “unless it knows what it is doing” in MLIR (think bufferization passes). My problem is that the canonicalization is done without any of this context/restrictions. That is really my issue. If I can get consensus on removing the canonicalization then I would go away happily.