RFC for introducing `abstract_tensor` as a `builtin` type

Based on some previous discussions (here), there is an interest to make Linalg dialect less monolithic, and to piecewise graduate things out of it. As part of this effort, this RFC was proposed to deprecate linalg.init_tensor operation. The RFC has been abandoned for now, but it did bring to the fore some issues with respect to how
linalg.init_tensor is used currently. In its current state, linalg.init_tensor cannot be easily moved out of the Linalg dialect because it is used in two modes

  1. To create a tensor where its uses only need the shape and element-type of the tensor.
  2. To create an tensor whose values are undefined, into which values/tensors are inserted using destructive updates.

These are two separate use cases at different levels of the stack, which need to be decoupled before linalg.init_tensor can be deprecated.

The second part is more relevant to transformations like tiling on tensors and bufferization that are typically used lower down the stack. This RFC is not related to this use case. Indeed [this RFC] [RFC] Promoting `linalg.init_tensor` to the `bufferize` dialect) was intended to adapt linalg.init_tensor to better address this use case. Decoupling the two uses cases will make the solution proposed in that RFC viable.

This RFC is intended to address the first use case . This is typically relevant for front-end dialects (like MHLO and TOSA) and their lowering into Linalg on tensors. For example linalg.pooling_nhwc_sum operation (and other variants of this) are represented in IR as:

%window = linalg.init_tensor ... : tensor<?x?xf32>
%output = linalg.pooling_nhwc_sum
    ins(%input, %window : tensor<?x?x?x?xf32>, tensor<?x?xf32>)
    outs(%output_init : tensor<?x?x?x?xf32>) -> tensor<?x?x?x?xf32>

The %window value here is only used to determine the loop bounds of one or more loops of the operation. Its value is not used as part of the semantics of the operation.

Another example is from linalg.generic operations with broadcasting semantics.

%operand = linalg.init_tensor ... : tensor<?x?xf32>
%result = linalg.generic {
    iterator_types = ["parallel", "parallel"],
    indexing_maps = [affine_map<(d0, d1) -> (d1)>,
                     affine_map<(d0, d1) -> (d0, d1)>]}
    ins(%input : tensor<?xf32>)
    outs(%operand : tensor<?x?xf32>) {
  ^bb0(%arg0 : f32, %arg1 : f32) :
    linalg.yield %arg0 : f32
} -> tensor<?x?xf32>

Here %result is a tensor created by broadcasting values from %input . The shape of %result is same as %operand, but the values in the %operand tensor is not used by the operation. This is indicated by the fact that %arg1 of the region of the Linalg operation is not used. (The method
payloadUsesValueFromOperand of the LinalgInterface allows an easy way to find such operands.)

In all such cases a linalg.init_tensor operation is used to create the tensor value without associating any value with its elements. A better modeling of this though is through use of a separate type which allows you to define only the shape and element-type of a tensor. This RFC is aimed at addressing this by adding an abstract_tensor type to the builtin types. The builtin.abstract_tensor type would be similar to builtin.tensor type, but represents a tensor shape without associating any data with its elements. A new operation will be added to the tensor dialect that creates a value of abstract_tensor type.

%0 = tensor.create_abstract_tensor [%d0, %d1] : builtin.abstract_tensor<?x?xf32>

creates a 2D abstract_tensor (of dimensions %d0 and %d1) with element type f32.

The above IR examples would be represented as

%d0 = arith.constant ... : index
%d1 = arith.constant ... : index
%window = tensor.create_abstract_tensor [%d0, %d1] : abstract_tensor<?x?xf32>
%output = linalg.pooling_nhwc_sum
    ins(%input, %window : tensor<?x?x?x?xf32>, abstract_tensor<?x?xf32>)
    outs(%output_init : tensor<?x?x?x?xf32>) -> tensor<?x?x?x?xf32>
%d0 = arith.constant ... : index
%d1 = arith.constant ... : index
%result_shape = tensor.create_abstract_tensor [%d0, %d1] : abstract_tensor<?x?xf32>
%result = linalg.generic {
    iterator_types = ["parallel", "parallel"],
    indexing_maps = [affine_map<(d0, d1) -> (d1)>,
                     affine_map<(d0, d1) -> (d0, d1)>]}
    ins(%input : tensor<?xf32>)
    outs(%result_shape : abstract_tensor<?x?xf32>) {
  ^bb0(%arg0 : f32, %arg1 : f32) :
    linalg.yield %arg0 : f32
} -> tensor<?x?xf32>

Work plan

To not make sweeping changes the steps would be as follows

  1. Add the builtin.abstract_tensor type and tensor.create_abstract_tensor operation.

  2. Replace all cases where result on linalg.init_tensor is used in ins list of Linalg operations, i.e. allow use of Value of type builtin.abstract_tensor in ins list of Linalg operations. This addresses uses like the linalg.pooling_* operations above.

  3. Replacing uses of linalg.init_tensor in outs list of Linalg operation, i.e. allow use of Value of type builtin.abstract_tensor in outs list of Linalg operations. This can have impact lower down the stack. It ties into the use case where linalg.init_tensor is used as a tensor with undefined values. Specifically tiling Linalg ops on tensors using scf.for uses destructive updates. For example,

    %operand = linalg.init_tensor ... : tensor<?x?xf32>
    %result = linalg.generic {
        iterator_types = ["parallel", "parallel"],
        indexing_maps = [affine_map<(d0, d1) -> (d1)>,
                         affine_map<(d0, d1) -> (d0, d1)>]}
        ins(%input : tensor<?xf32>)
        outs(%operand : tensor<?x?xf32>) {
      ^bb0(%arg0 : f32, %arg1 : f32) :
        linalg.yield %arg0 : f32
    } -> tensor<?x?xf32>
    

    gets tiled to

    %operand = linalg.init_tensor ... : tensor<?x?xf32>
    %result = scf.for %iv0 = ... iter_args(%arg0 = %operand) {
      ...
    }
    

    After bufferization the linalg.init_tensor gets converted to a memref.alloc. To avoid cascading effects due to changes proposed here, with the following IR

    %result_shape = tensor.create_abstract_tensor [%d0, %d1] : abstract_tensor<?x?xf32>
    %result = linalg.generic {
        iterator_types = ["parallel", "parallel"],
        indexing_maps = [affine_map<(d0, d1) -> (d1)>,
                         affine_map<(d0, d1) -> (d0, d1)>]}
        ins(%input : tensor<?xf32>)
        outs(%result_shape : abstract_tensor<?x?xf32>) {
      ^bb0(%arg0 : f32, %arg1 : f32) :
        linalg.yield %arg0 : f32
    } -> tensor<?x?xf32>
    

    the Linalg tiling transformation will create a linalg.init_tensor for outs operands of type builtin.abstract_tensor and use that as the iter_args operand of the scf.for. This will insulate the downstream from any changes.

After this RFC, all remaining uses of linalg.init_tensor will be related to their use as a tensor with undefined values. This use can be deprecated by using ideas from this RFC.

1 Like

FYI : @nicolasvasilache @stellaraccident @_sean_silva @mehdi_amini @River707 @ftynse @herhut @pifon2a @gysit @matthias-springer

(Sorry for the long “@” list. Didnt want to miss anyone who might have an opinon/might be impacted by this, but also I could only add upto 10 people :slight_smile: )

Haven’t looked closely, but -1 on adding a builtin type for this. The builtin dialect does not look like the right place to build this out. I’m a bit wary of the tendency to build out domain specific constructs inside of the builtin dialect (which everyone depends on, and is supposed to serve the general needs of the infra itself). Tensor/Memref/etc. are there for legacy reasons IMO, not necessarily because that is where they should be. What are the trade offs of defining this in the tensor (or some other) dialect? Especially given that this is only (at least initially) generated by the proposed tensor.create_abstract_tensor operation.

Aside from that, what are the exact semantics of this type?. I think (from reading, though I might be wrong) it is intended to contain everything about a tensor except the value itself? I suppose shape, element type, metadata, etc.?

– River

I believe the intent here is that this is basically a value-less tensor and it can be used in contexts that only require access to the metadata (shape and element type). While the RFC is focused on one motivating use case, I have often run into the lack of completeness of the ShapedType hierarchy from this perspective: it can be very useful to have a type that represents only the degenerate shape/element-type and is part of the same hierarchy. Making tensor.dim, tensor.rank, tensor.reshape, tensor.expand_shape, and tensor.collapse_shape operate on either an AbstractTensor or Tensor would make a number of program constructs compose better in my experience.

I do agree that it is an accident of history that the ShapedType hierarchy is modeled as it is, is closed, and spans types associated with three different dialects (tensor, memref, vector) while itself being in builtin; however I think the questions we should be asking are: a) is this type valuable to exist in core, b) does it belong in the ShapedType hierarchy, and c) do we have a plan for a better future factoring/location for ShapedType? If we converge on yes for (a) and (b), then it belongs in builtin since that is where ShapedType is defined. This doesn’t preclude a better future plan, but that should look at breaking all of these up using some more modern constructs vs just declaring it off limits to make any more changes to that type hierarchy.

1 Like

Thanks for the explanation. I was glazing over some things while reading the context. I can understand the nature of the RFC in general, my main initial concerns right now are on layering.

The way I look at it, it is much harder to remove things from core places than to add them (mostly obvious). Putting something domain specific inside of builtin (IMO of course) is generally an indication of something broken somewhere along the stack.

a) is this type valuable to exist in core

I am biased here of course, but to me “core” is not tensor codegen. So I hope we can rather fix the broken layering now, than add to it (as I may have to be the one to fix it in the future).

b) does it belong in the ShapedType hierarchy

Right forgot about that thing. I can see how this makes sense.

c) do we have a plan for a better future factoring/location for ShapedType ?

Plans like these generally arise from necessity. I didn’t really have a plan, but if what we need to get better layering for these types of things is to make ShapedType an interface… when would you like the patch?

(I apologize once again for being the grumpy -1 person. Just want to make sure that we surface and fixes any legacy design decisions that don’t make sense anymore. I get a weird tingly feeling any time something is proposed for builtin, even when it should go there).

– River

Oh, it’s fine - I’m grumpy about this one too. The way ShapedType is factored and the implications that fan out from there has annoyed me since the very beginning. I’m not opposed to saying now is the time to fix it. But maybe before getting there, we see if there are other objections/support for the idea?

Right. I can look into a patch to see what kind of impact it has. Like with ElementsAttr recently, a lot of these should really just be interfaces. My main desire here is that we figure out the right way forward medium/long term, so that these types of things can be done more easily and in a less controversial way. Adding things to the builtin dialect naturally has a lot of push back and reservation (it affects a lot of people, many of which likely don’t care about e.g. tensors), and we really shouldn’t need the builtin dialect as a funnel point.

– River

+1 - let me file an issue to track. For old, pervasive things like this, it is likely to cause a fair amount of migration pain for our users. It’d be good to get an idea of just how bad that is and whether it needs some pre-staging or something. The initial split may not be that bad. I’m not relishing the idea of writing !tensor.tensor everywhere, though (if we go so far as to moving the existing types).

Ok, leaving aside whether this lives in builtin or tensor, I’m interested if there is other feedback about the topic.

In general I’m happy to see a type being added now, this was raised originally when the init_tensor and the general mechanism in linalg was introduced. I’m wondering what is the relationship with the shape dialect?
It seems like there is a non-trivial overlap between manipulating “shapes” and what is sought here. The initial example of the RFC about the %windows used in the linalg.pooling_nhwc_sum does not seem any different to me than manipulating the concept of “a shape”.

On the “builtin” aspect: this is a long lasting issue that is under-invested. I’m with River in the sense that we only seem to make progress on this kind of layering / software architecture aspects when we push back on adding more technical debt. This kind of event (the need of an addition) is always a good forcing function to at least make a plan (like we did with the std dialect a while ago).

My understanding of the !shape.shape type, which is quite dated as I haven’t looked at it in a while is that it is a more “analysis-centric” type that lives as part of a DSL for constructing standalone representations of shape transfer functions. Critically, it has error-or semantics and a definition that is quite different from tensor. There may be an undiscovered relationship between the two in terms of transferring in and out of “shape DSL” land to a concrete representation, but I haven’t given it a lot of thought.

From my perspective, the interesting thing about this type is that it fills a gap in the existing type hierarchy and is coherent with it. We don’t want to switch to a type system with different semantics just to represent the metadata of a tensor as part of a regular program.

I suppose the other usage we see a lot tensor<?xindex> to represent a truly data-dependent sequence of dimensions. I think what the proposed abstract_tensor is trying to capture is not a sequence of dimensions so much as a partial shape specialization in the same way that tensor does.

Example:

  • abstract_tensor<3x4xf32> would yield shape dimensions of type tensor<2xindex> or a !shape.shape if bridging to a shape transfer function
  • abstract_tensor<*xf32> would yield shape dimensions of type tensor<?xindex>
  • abstract_tensor<2x?xf32> would yield shape dimensions of type tensor<2xindex> but representing as the latter exclusively would be a loss of information.

I imagine that with the type, we would then immediately find need of various casting ops to get in/out of these other adjacent representations.

1 Like

This semantics difference is not clear to me from the doc describing the type: 'shape' Dialect - MLIR (could be a doc issue)

I think this RFc is also an opportunity to clarify the role of the shape dialect and what it includes, as much as I can sympathize with “We don’t want to switch to a type system with different semantics just to represent the metadata of a tensor as part of a regular program”, the counter part would be my concern that we create two parallel worlds that don’t inter-op seamlessly, that would make me question the shape dialect itself.
There might be good reasons for having two hermetic worlds, but I don’t feel I have seen it clearly spelled out right now.

Other aspect, if an abstract_tensor is the tensor without the data, it also raises the question of the “encoding” attribute.

We should definitely get the opinions of @jpienaar and @herhut. There is a lot of good and careful thought that has gone into the shape dialect for certain uses. I am aware of, but cannot fully represent, the viewpoints on this, which also include @_sean_silva. Iirc, in multiple attempts to use the !shape.shape type for this part of the problem, the sticking points were:

  • The "[invalid] for an invalid shape" part of the definition and whether you were dealing with the problem before or post validation of shapes. After a certain point, you assume that program constraints have been met and having such a high level type mixed in at those lower levels does not compose well.
  • As noted in this RFC and other cases I have seen in practice, the shape is often not needed on its own but in conjunction at least with the element type (and I am not sure about the encoding, as you point out).
  • The shape dialect is mixing three different concerns, none of which have in-tree uses, making it hard to reason about: definition of a shape type and supporting ops, a constraint system for establishing some invariants, and a concept of “shape libraries” for expressing shape transfer functions (i.e. for custom ops).

It is my understanding that the “shape library” functionality and related lowerings is what the rest exists in support of. If that is indeed the case, then I think these are different levels of the problem and we should be talking about how to interop between the different representations.

While it would be great to split up the linalg dialect, moving a single op this way really doesn’t address that. As you detail below, you have other objectives behind moving this op out. If you really would like to split up the linalg dialect , that could be discussed in a separate thread but I would recommend splitting along these lines as a starting point:

  1. ops used for representational purposes for tensor/memref compute (named or generic)
  2. ops being used for transformational purposes (generated by in-dialect passes, etc. and used by transformation pipelines)

There could be a finer split or another class but looks like ops used for transformational purposes were dumped into the same dialect. LinalgOps.cpp already takes 20 s to build (even in release mode and on a modern fast processor), and I believe this is really affecting build times for everyone including those who aren’t using linalg (for eg., changes to memref, scf, affine, or other dialects would slow the build critical path). I’m surprised that the file grew this big! There are of course other ways to split (for eg. TF uses tensorflow/compiler/mlir/tensorflow/ir/tf_ops_a_m.cc tensorflow/compiler/mlir/tensorflow/ir/tf_ops_n_z.cc) but you might as well split the dialect.

A lot of discussion since yesterday, I only have a few moment now but will unpack more later :slight_smile:

  • Making ShapedType an interface, +1, I did this knowing it was not optimal but we needed something. I think we could switch to an interface with 0 downstream user changes. Even changing Tensor etc post could have little downstream changes, but pushing on type interfaces so that ShapedType is simply swapped sounds great to me.
  • !shape.shape indeed is type as value part of the world. And in that world the shape is part of the value, so making it part of type has always felt awkward. E.g., like if one had tensor<2xi64, [?, 5]> and encoded part of the value in the type. Doesn’t mean we couldn’t add !shape.dims<?x5> or the like.
  • Splitting shape dialect sounds interesting here. Moving the types out into its own dialect, making it lightweight and 0 to minimal ops. Having tensor, vector and memref dependent on it (pushing on how we represent dependent dialects so that this isn’t a big change a long with that). The computation and constraint representation ops can be moved out then.
    • Then we could also move the dim attribute Mehdi has in a different change here and out of builtin :slight_smile:

(will expand more later)

Thanks everyone for all the responses. I read through the whole thread, and will try to answer some of the points raised here. (Thanks @stellaraccident for responding to the concerns here in the meantime).

Having a ShapedType as an interface would be really good. I am happy to wait for that and use the interface for the abstract_tensor type. Indeed it layers better. In terms of putting it in builtin dialect, I said this dialect cause tensor is in this dialect. IMO, this type should be in the same dialect as tensor type, cause any place you would use tensor you would probably need to use abstract_tensor as well. So that was the intent. I am not super tied to putting this in builtin dialect. I am happy to put this in some other dialect (say tensor dialect) if that makes more sense.

Stella answered it to some extent. I cant find the words for it, but as she said there is a gap in the current system that the shape dialect does not seem to address. Thats what this RFC is trying to fill.

I am a bit apprehensive of trying to force fit what exists in shape dialect currently for this use case. Also +1 on clarifying the role of shape dialect (I would also like to know, really cause it is not entirely clear to me). Just hoping that we don’t get into a cyclic dependency between having to untangle things w.r.t shape dialect and this RFC. My understanding (as what Stella explained above) is that there is no redundancy here w.r.t what is in Shape dialect, and they are decoupled issues.

What is encoding attribute?

No it doesn’t solve the whole problem. Indeed, I am not the person to solve that problem. What is really at the core of this RFC is that I see linalg.init_tensor was added as a stop-gap solution cause we did not have the type that is proposed here. So its really technical debt. AFAICS, before we even start thinking about graduating things out of Linalg, this technical debt needs to be addressed. The uses of linalg.init_tensor have already gone past its initial use case. Without decoupling the current use cases and addressing them appropriately this will become a blocker to the overall effort of graduating things from Linalg. For example, couple of solution here which I think would be down a wrong path would be either defining an undef tensor (see all the issues w.r.t undef semantics in LLVM), or having side-effecting tensor ops to prevent CSE (which is what this RFC proposed).

That’d be great if it works out that way! I have tried this patch before and found it to spider a lot more than that, but that was a long time ago and the infra has improved (and I may have been holding it wrong).

I don’t quite get how is it different than how it is modeled on tensor: when the shape is fully encoded in the value, this is unranked (and there is a gradient where you can get some of the dimension dynamic).

I didn’t quite get this as clear from Stella’s previous posts so far (and Jacques’). Maybe I’ll re-read this over the week-end :wink:

Look for it in the doc for the tensor type: Builtin Dialect - MLIR ; it was added for sparse originally IIRC.

Thanks again for trying to pay off tech debt @MaheshRavishankar :slight_smile:

+1 for making ShapedType an interface and moving MemrefType to memref and TensorType to tensor and VectorType to vector to clean up the layering of that (very) old legacy.

Regarding the proposal itself, wouldn’t we be better off adding an abstract encoding on the tensor type?
This would be almost completely transparent to the infrastructure and all transformations, without the need to deduplicate or extend specific ops .
This would avoid the confusion re. “the encoding of an abstract tensor”: abstract is the encoding.

The difference between an abstract tensor and a notional shape is that you can take a subset of an abstract tensor at a certain offset.
An abstract tensor could convert to a shape where needed but not the converse.
I don’t think it makes sense to take a shape within a shape at a certain offset?