TensorList as tensor of tensors?

If tensors in MLIR allowed a tensor as the element type, would tensor lists fall out of this naturally?

List of tensors can be

tensor<5xtensor<2x3xf32>>
or
tensor<?xtensor<2x3xf32>>

List of scalars can be

tensor<?xtensor<f32>>

List of list of tensors can be

tensor<?xtensor<?xtensor<2x3xf32>>>

Shape and type inference would also flow nicely.

Currently tensors cannot have another tensor as an elementType.

But is this something we could do? (A vector of tensors would make sense too, I guess, but we mostly deal with tensors.)

Something unclear to me is: what is the difference between tensor<5xtensor<2x3xf32>> and tensor<5x2x3xf32>?
(Or rather, why do we need to differentiate / when does it matter?)

Good question, let me add context. When one tries to add control flow, even by using scf.while, then in the gradient of the while loop you want to take the intermediates from the forward pass for each iteration.

So that is essentially a stack, which is done with a stateless tensor list: push in the forward while and pop in the gradient while (which reverses the order nicely).

So while:

tensor<5x2x3xf32>

is an extra dimension,

a tensor list is a list of tensors of shape <2x3xf32> that one wants to push to and pop from.

You could get by with concat and slice at the cost of many copies, but a list allows you to just pass around pointers and avoid a whole bunch of copies at runtime…

In summary, it is a different semantic structure: it clarifies what the “layout” of the objects/data is. One is a collection of tensors; the other is an extra dimension and thus a “one bigger” tensor.
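To make the push/pop picture concrete, here is a rough sketch of the forward and gradient loops. Everything prefixed "hypo." is made up purely for illustration (and tensor-of-tensors itself is currently rejected by the verifier); only scf.while is real:

// Hypothetical empty list of tensor<2x3xf32> elements.
%empty = "hypo.list_empty"() : () -> tensor<?xtensor<2x3xf32>>

// Forward loop: push each iteration’s intermediate onto the stack.
%fwd:2 = scf.while (%x = %init, %stack = %empty)
    : (tensor<2x3xf32>, tensor<?xtensor<2x3xf32>>) -> (tensor<2x3xf32>, tensor<?xtensor<2x3xf32>>) {
  %cond = "hypo.keep_going"(%x) : (tensor<2x3xf32>) -> i1
  scf.condition(%cond) %x, %stack : tensor<2x3xf32>, tensor<?xtensor<2x3xf32>>
} do {
^bb0(%x: tensor<2x3xf32>, %stack: tensor<?xtensor<2x3xf32>>):
  %y = "hypo.step"(%x) : (tensor<2x3xf32>) -> tensor<2x3xf32>
  %stack2 = "hypo.list_push"(%stack, %y)
      : (tensor<?xtensor<2x3xf32>>, tensor<2x3xf32>) -> tensor<?xtensor<2x3xf32>>
  scf.yield %y, %stack2 : tensor<2x3xf32>, tensor<?xtensor<2x3xf32>>
}

// Gradient loop (body elided): pop yields the saved intermediate plus the rest,
// naturally visiting iterations in reverse order.
%elem, %rest = "hypo.list_pop"(%fwd#1)
    : (tensor<?xtensor<2x3xf32>>) -> (tensor<2x3xf32>, tensor<?xtensor<2x3xf32>>)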

That makes sense, but I’m questioning why the right type to model this is “tensor” and not “tensor_list” or something like that :wink:

It’s a little bit hard to reason about this in the abstract. In TF for example, the operations defined for tensorlists make it pretty clear that it is more list-like than tensor-like: in TF 2.0 eager mode, in fact, it is just implemented as a Python list. There are various access patterns on them that can/should be canonicalized down to simple (non-nested) tensors.

In IREE, where we have taken opinions on runtime semantics, we lower all such things coming from TF to a real list type containing ref-counted, allocated arrays. There is not enough machinery upstream to represent such things cleanly, and pushing it further would put upstream firmly in the position of taking opinions that it currently avoids (not saying it shouldn’t get there but that it is not a simple gap - it forces a number of other design decisions).

Right, we could have a special tensor_list type, but what we really wanted was just a tensor of tensors: a tensor is a collection of elements, the element here being a tensor.

Nested loops are going to have lists of lists for intermediates… it is recursive at that point. Extending the tensor type just seemed like a clean idea to some of us, except MLIR has validation code forbidding tensors of tensors completely.

It is a question of reusing existing types versus making a new tensor_list type. Currently MLIR doesn’t have a canonical solution for this space; maybe that is on purpose, but gradients of control flow have to be solved at some point anyway.

The TF dialect played some games with an opaque type, making tf.variant, but it kinda seems unintuitive TBH: it completely drops types beyond one level of recursion and somehow mixes optionals and lists (it might just be us not getting it).

Tensor of tensors is just very clear about what it means, and a lot of the code for shapes just carries forward.

Agreed. Whether it be tensor of tensors or a tensor_list type, something should be done in MLIR (builtin types?) for this; it is a common problem that needs a standard solution sooner or later.

I also agree that a tensor of tensors (or a list) is essentially a set of refcounted tensors under the hood; that can be a runtime/lowering detail left up to the lower dialects or runtimes.

Probably, if tensor of tensors were legal, everyone could just use it whenever lists come up, unless we are missing something.

You still haven’t mentioned which environment you are coming from. Is this one of the ml frameworks, something custom, etc? To my knowledge, they all do this differently and runtimes also all have their own mechanisms. We were unable to find a universal mechanism to reduce them all to and just chose to embrace the weird that exists.

Back when I used to work closely on this aspect of Tensorflow, we had discussed doing away with the variant based type erasing and just give it a real tensor list type in the IR, but it is still a very Tensorflow specific topic and would belong with that project (some of the ops and rules it implements are really quite weird and defy a completely generic implementation that would be used anywhere else). IREE does some type inference to get it into this form, but there was never any consensus to actually change Tensorflow at the source, and since the topic is moot on the others, we stopped putting energy into it.

Right now llvm/mlir doesn’t really have a “north star” ML integrations project which tries to model such frontend and runtime characteristics. Perhaps it should, but it isn’t entirely clear what opinions it would take on the esoteric parts like this.

The dialect “for discussion’s sake” would be closest to something between TF dialect and TOSA/HLO dialect. With a bunch of ML ops like conv2D, pool, etc, along with the basic math ops which work on tensors of course.

We are probably actually looking for that same “north star” and whatever is missing we add it for ourselves and try to upstream when possible.

We don’t want to change TF; we just want to represent lists for ourselves. There seem to be two ways (that we know of):

have a tensor of tensors, or have a list type.

Since tensor of tensors is illegal, we will probably be forced to do the latter, but we wanted to ask whether the former might be nicer and could be made legal. I haven’t heard from anyone why tensor of tensors might be bad, apart from the work involved…

It isn’t clear to me that a list is the same as a “tensor of tensors”: for example, wouldn’t a list have a dynamic size? You mentioned before wanting to push and pop.

Then there is the question of uniformity: would tensor<2xtensor<?xf32>> guarantee that the inner tensors have the same size?

Ok, thanks - didn’t mean to pry but was just wondering if it was a case I had already studied at some point.

I kind of agree with Mehdi below on the nested tensor semantics questions. When I’ve seen this modeled before with just tensors, it has been really restrictive and complicated and not been a great match to the source language – kind of one of those cases that is really calling for a real type and supporting ops to capture what is desired. Your case may be different, but sounds similar to ones I’ve struggled through semantic mismatches on.

I’d introduce a list type, supporting ops, and folding/simplifications to simple dense tensors for conforming cases. I don’t know if such a thing belongs upstream (i.e. in the tensor dialect), but it might – especially if being designed fresh/cleanly. The other examples we’ve had all came with a lot of historical baggage and were never really defined to the level of fidelity that we expect for the core MLIR project.
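As a sketch of what "folding to simple dense tensors for conforming cases" could mean (both ops here are made-up placeholders): when every element of a statically sized list has the same static shape, the list carries no information that a plain tensor doesn’t, so it could canonicalize away:

// A 2-element list whose elements share a static shape...
%l = "hypo.list_from"(%t0, %t1)
    : (tensor<2x3xf32>, tensor<2x3xf32>) -> !hypo.list<tensor<2x3xf32>>
// ...could fold to an ordinary stacked tensor:
%d = "hypo.stack"(%t0, %t1) : (tensor<2x3xf32>, tensor<2x3xf32>) -> tensor<2x2x3xf32>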

Great question again, and for that I point to the history of TF again: they used to have stateful TensorArrays with dynamic sizes, but then they moved to stateless TensorLists.

When one pushes to or pops from these (variant-based) lists, one gets back a new list with a new size. So semantically it is NOT a dynamically sized list; it is literally fixed size.

The TF runtime notices that the input list is never used again and then mutates the list in place; that is a lower-level dialect or runtime optimization. Thus tensor-of-tensor objects can be a valid representation.
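A tiny sketch of those value semantics in IR (the ops are made up): each push consumes one list value and produces a new one, and a lowering that sees the input has no later uses is free to reuse its storage:

%l0 = "hypo.list_empty"() : () -> tensor<?xtensor<?xf32>>
// %l0 has no further uses after this push...
%l1 = "hypo.list_push"(%l0, %a) : (tensor<?xtensor<?xf32>>, tensor<?xf32>) -> tensor<?xtensor<?xf32>>
// ...so the runtime may mutate the underlying list in place.
%l2 = "hypo.list_push"(%l1, %b) : (tensor<?xtensor<?xf32>>, tensor<?xf32>) -> tensor<?xtensor<?xf32>>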

As to the question of uniformity:

It is uniform in the sense that the elements of the tensor of tensors all have type tensor<?xf32>:

tensor<2xtensor<?xf32>>

Beyond that, it can be a tensor with 2 tensors in it where the 2 tensors CAN have 2 different sizes (same rank, though). This is absolutely necessary because it is an actual use case: imagine a 2-iteration while loop with a concat going on inside it; the intermediates spit out for use in the gradient pass and added to this list will have sizes:

tensor<1xf32>
tensor<2xf32>
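So each intermediate would be cast to the common element type and then collected, something like this (tensor.cast is the existing op; the list op is a made-up placeholder):

%e1 = tensor.cast %iter1 : tensor<1xf32> to tensor<?xf32>
%e2 = tensor.cast %iter2 : tensor<2xf32> to tensor<?xf32>
%lst = "hypo.list_from"(%e1, %e2)
    : (tensor<?xf32>, tensor<?xf32>) -> tensor<2xtensor<?xf32>>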

This is actually something missing in the scf dialect type validation we pointed out in previous posts and are trying to upstream a potential solution for.

So to your point, we don’t intend to match the source of any language here; rather, we want the source to follow what makes things easy in MLIR, so in our specific case we have no baggage.

Given that, if we had a tensor of tensors and the flexibility to do the “right thing” in the API, then what would block tensor of tensors from being a valid type in MLIR?

The problem with a list type is that I can’t see (especially for a stateless list) how it is actually different from a tensor of tensors… and I would prefer reusing existing concepts over making a new one.

I am not married to tensor of tensors; I am happy to get a reason why it is a bad idea so I may live in peace and focus on a specific list type then :wink:

I’m not married one way or another either, and I’m also not the best person to reason through the implications/intents behind the built-in tensor type (some other folks do have opinions on that and probably best to wait for non-weekend hours for a real discussion).

My main visibility into the usage comes from what we do with such types during lowering. There is always the TensorFlow runtime approach, which uses a lot of runtime sleight of hand to allow its tensor types to represent anything. But in a more compiler-oriented system, one of the first things we do is separate such types depending on whether they contain things with value semantics or reference semantics in our setup: for a high performance implementation, there is very little overlap between those worlds, and most of the machinery is dedicated to the containers that hold value/non-ref components, with the ref-containing containers all being simplified down to some variant of flat lists with runtime support for managing the ref-counts.

From that perspective, so long as the choices in the type system allow us to write transformations/canonicalizations which flatten everything that can be flattened, and so long as we have unambiguous ways during lowering to separate the worlds of value- and ref-containing tensors and to perform other transformations, such as realizing a mutable list-of-array-refs from the original program – it is fine with me. I will note that other frontends like PyTorch and various NumPy-oriented things just take the easy path and either a) use the mutability of their tensors and their differentiation model to express these kinds of computations, or b) just use a real mutable list type in the frontend.

My experience with TensorFlow in this area has been very negative, and especially having seen other options, my baggage is that I tend to bias towards a more complete frontend type system at this level vs an ever-expanding definition of tensor into things that actually don’t have anything to do with the primary goal of numerical optimization of tensors. I may be over-correcting :wink:

A couple of heuristics I’ve used in the past to determine whether things “fit” in the tensor type:

  • How would one express constants of the expanded type, and does that work with the way attributes are modeled? (See the sketch after this list.)
  • Is this a job for a dialect type – either a domain-specific replacement for TensorType or a dialect type for the element type (tensors may contain arbitrary dialect types, where it is then assumed that the dialect has defined the semantics)?
  • Is the expanded type congruent with the decisions made in various bufferization approaches for realizing concrete buffers for tensors (or is this defining more of a type island)?
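On the first bullet, a sketch of the constants question: dense<…> attributes assume a flat builtin element type, so a nested constant would need new attribute machinery. The nested form below is purely hypothetical:

// Works today:
%flat = arith.constant dense<1.0> : tensor<5x2x3xf32>
// Purely hypothetical nested constant; no such op or attribute form exists:
%nested = "hypo.constant"() {value = [dense<1.0> : tensor<2x3xf32>, dense<2.0> : tensor<2x3xf32>]}
    : () -> tensor<2xtensor<2x3xf32>>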

Does MLIR have a ragged_tensor type? Perhaps tensor lists could be modeled as ragged tensors? [edit: this was ragged_tensor<?xtensor<?xf32>>, but I did not mean to be that specific]

No, it doesn’t - and I’m not aware of any mlir based frontends that have modeled that in their IR either.

You definitely have some great points there. It seems we just have to go with a custom list type for now, but I think it might be worth considering modeling lists in core MLIR somehow.

This has come up a few times in the things I have visibility into, and it is surprisingly hard to model something abstract that is both generically useful and not tightly coupled to source or target concepts that MLIR core does not currently take opinions on (i.e. memory management semantics, mutability, value vs ref, ownership, heterogeneous/homogeneous, etc.).

Some of the existing items:

  • (npcomp) basicpy.list : decidedly quite “python” (also differentiated from tuple) when considering the operations defined for it. Currently standing in for TorchScript lists as well (but needs to be extended with type constraints).
  • (iree) vm.list : Models the IREE virtual machine’s built-in list type with ops for manipulating it. A subset of the functionality is also mirrored on the iree.list type, which is the “public” analog to vm.list (i.e. suitable for interop with IREE from the outside). This is type erased at runtime (variant), mutable, resizable, and able to store primitive or reference-counted VM objects. It crashes on illegal accesses.
  • (iree) tf_tensorlist.list : Attempts to model a TensorFlow TensorList as a discrete type, following some type inference to raise it from tf tensor/variant types. When compiling, we lower this form to something like IREE’s vm.list (but it predated vm.list and we are working to normalize things).

The last one may be somewhat like what you are looking for, but to my eye it is quite domain-specific and isn’t really a “universal” list type. It is very different in both level and capabilities from the two others listed here. We’d be open to contributing tf_tensorlist.list somewhere more useful, but discussions with the TF team to normalize any of this didn’t go anywhere and we needed something. That was a while ago, though, and there may be different results now…

In the absence of a real universally useful abstraction, it is perfectly fine to have domain specific types and ops – in fact, it is even preferred if there is enough “weird” that needs to be modeled such that a more common form would lose information.


Ignoring the question of how best to model the real use cases (like a list-like thing holding tensors that are necessary for backprop, which is a valid use case)…

I would support removing the “no tensors of tensors” verifier restriction if that happens to help anyone (not saying it’s a good approach). We allow arbitrary dialect-specific types, which clearly indicates that there are no real requirements or rationale for restricting the contained type. E.g. a user could define a !mydialect.tensor_element_type_wrapper<tensor<?xf32>> element type to fool this verification – let’s just not have the constraint at all.

Or to put it another way: if the semantics of tensor are sufficiently broad to permit !mydialect.tensor_element_type_wrapper<tensor<?xf32>> as an element type, then clearly they must be broad enough to permit tensor itself as an element type.
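Spelled out with the types above (the wrapper dialect is of course hypothetical):

// Accepted by today’s verifier: an opaque dialect type as the element type.
tensor<4x!mydialect.tensor_element_type_wrapper<tensor<?xf32>>>
// Rejected by today’s verifier, though arguably no broader semantically:
tensor<4xtensor<?xf32>>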