[RFC] split the `tensor` dialect from `std`

This is part of splitting the std dialect.

tensor Dialect Scope and Goal

The tensor dialect is intended to hold core tensor creation and manipulation ops that are not strongly associated with any particular other dialect or domain abstraction. The primary litmus test for inclusion is whether an op makes sense for any tensor element type.

We leave it to other dialects to hold the vast swath of possible computations one might want to do on a tensor, such as:

  1. TOSA ops
  2. Linalg on tensors.
  3. ElementwiseMappable ops like addf that support tensors
  4. TensorFlow’s various dialects

Examples of potential future additions to the tensor dialect could be:

  • tensor.pad to pad a tensor to a given shape
  • tensor.reshape to reshape a tensor.
  • tensor.undef for creating a tensor with undefined contents but a given shape (with appropriate discussion of course; undef is tricky :wink: ).

This is intended to follow the success of the vector dialect in having a “home” for ops uniquely related to the vector type.

Deliberately constrained scope

There is one key difference between vector and tensor though. The vector type has a much more constrained set of use cases due to its careful definition as only representing vectors of machine data types in registers. The tensor type is (for better or for worse) used to represent all kinds of things, and supports an open-ended set of element types. Examples:

  • representing large, dense aggregations of primitive types, suitable for high-performance numerical computing (such as in linalg, mhlo, or TensorFlow).
  • representing shapes in the shape dialect, which consist of small 1D tensors of index data type.
  • representing aggregations of strings or “variant” types, such as in the TensorFlow dialect in certain cases
  • representing large, sparse aggregations of primitive types, suitable for high-performance numerical computing

Thus, for the tensor dialect, we prefer for now to constrain the documented scope as much as possible. The expectation is that at some point in the future, the tensor dialect’s scope could be broadened through a careful discussion of the tradeoffs. We believe that this RFC is an incremental step towards reconciling these disparate use cases.

Ops being moved

I propose to move the following ops from std to tensor, with the suggested renames:

  • std.tensor_cast -> tensor.cast
  • std.extract_element -> tensor.extract_element
  • std.tensor_from_elements -> tensor.from_elements
  • std.subtensor -> tensor.slice
  • std.subtensor_insert -> tensor.insert_slice
  • std.dynamic_tensor_from_elements -> tensor.generate
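
As a rough illustration (assembly syntax is approximate and may differ slightly from the final patches), two of the renames would look like:

```mlir
// Before: ops spelled in the std dialect.
%a = tensor_cast %t : tensor<*xf32> to tensor<4xf32>
%b = extract_element %u[%i] : tensor<4xf32>

// After: the same ops spelled in the tensor dialect.
%a = tensor.cast %t : tensor<*xf32> to tensor<4xf32>
%b = tensor.extract_element %u[%i] : tensor<4xf32>
```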

Notable non-inclusions (at least initially)

For now, we are not including certain ops that are less obvious and merit their own discussions, which would add too much complexity to this RFC. Postponing decisions on these ops doesn’t seem to affect the overall direction of the RFC.

  • std.constant: In the future, it might make sense for std.constant to be disaggregated into vector.constant, tensor.constant, int.constant, etc.
    • std.splat: will probably need to be handled similarly to std.constant, and also enhanced to support dynamic shapes for the tensor case
    • These discussions should wait until more of the std dialect has been split.
  • std.dim (and std.rank): these ops work on both tensors and memrefs. It seems like splitting them into tensor.dim/rank and memref.dim/rank might make sense.
    • This discussion should wait until we have split the memref dialect out of std.
  • tensor/memref bridging ops: std.tensor_load, std.tensor_store, and std.tensor_to_memref. Given that these assume element types compatible with memref, they are not suitable for the tensor dialect as proposed in this RFC.
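
For context, a sketch of how these bridging ops appear in IR (syntax approximate to the std dialect at the time of this RFC):

```mlir
// Materialize a tensor value from a buffer, and write one back.
%t  = tensor_load %m : memref<4xf32>
tensor_store %t, %m2 : memref<4xf32>
// Cast a tensor into a memref view.
%m3 = tensor_to_memref %t : memref<4xf32>
```

All three require an element type that memref can hold, which is why they don’t fit the tensor dialect’s open-ended element types.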

Development process

We expect that the split proposed in this RFC can be accomplished in one or a handful of patches, as it is mostly dialect boilerplate plus migrating some op definitions. Some modernization patches might also be involved as part of the split.

After the initial split, we expect to circle back to some of the more ambiguous ops listed in “Notable non-inclusions”, which would add a few more ops to the dialect.


+1 on a tensor dialect! I believe it will become a handy toolkit for people (me) who are working on loop transformation on tensors!

  • tensor.undef for creating a tensor with undefined contents but a given shape (with appropriate discussion of course; undef is tricky :wink: ).

+1 on this operation.

This sounds good to me. I think that constant/dim/rank should definitely be split across dialects as you propose, but doing that as a separate step is perfectly reasonable.

In order to make effective progress on such a big change, I think it is important to not get bogged down on perfection. The std dialect includes a lot of weird stuff, I think we should anchor on what existing pieces make sense to put in which bucket, not necessarily whether the existing stuff was a good idea or not. We can continue to refine the design after the buckets are defined and the refactoring is complete.


Looks like folks are generally in favor of this direction (also factoring in some offline conversations).

I started a patch https://reviews.llvm.org/D92991 adding a tensor dialect and moving std.extract_element -> tensor.extract (which is consistent with its sibling that already exists: vector.extract). Would love it if folks took a look (feel free to add yourself as a reviewer).
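
For reference, the renamed op reads analogously to its vector sibling (a sketch; exact syntax per the patch):

```mlir
// Extract a scalar element from a tensor at dynamic indices.
%e = tensor.extract %t[%i, %j] : tensor<4x4xf32>
// The existing sibling op on vectors (static positions).
%v = vector.extract %vec[1, 2] : vector<4x4xf32>
```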

It looks like the winning approach in this case will be to move one or a small number of ops at a time, cleaning them up as part of the move. The reason for this is that the cleanups to the ops can’t naturally be done as a pre-refactoring because a substantial part of those cleanups is moving segments of ancient tests (like test/IR/core-ops.mlir) into the new properly factored locations, which is natural to do as part of moving the op.

Makes sense. Thank you again for driving this!

Nothing major, but I tend to prefer tensor.get_element to tensor.extract for two reasons: (1) in standard data structures/ADT literature, extract also involves removing the element from the container structure (e.g. extract_min/extract_max for heaps and priority queues in standard textbooks); (2) extract is more suitable for mutable containers, but a tensor SSA value is an immutable entity.

Great proposal! :slight_smile:

Right now there is no RFC for splitting the ‘memref’ dialect from ‘std’, but wouldn’t the same problem arise with the ops std.tensor_load, std.tensor_store, and std.tensor_to_memref in the ‘memref’ dialect?

We think that a “bridging” dialect could help to solve this problem. Furthermore, it should also include a copy operation inspired by ‘linalg.copy’, which at least in our opinion should not be part of the ‘linalg’ dialect. In addition, this approach could help us make progress on removing the tight coupling of the ‘BufferDeallocation’ pass.

Yes, a bridging dialect would probably be a good solution.

Personally it seems like copy belongs in a “memref” dialect, as it does not “bridge” between tensor and memref. Agreed that having it in a more neutral place would be great!

With Tensor being a builtin type, I’m fine with having memref.tensor_load in the memref dialect. This does not introduce any dependency and is conceptually much closer to the memref domain than the tensor one.

I generally agree. For simplicity, I would also prefer to just have memref.bufferize_cast in the memref dialect instead of creating a bufferize dialect just to hold bufferize.cast.

I agree that it does not make any sense to create a dialect for one operation. An additional question is whether we should also include some form of copy operation to make the need for user/dialect-specific copy operations unnecessary for (hopefully) most use cases :wink:

There were concerns about dialects having operations that cannot be lowered further down in the original discussion on splitting std - [RFC] Splitting the Standard dialect - #11 by rengolin. Having tensor_load and bufferize_cast in the hypothetical memref triggers those concerns.

I suppose it’s not necessarily a problem if the conversion setup remains easy, e.g., all memref operations can be declared invalid in the conversion target, either because they can be converted or because they should have been removed before the conversion starts.

Having a bufferize dialect is cheap and the ops therein are local to the bufferization transformation. They should only be used in that context and placing them in their own dialect makes that clearer IMHO.

I am also fine with a “prefix” solution, but then we should apply it to all operations, in particular also the bufferize_copy operation that creates a new copy of the argument with respect to its lifetime but not necessarily with respect to its contents. So mutating the result of such a copy might or might not mutate the input operands, and hence one should not mutate them.

This is similar to my desired semantics of the cast operation and placing all operations that have these special semantics into their own dialect seems preferable to me.

It isn’t clear to me that tensor_load is inherently tied to the “bufferization transformation”; actually, I’m fairly sure it has always been there.

On the “bufferize dialect”: I’m also reluctant to see any “magic” dialect with operations that have “special semantics” that can’t be conservatively described to the system and must be handled in a special way. Is there already a description of the intended ops for this domain somewhere?

Yes, it was reused because it was already present in the dialect and adding a new op that does, what was perceived as, essentially the same thing seemed overkill. I see that more as an argument to have dedicated ops in their own dialect to avoid someone reusing these operations in a different setting.

And I am not proposing to remove tensor_load or tensor_store. Whether they are needed and desirable is an independent discussion.

We discussed this on D91967 ([mlir][std] Mark tensor_to_memref as no side-effect) for the cast-like operations. The need for a copy operation was motivated in Remove tight coupling of the BufferDeallocation pass to std and linalg operations - #18 by herhut, and I sketched potential semantics above.

Finally got back to this after the holiday break. I managed to move all the ops (one patch still in flight) except for std.subtensor (and std.subtensor_insert, which I’m choosing to keep with it). std.subtensor shares a bunch of helper code with the std.subview op, and that shared code relies on being able to create an std.constant of index type.

  • std.subtensor and std.subview depend on common code, which relies on std.constant
  • currently, std dialect depends on tensor (for example, some std ops have canonicalization patterns that use tensor ops)

Thus, putting std.subtensor into the tensor dialect would create a circular dependency.

So that means that we need to wait on defragging std.constant. The minimal change here would be to create an int dialect and have int.constant there that excises the int part of std.constant.