[RFC] Introduce a sparse tensor type to core MLIR

Note that this really is a continuation of two previous threads, Sparse Representation and MLIR Support for Sparse Tensors, but now concretely focusing on adding a sparse tensor type as a first-class citizen to core MLIR (viz. a SparseTensorType residing in IR/BuiltinTypes). The considerations below are meant to restart the discussion on a proper design.

The proposal is based on the conviction that sparsity should merely be a property of a tensor, not a tedious implementation detail. Starting with a program or kernel that purely operates on tensors as mathematical abstractions, annotating some of the tensors as “sparse” by means of this new type (including hints for a suitable storage scheme) will enable a sparse compiler to lower the higher-level abstraction into actual sparse code that takes advantage of sparsity to reduce computational and storage requirements. This approach was pioneered in the source-to-source sparse compiler MT1 for linear algebra and formalized to tensors in the Tensor Algebra Compiler (@fredrikbk). The COMET project (@Gkestor) used this idea for a DSL compiler.

The feasibility within MLIR starting from a Linalg operation has recently been demonstrated with a first prototype, though as yet without proper type support. This work was recently presented at ODM (slides and recording). Adding the sparse tensor type would enable this prototype to gradually develop into a full end-to-end solution, where tensors are annotated at source level and progressively lowered into efficient code by MLIR before handing it over to LLVM.

Meta-data

The sparse tensor type should at least carry the following information.

(1) Similar to a regular tensor, the type should define shape, rank, dimension sizes (including dynamic sizes), and element type. The unranked variant is probably not needed.

Examples
tensor<32x128xf64>, tensor<?x?xf32>, tensor<4x?x8xbf16>

(2) A per-dimension annotation of sparse or dense, which covers the specification of many sparse storage schemes already. Blocked variants can be expressed with synthetic dimensions. More advanced storage schemes would become possible by extending the annotations in the future.

Example
for a 3-dim tensor, 8 combinations, from dense/dense/dense to sparse/sparse/sparse
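The per-dimension annotations compose multiplicatively, which a short Python sketch (purely illustrative, using a hypothetical D/S spelling) can enumerate for the rank-3 case:

```python
from itertools import product

# Each dimension of a rank-3 tensor is independently annotated as
# dense ("D") or sparse ("S"), yielding 2^3 = 8 storage formats.
annotations = ["x".join(combo) for combo in product("DS", repeat=3)]
print(annotations)
# ['DxDxD', 'DxDxS', 'DxSxD', 'DxSxS', 'SxDxD', 'SxDxS', 'SxSxD', 'SxSxS']
```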

(3) Order of storage on the dimensions. Unlike dense tensors, which implicitly declare a row-major or column-major order on the dimension storage, for sparse storage schemes the preferred order of access should be made explicit. An affine map would suffice for this.

Example
for a 2-dim tensor,
(i,j)->(i,j) would define row-wise storage, and
(i,j)->(j,i) would define column-wise storage
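A small Python sketch (function name hypothetical) illustrates how such a permutation map determines the physical order of the stored nonzeros: the map fixes the key by which coordinates are compared.

```python
def storage_order(coords, perm):
    """Sort nonzero coordinates lexicographically under a dimension permutation."""
    return sorted(coords, key=lambda c: tuple(c[d] for d in perm))

nonzeros = [(0, 2), (1, 0), (0, 1)]
row_wise = storage_order(nonzeros, (0, 1))  # (i,j)->(i,j)
col_wise = storage_order(nonzeros, (1, 0))  # (i,j)->(j,i)
print(row_wise)  # [(0, 1), (0, 2), (1, 0)]
print(col_wise)  # [(1, 0), (0, 1), (0, 2)]
```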

(4) Preferred bit-width for the pointers and indices in the underlying sparse storage scheme. Unlike dense tensor storage, which consists of primary storage only (viz. the numerical values), sparse storage schemes may contain some overhead storage to reconstruct the actual tensor. Although 64-bit could be used for the pointers and indices in such overhead storage, more storage savings could be obtained with narrower bit-widths (sufficient to implement the indirection of the number of nonzeros at each level for pointers and sufficient to reconstruct the index in each dimension). Note that although we could use a per-dimension approach here as well (viz. 32-bit in the first dimension, 64-bit in the second), it probably suffices to simply define a single separate bit-width for pointers and indices.

Example: 32-bit pointers, 64-bit indices
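To make the pointer/index distinction concrete, here is a hedged Python sketch of CSR-style storage for a small matrix, using the stdlib array module's typecodes to stand in for the two bit-widths (the data and the CSR layout are just an illustration, not part of the proposal):

```python
from array import array

# CSR-style storage for a 4x5 matrix with nonzeros
#   A[0,1]=1, A[0,3]=2, A[2,4]=3.
# Pointers use a narrower typecode than indices, mirroring the proposal's
# separate bit-widths for the two kinds of overhead storage.
pointers = array("i", [0, 2, 2, 3, 3])  # "i": typically 32-bit
indices  = array("q", [1, 3, 4])        # "q": 64-bit
values   = [1.0, 2.0, 3.0]

# The number of nonzeros in row r is pointers[r+1] - pointers[r],
# and the last pointer equals the total number of nonzeros.
assert pointers[-1] == len(indices) == len(values)
print(pointers.itemsize, indices.itemsize)  # e.g. 4 8
```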

Implementation

One possibility would be to use a sparse_tensor “is-a” tensor relation so that any operation on tensors would work for sparse tensors too. Since all passes will have to learn how to handle and possibly propagate the meta-data, however, a better choice may be to simply introduce sparse tensors as a new type, with only limited reuse of some of the tensor/shape types.

Syntax

A possible syntax would define a rank-2 32x16 matrix with f32 elements, the 16-size dimension sparse, column-wise access, and types i32 for pointers and index for indices as follows.

sparse_tensor<32x16xf32, DxS, (i,j)->(j,i), i32, index>

Proper defaults (i.e. identity permutation, index types for overhead storage) would allow shorter forms in most cases.

sparse_tensor<32x16xf32, DxS>

Perhaps we could even drop the sparse_ prefix, with the sparsity apparent from the presence of S after the shape. The sparse/dense annotation could even be incorporated into the shape, as originally proposed by John Jolly, with dense the default, to get an even more concise syntax.

tensor<32x16Sxf32>

Looking forward to a constructive discussion! If we can converge on a design, implementation will soon follow, which in turn will enable replacing some of the ad-hoc glue currently present in the MLIR sparse compiler prototype. Hopefully the new support will be useful to the MLIR community at large.

Hey Aart,

How is this different than using a sparse layout in the existing tensor type? The intention (oft discussed but never implemented?) was to have a layout attribute/string in tensor type, so you could say:

tensor<? x 42 x f32, "NHWC">

and similar things. Given such functionality, you could use a sparse layout just as well as a dense layout. If we go with a general attribute, you can encode as much data about the sparse layout as you want (and different sparse formats can have different parameters).

-Chris

Thanks Chris, I remember your suggestion from the earlier discussion and should at least have referred to it in the above summary.

My preference has shifted a bit since then towards a true first-class-citizen sparse tensor type, in the sense that all passes will have to learn how to deal with sparse tensors properly (be it propagating, rewriting, or lowering). Adding an attribute to the existing tensor type is of course the quickest and most easily extendable way, and (seemingly) requires no changes in a lot of code (like linalg on tensors), but it carries the serious danger that existing passes will simply interpret or propagate the metadata encoded there incorrectly. It seems safer to just force passes to start thinking about a brand new type, with some familiar components, but also some new ones. It also feels cleaner to represent all metadata in a first-class-citizen way, e.g. affine maps that can be verified to be a permutation, rather than extracting that information from an opaque attribute like a string.

Happy to be convinced otherwise, though, since I don’t feel very strongly either way (I just feel strongly about adding proper type support after finishing a first version of sparse codegen in MLIR that required “glue” to work around a lack of such types).

+1. Adding a general Attribute to TensorType feels right to me. Aart could then define a custom Attribute that provides the information desired. We could go further and define an AttributeInterface that provides the information needed as well.

The main issue with that approach is that it is prone to having the information discarded, or having it just not make sense to be considered. For example, various bufferization patterns probably won’t know how to handle sparse tensors properly, and it won’t make sense to fix them locally at first (or possibly ever).

E.g. “fixing” the tensor.extract bufferization pattern to “scan the overhead storage to find the right element”, which is the best we can do with a local pattern, just doesn’t make sense – some sort of global understanding is needed. Consider a tensor.extract in the body of a linalg.generic used to implement a gather operation. For these, a more global understanding, perhaps using techniques and specialized IRs like those of Taichi (pdf) (e.g. see figure 2 or section 4.2), is needed.

In terms of bufferization, the most important thing is making sure to sync with @nicolasvasilache’s work on bufferizing linalg with in-place ops (code landing in IREE “soon” I’m told). This is something that the current upstream bufferization approach just doesn’t seem likely to ever work really satisfyingly on (except perhaps in certain static-shape, limited-control-flow, limited-parallelism scenarios). We should focus bufferization efforts as they relate to sparse on a small set of compute ops, likely limited to those in the linalg dialect at first, and then grow as we come to understand the requirements. Also, we should try to connect with linalg-on-tensors and try to have “tensors all the way down” (no memrefs at all, or at least not until very, very late) like we are finding works well with standard dense tensor codegen.

I think the right thing is for passes to treat unknown attributes/layouts conservatively - they should throw up their hands and say “I don’t know how to handle this type” just like they do for any other opaque type.

You’re right that we have a migration/adoption problem in that lots of stuff already works on TensorType but doesn’t check for this. I think we fix that by just pushing it through the ecosystem one step at a time.

Another thing I forgot to mention is that, although adding a sparse type (in whatever format) will help move forward the sparse codegen part in MLIR that I started, a much more important objective underlies any good retargetable compiler effort, namely allowing several “sparse compiler front-ends” to connect with several “sparse compiler back-ends”. I tried to illustrate my long-term vision with some poor ASCII art below. That way, although I enjoyed writing the sparse compiler prototype, the moment a better sparse codegen is written, the community at large will benefit right away! Likewise, anyone writing a new FE that supports tensors with sparse annotations can feed directly into this new ecosystem.

annotated Python ---                     ---> sparse compiler 1 ---
|                    \                  /                          \
annotated TF -----------> MLIR with  -------> sparse compiler 2 ------> MLIR with -> LLVM IR
|                    /  sparse tensors  \                           /   coiteration
annotated XXX   ----                     ---> sparse compiler 3 ---

+1 for a generic attribute (ShapedTypeComponents has had that random attribute expecting it :wink: ). The interface questions will be large and passes will need to be conservative, but I see the same conservative need being true for a new sparse type (it may just fail louder today as that type is unknown) - the only difference is you’ll have casts to tensor failing with it (many of those casts may also be too strict, in that they don’t really care about tensors, only ShapedType).

Type interfaces are the other approach, agreed, where one has ValueBasedShapedValue (or what not) and Tensor, SparseTensor, etc. all implement it and passes operate on it - but then those passes need to do the right thing if they don’t consider the attribute, which is probably the same then as the above world, except more opt-in perhaps.

First, thank you Aart for starting this discussion.

+1 for type interfaces. This moves the burden of change to the sparse front-end and back-ends as suggested by Aart. I imagine this will also concretize the attributes needed for a specific type interface. It would be good to have a list of methods that passes may need from the type to be successful. Perhaps methods for generating accessors and inserters?

I agree, but why then should we shove the extra semantics into an attribute instead of a separate type? It seems that all client code will have to switch based on the attribute anyway.
The advantage of keeping a single factored “tensor” type and having an attribute only materializes IMO if the type can be handled uniformly and the attribute does not need to be handled. Otherwise it just makes every place that uses “tensor” more brittle: having a “TensorType” does not allow you to do anything with it in itself.

Whichever way we go, we’re going to need to allow something like a TensorType that also carries additional type-level and/or dim-level metadata. Of the examples I know of, there comes a certain place in the transformation stack where the metadata is transformed into discrete IR and can be erased (i.e. channel mappings like NCHW persist until you lower to a form where that is op-carried, and a similar process happens with quantization and, I think, sparsity).

I don’t see a way around having switchy, “trait”-like structures on the type which are defined such that you know when you have to be conservative. Once we’ve allowed that, I’m ambivalent about whether it is one base type (TensorType) or a family of types. A trait-like system probably argues for the single type and an appropriate interface for querying its “traits” and doing simple type transforms between common forms (i.e. TensorType::withoutChannelMapping()).

My use of the word “trait” above is not great, since it is overloaded – it is just that it reminds me of how we use traits on ops, but able to be controlled on a per instance of a TensorType. I think this could be modeled with an attribute, and that attribute should have some design work done on it in the same way we try to design op traits so that we can, to the extent we care about, reason generically.

If you squint at it, the quantization infra basically does this with a differently spelled mechanism, and at the levels it matters, it hasn’t been hard to have most things be agnostic to it – up to the point where it is transformed away to a “vanilla” tensor. They won’t all be that easy (quantization was only easy in hindsight – there was a lot of design work that went into achieving a relatively simple/unobtrusive form), but I don’t think we can ignore the general complexity much longer and just need to embrace it.

The problem is that if we go with individual types, we’re going to have a hard time composing “traits” (what about when we want a quantized tensor with an NCHW channel mapping and a sparse dimension?). To get that, we should go with the attribute approach and design it carefully, imo.

If we are going the approach of having one TensorType with an associated attribute, there may be some migration affordances that would make it a bit easier to make sure that only code which has been audited for generality gets it. As an example, we could introduce a new AttributedTensorType and then have the existing TensorType extend it, and we could arrange for the existing TensorType to be instantiated only if the attribute is empty (i.e. making it more of a “PlainTensorType”). Since they could be different C++ classes under the covers, it might make auditing and migration easier – I haven’t thought through all of the implications, though.

I agree that you can always model Type<bool> as TypeTrue and TypeFalse if you want to. This is an engineering tradeoff, there is no one right answer.

My thought here is that a lot of operations over TensorType would benefit from a uniform representation: e.g. in an XLA like system you want to do layout analysis and go from Tensor<stuff, no layout> to Tensor<stuff, layout>, and you want all the cast<TensorType>'s in the C++ code to work in both modes. My other sense is that many things (including shape analysis etc) will work across all of these things uniformly. Generic transformations on values of TensorType (e.g. constant folding) should be able to propagate the layout without knowing about it.

Coming back to the general issue, the upside of modeling as an attribute (vs separate types) is if you want open extensibility (like we do here), if you don’t want to have an abstraction mechanism (“TensorLikeType”) to unify things together, and when the commonality across the family members is high (e.g. all have shapes, dtypes etc). It is better to have separate types when the parameter isn’t orthogonal to the other data in the type - e.g. if sparse types only work with some dtypes, weren’t hyperrectangular etc.

IMO, all of this says that sparsity works well as a layout on TensorType, but things like ragged tensor do not.

-Chris

You’re getting a bit extreme here :slight_smile:

I agree this is the salient aspect!

Since we don’t have a topic for tomorrow, I propose to make this the topic of the discussion :slight_smile:
Reminder everyone: the meeting starts 1h earlier at 9am (California Time).

I assume the sparse type will be a segmented array as in CSR. Longer term, I think it would be good to add the singleton/coordinate type, meaning it may be good to design up front a way to add it down the road without a lot of disruption.

The coordinate format is quite common. Many applications would provide you with coordinates, and there are at least two good reasons to produce coordinate temporaries (e.g., for transposes and as a temporary workspace to support scattering into sparse/compressed result tensors that do not support random insert). It is, of course, possible to live without native support for coordinates and instead have hand-written code to convert the other formats to coordinates at some performance loss. And I get that supporting coordinates requires non-trivial work in the compiler. But I think that a good medium-term strategy would be to aim for supporting three level types: dense, sparse/compressed, and singleton.
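The workspace use case above can be sketched in a few lines of Python (helper name and data hypothetical): a coordinate (COO) map supports random insertion while scattering, and is then compressed into CSR-style pointers/indices/values once complete.

```python
from collections import defaultdict

def coo_to_csr(coo, nrows):
    """Compress a {(row, col): value} COO map into CSR pointers/indices/values."""
    rows = defaultdict(list)
    for (i, j), v in sorted(coo.items()):  # sort establishes row-wise order
        rows[i].append((j, v))
    pointers, indices, values = [0], [], []
    for i in range(nrows):
        for j, v in rows.get(i, []):
            indices.append(j)
            values.append(v)
        pointers.append(len(indices))  # one pointer entry per row boundary
    return pointers, indices, values

coo = {(2, 0): 3.0, (0, 1): 1.0, (0, 3): 2.0}  # insertion order irrelevant
print(coo_to_csr(coo, 3))
# ([0, 2, 2, 3], [1, 3, 0], [1.0, 2.0, 3.0])
```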

One thing to consider is changing the sparse name to compressed or building in support for later being able to specify sparse sub-types. That way dense, compressed, and singleton could be supported longer term without changing the types again.

Fair enough, I was going to make the level-format something like an enum anyway, but you are right that it is better to include singleton from the start. I also don’t mind using compressed instead of sparse to be more consistent with the literature. The important part is being able to specify a level-format per dimension, and to define an order on dimensions.

I agree, per-dimension descriptors and ordering are definitely the more important points, and my comment was more in the weeds. I do think, though, that with those three we will have a really solid starting point.

Here is the recording of the discussion for the last ODM.

Thanks Mehdi, that’s very helpful!

In an attempt to regain momentum on this discussion, I prepared a WIP CL that adds a generic format attribute to the tensor type:

tensor<16xf32, "format">

Right now the format attribute can be anything, and has no meaning, other than that the type system knows that

tensor<16xf32, "format"> != tensor<16xf32, 12345> != tensor<16xf32>

If we get consensus on going this direction for sparse tensor types, then after finalizing and submitting this CL, we will introduce interfaces that assign actual meaning to a non-empty attribute.