Sparse Tensors in MLIR

Note that this is really a continuation of three previous, and by now rather lengthy, threads.

After a lot of constructive debate, we made good progress. We now have a SparseTensor dialect, which will provide a home for anything related to sparse tensors (attributes, operations, rewriting, passes, flags, etc.). Furthermore, we now also have a proper first-class citizen sparse tensor type in MLIR, implemented through an “encoding” field on the built-in tensor types.

There will be a few more consolidating revisions that move code into this new home (most notably from Linalg; we thank @nicolasvasilache for providing us with a temporary home, realizing very well that we overstayed our welcome). After that, we expect the sparse-compiler-related code to become much more stable. I will use this new thread to keep you posted on this consolidation.

Also, if you are interested in contributing to this effort (either with front-end work that maps onto the new sparse tensor type, or back-end work that improves the actual sparse compiler), please let us know, either in this thread or through private channels.


Revision D101573 introduced the SparseTensor dialect (including a minor migration of some sparse primitives from Linalg).

The first in a series of consolidating revisions is D101669, which migrates the sparse tensor encoding from the Tensor dialect into the new SparseTensor dialect as its permanent home. This revision also includes some internal improvements of the attribute that were requested by @mehdi_amini, namely using struct-like storage, verification, and construction (even though the encoding still parses and prints as a dictionary).

As a result of this consolidation, the encoding of a sparse tensor type now looks as follows.

#CSR = #sparse_tensor.encoding<{
  dimLevelType = [ "dense", "compressed" ],
  dimOrdering = affine_map<(i,j) -> (i,j)>,
  pointerBitWidth = 64,
  indexBitWidth = 64
}>
...
  func @foo(%arga: tensor<1024x1024xf64, #CSR>, ...
...

Up next: more migration from Linalg into SparseTensor, and removal of glue and clutter…

Revision D101811 migrates all “sparse compiler” related files from the Linalg dialect into their permanent new home in the SparseTensor dialect (this includes rewriting, options, and test files). In addition, it introduces two proper compiler passes, which replace the previous, temporary test passes.

Besides the file migration, the user visible changes are:

(1) The test switch --test-sparsification has been replaced with the proper compiler switch --sparsification. This switch accepts options, as before, although the pointer/index bitwidth options are scheduled to be replaced by information in the proper sparse tensor type.

(2) The --test-sparsification="lower" variant of the test switch has been extracted into the proper compiler switch --sparse-tensor-conversion, so that the conversion can be tested separately from the sparsification pass.

Up next: remove the last glue and clutter from Linalg, and make the SparseTensor dialect operate on properly typed sparse tensors only.

With the very elaborate, but also very fun revision D102095, my dream of migrating to a first-class citizen sparse tensor type has become a reality!

Some major differences are described below.

(1) The sparse annotations on Linalg ops have been replaced by proper sparse tensor types. So, for example, rather than using an annotation on the operation trait:

#trait_sum_reduce = {
  indexing_maps = [
    affine_map<(i,j) -> (i,j)>, // A
    affine_map<(i,j) -> ()>     // x (out)
  ],
  sparse = [                            ;
    [ "S", "S" ], // A                  ; THIS PART
    [          ]  // x                  ;   IS REMOVED
  ],                                    ;
  iterator_types = ["reduction", "reduction"],
  doc = "x += A(i,j)"
}

This information is now carried by the sparse tensor type itself:

#SparseMatrix = #sparse_tensor.encoding<{
  dimLevelType = [ "compressed", "compressed" ]
}>
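
For completeness, the trait itself, with the removed part dropped and everything else unchanged, now simply reads:

#trait_sum_reduce = {
  indexing_maps = [
    affine_map<(i,j) -> (i,j)>, // A
    affine_map<(i,j) -> ()>     // x (out)
  ],
  iterator_types = ["reduction", "reduction"],
  doc = "x += A(i,j)"
}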

(2) The secondary storage choices for pointer and index bit width (viz. --sparsification="ptr-type=2 ind-type=2"), which applied to all sparse tensors alike, have been removed in favor of a per-tensor specification of these widths:

#SparseMatrix = #sparse_tensor.encoding<{
  dimLevelType = [ "compressed", "compressed" ],
  pointerBitWidth = 32,
  indexBitWidth = 32
}>

(3) The glue and clutter needed to materialize the sparse tensor storage (passed around as an opaque pointer) into proper tensors is completely gone! So no more:

func @kernel_sum_reduce(%argA: !SparseTensor ... ) {
    %arga = sparse_tensor.fromPtr %argA : !SparseTensor to tensor<?x?xf64>
    %0 = linalg.generic #trait_sum_reduce
      ins(%arga: tensor<?x?xf64>)
      ...

But simply:

func @kernel_sum_reduce(%arga: tensor<?x?xf64, #SparseMatrix> ...) {
    %0 = linalg.generic #trait_sum_reduce
      ins(%arga: tensor<?x?xf64, #SparseMatrix>)
     ...

Subsequent bufferization takes care of replacing the types with whatever underlying implementation is selected by the compiler.

Also, setting up a sparse tensor in an integration test, i.e. connecting with the sparse support library, sees the biggest improvement. No longer:

    %annotations = memref.alloc(%c2) : memref<?xi1>
    %sparse = constant true
    memref.store %sparse, %annotations[%c0] : memref<?xi1>
    memref.store %sparse, %annotations[%c1] : memref<?xi1>
    %i64 = constant 1 : index
    %f64 = constant 1 : index
    %a = call @newSparseTensor(%fileName, %annotations, %i64, %i64, %f64)
       : (!Filename, memref<?xi1>, index, index, index) -> (!SparseTensor)

But just the following (sic!):

%a = sparse_tensor.new %fileName : !Filename to tensor<?x?xf64, #SparseMatrix>

(4) More rigorous verification of type consistency of all ops, including the generated sparse primitives that connect the generated sparse code with the support library (note that the latter is simply there for convenience; in the long run, even the library could be replaced with codegen).
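
To illustrate what is meant by these generated primitives (a sketch only, using the #SparseMatrix type from above with its default bit widths; the exact op syntax may evolve), the generated code queries the underlying sparse storage through ops along these lines, which the conversion pass then lowers to calls into the support library:

%c1   = constant 1 : index
%ptrs = sparse_tensor.pointers %arga, %c1
  : tensor<?x?xf64, #SparseMatrix> to memref<?xindex>
%inds = sparse_tensor.indices %arga, %c1
  : tensor<?x?xf64, #SparseMatrix> to memref<?xindex>
%vals = sparse_tensor.values %arga
  : tensor<?x?xf64, #SparseMatrix> to memref<?xf64>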

Next: one last revision that removes all obsolete sparse code from the Linalg dialect.


Finally, revision D102098 removes the now obsolete sparse annotations code in the Linalg dialect.

So long, and thanks for all the fish!


Hey Aart, this is fantastic! I have to admit I didn’t think we were this close.

I’d love to get some of this exposed through our Python API and give it a try. I think the things we need to wire up are:

  • C/Python API support for your sparse_tensor.encoding attribute.
  • Python TableGen bindings for the sparse_tensor dialect.
  • New APIs to construct a tensor type with an encoding attribute.

Do you have any of that in-flight? If not, I could probably get it pounded out this weekend.


Yes, please give it a look over the weekend! Getting proper APIs for setting up the sparse tensor encoding and sparse tensor types is indeed the proper next step.

Most ops in the sparse tensor dialect are not needed externally (since they are generated by the sparse compiler), with the exception of the sparse_tensor.new operator, which is my “Swiss Army Knife” operator for materializing sparse tensor values into a computation (reading and initializing from an external file format like FROSTT is the most obvious implementation, but we will also generalize this to buffers and runnable initialization code).

All the consolidating revisions have been submitted. The sparse compilation project remains under very active development, but with its new home in the SparseTensor dialect, going forward we expect a lot less extensive renaming and refactoring (famous last words?). So, if you were interested in this project but had not gotten around to looking at the sources yet, this would be a good time to get more familiar with the sparse code base.

Next plans focus on front-end work (e.g. Python support for the annotations) and back-end work (e.g. improve sparse codegen quality, new features, support sparse output). Also, we encourage other projects that target sparse compilation in one way or another to start using the newly added sparse tensor type, so that ultimately we all “speak the same sparse language” and swapping in different front-ends and back-ends for experimentation and comparison becomes much easier. And if you feel features are missing in the sparse tensor type encoding, we accept PRs!


Thank you, Aart. This looks like an excellent implementation. I’m already in the process of breaking it. :wink:

I’d like to start modifying the SparseTensor attributes to implement a few interesting features, but I wonder if I should be modifying SparseTensorAttrDefs.td directly or if I should perhaps create a new attributes definition? I am concerned that adding attributes would set a precedent that would lead to bloat in the definition.

A little background might help. The TACO representation is great for most cases, but there are certain sparse data structures that do not lend themselves to the per-dimension format available. I would like to add functionality to the SparseTensor dialect while still providing backward compatibility to the TACO representation.

So, how can I do this without making a mess of your work?

When the functionality makes sense for sparse tensors in general, we are very willing to accept PRs. In fact, we prefer that over fragmenting the type “right out of the starting blocks” :wink:

Extending the per-dimension level types is already planned, but it sounds like you want something different. So why don’t you propose your extension here, and we can have some discussion on whether this will be generally useful for the sparse community at large!

Revision D102856 adds the required compiler and runtime support for arbitrary dimension orderings in sparse tensor types. Note that there is a minor subtlety in whether the dimension level types are interpreted before or after applying the dimension ordering. I opted for applying the dimLevelType to the reordered dimensions, viz. CSR and CSC are defined as follows:

#CSR = #sparse_tensor.encoding<{
  dimLevelType = [ "dense", "compressed" ],  // dense applies to i
  dimOrdering = affine_map<(i,j) -> (i,j)>
}>

#CSC = #sparse_tensor.encoding<{
  dimLevelType = [ "dense", "compressed" ], // dense applies to j
  dimOrdering = affine_map<(i,j) -> (j,i)>
}>

The sparse compiler and runtime support now correctly account for the given order, generating the code in the required order, and reordering the indices automatically when reading in tensors from FROSTT or MatrixMarket formats.

As an extra data point, this ordering choice seems to be consistent with the Python syntax used by TACO:

csr = pt.format([dense, compressed])         # Compressed sparse row
csc = pt.format([dense, compressed], [1, 0]) # Compressed sparse column

The online TACO utility keeps the level format and dimensions nicely grouped together, though, avoiding the ambiguity. We could perhaps do the same in the syntactical format (e.g. (i,j) → (j:dense, i:compressed))?

Hi aartbik,

Thanks for sharing. I’m a professor at Shanghai Jiao Tong University, and my research group is working in this direction. We have a paper in submission that improves conventional sparse storage schemes like CSR/COO for sparse DNN training, and we have integrated our work into the TVM stack. We are currently looking into sparse tensor solutions with higher performance and generality. I hope we can contribute to the sparse tensor dialect.


Thanks for your note, Prof. Li Jiang!

Can you already share the paper here, or do you prefer to wait until after publication?

I am looking forward to your contributions!

Let me check the policy with my students. We can share the paper if the policy allows us to upload the manuscript to arXiv. Otherwise, I can share it as soon as the paper is accepted.

Aart,

Forgive my late response. I have this nasty habit of writing responses and, when I think of a particularly good question to ask, thinking to myself, “I’ve got the code, I can answer this myself.” I then proceed to spend time trying different solutions, in the process creating more questions that I run through my process.

I’ve rewritten this response five times already.

I would like to explore the possibility of creating an alternate attribute set to describe the data layout of a SparseTensor. From the look of things, I should be able to do this via the following steps:

  1. Define new attributes by creating a new definition file similar to SparseTensorAttrDefs.td.
    • This should allow me to create a new mnemonic such as mod_encoding.
  2. Copy and rewrite the parse, print, verify, etc. methods in IR/SparseTensorDialect.cpp.
    • I am concerned this duplicates a bit of reusable code, but that is not my biggest concern.
  3. Create an entirely new Transforms/Sparsification.cpp with whatever transform I expect to implement.
    • This is where the bulk of the difference from the original sparsification implementation will be found.
  4. Add or duplicate the sparsification pass to Transforms/SparseTensorPasses.cpp
    • Here is my biggest concern. I would like to avoid creating a whole new pass and allow the sparsification pass to detect the different attributes and call the appropriate rewrite implementation based on the attributes used.

In short, I want to keep a unified sparsification pass, but I would like the type of sparsification to be determined by the type attributes of the tensors being used. To keep the pass simple, the initial implementation should verify that the SparseTensor attribute types are all the same.

My overarching purpose is to provide a framework that allows researchers (like myself) to experiment with different sparsification implementations without having to recreate a whole new sparsification pass. Eventually, if these experiments prove useful, the sparsification pass can provide various implementations that can be activated with a simple change in the IR.

I hope this makes sense. Please provide correction if I am misunderstanding how things are organized, or ask questions if I am not clear.

Again, thanks for your excellent work.

Thanks for your reply, John. Basically you are proposing (a) a new sparse tensor type through a new encoding (your 1 and 2), and (b) an alternative sparse compiler implementation, either integrated or separate (your 3 and 4).

I am perfectly okay with (b), and in fact expected that to happen sooner rather than later (we can bikeshed on where such a pass should live, but that is for a later discussion). However, I would like to understand your reasoning for (a). Why not simply extend the existing sparse tensor type encoding to accommodate the features you think are missing? Existing sparse compilers can be modified to ignore new features they don’t understand yet, and over time be taught how to generate code for these as well. That is much more in line with the picture I had in mind (copied below), where all projects at least “speak the same sparse tensor type language”.

annotated Python ---                     ---> sparse compiler 1 ---
|                    \                  /                          \
annotated TF -----------> MLIR with  -------> sparse compiler 2 ------> MLIR with -> LLVM IR
|                    /  sparse tensors  \                           /   coiteration
annotated XXX   ----                     ---> sparse compiler 3 ---

If at all possible, I would like to discuss if features you had in mind can be integrated into the existing type, or at least understand why that is not a viable route forward.

Aart,

Thank you for the open discourse on the subject. I enjoy understanding more clearly your vision of the sparse implementation.

This will probably work fine if I properly understand how a specific sparse compiler implementation will be selected. In my suggestion, creating a new type would implicitly choose the compiler implementation. Of course, I can think of other methods, such as specifying the compiler implementation as an option at runtime. Is this what you’re thinking?

Please forgive my reticence in withholding details on the implementation, but the work is as yet unpublished. Currently this work has a stand-alone implementation targeting C/C++ code generation, and I foresee it being implemented on top of MLIR after the research has been published. I am simply making preparations for this to be possible when the time is right.

Thank you for your understanding.

John

Exactly, this can be as simple as introducing a new pass (--sparsification2 ;-)) or adding conditionals in the code, as illustrated below:

// Dispatch on the encoding attribute attached to the tensor type.
if (tensorType.getEncoding().dyn_cast_or_null<SparseTensorEncodingAttr>()) {
  // case 1: existing sparsification
} else if (tensorType.getEncoding().dyn_cast_or_null<SparseTensorEncodingAttrByJohn>()) {
  // case 2: alternative sparsification
}

Fair enough. There is obviously tension between my desire to define a very open and reusable ecosystem for sparse tensors and the academic requirement to publish. No hard feelings, I have been there (long time ago ;-)). I hope you can publish your results soon, so we can continue the discussion on how to integrate your novel ideas into MLIR’s sparse tensor type!

Thanks to the solid foundation laid by Stella, Nicolas, Alex, and others, I have been making some nice progress with a Python interface to sparse tensor code generation. For example, one only has to loop over all possible sparse annotations for A and B (CSR, CSC, DCSR, etc.) using something like this:

A = RankedTensorType.get([32, 32], f64, ANNOT1) 
B = RankedTensorType.get([32, 32], f64, ANNOT2)
C = RankedTensorType.get([32, 32], f64)
...
C[D.m, D.n] += A[D.m, D.k] * B[D.k, D.n]

to generate and run sparse code with just “the push of a button”. Such a setup is extremely useful for stress testing sparse code generation (since all versions should yield the same output), as well as for searching the state space for the best kernel implementation (by simply picking the best-performing version).
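
Here ANNOT1 and ANNOT2 stand for sparse tensor encoding attributes (presumably constructed through the new Python APIs discussed earlier in this thread). In their textual MLIR form, the candidates being looped over correspond to encodings like the following, reusing definitions that appeared above (DCSR simply marks both dimensions compressed):

#CSR = #sparse_tensor.encoding<{
  dimLevelType = [ "dense", "compressed" ]
}>
#CSC = #sparse_tensor.encoding<{
  dimLevelType = [ "dense", "compressed" ],
  dimOrdering = affine_map<(i,j) -> (j,i)>
}>
#DCSR = #sparse_tensor.encoding<{
  dimLevelType = [ "compressed", "compressed" ]
}>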
