Greetings all. First-time poster, so please be gentle.
I want to add a sparse tensor representation to MLIR, but I am uncertain of the best method to pursue, so I would like to obtain the wisdom of the hive mind. Allow me to explain what I feel are the possibilities.
Modify memref with sparse markup
One of my goals is to have the sparse representation work seamlessly with Linalg, etc., just like a memref. For example, when a memref is defined, an extra marker could identify a specific dimension as sparse:
%0 = alloc() : memref<16x32sxf32>
Note the “s” after the 32 in the second dimension. This would flag the second dimension as sparse. The information would be carried all the way through to code generation, where the appropriate structures would be generated and maintained.
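To make the idea concrete, here is a minimal Python sketch of the information such a marker would carry. It is not MLIR code: the "s" suffix is only the hypothetical syntax proposed above, and the names Dim and parse_shape are made up for illustration.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Dim:
    size: int
    sparse: bool  # True when the dimension carries the hypothetical "s" suffix


# One leading dimension: digits, an optional "s", then the "x" separator.
_DIM = re.compile(r"(\d+)(s?)x")


def parse_shape(spec):
    """Split a shape such as '16x32sxf32' into (dimensions, element type)."""
    dims = []
    pos = 0
    while (m := _DIM.match(spec, pos)):
        dims.append(Dim(int(m.group(1)), m.group(2) == "s"))
        pos = m.end()
    # Whatever remains after the last dimension is the element type.
    return dims, spec[pos:]


dims, elem = parse_shape("16x32sxf32")
print(dims, elem)
```

Each dimension ends up with a single boolean, which is all this markup can express.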
A limitation of this idea is that the specific sparse representation cannot be specified. Is this COO? CSR? CSL? DIA? Who knows. That could be given as a generation option (for example, mlir-opt -sparse-to-coo). In any case, this representation seems very limiting to me and not the best option.
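To illustrate why the choice of format matters, here is a small Python sketch (not MLIR, and not tied to any existing pass) that stores the same matrix in coordinate (COO) form and in CSR form. The helper names to_coo and to_csr are made up for this example.

```python
def to_coo(dense):
    """Coordinate format: one (row, col, value) triple per nonzero."""
    rows, cols, vals = [], [], []
    for i, row in enumerate(dense):
        for j, v in enumerate(row):
            if v != 0:
                rows.append(i)
                cols.append(j)
                vals.append(v)
    return rows, cols, vals


def to_csr(dense):
    """Compressed sparse row: row_ptr[i]..row_ptr[i+1] delimits row i."""
    row_ptr, col_idx, vals = [0], [], []
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                col_idx.append(j)
                vals.append(v)
        row_ptr.append(len(vals))
    return row_ptr, col_idx, vals


dense = [[5, 0, 0],
         [0, 0, 8],
         [0, 3, 6]]

print(to_coo(dense))
print(to_csr(dense))
```

The two layouts use different index arrays for the same data, so generated code has to know which one it is dealing with. A single per-dimension flag does not carry that information.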
Add sparse as a base type
In this case, I could see adding a new base type to MLIR for sparse representation:
%0 = alloc() : sparse<16x32xf32>
This representation has the potential to be customizable through parameters to the type. Perhaps a csr flag could be added to the type. A possible drawback is that such a tightly integrated representation could cause instability across MLIR. This may be more work than anyone would like to take on.
Sparse dialect with parametric type
This comes from the tutorial page Defining Attributes and Types. This method would be similar to the second option presented in this post, and would look something like:
%0 = alloc() : !Sparse<16x32xf32>
Not as pretty, but more in line with how MLIR operates. As a separate dialect, it shouldn’t interfere with the general operation of MLIR when Sparse is not used. I am concerned about interoperability with other dialects, but I believe any problems there could be handled within the Sparse dialect itself. This is the option I am leaning towards.
Call for comment
These are my thoughts. Nothing is set in stone for me, so I am excited to get comments, suggestions, opinions, etc. I want what’s best for the infrastructure, so I’ll do whatever it takes to add a sparse representation in a way that the majority is comfortable with.
Thank you for reading, now - what do you think?