Linalg to LLVM Lowering

I want to lower Linalg to LLVM without using any affine or scf for loops. How can I achieve this?

It depends if you start from the tensor domain or from the memref domain: the lowering pipeline will be different.

For going from tensor to memref you need to go through what we call “bufferization”, and there are two main strategies in development in MLIR right now.

There are a few examples here: llvm-project/mlir/test/Integration/Dialect/Linalg/CPU at main · llvm/llvm-project · GitHub

Thank you. Is the linalg-to-loops conversion required to lower a linalg.generic op to the LLVM dialect?

I ask because when I ran

bin/mlir-opt --linalg-bufferize --std-bufferize --tensor-constant-bufferize --tensor-bufferize --func-bufferize --finalizing-bufferize --buffer-deallocation --convert-linalg-to-llvm --convert-memref-to-llvm --convert-std-to-llvm --print-ir-after-all linalg1.mlir &>1

I got:
'linalg.generic' op expects regions to end with 'linalg.yield', found 'llvm.return'

In principle, this is designed not to be required. In practice, however, too many pieces are missing to make this usable. The gaps fall mainly into 2 areas:

  • Only a few named linalg ops lower to proper library calls today. Linalg has the basic mechanisms available for this but coverage has not been a focus of ours. None of the other groups that have spent time implementing their own flavor of ops-to-library-call has contributed back AFAIK.
  • linalg.generic itself does not carry enough information for this; what is needed is a rewrite pattern that does the inverse of Generalization.cpp.

If you are interested in contributing to one of the 2 areas above, I am happy to jot down a few starter tasks in that direction.

Hi Nicolas ,
“Only a few named linalg ops lower to proper library calls today” and “None of the other groups that have spent time implementing their own flavor of ops-to-library-call has contributed back AFAIK.”

As you mentioned, some of the ops do seem to work, and some groups have done implementations for their own purposes.

That said, is there any working sample to start from, or a pointer to some private check-in that you might have come across? Basically any op that goes through the whole flow without errors would help, since it would give us better insight for analysis.

Yes, we can take a look at Generalization.cpp and see how it can help do the reverse.

Cross-pasting from this thread that may also be relevant to you: Linagl VS Affine - #9 by nicolasvasilache

Here is a minimal test that we have to lower linalg to library calls.
Note the very primitive name mangling that produces e.g.:

func private @linalg_matmul_viewsxsxf32_viewsxsxf32_viewsxsxf32(
  memref<?x?xf32, {{.*}}>, memref<?x?xf32, {{.*}}>, memref<?x?xf32, {{.*}}>) 
attributes {llvm.emit_c_interface}

This can then lower to LLVM and be connected to a .so that implements the functionality you want in C++. It used to be a proper end-to-end integration test connected to the runner but we have scaled back from that since the orthogonal steps needed for the runner and linking are captured in this test.
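Coming back to the name mangling noted above, here is a rough sketch of how such a name could be assembled. The rules are inferred from the single example name in this post; the handling of static extents, the `MemRefDesc` struct, and both function names are assumptions made for illustration and are not the actual Linalg implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical description of one memref operand. A negative extent stands
// for a dynamic dimension ('?' in the memref type).
struct MemRefDesc {
  std::vector<std::int64_t> shape;
  std::string elementType;  // e.g. "f32"
};

// "view", then one "<extent>x" per dimension ("s" for a dynamic extent),
// then the element type: memref<?x?xf32> becomes "viewsxsxf32".
std::string mangleMemRef(const MemRefDesc &m) {
  std::string out = "view";
  for (std::int64_t extent : m.shape)
    out += (extent < 0 ? std::string("s") : std::to_string(extent)) + "x";
  return out + m.elementType;
}

// Op name with '.' replaced by '_', followed by one "_<operand>" per operand.
std::string mangleLibraryCall(std::string opName,
                              const std::vector<MemRefDesc> &operands) {
  assert(!opName.empty() && "expected an op name such as linalg.matmul");
  for (char &c : opName)
    if (c == '.')
      c = '_';
  for (const MemRefDesc &m : operands)
    opName += "_" + mangleMemRef(m);
  return opName;
}
```

Feeding this `linalg.matmul` with three `memref<?x?xf32>` operands reproduces the name in the declaration above. Layout information is ignored here, which is part of why the scheme is so primitive.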

Basically, connecting the 2 pieces above and having a simple stub for matmul, either written in C++ or dispatching to another library, is easy; if you find it difficult I can revive some old code.
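For reference, a minimal sketch of what such a C++ stub might look like. It assumes the usual convention for functions marked llvm.emit_c_interface: the symbol is looked up with a `_mlir_ciface_` prefix and each memref arrives as a pointer to a strided descriptor. The `MemRef2D` struct is hand-written here for illustration (MLIR's runner utilities provide an equivalent type), and the naive loop nest is only a placeholder for a real library call.

```cpp
#include <cassert>
#include <cstdint>

// Rank-2 strided memref descriptor: allocated pointer, aligned pointer,
// offset, sizes, strides. Hand-written for this sketch.
template <typename T>
struct MemRef2D {
  T *allocated;
  T *aligned;
  std::int64_t offset;
  std::int64_t sizes[2];
  std::int64_t strides[2];

  // Element (i, j), honoring offset and strides.
  T &at(std::int64_t i, std::int64_t j) const {
    return aligned[offset + i * strides[0] + j * strides[1]];
  }
};

// Stub for the mangled matmul library call from the test above.
extern "C" void
_mlir_ciface_linalg_matmul_viewsxsxf32_viewsxsxf32_viewsxsxf32(
    MemRef2D<float> *A, MemRef2D<float> *B, MemRef2D<float> *C) {
  const std::int64_t M = A->sizes[0], K = A->sizes[1], N = B->sizes[1];
  assert(B->sizes[0] == K && C->sizes[0] == M && C->sizes[1] == N);
  // Naive C += A * B; a real stub would dispatch to BLAS or similar here.
  for (std::int64_t i = 0; i < M; ++i)
    for (std::int64_t j = 0; j < N; ++j)
      for (std::int64_t k = 0; k < K; ++k)
        C->at(i, j) += A->at(i, k) * B->at(k, j);
}
```

Compiled into a shared library, a stub like this is what the lowered IR would be linked against. Accumulating into C rather than overwriting it matches the semantics of linalg.matmul.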

Here is the basic pass.

I would love some effort here, the lack of prioritization on this front has been unfortunate.
Here is a related older thread: Lowering optional attributes in Linalg StructuredOps to Standard dialect.

A lot of things are missing to get this polished and to a point where it can scale and we can properly mix codegen + library calls. They are all quite unambiguous but potentially a lot of work depending on the degree of usability and pluggability one is interested in.

If this sounds interesting, I am happy to jot down a few starter tasks and push this forward.

For this particular issue, doing the reverse is literally solving an inverse problem that we may be better off leaving alone.

The simple alternative I was contemplating is to just compare a given linalg.generic to a database of known named ops. There is some trickiness around making attributes work with this as well as more generally supporting regions but it seems like the low-tech solution here should be quite simple to achieve.
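To make the low-tech idea concrete, here is a toy sketch of the database lookup. Nothing in it is an MLIR API: `GenericSignature`, `NamedOpEntry`, `knownNamedOps` and `matchNamedOp` are hypothetical names, indexing maps and the body are modeled as plain strings assumed to be printed in some normalized form, and attributes are left out entirely.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for the structural part of a linalg.generic.
struct GenericSignature {
  std::vector<std::string> indexingMaps;   // e.g. "(m, n, k) -> (m, k)"
  std::vector<std::string> iteratorTypes;  // "parallel" or "reduction"
  std::string body;                        // normalized printed region
};

bool sameSignature(const GenericSignature &a, const GenericSignature &b) {
  return a.indexingMaps == b.indexingMaps &&
         a.iteratorTypes == b.iteratorTypes && a.body == b.body;
}

struct NamedOpEntry {
  std::string name;
  GenericSignature signature;
};

// Hypothetical database holding two well-known named ops.
std::vector<NamedOpEntry> knownNamedOps() {
  GenericSignature matmul;
  matmul.indexingMaps = {"(m, n, k) -> (m, k)", "(m, n, k) -> (k, n)",
                         "(m, n, k) -> (m, n)"};
  matmul.iteratorTypes = {"parallel", "parallel", "reduction"};
  matmul.body = "yield add(out, mul(lhs, rhs))";

  GenericSignature dot;
  dot.indexingMaps = {"(k) -> (k)", "(k) -> (k)", "(k) -> ()"};
  dot.iteratorTypes = {"reduction"};
  dot.body = "yield add(out, mul(lhs, rhs))";

  std::vector<NamedOpEntry> db;
  db.push_back(NamedOpEntry{"linalg.matmul", matmul});
  db.push_back(NamedOpEntry{"linalg.dot", dot});
  return db;
}

// Returns the matching named op, or an empty string when nothing matches.
std::string matchNamedOp(const GenericSignature &generic) {
  for (const NamedOpEntry &entry : knownNamedOps()) {
    assert(!entry.name.empty() && "database entries must be named");
    if (sameSignature(entry.signature, generic))
      return entry.name;
  }
  return "";
}
```

Exact string comparison only works if the generic op has already been normalized (dimension names, operand order inside the body). That is exactly where the trickiness around regions mentioned above comes in.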

We could also implement the inverse problem with a SAT solver but … meh :)

I’m happy to also play with any other prototype or idea that you’d propose.

Hi Nicolas ,
Thanks for the pointers let me look through them.

Hi Nicolas,
This reminds me of the structural and access matchers we developed a long time ago (mlir/TestMatchers.cpp at 29063ca97a937e1ae8abde8655ca18defcdc8682 · LoopTactics/mlir · GitHub). We could assign a matcher to each named op (possibly generated using a declarative interface) and then use the matcher to go from the named op to the generic op. Would this be something worth pursuing?

Adding Alex @ftynse to the loop.

Hi Lorenzo,

I am not completely sure. I would say we want to avoid manual intervention for things that should be automated. If you see a way to automate the process then I’d love to see a prototype.

The thinking is that there are 3 different representations that we would like to be able to detect as equivalent:

  1. linalg named op + body + special attributes as specified in opDSL
  2. linalg.generic + body (special attributes have been specialized and folded as constants in the IR)
  3. linalg.generic + an isomorphic body (same computation but potentially different flow of IR)

I forget how far you guys got in the ability to represent things like AnyPermutationOf (e.g. a * b + c <=> c + a * b <=> c + b * a <=> b * a + c) and the various generalizations. If you have a good way to represent this then it would be a great starting point to do the 2. <=> 3. part.
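As a point of comparison, one simple way to get the AnyPermutationOf behavior for the example above is to canonicalize each body before comparing: sort the operands of commutative ops recursively and compare the resulting keys. The sketch below is a toy model, not the LoopTactics matchers and not an MLIR API; `Expr`, `leaf`, `binary`, `canonicalKey` and `isomorphic` are hypothetical names, and only commutativity is handled (no associativity or distributivity).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

struct Expr;
using ExprPtr = std::shared_ptr<Expr>;

// Toy expression tree: a leaf has no operands and `op` holds its name.
struct Expr {
  std::string op;  // "add", "mul", "sub", or a leaf name such as "a"
  std::vector<ExprPtr> operands;
};

ExprPtr leaf(const std::string &name) {
  ExprPtr node = std::make_shared<Expr>();
  node->op = name;
  return node;
}

ExprPtr binary(const std::string &op, ExprPtr lhs, ExprPtr rhs) {
  assert(lhs && rhs && "binary ops need two operands");
  ExprPtr node = std::make_shared<Expr>();
  node->op = op;
  node->operands = {lhs, rhs};
  return node;
}

// Builds a string key in which operands of commutative ops are sorted, so
// that every operand permutation of the same computation gets the same key.
std::string canonicalKey(const Expr &e) {
  if (e.operands.empty())
    return e.op;
  std::vector<std::string> keys;
  for (const ExprPtr &operand : e.operands)
    keys.push_back(canonicalKey(*operand));
  if (e.op == "add" || e.op == "mul")
    std::sort(keys.begin(), keys.end());
  std::string key = e.op + "(";
  for (std::size_t i = 0; i < keys.size(); ++i) {
    if (i != 0)
      key += ",";
    key += keys[i];
  }
  return key + ")";
}

bool isomorphic(const Expr &x, const Expr &y) {
  return canonicalKey(x) == canonicalKey(y);
}
```

With this, the four spellings a * b + c, c + a * b, c + b * a and b * a + c all map to the same key, while a non-commutative op such as a subtraction keeps its operand order.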

The 1. <=> 2. part is likely more ambiguous and I would punt for now. It is also related to emitting proper LLVM descriptors to pass attribute information to library calls (e.g. convolution descriptor). There is a low-tech manual way of doing it for specific classes of ops (e.g. all convolutions) if this is pressing for anyone.

Doesn’t 1 <=> 2 reduce to 2 <=> 3 if one can instantiate a canonical form of the named op as a generic op?

Yes, this is why I separated it into 3 pieces: we don’t have to solve 1. <=> 2. to make progress.
2. <=> 3. is useful by itself for ops that do not have special attributes.