Remove canonicalizer for memref.dim via ShapedTypeOpInterface

I would like to remove the DimOfShapedTypeOpInterface canonicalization pattern from the memref.dim op and find a new home for it as part of a pass. Taken together with the broader problems around the current placement of the memref.dim op, I don’t think this qualifies as a canonicalization; it acts more like a lowering that should be optional to apply (and controllable as to when it runs).

Between the various moves and refactors, I haven’t done enough archaeology to determine when this was added and for whom.

What this pattern does:

  • If the producer does not implement InferShapedTypeOpInterface, or the memref.dim index is not a constant, fail.
  • Otherwise, invoke the op’s reifyReturnTypeShapes to produce a tensor<?xindex>, based on the op’s implementation of a shape transfer function (which can produce arbitrary IR).
  • Emit a tensor.extract_element to pull out the single dimension of interest.

The presence of this as a canonicalization pattern was recently surprising to me because we had correctly placed memref.dim ops in our pipeline which, at that point, had producers that were high-level ops implementing InferShapedTypeOpInterface. Upon canonicalizing, what was a simple memref.dim was expanded into a non-trivial sequence of shape.shape_of, other shape dialect ops operating on index tensors, and tensor.extract_element. There are auxiliary passes in the shape dialect for simplifying some of these things, but I found cases where they duel with this canonicalization in non-obvious ways depending on phase ordering.
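To make that concrete, here is a rough sketch of the kind of before/after I mean (not literal output from our pipeline; "mydialect.some_op" is a made-up stand-in for any producer implementing InferShapedTypeOpInterface, the middle of the expansion is elided because the reified shape transfer function can be arbitrary IR, and I am writing tensor.extract for the element extraction, so spellings are approximate):

```mlir
// Before: a simple query of one result dimension.
%c0 = constant 0 : index
%0 = "mydialect.some_op"(%arg0) : (tensor<?x?xf32>) -> tensor<?x?xf32>
%d = memref.dim %0, %c0 : tensor<?x?xf32>

// After the "canonicalization": the producer's shape transfer function has
// been reified into IR (here via shape dialect ops on index tensors), and
// the dimension of interest is re-extracted from an index tensor.
%in_shape = shape.shape_of %arg0 : tensor<?x?xf32> -> tensor<?xindex>
// ... further shape-dialect ops that compute %result_shape : tensor<?xindex> ...
%d = tensor.extract %result_shape[%c0] : tensor<?xindex>
```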

Our lowering pipeline actually rarely needs to resolve shape transfer functions in this way; for dynamic tensor ops that have a representation, it lowers directly to shape-carrying low-level ops. The facility is still useful for custom ops and other things, but we would want to use it explicitly, in a controlled fashion, rather than treating it as an automatic canonicalization.

I also just feel that this specific pattern has the smell of being a lowering rather than a canonicalization (i.e. it completely changes the level of abstraction, expands into ops from multiple dialects, does not clearly produce a form that I would identify as “canonical”, etc.).

Without knowing how this is used, I don’t have a strong opinion on where it should go instead (presumably it is load-bearing for someone). Possibly in ShapeToStandard (advantage: as part of dialect conversion, it would only be converted if the overall pattern application produces a legal lowering), or possibly in ShapeToShapeLowering.

I don’t want to break anyone’s flows and will work to accommodate, but as far as I can see there are no live in-tree uses that depend on this, and if no one speaks up, I think we should delete it outright. It will be necessary to untangle this in order to rehome/split the memref.dim op anyway, which should not be carrying this baggage, imo.

A few people who I know have worked in this area and may have context: @herhut @dfki-jugr @_sean_silva

Thanks - and sorry if this is already part of a plan somewhere or has an implication I’m not seeing. Feedback welcome.


I am mostly the culprit on this pattern, but I totally agree that this shouldn’t be a canonicalization and must be moved into its own pass. The pattern itself has value, and I am happy to work with you on finding a good place for it. A combination of different use cases/concerns was incorporated into the current implementation, and that probably resulted in a suboptimal design.

So there is possibly an issue with InferShapedTypeOpInterface itself. It has two methods: reifyReturnTypeShapes and reifyReturnTypeShapesPerResultDim. The former is the more generic one and works with unranked types as well; it returns a tensor<...xindex>, and the shape of each dimension is then obtained using tensor.extract_element. The latter is less general and is meant for ranked types; it just returns a list of SSA values, one for each dimension, so there is no tensor<...xindex> or tensor.extract_element involved (the difference is sketched after the list below). I encountered this when I wanted to add a shape inference interface to Linalg ops, but maybe these two uses have been conflated:

  • Having a shape inference interface just for Linalg, when a general shape inference interface already exists, is not great.
  • When you have ranked shapes, introducing a sequence of tensor.from_elements and tensor.extract seemed unnecessary.
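For illustration, here is a rough sketch of the difference (not tied to any particular op, and spellings approximate):

```mlir
// reifyReturnTypeShapes (generic, also handles unranked results): the whole
// result shape is reified as a 1-D index tensor, and each dimension of
// interest is then extracted back out of that tensor.
%result_shape = shape.shape_of %result : tensor<?x?xf32> -> tensor<?xindex>
%d0 = tensor.extract %result_shape[%c0] : tensor<?xindex>

// reifyReturnTypeShapesPerResultDim (ranked results only): the interface
// hands back one index-typed SSA value per result dimension, with no index
// tensor or extract in between, e.g. directly in terms of the operands.
%m = memref.dim %lhs, %c0 : tensor<?x?xf32>
%n = memref.dim %rhs, %c1 : tensor<?x?xf32>
```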

Coming to the pattern itself: it resolves a memref.dim of an operation's result in terms of the shapes of that operation's operands. This is necessary during pattern rewrites in a dynamic-shape context. It is quite often the case that the operation gets replaced or is dead, and its only remaining uses are in memref.dim operations; if those dims are resolved in terms of the operand shapes, the operation becomes dead once this is done.
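As a hypothetical illustration (made-up op name, and assuming the op is shape-preserving so its result dimension can be read off the operand):

```mlir
// Before: only the shape of %0 is used; the computation itself is otherwise dead.
%0 = "mydialect.expensive_op"(%arg0) : (tensor<?x?xf32>) -> tensor<?x?xf32>
%d = memref.dim %0, %c0 : tensor<?x?xf32>

// After resolving the dim in terms of the operand's shape, %0 has no
// remaining uses and can be removed.
%d = memref.dim %arg0, %c0 : tensor<?x?xf32>
```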

We should move it out of canonicalization and into its own pass. I added it as a canonicalization because this really needs a fixed-point iteration to remove all operations that are transitively kept around just for shape information, but that can be done by just using the pattern rewriter mechanism. Moving this into its own pass will hopefully solve most of the issues you ran into. I can help with where to use this pass in IREE.

That’s what it looks like to me as well.

Ah, thanks for the reminder - this is related to that. I wouldn’t necessarily phrase this as a Linalg-vs-not thing, but rather that there may be needs at multiple levels. Having worked on this at both the frontend and codegen levels, it is pretty clear to me that there are two distinct cases, each with its own needs:

  • An ability to reify a generalized shape transfer expression from arbitrary ShapedTypes to a 1D vector of dimensions (currently tensor<?xindex>). This is a common thing to encode and use at frontend levels. reifyReturnTypeShapes does this.
  • At levels where the ops are defined in terms of ranked operands with symbolic associations between unknown dimensions in the inputs and outputs, the ability to access those associations. reifyReturnTypeShapesPerResultDim does this. Linalg operates at this level, but I don’t believe it is an exclusive property of Linalg (quite a few representations restrict themselves in this way).

With that framing, I think the spelling could be improved, but I don’t have a disagreement with the distinction that is being made. The issue is that the use of either of those facilities needs to be somewhat purposeful and not automatic as part of canonicalization.

Can we just carry this pass/pattern in IREE for the time being? I believe there is an upstream gap here, but I don’t think we are far from being able to enunciate a more unified/comprehensive proposal for reasoning about these things, and it would be better to upstream as part of a full story vs in pieces.

I don’t see a reason to move this into IREE. The pattern by itself is sound; it just shouldn’t run as a canonicalization (my thought here was that having something use the result dimensions is not the canonical representation, and that the IR should instead always use the operands of the defining op, but clearly that was overkill). We can just make this a pass, so you control when it runs.
If really needed, we could also have an option that controls when the pass uses the reifyReturnTypeShapes interface and when it uses the reifyReturnTypeShapesPerResultDim interface (I don’t think it is necessary, though).
I’d be more than happy to take this off your plate (since it’s really my doing), but it will be a day or so before I flush my queue enough to get to this.

In npcomp we definitely like the cleanup that moves dim ops from results to operands. It’s not strictly needed because presumably at the backend (e.g. IREE) level these will get cleaned up, but it helps to make sure that npcomp is producing minimal and clean IR (even just for visual inspection / debugging – e.g. making sure that an error guard folds away as expected). We only need it for ranked shapes though (i.e. the reifyReturnTypeShapesPerResultDim variant).

I agree that the interface which expands to quasi-arbitrary sorts of ops (especially opinionated ones like representing shapes as tensor<...xindex> – we never want that in npcomp) is highly undesirable, so a good first step is to remove that upstream.

Also, it seems like this crossed paths with something Tobias was doing in https://reviews.llvm.org/D103132. It sounds like he was somewhat relying on that.

I would be OK adding his canonicalization (which really is a canonicalization) of moving dim(linalg outs(%0), %dim) -> dim(%0, %dim), as that covers 99% of npcomp’s needs. Any subgraph composed of dim ops, LinalgOps, and linalg.init_tensor trivially folds away to just the SSA values for the individual dimension sizes under a fixed-point iteration of that pattern (see the sketch below). This may be sufficient for IREE as well (I’m curious what use case does not work with just that pattern).
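A small sketch of that fixed-point folding, assuming the op spellings of the time (memref.dim on tensors, linalg.init_tensor):

```mlir
// Before: the dim queries the linalg op's result.
%init = linalg.init_tensor [%m, %n] : tensor<?x?xf32>
%0 = linalg.matmul ins(%lhs, %rhs : tensor<?x?xf32>, tensor<?x?xf32>)
                   outs(%init : tensor<?x?xf32>) -> tensor<?x?xf32>
%d = memref.dim %0, %c0 : tensor<?x?xf32>

// After dim(linalg outs(%init)) -> dim(%init):
%d = memref.dim %init, %c0 : tensor<?x?xf32>
// ... which folds against linalg.init_tensor to just %m, at which point the
// matmul and init_tensor are dead if nothing else uses them.
```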

The DimOfShapedTypeOpInterface pattern supersedes this. The particular pattern you are talking about might be enough for all structured ops; with outs now existing, the result shape is essentially redundant information. Take a matmul, where the LHS is MxK, the RHS is KxN, and the output is MxN. The DimOfShapedTypeOpInterface pattern gets the shapes from the ins if possible, and from the outs if not; the pattern you are talking about gets them from the outs (a small contrast is sketched below). Maybe on balance just that is enough for structured ops, and it is also cheaper. Getting the shapes from the inputs does remove some index computations, though: if you are taking subtensors/subviews of both the inputs and outputs, then with the DimOfShapedTypeOpInterface pattern the shape computation used to compute the tile sizes of the inputs is reused to get the tile sizes of the outputs as well, so you end up with fewer shape-related instructions. With the pattern you mentioned, that won’t happen. So it seems like what you mentioned could be a canonicalization, and the DimOfShapedTypeOpInterface pattern could live in a separate cleanup pass that further resolves the shape computation.
The caveat is that this doesn’t work for all ops that implement the reifyReturnTypeShapesPerResultDim variant of the interface. Some of those ops (like linalg.pad_tensor, linalg.init_tensor, and the linalg expand/collapse shape ops) don’t have an outs tensor.
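To spell out the matmul contrast (same shapes as above; value names made up):

```mlir
// Query of the matmul result's first dimension (M):
%d0 = memref.dim %matmul_result, %c0 : tensor<?x?xf32>

// Resolution via the ins (what DimOfShapedTypeOpInterface does here):
%m_from_lhs = memref.dim %lhs, %c0 : tensor<?x?xf32>

// Resolution via the outs (what the dim(outs) pattern does):
%m_from_init = memref.dim %init, %c0 : tensor<?x?xf32>
```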

Moving this to a pass and out of canonicalization SGTM.

(and thanks for the nice explanation, Mahesh)


First of all, I agree this pattern should be removed; it is not a canonicalization.

In our code-generation flows (for example, kernel generator) we have a dedicated pass that does this kind of transformation (albeit using shape.shape_of, since it runs early in the pipeline). We made it a dedicated pass because the rewrite is only useful if it enables other optimizations.

I hope you meant must be removed as a canonicalization.

Thanks Mahesh! In that case, my only request is that we retain the ability to “maximally clean up / bypass shapes from result to operands” at the linalg-on-tensor level (LinalgOp, subtensor, pad, init_tensor, etc.) upstream.

This patch (D104321: [mlir] Move `memref.dim` canonicalization using `InferShapedTypeOpInterface` to a separate pass) moves the pattern into its own pass.

The pass above should have the same functionality. You need to add that to your pipeline (I am in the process of adding this to IREE). Let me know if you see any other issues.


Cross-referencing this question: Rank-reducing memref.subview OffsetSizeAndStrideOpInterface interface issues - #10 by nicolasvasilache