PSA: run -reconcile-unrealized-casts after all -convert-*-to-llvm from now on

I have refactored most of the -to-llvm conversions to be independent of each other. In particular, conversions no longer subsume patterns from other conversions and can be run in an arbitrary order. Such partial conversions result in unrealized_conversion_casts being inserted by the infra. After you are done converting everything to the LLVM dialect, run -reconcile-unrealized-casts to get rid of the casts that became noop. (Previously, this was done, sometimes incorrectly, by the -convert-std-to-llvm) If this pass fails, this likely means some operations were not converted to the LLVM dialect by previous conversions.

3 Likes

I wish this was either discussed before the revision on this was committed or the revision stayed for at least a few hours. I notice the revision was posted and LGTM’d in 20 minutes and then landed in another 20 minutes. (I happened to comment in that window though but looks like that might have been missed.) This change is adding a -reconcile-unrealized-casts pass to the end of every codegen pipeline going through LLVM and touching 122 files (although many are test cases.) I really feel a revision like this should at least wait for a few hours for comments if not previously discussed here.

I’ve anyway gone ahead and posted more comments there. In summary:

  1. From the looks of it, a local folding hook can achieve the same and you really don’t need a new pass with a rewrite pattern nor the need to change any existing pass or pass pipeline (since the folding hook always and transparently runs as part of every rewrite driver API method).

  2. Even if a pass is needed: one wouldn’t like to see ops like these leak out in the first place wherever and if that’s possible. In general, if for some reason ordering is an issue, the rewrite driver can be called once again at the end inside of a pass after clearing out all patterns and with just a single pattern in it. (You already have the populate method for this pattern exposed.) This should be considered as an alternative to going around adding a new “cleanup” pass to the end of every pipeline.

This assumes that the pass pipeline is always running the canonicalize pass or similar, but that is sometimes undesirable because it can do a LOT of optimizations / changes, which can make -O0 modes to e.g. debug miscompiles not really avoid the miscompile. Having a targeted pass that just does this one transformation (which is needed from a functional correctness standpoint and isn’t really an ‘optimization’) avoids that.

1 Like

The folding hook is actually invoked by the greedy pattern rewriter independently of the canonicalize pass. But even then, I don’t think we need to oppose the “have a dedicated pass available in case it is needed” with “a folding hook make it work out-of-the-box for most pipelines”.

1 Like

Obviously, I don’t feel that way. In general, LLVM policy allows for post-commit review and I will address those as usual. The code was actually reviewed before commit. The change looks large, but most of it is in fact git grep -l -convert-to-llvm | xargs sed -i -e s/convert-to-llvm/convert-to-llvm -reconcile-unrealized-casts/.

You are welcome to try making the hook work with -convert-std-to-llvm :wink: The change description contains a useful hint: unrealized_conversion_cast are produced by patterns or by materializations and must be immediately legalized. There are the following possibilities:

  • unrealized_conevrsion_cast is legal – no folding happens, the op persists;
  • unrealized_conversion_cast is illegal – it cannot be folded in all cases, because the folding applies to the second cast in the chain and, in most cases actually, the newly injected cast is the first in the chain; the op cannot be legalized or folded away, the conversion fails immediately;
  • unrealized_conversion_cast is dynamically legal based on whether it can be folded or not – this duplicates the logic and requires a complex legality condition and, worst of all, the condition likely needs to be different depending on whether the cast is created by materialization or by a pattern itself because it may be using as operand the original unconverted value in the first case and the converted value in the second case.

Now, I haven’t found the way to set that legality check to work in all cases. Even if I had, it would have been a temporary (see below), partial and suboptimal solution. It would not solve two of the main issues it current solves: clearly indicating that something was not converted to LLVM and inadvertently changing operand types. The previous version would produce an obscure error about trying to recursively apply the pattern if there were uneliminatable casts. And it could remove the cast from standard type to the equivalent LLVM type even if the consumer of the op did not accept LLVM, it was not looking on pairs of ops. There are also cases of transformations where a mix of patterns is applied, so the legality condition would have to be replicated and adapted for those.

These ops are the natural effect of progressive lowering. The only way to avoid seeing them at the end of the pass is to make sure the resulting IR only uses the target type system. In practice, this means having one hugely monolithic “convert-everything-possible-to-llvm” pass, which is extremely undesirable from the layering, composability and maintenance perspectives. And will still need the casts in the case of out-of-tree dialects converting to LLVM.

You are welcome to run the cleanup pattern in a separate pattern application in your passes if this fits your use case. If we want to be able to lower progressively and in arbitrary order, every pass would have to run the cleanup, unless we somehow integrate it with the conversion infrastructure itself.

Finally, let me remind that the standard dialect is slowly going away and, naturally, so is the “standard-to-llvm” conversion. Adding anything new to it, or requiring users to depend on “standard-to-llvm” even if they don’t perform the actual conversion is counter-productive in this light.

2 Likes

Conversion uses the conversion driver, not the greedy one. It also calls the folding hooks, but that is insufficient given the order of op creation and how the folding can be specified.

I feel that if the need for a PSA was felt then it should have been sent before landing the change.

I also find the current state is a bit unfortunate: I want a conversion to happen (and a conversion that claims to be std to llvm say) but for that I have to somehow know to add a pass that seems completely unrelated (which I think is main reason for this PSA as it would have been difficult to understand why the breakage is occuring as these things). Looking back at when op was added I see: “is generally only intended to be
used to satisfy the temporary intermixing of type systems during the conversion
of one type system to another.” - but this change raises it to outside the conversions for me, it makes it something whose lifetime exceeds a type conversion.

1 Like

I can understand this argument, but past practice was exactly the inverse.

Let me emphasize the part of “during the conversion of one type system to another”. Dialects are not equivalent to type systems so, as long as we have conversions between dialects, the casts will appear as results of those conversions. This pass does not make the op exceed a type conversion, it actually marks the end of said conversion. If we want to never see the casts, the conversion pass must be able to handle all ops that can possibly operate in the original type system, as explained above.

I agree that it might not be obvious that this pass needs to be run, but I argue that it is even less obvious that one needs to run convert-std-to-llvm for the same reason, and do so even if they have no standard operations in the input IR and to convert builtin types! This is also something fixable with documentation, but I haven’t yet got the time to push it out given that I have to repeat the justification stated in the commit message.

⚙ D109605 [mlir] Update docs on conversion and translation to LLVM documents the overall conversion process.

1 Like

Some food for thought: this is an instance of Combining analysis, combining optimizations in disguise.

Looking a bit deeper at the evolution of dialects and conversions, one would realize that unrealized_conversion_casts op is an abstraction that leaks across pass boundaries. IIRC it used to be a tmp custom op with a special named hidden deep inside the conversion infra (at the time it was hard to print IIRC).

The various conversions can be interpreted are “somewhat artificially split along dialect lines”: convertXtoY instead of convertWorldToY. The reasons for those splits are themselves valid (independent dialects, do not include the world etc).

Still the state of the world is that we have a somewhat leaky abstraction today and IMO a sink cleanup pass is significantly better than adding yet more stuff to each conversion.

Note that we have something that is philosophically very similar with all the bufferization passes and similar questions often pop up each time one wants to pass state across pass or pattern boundaries.

I’d love to see some proposals / thoughts about this more general and recurrent problem and how MLIR can help mitigate it.

1 Like

This might just be a matter of naming/discoverability/framing. I believe that everything @ftynse is doing here is consistent with the principles outlined in Type Conversions the Not So Hard Way (slides, recording). If folks aren’t familiar with the best practices for ecosystem-scalable type conversion, I really recommend watching that talk.

Although it is implemented in a slightly different way mechanically, -reconcile-unrealized-casts is exactly the same as FinalizingBufferizePass. In general, all type conversions that happen over the course of multiple passes (usually for layering/incrementality/scalability reasons) will need something like this, and it results in great diagnostics when things go wrong :slight_smile: (no ops/types “slipping through”, etc.)

1 Like

Good point!
That said, since the unrealized_cast are really a part of the dialect conversion infrastructure, could they be handled more specifically by this infra? Could the dialect conversion framework be taught to recognize them and apply special treatment? I haven’t thought about this in as much details as you do, just throwing superficial ideas here :slight_smile:

We’re a bit off-topic here, but past practices have gone both ways (example: PSA: change in pass registration (please update your APIs!) and PSA: breaking change for C++ custom printer/parser and PSA: Change for FuncOp syntax ), we just don’t have discussed the expectations so it is left up to the judgement of the people involved.
It is always easy to address post-commit, even sometimes just revert for the time being to have more discussion when someone has strong concerns with something.

Thanks for pointing this out! I haven’t pushed my thinking that far. The only immediate comment is I wouldn’t necessarily consider unrealized_conversion_cast a leak here, we intend for it to be in the IR until it is cleaned up. Certainly, the op itself has no meaningful runtime semantics.

This reminds me of the slow-running discussions we have on phase ordering and pass granularity in context of (codegen) transformations. We were previously interested in pass granularity that is smaller than one pass per function/module. Here, we can go into another direction: there is a transformation of “type conversion” that consists of multiple passes but should be conceptually treated as one.

Indeed, I had the FinalizingBufferizePass in mind as a mechanism to declare “the type conversion should be complete”.

And thanks for posting the presentation, the partial+finalizing conversion is also the approach taken here. Maybe we should consider turning those slides into a document / tutorial.

It isn’t part of the infrastructure strictly speaking. We configure the type converter to emit unrealized_conversion_casts - llvm-project/TypeConverter.cpp at 99ea8ac9f1f374d2a124deb911e2c3ca4e5110dd · llvm/llvm-project · GitHub. I believe the infrastructure would just fail otherwise, at least for source and target materialization. I have also seen direct emission of unrealiezd_conversion_cast in conversion patterns in the wild, although I would argue those patterns weren’t doing the conversion right.

If we want to keep things extensible, we probably don’t want to hardcode the cast operation into the infrastructure itself, there may be more “natural” casts that can be inserted by users. What we could do is to provide some sort of finalization clean-up hook that would run after the normal legalization process and after materialization hooks. This starts resembling the staged pattern approach some Linalg passes use - llvm-project/Transforms.cpp at main · llvm/llvm-project · GitHub.

One thing that is unclear to me here is how to indicate the place in the pass pipeline where we are no longer expecting to have unrealized casts. Each pass could have an option for this that gets forwarded to the conversion infra, but this felt more complex and intrusive (out-of-tree passes would need to be adapted too, and pass infra doesn’t know that we run dialect conversion) than just having a pass for that purpose.

Slightly orthogonally, I also toyed with the idea of having “verification passes” in context of verifier/analysis dependency discussions. Such passes are only there to ensure that the IR can be consumed by following stages and cleaner error reporting and, e.g., can be omitted in no-asserts builds. In this case, we could have a “verification pass” that checks if the previous passes actually folded away the casts, but this still makes it a pass.

Please throw more ideas if you have them! :slight_smile:

1 Like

I haven’t read the article Nicolas linked (paywall), but I was thinking about this the other day in the context of lowering MHLO->Linalg. IREE has a linalg extension dialect where we have a few ops that are linalg-like, but don’t quite fit into linalg (yet, maybe :stuck_out_tongue:). We currently lower from MHLO->LinalgExt and MHLO->Linalg in two separate passes. As part of handling detensorization and signedness, we need type conversion and in this case only some of the ops get converted, but we’re left with some casts (e.g. casting an mhlo.constant of tensor<i32> to i32, but the constant is lowered later). So I marked unrealized_conversion_cast legal for this lowering. But unrealized_conversion_cast is also legal in MHLO->Linalg (which I hadn’t realized, actually). I like that the patch under discussion makes explicit something that we were already doing implicitly (for the same reason I was pro-unrealized_conversion_cast in the first place). In general, I find it much easier to reason about unconverted IR than about errors coming from the dialect conversion framework (which I was fighting with quite a bit when setting this up). I think the thing we need is better formalization on exactly how far these casts are allowed to persist and an explicit cleanup pass seems like a step in that direction. This is my recent PR working on the aforementioned example: Extend linalg_ext lowering to handle signedness and more ops by GMNGeoffrey · Pull Request #6987 · google/iree · GitHub

I also have a patch I sent a while ago to add a generic way of creating verification passes, as Alex mentions, but it got stalled ⚙ D96191 [MLIR] Add a generic VerifyFullyConverted pass. It’s also not quite as nice as I would have liked because it can’t be just a pass you create in your pipeline (as I had originally envisioned), as constructing a ConversionTarget requires a context. I will probably take another look at it, but if someone else is interested in picking that up, or has a better idea, then I’d be happy to cede the effort. Actually looking at it now, I see that ConversionTarget only needs a context in order to construct OperationNames. I wonder if it could switch to using strings and then you could construct it without a context (we did something similar with pass pipelines at some point, IIRC). Not sure if that’s worth it.

This is incorrect. You don’t need to run the -canonicalize pass to have the folding hooks to run. Any greedy rewriter call or the conversion driver will call the folding hooks irrespective of whether any canonicalization patterns are added. For eg. even if you call the greedy rewrite driver with zero patterns (empty pattern list) or any of your patterns, folding will still happen for all registered ops.

With dialect conversion, I thought it was only for ops marked illegal. And type conversions use dialect conversion. So typically only a small set of specifically chosen ops would be folded? Am I missing something here?

I really have strong concerns with this change, the thinking here, and the explanations provided. I would just strongly recommending reverting this patch.

I’m of course pro reverting, if someone has significant concerns, but I actually don’t really understand how your concerns here apply differently to the current state vs the previous state. Previously everything had to run the std-to-llvm and now it instead needs to run this pass. In neither case was this happening within each pass. I certainly understand concerns about where unrealized casts should persist and cross pass boundaries, but that was the case before and after this patch (and something we should continue to discuss). Also AFAIU Sean is correct that dialect conversion doesn’t attempt to fold ops that are already legal. Anyway, I’m signing off for the weekend, but just my 2¢

Strong +1 here. I was as well exactly making these points.

This is a really strange argument. If someone wants to convert std to llvm or whatever else to the llvm dialect, it’s what they would run. Requiring an extra pass for what should have been handled as part of the conversion driver naturally and running such a pass isn’t a change in the right direction I fee.

Folding is always considered a good thing to do — as early as possible by design. This is why the greedy rewrite driver from the very start has always applied the folding hook for every op in its inner worklist iteration irrespective of which patterns are in the list or on which ops the supplied patterns are. If the dialect conversion infra isn’t running the folding hook on all ops, that would be inconsistent with the general thinking and one should consider improving that infra and have the folding hook run so that this naturally happens as part of the conversion infra. We shouldn’t need to run an extra cleanup pass for what should have been logically and naturally handled inside of -convert-std-to-llvm or similar passes like @jpienaar and @nicolasvasilache also strongly echo.

Just to make it clear: it is fine that the maintainer here initially assumed that a post-commit review was sufficient – so this is in line with the LLVM policy. However, I think a major change like this one without exploring the right direction first is out of line and I would request a revert here in line with the policy. We shouldn’t be trying to fix this going further in the forward direction.