RFC: Join/meet type interface

Following the discussion at RFC: Allow shape dynamicism in SCF, here is a proposal to introduce a join/meet type interface: ⚙ D101859 [MLIR] Introduce join/meet type interface.

The description of the interface should cover the important aspects; in particular, it covers how the interface supports extensibility. In short, ::mlir::join checks both ty1.join(ty2) and ty2.join(ty1). Type1 is allowed to ignore Type2 or not implement the interface (or the opposite). If both types handle each other, commutativity is asserted. The Optional<Type> return value allows differentiating between “ignored other type” and “null join”.
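To make the symmetric dispatch concrete, here is a minimal standalone sketch of the behavior described above. It is not the actual MLIR implementation: std::string is a hypothetical stand-in for mlir::Type (the empty string models the null type), and the “any”/“i32”/“f32” types are invented for illustration.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical stand-in for mlir::Type: the empty string models the null type.
using Type = std::string;

// Hypothetical per-type hook, standing in for ty.join(other): std::nullopt
// means "this type does not know `other`" (ignored), while an empty Type is
// a definite null join.
std::optional<Type> tryJoin(const Type &self, const Type &other) {
  if (self == other)
    return self;
  // "any" is an invented top type that joins with everything.
  if (self == "any")
    return Type("any");
  // Invented rule: i32 and f32 know about each other and have no join.
  if ((self == "i32" && other == "f32") || (self == "f32" && other == "i32"))
    return Type(); // Definite null join.
  return std::nullopt; // This type ignores `other`.
}

// Sketch of the symmetric dispatch: query both sides, let either side ignore
// the other, and assert commutativity when both sides answer.
Type join(const Type &ty1, const Type &ty2) {
  std::optional<Type> r1 = tryJoin(ty1, ty2);
  std::optional<Type> r2 = tryJoin(ty2, ty1);
  if (r1 && r2) {
    assert(*r1 == *r2 && "join must be commutative");
    return *r1;
  }
  if (r1)
    return *r1;
  if (r2)
    return *r2;
  return Type(); // Neither side handled the other; treated here as null join.
}
```

Note how only one of the two types needs to know about the other for the join to succeed, which is what makes the interface extensible to out-of-tree types.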

I could not see a way to absolutely enforce associativity across different implementations of the interface (join(x, join(y, z)) against join(join(x, y), z)). While it can be broken intentionally, I think it has to be the responsibility of the developers, and I don’t expect it to be an issue.

I used the order “less than or equal”, so that join corresponds to “broaden” and meet corresponds to “narrow”. Any preference to have it the other way?
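As an illustration of that ordering, here is a hedged sketch over a hypothetical ranked-shape model (std::nullopt standing in for a dynamic ‘?’ dimension; the names joinShapes/meetShapes are invented): under “more static ≤ more dynamic”, join broadens mismatched dimensions to dynamic, while meet narrows dynamic dimensions to static ones and fails on conflicting static sizes.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical ranked-shape model: nullopt models a dynamic ('?') dimension.
using Dim = std::optional<long>;
using Shape = std::vector<Dim>;

// Join broadens: any dimension mismatch generalizes to dynamic.
Shape joinShapes(const Shape &a, const Shape &b) {
  assert(a.size() == b.size() && "sketch handles equal ranks only");
  Shape out;
  for (std::size_t i = 0; i < a.size(); ++i)
    out.push_back(a[i] == b[i] ? a[i] : Dim());
  return out;
}

// Meet narrows: a static dimension refines a dynamic one; two different
// static dimensions have no meet (returned here as nullopt for the shape).
std::optional<Shape> meetShapes(const Shape &a, const Shape &b) {
  assert(a.size() == b.size() && "sketch handles equal ranks only");
  Shape out;
  for (std::size_t i = 0; i < a.size(); ++i) {
    if (!a[i])
      out.push_back(b[i]);
    else if (!b[i] || *a[i] == *b[i])
      out.push_back(a[i]);
    else
      return std::nullopt; // Conflicting static sizes: null meet.
  }
  return out;
}
```

With the opposite order the two names would simply swap, so the choice is purely a naming convention.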

We can discuss it as an RFC here. If and when we agree on the broad lines, I can update the patch for review.

Cheers!

Really excited to see this! I will be needing this in npcomp!

Your implementation is very elegant too.

That is what we use in the dataflow analysis framework so that seems like a good choice for consistency.

What isn’t clear to me is how you would actually go about using this interface. How is this supposed to be used in analyses/transformations/etc?

I suppose this is mostly just useful for constructing control flow IR that has different types along the different edges.

Maybe you mean the purpose of the interface rather than how to use it?
My first two intended uses of this would indeed be relaxation of the control-flow interfaces (or of custom interfaces), and type inference across control-flow.
This may be limited, but the main benefit of having an interface is that users can extend it for their custom types. So one could implement type inference for SCF, and it would still work with user types.

Generally both. The purpose of an interface should generally be informed by how it is intended to be used.

I think this interface would only be a very small piece of that. I haven’t seen any consideration for how you would actually make the transformations surrounding the control flow safe, i.e. you can’t just forward values that have different types; something has to make that safe, and the type generally can’t do that for you, and neither can the control flow operation/dialect. As it is, this interface would not be enough to actually satisfy either of your use cases (at least not without adding a lot of extra checks and limitations on the current control flow usages). I suppose I just want to understand the entire picture of how support is intended to play out, because I would want a compelling use case before landing anything in tree.

– River

I think this interface would only be a very small piece of that.

Yes. Let me give more details.

Maybe I introduced confusion by discussing both join/meet and potential
applications, and was a bit too bold in my earlier comment!

I’ll try to clarify my approach.

For background, I have been working on type inference for our dialect, in
particular with control-flow. While playing with SCF, I noticed it enforces type
equality along control-flow edges.
Some dialects may want to accept dynamic or unranked shapes, so type equality
may be an excessive constraint for them. Hence my original post, asking whether
we wanted to relax the constraint. Answers explained we likely wanted to keep
type equality in SCF; fine with me.

A couple of points came out of this:

  1. Types join/meet seem to be useful more generally.
  2. We might want to consider updating the control-flow interfaces to allow
    different constraints.

My previous comment was too bold:

My first two intended uses of this would indeed be relaxation of the
control-flow interfaces (or of custom interfaces), and type inference across
control-flow.

By “type inference across control-flow”, I was referring to the problem of type
inference across control-flow for our particular use-case (more below). (I did
ask @_sean_silva if there were any plans to have a generic type inference pass,
but I understand this is far out of scope here!)

Regarding the relaxation of control-flow interfaces, I should have said I want
to look at it. I simply don’t know what can be done about it.

At this point, I am not trying to address those here. Instead, it seems to me
join/meet functions are valuable on their own, and may be a small step toward 2.

I have repeatedly found the need to implement some form of join or meet, or
functions that would be significantly simplified by the availability of
join/meet.
That is at a high level, for the implementation of type inference with dataflow
analysis, or at the op level, for example to refine the output type/shape of ops
with SameOperandsAndResultType/Shape. In that case, operating on Type instead
of shapes is nice, because we can simply use meet to refine the type(s) and
detect an issue when getting a null type; there is no need to dive into the
details of the types.
Note that today there are no join/meet functions available for shapes either; a
static helper was implemented in TensorOps.cpp.
From the discussion, @_sean_silva seemed to also think it would be valuable,
hence this proposal and patch.
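That refinement pattern can be sketched in a standalone way. This is a rough illustration, not the proposed API: strings are hypothetical stand-ins for types (“?” models a fully dynamic type, the empty string the null type), and the names meet and refineCommonType are invented.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for types: "?" models a fully dynamic tensor type,
// and the empty string models the null type.
using Type = std::string;

// Invented meet: identical types meet to themselves, "?" refines to the
// other type, and anything else conflicts (null type).
Type meet(const Type &a, const Type &b) {
  if (a == b)
    return a;
  if (a == "?")
    return b;
  if (b == "?")
    return a;
  return Type();
}

// For a SameOperandsAndResultType-style op, fold meet over all the types;
// a null meet flags a type conflict without inspecting the details of the
// individual types.
std::optional<Type> refineCommonType(const std::vector<Type> &types) {
  Type common = "?";
  for (const Type &t : types) {
    common = meet(common, t);
    if (common.empty())
      return std::nullopt; // Conflict detected via the null meet.
  }
  return common;
}
```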

I haven’t seen any consideration for how you would actually make the
transformations surrounding the control flow safe, i.e. you can’t just forward
values that have different types; something has to make that safe, and the type
generally can’t do that for you, and neither can the control flow
operation/dialect.

I am not sure exactly what you mean, but I will try to comment.
As I noted, I have not considered what would be needed to change the
control-flow interfaces; for now, join/meet seemed valuable enough
independently.
What is legal will depend on the user, so changes would require allowing
some level of configuration.
To make it clear again, I am not advocating that we simply change the
control-flow interfaces or SCF to use some constraints based on join/meet.

In my particular case, the join/meet functions would just be a tool used as part
of the type inference implementation, along with the dataflow analysis. It seems
to me it will give me what I need.
Of course the analysis could discover a broken program when a join is null. But
that would be similar to having mismatching types with a type equality
constraint.

To conclude, I think we should see if we can agree on whether join/meet makes
sense on its own, or if we want to discuss any plans further. Opinions?

I think it could make sense for the interface to provide a function that can materialize a “static info cast”
op to cast between a type and its join/meet with another type (I already have a couple of them). Note that all such casts are simply casting away / adding static information, and imply no runtime code, so a subtype-aware RegionBranchOpInterface/BranchOpInterface can still have trivial dataflow transfer functions along edges. One exception to “no runtime code” is at ABI boundaries where there may be a calling convention difference (e.g. a tensor with one unknown dimension might be passed differently from one with two unknown dimensions if the size of unknown dimensions are passed as individual args: 1 vs 2 additional args), but that’s a lowering problem, not a program semantics problem.
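The ABI-boundary point can be illustrated with a toy model (std::nullopt standing in for an unknown dimension; numExtraAbiArgs is an invented name): under a convention where each unknown dimension is passed as an individual size argument, the calling convention depends only on how many dimensions are unknown, so a tensor with one unknown dimension and one with two take 1 vs 2 extra args, even though the static-info cast between them implies no runtime code at the IR level.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical model of a ranked tensor: nullopt is an unknown ('?') dim.
using Dim = std::optional<long>;
using Shape = std::vector<Dim>;

// Under the hypothetical lowering convention above, the number of extra ABI
// arguments is the number of unknown dims. This is a lowering concern only,
// not a program-semantics concern.
int numExtraAbiArgs(const Shape &shape) {
  int n = 0;
  for (const Dim &d : shape)
    if (!d)
      ++n;
  return n;
}
```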

I think the main thing is that these interfaces have a fairly precise formalism behind them, so we’re pretty sure this is a good building block. These interfaces also allow us a number of interesting future introspection use cases. One example is that we can define ODS type predicates as simply providing a “bounding type” for an operand/result which all valid types must be a subtype of. (a list of bounding types is possible too in some circumstances). This is a compact representation of the type verification, and allows trivial introspection (e.g. to know if an operation allows a particular operand/result to be refined).
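Under the join/meet formalism, such a bounding-type predicate reduces to a subtype check. A hedged sketch (again with strings as invented stand-ins for types, and “?” as a hypothetical fully dynamic top type): in the “less than or equal” order, a type satisfies the bound exactly when joining with the bound does not broaden it.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for types: "?" is an invented top type that is a
// supertype of everything; other types only relate to themselves. The empty
// string models the null type.
using Type = std::string;

Type join(const Type &a, const Type &b) {
  if (a == b)
    return a;
  if (a == "?" || b == "?")
    return Type("?");
  return Type(); // Null join: unrelated types.
}

// a <= bound holds exactly when join(a, bound) == bound. An ODS predicate
// expressed as a bounding type per operand/result would make it trivial to
// introspect whether a particular operand/result may be refined.
bool satisfiesBound(const Type &a, const Type &bound) {
  return join(a, bound) == bound;
}
```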

I would be happy to push on this more in npcomp before merging it into core if desired. (It’s just really hard to work on these things out of tree, since I can’t implement this interface out of tree on the tensor/memref types, so I would be limited to types I define in npcomp, which might be enough to be motivating though.)

The type itself is generally incapable of determining which cast should be used for it, e.g. if I have a mismatched builtin tensor type, what cast should I insert? Well, it very much depends on the IR, and there isn’t always an easy way to know which dialect should be inserting casts, e.g. it could be npcomp/TensorFlow/etc.

The potential use cases you mention here are more compelling, or at least more immediately satisfiable, than changing the type contract related to control flow. I’m fine with building this in tree, especially if we could, say, remove some of the special “compatible shape” related constraints in OpBase.td in favor of something more general.

I agree. An interface for me only makes sense if it is going to be utilized for some analysis/transformation/verification/etc. I am wary of adding something just because it seems like something at some point might be able to use it. Many times this ends up as tech debt, half-finished interfaces, etc. that need to be redone when the user does materialize (with that cleanup often being quite tedious). This is especially applicable to anything living in IR/, which has a direct impact on everyone in some way or another. Not that this applies to the interface as described here; this is a general statement on past experience, and the reason why I have been pushing you on how it is going to be used.

I appreciate the clarification on your goal, and the other usages you describe (type inference, verification, etc.) definitely make sense to me. So from that perspective I’m +1 on having something like this; from the control flow transformation perspective, there are many more nuances outside of just merging type information (at least, from my experience implementing a lot of upstream transformations dealing with this).

– River

I think the implementer of JoinMeetTypeInterface generally would know which cast to insert. In the case of a non-null join/meet, it implies that the implementer of JoinMeetTypeInterface knows about the semantics of both types and knows that one is a subtype of the other, so it seems fitting that they would know which “static info cast” is needed.

And dependency-wise, the dialect containing the type with JoinMeetTypeInterface will always be able to define the necessary op, since there is already a dependency on both types.

If necessary, in the coming months I will be literally implementing this interface for modeling in npcomp and we can smoke out any issues there and then upstream it. I don’t know about @arames 's timeline for getting this submitted, but it should happen naturally at that point at least (it is a little awkward having no in-tree users in core though).

How does that work with TensorType? MemRefType? or any other type that may be used by different dialects?

– River

TensorType and MemRefType are kind of weird because they are in the builtin dialect (tensor.cast and memref.cast can probably move there). Otherwise, the implementer of JoinMeetTypeInterface for the non-Tensor/MemRefType will have the op available, since they by definition depend on TensorType/MemRefType if they are dyn_casting it as part of the JoinMeetTypeInterface implementation. (Note: JoinMeetTypeInterface allows either side of the join/meet to implement the interface.)

For TensorType, tensor.cast does the necessary casts. For MemRefType, memref.cast does, though I think there’s a bug: the documentation says it has “abort if illegal” runtime behavior, but it is marked NoSideEffect, so I suspect there are really two separate ops needed here, one of which is the one needed for JoinMeetTypeInterface casting (it is just a bug that they are combined into one).

It seems reasonable to me that “the ops which know the details about which type instances are subtypes of other type instances for a given Type subclass” should be in the same dialect as that Type subclass.

The fact that they are weird is what complicates this though, because we don’t enforce a dependency on the tensor/memref dialect to be able to use those types. Unless there is a reasonable solution to these edge cases, we can’t get around having a different interface that has to be used for these transformations. I would love to be able to have a single thing that can reconcile these things, and remove the existing special cases e.g. during inlining, but I can’t (yet) see this interface doing that alone.

– River

Wouldn’t moving tensor.cast and an appropriately cleaned-up memref.cast to the builtin dialect be sufficient? That seems doable even as part of this initial patch, if needed. Based on @arames 's description of their use case, it seems like they would be able to use those new methods immediately as part of what they are doing.

I am very strongly against adding those to the builtin dialect. It seems like the opposite direction that dependencies should flow: the builtin dialect is something that everyone depends on, and a cast there specifically for tensors/memrefs doesn’t make sense to me.

– River

Our timeline is quite soon, the next month or two, but we have a fork of LLVM; we would like to see this land upstream and have a “good” solution from MLIR core. Our purpose in upstreaming is mostly to have experts across the industry agree on how to solve a common problem. But we will continue to make forward progress in our fork.

Unfortunately most of my uses will likely be out of tree. A few may be useful
upstream, but would require some cleanup when moving from user code to a
generic solution.

If possible, I would like to make our type inference happen via upstream.
It would hopefully benefit others, and I would rather use an upstream
solution, so I don’t have to maintain it out of tree!

My plan was to work on our type inference immediately after this join/meet work
(with other work in parallel).

While the general direction seems clear, it is obvious I am still missing
pieces (updates to the control-flow interfaces? things I have not encountered,
e.g. ops not supporting type refinement?). My feeling is we really need to
start tackling the problem to get a clearer picture.

I am happy to start on my end, and share my progress with upstream. But maybe
others (@_sean_silva?) also plan to work on this and would progress faster.

I agree, but also arguably the tensor type should be in the tensor dialect. I’m just saying that it’s the natural place for them given their current positioning. For example, we would consider it very strange if the op that casts from !mydialect.NoneType to !mydialect.optional<T> is not in mydialect. And it’s probably not a huge deal in practice (consider that all users are already paying for the tensor type itself in the builtin type parser, tensor literal parser, … it’s probably not a very large additional code size / complexity cost in the grand scheme of things).

If the type inference pass can be upstreamed as an example user, that might be quite compelling!

I probably won’t get to this for at least another month.

Ok. I’ll get started on it then.
I’ll start by focusing on a simple use-case (ours) as a first step.
When I have something concrete, I’ll create a thread to discuss sensitive areas (e.g. control-flow interfaces) or aspects that we want to sort out together.

Following all this, I am unclear on whether we want to land join/meet now, or wait. @River707, @_sean_silva, can you confirm?