Open Meeting 1/14: Dialect Conversion and Type Conversion: the question of cast operations

We don’t have a presentation tomorrow; instead, we’ll discuss the topic of partial lowering, type conversion, and the role of “cast” operations.
Below are some thoughts to prime the discussion, but I’d love to hear about everyone’s experience!

Composability of lowering strategies in MLIR involves partial lowering, where different parts of a program are lowered through different paths in stages. The lowering of operations composes naturally and does not pose any particular friction. However, type conversions are more challenging: they require materializing explicit conversions in the IR to cast back and forth between partially converted types.

For example, look at this piece of IR:

%0 = read_array_from_file(%filename) : (!toy<"string">) -> (!toy<"array<?x?xf32>">)
%1 = read_array_from_file(%filename) : (!toy<"string">) -> (!toy<"array<?x?xf32>">)
%2 = "toy.mul"(%0, %1) : (!toy<"array<?x?xf32>">, !toy<"array<?x?xf32>">) -> (!toy<"array<?x?xf32>">)

Starting from this IR, which operates on the toy array type, a partial lowering going through a hypothetical tensor type could look like:

%0 = read_array_from_file(%filename) : (!toy<"string">) -> (!toy<"array<?x?xf32>">)
%1 = read_array_from_file(%filename) : (!toy<"string">) -> (!toy<"array<?x?xf32>">)
%2 = "std.mulf"(%0, %1) : (!toy<"array<?x?xf32>">, !toy<"array<?x?xf32>">) -> (tensor<?x?xf32>)

This would break the verifier for the std.mulf operation, which expects to operate on tensors but here receives toy array inputs. In general, this requires introducing some sort of cast operation to fix up the types during partial lowering.

%0 = read_array_from_file(%filename) : (!toy<"string">) -> (!toy<"array<?x?xf32>">)
%1 = read_array_from_file(%filename) : (!toy<"string">) -> (!toy<"array<?x?xf32>">)
%0_cast = cast(%0) : (!toy<"array<?x?xf32>">) -> tensor<?x?xf32>
%1_cast = cast(%1) : (!toy<"array<?x?xf32>">) -> tensor<?x?xf32>
%2 = "std.mulf"(%0_cast, %1_cast) : (tensor<?x?xf32>, tensor<?x?xf32>) -> (tensor<?x?xf32>)

This kind of cast can be seen as a “promise” that the two sides of the cast will ultimately converge to the same type after all partial conversions have finished, and then the cast can be eliminated. Such an operation could be interpreted opaquely in the most conservative way (all possible effects, escaping pointers, etc.).
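To make the “promise” concrete, here is a minimal sketch of such a fold on a hypothetical CastOp: once all partial conversions have converged, both sides of the cast have the same type and the op folds away to its input.

// Minimal sketch on a hypothetical CastOp: when the partial conversions
// have converged, operand and result types match and the cast vanishes.
OpFoldResult CastOp::fold(ArrayRef<Attribute> operands) {
  if (getOperand().getType() == getType())
    return getOperand();
  return {};
}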

In the current state, the only such operation is llvm.mlir.cast, which allows for casting between an LLVM type and a builtin type in both directions. This operation isn’t helpful, however, when types other than the builtins are involved.

The TypeConverter class exposes hooks that can be extended by users to materialize such conversions. Unfortunately, the configuration of these hooks isn’t very convenient or easy to set up in a coherent way in a multi-pass lowering pipeline at the moment. (This issue is arguably separable, but it is still relevant to the discussion.)
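For reference, wiring these hooks up today looks roughly like the sketch below; toy::ArrayType and toy::CastOp are hypothetical stand-ins for a user-defined type and its ad-hoc cast op, and each pass that builds its own TypeConverter has to repeat this setup.

TypeConverter converter;
converter.addConversion([](toy::ArrayType type) -> Type {
  return RankedTensorType::get(type.getShape(), type.getElementType());
});
// Invoked by the framework whenever a converted value must be bridged
// back to the original type (addTargetMaterialization handles the other
// direction).
converter.addSourceMaterialization(
    [](OpBuilder &builder, Type resultType, ValueRange inputs,
       Location loc) -> Optional<Value> {
      return builder.create<toy::CastOp>(loc, resultType, inputs).getResult();
    });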

It also does not necessarily compose well when a single SSA value involves multiple types, like a container: !my.list<!other.type>, where either the container type or the inner type (or both!) can require a conversion.
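Handling the container case means the conversion has to recurse into itself; a sketch, with my::ListType as a hypothetical single-parameter container:

// The lambda captures the converter so the inner type can be converted
// recursively as well.
converter.addConversion([&](my::ListType type) -> Optional<Type> {
  Type element = converter.convertType(type.getElementType());
  if (!element)
    return llvm::None; // The inner conversion failed.
  return my::ListType::get(type.getContext(), element);
});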

Additionally, we want this conversion to be configurable on a per-module basis, so that MLIR modules containing multiple nested modules (as in heterogeneous compilation) work cleanly.

So far, the lack of a general opaque cast means that users must provide these casts in an ad-hoc fashion. This is cumbersome: in some situations the cast operation is only required because of the way that dialect conversion works, and it isn’t always clear to which dialect such ops should be added. For example, the llvm.mlir.cast operation only handles builtin types, and other dialects can’t directly reuse it. This approach does have the advantage, though, of providing immediate verification, as each of the ad-hoc user-provided conversion operations can enforce that the conversion happens only between two supported and “compatible” types. A generic cast would “delay” any lowering errors to the end of the pipeline, when the lack of convergence leaves behind some unresolved casts.
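Concretely, catching that lack of convergence would come down to a late check along these lines (pass and op names hypothetical):

// End-of-pipeline check: any generic cast that survived every partial
// conversion means some lowering never converged.
void CheckUnresolvedCastsPass::runOnOperation() {
  getOperation()->walk([&](GenericCastOp op) {
    op.emitOpError() << "unresolved cast left over after the pipeline";
    signalPassFailure();
  });
}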

Finally, one last issue is one of arity, where 1->N or N->N conversions are required. However, this isn’t a common case, and the conversion framework already has limited support for making such cases Just Work.

Below is a summary of some of the issues related to type conversions and cast operations that have been encountered by real users:

  • The cast operation is often only required by the nature of dialect conversion.
    • In some situations, a cast operation is only temporarily necessary during a single conversion pass and isn’t intended to outlive the pass. Unfortunately, this still requires that users define a custom cast operation just for this step, a requirement that exists only because they used the conversion infra.
  • TypeConverters are currently configured individually by each lowering pass, meaning that conversions and materializations are not shared.
    • This prevents partial lowering from being a viable lowering strategy in many cases, because operands/block arguments/etc. can’t be properly converted.
  • It isn’t clear where each cast operation should live.
    • The location in which the cast operation is placed is often decided arbitrarily. In some cases, it is added to the dialect that can accept a dependency on both input and output types. In other cases, it is placed in the dialect that is “less important”, i.e. the dialect that is “more willing” to take an additional operation/dependency.
  • Cast operations today either have low reusability or just “accept the world”.
    • The llvm.mlir.cast operation takes a somewhat principled approach in that it only handles builtin types. The cost of this is that the cast is only viable in certain scenarios. To get around this, several dialects simply define “any_to_any” casts to satisfy the constraints of the framework.
  • It seems useful to have a concept of a cast-like op that “needs to be eliminated” by matching/folding with other cast ops but doesn’t have independent executable semantics.
    • For example, folding cast_a_to_b(cast_b_to_a(x)) -> x would be the only valid way to lower those ops. This pairing of cast ops with other ops that undo them is always naturally satisfied when using the dialect conversion infrastructure.
    • See discussion in: D91967 [mlir][std] Mark tensor_to_memref as no side-effect
    • Having a unified cast op formalizes these requirements.

Another previous discussion on Discourse can be found in [RFC] Dialect type cast op.

1 Like

Great discussion today! There are definite advantages to having to define cast operations (as @clattner has pointed out in the past), but they are an annoying, boilerplate pain.

I’ve got a specific proposal for the generic ‘cast’ operation. (I mentioned it in the call, but now that I’ve had a chance to think about it…) TL;DR: have a way to make them really easy to create while maintaining the advantages of having to define them as separate ops.

// C++-ish sketch.

template <typename From, typename To>
class UnrealizedTypeConversionOp : public Op<...>;

auto toyConversionOp = UnrealizedTypeConversionOp<toy::ArrayType, TensorType>::build(...);
auto intToFloatCastOp = UnrealizedTypeConversionOp<IntegerType, FloatType>::build(...);

OpBuilder b;
Value tensorValue = b.create<UnrealizedTypeConversionOp<toy::ArrayType, TensorType>>(loc, toyArrayValue);

Said Op template would have folders which recognize the inverse operation (a sketch below). One could also add validation functions (for checking type parameters) to particular template instantiations, as in the two variants that follow the sketch.
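The folder might look roughly like this (fold signature assumed from the single-result hook):

template <typename From, typename To>
OpFoldResult
UnrealizedTypeConversionOp<From, To>::fold(ArrayRef<Attribute> operands) {
  // The inverse instantiation undoes this one, so the pair cancels out.
  Value input = this->getOperand();
  if (auto inverse =
          input.getDefiningOp<UnrealizedTypeConversionOp<To, From>>())
    return inverse.getOperand();
  return {};
}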

class ShapeConversionOp : public UnrealizedTypeConversionOp<toy::ArrayType, TensorType> {
  LogicalResult verify(toy::ArrayType arr, TensorType tensor) {
    // verify dims are compatible.
  }
};

OpBuilder b;
Value tensorValue = b.create<ShapeConversionOp>(loc, toyArrayValue);

Or

UnrealizedTypeConversionOp<toy::ArrayType, TensorType>::addVerifier(
    [](toy::ArrayType arr, TensorType tensor) { /* verify dims are compatible. */ });
// addVerifier is a static function. Not sure I like this way since it applies globally, but it is less code.

By having an op which can be defined inline with conversion/lowering code, we significantly lower the friction. The advantage of using a C++ template is that you get TypeSwitches and all that magic, so you could customize behaviors with the existing infra rather than new code which has to check types dynamically.
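For example, a sketch of such a dispatch (assuming isa/dyn_cast are made sensitive to the template parameters, as discussed further down):

llvm::TypeSwitch<Operation *>(op)
    .Case<UnrealizedTypeConversionOp<toy::ArrayType, TensorType>>(
        [](auto cast) { /* handle array -> tensor */ })
    .Case<UnrealizedTypeConversionOp<IntegerType, FloatType>>(
        [](auto cast) { /* handle int -> float */ });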

ConversionTarget t;
t.addIllegalOp<UnrealizedTypeConversionOp<toy::ArrayType, TensorType>>();
struct RealizeCast : public OpConversionPattern<UnrealizedTypeConversionOp<IntegerType, FloatType>> {
public:
  using OpConversionPattern::OpConversionPattern;

  LogicalResult
  matchAndRewrite(UnrealizedTypeConversionOp<IntegerType, FloatType>, ArrayRef<Value> operands,
                  ConversionPatternRewriter &rewriter) const final;
};

We could add a method to TypeConverter to register patterns to be run after the UnrealizedTypeConversionOp folders are run, to realize those casts which are left.

TypeConverter conv;
conv.addRealizer([](OpBuilder &builder, 
      UnrealizedTypeConversionOp<toy::ArrayType, TensorType> op,
      Value input, Location loc) -> Value { ... });

I think this would end up saving IR creation/deletion by delaying the conversion until MLIR has established that it’s necessary. Delaying this would also avoid having to create potentially complex folders to detect the type conversion ops and their inverses which have already been realized during the type conversion, though I don’t have enough experience to know how common this case is in practice.

I realize this is more typing (like keypresses) than a generic ‘not-really-cast’ operator, but worth it IMO. Is this feasible? The right direction? Poorly explained?

I think you’re onto something, but there is one issue: what dialect does UnrealizedTypeConversionOp<toy::ArrayType, TensorType> belong to, and what is its OperationName?

Recall that FooOp classes are just “wrappers” / “views” around an underlying Operation. Usually (such as for ODS-generated Ops), there is a 1:1 relationship between mydialect::FooOp and Operations with mydialect.foo as their OperationName. But that doesn’t need to be the case.

For example, ConstantIndexOp is a view (+ associated builders/helpers) around an std.constant. It’s just a random C++ class though. For example, if you have an OpConversionPattern<ConstantIndexOp>, it will still match all Operations that have OperationName == “std.constant”. We can certainly define a custom classof that would make dyn_cast / TypeSwitch work, and, with a suitable extension, could be respected by the dialect conversion infrastructure (though I don’t know if that’s a desirable extension @River707).

There is also the problem of how to print an UnrealizedTypeConversionOp<toy::ArrayType, TensorType> (which again comes back to what the underlying Operation looks like).

One can imagine a representation like:

"dialect_conversion.unrealized_type_conversion"(%from) {
  from = !toy.array<2x2xf32>, to = tensor<2x2xf32>
} : !toy.array<2x2xf32> -> tensor<2x2xf32>

Printing’s easy. Parsing is the tricky part, since we’d need some way to “instantiate” a template from runtime input. I think the trick would be to have all the template instantiations register themselves with a central parser, which could generate an identifier and associate that identifier with the necessary information about how to construct it.

This would be easier to implement if they were all in the same dialect, to answer your first question.

Wordy!

Actually, if they all belonged to the same dialect (dialect_conversion in this example), I think we could infer the type based on the type signature alone in the common (custom type) case: dialect_conversion.unrealized_type_conversion %value : !toy.array<2x2xf32> -> tensor<2x2xf32>.

All the types written in the ODS typedef have a getMnemonic static function. (E.g. static ::llvm::StringRef getMnemonic() { return "array"; }.) In the UnrealizedTypeConversion parsing registration, we could use that method to get the necessary keywords to have the following syntax in the common or pretty case:

%tensorValue = dialect_conversion.unrealized_type_conversion %arrayValue : !toy.array<2x2xf32> -> tensor<2x2xf32>

I didn’t realize this was string-based. I guess that’s to support opaque ops (ops which aren’t linked into a particular binary)…?

Actually, I think it’s easier than that. Just run the type parsers as per usual, then use the type cast metadata from the parsed type instances to look up the template instance to build. I think this would work for all cases.

There’s something about what you’re saying (like “instantiate” a template from runtime input) that makes me feel like we’re not on the same page about how ops are registered/parsed/printed and the distinction between Operation and Op. There are two things:

  1. The Operation (which is keyed off an OperationName), and how it is printed and parsed. There is exactly one OperationName per AbstractOperation registered for each Dialect.
  2. The FooOp classes in C++ that provide sugared views for C++ API’s into the Operation, builders, etc.
    a. Note: we currently lump printing/parsing (strings <-> Operations) into these classes, but that is not essential; it’s just convenient to template metaprogram this with the FooOp classes. It all boils down to a bunch of function pointers in AbstractOperation that are completely independent of the FooOp class. These are properties of the Operation that are independent of whether you are using Python, C++, Swift, or raw C APIs to access them – they are about the isomorphism between the Operation/Region/Block data structures and the textual representation.

These are completely separate. So there are two things to define for this op.

  1. What this looks like at the Operation level. I think the dialect_conversion.unrealized_type_conversion op I presented seems plausible. Specifically, this op has a TypeAttr to and a TypeAttr from which must match the operand and result types. It parses the two types and sets the attributes appropriately. Also, there is a single dialect_conversion.unrealized_type_conversion OperationName registered in the dialect_conversion dialect. (We could put it in any other dialect though.) The pretty IR form would just be unrealized_type_conversion %from : !toy.array<2x2xf32> -> tensor<2x2xf32> and the parser would set the attributes properly.
  2. The UnrealizedTypeConversionOp<T, U> classes (which all inherit from the ODS-generated UnrealizedTypeConversionBaseOp with no template parameters), which wrap the underlying unrealized_type_conversion op and make isa/dyn_cast sensitive to T and U; see the sketch below.
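A sketch of that sugared view, assuming the base op has exactly one operand and one result:

template <typename From, typename To>
class UnrealizedTypeConversionOp : public UnrealizedTypeConversionBaseOp {
public:
  using UnrealizedTypeConversionBaseOp::UnrealizedTypeConversionBaseOp;

  // Make isa/dyn_cast sensitive to the template parameters by also
  // checking the operand and result types.
  static bool classof(Operation *op) {
    auto base = dyn_cast<UnrealizedTypeConversionBaseOp>(op);
    return base && base.getOperand().getType().isa<From>() &&
           base.getType().isa<To>();
  }
};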

I’m not familiar with the details of how things are implemented.

I was aware of this. What I was assuming is that one needed to parse and return a “sugared view” in order to have the neat C++ selection stuff which is keyed off those views. Like

conv.addRealizer([](OpBuilder &builder, 
      UnrealizedTypeConversionOp<toy::ArrayType, TensorType> op,
      Value input, Location loc) -> Value { ... });

would run that lambda on UnrealizedTypeConversionOp<toy::ArrayType, TensorType> operations only if those ops were parsed into the UnrealizedTypeConversionOp<toy::ArrayType, TensorType> sugared view. Now that I’m thinking about it, the sugared view is a value type pointing to the Operation, so that type information isn’t stored there. facepalm

Now I think we’re on the same page. This is much easier than I thought.

Yeah, what you’re saying sounds reasonable. I would like to infer the to/from attributes when parsing and elide them during printing. (I was thinking you were proposing an assembly format, thus my comment.) Do we even need the attributes?

When you say “sensitive to T and U”, do you mean it should not match if they’re different? The other way I interpreted it when I first read it was probably wrong. If so, this seems like a reasonable approach.

Another approach is not to have a base class at all and do everything in the template – each template instance would be a completely different, separate Op so you wouldn’t have to worry about isa/dyn_cast as it would just fall out, I think.

I don’t know enough about the internals of MLIR to reason about which is easier and what the implications of each are. I was imagining the approach I outlined since it would look like separate ops (I think) and thus have zero bizarre interactions with the internals of MLIR. But I’m really not familiar with MLIR innards.

I feel the discussion is getting side-tracked: making it easy to add a new cast op isn’t the hard part in my opinion. You can get there with attributes and achieve the same effect. For example, if you have this op:

%to = cast(%from) { from : !some.type, to : !another.type } : (!some.type) -> !another.type

with a verifier that checks that the type signature matches the attributes, what’s missing from the C++-template-generated op?
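That verifier would be just a few lines, e.g. (assuming from/to are TypeAttrs with generated accessors returning Type):

static LogicalResult verify(CastOp op) {
  if (op.getOperand().getType() != op.from() || op.getType() != op.to())
    return op.emitOpError(
        "operand/result types must match the 'from'/'to' attributes");
  return success();
}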

So when you’re there what do you get? How are you actually plumbing this in the framework? “Some piece of code” has to create these and has to interpret/update them all along the pipeline.

I’m sorry I missed the talk due to a conflict, I look forward to the video.

I agree with all of the problems you laid out. More fuel for the fire:

This also causes library dependency issues. In the FIRRTL/RTL world for example, I have some casts put into the “less important” FIRRTL dialect. However, they depend on types in the RTL dialect, which makes the FIRRTL dialect libraries depend on the RTL dialect library because of types. This is really bad – one dialect shouldn’t depend on another dialect just because there exists a lowering between the two of them.

The fix for this is to define a third dialect (not ok for just casts) or drop verification of the constraints (which is the least bad thing).

Yes, I now agree that a unified generic cast op (e.g. in the std or builtin dialect) is the way to go. I think the key here is to name it something explicit like “std.partial_lowering_cast”, not something appealing to “reuse” like std.cast.

I would recommend having a fold impl that turns (cast (cast x to T1) to T2) into (cast x to T2) unconditionally, and doing no verification of the source/dest types. If someone wants something more specific, they can define their own ops.
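A sketch of that fold, using MLIR’s in-place folding convention (op name hypothetical):

// (cast (cast x to T1) to T2) -> (cast x to T2), unconditionally.
OpFoldResult PartialLoweringCastOp::fold(ArrayRef<Attribute> operands) {
  if (auto producer = getOperand().getDefiningOp<PartialLoweringCastOp>()) {
    // Rewire our operand past the producer cast; returning our own result
    // signals an in-place fold.
    getOperation()->setOperand(0, producer.getOperand());
    return getResult();
  }
  return {};
}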

-Chris

1 Like

Yes, this is a big motivation as to why I’m pro as well. I’ve seen way too many users end up in these situations and I cringe every time. I tried not to be too negative when writing that bullet, but I kept having flashbacks of code reviews adding awkward cast operations.

I have a moderate preference to not put this in the standard dialect. I think putting it there hurts reusability, given that you then have to take a dependence on the standard dialect. Whether we should put it in the builtin dialect or not, I don’t know, but given how widely usable and applicable this operation is, I’d like to make it as easily accessible as possible. For example, I would expect many users to just use this operation when using dialect conversion (I actually wouldn’t mind being able to use it as a default with Dialect Conversion in some cases).

+1 to this point. I’m not really sure what the templates are supposed to get you, or how they help solve the problems at hand.

– River

If we go with the “partial_conversion_cast” approach of an opaque op that just folds away (and is never expected to be “lowered”), then the template parameter sugar Op isn’t useful. All it would do is encourage folks to start assigning “special semantics”. E.g. “in my compilation flow CastOp<toy::ArrayType, TensorType> has these executable semantics that I’ll lower to” – we don’t want that.

(I was mainly helping @jdd formulate their idea more precisely)

Also, partial_lowering_cast %x : !my.T -> !my.T (that is, same source and dest type) will fold away. Without that fold, we don’t have a way to ultimately eliminate partial_lowering_cast ops.

Since it seems that most are in favor, I’ll draft up a revision where we can discuss the exact description/semantics/folding/etc.

2 Likes

They get you the ability to specify legality via ConversionTarget without resorting to a dynamic legality check which inspects the operation’s types, if you want to specify specific casts which can persist. E.g.:

  ConversionTarget tgt(*ctxt);
  tgt.addIllegalOp<PartialLoweringCast<toy::ArrayType, TensorType>>();

Instead of

  ConversionTarget tgt(*ctxt);
  tgt.addDynamicallyLegalOp<PartialLoweringCast>([](PartialLoweringCast op) {
    // Legal except when casting from a toy array to a tensor.
    return !(op.getType().isa<TensorType>() &&
             op.input().getType().isa<toy::ArrayType>());
  });

They also give you the ability to use OpConversionPattern for cases where lowering the cast itself is well defined. This could result in less IR being generated/erased, since the PartialLoweringCast folder would eliminate the redundant casts and you would only lower the necessary casts. It sounds like everyone thinks allowing/encouraging this (instead of having people define custom ops to do the same) is a bad idea. I don’t have enough experience to have an informed opinion here.

I just sent out two revisions to use as a starting point for review. We can further refine semantics/descriptions there:

CastOpInterface: D94831 [mlir] Add an interface for Cast-Like operations
PartialConversionCastOp: D94832 [mlir] Add a new builtin `partial_conversion_cast` operation

Thanks
– River

1 Like

+1 I agree. Thanks all!