Exposing LLVM ops to the MLIR type system

A recurring design point keeps surfacing when trying to expose LLVM ops to MLIR types.

Most recently, this is exposed by:

  1. https://reviews.llvm.org/D92172 for sve.vscale : index -> llvm.vscale.i64() : i64
  2. https://reviews.llvm.org/D92216 for std.inline_asm -> llvm.inline_asm.

The general comment is that we should not bring LLVM representations to the non-LLVM parts of MLIR without a deeper design discussion; otherwise this just becomes a copy of the LLVM dialect with the MLIR type system. This point is very valid and I would love it if we could have a generic way of improving the situation here.

Now in practice, few people work on both LLVM and “something else” (SPIR-V, LLO, …) and being able to generalize abstractions such as “inline asm” to different targets requires practice and iteration. This only comes with building end-to-end systems that run: it is very easy to overlook deep fundamental issues that only arise when the SW rubber meets the HW road.

How can we proceed here to make the most efficient use of resources? I see a few options:

  1. add a) a matching abstraction to an MLIR dialect operating on MLIR types (e.g. std, vector or target-specific HW), b) a lowering to the LLVM representation, c) ensure the abstraction runs end-to-end. Then over time, build experience and refactor, rinse / repeat: this is an evolutionary process.
  2. have LLVM dialect ops also accept MLIR types and have a generic conversion pattern to convert the types.
  3. have a dedicated LLVM “staging” dialect for ops that are 1-1 with LLVM and that we expect to generalize and graduate to higher-level dialects over time.

I have been operating on the assumption that 1. is good for velocity but it obviously comes with issues (copy-pasta of abstractions, bloat, YAGNI generalization, …).

Option 2. may be limited by circular dependencies between dialects and likely limits the early attempts at generalizing semantics. I.e. if one wants to generalize semantics, one first creates a new op so it’s almost back to option 1.

Option 3. seems to only shift the problem from Option 1. to a more self-contained dialect (which may still be viewed as a win).

Do people see other options?

Generally, how can we improve the “just get it connected for now” aspect and improve velocity?
I strongly believe we do not need to solve all generalization problems at the same time to make progress and that iteration speed is of the utmost importance: learning needs a gradient.

Thanks!

@River707 @ftynse @joker-eph @stellaraccident

What about the option of inserting casts back and forth that just fold away during lowering? Or one could effectively have an “island” op with a region that does that, which one would merge and which finally gets nuked if no conversions are needed.

“Matching” also doesn’t mean exact. If it is exact and only “equivalent” types need to be managed, then that seems like what casts do. It’s not really a new concept being modelled, it is a type conversion.
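
Roughly, a sketch (using the llvm.mlir.cast op discussed further down; the surrounding op spellings are just illustrative):

// Sketch only: casts bridge builtin and LLVM dialect types around an
// LLVM dialect op; once lowering unifies the types, the casts fold away.
%0 = llvm.mlir.cast %arg : i64 to !llvm.i64
%1 = llvm.mul %0, %0 : !llvm.i64
%2 = llvm.mlir.cast %1 : !llvm.i64 to i64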

I’m not sure I follow this: if you have, say, addi in LLVM dialect also accept i32, then how would you introduce circular dependencies? Now if you want it to work on sparse tensors or some complicated data structures, then yes, but for those you’d have to lower them first to types & ops where there is a 1:1 mapping to LLVM types. So the dependency is on the lowering rather than on the dialect, and it would be part of the pass to lower the dialect to LLVM; I don’t see how it causes a circular dependency. If you generalize then yes, you create a new op (as it is generalized :slight_smile:) and you’d change the lowering. But the old op still remains. This only works for types where we have an exact 1:1 mapping, though, and would require relaxing the dialect to take types that aren’t valid to be exported directly (as we do for TFLite too). It does mean the dialect serves two purposes rather than being just an export mapping, but the conversion target can still be used there.
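
Concretely, the relaxation might look like this (a sketch, not current syntax):

// As today: LLVM dialect ops only accept LLVM dialect types.
%0 = llvm.add %a, %b : !llvm.i32
// Relaxed: the same op also accepting the builtin type.
%1 = llvm.add %c, %d : i32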

In the CIRCT world we have a somewhat analogous situation with SystemVerilog, the universal IR that all things must speak to (but which is also pretty yucky to deal with). While it is still very much a work in progress, we’re trending towards having an “sv” dialect that directly models SystemVerilog.

This raises exactly the question you raise, which is “how do you get it, and how do we factor the code that targets it?”. There are a couple of things that are emerging:

  1. We have an RTL dialect, which serves a similar purpose to the standard dialect for imperative logic.
  2. We have a lowering pass that converts to the SV dialect. It is hard coded right now, but I expect it to be eventually parameterized through op interfaces (getting passed a structured type like the AsmPrinter hooks), to allow dialects to be independently defined but register their lowering to the SV dialect.

I think this pattern would work just as well for the LLVM IR world.

-Chris

Not to hijack the discussion, but I hope that we can leverage dialect hierarchy too, rather than forcing every dialect to lower to SV through such a hook mechanism. Maybe this is analogous to what people have suggested for the MLIR standard dialect: breaking it into more orthogonal pieces that all live at more or less the same level of abstraction. These ‘standard dialects’ would tend to implement similar interfaces for lowering to LLVM, in the same way that ‘RTL dialects’ would implement interfaces to lower to HDLs. This allows higher-level dialects to lower to LLVM/HDL through standard/RTL dialects.

I think we can answer this question, at least partially, if we clarify the goal of the LLVM IR dialect itself. If it intends to model LLVM IR inside MLIR as truthfully as possible, then it should not accept anything but LLVM IR dialect types and should have strictly the same semantics. If it is more of an abstraction that is similar to LLVM IR but intended only as a convenient lowering vehicle, we can allow for more flexibility.

If we go with the second option, it may be considered at odds with MLIR’s design philosophy that encourages all non-trivial processing to happen within MLIR so that translations can be trivial. This can be solved by having the translation accept only a trivially translatable subset of the LLVM IR dialect, e.g. it could be fine for the LLVM IR dialect to work on a subset of standard types as long as the translator doesn’t need to care about this type conversion. In practice, we already have a “legalize for translation” pass that can accommodate such preparatory conversions; today it inserts extra blocks if a predecessor of some block appears more than once (this cannot be expressed with PHI nodes otherwise).
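
For example, the block-duplication case looks roughly like this (a sketch):

// Before: ^bb1 appears twice as a successor of the same terminator; LLVM
// PHI nodes select values per predecessor block, so this is ambiguous.
llvm.cond_br %cond, ^bb1(%a : !llvm.i64), ^bb1(%b : !llvm.i64)

// After legalization: a fresh block forwards one of the edges.
llvm.cond_br %cond, ^bb1(%a : !llvm.i64), ^bb2
^bb2:
  llvm.br ^bb1(%b : !llvm.i64)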

I have been consistently opposed to having the LLVM IR dialect diverge from LLVM IR. My main reason is being able to keep the same semantics as LLVM IR as much as possible rather than redefining them in MLIR. Otherwise, we would have to keep track of semantic changes in LLVM IR (and also in the intrinsics that are included in the dialect) and in the built-in types, and check/update the dialect every time one of the two changes. Imagine if MLIR decided to remove signless integer semantics. Or, more realistically, how should we treat add nsw nuw %0, %1 : index if we (a) don’t know the size of the index type and (b) don’t have poison semantics for it in MLIR. That being said, I am not fundamentally opposed to such relaxation, as I expect both built-in types and LLVM IR to be stable enough, as long as there is some way of checking that the semantics are still okay that doesn’t involve the dialect maintainer checking it over and over again.

In the meantime, since you mention development velocity, I would consider Jacques’ idea of using the casts that we already have as llvm.mlir.cast and/or a type-cast-region operation, and see what additional challenges this poses. Today, the cast operation lives in the LLVM IR dialect. Arguably, it’s okay for it to depend on built-in types (“standard” types are not part of the standard dialect). It would not be okay for it to depend on the standard-to-llvm conversion, but that can be solved by factoring out the type conversion (which has nothing to do with the standard dialect) into a separate library. I can expect dependency issues in, e.g., canonicalization if we need to handle llvm.vscale wrapped in llvm.mlir.casts in an SVE dialect canonicalization pattern, which would create a dependency from the SVE dialect to the LLVM IR dialect. This sounds like a problem that is not specific to the LLVM IR dialect but to the infrastructure in general. Any other potential problems?

@jpienaar’s suggestion to use type casts appears to be the “thinnest” solution to me while maximizing op reuse and allowing mixing of dialects. A successful dialect/type conversion would turn the higher to lower type casts into identity ops (making them disappear), and the lower to higher type casts would always get erased. If it’s not a partial conversion that’s being implemented, the type casts could be added to the illegal target set I think. The islands approach (used for the MLIR TF dialects) is more complex in comparison and may hurt debugging experience whenever the lowering fails.

Why can’t we just create llvm.vscale.i64 / llvm.inline_asm at the point where we are creating this IR? Why do we need to go through an intermediate dialect?

Because I think at that point you just have MLIR standard types and not LLVM dialect types. The LLVM dialect ops are restricted to the latter types.

It’s a little ugly, but I think that functionally we only need a handful of cast ops to bridge the two. Maybe with a pseudo op that provides some sugar around this it wouldn’t be too bad. Or we can formalize something like Alex’s suggestion of a “trivially translatable subset” to make this easier. If we can smooth the ergonomics around basic integer, floating-point, and vector types (which seem “trivially translatable”), I think the pain will be greatly reduced.

Even with that, one thorny issue is sve.vscale : index -> llvm.vscale.i64() : i64 which potentially requires committing to index==i64, which seems to make the translation not “trivial”. Not sure how to resolve that.
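
Spelled out, the problematic pattern is (a sketch, borrowing the spelling from the review):

%0 = "llvm.vscale.i64"() : () -> !llvm.i64
// The cast below is only correct if index lowers to i64 on the target.
%1 = llvm.mlir.cast %0 : !llvm.i64 to index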

Thanks for your replies.

It seems that for ops that are meant to remain low-level, such as llvm.inline_asm, just using the LLVM form with casts is fine, so let’s isolate that class of ops that do not need special MLIR behavior (i.e. OpInterfaces or special verification, e.g. on types).

There remain at least 2 classes of ops I can see:

  1. ops that need to be understood by other parts of the codegen stack (e.g. vscale)
  2. ops that benefit from better semantics at the MLIR level than just an MLIR type or attribute -> LLVMType mapping

Generally, I am skeptical that we can define and use the op “at a distance” by only defining the LLVM instruction operating on LLVM types + extra casts. It seems to invite immediate issues with at least OpInterfaces: what do you attach them to, and do they need to “traverse the type system”?

One thing I am wondering is whether we can have a single ODS op in the LLVM dialect that can support both MLIR-y ODS and LLVM-y ODS. To make it concrete, does it seem reasonable to turn these 2 ops:

def LLVM_vector_scale :
  LLVMArmSVE_NonSVEIntrUnaryOverloadedOp<"vscale">;

def VScaleOp : SVE_Op<"vscale", [NoSideEffect, VectorParametricTileOpInterface]> {
  ...
  let arguments = (ins);
  let results = (outs Index:$res);
  let assemblyFormat = ...
}

into a single op that would have both the MLIR and LLVM specification by literally merging the 2 together.
This would only be for types/attributes that convert trivially (I would put index in that box: given a target, the index bitwidth is determined and the conversion can do the right thing) so that the conversion from MLIR form to LLVM form can be autogenerated. Such a mechanism could already carry ops that are almost 1-1 and that we don’t want to leak into other dialects. The price/generalization effort does not seem too steep.

I’ll assume the above is reasonable; now how about this case: can we just merge these 2 ops into a single op?

// pasted for context
//class LLVMNeon_IntrBinaryOverloadedOp<string mnemonic, list<OpTrait> traits = []> :
//  LLVM_IntrOpBase</*Dialect dialect=*/LLVMNeon_Dialect,
//                  /*string opName=*/mnemonic,
//                  /*string enumName=*/"aarch64_neon_" # !subst(".", "_", mnemonic),
//                  /*list<int> overloadedResults=*/[0],
//                  /*list<int> overloadedOperands=*/[], // defined by result overload
//                  /*list<OpTrait> traits=*/traits,
//                  /*int numResults=*/1>;

def LLVM_aarch64_neon_smull :
  LLVMNeon_IntrBinaryOverloadedOp<"smull">, Arguments<(ins LLVM_Type, LLVM_Type)>;

def SMullOp : Neon_Op<"smull", [NoSideEffect,
  AllTypesMatch<["a", "b"]>,
  TypesMatchWith<
    "res has same vector shape and element bitwidth scaled by 2 as a",
    "a", "res", "$_self.cast<VectorType>().scaleElementBitwidth(2)">]> {
  let summary = "smull roundscale op";
  let description = [{...}];
  // Supports either:
  //   (vector<8xi8>, vector<8xi8>) -> (vector<8xi16>)
  //   (vector<4xi16>, vector<4xi16>) -> (vector<4xi32>)
  //   (vector<2xi32>, vector<2xi32>) -> (vector<2xi64>)
  let arguments = (ins VectorOfLengthAndType<[8, 4, 2], [I8, I16, I32]>:$a,
                       VectorOfLengthAndType<[8, 4, 2], [I8, I16, I32]>:$b);
  let results = (outs VectorOfLengthAndType<[8, 4, 2], [I16, I32, I64]>:$res);
  let assemblyFormat =
    "$a `,` $b attr-dict `:` type($a) `to` type($res)";
}

If the answer is no and we really need 2 ops then we might as well also have them in 2 dialects.

If it seems reasonable to have a single op with all these attributes, then the problem of auto-generating kicks in: the LLVM_aarch64_neon_smull part can be pretty easily generated, but not the SMullOp part.
We can think of an extra level of Tablegen indirection so that we can specify these things better, but it seems we would still have to specify quite a few things manually. At that point, is it still worth it or not?

To sum up, in the 3 examples above, I’d lean towards:

  1. inline_asm, just use casts
  2. vscale-like 1-1 ops with need for OpInterfaces, extend the tablegen infra
  3. more advanced use cases: I don’t see a clear way to achieve them, and I’m really not willing to trade the nice MLIR type error messages for runtime assertion errors :stuck_out_tongue:

I would consider vscale as kind of special, since that is only 1 or 2 ops.

For ops like smull, where there are hundreds/thousands of such ops, I think we want to establish some conventions / a single point of truth for having LLVM_aarch64_neon_smull and a nicer wrapper SMullOp op. For npcomp, when we generate the ATen dialect (analogous to your SMullOp), we also add various op interfaces to it.

For the smull case, I think what we want is a single source of truth that generates:

  1. A raw LLVM wrapper op.
  2. An “mlir native” version of the op (possibly with certain domain-specific interfaces added as part of the generation process).
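
In IR, the two generated ops might look something like this (spellings are illustrative, not actual upstream syntax):

// 2. the "mlir native" op, on builtin types
%native = neon.smull %a, %b : vector<8xi8> to vector<8xi16>
// 1. the raw LLVM wrapper op, on LLVM dialect types
%raw = "llvm_neon.smull"(%x, %y)
    : (!llvm.vec<8 x i8>, !llvm.vec<8 x i8>) -> !llvm.vec<8 x i16>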

So it would still be 2 ops and 2 dialects instead of a single llvm.smull that can take both MLIR and LLVM types and adapt accordingly?

In that case, the “single source of truth” has to specify everything I wrote in def SMullOp and only saves typing this:

def LLVM_aarch64_neon_smull :
  LLVMNeon_IntrBinaryOverloadedOp<"smull">, Arguments<(ins LLVM_Type, LLVM_Type)>;

Additionally, note that all these ops can be autogenerated from LLVM’s definition (the actual source of truth), which seems preferable to me.

We can add a little more tablegen sugar to reduce the number of characters one has to type but the tradeoff seems to just be more indirection via tablegen.

Is it worth it?

A more involved (but more automatic) approach could be to just embrace and reverse-engineer AArch64InstrInfo.td and AArch64InstrFormats.td, and fully generate the 2 types of MLIR ops?

If we are fine with very incremental progress (e.g. extended on a per-need basis), then this could be a workable solution?

I cannot follow what you mean here. There seems to be a use case you are referring to that I have no knowledge of. Where do the interfaces come in? Could you elaborate?

Nothing precludes LLVM dialect ops from implementing interfaces. FWIW, we can actually have an interface that can let you “see through” the llvm.mlir.casts feeding the op to get the actual values if necessary. This interface can be attached to LLVM dialect ops for which this behavior is desirable.

Fundamentally, I don’t see a problem of having an LLVM op surrounded by casts. There are practical problems with dependencies and potential tripling of the IR size because of forward/backward casts.

I don’t understand what you mean here either. The LLVM dialect definition uses normal MLIR ODS, just makes it a bit less verbose by adding several helper classes. The only difference is the llvmBuilder field (the logic handling overloaded intrinsics is necessary to set this field up), which the regular -gen-op-definitions safely ignores. Do you want to generate two C++ op classes per definition in LLVM dialect ODS? This requires a different backend, and small changes to the ODS itself. Or do you want to generate a different ODS from the LLVM dialect ODS? This requires a completely new backend, but no changes to ODS.

Do you want separate ops definable with less pain or do you want one op that takes different types, after all?

Why aren’t you doing this to start with? The intrinsic generator is available and we should just plug it into the build system. It can be improved to, e.g., generate more specific type constraints and/or ops that operate on standard types.

I wouldn’t go that far. The relation between target instructions and intrinsics is not clear to me and, what’s worse, out of our control. So unless MLIR targets instructions directly, I would much rather build ops on intrinsics and let LLVM deal with the correspondence between the two.

I don’t have a concrete use case yet but I expect that vector ops exposed to the MLIR type system will grow interfaces to talk to vector transforms or lowerings. In the case of vscale, I can see it becoming an op interface that we would use to vectorize to scalable vectors and that would have additional properties (e.g. non-negative index).

I know we can do it, but should we?
Even without going to OpInterfaces, just thinking about traits, this seems to break locality properties, e.g. of type checking and location info / error messages.
Do you foresee this being generally usable and desirable?

I mean what I wrote: “To make it concrete, does it seem reasonable to turn these 2 ops into a single op that would have both the MLIR and LLVM specification by literally merging the 2 together.”

To spell it out more, for the vscale case, and assuming we were to bring it to the LLVM dialect (pardon the non-actual Tablegen compliance, add pounds of sugar as you see fit):

def VScaleOp : 
  // Generates relevant LLVM-related properties (e.g. llvmBuilder), only 
  // "used" if any of the types is an LLVMType.
  LLVM_Spec<[SomeLLVMTableGen<"vscale">]>,
  // Generates relevant MLIR-related properties, only "used" if any of the
  // types is a non-LLVMType.
  MLIR_Spec<[
    SomeMLIRTablegen<"vscale", [NoSideEffect, VectorParametricTileOpInterface]>]> {
    ...
    let arguments = (ins);
    let results = (outs Index:$res);
    let assemblyFormat = ...
  }

The op could then be used either as:

%1 = llvm.vscale() : index
// or 
%42 = llvm.vscale() : // llvm.i32 or llvm.i64 depending on target

with the proper parser, printer, verifier and traits/operands being “activated” depending on the type of some operand/result (of course, all types have to be consistently in LLVM or non-LLVM form).

Unclear, depends on tradeoffs and usage.

For ops one would traditionally mirror from LLVM to std, it could be nice to have a single op with different types. This would allow using new LLVM ops without having to argue whether they really need to exist in std or its upcoming spawns. Tentatively, I would say we could put the “trig” or “shift” ops in that category. I am unclear whether it is worth building the infra for this now; if vscale is compelling enough as a first use case, that’s good; otherwise it will slip again to the next “justify this op in std” instance.

For backend-specific ops that match 1-1 with intrinsics, I am leaning towards the simple route first, build usage and then think of refactoring when we have compelling evidence that it improves the system.

The part it currently automates is 2 lines; I didn’t feel like adding extra complexity to a PR that is already large enough.

That could work, the tradeoff is still about hiding more stuff behind more tablegen indirection. I’m not looking to drive innovation on this topic, if you have ideas how to capture the SMullOp definition more elegantly, I’ll happily use it.

LLVM dialect ops are also ops, they can implement interfaces and be processed by passes. We are not doing much with them right now because, historically, LLVM dialect is a lowering sink and we expect LLVM to do the job, but there is no fundamental reason why we should not.

Now, if you want additional semantics restrictions on an op that is not present in the LLVM instruction/intrinsic semantics, I would argue you most likely want a new op. The cost of having a different op with similar-but-different semantics + conversion that may reject some ops is arguably lower than having LLVM-like ops with semantics “same as LLVM, but also supports X, in which case the difference is Y” and having to handle them.

I still don’t see a problem. Each op checks its own types – the “payload” op checks, e.g., that types match, and the casts check that the source and target types are connected by a conversion. They already do.

I don’t have a use case for the see-through interface, but I do expect more casts to show up as we make the conversion more progressive.

vscale is a special case to me because it’s tied to the vector type. As implemented in LLVM IR, the op makes sense. We don’t have scalable vectors as a built-in type in MLIR. If we add scalable vectors as a type, it totally justifies also adding operations to work with them, in std, vector or whatever other dialect. The equivalent operation would need to support and evolve with the rest of MLIR. Maybe we’ll want multiple scaling factors because we have nD vectors. Maybe we’ll want vector.vscale to work on tensor<* x vector<...>> and return tensor<* x index>. We should not push additional MLIR-specific stuff onto an LLVM dialect op. It’s the same reason why we have addf instead of reusing LLVM’s fadd.

Do you mean add, addf/fadd, br and other “core” ops?

I wouldn’t expect many new LLVM IR instructions in the foreseeable future.

Here’s my counter-proposal. The built-in type compatibility feature seems necessary for ops that mirror target-specific intrinsics in LLVM IR. The goal, as I understand it, is to be able to emit those intrinsics early in the pipeline, before the rest is converted to the LLVM dialect. In this case, define a new dialect for such target-specific intrinsics that uses LLVM dialect types as well as built-in types. Such ops should go to a new dialect anyway (like NVVM, ROCDL and LLVM_AVX512). This removes most of the issue for me: instead of overhauling half of standard and almost all of the existing LLVM dialect, as has been proposed, this is restricted to a new dialect that the rest of standard and LLVM need not care about.
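
For example (a sketch; the dialect/op spelling is hypothetical), such a dialect could accept both kinds of types directly:

%0 = sve.vscale : index        // builtin result type
%1 = sve.vscale : !llvm.i64    // LLVM dialect result type, same op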

Well, if you have 2 ops, it’s probably not worth investing into infra.

From another angle: could we unify more of the builtin types with the LLVM dialect?

While LLVM has some specific constructs (struct/union/array/…), floats, integers, and likely vectors, on the other hand, aren’t different.
We used to have an LLVM dialect that was hermetic in its type system because we were re-using the actual LLVM types and wrapping an LLVMContext. This is no longer the case, and I wonder if we could just use the builtin float/integer/vector types in the LLVM dialect?
Would that solve most of the use cases @nicolasvasilache is looking for?

If we s/vector/1-d vector/ and unified the types, then I think it would simplify a bunch of things indeed!

Good point on the historical design restriction! I don’t see a reason for maintaining llvm.i32 vs i32. Using the builtin MLIR vector type for all fixed-size vectors in LLVM also seems palatable, and could assuage a large part of the concerns here.

Well, I actually discussed this exact point in the RFC on reimplementing LLVM dialect types. I am still of the opinion that trying to use builtin types in the LLVM dialect creates significantly more complexity than it removes. Here are some examples:

  • MLIR integers can be signed, unsigned and signless; LLVM IR integers are signless only. This is a strict superset and can be handled by the verifier. Well, the op verifier, because we don’t have type verifiers.
  • LLVM IR supports floating point types that MLIR does not - x86_fp80, ppc_fp128, etc. So we will end up having llvm.fadd %0, %1 : f32 and llvm.fadd %0, %1 : !llvm.x86_fp80 and needing to remember if we should use a builtin type or a dialect type for a specific bitwidth. This starts getting awkward.
  • MLIR has nD vectors but no scaling, LLVM IR has 1D vectors and scaling. We will have to mix LLVM dialect vectors and MLIR vectors + additional restrictions on the latter, unless somebody is willing to propose scalable vectors as a built-in type. This is getting worse.
  • LLVM types can contain other types. Now, we have !llvm.struct<(i32, float, x86_fp80)> and all the inner types are LLVM dialect types. If, instead, i32 were a builtin type, we would need extra logic in many places where container types are used, and in the types themselves, to differentiate whether a builtin or a dialect type is used. Types also don’t have verifiers, so almost any construction of a container type will need to be guarded against a nullptr result (even if we could create a wrapper LLVMOrBuiltInIntegerType, signedness semantics is not encoded in the C++ type, and LLVM container types would reject non-signless types by producing nulls).
  • The syntax for containers will explode; it currently uses the fact that nested types are implicitly !llvm.
  • The translation to LLVM IR will have to handle both kinds of types, and we explicitly wanted the translation to be simple. (We could have a big “legalization before export” pass that changes types to LLVM, but that would mean we also keep all LLVM types). This was actually the entire point of creating the LLVM dialect rather than translating Standard directly, if you want a historical perspective.

I am open to considering builtin type values in a dialect based on intrinsics, which mostly use simple types, as suggested above, but not to overhauling the type system.

FWIW, I think we should just extend the MLIR builtin types to cover these; it is inevitable that someone will care about them some day. Alternatively, just say that the LLVM dialect doesn’t support them at all until someone cares, and use that as the forcing function.

-Chris