[RFC] Extend Linalg named operations for arbitrary element types

This is an early-stage RFC whose purpose is to find out whether
support for arbitrary element types in Linalg named operations
is wanted by the community at all (or if the limitation to the currently
supported types is intentional) and to discuss possible directions to
add support for arbitrary types.

Background

Linalg named operations are currently limited to tensors
and memrefs composed of floating point, integer or index elements;
using any other element type triggers an assertion.

Checks for these types are hardcoded, but thanks to the abstractions
used for the definition and implementation of Linalg named operations, only
a few places need to be modified to extend the set of supported element
types. In particular, supporting a new type requires a modification of
the helper methods from RegionBuilderHelper in
mlir/lib/Dialect/Linalg/IR/LinalgOps.cpp, which are invoked by the
region builders generated from the abstract YAML representation for
proper creation of appropriate scalar operations.

However, while this might be a viable approach for a small set of
built-in types, modifying Linalg itself does not seem reasonable for
users of MLIR that wish to add support for their own, externally
defined types.
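
To illustrate the limitation, here is a small, self-contained Python model. It is not the actual RegionBuilderHelper code; the function `build_add` and the example type names are hypothetical and only mimic a helper that dispatches on a hardcoded set of element types.

```python
# Toy model of a region-builder helper with hardcoded element types.
# All names are hypothetical; this is not MLIR code.

def build_add(elem_type: str, lhs: str, rhs: str) -> str:
    """Return a textual scalar op for `lhs + rhs`, chosen by element type."""
    if elem_type in ("f16", "f32", "f64"):
        return f"arith.addf {lhs}, {rhs} : {elem_type}"
    if elem_type in ("i8", "i32", "i64", "index"):
        return f"arith.addi {lhs}, {rhs} : {elem_type}"
    # Any other type hits the hardcoded limit, analogous to the assertion.
    raise AssertionError(f"unsupported element type: {elem_type}")
```

In this model, supporting an externally defined type means editing `build_add` itself, which is the kind of change the rest of this RFC tries to avoid.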

Implementation goals

A better approach would allow users of MLIR to add support for new types
without modification of the MLIR code base. Ideally, the
implementation supports:

  • Modularity: allow users of MLIR to add support for their own types
    without modifying Linalg itself.

  • Extensibility: allow for the support of any element type for any
    Linalg named operation, as long as the scalar operations to implement the
    Linalg operation with the element type exist (e.g., if an addition
    and a multiplication operation exist for a scalar type, then it
    should be possible to add support for the type for linalg.matmul).

  • Selective extensibility: Since there is a variety of Linalg named
    operations, supporting a new element type for all of them would
    result in a substantial set of required scalar operations. This set
    might not be available for a given type (e.g., the max operation
    missing for complex values). Therefore, it should be possible to add
    support only for a subset of Linalg named operations for a new type.

Proposed direction for implementation

Add a new type trait for each arithmetic operator required by Linalg
and modify the helper methods of RegionBuilderHelper, such that they
create the type-specific operation via a method of the type trait. For
example, RegionBuilderHelper::arithfn__add would check that the type
implements the Addition type trait and call a method that creates
the appropriate operation for the given operands.

The drawback of this implementation is that it requires a substantial
amount of new traits, which all need to be implemented for the
built-in floating point, integer and index types.
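
A rough Python model of the per-operator trait idea may help to see how the dispatch would work. The classes `Addition`, `Multiplication`, `F32` and `OpaqueNumber` are hypothetical stand-ins for type traits and types; none of this is real MLIR API.

```python
# Toy model of "one trait per arithmetic operator". Everything here is a
# hypothetical stand-in for MLIR type traits, not real MLIR API.

class Addition:
    """Trait: the type can build a scalar addition."""
    def create_add(self, lhs: str, rhs: str) -> str:
        raise NotImplementedError

class Multiplication:
    """Trait: the type can build a scalar multiplication."""
    def create_mul(self, lhs: str, rhs: str) -> str:
        raise NotImplementedError

class F32(Addition, Multiplication):
    """Built-in type implementing both traits."""
    def create_add(self, lhs, rhs):
        return f"arith.addf {lhs}, {rhs} : f32"
    def create_mul(self, lhs, rhs):
        return f"arith.mulf {lhs}, {rhs} : f32"

class OpaqueNumber(Addition):
    """Externally defined type that only supports addition."""
    def create_add(self, lhs, rhs):
        return f"my_dialect.add {lhs}, {rhs} : !my_dialect.number"

def arithfn_add(elem_type, lhs, rhs):
    # The helper only checks for the trait; it knows nothing about the type.
    if not isinstance(elem_type, Addition):
        raise TypeError("element type does not implement Addition")
    return elem_type.create_add(lhs, rhs)

def arithfn_mul(elem_type, lhs, rhs):
    if not isinstance(elem_type, Multiplication):
        raise TypeError("element type does not implement Multiplication")
    return elem_type.create_mul(lhs, rhs)
```

Here `OpaqueNumber` could be used with an addition-only named operation but would be rejected by anything that needs a multiplication, which corresponds to the selective extensibility goal.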

Can you clarify whether you are mainly discussing support for named operations, or whether the generic op is currently limited as well?

Thanks Mehdi for the quick feedback. This is about the limitation of Linalg named operations. To my knowledge, linalg.generic does not restrict the element type.

I edited the title and contents of the post for clarification.

Thanks for the proposal, it sounds like an interesting OpDSL/named op extension.

Using type interfaces sounds like a viable approach. I would probably implement one type interface that can generate addition, multiplication, etc. and return a failure if a certain operation is not supported. Extending the existing casting logic to support conversion between arbitrary types may be hard but is probably not required.

One thing to consider is that OpDSL currently has two lowering paths. One is used to generate a YAML file that defines the named operations (this lowering path uses RegionBuilderHelper). The other is a lowering path in Python itself that directly generates a generic operation. That means we always need to consider both code paths when making these changes (emitter.py in llvm/llvm-project at commit 3cf86c36112fd1b059c8aead3d04656c542195ce implements the Python RegionBuilderHelper).

Do the custom types you have in mind always consist of two or more built-in types, as in the case of complex? If so, we may also think about making OpDSL itself extensible, in the sense that a user can inject custom types assembled from multiple built-in types. OpDSL could then emit multiple built-in operations for a single custom operation. The difficulty is probably accessing the built-in types within a custom type. For example, if an operation takes a tensor of complex values as an input, we need to know how to access the real and imaginary parts.

Using type interfaces sounds like a viable approach. I would
probably implement one type interface that can generate addition,
multiplication, etc. and return a failure if a certain operation is
not supported.

This certainly reduces the number of required traits. However, the
downside is that this mixes potentially unrelated operators in a rather
large trait (there are already 10+ operators used by Linalg named
ops), which is also likely to be extended with the arrival of new
named operations. The latter might turn out to be problematic, as this
burdens either the developer contributing the new named operation or
the maintainers of the scalar types with the modification of all types
shipped with MLIR. A default value for each arithmetic operator,
indicating that it is not supported, takes away the pressure of
immediate implementation, but would require an extra state “not
implemented” in addition to “supported” and “not supported” for clear
semantics. However, my feeling is that “not implemented” should be
indicated with the absence of the implementation of a trait rather
than in an additional layer of abstraction.

Extending the existing casting logic to support conversion between
arbitrary types may be hard but is probably not required.

The casting logic to convert the operands of a scalar expression to a
common type attempts to cast into a single direction defined by the
YAML specification of the operation (e.g., for matmul, it attempts to
cast both operands of the addition to the element type of the output
operand). This might be addressed with a cast trait providing a method
attempting to perform a cast to the desired type and simply failing if
the cast cannot be performed.
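
As a sketch of what such a cast trait could look like, the following Python model casts every operand towards the output element type and fails if no cast is known. The table `SUPPORTED_CASTS` and the helper names are hypothetical; only the two `arith` cast ops it mentions are existing MLIR operations.

```python
from typing import List, Optional, Tuple

# Hypothetical table of casts known for a type: (from, to) -> op name.
SUPPORTED_CASTS = {
    ("i32", "f32"): "arith.sitofp",
    ("f32", "f64"): "arith.extf",
}

def try_cast(value: str, from_type: str, to_type: str) -> Optional[str]:
    """Return the cast op text, the value itself if no cast is needed,
    or None if the cast cannot be performed."""
    if from_type == to_type:
        return value
    op = SUPPORTED_CASTS.get((from_type, to_type))
    if op is None:
        return None
    return f"{op} {value} : {from_type} to {to_type}"

def cast_operands_to_output(operands: List[Tuple[str, str]],
                            out_type: str) -> List[str]:
    """Cast every (value, type) operand towards the output element type."""
    results = []
    for value, from_type in operands:
        casted = try_cast(value, from_type, out_type)
        if casted is None:
            raise TypeError(f"cannot cast {from_type} to {out_type}")
        results.append(casted)
    return results
```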

One thing to consider is that OpDSL currently has two lowering
paths. One is used to generate a yaml file that defines the named
operations (this lowering path uses RegionBuilderHelper). On the
other hand, there is also a lowering path in Python itself that
directly generates a generic operation. That means we always need to
consider both code paths when doing these changes
(llvm-project/emitter.py at 3cf86c36112fd1b059c8aead3d04656c542195ce
· llvm/llvm-project · GitHub implements the Python
RegionBuilderHelper).

I’ll look into this, thanks for pointing this out!

Do the custom types you have in mind always consist of two, as in
case of complex, or more built-in types? If this is the case, we may
also think about making OpDSL itself extensible in the sense that a
user can inject custom types assembled from multiple built-in
types. OpDSL could then emit multiple built-in operations for single
custom operation. The difficulty is probably accessing the built-in
types within a custom type. For example, if an operation takes a
tensor of complex values as an input, we need to know how to access
real and imaginary parts.

The types I have in mind are completely opaque to MLIR and manipulated
only through library calls emitted upon lowering to the LLVM
dialect. Dealing with types only through a well-defined interface
implemented with traits would allow for a very generic solution
supporting such use cases.

Furthermore, I fear that exposing the details required to deal with
compound types composed of built-in types adds unnecessary
complexity. Encapsulating that logic into an operation exposed through
the respective trait looks like the best solution to me.

CC @nicolasvasilache @pifon2a @MaheshRavishankar based on top contributors to mlir/lib/Dialect/Linalg/IR/LinalgOps.cpp

This certainly reduces the number of required traits. However, the
downside is that this mixes potentially unrelated operators in a rather
large trait (there are already 10+ operators used by Linalg named
ops), which is also likely to be extended with the arrival of new
named operations. The latter might turn out to be problematic, as this
burdens either the developer contributing the new named operation or
the maintainers of the scalar types with the modification of all types
shipped with MLIR. A default value for each arithmetic operator,
indicating that it is not supported, takes away the pressure of
immediate implementation, but would require an extra state “not
implemented” in addition to “supported” and “not supported” for clear
semantics. However, my feeling is that “not implemented” should be
indicated with the absence of the implementation of a trait rather
than in an additional layer of abstraction.

I was thinking of replacing the arithfn__add, arithfn__mul, etc in the RegionBuilderHelper by something similar to the following code snippet:

auto arithOpBuilderInterface =
    lhs.getType().dyn_cast<ArithOpBuilderTypeInterface>();
if (!arithOpBuilderInterface) {
  // Unsupported type: it does not implement the interface.
  return failure();
}
FailureOr<Value> result =
    arithOpBuilderInterface.create(builder, "add", lhs, rhs);
if (failed(result)) {
  // Unsupported operation for this type.
  return failure();
}

An enum instead of passing the operation type by string would trigger an earlier error in the generation process and may be nicer. If a new operation is added, only the types that want to support it need to update their type interface implementation.

Alternatively, we may generate one interface for every operation/function as you suggest. I think the code would look very similar? The main difference between the two solutions seems to be where the operation/function dispatch happens?

Let me know if I missed an important point here and feel free to post some pseudo code to show your approach.

I’ll look into this, thanks for pointing this out!

It probably requires supporting type interfaces in Python. The documentation indicates they are not yet supported.

The types I have in mind are completely opaque to MLIR and manipulated
only through library calls emitted upon lowering to the LLVM
dialect. Dealing with types only through a well-defined interface
implemented with traits would allow for a very generic solution
supporting such use cases.

Ok, then the type interface approach makes much more sense.

An enum instead of passing the operation type by string would
trigger an earlier error in the generation process and may be
nicer. If a new operation is added, only the types that want to
support it need to update their type interface implementation.

Thanks for pointing me to Type interfaces. I was only aware of bare
type traits, which, AFAIU, cannot be defined externally. My initial
concern with the bare type traits implementation was that each type
would have to implement the traits explicitly via the type definition,
resulting either in a large list of traits when using one trait per
arithmetic operator, or one big trait that mixes operators which are
only related by their use in linalg named operations.

However, having a type interface for Linalg scalar operations (e.g.,
LinalgArithOpBuilderTypeInterface), which can be added externally for
the builtin types, solves all these issues, since it makes clear
how/why the operators are related and keeps the maintenance for the
builtin types local to the Linalg code.

I also agree that using an enum is preferable here. In addition to
that, I’d suggest to have three different outcomes for a call to
LinalgArithOpBuilderTypeInterface::create:

  • A default value “Not implemented”: a sensible implementation of the
    operator might exist, but hasn’t been implemented (e.g., due to a
    recent extension of the set of operators in
    LinalgArithOpBuilderTypeInterface that hasn’t been taken into
    account yet for the type implementing the interface).

  • A value “Not supported”, indicating that there is no sensible
    implementation of the operator for the type.

  • An mlir::Value representing the outcome of the operation if the
    operator is supported and implemented.
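
The three outcomes can be modelled in a few lines of Python. This is only a sketch: `ArithFn`, `Status`, `LinalgArithOpBuilder` and `ComplexF32` are hypothetical names, and multiplication is deliberately left out of `ComplexF32` to show the default.

```python
from enum import Enum, auto
from typing import Union

class ArithFn(Enum):
    ADD = auto()
    MUL = auto()
    MAX = auto()

class Status(Enum):
    NOT_IMPLEMENTED = auto()  # might make sense, but nobody wrote it yet
    NOT_SUPPORTED = auto()    # no sensible implementation for this type

class LinalgArithOpBuilder:
    """Hypothetical stand-in for the interface; the default answer for any
    operator is NOT_IMPLEMENTED."""
    def create(self, fn: ArithFn, lhs: str, rhs: str) -> Union[str, Status]:
        return Status.NOT_IMPLEMENTED

class ComplexF32(LinalgArithOpBuilder):
    def create(self, fn, lhs, rhs):
        if fn is ArithFn.ADD:
            return f"complex.add {lhs}, {rhs} : complex<f32>"
        if fn is ArithFn.MAX:
            # There is no meaningful max for complex values.
            return Status.NOT_SUPPORTED
        # Operators this type does not know about fall back to the default.
        return super().create(fn, lhs, rhs)
```

With this shape, adding a new member to `ArithFn` does not force any existing type to change: it simply reports `NOT_IMPLEMENTED` until someone adds the case.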

Alternatively, we may generate one interface for every
operation/function as you suggest. I think the code would look
very similar? The main difference between the two solutions
seems to be where the operation/function dispatch happens?

With type interfaces, I do indeed prefer a single interface. As pointed
out above, I was reasoning in terms of bare type traits when I suggested
using separate traits.

It probably requires supporting type interfaces in Python. The documentation
indicates they are not yet supported.

Thanks for the heads-up! Hopefully that’s not a showstopper for now.

Thanks for pointing me to Type interfaces.

I don’t have extensive experience with them myself but I think they would be the method of choice here :).

I’d suggest to have three different outcomes for a call to
LinalgArithOpBuilderTypeInterface::create

Yes having three return values sounds fine. I used FailureOr because it is convenient. We could also have additional methods on the interface to check if an operation is not supported / not implemented. That is a design detail though.

Thanks for the heads-up! Hopefully that’s not a showstopper for now.

I don’t think so. I guess a good start is to go step by step and maybe start off with an interface first, using it on the C++/yaml side, etc.

Sounds good. I’ll experiment a bit with Type Interfaces and then put a patch together.

Created and submitted the patch. The review is at D118022 [mlir][linalg] Add support for arbitrary element types for named operations.

Thanks for the RFC and proposed PR.

After going over the proposed impl, I have some concerns over the sheer complexity involved for something that seems quite simple on the surface, so maybe I am missing something more fundamental.

First, any new contribution related to named ops should start being included in a Frontend subdirectory.
In practice this is really useful to improve the programming model abstraction (and orthogonally to match patterns).
It is however less useful when coming from a higher-level programming model such as XLA or TOSA.

Second, it seems that what you really want is a frontend-oriented LinalgNamedOpInterface or LinalgFrontEndOpInterface which exposes attributes (e.g. add).
The attributes would capture the operation name you want at the instance level.

You would have an IR resembling:

linalg.matmul add="my_dialect.my_fancy_add" mul="my_dialect.my_fancy_mul"
  ins(%A: tensor<!my_fancy_type> ...) outs(...)

This should make the approach extensible without a single line of C++ once the basic flow is set up.
You should be able to use the generic op creation from a state + operation name to build your ops.

Further issues with the current proposed PR are that it moves the traditional builder logic to a subset through an enum. This is not extensible in general.
In the very general extensibility case with control-flow etc., the best solution would probably be just a function symbol. But we are not there yet.

Thanks @nicolasvasilache for sharing your thoughts on the RFC.

After going over the proposed impl, I have some concerns over the
sheer complexity involved for something that seems quite simple on the
surface, so maybe I am missing something more fundamental.

First, any new contribution related to named ops should start being
included in a Frontend subdirectory. In practice this is really
useful to improve the programming model abstraction (and orthogonally
to match patterns). It is however less useful when coming from a
higher-level programming model such as XLA or TOSA.

I assume this should be a subdirectory of
mlir/{include/mlir,lib}/Dialect/Linalg.

Second, it seems that what you really want is a frontend-oriented
LinalgNamedOpInterface or LinalgFrontEndOpInterface which exposed
attributes (e.g. add). The attributes would capture the operation
name you want at the instance level.

You would have an IR resembling:

linalg.matmul add="my_dialect.my_fancy_add" mul="my_dialect.my_fancy_mul"
  ins(%A: tensor<!my_fancy_type> ...) outs(...)

This looks like an interesting approach, which would also support some
of our odd use cases, where scalar operations are applied to operands
with different types. At first, I was a bit concerned about the
verbosity in the textual representation of the IR, but this should not
be an issue with attribute value aliases.

Can you elaborate a bit on what you mean with exposed attributes?
This sounds as if one could define an OpInterface with attributes,
which can be specified in the IR for the operation implementing the
interface. I searched through the documentation and grepped a bit
through the sources, but couldn’t find anything in that direction.

Also, it is not yet clear to me how operations should be instantiated
from the attributes. AFAIU, the only option to specify an operation is
to store its name in a string attribute. What is the idiomatic way to
create an operation from a string?

So in summary, the implementation of the solution consists mainly of:

  • Adding a new OpInterface named LinalgNamedOpInterface or
    LinalgFrontEndOpInterface with exposed attributes for all scalar
    operations (i.e., add, mul, sub, etc.) to a new file in
    mlir/include/mlir/Dialect/Linalg/Frontend.

  • Implementing the OpInterface for all Linalg named operations, e.g.,
    by adding appropriate output to linalg-ods-yaml-gen.

  • Modifying the helper functions in RegionBuilderHelper, such that
    either the custom operation specified in the respective attribute is
    instantiated or, if the attribute has not been specified and the
    operands are built-in types, the default operation for that type is
    created.
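
The third step can be sketched as a small Python model. It is only an illustration of the lookup order (attribute first, built-in default second); the table `DEFAULT_ADD_OPS` and the function `arithfn_add` are hypothetical, and the result is printed in MLIR's generic operation form.

```python
# Hypothetical defaults for built-in element types (illustration only).
DEFAULT_ADD_OPS = {"f32": "arith.addf", "i32": "arith.addi"}

def arithfn_add(attrs: dict, elem_type: str, lhs: str, rhs: str) -> str:
    """Build the 'add' of a named op in generic textual form."""
    # 1. An explicit attribute on the op instance takes precedence.
    op_name = attrs.get("add")
    # 2. Otherwise fall back to the default for built-in element types.
    if op_name is None:
        op_name = DEFAULT_ADD_OPS.get(elem_type)
    if op_name is None:
        raise ValueError(f"no 'add' operation known for {elem_type}")
    return (f'"{op_name}"({lhs}, {rhs}) : '
            f"({elem_type}, {elem_type}) -> {elem_type}")
```

For example, `arithfn_add({"add": "my_dialect.my_fancy_add"}, "!my_fancy_type", "%a", "%b")` would produce a generic-form op named `my_dialect.my_fancy_add`, while an empty attribute dictionary with `f32` operands falls back to `arith.addf`.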

Further issues with the current proposed PR are that it moves the
traditional builder logic to a subset through an enum. This is not
extensible in general. In the very general extensibility case with
control-flow etc, the probably best solution would be just a function
symbol. But we are not there yet.

I am not sure I got this last part correctly. Could you give a short
example of this use case?

All in all, this sounds good and I am willing to work on the proposed
solution. I’d just like to get the ideas straight before starting the
implementation.

Thanks,
Andi

Basically the interface would have methods such as getAddOpName that query the proper attribute that is part of the op.
The verifier would ensure that these are present.
The tablegen definition of the op would need to specify this attribute (could be a unique dictionary attr).
Note that you want to avoid putting all attributes on everything, so the interface should specify which subset of ops it expects (e.g. NamedAddMulOpInterface or NamedOpInterface<"add_op", "mul_op"> where add_op/mul_op could well be “arith.max” / “arith.addi”).

You can call builder.create(OperationState) and explicitly pass the name in OperationState.
This is how ops that aren’t registered with a dialect can be created locally.

Yes, and you prob want a “default” that can be elided to avoid increasing verbosity in the common case.
Parsing and printing may be more involved here but it would be a good thing to better separate the auto-generated named ops from the load bearing generic.

I was just thinking that if you want to configure a much more advanced region than a simple fma + a few unary ops, the general case will likely be:

func @some_impl(%a: !my_fancy_type) -> !my_fancy_type {
  %b = another_op(%a, %a)
  %c = call @some_other_function_(%b) : (!my_fancy_type) -> (!my_fancy_type)
  return %c : !my_fancy_type
}

linalg.my_fancy_op impl="some_impl fun" (%O: memref<?x!my_fancy_type>)

whenever you need to lower to loops you can just inline @some_impl where it is needed.
This would be the more general setup, for which we don’t want special attribute names.
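
As a loose analogy in Python (not MLIR, and with a hypothetical registry `IMPLS`), the op only carries the symbol of its scalar implementation, and the lowering looks the symbol up and applies it per element:

```python
# Toy model: an op names its scalar implementation by symbol, and the
# lowering resolves the symbol and applies it to every element.
# The registry and all names are hypothetical.

IMPLS = {}

def register_impl(symbol):
    """Decorator that records a scalar implementation under `symbol`."""
    def wrap(fn):
        IMPLS[symbol] = fn
        return fn
    return wrap

@register_impl("some_impl")
def some_impl(a):
    b = a + a       # stands in for another_op(%a, %a)
    return b * 10   # stands in for the call to another function

def lower_to_loop(impl_symbol, buffer):
    """Apply the implementation named by `impl_symbol` to every element."""
    impl = IMPLS[impl_symbol]
    return [impl(x) for x in buffer]
```

The lowering never needs to know which scalar operations the implementation uses, so no per-operator attribute names are required.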

I also wanted to give some background on how we may want to evolve OpDSL.

OpDSL already supports attributes to define strides and dilations:

 %1 = linalg.conv_2d_nhwc_hwcf {
    dilations = dense<1> : tensor<2xi64>,
    strides = dense<1> : tensor<2xi64>}

We may now want to extend the attribute mechanism to functions to reduce the number of operators. For example, we currently have different pooling operators for max, min, unsigned pooling and we would like to have only one with a configurable reduction function. Attributes would allow us to get there:

def pooling_nhwc_sum(
    I=TensorDef(T, S.N, S.OH * S.SH + S.KH * S.DH, S.OW * S.SW + S.KW * S.DW, S.C),
    K=TensorDef(T, S.KH, S.KW, index_dims=[D.kh, D.kw]),
    O=TensorDef(T, S.N, S.OH, S.OW, S.C, output=True),
    fun = ArithFnAttrDef(default=ArithFn.add)):
  O[D.n, D.oh, D.ow, D.c] = Reduce<fun>[D.kh, D.kw](I[D.n, D.oh + D.kh, D.ow + D.kw, D.c])

On the C++ side the reduce fun needs to be set to an enum value (the enum should be tablegen generated):

linalg.pooling_nhwc_sum {fun=linalg.add} ...
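
To show the effect of a configurable reduction function without any MLIR machinery, here is a plain-Python model of a 1-D pooling with unit stride and dilation. The enum `ArithFn` mirrors the name used in the OpDSL snippet above, but the function `pooling_1d` and the reducer table are hypothetical.

```python
from enum import Enum
import operator

class ArithFn(Enum):
    ADD = "add"
    MAX = "max"
    MIN = "min"

# Maps each enum value to the Python function that performs the reduction.
_REDUCERS = {
    ArithFn.ADD: operator.add,
    ArithFn.MAX: max,
    ArithFn.MIN: min,
}

def pooling_1d(inputs, kernel_size, fun=ArithFn.ADD):
    """Reduce every window of `kernel_size` consecutive elements with `fun`."""
    reducer = _REDUCERS[fun]
    out = []
    for o in range(len(inputs) - kernel_size + 1):
        acc = inputs[o]
        for k in range(1, kernel_size):
            acc = reducer(acc, inputs[o + k])
        out.append(acc)
    return out
```

A single definition then covers sum, max and min pooling, selected by the `fun` argument, instead of one operator per reduction.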

These changes alone do not yet support arbitrary types. Yet they may open up opportunities. For example, if the operand types are defined by an unknown_dialect, we could try to create unknown_dialect.add, we could use a type interface to create the add, etc.