RFC: Declarative Op Assembly Format

(Or Auto-Generating Custom Parsers and Printers for Operations)

Hi all,

I’d like to propose a specification format for declaratively defining the custom assembly format of operations. I believe that this can bring about many different benefits:

  • Remove a bunch of boilerplate C++
    • Defining a custom format, even a simple one, usually involves at least 20-30 lines of C++.
  • Better verification
    • Having a declarative format means that we can provide better verification of the completeness of the syntax with respect to round-tripping, e.g. make sure that the attribute dictionary is present.
  • More uniformity with operation syntax
    • By having a centralized format, the syntax of operations becomes more regular and uniform.
  • It’s a long-standing TODO
    • See the bottom of the existing ODS documentation.

Before going into the exact details of the proposed format, I’d like to state upfront a major non-goal: this format is not intended to capture every use case. We should focus on capturing the major, commonish, use cases but leave the craziness/irregularity to those that really need/want it. With that being said, let’s jump into the proposal itself:

Op Asm Format

To illustrate the format, let’s look at an example of the format for std.call:

def CallOp ... {
  let arguments = (ins FlatSymbolRefAttr:$callee, Variadic<AnyType>:$operands);
  let results = (outs Variadic<AnyType>);
}

Below is the custom parser and printer that std.call currently defines, followed by the equivalent declarative form:

static ParseResult parseCallOp(OpAsmParser &parser, OperationState &result) {
  FlatSymbolRefAttr calleeAttr;
  FunctionType calleeType;
  SmallVector<OpAsmParser::OperandType, 4> operands;
  auto calleeLoc = parser.getNameLoc();
  if (parser.parseAttribute(calleeAttr, "callee", result.attributes) ||
      parser.parseOperandList(operands, OpAsmParser::Delimiter::Paren) ||
      parser.parseOptionalAttrDict(result.attributes) ||
      parser.parseColonType(calleeType) ||
      parser.addTypesToList(calleeType.getResults(), result.types) ||
      parser.resolveOperands(operands, calleeType.getInputs(), calleeLoc,
                             result.operands))
    return failure();

  return success();
}
static void print(OpAsmPrinter &p, CallOp op) {
  p << "call " << op.getAttr("callee") << '(' << op.getOperands() << ')';
  p.printOptionalAttrDict(op.getAttrs(), /*elidedAttrs=*/{"callee"});
  p << " : " << op.getCalleeType();
}

vs:

$callee `(` $operands `)` attr-dict `:` `(` type($operands) `)` arrow-type(results)

Looking at the above, the format itself is composed of three major components (a complete example follows the definitions below):

Directives

directive ::= identifier (`(` arguments `)`)?

  • A directive is a type of builtin function, with an optional set of arguments.

An initial set of directives, which will be expanded as needed, is listed below:

attr-dict

  • The attribute dictionary of the operation.

arrow-type | colon-type | type

  • The type of the given entity, which is either an operand or result; the colon-type and arrow-type variants also emit the leading `:` or `->` before the type.

operands

  • Represents all operands of the operation.

results

  • Represents all results of the operation.

Literals

literal ::= ` (keyword | punctuation) `

  • A literal is either a keyword or punctuation surrounded by backticks (``).

Variables

variable ::= $ identifier

  • A variable is an entity that has been registered on the operation itself, think arguments, results, etc.
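
Putting the pieces together, a declarative definition of std.call could look something like the sketch below. This is purely illustrative and assumes the format string is attached to the op definition through a new field, tentatively called assemblyFormat here:

def CallOp ... {
  let arguments = (ins FlatSymbolRefAttr:$callee, Variadic<AnyType>:$operands);
  let results = (outs Variadic<AnyType>);

  // Replaces the hand-written parseCallOp/print shown above.
  let assemblyFormat = [{
    $callee `(` $operands `)` attr-dict `:` `(` type($operands) `)` arrow-type(results)
  }];
}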

One Thorny Bit: Inferred Types

This section focuses on a particularly thorny bit of defining a declarative format, and the thing that I would most like opinions on (as this affects everyone, and I’m not perfect). One interesting aspect of the custom format for many operations is that the type of certain operands/results is often inferred from the types of other operands/results in the format. For example (rough textual forms are sketched after the list below),

  • AddIOp
    • This is a binary arithmetic operation where all operands and results have the same type.
  • ExtractElementOp
    • This op infers the result type from the element type of the aggregate operand (this is one example of many).
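
To make the problem concrete, these ops are roughly written as follows in today’s textual IR; only one type appears, and the remaining operand/result types are derived from it:

%sum = addi %lhs, %rhs : i32
%elt = extract_element %agg[%idx] : tensor<4xi32>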

The thorny bit is how we express this in the format, and there are many different options.

// Assume we have a format directive used to get the element type of the given
// input type.
def FormatGetElementType : FormatDirective<"$0.getElementType()">;

def AddIOp … {
  let arguments = (ins IntegerType:$lhs, IntegerType:$rhs);
  let results = (outs IntegerType);
}

def ExtractElementOp … {
  let arguments = (ins AnyTypeOf<[AnyVector, AnyTensor]>:$aggregate,
                       Variadic<Index>:$indices);
  let results = (outs AnyType);
}

With the base format being shown below, how should we inject the necessary constraint?
AddIOp:           $lhs, $rhs attr-dict colon-type($lhs)
ExtractElementOp: $aggregate `[` $indices `]` attr-dict colon-type($aggregate)

Some potentials:
// Inline as an argument:
$lhs, $rhs attr-dict colon-type($lhs, $rhs, results)
$aggregate `[` $indices `]` attr-dict colon-type($aggregate, results=FormatGetElementType)

// Inline as a trailing list:
$lhs, $rhs attr-dict colon-type($lhs)[$rhs, results]
$aggregate `[` $indices `]` attr-dict colon-type($aggregate)[results=FormatGetElementType]

// Inline with a colon list:
$lhs, $rhs attr-dict colon-type($lhs : $rhs, results)
$aggregate `[` $indices `]` attr-dict colon-type($aggregate : results=FormatGetElementType)

// Out-of-line.
$lhs, $rhs attr-dict colon-type($lhs)
  where type($rhs)=type($lhs), type(results)=type($lhs)
$aggregate `[` $indices `]` attr-dict colon-type($aggregate) 
  where type(results)=FormatGetElementType(type($aggregate))

For some of these we can likely get away with detecting specific traits, but this doesn’t really cover anything outside of standard/builtin types.

Any thoughts?

– River


Awesome! Thanks, River, for taking this on! It is certainly a long-awaited feature that will be super nice to have!

Agreed that we should support the common cases while leaving the craziness to C++, where we can always have full control. That is consistent with other layers we have, like DRR.

Regarding setting types, I vote for the following format

$lhs, $rhs attr-dict colon-type($lhs, $rhs, results)

for those operands/results whose types appear directly in the assembly. This form is consistent with how $lhs, $rhs, and results (we really should name the results with a symbol here…) are represented, compared to other formats like ($lhs)[$rhs, results].

For those values whose type must be computed somehow, the computation can be arbitrary. There is a decision to be made as to whether we want to support that at all. The strategy we’ve been following is basically to have general fallback hooks into C++ (NativeCodeCall in DRR) so folks can plug in the specific logic. I assume your FormatGetElementType above means something similar. I’d suggest we put this kind of information out of line, given it is really not part of the assembly form (op asm format); it is just derived information. Being out of line makes that clear.
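
As a concrete sketch of that direction (purely illustrative: it reuses the out-of-line form from the RFC and borrows DRR’s existing NativeCodeCall as the C++ escape hatch, with hypothetical glue syntax):

// Hypothetical: a NativeCodeCall-style hook that computes the derived type.
def InferExtractElementType : NativeCodeCall<"$0.cast<ShapedType>().getElementType()">;

// The format only describes what actually appears in the textual IR; the
// derived result type is stated out of line.
$aggregate `[` $indices `]` attr-dict colon-type($aggregate)
  where type(results) = InferExtractElementType(type($aggregate))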

I agree with you, Lei. I’ve been tossing some things around in my brain over the holiday, and I think we can encode these using a more general type-trait constraint mechanism.

I’ll work on posting an initial patch series for review in the next day or two so that people can see how this impacts operations in the main repository.

This would simplify adding new “pretty” ops a lot 🙂

I think my concern with the inline options is that they obfuscate the expected format, and that we would be introducing a type-constraint mechanism that is specific to parsing and does not seem reusable (e.g., this should fit in with the type inference and shape dialects; it should not need to be specified multiple times, though sharing these might also mean we complicate the parser/generator needlessly). And do we need FormatGetElementType? Given the goal is the common case, why not restrict it to cases of interest to start with? E.g., one can either specify the full type, the result type, or (say) the result type of a region of the op. We could then decouple parsing from type constraint checking too [createChecked equivalents could still run verify].

This is a fairly common case.

My current intention is to try and use traits for all of these additional constraints.
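
For illustration, here is a rough sketch of how that could read for AddIOp, assuming the generator keys off the existing SameOperandsAndResultType trait (the exact mechanism is still open):

// Hypothetical: the op class already carries [SameOperandsAndResultType].
def AddIOp ... {
  let arguments = (ins IntegerType:$lhs, IntegerType:$rhs);
  let results = (outs IntegerType);

  // Because all operands and the result share a type, the format only needs
  // to capture one of them; the generator infers the rest from the trait.
  let assemblyFormat = "$lhs, $rhs attr-dict colon-type($lhs)";
}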

Hi River,

I’m (as you know) very excited you are pushing on this. This has the potential to eliminate a ton of boilerplate!

I think you’re on the right path here. Having a specifier like type that can be parameterized is a good way to go. I’d recommend specifying a small collection of “standard” specifiers that provide behavior for common cases, but also making it extensible by using some syntax that specifies the name of a C++ function/method to call. This function could be passed an ArrayRef<Type> along with a raw_ostream to write into, or something similar.
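
Something along these lines, perhaps (everything below is a hypothetical sketch of the extension point, not concrete syntax):

// Hypothetical: a user-defined specifier naming a C++ function. The generator
// would call it with the types resolved so far (an ArrayRef<Type>) and, when
// printing, a raw_ostream to write into.
def MyCalleeType : FormatTypeSpecifier<"myDialect::printOrParseCalleeType">;

// Used in a format string like any builtin specifier:
//   $callee `(` $operands `)` attr-dict MyCalleeType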

-Chris

Hi all,
I’ve uploaded an initial patch series that adds the basics (without the type inference stuff yet), rooted at reviews.llvm.org/D73405. Just the initial functionality already enables the removal of ~600 lines of parsing code.

– River

The patches look very nice to me; I approved them. I’m fine to land this batch and then iterate. 🙂

Re: Inferred Types

Some years back at IBM I built a strongly typed dataflow IR. Operator templates were specified as to arity rules and typing rules. Arity rules allowed both fixed and variadic inputs, and fixed and variadic outputs. Type rules supported type checking and simple inference.

There were distinct rules for input connections and output connections:

Inputs rules:

  • Literal: this input’s source must have type LIT
  • Adoptive: this input adopts the type of the source driving it
  • Constrained: a constraint relative to an earlier input (typically input 0, input 1 or the preceding input; e.g. the type of the source driving this input must match input 0’s type)

Output rules:

  • Literal: this output has type LIT (e.g. irrespective of any input type(s) this output is Bool)
  • Adoptive: this output adopts the type of a specified input (typically input 0, input 1 or for variadic inputs matched to variadic output, the type of the matching input)

This was not based on any particular theory or published work. The set of rules was ad hoc and developed as needed. Often adding a new arity shape required a new type inference rule.

My point here is that by specifying a small set of rules I was able to describe and type check all of the operators we invented. Those included traditional arithmetic and comparison, mux and demux, reduction operations, heterogeneous tuple construction and destruction, etc.

I attempted to use this today, but found that regions were not (yet?) supported… Any reason why?

It is still on the todo list; feel free to add the support if you’d like (sync with River first to avoid duplicated concurrent effort).

There are various different levels of support for regions (arguments, implicit terminators, constraints on arguments, etc.). I have an in-progress patch that adds support for most of the basic stuff, but I’ve been busy with other things. What level of support do you need for your use case? I’ll push it further up my queue.

Looking forward to it; this would also be useful in the context of “named Linalg ops”.

I do not have a clear idea how to use it yet, but my intuition is that if a region could be specified in a separate string and referred to in the assemblyFormat (e.g. like a local function with a name that does not escape TableGen), that would be useful.

I just wrote C++ to solve the problem. I have a simple op with one region, but I couldn’t use the declarative format because of the region. I wanted something like:

  let regions = (region AnyRegion:$region);
  let assemblyFormat = [{
    `(` $arg0 `,` $arg1 `)` attr-dict $region
  }];