LLVM Discussion Forums

Ease of defining new types

Hi All-

I’m new to both MLIR & LLVM development, so forgive me if I’m missing something fundamental.

As far as I can tell (and in all the examples), defining a new type requires writing a boatload of C++: the type storage, hashing for any C++ data types used in the type storage key, the actual type class façade, and a parser/printer. In my experience, it’s customized boilerplate in most cases. The relative difficulty is a shame as it discourages development of large, rich type systems. (I’m big on type systems.)
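To make the boilerplate concrete, here is a minimal, self-contained sketch of the pieces listed above for a single parameterized type: a storage class with a key, hand-written hashing, a uniquing context, and a thin façade. It does not use MLIR; `ToyContext`, `VecTypeStorage`, and `VecType` are hypothetical names chosen for illustration, and the real MLIR classes differ in detail.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <unordered_map>
#include <utility>

// Hypothetical storage class: holds the parameters of one uniqued type.
struct VecTypeStorage {
  using KeyTy = std::pair<unsigned, unsigned>; // (element width, length)

  explicit VecTypeStorage(const KeyTy &key)
      : elementWidth(key.first), length(key.second) {}

  // Compare an existing storage instance against a lookup key.
  bool operator==(const KeyTy &key) const {
    return key == KeyTy(elementWidth, length);
  }

  // Hand-written hashing for the key; this grows with every new C++ data
  // type that appears in a key.
  static std::size_t hashKey(const KeyTy &key) {
    std::size_t h = std::hash<unsigned>()(key.first);
    return h ^ (std::hash<unsigned>()(key.second) + 0x9e3779b9u + (h << 6) +
                (h >> 2));
  }

  unsigned elementWidth;
  unsigned length;
};

// Adapter so the key can be used in std::unordered_map.
struct KeyHash {
  std::size_t operator()(const VecTypeStorage::KeyTy &key) const {
    return VecTypeStorage::hashKey(key);
  }
};

// Hypothetical context that owns and uniques storage instances.
class ToyContext {
public:
  const VecTypeStorage *getOrCreate(const VecTypeStorage::KeyTy &key) {
    auto it = storages.find(key);
    if (it == storages.end())
      it = storages.emplace(key, std::make_unique<VecTypeStorage>(key)).first;
    return it->second.get();
  }

private:
  std::unordered_map<VecTypeStorage::KeyTy, std::unique_ptr<VecTypeStorage>,
                     KeyHash>
      storages;
};

// The façade: a value-semantic handle around a pointer to uniqued storage.
class VecType {
public:
  static VecType get(ToyContext &ctx, unsigned elementWidth, unsigned length) {
    return VecType(ctx.getOrCreate({elementWidth, length}));
  }
  unsigned getElementWidth() const { return impl->elementWidth; }
  unsigned getLength() const { return impl->length; }
  bool operator==(VecType other) const { return impl == other.impl; }
  bool operator!=(VecType other) const { return impl != other.impl; }

private:
  explicit VecType(const VecTypeStorage *impl) : impl(impl) {}
  const VecTypeStorage *impl;
};
```

Because the context uniques storage by key, two `VecType::get` calls with the same parameters yield handles that compare equal by pointer. Nearly every line here is mechanical once the list of parameters is known, which is the case for generating it.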

Question: would it be appropriate to enhance tablegen to automate this? Particularly the storage class and parser/printer for common cases? (I think it’s already possible to write the type façade in tablegen.)

I’m really not familiar with tablegen, so I don’t know if extension is necessary or if there’s a way to do this already. (Or if this has previously been discussed.)

~John

Hi John,

You are correct in that much of it is boilerplate and could be reasonably generated by tablegen. We already have lots of examples doing similar things with Operations. The main difference is that the generators would need to be smarter than those for operations. There are many different things that need explicit handling and care. I’m only okay with auto-generating code when it can be reasonably clean and performant. For example, we should not encourage putting any kind of vector/SmallVector/etc. data structure inside of attributes/types and instead use ArrayRefs. Depending on how the tablegen backend is structured, this has to be exposed to the user somehow. That is more of an implementation detail, but something to consider.

Having thought about it myself, it is reasonable but also more difficult than the things we have done before. Operations have a defined interface, whereas attributes/types do not.

Extending tablegen would be necessary.

–River

Forgot to say. If this is something that you would be interested in contributing, I would be happy to help guide/review/etc. anything.

– River

Hey,

This has been discussed before and would be a welcome addition - at least for the simple case it wouldn’t be too bad and indeed remove a lot of boilerplate. As River mentioned it can be very hairy in general.

The tablegen backend (ODS) would need to be extended, yes.

Thanks,

Jacques

Ok… I’ve looked through the code and I think I understand the tblgen (C++) code. I also see the .td files for the tblgen tests, but I don’t see a place where the output from those tests is checked. Could you send me a pointer plz? Do I just have to add FileCheck tests in the td test file?

… Still not committing to anything, just gauging how hard this would be.

Also, my reading of the code implies that the attribute stuff in the existing tablegen files is not for defining new attributes. It’s just for use in other tablegen files as constraints, which end up getting translated to code in the generated Op C++ code referring to the backing attribute, which is written in C++. Am I correct?

I’ve put together an example tablegen which I think would be sufficient for simple types: https://github.com/teqdruid/llvm-project/blob/tblgen-types/mlir/test/mlir-tblgen/type.td

I’d appreciate feedback.

Is there any way to get a formal review of a new feature and proposed syntax before I start coding?

If you’re looking for specific feedback, I think it’s easiest to do that in Phabricator after the concepts are discussed here. I tend to mark a patch [InProgress] so that people review it with that in mind (for instance, the patch might be obviously incomplete). I took a look at your proposal and it seems reasonable to a first order, but maybe others with more experience designing Tablegen formats would feel differently. How does this correspond to the c++ classes you intend to generate?

This would be the place, but it helps to have more information on what you are doing. It is difficult from your example tablegen to surmise what the various bits do. I looked at it for a bit, but struggled to see how this would work in practice. Could you please write a bit more detailed description of how you would expect the tablegen interface to look and then we can go from there?

You don’t need to have the entire thing scoped out or implemented, but it’s easier for people to comment and contribute when you put the content on discourse or phabricator instead of git links.

Thanks
– River

Hi Steve and River-

For now, I’m just sketching out what this would look like and do. I’ll submit a patch to phabricator once I’m a little further on.

Yeah, that proposal link which I had previously posted was quick ‘n’ dirty. It was more-or-less just to ascertain if I should be building on any tablegen code which already exists. Here’s a revised proposal. I’ve expanded the comments a bit, but I’m not familiar with all the terminology.

This tablegen code:

include "mlir/IR/OpBase.td"

// *****
// This would go in OpBase.td


// 'TypeMember' represents a parameter which goes in the type storage.
//    'fieldName' is the member name in the storage struct, and various function
//    parameters. The type of this member can be defined multiple ways (below)
class TypeMember<string fieldName>;

// Define a new type belonging to a dialect and called 'name'
class TypeDef<Dialect dialect, string name> {
    // Name of storage class to generate or use
    string storageClass = name # "Storage";
    // Namespace (within dialect c++ namespace) in which the storage class resides
    string storageNamespace = "detail";
    // Should we generate the storage class? (Or use an existing one?)
    int genStorageClass = 1;

    // What is the enum containing 'kinds' called?
    string kindsEnum = "Types";
    // The symbol for the 'kind' entry
    string kind = name;
    // Use the lowercased name as the keyword for parsing/printing
    string keyword = name;

    // This is the list of fields in the storage class (and list of parameters
    // in the creation functions). If empty, don't use or generate a storage class
    list<TypeMember> contents = [];
}

// Directly use this c++ 'Type' class name
class CppType<string cppTypeName, string fieldName> : TypeMember<fieldName>;

// Reference to another tablegen'd typedef
class TypeDefName<TypeDef type, string fieldName> : TypeMember<fieldName>;

// Generate a new struct to use in a storage field. In this case, the fieldName
// doubles as the c++ struct name.
class NewStructType<list<TypeMember> members, string fieldName> : TypeMember<fieldName>;

// A list of other TypeMembers
class ListOf<TypeMember member, string fieldName> : TypeMember<fieldName>;

// ************
// Test defs
def Test_Dialect: Dialect {
    let name = "TestDialect";
}

// Base class for other typedefs. Provides dialect-specific defaults
class TestType<string name> : TypeDef<Test_Dialect, name> {
    // Override the default enum name
    let kindsEnum = "TestDialectTypes";
}

// SimpleTypeA is a simple type. A storage class will not be generated.
def SimpleTypeA : TestType<"SimpleA"> { }

// A more complex parameterized type
def CompoundTypeA : TestType<"CompoundA"> {
    // Override the default mnemonic
    let keyword = "cmpnd_a";

    // What types do we contain?
    let contents = [
        // A standard c++ int
        CppType<"int", "widthOfSomething">,
        // The simple type defined above
        TypeDefName<SimpleTypeA, "exampleCustomType">,
        // Define a new C++ struct in which to store elements
        NewStructType<[
            // Only contain one member which is the 'CustomTypeDefInCPP', a type
            // previously defined in C++
            CppType<"CustomTypeDefInCPP", "firstMember">
        ], "generatedStruct">
    ];
}

Would produce something like this C++ code (relevant td code copied as comments). This code won’t compile and may have some issues, but I think it’s close enough to be demonstrative.

// tablegen code:
// def SimpleTypeA : TestType<"SimpleA"> { }
// This would generate the following c++ header code:

    class SimpleTypeA : public Type::TypeBase<SimpleTypeA, Type> {
    public:
      using Base::Base;
      static bool kindof(unsigned kind) { return kind == TestDialectTypes::SimpleA; }
      static SimpleTypeA get(MLIRContext *context) {
        return Base::get(context, TestDialectTypes::SimpleA);
      }
      static StringRef getKeyword() { return "simplea"; }
      static Type parse(mlir::MLIRContext* ctxt, mlir::DialectAsmParser& parser);
    };
    
// and the following for the .cpp file:
    Type SimpleTypeA::parse(mlir::MLIRContext* ctxt, mlir::DialectAsmParser& parser) {
      return get(ctxt);
    }
    // printer code omitted (it would be the inverse) as I assume it would be obvious from the parser
    

// tablegen code:
// def CompoundTypeA : TestType<"CompoundA"> {
//     let keyword = "cmpnd_a";
//     // What types do we contain?
//     let contents = [
//         CppType<"int", "countOfSomething">,
//         TypeDefName<SimpleTypeA, "exampleCustomType">,
//         NewStructType<[
//             CppType<"CustomTypeDefInCPP", "firstMember">
//         ], "generatedStruct">
//     ];
// }
// This would generate the following c++ header code:
    namespace detail {
      struct CompoundTypeAStorage;
    }
    class CompoundTypeA : public Type::TypeBase<CompoundTypeA, Type, detail::CompoundTypeAStorage> {
    public:
      using Base::Base;
      static bool kindof(unsigned kind) { return kind == TestDialectTypes::CompoundA; }
      static CompoundTypeA get(MLIRContext *context, int countOfSomething, SimpleTypeA exampleCustomType, GeneratedStruct generatedStruct) {
        return Base::get(context, TestDialectTypes::CompoundA, countOfSomething, exampleCustomType, generatedStruct);
      }
      static StringRef getKeyword() { return "cmpnd_a"; }
      static Type parse(mlir::MLIRContext* ctxt, mlir::DialectAsmParser& parser);
    };
//
// for the .cpp file:
//
    namespace detail {
    struct CompoundTypeAStorage : public TypeStorage {
      struct GeneratedStruct {
        CustomTypeDefInCPP firstMember;
    
        static llvm::hash_code hashKey(const GeneratedStruct& generatedStruct) {
            return llvm::hash_combine(generatedStruct.firstMember);
        }
        ParseResult parse(mlir::MLIRContext* ctxt, mlir::DialectAsmParser& parser);
      };
    
      CompoundTypeAStorage(int countOfSomething, SimpleTypeA exampleCustomType, GeneratedStruct generatedStruct)
          : countOfSomething(countOfSomething), exampleCustomType(exampleCustomType), generatedStruct(generatedStruct) {}
    
      using KeyTy = std::tuple<int, SimpleTypeA, GeneratedStruct>;
    
      bool operator==(const KeyTy &key) const {
        return key == KeyTy(countOfSomething, exampleCustomType, generatedStruct);
      }
    
      static llvm::hash_code hashKey(const KeyTy &key) {
        return llvm::hash_combine(std::get<0>(key), std::get<1>(key), std::get<2>(key));
      }
    
      static KeyTy getKey(int countOfSomething, SimpleTypeA exampleCustomType, GeneratedStruct generatedStruct) {
        return KeyTy(countOfSomething, exampleCustomType, generatedStruct);
      }
    
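      static CompoundTypeAStorage *construct(TypeStorageAllocator &allocator,
                                           const KeyTy &key) {
        // KeyTy is a std::tuple, so unpack all three elements with std::get.
        return new (allocator.allocate<CompoundTypeAStorage>())
            CompoundTypeAStorage(std::get<0>(key), std::get<1>(key), std::get<2>(key));
      }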
    
      static Type parse(mlir::MLIRContext* ctxt, mlir::DialectAsmParser& parser);
    
      int countOfSomething;
      SimpleTypeA exampleCustomType;
      GeneratedStruct generatedStruct;
    };
    }
    
    Type CompoundTypeA::parse(mlir::MLIRContext* ctxt, mlir::DialectAsmParser& parser) {
      return detail::CompoundTypeAStorage::parse(ctxt, parser);
    }
    Type detail::CompoundTypeAStorage::parse(mlir::MLIRContext* ctxt, mlir::DialectAsmParser& parser) {
        int countOfSomething;
        Type exampleCustomType;
        GeneratedStruct generatedStruct;
    
        if (parser.parseLess()) return Type();
        if (parser.parseInteger(countOfSomething)) return Type();
        if (parser.parseComma()) return Type();
    
        if (parser.parseType(exampleCustomType)) return Type();
        // Check the type that was just parsed, not the struct.
        if (!exampleCustomType.isa<SimpleTypeA>()) {
            parser.emitError(parser.getCurrentLocation(), "Expected SimpleTypeA");
            return Type();
        }
        if (parser.parseComma()) return Type();
        if (generatedStruct.parse(ctxt, parser)) return Type();
        if (parser.parseGreater()) return Type();
    
        return CompoundTypeA::get(ctxt, countOfSomething, exampleCustomType.cast<SimpleTypeA>(), generatedStruct);
    }
    ParseResult detail::CompoundTypeAStorage::GeneratedStruct::parse(mlir::MLIRContext* ctxt, mlir::DialectAsmParser& parser) {
        Type firstMember;
        if (parser.parseLBrace()) return failure();
        if (parser.parseType(firstMember)) return failure();
        if (!firstMember.isa<CustomTypeDefInCPP>())
            return parser.emitError(parser.getCurrentLocation(), "Expected CustomTypeDefInCPP");
        this->firstMember = firstMember.cast<CustomTypeDefInCPP>();
        if (parser.parseRBrace()) return failure();
        return success();
    }


// For parsing, the following snippet would be generated:
//
if (keyword == SimpleTypeA::getKeyword())
  return SimpleTypeA::parse(getContext(), parser);
if (keyword == CompoundTypeA::getKeyword())
  return CompoundTypeA::parse(getContext(), parser);

//
// for use in the following Dialect type parser:
// 
Type TestDialect::parseType(DialectAsmParser &parser) const {
  llvm::StringRef typeKeyword;
  if (parser.parseKeyword(&typeKeyword))
    return Type();
  #define GET_TYPE_PARSER_SELECTION
  #include "Dialects/TestDialect/Types.cpp.inc"
  return Type();
}
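
The keyword dispatch above can be tried out without MLIR. The following is a minimal sketch under stated assumptions: `ParsedType`, `takeKeyword`, `parseSimpleA`, and `parseCompoundA` are hypothetical stand-ins for the generated classes and for `DialectAsmParser`, and the grammar (`simplea`, `cmpnd_a<N>`) is simplified to a single integer parameter.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical stand-in for a parsed type: its kind plus one parameter.
struct ParsedType {
  std::string kind;
  int width = 0;
};

// Reads a leading identifier and removes it from the front of 'text'.
std::string_view takeKeyword(std::string_view &text) {
  std::size_t n = 0;
  while (n < text.size() &&
         (std::isalpha(static_cast<unsigned char>(text[n])) ||
          text[n] == '_' ||
          (n > 0 && std::isdigit(static_cast<unsigned char>(text[n])))))
    ++n;
  std::string_view keyword = text.substr(0, n);
  text.remove_prefix(n);
  return keyword;
}

// Plays the role of SimpleTypeA::parse: no parameters are allowed.
std::optional<ParsedType> parseSimpleA(std::string_view rest) {
  if (!rest.empty())
    return std::nullopt;
  return ParsedType{"SimpleA", 0};
}

// Plays the role of CompoundTypeA::parse: expects '<' digits '>'.
// Integer overflow is not handled in this sketch.
std::optional<ParsedType> parseCompoundA(std::string_view rest) {
  if (rest.size() < 3 || rest.front() != '<' || rest.back() != '>')
    return std::nullopt;
  rest = rest.substr(1, rest.size() - 2);
  int width = 0;
  for (char c : rest) {
    if (!std::isdigit(static_cast<unsigned char>(c)))
      return std::nullopt;
    width = width * 10 + (c - '0');
  }
  return ParsedType{"CompoundA", width};
}

// The dispatch chain that the generated snippet would contribute.
std::optional<ParsedType> parseType(std::string_view text) {
  std::string_view keyword = takeKeyword(text);
  if (keyword == "simplea")
    return parseSimpleA(text);
  if (keyword == "cmpnd_a")
    return parseCompoundA(text);
  return std::nullopt; // unknown keyword
}
```

Each type owns its own parser and the dialect-level function only selects one by keyword, so adding a type adds one line to the chain. That is what makes this part straightforward to generate.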

Does this help?

(These files are also on GH: https://github.com/teqdruid/llvm-project/blob/tblgen-types/mlir/test/mlir-tblgen/type.td, https://github.com/teqdruid/llvm-project/blob/tblgen-types/mlir/test/mlir-tblgen/type_output.cpp)

Thanks!

Thanks for the update!

I think the simple type functionality is easy enough to follow, and looks good. My major concern with the complex type is that it seems to be attempting to encode the structure of C++ in tablegen, which is something I would rather strongly avoid. I would look at tablegen as something that is useful when the sugar is nice, and not terribly difficult to follow. IMO this sugar is one of the more difficult aspects of getting this right. I would suggest that you strip away the ability to define new structs inline, at least at first, and focus more on getting the user interface as nicely sugared as you can. Could you use one of the standard types, LLVM types, or any other existing type as an example of how you would map it to tablegen?

– River

Come to think of it, defining the potentially complex struct/list types to get put in the storage class isn’t the part which is painful. I only put it in my last proposal to automate parsing, but it’s less complex to require those C++ structs/classes to have their own parsing methods/functions.

Here’s an updated proposal.


include "mlir/IR/OpBase.td"
include "mlir/Dialect/StandardOps/IR/Ops.td"

// *****
// This would go in OpBase.td


// 'TypeMember' represents a parameter which goes in the type storage.
//    'fieldName' is the member name in the storage struct, and various function
//    parameters. 'cppType' is the C++ symbol to use for this member.
class TypeMember<string fieldName, string cppType> {
    string fieldName = fieldName;
    string cppType = cppType;
}

// Define a new type belonging to a dialect and called 'name'
class TypeDef<Dialect dialect, string name> {
    // Name of storage class to generate or use
    string storageClass = name # "Storage";
    // Namespace (within dialect c++ namespace) in which the storage class resides
    string storageNamespace = "detail";
    // Should we generate the storage class? (Or use an existing one?)
    int genStorageClass = 1;

    // What is the enum containing 'kinds' called?
    string kindsEnum = "Types";
    // The symbol for the 'kind' entry
    string kind = name;
    // Use the lowercased name as the keyword for parsing/printing
    string keyword = name;

    // This is the list of fields in the storage class (and list of parameters
    // in the creation functions). If empty, don't use or generate a storage class
    list<TypeMember> contents = [];

    // Customization settings

    // If set, don't generate parsing definition, just the declaration
    int customParsePrint = 0;
    // If set, generate accessors for each TypeMember
    int generateAccessors = 1;
    // Generate the verifyConstructionInvariants declaration and getChecked method
    int verifyInvariantsDecl = 0;
    // Extra code to include in the class declaration
    code extraDecls;
}

// ************
// Test defs
def Test_Dialect: Dialect {
    let name = "TestDialect";
}

// Base class for other typedefs. Provides dialect-specific defaults
class TestType<string name> : TypeDef<Test_Dialect, name> {
    // Override the default enum name
    let kindsEnum = "TestDialectTypes";
}

// SimpleTypeA is a simple type. A storage class will not be generated.
def SimpleTypeA : TestType<"SimpleA"> { }

// A more complex parameterized type
def CompoundTypeA : TestType<"CompoundA"> {
    // Override the default mnemonic
    let keyword = "cmpnd_a";

    // What types do we contain?
    let contents = [
        // A standard c++ int
        TypeMember<"widthOfSomething", "int">,
        // The simple type defined above
        TypeMember<"exampleTdType", "SimpleTypeA">,
        // Some C++ type
        TypeMember<"exampleCppType", "SomeCppStruct">
    ];
}

// Base class for standard types
class StdType<string name> : TypeDef<StandardOps_Dialect, name> {
    // Override the default enum name
    let kindsEnum = "StandardTypes::Kind";
}

def IndexType : StdType<"IndexType"> {
    let keyword = "index";
}

def IntegerType : StdType<"IntegerType"> {
    let keyword = "int";
    let customParsePrint = 1;
    let verifyInvariantsDecl = 1;
    let contents = [
        TypeMember<"signedness", "SignednessSemantics">,
        TypeMember<"width", "unsigned">
    ];
    
    let extraDecls = [{
  /// Signedness semantics.
  enum SignednessSemantics {
    Signless, /// No signedness semantics
    Signed,   /// Signed integer
    Unsigned, /// Unsigned integer
  };

  /// This extra function is necessary since it doesn't include signedness
  static IntegerType getChecked(unsigned width, Location location);

  /// Return true if this is a signless integer type.
  bool isSignless() const { return getSignedness() == Signless; }
  /// Return true if this is a signed integer type.
  bool isSigned() const { return getSignedness() == Signed; }
  /// Return true if this is an unsigned integer type.
  bool isUnsigned() const { return getSignedness() == Unsigned; }

    }];
}

def TupleType : StdType<"TupleType"> {
    let keyword = "tuple";
    let contents = [
        TypeMember<"types", "ArrayRef<Type>">
    ];

    let extraDecls = [{
  /// Accumulate the types contained in this tuple and tuples nested within it.
  /// Note that this only flattens nested tuples, not any other container type,
  /// e.g. a tuple<i32, tensor<i32>, tuple<f32, tuple<i64>>> is flattened to
  /// (i32, tensor<i32>, f32, i64)
  void getFlattenedTypes(SmallVectorImpl<Type> &types);

  /// Return the number of held types.
  size_t size() const;

  /// Iterate over the held elements.
  using iterator = ArrayRef<Type>::iterator;
  iterator begin() const { return getTypes().begin(); }
  iterator end() const { return getTypes().end(); }

  /// Return the element type at index 'index'.
  Type getType(size_t index) const {
    assert(index < size() && "invalid index for tuple type");
    return getTypes()[index];
  }
    }];
}

I haven’t updated the C++ code since I haven’t had time. Let me know if you think it’s necessary to review this proposal and I’ll give it higher priority.

~John

Thanks for iterating and driving this! This is a really integral piece of infra.

For these you could actually just use normal tablegen, I did the same with interfaces which have the same need to use the raw C++ type:

let contents = (ins "int":$widthOfSomething, "SimpleTypeA":$exampleTdType, ...);

If you are going to have a possible mnemonic, I would likely make it optional. Opt-out could be fine. There are classes of types/attributes that don’t have a leading mnemonic.

Slight nit, but I prefer that boolean type values in tablegen use bit and are prefixed with something like has (e.g. hasCustomParsePrint). Otherwise it can be ambiguous as to whether this is an overridable code block, or a flag.

One unfortunate aspect is that this prevents members from being merged together, unless tablegen did that automatically? For IntegerType it likely wouldn’t matter that much, but it is something to seriously consider for attributes/types with a large number of instances. We should encourage more optimized representations when possible.

Before post edit: After looking further, it seems like you allow for providing a custom storage class. That should cover the cases I was concerned about, thanks.

How do you intend to handle reference parameters such as ArrayRef/StringRef? Given their lifetime, the data needs to be copied into the allocator when the storage type is constructed. I may have missed it, but do you require that the user provide the construction function for the storage when tablegen is generating it? I think that would be a very reasonable thing to require given the many different situations that can arise.

I would structure this similarly to how OpBuilders are structured for operations. More specifically, it would be nice if the user could enumerate the different signatures for the verify method that they want and optionally provide a code block.

What about providing a code block instead? You could then effectively generate the block of code that dispatches between types in Dialect::parse/print(Attribute|Type). This would also match the behavior of the corresponding tablegen Op methods.

I don’t think I need to see the generated C++ at this point. I can follow along (I think) with where you are going, and it looks good. I mostly have questions about various corner cases and what exactly tablegen is responsible for generating. IMO if we can generate a majority of the boilerplate and leave it up to the user to define the rest (e.g. storage class construction methods) that would already be a huge win.

– River

Sure thing! I’m still not promising any code to implement this, but that’d be the easier part vs fleshing out the tablegen syntax/semantics. This would make my life easier, though, so I’ll probably do it.

let contents = (ins "int":$widthOfSomething, "SimpleTypeA":$exampleTdType, ...);

I didn’t know this syntax existed! What would the type (in the class) be for contents? Is this just the literal for a class (which ins is)?

If you are going to have a possible mnemonic, I would likely make it optional. Opt-out could be fine. There are classes of types/attributes that don’t have a leading mnemonic.

I’ll do that. Having the mnemonic default to the name almost never makes sense IME.

One unfortunate aspect is that this prevents members from being merged together, unless tablegen did that automatically?

I don’t understand ‘merged together’.

How do you intend to handle reference parameters such as ArrayRef/StringRef? Given their lifetime, the data needs to be copied into the allocator when the storage type is constructed. I may have missed it, but do you require that the user provide the construction function for the storage when tablegen is generating it? I think that would be a very reasonable thing to require given the many different situations that can arise.

Shit. I’m not used to C++ – I’m used to garbage collected languages. I’d like to be able to auto-generate the construction method in the common case as this is one of the PITA boilerplates. ArrayRefs and StringRefs are fairly common IME. Got any ideas? I’m thinking of adding a per-element flag indicating that an allocation is necessary for this copy. If so, how would I modify the above let contents = ... syntax to support this? I’ll add an option to let the user provide a custom construction function.

I would structure this similarly to how OpBuilders are structured for operations. More specifically, it would be nice if the user could enumerate the different signatures for the verify method that they want and optionally provide a code block.

Are you talking about this sort of thing?

  let builders = [OpBuilder<
    "OpBuilder &builder, OperationState &result, MemRefType memrefType", [{
       result.types.push_back(memrefType);
     }]>,
    OpBuilder<
    "OpBuilder &builder, OperationState &result, MemRefType memrefType, " #
    "ValueRange operands, IntegerAttr alignment = IntegerAttr()", [{
       result.addOperands(operands);
       result.types.push_back(memrefType);
       if (alignment)
         result.addAttribute(getAlignmentAttrName(), alignment);
     }]>];

If so, I personally don’t think this saves much while adding complexity. I also don’t like having too much C++ code in tablegen (since it’s not compatible with my syntax highlighter and code completion). But since it’s an existing paradigm, there’s significant value in continuing that. I am concerned about trying to jam too much into this first version. Would you object to keeping the way it is in my proposal and adding your suggestion in subsequent versions?

What about providing a code block instead? You could then effectively generate the block of code that dispatches between types in Dialect::parse/print(Attribute|Type). This would also match the behavior of the corresponding tablegen Op methods.

Good idea, though I would also like to support generating just the declaration so users can implement it in a C++ file for complex cases. Also, not generating anything should be supported. Null for nothing, an empty code block for just the decl, and a non-empty for the code block for the def? Or should I provide an explicit option to control this behavior? Actually, this could be a convention for this sort of thing (the custom construction function above being a good example).
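
As a rough illustration of the three-state convention proposed here (null, empty, non-empty), the sketch below models the tablegen `code` field as a `std::optional<std::string>` and shows what a generator might emit in each case. The function name `emitParseMethod` and the emitted signature text are assumptions made for this example, not part of any existing tablegen backend.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Emits the text for a type's 'parse' method under the proposed convention:
//   unset (nullopt)  -> emit nothing
//   empty string     -> emit only the declaration; the user defines it in C++
//   non-empty string -> emit a definition that uses the string as its body
std::string emitParseMethod(const std::optional<std::string> &body) {
  // Hypothetical signature, chosen only for illustration.
  const std::string signature = "static Type parse(DialectAsmParser &parser)";
  if (!body)
    return "";
  if (body->empty())
    return signature + ";\n";
  return signature + " {\n" + *body + "\n}\n";
}
```

One thing the sketch makes visible is that "unset" and "empty" have to stay distinguishable all the way through the generator, which is the argument for an explicit option if the distinction turns out to be easy to lose.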

~John

No worries, and no pressure. A lot of stuff that I’ve done is just because it annoyed me enough.

It’s just a dag. This is how the arguments for interface methods are defined (and used with the example I gave):

Sorry about that. Merged here refers to things like bitpacking. For example, IntegerType has two elements: width and signedness semantics. IntegerType has a maximum width, meaning that it doesn’t need the full 32-bits of an unsigned so we pack the signedness in the unused bits.
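
The bitpacking idea can be sketched in a few lines. This is an illustration only: the 24-bit width field and the names `PackedIntegerStorage`, `kWidthBits`, and `kMaxWidth` are assumptions made for the example, not a description of the actual IntegerType storage layout.

```cpp
#include <cassert>
#include <cstdint>

enum class Signedness : std::uint32_t { Signless = 0, Signed = 1, Unsigned = 2 };

// Assumed layout for this sketch: the low 24 bits hold the width and the
// bits above them hold the signedness.
constexpr unsigned kWidthBits = 24;
constexpr std::uint32_t kMaxWidth = (1u << kWidthBits) - 1;

struct PackedIntegerStorage {
  PackedIntegerStorage(std::uint32_t width, Signedness signedness)
      : bits((static_cast<std::uint32_t>(signedness) << kWidthBits) |
             (width & kMaxWidth)) {}

  // Mask off the signedness bits to recover the width.
  std::uint32_t getWidth() const { return bits & kMaxWidth; }

  // Shift the width bits away to recover the signedness.
  Signedness getSignedness() const {
    return static_cast<Signedness>(bits >> kWidthBits);
  }

  std::uint32_t bits; // both members share one 32-bit word
};
```

A naive generator that emits one field per member would store the enum and the `unsigned` separately, so a packed layout like this is the kind of case where a hand-written storage class would still be wanted.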

I think I agree with you here in practice, but I haven’t seen what we would be able to generate. The main win that this has over just using extraClassDecls, is that if you don’t need anything else in the extraClassDecls you don’t have to define it. For operations that is extremely useful, for attributes/types it’s not likely going to be that much of a win.

Right now we just have a convention of calling a user defined method, e.g. verify(*this)/parse(*this), to allow for defining in the .cpp file. That ends up being fine for the most part because you can put it into the base tablegen class and be done with it. I don’t really have that big of a preference here though, whatever ends up being the easiest/cleanest.

This has always been one of the sticking points when I’ve thought about this. I haven’t given it enough thought to have a great suggestion. This is also something that can be solved afterwards though, given that auto-generating the other things is a pretty big win IMO.

– River

I thought about this over my OOO, and I think we can just allow for type elements to optionally define a code block that should be used during allocation. This would let the ArrayRef/StringRef elements specify let allocator = "$allocator.copyInto($self)".
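
To show why the copy is needed, here is a minimal sketch, independent of MLIR, of a storage `construct` hook that copies a non-owning array view into allocator-owned memory before keeping it. `IntArrayView` stands in for `ArrayRef<int>`, and `ToyAllocator` and `ListTypeStorage` are hypothetical; only the `copyInto` name is taken from the suggestion above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <utility>
#include <vector>

// Minimal non-owning view, standing in for llvm::ArrayRef<int>.
struct IntArrayView {
  const int *data = nullptr;
  std::size_t size = 0;
};

// Hypothetical allocator; copied data lives as long as the allocator does.
class ToyAllocator {
public:
  IntArrayView copyInto(IntArrayView src) {
    auto block = std::make_unique<int[]>(src.size);
    if (src.size != 0)
      std::memcpy(block.get(), src.data, src.size * sizeof(int));
    IntArrayView copy{block.get(), src.size};
    blocks.push_back(std::move(block));
    return copy;
  }

private:
  std::vector<std::unique_ptr<int[]>> blocks;
};

struct ListTypeStorage {
  // Mirrors a storage 'construct' hook: copy the caller's data first, then
  // store a view that points at the allocator-owned copy.
  static ListTypeStorage construct(ToyAllocator &allocator, IntArrayView key) {
    return ListTypeStorage{allocator.copyInto(key)};
  }

  IntArrayView elements;
};
```

Without the `copyInto` call the storage would keep pointing at the caller's buffer, which may be a temporary. With it, the stored view stays valid for as long as the allocator is alive.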

– River

Good idea – this would also allow container fields which require allocations for stuff they contain via custom function calls. Given that it should be on a per-member basis, I’m thinking:

class TypeBase;
def members : TypeBase;
def member : TypeBase;
...
    let contents = (
        members
        "int":$widthOfSomething,   // Simple case
        (member "ArrayRef<Type>":$types, [{$allocator.copyInto($self)}]),
        (member "ComplexCppClass":$bizarreField, [{$self.allocate($allocator)}])
    );

Am I using dag properly? I don’t really understand what purpose the dag operator serves.

While we’re talking about it, what do you think of ‘contents’? I’d prefer a more specific word, but nothing comes to mind. Naming stuff is hard.

~John

Yeah, that is how you can structure a dag. The operator is only really useful if you need it to, e.g. when declaring pattern matches the root is often a specific operation type.

For this I think we could let the user define a TypeMember<> or just use the string for simple types. This is similar to one of the classes you had originally:

class TypeMember<string type> {
  code allocator = ?;
  string cppType = type;
}
class ArrayRefMember<string type> : TypeMember<type> {
  let allocator = "$allocator.copyInto($self)";
}

def MyType : ... {
  let members = (ins
    "int":$widthOfSomething,
    ArrayRefMember<"Type">:$types
  );
}

I don’t really have a good suggestion, contents, members, elements, fields all seem fine.

– River

Hi River-

Sorry about the radio silence. I was diverted to a different (less fun) project for the last six weeks.

Here’s my latest proposal. I’ll be working on implementing it (if I’m not diverted again) this week and next. I think it incorporates all your feedback. I kept the code blocks for print/parse since I’d like to accommodate very simple print/parse functions, which I suspect are common.

// RUN: mlir-tblgen -gen-op-defs -I %S/../../include %s | FileCheck %s

include "mlir/IR/OpBase.td"
include "mlir/Dialect/StandardOps/IR/Ops.td"

// *****
// This would go in OpBase.td

// Define a new type belonging to a dialect and called 'name'
class TypeDef<Dialect dialect, string name> {
    // Name of storage class to generate or use
    string storageClass = name # "Storage";
    // Namespace (within dialect c++ namespace) in which the storage class resides
    string storageNamespace = "detail";
    // Should we generate the storage class? (Or use an existing one?)
    int genStorageClass = 1;
    // Does the user provide the storage class constructor? (If not, generate one.)
    int hasStorageCustomConstructor = 0;

    // This is the list of fields in the storage class (and list of parameters
    // in the creation functions). If empty, don't use or generate a storage class
    dag members;

    // Customization settings

    // If null, don't generate any methods.
    // If an empty code block, generate just the declarations.
    // If a non-empty code block, just use that code as the definition code.
    code printer;
    code parser;
    // Use the lowercased name as the keyword for parsing/printing. Only used in
    // generated parse/print methods
    string mnemonic = ?;

    // If set, generate accessors for each TypeMember
    int generateAccessors = 1;
    // Generate the verifyConstructionInvariants declaration and getChecked method
    int verifyInvariantsDecl = 0;
    // Extra code to include in the class declaration
    code extraDecls;
}

// 'Members' should be subclasses of this or simple strings (which is a
// shorthand for TypeMember<"C++Type">)
class TypeMember<string type> {
  code allocator = ?;
  string cppType = type;
}

// For standard ArrayRefs, which require allocation
class ArrayRefMember<string type> : TypeMember<type> {
  let allocator = [{$allocator.copyInto($self)}];
}

// For classes which require allocation and have their own allocateInto method
class SelfAllocationMember<string type> : TypeMember<type> {
  let allocator = [{$self.allocateInto($allocator)}];
}

// ************
// Test defs
def Test_Dialect: Dialect {
    let name = "TestDialect";
}

// Base class for other typedefs. Provides dialect-specific defaults
class TestType<string name> : TypeDef<Test_Dialect, name> { }

// SimpleTypeA is a simple type. A storage class will not be generated.
def SimpleTypeA : TestType<"SimpleA"> { }

// A more complex parameterized type
def CompoundTypeA : TestType<"CompoundA"> {
    // Override the default mnemonic
    let mnemonic = "cmpnd_a";

    // What types do we contain?
    let members = (
        ins
        // A standard c++ int
        "int":$widthOfSomething,
        // The simple type defined above
        "SimpleTypeA": $exampleTdType,
        // Some C++ type
        "SomeCppStruct": $exampleCppType
    );
}

// Base class for standard types
class StdType<string name> : TypeDef<StandardOps_Dialect, name> { }

def IndexType : StdType<"IndexType"> {
    let mnemonic = "index";
}

def IntegerType : StdType<"IntegerType"> {
    let parser = [{}];
    let printer = [{}];
    let mnemonic = "int";
    let verifyInvariantsDecl = 1;
    let members = (
        ins
        "SignednessSemantics":$signedness, 
        "unsigned":$width
    );
    
    let extraDecls = [{
  /// Signedness semantics.
  enum SignednessSemantics {
    Signless, /// No signedness semantics
    Signed,   /// Signed integer
    Unsigned, /// Unsigned integer
  };

  /// This extra function is necessary since it doesn't include signedness
  static IntegerType getChecked(unsigned width, Location location);

  /// Return true if this is a signless integer type.
  bool isSignless() const { return getSignedness() == Signless; }
  /// Return true if this is a signed integer type.
  bool isSigned() const { return getSignedness() == Signed; }
  /// Return true if this is an unsigned integer type.
  bool isUnsigned() const { return getSignedness() == Unsigned; }

    }];
}

def TupleType : StdType<"TupleType"> {
    let mnemonic = "tuple";
    let members = (
        ins
        "int":$widthOfSomething,
        ArrayRefMember<"Type">:$types,
        SelfAllocationMember<"ComplexCppClass">:$bizarreField
    );

    let extraDecls = [{
  /// Accumulate the types contained in this tuple and tuples nested within it.
  /// Note that this only flattens nested tuples, not any other container type,
  /// e.g. a tuple<i32, tensor<i32>, tuple<f32, tuple<i64>>> is flattened to
  /// (i32, tensor<i32>, f32, i64)
  void getFlattenedTypes(SmallVectorImpl<Type> &types);

  /// Return the number of held types.
  size_t size() const;

  /// Iterate over the held elements.
  using iterator = ArrayRef<Type>::iterator;
  iterator begin() const { return getTypes().begin(); }
  iterator end() const { return getTypes().end(); }

  /// Return the element type at index 'index'.
  Type getType(size_t index) const {
    assert(index < size() && "invalid index for tuple type");
    return getTypes()[index];
  }
    }];
}

This is, of course, subject to changes during implementation since I’ll doubtless discover things which don’t make sense or that are infeasible for a first take.

~John
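
As a rough sketch of how a generator could consume the per-member `allocator` snippets in this last proposal, the code below substitutes the `$allocator` and `$self` placeholders and emits the text of a storage `construct` function. `MemberSpec`, `replaceAll`, and `emitConstruct` are hypothetical names, and the emitted signature only imitates the hand-written `construct` shown earlier in the thread; it is not what any actual backend produces.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One storage member as a generator might see it after reading 'members'.
struct MemberSpec {
  std::string cppType;
  std::string name;
  std::string allocatorCode; // empty: store the value as passed in
};

// Replaces every occurrence of 'from' in 'text' with 'to'.
std::string replaceAll(std::string text, const std::string &from,
                       const std::string &to) {
  if (from.empty())
    return text;
  for (std::size_t pos = text.find(from); pos != std::string::npos;
       pos = text.find(from, pos + to.size()))
    text.replace(pos, from.size(), to);
  return text;
}

// Emits the text of a 'construct' function for the given storage class.
std::string emitConstruct(const std::string &storageClass,
                          const std::vector<MemberSpec> &members) {
  std::string params, args, copies;
  for (const MemberSpec &m : members) {
    params += ", " + m.cppType + " " + m.name;
    if (!args.empty())
      args += ", ";
    args += m.name;
    // Members with an allocator snippet are copied before being stored.
    if (!m.allocatorCode.empty()) {
      std::string code = replaceAll(m.allocatorCode, "$allocator", "allocator");
      code = replaceAll(code, "$self", m.name);
      copies += "  " + m.name + " = " + code + ";\n";
    }
  }
  return "static " + storageClass +
         " *construct(TypeStorageAllocator &allocator" + params + ") {\n" +
         copies + "  return new (allocator.allocate<" + storageClass +
         ">()) " + storageClass + "(" + args + ");\n}\n";
}
```

For a member list of `int widthOfSomething` and `ArrayRef<Type> types` with the `ArrayRefMember` snippet, the emitted body contains `types = allocator.copyInto(types);` ahead of the placement-new line. Members without a snippet are passed through unchanged.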