Changing operation attribute storage

Various patches are going in to improve IR efficiency, which is super exciting. I just saw this one which is another nice step forward. At this point, my favorite workload is bottlenecked on three things (listed in order of “hotness”):

  1. Non-empty StringAttr::get - there are some minor optimizations that are probably possible here, but making major gains really needs a concurrent hash table. These are string attributes on operations, so they inherently need to be StringAttrs.

  2. Non-empty DictionaryAttr::get - This is predominantly for the attribute dictionary at the top level of an operation. For example, the std.constant op has a “value” attribute that stores the value of the constant. This is represented with an extra level of DictionaryAttr that holds a single entry {"value"=<whatever>}, and that DictionaryAttr needs to be allocated and uniqued (see the sketch after this list).

  3. OperandRange is very hot, as are getOperand calls, because of how Operation is laid out. There has been some discussion about rearranging this so that Use records are at a constant offset from “Operation::this” and successors/regions are at variable offsets.
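
To make the second item concrete, here’s a minimal sketch of what that single-entry dictionary costs today (the value is made up, and this assumes the Builder helpers shown exist with these signatures):

```cpp
#include "mlir/IR/Builders.h"

// Sketch only: a constant-like op conceptually carries one attribute, but a
// whole DictionaryAttr {"value" = 42} still has to be allocated and uniqued
// just to hold that single entry.
void buildSingleEntryDict(mlir::MLIRContext *ctx) {
  mlir::Builder b(ctx);
  mlir::NamedAttribute value = b.getNamedAttr("value", b.getI64IntegerAttr(42));
  // Both the attribute-name uniquing and the DictionaryAttr uniquing go
  // through context-wide hash tables on this path.
  mlir::DictionaryAttr dict = b.getDictionaryAttr({value});
  (void)dict;
}
```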

I’d like to talk about the second one here: it is really wasteful in both time and memory to create and store the extra level of DictionaryAttr, it is also inefficient to query this thing “by string”, and these defined attributes have a fixed arity anyway. Has anyone thought about ways of flattening this into an Operation and making it indexed by integer instead of looked up by string? We could keep the DictionaryAttr around for non-defined attributes like dialect attributes.

-Chris

I like this optimization! We should look into this indeed :slight_smile:

Some challenges I foresee are around Operation::getAttrs() and Operation::getAttrDictionary(): these offer uniform access to the attributes, and to preserve them as they are we’d need to make them “expensive” (potentially materializing a dictionary on the fly).
These APIs can be useful: I’ve been relying on them to propagate attributes from one operation to another cheaply when doing transformations, roughly like the sketch below.
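
The kind of cheap propagation I mean looks roughly like this (a sketch with generic names, assuming setAttrs accepts the dictionary directly):

```cpp
#include "mlir/IR/Operation.h"

// Sketch: forwarding one op's attributes to another. With a uniqued
// DictionaryAttr this is essentially copying a pointer; if getAttrDictionary()
// had to materialize the dictionary on the fly, every such call would
// allocate and re-unique.
void propagateAttrs(mlir::Operation *from, mlir::Operation *to) {
  to->setAttrs(from->getAttrDictionary());
}
```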

I agree we need something that provides uniform access to the declared and non-declared attributes, but I don’t think we need DictionaryAttr to do it. We can provide “a new thing” that handles the out-of-line case. It doesn’t need to be part of the Attribute hierarchy.

In fact, with that, the “out of line” attributes wouldn’t need to be a DictionaryAttr either.

I think there are two main open questions here:

  • How does the generic API work?
    Not just getAttrs/getAttrDictionary, but also getAttr.

  • How do we handle optional attributes?
    Numeric indexing gets complicated in the presence of optional attributes; how do we handle that here? (One option is sketched right after this list.)
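
One option (all names below are hypothetical) would be to give every declared attribute a fixed slot and treat a null Attribute as “optional attribute not present”, so a presence check is a null test rather than a dictionary lookup:

```cpp
#include "mlir/IR/Attributes.h"
#include "llvm/ADT/ArrayRef.h"

// Hypothetical helper: `slots` holds one entry per declared attribute of the
// op, in declaration order; optional attributes that were never set keep a
// null Attribute in their slot.
bool hasDeclaredAttr(llvm::ArrayRef<mlir::Attribute> slots, unsigned index) {
  return index < slots.size() && static_cast<bool>(slots[index]);
}
```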

I think one step could be just inlining a NamedAttrList in Operation instead of uniquing, and then seeing how much of a benefit we get; after that we could evaluate how much additional benefit we would get from splitting the attribute types (vs. the cost).

– River

Mehdi and I discussed this topic briefly; I still think it would be a great thing to do, for both performance and memory usage reasons. Here’s a braindump of some thoughts in case someone is interested in picking this up:

a) This should be a big compile-time win: the YourOp::getFooAttr() accessors will be much faster because they can use an integer index into an array stored on the Operation, rather than doing Identifier matches against entries in an attr dictionary. This saves a lot of indirections. It should also be a memory win, because we’re not creating/uniquing/immortalizing the attr dicts needlessly: for example, DimOp stores an integer index attribute, but we currently also have an AttrDict with one element for each of these.

b) Given that AbstractOperation already has attrNames with a list of known attributes, I think the approach here is to allocate attrNames.size() Attribute values inline in the operation, using the existing TrailingObjects structure. Optional attributes would be null Attributes.

c) Given this, we’d need an OperationAttrSet Operation::getAttrs() method, replacing the one that returns ArrayRef<NamedAttribute>. OperationAttrSet would just wrap an Operation* and provide iterators that skip over missing optional attributes and synthesize each NamedAttribute by combining the inline attributes with their names in the AbstractOperation (a rough sketch follows this list).

d) OperationAttrSet can have an explicit “toVector()” method that returns a SmallVector<Attribute> for cases that really need an ArrayRef<NamedAttribute> explicitly. This will help phase the new design in, and we can wean the compiler off it over time.

e) We also need the ability to store “random extra attributes” on the operation. This can be done by either maintaining the existing attrs DictionaryAttr on the operation (but renaming it to “extraAttrs” or something) or changing that too to be a bespoke data structure. The former approach is easier (so I’d start by staging that in), but the latter is more efficient in the normal case, because these don’t need to be uniqued.

f) The Operation::getAttrDictionary method needs to stick around for a while for source compatibility and to allow phasing things in, but it should lazily create the dict on demand. This will make it less efficient than it currently is, so we want to move the codebase off it and onto OperationAttrSet directly. It isn’t widely used right now - only 28 uses in the MLIR tree - and some of them can probably be moved to using getAttrs() ahead of the bigger push.
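
A very rough sketch of what (b)/(c)/(d) could look like is below. All of the names and interfaces are hypothetical, and the real inline slots would live in Operation’s TrailingObjects rather than a separate array:

```cpp
#include "mlir/IR/Attributes.h"
#include "mlir/IR/Identifier.h"
#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/SmallVector.h"

// Hypothetical view over an op's inline attribute storage: `names` would come
// from the op's AbstractOperation (one entry per declared attribute) and
// `values` would be the attrNames.size() inline slots, where a null slot is
// an absent optional attribute.
class OperationAttrSet {
public:
  OperationAttrSet(llvm::ArrayRef<mlir::Identifier> names,
                   llvm::ArrayRef<mlir::Attribute> values)
      : names(names), values(values) {}

  // The "toVector()" escape hatch from (d): synthesize NamedAttributes on
  // demand, skipping absent optional attributes, for callers that really need
  // a materialized list.
  llvm::SmallVector<mlir::NamedAttribute, 4> toVector() const {
    llvm::SmallVector<mlir::NamedAttribute, 4> result;
    for (unsigned i = 0, e = values.size(); i != e; ++i)
      if (values[i])
        result.push_back({names[i], values[i]});
    return result;
  }

private:
  llvm::ArrayRef<mlir::Identifier> names;
  llvm::ArrayRef<mlir::Attribute> values;
};
```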

Anyway, I think such a change would be a great step forward for the compiler.

-Chris

So I’ve messed around with this and here are my findings so far:

These cases (callers that need the materialized attribute dictionary) are quite rare, and when they do arise, they are most often trivially refactored to use some iterator range (both within and outside the MLIR codebase). They are also very important to remove, because materializing an op’s attributes as a dictionary would wipe out any memory gains from getting rid of it in the first place.

The most egregious example was in IR/SymbolTable.cpp, in the function that finds and replaces all nested symbol uses. It was a little tricky to refactor, and it’s the most significant one: a DictionaryAttr might otherwise end up getting materialized for every op in the IR.

I think the easiest start would be to ditch the uniqued DictionaryAttr and just stick some list structure inside Operation. We get a quick performance boost and generally don’t use much more memory (except in degenerate cases, e.g. 1000 ops with the same attributes); in some cases, we use less.
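
For concreteness, a minimal sketch of that first step, with a made-up struct standing in for the relevant piece of Operation:

```cpp
#include "mlir/IR/OperationSupport.h" // mlir::NamedAttrList

// Made-up stand-in for the attribute portion of Operation: instead of holding
// a uniqued DictionaryAttr, the op owns its attribute list directly, so
// setting an attribute never re-uniques a dictionary in the MLIRContext.
struct InlineAttrStorage {
  mlir::NamedAttrList attrs;
};
```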

The challenge with splitting the attributes is: how do you determine whether an identifier belongs to an intrinsic attribute or an extra attribute? Inline attributes can be stored without their identifier (Attribute instead of NamedAttribute) to save memory, so will the lookup have to traverse the intrinsic attribute names stored inside AbstractOperation at the same time? So much for avoiding indirection. Do we deprecate getAttr(StringRef/Identifier) and replace it with fine-grained methods like getIntrinsicAttr? Do we try to store the inline attributes in some sorted order?

Do we sort by identifier (pointer value) or string value? String lookups are slower but they occur much less commonly than identifier lookups (at least maybe until most of the identifier lookups are replaced with indices).

Yeah, I agree with you. I think that refactoring SymbolTable would be good for other reasons as well. I hope it could be split out as an incremental precursor patch.

Right, the generic getAttr(StringRef) method is going to be more complicated.

I haven’t dug into the tradeoffs here, but one approach would be to add a function pointer to AbstractOperation, something like ssize_t (*getAttributeIndex)(StringRef), which returns a non-negative index for inline attributes, or -1 for unknown/out-of-line ones. This function could be synthesized by ODS to use a simple comparison in the case where there are a small number of attributes (e.g. ConstantOp), and could go all the way up to perfect hashing + a StringRef compare (against the corresponding entry in AbstractOperation::attributeNames) for a theoretical op with thousands of attributes. StringRef compares are reasonably fast because they carry the length, which helps filter out many of the strcmps in the negative comparison case.
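
Here’s a rough sketch of what such an ODS-synthesized hook could look like for a single-attribute op; the function name and the fallback calls in the trailing comment are made up for illustration:

```cpp
#include "llvm/ADT/StringRef.h"
#include <sys/types.h> // ssize_t

// Hypothetical hook stored as a function pointer on AbstractOperation:
// returns the inline slot of a declared attribute, or -1 for unknown /
// out-of-line attributes. For a ConstantOp-like op with one declared
// attribute, ODS can emit a single comparison.
static ssize_t constantOpGetAttributeIndex(llvm::StringRef name) {
  if (name == "value")
    return 0;
  return -1;
}

// A generic Operation::getAttr(StringRef) could then dispatch through it:
//   ssize_t idx = abstractOp->getAttributeIndex(name);
//   return idx >= 0 ? getInlineAttr(idx) : getExtraAttr(name);
```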

Completely unrelated to this topic: I really wish we didn’t use Identifier for the keys of DictionaryAttr, or in general in MLIR. It is redundant with StringAttr, various things need to be converted back and forth between the two, and doing so causes the same string data to be uniqued twice. The introduction of Identifier predated building out the attribute system.

-Chris
