Improving the efficiency of Operation::getOperand

Diving into the third item in this list of things that can be done to improve compiler performance, I’d like to discuss improving the efficiency of the getOperand calls.

For background, Operation is currently defined like this:

class alignas(8) Operation final
    : public llvm::ilist_node_with_parent<Operation, Block>,
      private llvm::TrailingObjects<Operation, BlockOperand, Region,

The operands of an operation are stored as a consecutive array of OpOperand values, which are allocated as part of the aptly named OperandStorage object. The consequence of this is that very common calls like op->getOperand(0) or op.lhs() has to load the # block operands, # regions, and do a bunch of math to get the OpOperand and the Value’s in question. It may also surprise you that things like: Value lhs = thing.lhs(), rhs = thing.rhs(); does all this arithmetic twice: it can’t be CSEd because it is large and out-of-line.

It is inherent to the design of Operation that some thing will need this arithmetic, but BlockOperand and Regions are much less common and commonly accessed than operands, so I’d like to see operands come first.

The challenge with this is that detail::OperandStorage itself has trailing objects:

/// This class handles the management of operation operands. Operands are
/// stored either in a trailing array, or a dynamically resizable vector.
class OperandStorage final
    : private llvm::TrailingObjects<OperandStorage, OpOperand> {
  /// The current representation of the storage. This is either a
  /// InlineOperandStorage, or a pointer to a InlineOperandStorage.
  union {
    TrailingOperandStorage inlineStorage;
    llvm::PointerIntPair<TrailingOperandStorage *, 1, bool,

I think that can be addressed by refactoring this to flatten the operand list into Operation itself, something akin to:

class alignas(8) Operation final
    : public llvm::ilist_node_with_parent<Operation, Block>,
      private llvm::TrailingObjects<Operation, OpOperand, BlockOperand, Region>

And then give Operation a pointer to OpOperand for the out-of-line case, and have separate “number of inline operands” vs “number of operands” integers. Does this seem like a sensible approach to this?



+1 Chris!

This seems like a big efficiency win.

Every time I’ve attempted this, the result has been negligible change to performance in all of the benchmarks I have, but an increase in the size of every operation (not ideal). I can upload my patch for others to benchmark with, and if it has a worthwhile performance improvement I’d be on board. If the performance isn’t a decent improvement though, I’d rather keep Operation smaller.

– River

1 Like

Why would it increase the size of Operation?

Right now we don’t track the original number of operands, because we don’t have to (the operand storage is at the end of the trailing objects, so need for additional book keeping). If we move the operands to earlier in the trailing objects list there will need to be some kind of additional field:

^^ e.g. adding those will increase the size of Operation, as right now the only cost of the operand storage on the actual Operation class is 1-bit (“do you have an operand storage?”).

⚙ D111695 [mlir] Move the Operation OperandStorage to the first trailing object is the revision that I’ve been playing around with.

– River