LLVM Discussion Forums

[RFC] Debug Actions in MLIR: Debug Counters for the Modern World

Hi all,

Recently I’ve been wanting to use LLVM’s DebugCounter (the LLVM equivalent of “Optimization Fuel”) infrastructure in MLIR, but it isn’t possible due to a few limitations (e.g. the use of global constructors which aren’t allowed in MLIR). This, combined with a few other desires, led to the idea of building a “Debug Action” framework in MLIR. This is a DEBUG only (i.e. no cost in Release) API that would allow for controlling various aspects of compiler execution. More details below:

Debug Action

A debug action is essentially a marker for a type of action that may be performed within the compiler. There are no constraints on the granularity of an “action”; it could be as simple as “perform this fold” or as complex as “run this pass pipeline”. An action is composed of the following:

  • Tag: A unique string identifier, similar to a command line flag or DEBUG_TYPE.
  • Description: A short description of what the action represents.
  • Parameter Types: The types of values that are passed to queries related to this action, to help guide decisions.

/// A debug action that allows for controlling the application of patterns.
/// A new action type can be defined by inheriting from `DebugAction`.
/// The parameters for the action are provided as template arguments
/// when inheriting from `DebugAction`.
/// The tag and description are specified via static `getTag` and
/// `getDescription` methods.
struct ApplyPatternAction : public DebugAction<Operation &, const RewritePattern &> {
  static StringRef getTag() { return "dialect-conversion-apply-pattern"; }
  static StringRef getDescription() {
    return "Control the application of patterns within dialect conversion";
  }
};

Debug Action Manager

The DebugActionManager orchestrates the various queries related to debug actions, and is accessible via the MLIRContext. These queries are the injection point for external entities to control various aspects of compiler execution. The initial set of queries is shown below:

class DebugActionManager {
public:
  /// Returns true if the given action type should be executed, false otherwise.
  /// `Params` correspond to any action specific parameters which may be used to 
  /// guide the decision.
  template <typename ActionType, typename... Params>
  bool shouldExecute(Params &&... params);
};

Building on the example from the previous section, the following query may be used:

/// A debug action that allows for controlling the application of patterns.
struct ApplyPatternAction : public DebugAction<Operation &, const RewritePattern &> {
  static StringRef getTag() { return "dialect-conversion-apply-pattern"; }
  static StringRef getDescription() {
    return "Control the application of patterns within dialect conversion";
  }
};

…

bool shouldApplyPattern(Operation *currentOp, const RewritePattern *currentPattern) {
  MLIRContext *context = currentOp->getContext();
  DebugActionManager &manager = context->getDebugActionManager();

  // Query the action manager to see if `currentPattern` should be applied to the
  // given `currentOp`.
  return manager.shouldExecute<ApplyPatternAction>(*currentOp, *currentPattern);
}

[For the sake of simplicity in this RFC, I’ve limited the initial set of queries to the simplest one: “shouldExecute”. I can already envision other queries that could be useful, such as a “shouldUndo/Revert”, but I’ll keep those separate for now to focus on the overall structure of things.]

Debug Action Handlers

A debug action handler provides the internal implementation for the various action queries within the DebugActionManager. Action handlers allow external entities to control the compiler and inject external information into it. Handlers can be registered with the DebugActionManager using registerActionHandler. There are two types of handlers: action-specific handlers and generic handlers.

Action Specific Handlers

Action specific handlers handle a specific action type, and the parameters to their query methods map one-to-one to the parameter types of the action type. An action specific handler can be defined by inheriting from the base class defined at ActionType::Handler, where ActionType is the specific action that should be handled. An example using our running pattern example is shown below:

struct MyPatternHandler : public ApplyPatternAction::Handler {
  /// A variant of `shouldExecute` shown in the `DebugActionManager` class above.
  /// This method returns a FailureOr<bool>, where failure signifies that the 
  /// action was not handled (allowing for other handlers to process it), or the 
  /// boolean true/false signifying if the action should execute or not.
  virtual FailureOr<bool> shouldExecute(Operation &op,
                                        const RewritePattern &pattern);
};
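
To make the "failure means not handled" convention concrete, here is a rough standalone sketch of how a chain of handlers could be consulted. None of this is the proposed implementation: `ToyActionManager`, `BlockPatternHandler`, and the stand-in `Operation`/`RewritePattern` structs are hypothetical, `std::optional<bool>` stands in for `FailureOr<bool>`, and defaulting to "execute" when no handler answers is an assumption on my part.

```cpp
#include <cassert>
#include <memory>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the MLIR types used in the RFC.
struct Operation { std::string name; };
struct RewritePattern { std::string name; };

// Models `ApplyPatternAction::Handler`. std::optional<bool> stands in for
// FailureOr<bool>: std::nullopt means "this handler did not handle the query".
struct ApplyPatternHandler {
  virtual ~ApplyPatternHandler() = default;
  virtual std::optional<bool> shouldExecute(Operation &op,
                                            const RewritePattern &pattern) = 0;
};

// Vetoes one specific pattern by name and defers on everything else.
struct BlockPatternHandler : ApplyPatternHandler {
  explicit BlockPatternHandler(std::string blocked)
      : blocked(std::move(blocked)) {}

  std::optional<bool> shouldExecute(Operation &,
                                    const RewritePattern &pattern) override {
    if (pattern.name == blocked)
      return false;      // handled: do not apply this pattern
    return std::nullopt; // not handled: let another handler decide
  }

  std::string blocked;
};

// Toy manager: asks handlers in registration order; the first answer wins.
struct ToyActionManager {
  void registerActionHandler(std::unique_ptr<ApplyPatternHandler> handler) {
    assert(handler && "expected a non-null handler");
    handlers.push_back(std::move(handler));
  }

  bool shouldExecute(Operation &op, const RewritePattern &pattern) {
    for (auto &handler : handlers)
      if (std::optional<bool> result = handler->shouldExecute(op, pattern))
        return *result;
    return true; // assumption: actions nobody handles are executed
  }

  std::vector<std::unique_ptr<ApplyPatternHandler>> handlers;
};
```

With a `BlockPatternHandler("fold-add")` registered, a query for the pattern named `fold-add` is refused, while any other pattern falls through the chain and runs.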

Generic Handlers

A generic handler allows for handling any action type. These types of handlers are useful for implementing general functionality that doesn’t necessarily need to interpret the exact action parameters, or can rely on an external interpreter (such as the user). As these handlers are generic, they take a set of opaque parameters that try to map the context of the action type in a generic way. A generic handler can be defined by inheriting from DebugActionManager::GenericHandler. An example is shown below:

struct MyHandler : public DebugActionManager::GenericHandler {
  /// The return type of this method functions exactly the same as the 
  /// action-specific handler. The parameters to this method map the concepts
  /// of an action type in an opaque way. These are the tag and description of the
  /// action, as well as the action parameters formatted as string values. These
  /// parameters are provided in such a way so that the context of the action
  /// is still somewhat user readable, or at least loggable as such. 
  virtual FailureOr<bool> shouldExecute(StringRef actionTag, StringRef actionDesc,
                                        ArrayRef<StringRef> actionParameters);
};
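
As a rough illustration of the "parameters formatted as string values" idea, the standalone sketch below shows a generic handler that only logs each query, plus a helper that stringifies typed parameters. Everything here is hypothetical: `GenericHandler` is a stand-in for `DebugActionManager::GenericHandler`, `formatParams` is an invented helper that assumes each parameter is printable with `operator<<`, and `std::optional<bool>` again stands in for `FailureOr<bool>`.

```cpp
#include <cassert>
#include <optional>
#include <sstream>
#include <string>
#include <string_view>
#include <vector>

// Stand-in for `DebugActionManager::GenericHandler`; std::nullopt plays the
// role of a failed FailureOr<bool>, i.e. "not handled".
struct GenericHandler {
  virtual ~GenericHandler() = default;
  virtual std::optional<bool>
  shouldExecute(std::string_view actionTag, std::string_view actionDesc,
                const std::vector<std::string> &actionParameters) = 0;
};

// Records every query as a readable line and never makes a decision itself.
struct LoggingHandler : GenericHandler {
  std::optional<bool>
  shouldExecute(std::string_view actionTag, std::string_view actionDesc,
                const std::vector<std::string> &actionParameters) override {
    assert(!actionTag.empty() && "actions are expected to have a tag");
    std::ostringstream line;
    line << actionTag << " (" << actionDesc << "):";
    for (const std::string &param : actionParameters)
      line << ' ' << param;
    log.push_back(line.str());
    return std::nullopt; // defer to whichever handler is consulted next
  }

  std::vector<std::string> log;
};

// Hypothetical helper: turns typed action parameters into strings, assuming
// each parameter type can be streamed with operator<<.
template <typename... Params>
std::vector<std::string> formatParams(const Params &...params) {
  std::vector<std::string> result;
  auto formatOne = [&result](const auto &param) {
    std::ostringstream os;
    os << param;
    result.push_back(os.str());
  };
  (formatOne(params), ...); // C++17 fold over the parameter pack
  return result;
}
```

A handler like this keeps the action context loggable without knowing anything about the concrete action type, which is the property the generic interface is after.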

Usages

With some of the finer details out of the way, here is what this could be used for:

  • DebugCounter (“Optimization Fuel”) equivalent
    • With the above it is possible to define a generic handler that implements DebugCounters as known and used in LLVM. We could provide the exact same API for familiarity and simplicity.
  • Opt-Bisect
    • While essentially equivalent to DebugCounters in terms of functionality requirements, the only thing that isn’t present or designed is how to tell if a pass is an “optimization” pass. Given that queries on debug actions can provide parameters, this is more of a design question and not an infrastructure question (which is what is being proposed/discussed here).
  • Interactive Compiler Debugging
    • As a toy for myself when debugging dialect conversion, I implemented an interactive action handler that allowed for selectively applying patterns based on user input. If there is wide enough desire for something like this, it could be built out into a proper option that is in-tree as it contains nothing conversion specific.
  • Your Thing Here
    • The point of all of this is to have something like DebugCounters, but not limited to debug counters. Ideally, there are plenty of interesting ways that we could control compiler behavior that could generalize well.
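
To sketch the first bullet: a debug-counter style generic handler could look roughly like the following. This is a standalone model, not the proposed API. `CounterHandler` and `setCounter` are hypothetical names, `std::optional<bool>` stands in for `FailureOr<bool>`, and the "skip the first N queries, then execute the next M" semantics are my reading of the skip/count model, keyed here on the action tag.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical generic handler that gates actions with per-tag counters.
struct CounterHandler {
  struct Counter {
    int64_t seen = 0;   // number of queries observed so far for this tag
    int64_t skip = 0;   // number of leading queries to refuse
    int64_t count = -1; // number of queries to allow after that; -1 = no limit
  };

  void setCounter(std::string tag, int64_t skip, int64_t count) {
    assert(skip >= 0 && count >= -1 && "invalid counter configuration");
    Counter counter;
    counter.skip = skip;
    counter.count = count;
    counters[std::move(tag)] = counter;
  }

  // Same shape as the generic `shouldExecute` query; the description and the
  // parameters are ignored because only the tag matters for counting.
  std::optional<bool> shouldExecute(std::string_view actionTag,
                                    std::string_view /*actionDesc*/,
                                    const std::vector<std::string> & /*params*/) {
    auto it = counters.find(std::string(actionTag));
    if (it == counters.end())
      return std::nullopt; // no counter for this tag: not handled

    Counter &counter = it->second;
    int64_t index = counter.seen++;
    if (index < counter.skip)
      return false; // still inside the skipped prefix
    if (counter.count >= 0 && index >= counter.skip + counter.count)
      return false; // the allowed window has been used up
    return true;
  }

  std::map<std::string, Counter> counters;
};
```

For example, after `setCounter("dialect-conversion-apply-pattern", 2, 3)` the first two pattern applications are refused, the next three are allowed, and every later one is refused again, while actions with other tags are left to other handlers.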

Thoughts?

– River


I don’t quite grok how the proposal gets to DebugCounters. Or maybe this is just the mechanism by which passes are dynamically enabled/disabled? My sense is that this still isn’t going to be very ‘interactive’. Maybe the trick is that in these cases it doesn’t make sense to have a static list of passes which is executed unconditionally, instead, the pass manager has an algorithm by which it can dynamically decide what pass to apply next. This would subsume your use case and also subsume other ‘dynamic’ mechanisms to select a pass to run, I think. As for the interactive side, I think this is where a scripting interface (like python) really shines. We’ve linked something like ‘opt’ into a scripting language and it makes it relatively easy to dynamically control the compiler, isolate breakages, script debug flows, etc. And our environment didn’t even bother to expose access to the IR.

Debug counters can work at a much much finer granularity than passes, see https://llvm.org/docs/ProgrammersManual.html#adding-debug-counters-to-aid-in-debugging-your-code. For things like debug counters you often need to be able to inject control directly within the middle of a pass, or even a utility. In LLVM, it is very common to selectively disable very specific transformations within a pass. For example, GVN can control which instructions get numbered, CSE can control which instructions get erased, etc. This piece of functionality is mostly orthogonal to the pass manager. The pass manager could opt-in to use it, but I would not use that as the driving source.

This proposal is focused more on the underlying infra that an MLIR DebugCounters would use. With what I’ve proposed above, DebugCounters in MLIR is simply a generic handler of debug actions.

As noted above, my use case does not apply to passes at all. That level of granularity is much coarser than what I want to target personally. Another example is the inliner: often when debugging a crash I want to selectively inline specific functions until the bug I’m tracking down is tickled. If my granularity is “run the inliner or not”, that is already achievable for the most part by the crash reproducer in the pass manager.
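
To illustrate why fine-grained control helps here, a counter-style handler lets an external driver bisect over individual actions (e.g. inlining decisions) instead of whole passes. Below is a minimal standalone sketch of such a driver. It is purely hypothetical: `findFirstBadAction` and the `reproduces` callback are invented names, and it assumes the failure is monotone, i.e. once allowing the first n actions reproduces the bug, allowing more still does.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Returns the 1-based index of the first action whose execution makes the bug
// reproduce. `reproduces(n)` is expected to rerun the compiler with only the
// first n actions allowed and report whether the bug shows up.
// Assumptions: reproduces(0) is false, reproduces(total) is true, and the
// result is monotone in n.
int64_t findFirstBadAction(int64_t total,
                           const std::function<bool(int64_t)> &reproduces) {
  assert(total >= 1 && "expected at least one action to bisect over");
  int64_t good = 0;    // largest limit known not to reproduce the bug
  int64_t bad = total; // smallest limit known to reproduce the bug
  while (bad - good > 1) {
    int64_t mid = good + (bad - good) / 2;
    if (reproduces(mid))
      bad = mid;
    else
      good = mid;
  }
  return bad;
}
```

In a real setup, `reproduces` would launch the compiler with a handler that allows only the first `n` queries for the relevant action tag; here it is just a callback so the search logic can be read on its own.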

I’ve done something a bit similar, but the problem with targeting “opt” as a whole is that you are stuck with the level of access and granularity that it gives you. That is fine for many things, but when you are tracking down something within a single pass (or set of dependent passes), it quickly breaks down. Recently, when debugging dialect conversion, I wanted to step debug the decision making process to track down how a bug manifested, which is not really possible with what is available today.

– River

OK, I better understand what you’re suggesting now. Is there a reason it has to be ‘debug only’? It seems to me like other aspects of reporting and statistics could leverage this too, and most passes would have to be architected to support it. Maybe instead of ‘DebugAction’ just call it “PassAction” and fold the DebugActionManager into the PassManager? Basically, this inverts the control: instead of the PassManager executing things top-down, the transformations are constantly querying the PassManager to say “I have this thing I want to do, should I?” My sense is that maybe the action object should be passed back to the PassManager, representing a ‘unit of work’. Perhaps there should also be a concept that some actions depend on some other actions?

I see the Pass / PassManager as a fairly independent orchestrating layer from most of the actual transformation and infrastructure. There is little code in the MLIR libraries that needs to be aware or have any knowledge of the surrounding Pass/PassManager.
Inversion of control is interesting here, but that seems like a major change in the design of the orchestration layer if it breaks the “pass” model?

+1 to getting debug-counter/opt-bisect functionality.

Seems like a nice modern way of doing it, with interesting future possibilities!

+1 on Interactive Compiler Debugging. This will be an incredibly valuable tool for debugging rewriting!

As a fan of clang-query, such tools have helped in both debugging and development, so definitely seems useful.

It also seems useful for testing policy layers (e.g., only constant fold if X), but would not be usable to implement policies I believe. Do you see this as something that could evolve to implementing such policies too? Or would that be a different framework that this could also hook in to?

I think it really depends on how we want to evolve it. One could look at the set of queries for debugging as a certain “policy”, with the handlers being a particular implementation of the policy (with some potential filtering). Right now my main focus is on improving the debuggability of the compiler. There is often friction between something intended for debugging and something that is intended to be run consistently even in release (e.g. optimization policies). While “policies” as a general mechanism can support both, the desired behaviors are often very different. For example, an optimization policy may be applied at a high level, whereas a debugging policy may be applied within the very inner loop of a transformation. In release mode, we don’t want to incur the cost of the framework in such a performance critical part of the transformation. After the debugging infra is in place, we can take a look at how this might generalize into something usable for optimization policies. Even in the worst case, where we end up with different user facing mechanisms, I would be surprised if we couldn’t make the underlying mechanisms share a majority (if not all) of the necessary code.

(Little rambly there, but hope that makes sense)
– River