Duplicate successors in LLVM dialect

Recent patches attempting to introduce the equivalents of indirectbr and switch LLVM IR instructions into the LLVM dialect highlighted a design choice in the LLVM dialect that we had postponed: how should we deal with duplicate successors in the LLVM dialect?

Unlike LLVM IR, MLIR does not have explicit PHI nodes. Instead, it uses basic block arguments. The latter allow a terminator to list the same successor more than once with different arguments.

^bb0:
  cond_br %cond, ^bb1[%0: i64], ^bb1[%1: i64]
^bb1(%arg0: i64):    // preds: ^bb0

This is not directly expressible in the LLVM IR using phi instruction since it uses block labels of the predecessor to specify to which value in the block it refers. (phi i64 [%2, bb1], [%3, bb1] is not valid. Same applies to other LLVM IR terminators with more than one successor.

Currently, we disallow conditional branches in the LLVM dialect to have identical successors. The reason is partly legacy (this was implemented before we supported custom terminator operations) and partly the overarching idea to keep the translation from the LLVM dialect to LLVM IR as trivial as possible. Furthermore, one can argue that the LLVM dialect should not support cases that are not supported by LLVM IR.

The problematic situation above is currently resolved in the lowering from the standard to the LLVM dialect by introducing an additional block that can be used in PHI to differentiate predecessors.

^bb0:
  cond_br %cond ^bb1[%0: i64], ^bb2[%1: i64]
^bb2(%arg0: i64):
  br ^bb1[%arg0: i64]
^bb1(%arg1: i64): // preds ^bb0, ^bb1

An alternative lowering could be to use select and unconditionalize the branch.

^bb0:
  %2 = select %cond, %0, %1 : i64
  br ^bb1[%2: i64]

Note that both solutions should be seen as relying on the semantics of the operation being transformed, cond_br in this case: they need to know that block arguments are forwarded without modifications to the blocks. This will get more important after the special treatment of successor operands is removed.


If we follow the same reasoning as above, we should also disallow repeated successors with identical arguments for indirectbr, switch, callbr and potentially catchswitch operations.

Alternatively, we may decide to allow repeated successors with different arguments and perform the block duplication/select injection at a later stage. This embedding will fit better into the MLIR’s flavor of SSA, and is more concise. However, depending on the place of repeated-successor-removal legalization, may get (A) some constructs in the LLVM dialect are not convertible to LLVM IR (legalization necessary before calling the conversion); or (B) the translation from the LLVM dialect to LLVM IR becomes less trivial and may inject blocks or operations that were not present in the input.

My order of preference is (1) disallow constructs that are not expressible in the LLVM IR; (2) use (A); (3) use (B). The rationale for the first priority is that the LLVM dialect models LLVM IR closely and if we accept this divergence from the LLVM IR, we are opening the way to other divergences, which creates a danger of having multiple subtle mismatches that users will have to constantly keep in mind.

Any preferences from other interested parties?
Ping @clattner, @mehdi_amini, @River707 specifically.

It seems to me that disallowing constructs in the LLVM dialect which are not expressible in LLVMIR simply pushes the problem to higher level dialects. This means that the std->llvm conversion (and all other conversions to LLVM dialect) now need to handle this subtlety/complexity. My preference would be A), and providing an LLVM Dialect to LLVM Dialect transformation which performs successor deduplication separate from conversion to LLVMIR. One thing that I see here (which probably could be a more general concept) is that properties of composed operators can be important. Should this be a "sub-dialect’? The slippery slope here is that as these kinds of properties become less syntactic, they become harder to verify.

I agree with Stephen here: the LLVM dialect is not only supposed to model LLVM IR but also intend to facilitate any lowering from MLIR dialects to LLVM. Pushing the complexity from LLVM IR subtleties into higher level lowering is exactly what I expect the LLVM dialect to shield us away from!

We already deviate from LLVM IR to align with MLIR-style on other aspects (like symbolic references for globals for instance).

A “codegen prepare” kind of pass that “legalize” the LLVM dialect before export to LLVM is also my preference.

For switch, you may be able to define away duplicate successors by making switch “patterns” more sophisticated. For example, consider C code like this:

switch (i) {
case 0: case 1: case 2:
  block0();
  break;
case 3: case 4:
  block1();
  break;
}

Instead of giving switch three separate case-edges for 0, 1, and 2, you could give it a single case-edge with multiple values. That solves your higher-level representational issue while still being both easy to generate and easy to lower into LLVM IR. It would honestly be a better representation for LLVM IR.

indirectbr causes so many thorny representation issues for such an uncommonly-used feature that you should strongly consider just not supporting it in MLIR unless you feel it’s absolutely imperative.

I completely agree. Forcing this LLVM limitation on all MLIR producers would be annoying. An even bigger problem is that it means that the LLVM dialect won’t compose well with generic MLIR passes that aren’t aware of this limitation.

-Chris

Yep, this would make a lot of sense - reducing the outdegree of CFG nodes can improve compile time as well. MLIR attributes make this very easy to model too!

That said, this shouldn’t be a requirement of canonical form. You don’t want the IR to become invalid if something decides to merge two blocks who happen to be the successor of a switch.

It sounds like it already has to handle that for conditional branches, so that shouldn’t be a problem for switches. It really shouldn’t be a burden, it’s not like there’s a million different passes doing block merging.

If you absolutely need to support indirectbr, I would suggest “pre-splitting” it: just force all indirectbr successors to have a unique incoming edge, defining away the need to ever carry arguments on those edges. In general, pre-splitting can require introducing extra blocks if there are other ways to reach one of the branch destinations, but that doesn’t typically happen in indirectbr’s most important use case (threaded interpreters). It also forces the indirectbr to be unique in the function, but that’s also important here because otherwise you get a quadratic explosion of CFG edges.

In both cases, the key point is that it’s okay to use a stricter representation than LLVM IR, since the goal is to make it trivial to lower to LLVM IR rather than trivial to lower from LLVM IR. LLVM IR frequently makes the mistake of allowing unnecessarily general constructs that introduce major complexity in every pass that might see those constructs rather than accepting a small amount of complexity in the handful of passes that manipulate them.

Why? What problem are you solving here? The MLIR BB argument design (just like the SIL design) allows naturally modeling this. It is really important to keep the control flow representation orthogonal and eliminate special cases like this - the special cases manifest as complexity throughout the stack.

Part of the point of using basic block arguments as the fundamental inter-block data-flow mechanism (other than just dominance) is to make it much more straightforward to alter edge-specific data flow by e.g. changing the argument value and inserting instructions along the edge. Any kind of critical edge interferes with that by requiring explicit splitting, but indirectbr creates a completely new tier of problem because the edge is unsplittable without undermining the value of the indirectbr optimization — you have to insert explicit conditional branches on the indirectbr’s destination address value before the jump. And this is a completely artificial problem, because there’s nothing about how people actually want to use indirectbr in practice that depends on the indirectbr destination blocks being directly reachable through other control flow.

In case anyone in this thread is curious, the recent CGO talk includes a few slides talking about the downsides of LLVM-Style PHI Nodes and give examples of how invoke and other things are better modeled with bb arguments.

I agree, this is a good argument.

It’s a slippery slope. We don’t want to end up having the MLIR flavor of LLVM roughly reminiscent of LLVM IR. That should be a different dialect. I would argue that if there are some improvements, they should be made to LLVM IR and then reflected in MLIR, rather than made locally in MLIR. Unless these improvements require significant breakage in LLVM IR, like replacing phis with block arguments.

This reminds me of a recent discussion on a patch that introduced memory orderings for atomics in the LLVM dialect, and somebody was asking for the dialect to follow C++ standard names rather than LLVM spec names. We agreed that it should be done in LLVM IR.

The problem is at a lower level: should the follwing be valid IR in the LLVM dialect

llvm.switch %val, ^bb0(%0) [0, ^bb0(%1), 1, ^bb0(%2)]

regardless of what higher-level construct this might represent. Assume it is written by hand as input. We can certainly extend it to be

llvm.switch %val, ^bb0(%0) [any_of [0,1,42], ^bb0(%1), 56, ^bb0(%2)]

but it won’t solve the underlying validity issue.

We already do block duplication for conditional branches. The question isn’t whether we should do it but where in the pipeline.

We ultimately need to support all of LLVM IR. One of the ideas that folks circulated was to provide a “compatibility IRBuilder” that looks like LLVM’s but builds MLIR instead as means to progressively introduce MLIR abstractions into pipelines that use LLVM IR but could benefit from some higher-level abstractions.

Agreed with making it stricter + others on “prepare for export” phase.

Why? :slight_smile: And in which direction? And even if we do that doesn’t mean 1:1 though. I could see us doing a bit more extra work during translation to enable the same behavior but without requiring that we have exactly the same instructions. It is a trade-off yes. But one that could be considered once the modelling is cleaer.

Lets not have this potential future feature direct the design, else we end up just duplicating LLVM IR in the dialect as that is the simplest way to achieve this. Yes, the opposite side would be that one would need to do more work, but some of those refactorings might be useful in the current code too (e.g., perhaps having a library function that makes it easier to construct how it gets modeled in the dialect might also makes the current code easier to read).

Because multiple people have repeatedly asked me to support all of LLVM IR, e.g., to experiment with additional constructs around it (including regions, symbols that aren’t trivial on LLVM proper). In both directions.

A bit more – yes. The main question is the definition of “a bit”. (And also who is going to do the “more work” part). If I push this far enough, why bother having the LLVM dialect to start with? We can “solve all problems of LLVM” in, e.g., standard and go directly to LLVM IR from there.

The way I view this: the LLVM dialect greatly reduces the complexity of translating std to LLVM IR - making it a two step process. Now, on your way from the std dialect to LLVM IR, you have the choice of eliminating the duplicate successors either during std-> llvm dialect conversion (the way it is now) or post conversion on the LLVM dialect. But doing the former would mean that ‘the duplicate successor elimination code’ isn’t available (easily) to LLVM dialect generators or transforms that need to ensure distinct successors. Is there a use case for generating the LLVM dialect other than the conversion from std dialect? Transformations on the LLVM dialect are probably more effectively done / subsumed by transformations available on the LLVM IR itself?

It’s fundamentally a question of priorities. If you want MLIR to be a better IR, I would encourage you to structurally define away critical edges. If you want MLIR to be able to faithfully embed LLVM IR exactly as it would appear in LLVM, except using basic-block arguments instead of phis, then you don’t really have a choice. These two goals are in tension.

As a frontend author, I think it would be quite straightforward to make a few tweaks to Clang/Swift IRGen to never generate critical edges. It’s not going to be a drop-in replacement anyway, and I think it would be unwise to try.

I don’t think we are not talking about “improvements” necessarily: we trying to smooth aways the difference between what is natural in MLIR vs LLVM IR. If a change can be made in LLVM IR, sure, but that is also much higher bar.
The important aspects to me to keep are:

  • we want to make it as easy as possible to target LLVM (lower to LLVM dialect) from other dialects / frontends. Ideally the LLVM dialect is as composable as possible with the rest of MLIR (using the LLVM context / types is a problem right now for example).
  • we can export to LLVM IR fairly easily (if necessary with a few lowering passes, like proposed here).
  • we can import from LLVM IR reasonably easily (which is supporting the 1-1 aspect of the construct in general)

Any deviation that makes it more usable in MLIR (the C++ naming you mention isn’t), but supports these properties should be OK.

I don’t think it is not easily available. It suffices to call a function, https://github.com/llvm/llvm-project/blob/master/mlir/include/mlir/Conversion/StandardToLLVM/ConvertStandardToLLVMPass.h#L70.

There are use cases, but none of those I’m aware of generate control flow directly in the LLVM dialect.

Provided you can express what you want in LLVM IR, which you cannot in case of repeated successors. But yes, so far we did not attempt to transform anything in the LLVM dialect.

MLIR != LLVM dialect. If we want a better IR, we can define it regardless of LLVM choices :wink:

I don’t disagree with this. What is important is the boundary where we consider it smoothing away the difference vs. new feature / “improvement”. The criteria of “usable” and “natural” are subjective. I would prefer some quantifiable metric.

There is such a metric for this specific case. If we want to remove duplicate successor before going to the LLVM dialect, we will have to do that for all possible terminators with successors that are lowered to the LLVM dialect. If we do it within the LLVM dialect, we only ever need to care about LLVM dialect terminators. In the long run, we expect there to be more non-LLVM terminators than LLVM ones.

Well, not entirely. The base MLIR spec dictates rules of dominance that prohibit classic phis regardless of dialect. If the base MLIR spec banned critical edges, which I am strongly strongly encouraging you to do, then that would similarly limit all dialects. Conversely, if you want LLVM dialect to be able to represent critical edges, you will not be able to ban them in base MLIR.