Speeding up canonicalize?

Hi Chris - thanks for continuing to push on this! It will be nice to get this cleaned up and more performant. On the Google side, I think we’ll be a bit better off the second time around :slight_smile: and thanks for checking.

I don’t have any objections, but maybe leave a weekday or two comment period so folks who are tracking head closely can patch/check/prepare?

Yeah definitely. This patch shouldn’t be scary at all. I’ll investigate turning on just the constant prepass first without changing the traversal order, then see about changing canonicalize traversal order. I won’t touch the myriad other clients of applyPatternsAndFoldGreedily, but the corresponding owners of any of them can adopt top down if it makes sense.

In any case, I’ll communicate any plans very vocally and provide time for others to kick the tires. I’d like to get this basic patch in soonish, but the adoption patches don’t have any rush to them.

-Chris

Hi all,

Coming back to this, I’d like to get consensus on what to do next. To recap where we are on this:

  1. the code has been landed in trunk, but is opt-in and no-one (outside the CIRCT project) opts into it yet.

  2. There are two different changes proposed as part of this work:

    2a) the first is a “constant prepass” which collects obvious constants and puts them in a table for reuse and does an initial pass of folding obvious constants. This adds an additional pass over the
    2b) the second is the change to seed the worklist in a “top down” order instead of “bottom up” order within each block.

  3. We discovered upthread that there are multiple different use-cases for the GreedyPatternRewriter and that Canonicalize is sort of a special case that should be treated differently.

  4. I didn’t have amazingly great data because I did spotty evaluations while building the code instead of being a scientist about it. I am still more of an engineer than a scientist, but I at least now have good data (below).

  5. The initial patch caused a bunch of out-of-tree pain and suffering that none of us want to repeat.

Let me jump to the data. For my experiments I’m using three different testcases: “B” which is a real world 47M fir file, “M” which is a real world 270M fir file, and “test2” which is a machine generated 27M fir file. The firtool program in CIRCT runs two passes of canonicalize, one on the FIRRTL dialect, and one on the lower level HW/Comb/SV dialect representation. In order to reasonably stabilize and isolate other effects, I am reporting wall time with multithreading and verifier disabled, specifically -mlir-timing -verify-each=false --mlir-disable-threading.

I tested all of “current canonicalize”, “canonicalize with just the constant preprocessing enabled”, “canonicalize with top down worklist seeding + constant preprocessing” and “canonicalize with just top down but no ConstantPP”. The data looks like this:

Original BU+ConstantPP TD+ConstantPP TD/noConstantPP
M FIRRTL 2.92 2.88 2.58 2.46
M HWCombSV 1.53 1.52 1.19 1.14
B FIRRTL 0.24 0.21 0.21 0.22
B HWCombSV 0.39 0.35 0.32 0.32
Test2 FIRRTL 3.05 3.09 2.85 2.76
Test2 HWCombSV 122.85 3.69 1.4 1.29

TL;DR: at least for these two dialects, just the top-down processing without the extra constant folding pass is the best. However, the ConstantPP pass is useful for fixing an egregious algorithmic issue when you have a lot of unfolded constants and are using the bottom up pass. This makes logical sense and I think this should generalize to other dialects.

There are several things we could do w.r.t. the GreedyPatternRewriter and the Canonicalize pass. For example, we could:

  • :stop_sign:Force all clients to use the TD + Constant PP approach: Verdict: no reason to do this, proven to be a bad idea.
  • :white_check_mark: Check the functionality into GreedyPatternRewrite and let general clients choose whether they want TD or BU. Force ConstantPP on when TD is on. (This is current state in trunk).
  • We could expand the set of boolean flags passed to applyPatternsAndFoldGreedily to be a struct and decouple these two settings so clients can choose one or the other. We could then make RegionDCE optional as well, allowing me to delete the copy of the canonicalize pass in CIRCT.
  • We could delete the ConstantPP stuff as “complexity that is not worth it”.
  • We could default Canonicalize to using TopDown-without ConstantPP to get an across the board 20-30% win (also replicated by IREE folks in previous discussions)

Thoughts and opinions?

-Chris

I guess I should give my opinion. I think we should do those bottom three bullets:

  • We should expand the set of boolean flags passed to applyPatternsAndFoldGreedily to be a struct and decouple these two settings so clients can choose one or the other. We could then make RegionDCE optional as well, allowing me to delete the copy of the canonicalize pass in CIRCT.
  • We should delete the ConstantPP stuff as “complexity that is not worth it”.
  • We should default Canonicalize to using TopDown-without ConstantPP to get an across the board 20-30% win (also replicated by IREE folks in previous discussions)

I’m also happy to do the work as I have time if people agree.

-Chris

2 Likes

These all sound reasonable to me.

Can we have a Config struct instead of more booleans?

– River

Yes, that’s what I meant. This patch takes care of the first two bullets without changing the behavior of canonicalize.

PTAL when you have a chance, thanks!

-Chris

Ok, that patch landed. Here is the switch to change the behavior of canonicalize. Note: this is the big scary one.

I’d love it if people who have large out of tree codebases could kick the tires and let me know if it is a concern. The major change to this is that it permutes the order of constants at the top of the block, which is easy to address by changing things into a CHECK-DAG (as you can see in the patch). You can also change a test case to pass -pass-pipeline='func(canonicalize{top-down=false})' on the mlir-opt command line if you don’t want to update the tests.

Unless the previous iteration of this work, this only affects the canonicalize pass. Please let me know if you have any concerns. Once this lands, I plan to walk away from canonicalize for a good long time :slight_smile:

-Chris

Thanks for the heads up. I’ll try running this on all of the internal google projects and see how bad it gets.

– River

With your patch I only see about 6 internal google failures, and all seem fairly easy to fix. LGTM from my perspective.

– River

Thank you for checking River! I’ll plan to land this sometime tomorrow unless there is an objection,

-Chris

I landed this just now, and don’t plan any more changes to canonicalize in the near future. Thank you all for the support.

-Chris