MLIR for RVSDG

Some of us at Mathworks are just starting to look at MLIR as a vehicle for a RVSDG prototype. Wondering if anyone else is doing the same.

A recent work mentioned MLIR but I think It is not a MLIR related impl:

A previous poster by the same Team has the Google logo but I don’t see any Google authors so I don’t know if was just co-sponsored by some Google program:

Good question, I’m really not sure why the Google logo is there… Maybe a faculty grant or internship of one of the authors?

I talked a lot with Nico (and also Magnus) in the past (HiPEAC events mostly), and see a lot of value in RVSDG. I am not aware of any plans in this direction, but would be interested in learning more about your plans and motivations. Note that some of the motivations of RVSDG are can be partly addressed using nested regions in MLIR, although the latter generally isolate regions as far as CFG and SSA-based analyses are concerned. Some of the current limitations of the loop/affine dialect, like returning a value from a loop, may be better addressed using a design like RVSDG. I’m not sure how you will deal with the continuation-passing style block arguments however. No avalanche of greek letters in MLIR please :slight_smile:

Albert

1 Like

I suppose that Helge Bahmann was in the Google Research Zurich facility untill Aug 2019:

1 Like

You ask of my motivation so just a bit of history.

I lead the compiler group at Apollo Computer and built the backend for Apollo’s response to Sun’s SPARC, the DN10K for which I had designed the ISA.

Without realizing so at the time (because the ideas were just beginning to appear in the literature) I stumbled into a representation of dataflow information that was morally equivalent to persistent SSA. Enough so that I later found it quite easy to implement Wegmann and Zadeck’s Constant Propagation with Conditional Branches. Once I realized that we were actually commercializing SSA I invited Ken Zadeck - who, while at Rice, had been a classmate of one of my team members - to come visit us.

Initially I designed my dataflow representation to model only values that would be managed by the register allocator. Operators in my IR were exactly the DN10K’s instructions. I had included addressing modes with base register post-modification in the ISA. Therefore my dataflow representation modeled read-modify-write behavior whereby an input port could be marked as the mandatory last-use of a value and then bound to an output port with which it had to share storage. (When Ken visited my group he expressed distaste for this coupling of values and storage as it violated SSA orthodoxy.) Here to I did not realize at that time what I had devised but three decades later I have come to understand that it was effectively the State edges the the VSDG.

By serendipity I hired Bill Ackerman from the MIT dataflow community which, by that time, was falling apart. I assigned Bill the task of writing the instruction scheduler. He immediately discovered that, while I had made register dependencies trivially available in the IR, memory, condition codes, the cross-processor lock token, etc. were not. It was at that point that he shared what proved to be possibly the most formative idea in my compiler career, namely that there is no such thing as a side-effect or action at a distance, there is only unmodeled dataflow. If you model all dataflow, even if you have to do so conservatively, then the rest of the system can work reliably with operators and dataflow edges as the entire representation of the program’s semantics. Net, we re-used the read-modify-write framework mentioned above to model all of our missing dependencies. Subsequent experience bore out the truth of Bill’s claims and made me a believer.

With that background it should not be surprising that I feel great affinity for the (R)VSDG notions. I had been looking for a while for a way to eliminate notions of control flow. Addition to the VSDG of the region notion (not to be confused with MLIR’s region) completed the picture for me, giving me something I felt I could actually build and use.

When, as Bill Ackerman preached, all operators are pure functions one does not need control flow; dataflow already accounts for all semantics. With an adequately abstracted/parameterize model of operators and ports one can regard an RVSDG region as simply an ad hoc operator with a collection of input ports and output ports. Any code that deals with the graph only syntactically treats a region as no different from any other operator.

1 Like

This is very interesting! I’d love to learn more (might even be a good topic for the open design meeting … I might be only thinking of my own intellectual curiosity here but if already being pursued then I’d say a case could be made for being appropriate) and follow along.

This is also good to push the modelling capabilities of MLIR (see what works, where extensions are needed - maybe it just works :wink: ).

Some extra info: Numba community has begun to use RVSDG to reconstruct their compiler frontend, e.g., GitHub - numba/numba-rvsdg: Numba compatible RVSDG (Regionalized Value State Dependence Graph) utilities.. And the community seems to formally decide to switch to this new IR for the next release.