Calyx talk and next steps

Thanks to everyone who could make my talk on Calyx today. Relevant links:

Next Steps

I apologize for the lack of reference links. Discourse limits new user posts to 2 links.

The document references the source code documentation. When you see something like calyx::passes::Papercut, you can navigate to it in the source documentation by clicking passes > Papercut.

While the design is fresh in everyone’s mind, I also want to get started on steps to integrate Calyx into the CIRCT ecosystem. A quick pitch for why we should integrate Calyx:

  1. Calyx already has a frontend for Dahlia, an imperative language. This means Calyx (probably) already supports enough features to enable compilation from NPComp.
  2. Calyx supports mixed latency-insensitive & latency-sensitive compilation. This has the potential to inform the design of StaticLogic and Handshake.
  3. Calyx already lowers to synthesizable RTL and is relatively well tested.
  4. Calyx implements a bunch of useful optimizations (calyx::passes) and analyses (calyx::analysis).
  5. Calyx has a systolic array frontend. This might not have any short term importance beyond demonstrating that we can represent a variety of architectures in Calyx.

CIRCT-Calyx

V1

The fastest route to a useful MVP would be enabling generation of Calyx from CIRCT. This can be done either by defining the Calyx AST as a dialect or by directly exposing Calyx’s IR builder (calyx::ir::Builder).

Once CIRCT can emit Calyx code, the Calyx compiler can take it all the way to RTL.
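
To make V1 concrete, here is a minimal sketch (with hypothetical component and cell names) of the kind of Calyx a CIRCT frontend might emit. It wraps the incr_count group shown later in this post in a complete component; the go/done interface ports that every component carries are left implicit.

component counter() -> () {
  cells {
    reg = std_reg(32);
    add = std_add(32);
  }
  wires {
    group incr_count {
      add.left = reg.out;
      add.right = 32'd1;
      reg.in = add.out;
      reg.write_en = 1'd1;
      incr_count[done] = reg.done;
    }
  }
  control {
    seq { incr_count; incr_count; }
  }
}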

V2

A followup PR can work on implementing the lowering step from purely structural Calyx (no groups or control) down to the RTL dialect. This will enable Calyx to feed code back into the tools below it.
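
For reference, here is a rough, hypothetical sketch of what “purely structural” Calyx looks like: after the control-compilation passes run, the control section is empty and wires contains only plain and guarded assignments, typically driven by a compiler-generated FSM register (all names below are made up).

wires {
  reg.in = (!fsm.out & go) ? add.out;   // guarded assignments replace group semantics
  reg.write_en = (!fsm.out & go) ? 1'd1;
  fsm.in = reg.done ? 1'd1;             // a one-bit FSM tracks completion
  fsm.write_en = reg.done ? 1'd1;
  done = fsm.out;                       // drive the component done port
}
control {}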

V3

Port lowering passes to CIRCT. This would enable CIRCT-Calyx to go all the way to RTL without leaving CIRCT. The core lowering pass (calyx::ir::TopDownCompileControl) is pretty short and is tested using the existing frontends.

A Calyx Dialect

Calyx programs are structured as components, each of which contains cells, wires, and control. The interesting representation choices are for groups (contained within wires) and the control sub-language.

Groups

Groups are Calyx’s main form of structural abstraction. They give a name to a set of assignments which can then be used by the control sub-language to define a schedule:

group incr_count {
  reg.in = add.out;
  add.left = reg.out;
  add.right = 1'd1;
  reg.write_en = 1'd1;
  incr_count[done] = reg.done;
}

At this point, I’m not quite sure what MLIR concept should be used to represent groups. Graph regions are a tempting choice but I don’t fully understand what the trade-off space is here.

@clattner had some thoughts about groups relating to basic blocks. Maybe we can flesh this out in more detail here with examples.

Control Sub-language

The control language is used to define the execution schedule of a component’s groups. Currently, it has the following constructs, where the cn are control programs (a small sketch combining them follows the list):

  1. seq { c1; c2; c3 ... }: Execute in sequence. Done when the last control statement is done.
  2. par { c1; c2; c3 ... }: Execute all in parallel. Done when all control statements have executed once.
  3. if <port> with <group> { c1 } else { c2 }: After executing <group>, use the value on <port> to either run c1 or c2.
  4. while <port> with <group> { c1 }: After executing <group>, use the value on <port> to either run c1 or exit the loop.
  5. invoke <cell>(in1 = p1, in2 = p2)(out1 = p3, out2 = p4): “Call” cell with an input-output port mapping and return when the done signal on the cell is high.
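
A small, hypothetical sketch combining these constructs (init and cond are made-up groups, lt is a made-up comparator cell, and incr_count is the group defined above):

control {
  seq {
    init;                      // run the init group to completion
    while lt.out with cond {   // re-run cond each iteration; keep looping while lt.out is high
      incr_count;
    }
  }
}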

One consideration for future Calyx development: we’d like the control language to be easily extensible so that we can add more operations. One idea we’ve been playing with is “level-2” operations: operations that can be expressed in terms of other, more primitive operators. pipeline is an example of such an operation. A pipeline is “just”:

while <ready> {
  par {
    if <ready> { s0 };
    if s0.valid { s1 }; 
    if s1.valid { s2 }; 
  }
}

The additional structure of a pipeline operator lets us implement specific optimizations. Once a pipeline is lowered, the rest of the Calyx compiler can apply all of its other optimizations.

Go/done interface

The go/done interface I showcased in the talk is pervasive: For example, each Calyx component gets additional ports to represent the go/done signals (this also makes them invoke-able).

@stephenneuendorffer and others mentioned that this interface can be limiting (and I agree). However, for an MVP, I want to understand whether we can build upon this interface in the future. If there is a possibility of extending it to support more paradigms, I’d propose merging this interface first and working on extensions later.

More Information on Calyx

We’ve attempted to document a lot of the source code in the Calyx compiler. Plus, we have documentation on getting started:

https://capra.cs.cornell.edu/calyx/

Finally, if you’re looking for examples of Calyx code, I’d recommend the test suite in the repository root (we use expect tests and have checked in the golden files).

Getting Involved

I work on graduate-student hours, which means I often get submerged by deadlines. I’d love to get other people involved. I can see a few places where people can help:

  1. Helping with the design of the Calyx dialect.
  2. Building frontends for Calyx.
  3. Playing with the compiler and breaking it in fun ways (if you do, please file an issue!)

How to integrate Calyx is an interesting and hard question. I think the first question is: do you want to make it “part of” CIRCT or “use” CIRCT? The former is going to be hard. I don’t want to pick up a Rust dependency, and I doubt you’re interested in rewriting everything.

Using CIRCT (either as a frontend or a backend) is simpler, and I agree that starting by defining an MLIR dialect that corresponds to Calyx would be a good way to go. The ESI dialect is a reasonable example to look at since it is at roughly the same level of abstraction.

-Chris

Thank you for the talk and the detailed proposal. I think the steps V1, V2, and V3 you laid out make sense. In terms of Chris’ question, I think of those steps as “using” CIRCT as a frontend, “using” CIRCT as a backend, and finally becoming “part of” CIRCT. Is that fair?

I have one question about Calyx using CIRCT as a frontend. If a Calyx dialect existed, what dialect(s) and operations would convert into Calyx?

I think the path in the other direction is clear from what you laid out above. This is the direction I am most interested in, since that would enable using CIRCT’s tools for RTL optimization, System Verilog emission, simulation, etc.

I’m happy to get involved, so feel free to tag me on any issues or pull requests.

Thanks for the comments @mikeurbach and @clattner. I was going to respond to Chris’s question in the same way Mike did :).

Initially, my hope is that we can be a “user” and “consumer” of CIRCT. I was thinking we could start off with a Calyx dialect that can be generated by higher-level dialects, perhaps looping into the NPComp -> RTL effort, and provide an alternative path to RTL that the NPComp -> Handshake/StaticLogic -> RTL route can be tested against.

If this proves to be fruitful, we can start thinking about making Calyx a part of the MLIR infrastructure. While the Calyx compiler already has a lot of complexity, I don’t think an MLIR rewrite is inconceivable. As of today, it is ~8,000 SLOC, and we’ve rewritten it a few times and gotten good at porting it. Regardless, that remains a discussion for the future.

Personally, I’d love to see this implemented in CIRCT. In particular, it seems to me that there is a lot of overlap between Calyx and what I’ve been calling the “FSM + Datapath” model that we use to represent statically scheduled HLS implementations. I’d be very interested in targeting Calyx from higher-level dialects, under the assumption that there is some path to generate RTL from there. One of the big questions I have in my mind is whether this becomes a standalone dialect, or whether it is a composition of a specialized ‘FSM’ dialect and the RTL dialect.

Coming back to this thread: the switch statement RFC popped up in the MLIR forums and got me thinking about Calyx and “FSM + datapath” modelling in general.

How much of the Calyx control sub-language overlaps with the MLIR SCF dialect?

I don’t think this is a 1:1 mapping, but the following seem pretty similar

seq <> scf.for
par <> scf.parallel
if <> scf.if
while <> scf.while

I think there is also a loose correspondence between

invoke <> std.call

which isn’t part of the SCF dialect, but is also already upstream in MLIR.

Anyway, I had this thought and wanted to share it. I’ve been focused on the handshaking side of things, but it would be great if we could reuse some of the SCF dialect on the statically scheduled side of things.

Ah, thanks for the pointer Mike! Calyx is designed to be relatively close to the abstractions in imperative languages (like Dahlia) so this isn’t too surprising.

invoke does seem like it corresponds to std.call. However, Calyx’s invoke is very restrictive right now. It doesn’t assume anything about how long after a “call” the return values are available. For example:

x := foo(10);
y := foo(20);

If foo refers to the same hardware instance, we have to be careful about the availability of the values returned from foo. Right now, Calyx assumes that the frontend language manages this encoding. We’re looking into a stronger interface so that Calyx generated from different languages can invoke each other’s components.
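
As a hypothetical sketch of what that means in today’s Calyx: both calls have to be serialized on the single instance, and the frontend is responsible for consuming or saving the first result before the second invoke overwrites it.

cells {
  f = foo();                   // one hardware instance of a hypothetical component foo
}
control {
  seq {
    invoke f(in = 32'd10)();   // roughly x := foo(10)
    invoke f(in = 32'd20)();   // roughly y := foo(20); reuses the same instance
  }
}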

I think attempting to share the same primitives with StaticLogic is a good idea. Calyx’s @static annotation just tells the compiler to generate a statically scheduled FSM for the marked control operator:

par {
  @static(10) seq { ... } // <- generates static logic 
  seq { ... } // <- dynamically scheduled
}