[RFC] Representing pipelined loops

I think the proposed representation captures the nature of a pipeline really well!

How would you proceed with the lowering once the stages are formed? I’m a bit concerned that scheduling with unit latencies to form the stages is prone to bundling unrelated operations with potentially very different runtime characteristics, such as the simple addi and the memory loads in your examples. A chain of simple combinational ops, each in its own stage, would be similarly problematic. A naive 1:1 mapping of “high-level” to Calyx stages could result in inefficient or imbalanced pipeline implementations, I think.
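
To make the concern a bit more concrete, here is a rough Python sketch (not CIRCT code) that compares unit-latency stage formation with a delay-aware variant that chains ops while they fit into a cycle. The op graph, the delay numbers, the 4 ns cycle budget, and all function names are made up for illustration; in particular, modelling the load as a plain 4 ns delay is a simplification.

```python
# Hypothetical op graph in topological order: name -> (operator, operands).
OPS = {
    "ld": ("load", []),
    "i1": ("addi", []),
    "a": ("addi", ["ld"]),
    "b": ("addi", ["a"]),
    "c": ("addi", ["b"]),
}
# Assumed delays in ns; invented numbers, not from any operator library.
DELAY = {"addi": 1.0, "load": 4.0}


def unit_latency_stages(ops):
    """ASAP stage assignment where every op occupies exactly one stage."""
    stage = {}
    for name, (_, operands) in ops.items():
        stage[name] = max((stage[o] + 1 for o in operands), default=0)
    return stage


def stage_delays(stage, ops, delay):
    """Slowest op per stage; valid when no op depends on an op in its own stage."""
    worst = {}
    for name, s in stage.items():
        worst[s] = max(worst.get(s, 0.0), delay[ops[name][0]])
    return worst


def chained_stages(ops, delay, cycle):
    """ASAP stage assignment that chains dependent ops within the cycle budget."""
    stage, finish = {}, {}
    for name, (kind, operands) in ops.items():
        s = max((stage[o] for o in operands), default=0)
        # Only operands in the same stage contribute to the chained delay.
        start = max((finish[o] for o in operands if stage[o] == s), default=0.0)
        if start + delay[kind] > cycle:
            s, start = s + 1, 0.0  # does not fit: open the next stage
        stage[name] = s
        finish[name] = start + delay[kind]
    return stage


unit = unit_latency_stages(OPS)
print(unit)                              # four stages
print(stage_delays(unit, OPS, DELAY))    # stage 0 is 4x slower than the rest
print(chained_stages(OPS, DELAY, 4.0))   # two stages
```

With these invented numbers, the unit-latency schedule puts the load and an unrelated addi into stage 0 and then spends three more stages on a chain of 1 ns additions, while the chained variant needs only two stages.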

My second concern is that encoding the dependences in the order of the stages makes it inconvenient to (re-)construct a cycle-by-cycle scheduling problem later in the flow. For that, we’d ideally still have the original SSA graph and the additional memory and control dependences in the IR, i.e. basically an old-school CDFG. However, if we added something like this, we could also reconsider scheduling with the actual latencies (in conjunction with the operator library proposal) to form the stages in the first place… Sorry for meandering between ideas here!
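
As a rough illustration of what I mean by keeping a CDFG around, here is a minimal Python sketch, assuming explicit dependence edges tagged as SSA, memory, or control, plus a per-operator latency table. All names and latency values are hypothetical, and it covers only intra-iteration dependences; loop-carried ones would additionally need a distance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dep:
    src: str
    dst: str
    kind: str  # "ssa", "mem", or "ctrl"


# Hypothetical operator latencies in cycles (0 = combinational).
LATENCY = {"load": 1, "store": 1, "addi": 0, "muli": 3}

# Hypothetical ops in topological order: name -> operator.
OPS = {"ld": "load", "m": "muli", "a": "addi", "st": "store"}

DEPS = [
    Dep("ld", "m", "ssa"),
    Dep("m", "a", "ssa"),
    Dep("a", "st", "ssa"),
    Dep("ld", "st", "mem"),  # the store must wait for the earlier load
]


def asap_start_cycles(ops, deps, latency):
    """Earliest start cycle per op: start[dst] >= start[src] + latency[src]."""
    start = {name: 0 for name in ops}
    for name in ops:  # relies on ops being listed in topological order
        for d in deps:
            if d.dst == name:
                start[name] = max(start[name], start[d.src] + latency[ops[d.src]])
    return start


print(asap_start_cycles(OPS, DEPS, LATENCY))
```

Since every edge kind turns into the same kind of constraint, the cycle-by-cycle problem can be rebuilt at any point in the flow, and the stages could then be read off the start cycles instead of being fixed up front.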