LLVM Discussion Forums

Continuing the conversation on per-module versus per-instance…

A potentially irrelevant observation: one fundamental difference between hardware and software is the time at which instantiation is done. In software, instantiation of classes happens at runtime. In hardware, it happens at compile time. Technically, the same exponential size applies to both situations. Hardware, however, has an upper bound – the size of the device. (Even ASICs are limited in size by the reticle limit.) So I think in the worst case, the amount of memory and time (for each design tree descent) you’d need to compile is proportional to the size of the device, which is admittedly a different paradigm from software compilation because it incorporates elements of software runtime.

In the simple case, I fail to see how the per-module representation couldn’t be used to identify instances. There is no need to fully elaborate – the compiler can just lazily elaborate at descent time. In the case where instance-specific modifications need to happen, I don’t think the worst case can be avoided in both space and time – barring some complex Haskell-style lazy evaluation, which MLIR is not set up to easily do. The opposite case, wherein changes need to occur to every instance of a particular module type, is an easy one. I think it’s more practical to focus on the in-between cases.

For cases wherein a subset of the instances needs to be changed: assuming it is the same transformation for each instance, it can be stored as a pattern match over the hierarchical instance names, identifying the instances to which the transformation was applied in the transformed module. In terms of time, it should be possible to amortize any analysis and transformation over identical modules or module hierarchies.

As a concrete example, take the per-instance representation on slide 27 of the FIRRTL talk we had today. To walk this tree in exponential time but not space, just lazily elaborate the per-module representation into the per-instance tree. To avoid exponential time, memoize the descent by the module name. Some hybrid of the two (in the case where the C1 and C2 descents should be treated differently) should be possible with two different memoization tables for the two different descents. One could also attempt some descent reconvergence by playing tricks with the memoization tables.

[image: the per-instance tree from slide 27]
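To make the memoized descent concrete, here is a minimal Python sketch (names and representation invented for illustration) of slide 27’s tree, where memoizing on the module name keeps each design-tree descent linear in the number of modules:

from functools import lru_cache

# Per-module representation: module name -> [(instance name, child module)].
MODULES = {
    "D0": [("c1", "D1"), ("c2", "D1")],
    "D1": [("c1", "D2"), ("c2", "D2")],
    "D2": [("c1", "D3"), ("c2", "D3")],
    "D3": [],
}

@lru_cache(maxsize=None)
def analyze(module):
    # Example descent: count leaf instances below `module`. Because the
    # cache is keyed on the module name, each module is analyzed once and
    # the exponential per-instance tree is never materialized.
    children = MODULES[module]
    return 1 if not children else sum(analyze(child) for _, child in children)

print(analyze("D0"))  # 8 leaf D3 instances, but only 4 modules analyzed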

To store a change to D1 only on the C1 descent, replace it with D1’ but store everything from D1’ on down as a per-module representation, making the tree asymmetric. To make changes to the D2’s only on the left half, create D2’ and identify the replacement location as c1.* to differentiate it from the C2 path, which contains unmodified D2’s. Similarly, one could identify modifications as *.c1.c2 to replace only the second and sixth D3 in constant space.

So I think what I’m suggesting here is some sort of hybrid data structure – a per-module tree for the original design overlaid with a per-pattern-matched-instance representation for edits. This data structure would have worst-case behavior of exponential storage and time, best-case behavior of linear storage and time, and I suspect something closer to best case for the average case. I have not, however, thought through how module parameterization affects this. Presumably, depending on the algorithm operating on this tree, the descent memoization table may or may not have to include the module parameters.
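A sketch of the overlay half (again with invented names, reusing the module dict from the sketch above): edits are keyed by glob-style patterns over hierarchical instance paths, so the *.c1.c2 example swaps a modified D3’ (spelled D3p here) into exactly the second and sixth D3:

import fnmatch

MODULES = {
    "D0": [("c1", "D1"), ("c2", "D1")],
    "D1": [("c1", "D2"), ("c2", "D2")],
    "D2": [("c1", "D3"), ("c2", "D3")],
    "D3": [], "D3p": [],          # D3p stands in for the modified D3'
}

# Overlay: hierarchical instance path pattern -> replacement module.
EDITS = {"*.c1.c2": "D3p"}

def walk(module, path=""):
    for inst, child in MODULES[module]:
        child_path = f"{path}.{inst}".lstrip(".")
        # Consult the per-pattern-matched-instance overlay before descending.
        for pattern, replacement in EDITS.items():
            if fnmatch.fnmatch(child_path, pattern):
                child = replacement
        if not MODULES[child]:
            print(child_path, "->", child)
        walk(child, child_path)

walk("D0")  # of the 8 leaf paths, c1.c1.c2 and c2.c1.c2 resolve to D3p

Note this walk is the exponential-time, constant-space variant; to recover the memoized variant, the table would have to be keyed on the module name plus the set of patterns that can still match beneath the current path.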

How well this maps into MLIR – particularly the pattern matching identifiers – is unclear to me.

If I recall correctly, this was brought up in the context of tracking (tree) location identifiers across transforms, yes? If so, I see that as a different but related problem which could be solved efficiently (in the average case) with this data structure, though I don’t fully understand that use case.

Am I missing something? Do I not fully understand the problem? I may have completely misinterpreted what was being said to me today.

I don’t think you’re missing anything - that approach should work. The problem with it is that this elaboration bloats out the IR, and a duplicated node can be very large - e.g. an entire CPU in a multicore config when a verification engineer wants to do some stuff related to CPU#0 only. This is the issue that Adam was identifying, and it can have a huge effect on compile time. Not only is it a memory bloat, but it causes redundant work for pretty much everything downstream, which now has to be done on multiple modules.

I think the way to handle this is to represent the names as special parameters that don’t inform the hardware. In a multicore config, you’d have something like this:

instance "cpu" #0
instance "cpu" #1
instance "cpu" #2
instance "cpu" #3

Where the numbers are equivalent to standard Verilog parameters - the module can then be parameterized on this, including having conditional logic for #0 etc.

This prevents having to elaborate early.

I also think you’re grokking the issue. Storing copies of different modules seems tractable.

Note: I view managing the fungibility of these two representations as the critical difficulty of circuit IRs. Because a circuit hierarchy is going to be expanded and physically placed, there is information associated with the unfolded representation that is meaningful (e.g., floorplan, clock domain, reset vector).

The program IR analog would seem to be doing optimizations based on the specific call stack to a function.

FIRRTL Compiler Implementation

What follows is some more detail on how the current implementation of the FIRRTL compiler deals with this problem.

The FIRRTL compiler has the concept of an Annotation and a Target. An Annotation is a metadata container associated with zero or more Targets. A Target is something with a name in a circuit (e.g., a register, a circuit, a module). (I think these are just weaker MLIR regions/attributes?)

In the example you link from the slide, you can then attach metadata either to module D3:

# - "In circuit D0"
# - "module D3"
~D0|D3

Or to a specific full-path instance to one D3:

# - "in circuit D0"
# - "in module D0"
# - "instance c1 of D1"
# - "instance c1 of D2"
# - "instance c1 of D3"
~D0|D0/c1:D1/c1:D2/c1:D3

Or to a partial path to one D3:

# - "in circuit D0"
# - "in module D2"
# - "in instance c1 of D3"
~D0|D2/c1:D3

Annotations are not stored in the IR, but in a separate data structure. This places a burden on the compiler to keep them up to date. To do this, each transform returns a RenameMap that stores information about what it did as Target -> Seq[Target]. The compiler can then update all annotations. Notably, this logic has been the most complicated aspect of the FIRRTL compiler and a source of many bugs.
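For readers who haven’t seen the Scala implementation, the shape of the mechanism is roughly this (a simplified Python sketch, not the actual FIRRTL classes):

from dataclasses import dataclass, field

@dataclass(frozen=True)
class Target:
    path: str                     # e.g. "~D0|D2/c1:D3"

@dataclass
class Annotation:
    targets: list                 # zero or more Targets
    payload: dict = field(default_factory=dict)

def update_annotations(annotations, rename_map):
    # rename_map: Target -> [Target]. A deleted target maps to [];
    # a split (e.g. after duplication) maps to several new targets.
    for anno in annotations:
        anno.targets = [new for old in anno.targets
                        for new in rename_map.get(old, [old])]
    return annotations

Presumably the bug-prone part is that every transform must remember to record every rename it performs into this map.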

Deduplication / Duplication

The FIRRTL compiler has both deduplication and duplication utilities. The former collapses structurally equivalent modules together. The latter splits specific instances into lone modules.

This has produced a pattern of duplicate-transform-deduplicate for situations where the modifications you want to make are difficult to reason about unless you can be assured that you are operating on a lone instance. E.g., if you want to punch out wires inside a specific instance, but don’t want to have to think about which modules on the path to the root require duplication.
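A minimal sketch of the deduplicate half (hypothetical representation: module name -> list of (instance, child) pairs), collapsing modules whose bodies are recursively structurally identical:

def deduplicate(modules):
    canon = {}                    # structural key -> canonical module name
    rename = {}                   # old name -> canonical name
    def key(name):
        # The structural key ignores the module's own name, so a module and
        # an untransformed copy of it collide and get merged. (Memoize this
        # for large designs.)
        return tuple((inst, key(child)) for inst, child in modules[name])
    for name in modules:
        rename[name] = canon.setdefault(key(name), name)
    return {rename[n]: [(i, rename[c]) for i, c in modules[n]]
            for n in modules if rename[n] == n}, rename

mods, renames = deduplicate({"D2": [("c1", "D3"), ("c2", "D3_copy")],
                             "D3": [], "D3_copy": []})
print(renames)  # {'D2': 'D2', 'D3': 'D3', 'D3_copy': 'D3'}

The rename dict is exactly the kind of information that would feed the RenameMap above.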

As Chris identifies, this is sub-optimal and results in more work than needed. Some thought is needed on how to avoid the duplication/deduplication round trip.

Parameterization

Parameterization could work, though it would need to be more powerful than what Verilog supports, as module differentiation may add or remove IO, and this isn’t allowed in Verilog (and I don’t think there’s an ifdef hack to get this to work). This is tractable with a more powerful RTL dialect.

Maybe this isn’t all that different from merging two functions in program IR and adding control logic to choose between them?


Great summary @seldridge. Agreed, we’d need to allow “conditional ports” based on parameters, which seems like a pretty big omission from Verilog in general.

I agree w/ @clattner – good overview.

It sounds like the FIRRTL compiler uses a scheme similar to what I referred to as “pattern matching” to keep the RenameMap from having a potentially exponential storage requirement.

RE parameterization: yes, this sounds very similar to software function merging. It’s actually very similar to C++ templates! Sorry if I’m late to the party on aligning my mental model. (It took a few drafts of this post to think it through.) @clattner: how would you model C++ template parameters in MLIR? (I’m just now making the connection between this and some of your comments on preserving code parameterization until you absolutely must do the elaboration. I’m assuming you’ve done some thinking in this area.)

Does it make sense to track these parameters as MLIR function arguments? I’m imagining a very complex, constant value (really a tree of values) at the top level (or wherever the parameterization stops). This is the parameter passed into the C++ template at its instantiation. It would be peeled off with each “instance operation” with each layer containing the parameters for that instance and all of the sub-instances.
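A toy Python sketch of that peeling (representation invented; no claim that this maps directly onto MLIR attributes or arguments):

MODULES = {"Top": [("cpu0", "CPU"), ("cpu1", "CPU")], "CPU": []}

# One constant tree of parameters at the top level: each layer holds the
# parameters for one instance plus the sub-trees for its sub-instances.
PARAMS = {"self": {}, "children": {
    "cpu0": {"self": {"core_id": 0}, "children": {}},
    "cpu1": {"self": {"core_id": 1}, "children": {}},
}}

EMPTY = {"self": {}, "children": {}}

def instantiate(module, params):
    print(module, params["self"])         # this layer's parameters
    for inst, child in MODULES[module]:
        # Peel off the sub-tree destined for this child instance.
        instantiate(child, params["children"].get(inst, EMPTY))

instantiate("Top", PARAMS)
# Top {}
# CPU {'core_id': 0}
# CPU {'core_id': 1}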

How would you model “conditional ports” in MLIR? We’ve been discussing [input] ports as function arguments.

We need to discuss how to represent parameterization in the IR in general. I don’t have an “obviously correct” model in mind - I just see many possible models with different tradeoffs. Once we have that, conditional ports can be tied into it. One simple way is to represent them as an array of things, and have that array be either zero or one elements based on a parameter.
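For instance (a hypothetical sketch of the zero-or-one-element idea, with invented port names):

def cpu_ports(core_id):
    # The conditional port is an array whose length is a parameter
    # expression: one element on core 0, zero elements otherwise.
    debug_port = [("debug", 32)] if core_id == 0 else []
    return [("clk", 1), ("data_in", 64)] + debug_port

print(cpu_ports(0))  # [('clk', 1), ('data_in', 64), ('debug', 32)]
print(cpu_ports(1))  # [('clk', 1), ('data_in', 64)]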

When we tackle this, I’d recommend looking at how the Swift SIL IR does parameterized generics - it is structurally the same problem. Swift and SIL don’t require instantiation of generics at all; such a thing is a performance optimization that enables specialization and inlining, just like we want for modules in the RTL dialect.

Check out this talk for an intro to some of these concepts.

-Chris

Which community are you referring to as “we”? CIRCT or MLIR?

~John

Both 🙂 but I was referring to CIRCT.

One question is what compatibility with SV generate loops and conditionals would look like. Making the module parameterization richer with conditional ports is good, but generate loops make new namespaces per iteration at elaboration (well, an array of namespaces). Code outside the generate block can refer to specific environment frames generated by specific loop iterations. I don’t think this is a module parameterization implementation question, but we should be prepared for people to want to represent this in at least the SV dialect. Which is to say, conditional module instantiation won’t be enough to capture SV generate statements.

Agreed. I think the problem is much more broad than conditional instantiation and much more broad than CIRCT. I would like to see a solution at the MLIR level eventually.

If we see a need first, we should do some experimentation in CIRCT and report back to the MLIR community.

Yeah, this is messy, and I don’t fully understand the semantics here. I would look at this as two different problems:

  1. How do we model Verilog semantics at the SV level, and how far do we go? It isn’t clear how far we want to go here. We would never “parse” a Verilog file into an sv.generator.for loop, but there is room for generators to produce these things – just like the sv.ifdef node.

  2. What is the principled thing we want to have that allows analysis and transformation of parameterized IR, e.g. at the RTL level? I think we’d want to do something like SIL in Swift where we have type system integration (the ‘tau’ types) as well as type argument declarations. However, there may be other ways to tackle this.

As the RTL/SV dialect divide gets better defined, I think we should come back to this topic in a design meeting.