A potentially irrelevant observation: one fundamental difference between hardware and software is when instantiation happens. In software, classes are instantiated at runtime; in hardware, at compile time. Technically, the same exponential size blowup applies to both situations. Hardware, however, has an upper bound: the size of the device. (Even ASICs are limited in size by the reticle limit.) So I think, in the worst case, the memory and time needed to compile (for each design tree descent) is proportional to the size of the device, which is admittedly a different paradigm from software compilation because it incorporates elements of software runtime.
In the simple case, I fail to see how the per-module representation couldn't be used to identify instances. There is no need to fully elaborate; the compiler can just lazily elaborate at descent time. In the case where instance-specific modifications need to happen, I don't think the worst case can be avoided in both space and time, barring some complex Haskell-style lazy evaluation, which MLIR is not set up to do easily. The opposite case, wherein changes need to occur to every instance of a particular module type, is an easy one. I think it's more practical to focus on the in-between cases.
For cases wherein a subset of the instances need to be changed: assuming it is the same transformation applied to each instance, this can be stored as a pattern match over the hierarchical instance names, identifying the instances upon which the transformation took place in the transformed module. In terms of time, it should be possible to amortize any analysis and transformation over identical modules or module hierarchies.
As a concrete example, take the per-instance representation on slide 27 of the FIRRTL talk we had today. To walk this tree in exponential time but not space, just lazily elaborate the per-module representation into the per-instance tree. To avoid exponential time as well, memoize the descent by module name. Some hybrid of the two (for the case where the C1 and C2 descents should be treated differently) should be possible with two different memoization tables for the two different descents. One could also attempt some descent reconvergence by playing tricks with the memoization tables.
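To make the memoized descent concrete, here is a minimal sketch in Python. The design hierarchy and the analysis (counting leaf instances) are made up for illustration; this is not the actual FIRRTL/MLIR representation, just the shape of the idea:

```python
# Per-module representation: each module maps to its child instances
# as (instance_name, module_name) pairs. This hierarchy is hypothetical.
DESIGN = {
    "Top": [("c1", "C"), ("c2", "C")],
    "C":   [("d1", "D"), ("d2", "D")],
    "D":   [],
}

def analyze(module, memo=None):
    """Count leaf instances reachable from `module`, memoizing by
    module name so identical subtrees are analyzed only once."""
    if memo is None:
        memo = {}
    if module in memo:  # amortize the analysis over identical modules
        return memo[module]
    children = DESIGN[module]
    result = 1 if not children else sum(
        analyze(child_mod, memo) for _, child_mod in children)
    memo[module] = result
    return result

print(analyze("Top"))  # 4 leaf instances, though D is only visited once
```

The fully elaborated tree has four D instances, but D's subtree is analyzed once and reused, which is where the time amortization comes from.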
To store a change to D1 only on the C1 descent, replace it with D1', but store everything from D1' on down as a per-module representation, making the tree asymmetric. To make changes to the D2's only on the left half, create D2' and identify the replacement location as c1.* to differentiate it from the C2 path, which contains unmodified D2's. Similarly, one could identify modifications as *.c1.c2 to replace only the second and sixth D3 in constant space.
So I think what I'm suggesting here is some sort of hybrid data structure: a per-module tree for the original design, overlaid with a per-pattern-matched-instance representation for edits. This data structure would have worst-case exponential storage and time, best-case linear storage and time, and I suspect something closer to the best case on average. I have not, however, thought through how module parameterization affects this. Presumably, depending on the algorithm operating on this tree, the descent memoization table may or may not have to include the module parameters.
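On the parameterization point, one way it could enter the picture is through the memoization key: two instantiations of the same module with different parameters are distinct elaborations and must not share a memo entry. A toy sketch, with a hypothetical module and cost model:

```python
def gate_count(module, params, memo):
    """Toy cost analysis memoized by (module, params) rather than by
    module name alone. The Adder module and its 5-gates-per-bit cost
    model are invented for illustration."""
    key = (module, tuple(sorted(params.items())))
    if key in memo:
        return memo[key]
    cost = 5 * params["width"] if module == "Adder" else 0
    memo[key] = cost
    return cost

table = {}
print(gate_count("Adder", {"width": 8}, table))   # 40
print(gate_count("Adder", {"width": 32}, table))  # 160
print(len(table))                                  # 2 distinct entries
```

Keying on parameters trades memo-table size for correctness; an algorithm whose result is parameter-independent could keep the smaller module-name key.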
How well this maps into MLIR – particularly the pattern matching identifiers – is unclear to me.
If I recall correctly, this was brought up in the context of tracking (tree) location identifiers across transforms, yes? If so, I see that as a different but related problem which could be solved efficiently (in the average case) with this data structure, though I don't fully understand that use case.
Am I missing something? Do I not fully understand the problem? I may have completely misinterpreted what was being said to me today.