You are definitely bringing up valid points. With HLS, there are certainly trade-offs between area, power, latency, throughput, etc. that need to be considered, and I think MLIR provides a really good framework for exploring that optimization space. Regarding parameter buffers, there has been work in the FPGA space on storing parameters in on-chip memories, which can provide high throughput.
I guess the application scenarios I’m imagining are related to inference workloads, either on FPGAs or with custom ASICs. There are many other directions this could go, but I think there is a real need here that we can start chipping away at. The vendors are touting FPGAs as a way to accelerate ML workloads, but we don’t have good, standardized open source tooling for putting our models onto such machines. CIRCT is trying to develop such tools, which is why I thought it might make an interesting backend to NPCOMP.
Here’s a more concrete example. Steve sent me this project from Xilinx a while back: https://finn.readthedocs.io/en/latest/end_to_end_flow.html. This end-to-end flow diagram shows an example “starting from a trained PyTorch/Brevitas network and going all the way to a running FPGA accelerator”. This is the kind of flow I’m hoping to enable by integrating CIRCT as a backend for NPCOMP. Looking at the FINN diagram, I imagine the top part of the flow would be NPCOMP, the target interface would sit just before the “Convert to HLS Layers” step, and the bottom half of the flow would be CIRCT.
Again, CIRCT is still really early on, and I don’t want to speak for the other parties involved, but this is what I’d personally like to see.