In my ongoing quest to make MLIR faster at threaded compilation :-), I ran into a new problem. Earlier I was profiling on an Intel Mac, but I have since switched to an Apple M1 Max laptop, and it shows a very different profile.
In a release build of CIRCT on top of a release build of MLIR, I now see `OperationName::OperationName` at the very top of the profile:
Of course, this is completely dominated by the mutex operations in
A couple of questions:
- does anyone know why mutex ops are so much slower on an Apple M1 MBP laptop than they are on an Intel X86 MBP?
- has anyone thought about improving this, e.g. by having the `OperationName` lookups happen during dialect registration (which is typically single threaded) and cached in a read-only map attached to the dialect? If we did that, then `OpBuilder::create` could check that before going to the big map in the
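
To make the second question concrete, here is a minimal sketch of the two-level lookup I have in mind. It is not MLIR code: `OpInfo`, `GlobalOpTable`, and `DialectOpCache` are hypothetical stand-ins, and the op name strings in the usage note are only example keys. It assumes registration is single threaded and that the per-dialect map is never mutated once it is frozen, so concurrent readers need no lock.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the data an operation name resolves to.
struct OpInfo {
  std::string name;
};

// Hypothetical context-wide table: every lookup takes the mutex.
class GlobalOpTable {
public:
  const OpInfo *getOrInsert(const std::string &name) {
    std::lock_guard<std::mutex> lock(mutex);
    ++lockedLookups;
    // unordered_map keeps element addresses stable across rehashing,
    // so handing out a pointer to the mapped value is safe.
    auto it = table.try_emplace(name, OpInfo{name}).first;
    return &it->second;
  }

  // Counts trips through the locked path; only here for illustration.
  unsigned lockedLookups = 0;

private:
  std::mutex mutex;
  std::unordered_map<std::string, OpInfo> table;
};

// Hypothetical per-dialect cache, filled during registration.
class DialectOpCache {
public:
  explicit DialectOpCache(GlobalOpTable &global) : global(global) {}

  // Assumed to run single threaded, before any compilation threads start.
  void registerOp(const std::string &name) {
    assert(!frozen && "cannot register ops after freeze()");
    cache.emplace(name, global.getOrInsert(name));
  }

  // After this point the cache is treated as read-only.
  void freeze() { frozen = true; }

  // Fast path: lock-free read of the frozen map.
  // Slow path: fall back to the mutex-protected global table.
  const OpInfo *lookup(const std::string &name) const {
    auto it = cache.find(name);
    if (it != cache.end())
      return it->second;
    return global.getOrInsert(name);
  }

private:
  GlobalOpTable &global;
  std::unordered_map<std::string, const OpInfo *> cache;
  bool frozen = false;
};
```

With this shape, a lookup of a name registered up front (say `"firrtl.add"`) never touches the mutex after `freeze()`, while an unregistered name still works by paying for the locked path. Whether this fits the real registration flow is exactly the open question.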
This is easy to reproduce, FWIW. I’m using the public CIRCT build with the chipyard…hi.fir test input and this command:
`firtool chipyard.hi.fir -o chipyard.hi.fir.v -verilog -mlir-timing`
It is a 94M input file and takes about 21s of wall time, which isn’t a huge input but it is enough to measure.