Machine block placement time and complexity

Hi, I am having some interesting issues when running llc on a piece of auto-generated code, namely that the machine block placement pass is taking an excessive amount of time.

For one file with a rather large function (a threaded interpreter core with indirect br instructions) and a lot of small helper functions (say 3-4k of them) I get the following timings:

llc -O1

  1616.9128 ( 42.6%)   0.4521 ( 22.5%)  1617.3649 ( 42.6%)  1617.7932 ( 42.6%)  Branch Probability Basic Block Placement

llc -O2

  1626.9825 ( 42.1%)   0.3805 ( 13.4%)  1627.3631 ( 42.0%)  1627.5726 ( 42.0%)  Branch Probability Basic Block Placement

llc -O3

  3322.8112 ( 60.2%)   0.8212 ( 35.7%)  3323.6324 ( 60.2%)  3324.1152 ( 60.2%)  Branch Probability Basic Block Placement

That is almost an hour for compiling with -O3, but I have a larger interpreter with more helper functions, and that was clocking in at 10h in compile time, but I am not going to rerun that test right now :grimacing:.

So, what I am wondering if there is a description somewhere about how the pass works or if someone can explain it quickly, and maybe what the approaches to controlling the time it takes would be? This would help me fix our code generator.