Register allocation and performance fluctuation across different micro-architectures


I found that the problem of bad eviction chains and local spills caused by local interference had been solved by Yatsina(

I tried to reproduce the result with the LLVM test suite. The geomean (overall runtime) was about the same, as expected (less than 0.1% speedup), but the individual benchmarks diverged: some improved while others actually regressed.
This fluctuation seems to be micro-architecture dependent, since I get a different list of improved and regressed benchmarks on an AMD CPU and on different Intel CPU generations.
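
To illustrate the kind of comparison described above, here is a minimal Python sketch. It is not the actual measurement setup: the benchmark names, machine labels, runtimes, and the 1% threshold are all hypothetical and invented purely for illustration. It shows how a geomean speedup can stay near 1.0 while the per-benchmark improved/regressed lists differ between two machines.

```python
import math

def speedups(baseline, patched):
    """Per-benchmark speedup = baseline time / patched time (>1 means faster)."""
    return {name: baseline[name] / patched[name] for name in baseline}

def geomean(values):
    """Geometric mean of an iterable of positive numbers."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))

def classify(sp, threshold=0.01):
    """Split benchmarks into (improved, regressed) beyond a noise threshold.

    The 1% threshold is an arbitrary assumption, not a recommended value.
    """
    improved = sorted(n for n, s in sp.items() if s > 1 + threshold)
    regressed = sorted(n for n, s in sp.items() if s < 1 - threshold)
    return improved, regressed

# Hypothetical runtimes in seconds: (baseline, patched) per machine.
runs = {
    "machine_x": (
        {"bench_a": 10.0, "bench_b": 20.0, "bench_c": 5.0, "bench_d": 8.0},
        {"bench_a": 9.5, "bench_b": 21.0, "bench_c": 5.0, "bench_d": 8.0},
    ),
    "machine_y": (
        {"bench_a": 10.0, "bench_b": 20.0, "bench_c": 5.0, "bench_d": 8.0},
        {"bench_a": 10.5, "bench_b": 19.0, "bench_c": 5.0, "bench_d": 8.0},
    ),
}

for machine, (baseline, patched) in runs.items():
    sp = speedups(baseline, patched)
    improved, regressed = classify(sp)
    print(f"{machine}: geomean speedup = {geomean(sp.values()):.4f}, "
          f"improved = {improved}, regressed = {regressed}")
```

With these made-up numbers, both machines end up with a geomean very close to 1.0, yet the benchmark that improves on one machine is the one that regresses on the other, which is the pattern I am asking about.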

I think this is not a trivial problem, since front-end effects of the micro-architecture (e.g. I-cache misses, code alignment) often dominate the performance improvement gained from register allocation.

But as far as I know, register allocation developers don't mention this problem. Is it because such "noise" is generally ignored? Or have I missed something?