The clang front end does not seem to allow fine-grained control of optimization (i.e. beyond the
-Ox options). However, for some benchmarking I'd like to be able to do this.
I have tried decomposing the compilation process. Instead of the monolithic command:
clang -O1 try.c -o try.elf1
I am executing:
clang -emit-llvm -S try.c -o try.ll
opt -O1 try.ll -o try.bc
llc try.bc -o try.s
clang try.s -o try.elf2
But the result is not the same: for some vector code I wrote (Intel/AVX2), the binary produced by the monolithic command (try.elf1) is 10x faster than the output of the decomposed compilation (try.elf2). The only way to recover the performance of the monolithic case is to add -O1 to the clang call that produces the try.ll file (which kind of defeats my purpose). I have also tried listing the optimizations applied by the monolithic command (using option -fsave-optimization-record) and adding all these options to the call to opt, to no avail.
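For anyone who wants to reproduce the comparison, the two build recipes above can be wrapped in a small script. This is only a sketch: the helper names (run, build_monolithic, build_decomposed) and the DRY_RUN switch are made up for illustration and are not part of clang or LLVM. It assumes try.c is in the current directory and that clang, opt and llc come from the same LLVM installation.

```shell
#!/bin/sh
# Sketch: the monolithic and decomposed builds from the question, side by side.
# DRY_RUN, run, build_monolithic and build_decomposed are illustrative names only.
set -eu

DRY_RUN="${DRY_RUN:-1}"   # 1 (default): only print the commands; 0: execute them

run() {
    # Echo the command, then execute it unless we are in dry-run mode.
    echo "+ $*"
    if [ "$DRY_RUN" != "1" ]; then
        "$@"
    fi
}

build_monolithic() {      # $1 = optimization level without the dash, e.g. O1
    run clang "-$1" try.c -o try.elf1
}

build_decomposed() {      # $1 = optimization level passed to opt only
    run clang -emit-llvm -S try.c -o try.ll   # front end, no -Ox here
    run opt "-$1" try.ll -o try.bc            # middle-end optimizations
    run llc try.bc -o try.s                   # code generation
    run clang try.s -o try.elf2               # assemble and link
}

build_monolithic O1
build_decomposed O1
```

Running it with DRY_RUN=0 performs the actual builds, after which try.elf1 and try.elf2 can be timed against each other.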
So my question is: how to expose the optimization pipeline in a way that allows reproducing what the monolithic command does, allowing enabling and disabling of individual passes such as