[Idea] Deduplicating sequences of instructions only reachable via jump

Disclaimer: I’ve never used (directly) or worked on LLVM, so I’m pretty clueless here, and am just proposing an idea for an optimization with very little plan to implement it myself.

Problem

After monomorphisation, inlining, and optimization passes, I have a theory that binaries often have a ton of duplicated sequences of instructions that are taking up binary size and causing instruction cache to overflow.

Solution

Near the end of compilation, a new inter-procedural optimization pass is run which collects sequences of instructions which cannot be “fallen through” into (can only be entered via a jump). This optimization collects these instruction sequences and deduplicates them, redirecting all jumps to a common implementation (a form of outlining).

This would not introduce more jumps, so at run time the exact same instructions are executed, but it will improve binary sizes and therefore improve instruction cache hits and other Fun Magic that I’m not familiar with.

Complications

This would most likely need to run during LTO, if possible, to make sure the most sequences are available to be deduplicated, although theoretically it would be possible within a compilation unit (although reduced effectiveness).

When I have brought up this idea previously in other spaces, people have pointed out that this would probably conflict with CFI as it would look pretty similar to ROP (which is where I got this idea from, actually), however a simple solution is to disable this optimization when CFI is enabled.

This sounds very similar to IBBF (Inter-procedural Basic Block Folding) which I recently wrote here: IBBF

Indeed, it looks like your proposal is a much more fleshed out and knowledgable writeup of the same idea. Good to know that this is being thought about!