I would call it more of a cleanup, or a reorganization of the binary, to make it easier for the CPU to digest.
A quote from the article on sdtimes.com [1]:
"Highly complex services, such as those here at Facebook, have large source code bases in order to deliver a wide range of features and functionality. Even after the machine code for one of these services is compiled, it can range from 10s to 100s of megabytes in size, which is often too large to fit in any modern CPU instruction cache. As a result, the hardware spends a considerable amount of processing time — nearly 30 percent, in many cases — getting an instruction stream from memory to the CPU,...
BOLT rearranges code inside functions based on their execution profile, the company explained. The body of the function is split based on how frequently the code is executed, and then it performs an optimal layout of hot chunks of code depending on the call graph profile."
But if anything "knows" what will be digestible for the CPU, it is precisely the compiler, because it, and only it, sees what the programmer intended with the code; anything that runs afterwards has an information deficit created by the very decisions the compiler made.
And because even the compiler can happily reorganize some pieces of code in a way that breaks them, there are usually ways to suppress its optimizations. Which is, again, information in the code addressed to the compiler, information that a binary optimizer ... does not have.
And optimize for which processor?
AMD vs. Intel
The Compiler as Referee
In test 444.namd, code optimized for the Intel P7 ran on an Opteron at 89% of the performance of Opteron-optimized code,

and

in test 454.calculix, code optimized for the Opteron ran on a P4-based Xeon at 50% of the performance of code optimized for the P4.
https://www.pgroup.com/lit/presentations/pgisc06.pdf