Here’s a few tidbits that I’ve run into when building latency-sensitive applications in Linux…

gcc -c -fno-guess-branch-probabilities

If GCC is not provided with branch probability data (either by using -fprofile-generate or __builtin_expect), it will run its own estimate. Unfortunately, very minor changes in source code can have very major changes in these estimates, which can foil someone attempting to measure the effect of source-level optimizations. Personally, my opinion is to use -fprofile-generate and -fprofile-use for maximum performance, but the next best is to have consistent results while developing.

Also, see this brief gcc mailing list discussion about the topic.

ld -z now
Equivalent to gcc -Wl,-z,now (which is probably preferrable, since I write C++ and always use the g++ frontend), this tells the dynamic linker to resolve symbols on program start instead of on first use. I’ve noticed that sometimes, the first run through a particular code path results in a latency of 800 milliseconds. That’s huge; that’s big enough for a human to notice with the naked eye. For server processes, that generally also means that the first request serviced on startup blows chunks, latency-wise. While it may not make a huge difference in the grand scheme of things, it’s nice to be able to pay the cost of symbol lookup up front instead of at first use.