GCC has flags. A lot of them. I’ve spent a fair amount of time going through the man page trying to figure out the best “general purpose” set of flags for my own personal development. Here’s what I use as the baseline for my home C++ projects (GCC 4.3.0, Linux, old Intel Pentium 4). YMMV, especially with third-party tools, since a lot of these settings are

  • C++-only (-Wnon-virtual-dtor)
  • highly opinionated (-Wold-style-cast)
  • likely to break on code that didn’t use it from the get-go (-ansi -pedantic)

Enjoy.

Language Features

-ansi
-pedantic
In my experience, most GCC-isms that have caused issues in the Microsoft and Intel compilers have been catchable by specifying a more strict interpretation of the language standard. -ansi tends to work with most other code I’ve run into, whereas -pedantic oftentimes breaks old code by rejecting stray semicolons such as namespace foo { };.
-std=c++98 or
-std=c++0x
This may be superfluous given the amount of other settings that I use, but it does clarify to a reader what language features I expect to be using. Plus, it will take a while for GCC to start using C++0x as the default ANSI version, so why wait?
-Wno-long-long
Because of the ANSI specification, I do have to jump through extra hoops to specify “Yes, I want certain features of C99 that are not required to be present in C++98”. The only one that regularly comes up is the usage of long long as a data type for sequence ids.
-D__STDC_FORMAT_MACROS
-D__STDC_CONSTANT_MACROS
-D__STDC_LIMIT_MACROS
If I’m using <stdint.h> or <inttypes.h> in C++ code, these are required to enable all of the C99 features.
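As a rough sketch of what the three macros unlock: the #define lines below stand in for the -D flags, and format_sequence_id, kFirstId and kLastId are made-up names for illustration. It assumes a C library whose headers hide the C99 macros from C++ unless asked, which was the glibc behaviour at the time.

```cpp
// Stand-ins for -D__STDC_FORMAT_MACROS and friends on the command line.
// They have to be defined before the headers are first included.
#ifndef __STDC_FORMAT_MACROS
#define __STDC_FORMAT_MACROS
#endif
#ifndef __STDC_CONSTANT_MACROS
#define __STDC_CONSTANT_MACROS
#endif
#ifndef __STDC_LIMIT_MACROS
#define __STDC_LIMIT_MACROS
#endif
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string>

// Hypothetical helper: print a 64-bit sequence id portably.
std::string format_sequence_id(int64_t id)
{
    char buf[32];
    // PRId64 (a format macro) expands to the right printf length modifier.
    snprintf(buf, sizeof(buf), "%" PRId64, id);
    return std::string(buf);
}

// INT64_C is a constant macro, INT64_MAX is a limit macro.
const int64_t kFirstId = INT64_C(5000000000);
const int64_t kLastId  = INT64_MAX;
```

Putting the definitions on the command line instead of in the source avoids any dependence on include order.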

Warnings

-Wall
-Wextra
Standard supergroups of warnings that turn on a bunch of basic settings.
-Wwrite-strings
Old C code reused literal string constants as storage space for whatever. That’s just plain wrong now.
-Winit-self
Using an undefined variable… to define itself. I only ever run into this when I’m refactoring variable names and discover that there are two levels of loops, each with an i.
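A minimal sketch of the kind of slip this is meant to flag. The function count_pairs is a hypothetical example; the broken loop header only appears in a comment so that the code itself stays well defined.

```cpp
#include <cstddef>

// Hypothetical example: count the pairs (i, j) with i <= j < n.
std::size_t count_pairs(std::size_t n)
{
    std::size_t pairs = 0;
    for (std::size_t i = 0; i < n; ++i) {
        // If the inner index j were renamed to i by a careless
        // search-and-replace, the loop header would become
        //     for (std::size_t i = i; i < n; ++i)
        // where the new i shadows the outer one and is initialized from
        // its own indeterminate value.
        for (std::size_t j = i; j < n; ++j) {
            ++pairs;
        }
    }
    return pairs;
}
```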
-Wcast-align
-Wcast-qual
-Wold-style-cast
C++ casts and unions are much easier to grep for in source code, so I avoid C-style casts entirely. My biggest beef is when user-defined datatypes expect users to call cast operators to perform common routines… imagine if std::string used operator const char* instead of c_str()… The call below is an eyesore, and in the case of obscure types, it’s not obvious to a maintenance programmer if it means static_cast (yes, in this case) or reinterpret_cast (segfault due to garbage data) or dynamic_cast (segfault due to NULL string).


printf("the answer-->[%s]\n", (const char*)answer);
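For contrast, here is a sketch of the same call against a type that exposes a named accessor in the spirit of std::string::c_str(). The Answer class and the describe function are hypothetical, invented only to mirror the answer variable in the line above.

```cpp
#include <stdio.h>
#include <string>

// Hypothetical type standing in for whatever 'answer' was.
class Answer {
public:
    explicit Answer(const std::string& text) : text_(text) {}

    // A named accessor: easy to grep for, and no cast at the call site.
    const char* c_str() const { return text_.c_str(); }

private:
    std::string text_;
};

// Builds the same message as the printf call, without any cast.
std::string describe(const Answer& answer)
{
    char buf[64];
    snprintf(buf, sizeof(buf), "the answer-->[%s]", answer.c_str());
    return std::string(buf);
}
```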
-Wpointer-arith
Any sort of pointer arithmetic is suspect. I manage to do a lot of pointer-based optimizations without triggering this warning, so I honestly can’t say I remember what it does specifically.
-Wstrict-aliasing
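Necessary if you use -fstrict-aliasing (and only active when that optimization is enabled): it flags code that may break the aliasing rules the optimizer relies on.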

-Wformat=2
Functions like printf will core or (worse) print weird data at run-time if the format arguments don’t match the varargs. I’ve never been too fearful of varargs in C++ (unlike most of the rest of the community), mostly because GCC protects me from my own carelessness in this way.
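The same checking can be extended to your own varargs functions with GCC's format attribute. This is a sketch only: format_message is a hypothetical helper, and the attribute is GCC-specific.

```cpp
#include <stdarg.h>
#include <stdio.h>
#include <string>

// Hypothetical printf-style helper. The attribute tells GCC that argument 1
// is a printf format string and that the variadic arguments start at
// argument 2, so -Wformat can check call sites.
std::string format_message(const char* fmt, ...)
    __attribute__((format(printf, 1, 2)));

std::string format_message(const char* fmt, ...)
{
    char buf[256];
    va_list args;
    va_start(args, fmt);
    vsnprintf(buf, sizeof(buf), fmt, args);  // truncates if the text is too long
    va_end(args);
    return std::string(buf);
}

// A mismatched call such as format_message("%s", 42) should now be
// diagnosed at compile time instead of misbehaving at run time.
```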
-Wuninitialized
This requires -O1 or above… it’s a no-brainer. Point out variables that have garbage data.
-Wmissing-declarations
Find free functions defined in implementation files that should probably be either declared in the header or marked as static.

-Woverloaded-virtual
Sometimes method signatures change, and C++ lacks the nice override keyword that C# has to specify “this method only exists to implement the behavior of a virtual method from a parent.” Any way to detect mismatched virtual method overrides is good.
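A small sketch of the mismatch this warning is aimed at. Shape, Circle and scale_through_base are hypothetical names; the parameter type in the derived class has drifted from int to double.

```cpp
#include <string>

struct Shape {
    virtual ~Shape() {}
    virtual std::string scale(int) { return "Shape::scale"; }
};

struct Circle : Shape {
    // Intended as an override of Shape::scale(int), but the parameter type
    // differs, so this declares a new function that hides the base one.
    virtual std::string scale(double) { return "Circle::scale"; }
};

// Calls through the base interface never reach Circle::scale(double).
std::string scale_through_base(Shape& shape)
{
    return shape.scale(2);
}
```

Through a Shape reference the call dispatches to the base implementation, which is exactly the silent behaviour change the warning is meant to surface.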
-Wnon-virtual-dtor
It’s usually (but not always) an error to declare a class with virtual methods but no virtual destructor. Moreover, I find that it’s also usually a design error to want a class with virtual methods but no virtual destructor, because it usually means some form of static polymorphism is more appropriate.
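To illustrate why the destructor matters, here is a sketch with hypothetical types (Resource, Base, Derived). It shows the correct form; deleting a Derived through a Base pointer without the virtual destructor would be undefined behaviour.

```cpp
// Hypothetical member that counts how many instances are alive.
struct Resource {
    static int live;
    Resource()  { ++live; }
    ~Resource() { --live; }
};
int Resource::live = 0;

struct Base {
    // Dropping this line is what -Wnon-virtual-dtor complains about.
    virtual ~Base() {}
    virtual int id() const { return 0; }
};

struct Derived : Base {
    Resource resource;  // must be destroyed when a Derived is deleted
    virtual int id() const { return 1; }
};

// Deletes a Derived through a Base pointer and reports what is left alive.
int live_after_delete()
{
    Base* object = new Derived;  // Resource::live is now 1
    delete object;               // virtual destructor runs ~Derived first
    return Resource::live;
}
```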
-Wctor-dtor-privacy
This one mostly catches stupid cases where I forget to add public to a class declaration. It’s more obvious than weird errors later complaining that the object can’t be instantiated.

Optimization

-O3
Of course, there are certain local cases where -O3 is not optimal, but I find that overall, I’ve never run into a global example of it being measurably worse than -O1 or -Os.
-ftree-vectorize
-ftree-vectorizer-verbose=2
As of GCC 4.3, -ftree-vectorize is built into -O3, so this is not always necessary. For anyone familiar with the happy LOOP VECTORIZED diagnostic from ICC, this gets the same result. From what I’ve seen, ICC is still much better at diagnosing vectorizable loops, and this may not buy you much for non-numerical computing.
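For reference, this is the shape of loop the vectorizer tends to handle: a countable loop over contiguous arrays with no dependence between iterations. The saxpy function is a hypothetical example, and __restrict is a GCC extension in C++ used here to promise that the arrays do not overlap.

```cpp
#include <cstddef>

// Hypothetical kernel: y[i] += a * x[i] for every element.
void saxpy(float a, const float* __restrict x, float* __restrict y,
           std::size_t n)
{
    // Unit stride, simple trip count, and no aliasing between x and y.
    for (std::size_t i = 0; i < n; ++i) {
        y[i] += a * x[i];
    }
}
```

Whether GCC actually vectorizes it depends on the target flags, so the verbose diagnostic is still the thing to check.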
-ffast-math
This flag tends to raise people’s blood pressures, but I guess I haven’t yet encountered a situation where it bit me (mind you, I don’t do any development outside of x86, so take my opinion with a pinch of naivety). I started turning it on religiously when I was optimizing an undergraduate raytracer project, and discovered a 5-10% improvement. All of the corner cases where -ffast-math causes problems ended up resulting in major failures in the raytracer, and ultimately allowed me to catch subtle bugs more easily.
-fstrict-aliasing
IMO, C-style pointer casting is evil. Maybe I’ve had too much GCC Kool-Aid, but I tend to replace all pointer-casting with one of the following:

  • Implicit casting (allowing T* to downgrade to void*). Implicit is generally “bad” because it’s hidden, but this tends to only work in obvious places such as memcpy(&dst, &src, sizeof(src)).
  • reinterpret_cast<T> for something like the C socket API that distinguishes between struct sockaddr and struct sockaddr_in (and the relevant structures are all local stack objects).
  • static_cast<T> for C-style APIs that pass around void*
  • union everywhere else. GCC seems to deal with unions better than arbitrary casting.
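The first and last items in the list can be sketched as follows. The helper names are hypothetical, and the code assumes a 32-bit IEEE 754 float. Reading a union member other than the one last written is something GCC documents as supported, but it is not guaranteed by the C++ standard.

```cpp
#include <string.h>
#include <stdint.h>

// The cast this replaces would be *reinterpret_cast<uint32_t*>(&value),
// which breaks the aliasing rules that -fstrict-aliasing relies on.

// Variant 1: memcpy, relying only on the implicit conversion to void*.
uint32_t float_bits_memcpy(float value)
{
    uint32_t bits;
    memcpy(&bits, &value, sizeof(bits));
    return bits;
}

// Variant 2: a union, which GCC treats as a legitimate type pun.
uint32_t float_bits_union(float value)
{
    union {
        float    as_float;
        uint32_t as_bits;
    } pun;
    pun.as_float = value;
    return pun.as_bits;
}
```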
-freorder-blocks
Judicious use of __builtin_expect along with block-reordering can remove a lot of branch-related stalls from the fast path. For instance, std::vector::push_back() could be written such that the expected case (size() < capacity()) incurs no branch misprediction and exhibits maximal instruction cache locality. In personal experiments, I’ve seen this make a difference of fivefold or more for very lightweight template containers.
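A minimal sketch of the push_back idea, using a hypothetical IntVector rather than anything from the standard library. The UNLIKELY macro and the noinline attribute are GCC-specific, and no performance figure is implied by this example.

```cpp
#include <cstddef>
#include <string.h>

// Hint to GCC that the condition is almost always false.
#define UNLIKELY(x) __builtin_expect(!!(x), 0)

// Hypothetical lightweight container of ints.
class IntVector {
public:
    IntVector() : data_(0), size_(0), capacity_(0) {}
    ~IntVector() { delete[] data_; }

    void push_back(int value)
    {
        // Cold path: reallocation is rare, so keep it out of line.
        if (UNLIKELY(size_ == capacity_)) {
            grow();
        }
        // Hot path: a single store and an increment.
        data_[size_++] = value;
    }

    std::size_t size() const { return size_; }
    int operator[](std::size_t index) const { return data_[index]; }

private:
    IntVector(const IntVector&);             // non-copyable, C++98 style
    IntVector& operator=(const IntVector&);

    void grow() __attribute__((noinline));

    int*        data_;
    std::size_t size_;
    std::size_t capacity_;
};

void IntVector::grow()
{
    std::size_t new_capacity = capacity_ ? capacity_ * 2 : 4;
    int* bigger = new int[new_capacity];
    if (size_ != 0) {
        memcpy(bigger, data_, size_ * sizeof(int));
    }
    delete[] data_;
    data_ = bigger;
    capacity_ = new_capacity;
}
```

Keeping grow() out of line is a design choice in this sketch so that the reallocation code does not dilute the instruction cache footprint of the common case.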

-march=native/pentium4/nocona/core2
-msse2
-msse3 (maybe)
Or -march=whatever for your local platform, since native is a recent addition to the GCC syntax. If you know the target platform (and I always do, since all of the software that I’ve ever written has been for personal or in-house use), there’s no reason not to set these flags for a release build. However, there are a few valid reasons to not use this:

  • If you don’t know the target platform, or if there are a variety of target platforms. But it’s probably still good to provide platform-optimized code, since generic i386 instructions are so… ancient. After all, if you weren’t concerned with performance, you wouldn’t be using C++, would you?
  • If you plan on running your software in an emulator. This includes Valgrind, which gives me nice segmentation faults when I use the core2 instruction set. Maybe I need to upgrade my version of Valgrind.
-mfpmath=sse
Another machine-specific performance tweak, this actually gives another significant benefit. On most x86 hardware, floating-point computations get done in 80-bit registers, and only truncated to 64 bits (for double) when they round-trip to memory. The net effect is that in certain edge cases, double x = 0.1; double y = x; assert(x == y) can result in an assertion failure due to lost significant figures. You can force all floating-point calculations to round-trip through memory with -ffloat-store, but that incurs a significant performance penalty (and if you weren’t concerned with performance you wouldn’t be using C++, would you?). However, from what I have read, using SSE instructions mitigates this issue entirely.
-minline-all-stringops
This replaces all str{cpy,len,...} and mem{cpy,move,set} library calls with GCC builtins, which generally turn into multibyte assignments or machine-specific string instructions. I’ve seen it turn a strcpy into several movl instructions, with the string data interpreted as an array of unsigned integers. Neat. Usually this is faster (due to the removal of a function call), but it doesn’t always speed up code: the extra instructions may increase instruction cache misses, which definitely affects aggressively inlined blocks.

Makefile Integration

-MMD
Tells the compiler to generate Makefile dependency information as a side effect of compilation. This is a requirement for iterative development; otherwise the only way to get a correct build is to make clean every iteration.
-MF [filename]
Usually my Makefile rule for compilation looks like this:


%.o : %.cxx
    $(CXX) -c -o $@ $< -MMD -MF $(basename $@).dep $(CXXFLAGS)
include $(wildcard *.dep)

With that, foo.cxx produces object file foo.o and Makefile dependency rule file foo.dep. I always find it best to use GCC for the dependency generation rather than a separate step (such as the makedepend program or some batshit insane sed scripts that I’ve seen littering some Makefiles, probably a relic from before the compiler generated this information). GCC itself produces a 100% accurate result and the generated rule has all pathing information set correctly as well, which other tools may not set up correctly. Add to that the fact that tools like makedepend modify the Makefile itself by default, which adds a lot of unnecessary churn in the revision control software.

Miscellaneous

-pipe
Supposedly this makes compiling and linking faster by staying in memory instead of storing all intermediate representations in temporary files… I haven’t ever timed it, but I type it out of habit.
-save-temps
If I want to examine a particular piece of code, I’ll typically add this as a temporary compilation flag so that the compiler saves all preprocessed output (foo.ii) and generated assembler (foo.s). Note that this nullifies the -pipe setting.
-Wa,-a
If I want to examine assembler but see code generated inline with it, I will specify this flag and modify my Makefile rule to redirect output to $(basename $@).s. This gives more readable results than -save-temps and doesn’t affect -pipe.

Summary

Here’s a snippet out of one of my Makefiles that includes most or all of these settings:

CXX = g++ -Wa,-a -pipe
CC = gcc -Wa,-a -pipe
LD = g++ -pipe

WARN = error all extra write-strings init-self cast-align cast-qual \
       pointer-arith strict-aliasing format=2 uninitialized \
       missing-declarations no-long-long no-unused-parameter
CXXWARN = overloaded-virtual non-virtual-dtor ctor-dtor-privacy
OPTIM = \
    $(addprefix -f,strict-aliasing reorder-blocks) \
    $(addprefix -m,arch=native sse2 fpmath=sse inline-all-stringops) \
    -O3

CXXFLAGS = -ansi -pedantic -std=c++0x -ggdb \
           $(addprefix -W,$(WARN) $(CXXWARN)) $(OPTIM)
CFLAGS = -ansi -pedantic -std=c99 -ggdb \
           $(addprefix -W,$(WARN)) $(OPTIM)
LDFLAGS = -lrt

define DO_LINK
$(LD) -o $@ $^ $(LDFLAGS)
endef

define DO_COMPILE_CXX
$(CXX) $(CXXFLAGS) -c -o $@ $< > $(basename $@).s
endef

define DO_COMPILE_C
$(CC) $(CFLAGS) -c -o $@ $< > $(basename $@).s
endef