Defending Macros
by Steve Donovan, the Author of C++ by Example: "UnderC" Learning Edition
MAR 15, 2002
C macros in C++ code have been considered bad manners for years now. In this article, Steve Donovan explains how using a few carefully chosen macros can result in clearer, more readable code (provided they are used in a disciplined way).
This article is provided courtesy of Que.
The C++ preprocessor is part of the C legacy that many supporters of the language would like to see go away. The preprocessing step does what its name suggests: it works on the source text before compilation proper, compensating for some of the deficiencies of C by replacing symbolic constants with numbers, and so on. Because these deficiencies have mostly been addressed, the separate preprocessing step seems very old-fashioned and inelegant. But the inclusion of files, conditional compilation, and so on depend on the preprocessor so strongly that no proposal to retire it has been successful.
I think most people agree that inline functions and constants are much better than macros. The basic problem is that the preprocessor performs text substitution on what the compiler is going to see. So macro-defined constants appear to the compiler as plain numbers. If I define PI in the old-fashioned way, as follows, then the debugger will have no knowledge of this symbolic constant, because the compiler literally never saw it:
#define PI 3.1416
Simple-minded text substitution can easily cause havoc and is not saved by putting in parentheses:
#define SQR1(x) x*x
#define SQR2(x) (x)*(x)
...
SQR1(1+y) => 1+y*1+y bad!
SQR2(1+y) => (1+y)*(1+y) ok
SQR2(sin(x)) => (sin(x))*(sin(x)) ok - eval. twice
SQR2(i++) => (i++)*(i++) bad - eval. twice!
This vulnerability to side effects is the most serious problem affecting all macros. (Yes, I know the traditional hack in this case, but it has a serious weakness.) Inline functions do a much better job, and are just as fast. Moreover, once macros get any larger than SQR, you are lost if you try to browse for a macro symbol or trace through a macro call. Debuggability and browsability are often unappreciated qualities when evaluating coding idioms; in this case, both say that macros stink.
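To make the double-evaluation problem concrete, here is a minimal sketch that counts how often an argument expression runs under the macro and under an inline function template. The names sqr, next_value, g_calls and the two calls_via_... helpers are my own hypothetical illustrations, not part of any library.

```cpp
// Macro version from the text: the argument text is pasted in twice.
#define SQR2(x) (x)*(x)

// Inline function alternative: the argument is evaluated exactly once,
// and the result is passed by value.
template <class T>
inline T sqr(T x) { return x * x; }

// Hypothetical helper with a visible side effect: it counts its own calls.
static int g_calls = 0;
inline int next_value() { ++g_calls; return 3; }

// Number of times next_value() runs when passed to sqr().
inline int calls_via_function() {
    g_calls = 0;
    int r = sqr(next_value());
    (void)r;                      // result unused; only the call count matters
    return g_calls;
}

// Number of times next_value() runs when passed to SQR2().
inline int calls_via_macro() {
    g_calls = 0;
    int r = SQR2(next_value());   // expands to (next_value())*(next_value())
    (void)r;
    return g_calls;
}
```

Under these assumptions calls_via_function() reports one call and calls_via_macro() reports two, which is exactly the difference between passing a value and pasting text.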
Another serious problem with macros is related to their lack of browsability: they are completely outside the C++ scoping system. You never know when a macro is going to clobber your program text in some strange way. (Potential namespace pollution problems are nothing compared to this one!) So, a naming convention is essential; any macros must be in uppercase and have at least one argument, so they don't conflict with any symbolic constants. The only checking you get with macros is on the number of arguments, so it is wise to take advantage of it. Of course, macros also have an important role to play in conditional compilation, so such symbols must also be distinctly named (initial underscores are useful).
I have rehashed all these old issues so you can appreciate the strict limits we must place on any useful macro. The first one I'll introduce is a modified version of the FOR macro:
#define FOR(i,n) for(int i = 0, _ct = (n); i < _ct; i++)
This is free of the side-effect problem because a temporary variable holds the loop count, so n is evaluated only once. (I am assuming that the compiler scopes variables declared like this properly! Both i and _ct must be private to the loop. The Microsoft compiler will finally be compliant on this irritating item this year.) If n is a constant, then a good compiler will eliminate the local variable, so correctness carries no performance penalty here.
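As a quick check of that claim, the following sketch passes a function call as the loop bound and counts how often it runs. The limit() function, the g_limit_calls counter and sum_with_FOR() are hypothetical names I have introduced for the illustration.

```cpp
// The FOR macro from the text: the bound is captured once in _ct.
#define FOR(i,n) for(int i = 0, _ct = (n); i < _ct; i++)

// Hypothetical loop bound with a side effect: it counts its own calls.
static int g_limit_calls = 0;
inline int limit() { ++g_limit_calls; return 4; }

// Sums 0+1+2+3 using FOR with a function call as the bound.
inline int sum_with_FOR() {
    g_limit_calls = 0;
    int sum = 0;
    FOR(k, limit()) sum += k;   // limit() runs only in the loop initializer
    return sum;
}
```

After sum_with_FOR() returns, g_limit_calls is 1: the bound expression ran once, however many times the loop body executed. A hand-written for(int k = 0; k < limit(); k++) would have called it five times.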
My argument is that using FOR consistently leads to fewer errors and improved code readability. One problem with typing the for-statement is that the loop variable is repeated three times, so mistakes happen. My favorite is typing an i instead of a j in a nested loop:
for(int i = 0; i < n; i++)
for(int j = 0; i < m; j++)
...
Note here that slight differences are invisible in the lexical noise. If I see FOR(k,m), then I know this is a normal k = 0..m-1 loop, whereas if I see a for-statement I know it deviates from this pattern (like for(k = 0; k <= m; k++)). So, exceptional cases are made more visible. I find it entertaining that these statement macros are actually safer in standard C++ because you can define local loop variables.
In my article called "Overdoing Templates," I point out that C++ is not good at internal iterators. Here is a typical situation:
bool has_expired(Shape *ps)
{ return ps->modified_time() < expiry_time; }
...
int cnt =
std::count_if(lsl.begin(),lsl.end(),has_expired);
Compare this with
list<Shape *>::iterator sli;
int cnt = 0;
for(sli = lsl.begin(); sli != lsl.end(); ++sli)
if ((*sli)->modified_time() < expiry_time) ++cnt;
This is much better behaved; the condition within the loop is explicit, and the scope of expiry_time (which is effectively global in the first version) can be local. The explicit declaration of an iterator does make this more verbose, however, and it's good practice to use typedefs (such as 'ShapeList') here.
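For readers who want to try the two forms side by side, here is a self-contained sketch. The Shape class is a hypothetical stand-in with just enough in it to compile, and the ShapeList typedef follows the suggestion above; the two count_expired_... functions are my own names.

```cpp
#include <algorithm>
#include <list>

// Hypothetical stand-in for the Shape class used in the text.
struct Shape {
    long m_time;
    explicit Shape(long t) : m_time(t) {}
    long modified_time() const { return m_time; }
};

typedef std::list<Shape*> ShapeList;

// Predicate form: expiry_time has to live at namespace scope so that
// has_expired() can see it.
static long expiry_time = 0;
inline bool has_expired(Shape* ps) { return ps->modified_time() < expiry_time; }

inline int count_expired_algorithm(ShapeList& lsl, long expiry) {
    expiry_time = expiry;   // communicate through the shared variable
    return static_cast<int>(std::count_if(lsl.begin(), lsl.end(), has_expired));
}

// Explicit loop form: the expiry value stays local to the function.
inline int count_expired_loop(ShapeList& lsl, long expiry) {
    int cnt = 0;
    for (ShapeList::iterator sli = lsl.begin(); sli != lsl.end(); ++sli)
        if ((*sli)->modified_time() < expiry) ++cnt;
    return cnt;
}
```

Both functions return the same count for the same list; the difference is only in where the expiry value has to live and how much of the condition is visible at the point of use.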
I want to introduce a few candidate statement macros to make this common pattern even easier on the eye. First, let me introduce GCC's typeof operator, which makes a lot of template trickery quite straightforward. It takes an expression (which, as with sizeof, is not evaluated), and deduces its type, and can be used in declarations wherever a type is required:
double x1 = 2.3;
typeof(x1) x2 = x1;
typeof(&x1) px = &x1;
I'm presenting typeof as a part of C++ because Bjarne Stroustrup would like to see it included in the next revision of the standard, and it's a cool feature that needs every vote it can get. With it, I can write the FORALL statement macro, and express our example more simply:
#define FORALL(it,c) \
    for(typeof((c).begin()) it = (c).begin(); \
        it != (c).end(); ++it)
...
int cnt = 0;
FORALL(sli,lsl)
if ((*sli)->modified_time() < expiry_time) ++cnt;
This statement macro works with any container-like object; that is, any type that defines an iterator and begin()/end(). But it suffers from the side-effect problem. It should not be called when the container argument is some non-trivial expression, such as a function call, because that expression is re-evaluated on every iteration of the loop. And there is no way to enforce that restriction, because macros are too dumb. So FORALL does not meet our criterion for a "safe" statement macro. Besides, it simply cannot be expressed in the standard language.
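The repeated evaluation is easy to observe. In this sketch the container argument is a function call that counts its own invocations. I have spelled the operator __typeof__, which is the form GCC and Clang accept regardless of language mode; fetch_items(), g_items, g_fetches and fetches_for_three_items() are hypothetical names for the demonstration.

```cpp
#include <list>

// FORALL as in the text, using the __typeof__ spelling (a GCC/Clang
// extension, not standard C++).
#define FORALL(it,c) \
    for(__typeof__((c).begin()) it = (c).begin(); \
        it != (c).end(); ++it)

// Hypothetical container source that counts how often it is asked.
static std::list<int> g_items;
static int g_fetches = 0;
inline std::list<int>& fetch_items() { ++g_fetches; return g_items; }

// Iterates over three items and reports how many times fetch_items() ran.
inline int fetches_for_three_items() {
    g_items.clear();
    g_items.push_back(10);
    g_items.push_back(20);
    g_items.push_back(30);
    g_fetches = 0;
    int sum = 0;
    FORALL(it, fetch_items()) sum += *it;
    (void)sum;              // only the fetch count is of interest here
    return g_fetches;
}
```

For a three-element list the function is called five times: once to initialize the iterator and once for each of the four loop tests. The operand of __typeof__ is not evaluated, so it does not add to the count. Here the function at least returns a reference to the same list each time; if it returned a fresh copy, begin() and end() would refer to different objects and the loop would be simply wrong.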
A better candidate is FOR_EACH, which is a construct that is found in many languages. For instance, AWK has 'for(i in array)' for iterating over all keys in an associative array, and Visual Basic (and now C#) has FOR EACH. I will show that FOR_EACH is a much better-behaved statement macro, and it can in fact be implemented using the standard language, although not so efficiently. This is what I want to be able to say:
int cnt = 0;
Shape *ps;
FOR_EACH(ps,lsl)
if (ps->modified_time() < expiry_time) ++cnt;
Note that this form makes it hard to accidentally modify the list, and as a bonus, is rather more debuggable. I bring this up because debugging code using the standard containers can be frustrating. If sli is an iterator, then *sli is the value, but the built-in expression evaluators in gdb and Visual Studio can't understand these smart pointers.
The FOR_EACH construct needs a special kind of iterator that binds a variable reference to each object in turn. When we ask this iterator for the next element, it assigns the next value to the variable reference. Eventually, it signals to the caller that there are no more elements in the collection, and the loop can terminate. The implementation using typeof follows:
// foreach.h
template <class C, class T>
struct _ForEach {
    typename C::iterator m_it,m_end;
    T& m_var;
    _ForEach(C& c, T& t) : m_var(t)
    { m_it = c.begin(); m_end = c.end(); }
    bool get() {
        bool res = m_it != m_end;
        if (res) m_var = *m_it;
        return res;
    }
    void next() { ++m_it; }
};
#define FOR_EACH(v,c) \
    for(_ForEach<typeof(c),typeof(v)> _fe(c,v); \
        _fe.get(); _fe.next())
The _ForEach constructor requires two things: a reference to a variable and a container-like object. These are evaluated only once, as constructor arguments, so there are no side effects. So FOR_EACH is valid in a number of contexts. Please note that it is best suited to containers of pointers or small objects, because each element is copied into the variable in turn.
The typeof operator is essential here because template classes will not deduce their types from their constructor arguments. But you can actually implement FOR_EACH without typeof, using the fact that function templates can deduce their argument types. However, the macro then cannot name the concrete type in a declaration, so that type must be derived from an abstract base class and created dynamically. The listing can be found here.
int i;
string s = "hello";
// gives 104 101 108 108 111
FOR_EACH(i,s) cout << i << ' ';
list<string> lss;
...
FOR_EACH(s,lss) ... // may involve excessive copying!
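The typeof-free approach can be sketched roughly as follows. This is my own reconstruction from the description above, not the original listing, and every name in it (ForEachBase, ForEachImpl, make_for_each, ForEachHolder, FOR_EACH_STD, sum_all) is hypothetical. A function template deduces the container and variable types, the macro sees only the abstract base, and a small holder object deletes the dynamically created iterator when the loop ends.

```cpp
#include <list>
#include <string>

// Abstract base: the macro can name this type without knowing C or T.
struct ForEachBase {
    virtual ~ForEachBase() {}
    virtual bool get() = 0;    // copy current element into the variable
    virtual void next() = 0;   // advance to the next element
};

// Concrete reference iterator, same logic as the typeof version.
template <class C, class T>
struct ForEachImpl : ForEachBase {
    typename C::iterator m_it, m_end;
    T& m_var;
    ForEachImpl(C& c, T& t) : m_it(c.begin()), m_end(c.end()), m_var(t) {}
    bool get() {
        bool res = m_it != m_end;
        if (res) m_var = *m_it;
        return res;
    }
    void next() { ++m_it; }
};

// Function templates deduce C and T from their arguments.
template <class C, class T>
ForEachBase* make_for_each(C& c, T& t) { return new ForEachImpl<C, T>(c, t); }

// Owns the heap object for the lifetime of the loop; not copyable.
struct ForEachHolder {
    ForEachBase* p;
    explicit ForEachHolder(ForEachBase* q) : p(q) {}
    ~ForEachHolder() { delete p; }
private:
    ForEachHolder(const ForEachHolder&);
    ForEachHolder& operator=(const ForEachHolder&);
};

#define FOR_EACH_STD(v,c) \
    for(ForEachHolder fe_(make_for_each(c, v)); \
        fe_.p->get(); fe_.p->next())

// Small usage example: sums the elements of a list of ints.
inline int sum_all(std::list<int>& li) {
    int v = 0, sum = 0;
    FOR_EACH_STD(v, li) sum += v;
    return sum;
}
```

The container and variable expressions each appear once in the expansion, as arguments to make_for_each(), so they are evaluated once. The price is one heap allocation per loop and two virtual calls per element.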
How efficient is FOR_EACH? Tests that iterate through a list show that the typeof version is only about 20% slower than the explicit loop because the reference iterator code can be easily inlined. The standard version is nearly three times slower because of the virtual method calls. Even so, in a real application, its use would most likely have no discernible effect on the total run time.
A serious criticism of statement macros is that they allow people to invent their own private language that ends up being less readable and maintainable. However, in a large project, programmers will fashion an appropriate idiom for the job in hand; hopefully, they leave documentation about their choices. You certainly don't need the preprocessor to generate a private language. One or two new control constructs can be introduced on a per-case basis without affecting readability adversely. I'm not suggesting that programmers should be given carte blanche to make their C++ look like Basic or Algol 68, but a case can be made for using statement macros to improve code readability. This is particularly true for the more informal code that gets generated in interactive exploration and test frameworking.
Macros still remain outside of the language, and for this reason, I don't expect much support for this modest position. It is interesting to speculate about what extra features C++ would need to support these custom control structures. Here is what a statement template might look like:
template <class T, class S>
__statement FOR(T t, S e) for(int t = 0; t < e; t++)
It would still probably involve a lexical substitution, but one done by the compiler, not the preprocessor. FOR is now a proper C++ symbol and can be properly scoped. Most importantly, potential side effects would automatically be eliminated because e would be evaluated only once. FORALL could now be defined safely as well. Here is an example that cannot be done reliably using the preprocessor: an alternative implementation of Bjarne Stroustrup's idea of input sequences.
template <class C>
__statement iseq(C c) c.begin(), c.end()
...
copy(iseq(ls),array);
If lexical substitution were more closely integrated into the language, then the preprocessor could finally be retired after a long and curious career.