Conversational C++
by Steve Donovan, the Author of C++ by Example: "UnderC" Learning Edition
MAR 15, 2002
Interpreted languages usually have an interactive mode in which expressions and statements can be evaluated in a conversational way, and there is no technical reason why C++ cannot be used like this. According to Steve Donovan, C++ turns out to be a very interesting conversational language.
This article is provided courtesy of Que.
The secret weapon of most popular scripting languages is that they support an interactive mode. LISP and BASIC systems became popular because expressions could be typed in and immediately evaluated. (Obviously, they have other virtues, but these would probably have remained unappreciated if the systems were not so interactive.) This, of course, makes most sense for languages such as LISP, in which everything has a value (even if it is nil), but nearly any programming language can be used interactively. Such an implementation supports a "conversational" style of programming, in which the feedback from the computer is practically instantaneous. Programmers were very productive while working with interactive systems such as LISP, APL, or FORTH, despite the limitations of these languages.
The only requirement is that the language be function-based. So, even Pascal can be interpreted interactively [1], but Eiffel or Java would present problems because they are class-based: there are no free-standing functions. An interactive implementation of Java would necessarily do violence to the language.
The big difference between a batch and an interactive environment is that batch environments have to keep compilation going, whereas interactive environments have to keep evaluation going. Typically, interactive systems stop attempting to compile at the first error. A C++ compiler that presses on past an error gets confused easily, and this in turn confuses the programmer, who really needs to sort out the first reported problem before anything else. This is particularly true for the novice programmer. I first started working on conversational C++ systems because I felt they would be very useful tools for the beginner learning the language. Freedom to explore seems an essential part of the learning process, and human language acquisition is bound up initially with talking, not writing and reading.
C++ is, curiously enough, better suited to interactive work than BASIC because expressions can be valid statements. For example, you may type an assignment and you will be shown the value. Similarly, a bare variable name is a valid (if silly) C++ statement. What I hope to show you in this article is the practicality and the power of using C++ in this interactive mode, and how it can greatly improve programmer productivity.
Interactive C++ needs a slightly relaxed syntax. I'll use UnderC [2] for these examples because CINT does not allow function or macro definitions in interactive mode. In standard C++, declarations can be mixed with other statements, but only inside function bodies. In interactive C++, function definitions can be mixed with statements at global scope:
;> double sqr(double x) { return x*x; }
;> sqr(2);
(double) 4.0
;> double x = 2;
;> x;
(double) x=2.0
These are all valid C++ statements, except there's no scope restriction on their use. Our old and disreputable friend—the preprocessor—becomes very useful, indeed. For example, I commonly define a macro to do the usual for-statement:
;> #define FOR(i,n) for(int i=0; i < (n); i++)
;> FOR(i,5) cout << i << endl;
0
1
2
3
4
Here is another reason why C++ is a better interactive language than Pascal (or even BASIC); it is a terse language ({ for "begin," etc.) that can be made even more compact with macros. Conversational C++ is often throwaway code, and you don't want to waste keystrokes.
Now, making up your own control statements is generally not a good idea in formal C++ (can we say written C++?), but in a conversational context, you can afford to use a more informal style. In a formal legal document in English, you would not use contractions (such as "don't"). I'm going to make a case for (limited) control statement macros in another article, but during an interactive session you may pretty much do anything syntactically legal in the pursuit of faster interaction:
;> #define ALL(ls) (ls).begin(),(ls).end()
;> copy(ALL(ls),arr);
I would certainly hesitate to do this in production code!
C++ was regarded as being too complex to use in an interpretive fashion, but this is a historical problem that has to do with the way the language was initially implemented: preprocessor, translator (to C), compiler (to assembly), assembler (to object code), linker (to executable). A modern interpreter, such as UnderC, pulls input through a built-in preprocessing tokenizer and compiles directly to pcode, which is then immediately executed. Obviously, this code will be slower than the result of a true compiler, but interactive systems only have to be faster than human reaction time. Generally, if a response arrives in less than 150 ms, it is perceived as instantaneous. A myth exists that there is a complexity/speed trade-off with interactive implementations, which simply isn't true. CINT and UnderC run rings around Tcl (which is about as simple as a useful language can get).
C++ is terse enough, but perhaps it is too syntactically involved to use interactively. I agree that the for-statement does not trip easily off the fingers, which is why I suggested defining FOR to do all that typing for you. Using FOR is also much less prone to errors; I have been caught more than once in nested loops by the tendency to substitute an i for a j. People will not routinely define functions and classes at the prompt. (If they do, then facilities exist for copying the transcript to a file or to the clipboard.) Complex stuff will be typed into a file, and loaded as needed.
A more interesting question is whether C++ is expressive enough. Pascal is certainly not (as well as not being terse enough). People use application-specific languages because all the tools of their particular trade are a few keystrokes away. An application that illustrates this point is prototyping signal processing algorithms. Years ago, I used a system called Asyst, which was an array-based Forth dialect. Despite being backward in many ways (not just syntactically), it was a tremendously powerful way to experiment with algorithms because the language made things such as Fast Fourier Transforms and data plots available as primitives, which you then could use interactively. But it was not a good applications platform. One of my colleagues uses IDL (a kind of interactive Fortran on steroids, with rich data display features and extended support for vectors) for prototyping algorithms, and again the conversational mode outweighs the disadvantage of working with a niche language. Recently, CH [3], a C-like language (with impressive C99 compatibility), has been marketed for the same purpose.
C++ is well-suited to this kind of style because operator and function overloading let you define semantics precisely to fit the desired syntax. Functions may be passed vectors, and they may return vectors. Arithmetic with complex numbers uses the usual notation. For example, an interactive C++ system could allow the prototyper to view the power spectrum of the differences between two data vectors with a line like this:
;> plot(abs(fft(x-y)));
This isn't an appropriate style for really high-performance C++ number-crunching (see Blitz++ [4]), but that is not crucial here. In fact, the interpreter is being used to glue functional components together. The components themselves would be implemented as shared libraries built by an optimizing compiler. I'm hoping to find the time to make such libraries available this year.
This kind of engineering application may seem specialized, but most kinds of projects can benefit from interactive prototyping. At the core of most large projects are a few design unknowns that need to be explored. "Exploratory programming" seems to be the very antithesis of a careful software engineering process, but it can contribute to design and implementation in several ways. According to B.A. Sheil [5], exploratory programming is about amplifying programmers, whereas structured methods are designed to restrain programmers. Both have their place in the software development process. Another place that the interactive mode has in rigorous development is testing. I find that classes reveal their problems in conversation. Obviously, you need to generate static tests for further verification, but...
Conversational C++ is one way around the header problem described in my article titled "Scripting with C++." There seems to be no standard way around the fact that large amounts of header information must be included, and these headers make up the largest part of the program text that the compiler must deal with. A little 30-line standard C++ program using iostreams and strings forces the compiler to chew through nearly 9,000 lines of header files. Invariably, dependencies develop between parts of large programs until building becomes a tedious process, and we have the "nightly build" phenomenon.
In an interactive C++ session, headers can be loaded up front, and much recompiling can be done very quickly because these headers are already present in the system. There is no link phase; all UnderC methods have indirect references to the actual pcode, so there is no need to go through all function references patching addresses. This works particularly well when the headers are for an imported system that rarely changes. In the "Scripting with C++" article, I discuss using C++ as the glue language for VTK applications, in which the problem was loading (and parsing!) 300+ class headers each time a script was to be run. But once the headers are loaded into an interactive session, VTK test programs in C++ can be recompiled within milliseconds. This suggests to me a way to efficiently organize a C++ scripting environment: The interpreter remains resident as a server, and a small client script launcher communicates with it and uses it to execute scripts. The compiler state remains persistent, and headers would only need to be included on the first execution. This would be an excellent implementation for a CGI scripting application, for instance. The scripts themselves could be small main programs that call the rest of the system, and state would automatically be maintained between individual connections.
The dependency problem also becomes more manageable. For instance, a common change to a class is adding another method. Although that class definition may be needed throughout the system (perhaps hundreds of thousands of lines or more), we really need to recompile only the class itself because the class layout has not changed. In fact, you would have to compile only the method itself. Provided you are not adding a first virtual method (and thereby implicitly adding a hidden pointer to the class layout), you should be able to avoid an otherwise tedious build. (If you are getting nervous at this point, remember that dangerous "lazy builds" could, in principle, be detected by the system.)
So, it's possible to create an environment in which large C++ systems can often be modified in milliseconds, rather than minutes or even hours. A criticism at this point is that this "encourages hacking," but by the same token it encourages whole-system testing. Exploratory programming encourages a shift of thinking away from monolithic executables; there isn't really a "program" that is "launched." Rather, you are inside the system. When debugging GUI applications in traditional environments, I have often found that a lot of time is wasted getting to the place you want to test. This is equivalent to the "time to working area" problem of deep mining logistics. But within interactive systems, code may be modified while the system is running. The analogous situation in mining would be repairing equipment underground, rather than having to send it back up to the surface.
What I find exciting about these possibilities is that a traditional, strongly typed object-oriented language such as C++ can still be used in an exploratory fashion, and we can break through difficulties that are really caused by traditional implementations.