Why does C not have an exponentiation operator?

The most authoritative source for an answer to the question in the title would be Dennis Ritchie. Next best is Bjarne Stroustrup:

The semantics of C operators should be simple to the point where each corresponds to a machine instruction on a typical computer. An exponentiation operator doesn’t meet this criterion. [1], page 247

Stroustrup’s criterion needs to be taken with a grain of salt. For example, the assignment operator does not meet the criterion: a = b takes a LOAD and a STORE. In this case the criterion translates to: “should correspond to no more than a LOAD and a STORE. But Stroustrup was onto something, because we can tweak his answer to:

The semantics of C operators should be simple to the point where they each compile to code that is as fast as what one can write in assembler.

The example of the assignment operator in a = b is instructive in another way. One could argue that it would be more in the spirit of C not to have the assignment operator in the first place, but instead have two other operators, one corresponding to LOAD and the other to STORE. But that would require C’s computational model to include registers, which it does not [5]. There is a reason for that.

One of the secrets behind C’s success is that just about any machine architecture has byte-addressable random-access memory. Moreover, pick just about any pair of machine architectures, and they differ in their register structures. By excluding registers from its computational model, C has hit the sweet spot in being close to the machine, yet not too close. So there is no place for LOAD and STORE in C’s computational model, while the assignment operator is an operation on random-access memory in the presence of an unspecified complement of registers.

These are some of the hints that C is unique in the way it manages to be low-level and high-level at the same time. Another hint is from Alexander Stepanov’s history of STL [3]. In the early 1980s he had been working on an abstract mathematical approach to a library of algorithms and data structures. His striving for abstractness drove him towards the highest posssible level in the programming language. Stepanov considered Backus’s FP. In collaboration with Kapur and Musser, Tecton was designed. This language was of such a high level that its implementability was in doubt.

When Bell Labs hired Stepanov in 1987 for work on a library for C++, Andy Koenig taught him the semantics of C. Stepanov writes: “The abstract machine behind C was a revelation” [3]. Some way past reading this I did a double-take back to this sentence: abstract machine behind C? What’s going on here? Isn’t C the one language not in need of any abstract machine? Isn’t C the one language in which to write abstract machines? Probably Koenig wasn’t talking about an abstract machine and Stepanov introduces the term to say that the way C relates to its computational model was surprising to him. In the sequel I’ll follow Stepanov in abusing the term “abstract machine” to refer to the unwieldily named “computational model”. I’ll go further and suggest that “abstract machine” is a valuable addition to one’s conceptual tool kit.

For example, “abstract machine” is a concept that clarifies an otherwise unfathomable utterance in “The Roots of Lisp” by Paul Graham [2].

It seems to me that there have been two really clean, consistent models of programming so far: the C model and the Lisp model. These two seem points of high ground, with swampy lowlands between them. [2]

It is intriguing that Graham, for whom Lisp is the one and only language, singles out C as the only other language worth knowing about. What could these languages possibly have in common so that they stand out in the vast landscape of languages? In the following I will argue that the answer is: the way these two languages relate to their (implied) abstract machines.

In “The Roots of Lisp” [2] Graham gives a précis of John McCarthy’s seminal paper [4] in which McCarthy describes the language he discovered. Graham first introduces the list. Expressions in Lisp language are lists. At the same time lists are data structures when viewed from the machine end. Seven primitive operations on expressions are introduced: ATOM, QUOTE, EQ, CAR, CDR, CONS, and COND. To these are added a notation for functions by means of the keyword LAMBDA; the keyword LABEL is added to make recursive function definitions possible. I speak of “primitive operations” advisedly because this term can refer to functions taking a data structure as argument as well as to commands to change the state of a machine. And thus to invite comparison of Lisp with languages implemented via abstract machines.

With these nine keywords of Lisp the function EVAL is defined which takes as its argument an expression and yields the value of that expression as result. EVAL computes any computable function; not a surprise, given earlier theoretical results in the lambda calculus. McCarthy presented his discovery as a theoretical result that exhibits a (to his taste) superior alternative to the Turing machine as basis for computing. One of McCarthy’s students, Steve Russell, did not see the theoretical intention as an obstacle to implementing lists and the nine primitives in assembler on the IBM 704, resulting in, presto, the first Lisp implementation.

McCarthy’s paper [4] gives the definition in Lisp of mutually recursive functions APPLY and EVAL with the former as top-level call. Graham’s cleaned-up version only defines EVAL, with the same result. In both cases the definitions occupy about a page of text.

One page of language L to implement L in itself. Of course every language can be implemented in itself. Fortran, Basic, … even COBOL, if you insist. I doubt whether such implementations would be readable. McCarthy’s is, though I prefer Graham’s typography. But what I find even more striking is that the seven primitives occur frequently in all Lisp code. Compare that with languages based on an abstract machine, like Prolog and Java.

This invites a comparison between the complexity of the Java Virtual Machine with that of the software infrastructure needed to support McCarthy’s seven primitives, given today’s typical machine architecture. If these complexities are about the same, then Lisp is a low-level language compared to Java.

Let us return to C. Unlike Lisp, neither the language nor its definition invites us to think in terms of an abstract machine. But, Stroustrup’s remark suggests that fragments of an abstract machine one can be ferreted out from the semantics of the language. The arithmetic operators correspond to machine instructions, but they are mostly the same as in just about any other language. What is more intriguing is that assignment and dereferencing are frequently used and close to the machine.

From the point of view of primitive operations, the level of Lisp is as low as you can get: if you count function call as primitive, then all Lisp code consists of primitives. C is lower-level than other languages, with the exception of Lisp: in C there are, apart from operators and function calls, many language features that do not fall into these two categories.

This situation suggests a tantalizing project: explicitly define the C abstract machine and define a new language (“TRUE C”?) entirely in terms of operations on this machine.

But even in its current state C is special: it is unique in the imperative part of programming in the same way that Lisp is unique in the functional part. Maybe that is why Graham considered Lisp and C as high points in the swampy lowlands that make up the landscape of programming languages.

Acknowledgments

Thanks to Paul McJones for corrections and helpful remarks.

References

[1] “The Design and Evolution of C++” by Bjarne Stroustrup. Addison-Wesley, 1994.
[2] The Roots of Lisp by Paul Graham.
[3] Short History of STL by A. Stepanov.
[4] “Recursive functions of symbolic expressions and their computation by machine, Part I.” by John McCarthy. Communications of the ACM 3.4 (1960): 184-195.
[5] Declaring a variable as “register” in C does not imply that registers exist in the computational model of C. It is merely a suggestion to give the variable priority for being kept in the unspecified set of available registers.

This entry was posted on March 15, 2016 at 12:58 pm and is filed under Programming. You can follow any responses to this entry through the RSS 2.0 feed. You can leave a response, or trackback from your own site.

A Programmers Place