On October 23, 2011 John McCarthy died. He is remembered as the inventor of Lisp, for contributions to the theory of computation, and for contributions to artificial intelligence. In this essay I argue that he did more than inventing Lisp; even more than originating the concept of high-level programming: he showed to those willing to see it a recipe for designing high-level languages. I support my argument by turning the clock back to 1950 and imagining how McCarthy’s recipe could have been followed at that point in time. By taking this unusual point of view I hope to bring out more clearly McCarthy’s contribution to programming languages.
How can it be relevant to go back so far in history when computing changes so fast? The impression of rapid development is entirely caused by the break-neck speed of hardware development. Compared to this, software moves but slowly. By the mid-1970s the main programming paradigms had been born: imperative, functional, object-oriented, and logic programming.
Most of the languages in use today are descendants of these four paradigms. Paul Graham [Note HundredYearLanguage] explains the difference between hardware and software by observing that a programming language is not a technology but a notation. A notation reflects human thought habits and these change slowly. Functional programming is regarded as a more or less pure form of the lambda calculus; logic programming programming as a more or less pure form of the predicate calculus.
In this essay I imagine us back in 1950. Although the calculi existed at the time I imagine us (busy engineers) ignorant of such esoteric matters. But I do imagine us fired up by the invention of the subroutine so that we want to use this invention to the hilt. I will show that we would have invented the high-level languages in the form of subroutines acting on high-level memories. To get started let me tell how I see the main protagonists of the story.
- Imperative programming
All computation consists of the execution of commands. These are of two types: (1) those that change a state consisting of contents of memory registers and (2) those that transfer control. This branch of the evolutionary tree began with machine code and developed into more and more sophisticated assemblers, Fortran, Algol, and C. A side branch from Algol became Simula, Smalltalk, and later object-oriented languages, where the state is articulated into the separate states of dynamically created objects.
- Functional programming
All computation consists of the evaluation of expressions of the lambda calculus. These are either constants or composite expressions specifying the application of a function to the values of its arguments. The program consists of function definitions and a single expression whose evaluation initiates program executions.
- Logic programming
All computation consists of attempting to prove an instance of a formula (in the sense of predicate logic) with free variables. In case of success the result of the computation is available as a substitution for the variables. A program consists of definitions of predicates and a single formula to be proved.
Of course these brief characterizations omit a lot of detail. But it is remarkable that the rapid development of logic programming in 1971 – 1974 was from the beginning based on predicate calculus. The development of functional programming, which I reckon to have started with Lisp in 1959 and which only reached the ideal of functional programming with Scheme in 1975, followed a more tortuous route.
In 1958, when McCarthy embarked on his new programming language, he had decided on the use of lambda notation, but he ignored the lambda calculus itself. “To use functions as arguments, one needs a notation for functions, and it seems natural to use the lambda-notation of Church. I didn’t understand the rest of the book, so I wasn’t tempted to try to implement his more general mechanism for defining functions.” [Note McCarthyHistory] Thus the Lisp implementation of 1959 was not intended to be an implementation of the lambda calculus; it is surprising how close it later turned out to be.
When it became clear what an implementation of the lambda calculus could look like, Sussman and Steele discovered, by a roundabout route [Note SussmanSteele], that Lisp, modified by replacing dynamic scope with lexical scope, is an implementation of lambda calculus. Hence my dating the origin of functional programming at 1975, preceded by sixteen years of close approximations in the form of the various and widely used dialects of Lisp.
Landin had identified lambda calculus as the basis of a programming language (ISWIM) independently of, and earlier than, Scheme. Although Landin had supplied SECD as an abstract machine for it, I believe that the Scheme of 1975 was the first implementation of functional programming. The SECD machine is in one of the papers of Landin published in 1964 and 1966. I believe the first implementation based on it was Henderson’s Lispkit Lisp of 1980. By the end of the century functional programming had a variety of widely used forms: Scheme, ML, Haskell, CAML. Yet all were recognized as being based on lambda calculus, a mathematical formalism that existed before computers.
The discrete charm of the subroutine
In spite of this overwhelming display of the power of abstract mathematics over programming I like to imagine how functional and logic programming could have been invented independently of lambda or predicate calculus before even Fortran existed. It could have happened like this. Just like E.W. Dijkstra launched a crusade against the goto statement in 1968, a similarly charismatic guru could have done this in 1948. Just as Dijkstra did in 1968, this guru could have written
For a number of years I have been familiar with the observation that the quality of programmers is a decreasing function of the density of go to statements in the programs they produce.
In 1948 the implied message would have been that the closed subroutine should be the first choice for transfer of control, with “closed” meaning that the subroutine only interacts with its environment through its parameters. Turing’s ACE report of 1946 [Note TuringACE] proposes subroutines under the name “subsidiary operation”. What may be the first book on programming [Note WilkesWheelerGill] is organized around the concept of subroutine. As the book was published in 1951 I can imagine this early guru to have had the insight in 1948 that programs be structured around subroutines: that they should consist of subroutine definitions with the sole exception of a single call to initiate the entire program; that each subroutine definition should consist of subroutine calls embedded in a minimum of glue code for the marshalling of arguments in preparation of each call. Let us suppose that the guru would have called this Principled Programming.
There are two ways in which a closed subroutine can interact with its environment: modify the arguments (“procedure” in Algol or “void function” in C terminology) or cause a single result of the body’s computation to replace the subroutine call (“function” in Algol or “nonvoid function” in C). It is in the spirit of Principled Programming to require that a subroutine use only one of these interaction modes. It is in line with human nature that the Principled Programming movement would split up into two warring factions each of which insisting on the superiority of one of these modes: functional versus procedural programming.
When the first Fortran implementation was delivered in 1957, the main innovation was that of the expression. It was this feature that convinced McCarthy of the superiority of Fortran over IPL-V. To continue our minimalist fantasy, let us imagine that the concept of subroutine were only enriched with expressions. In functional programming this eliminates the need for any glue code between subroutine calls in a subroutine body. The body is changed to a single expression and the subroutines calls are changed to subexpressions. Program execution consists expression evaluation only. In effect the entire program consists of a single big expression, broken up into mentally digestible chunks by subroutine definitions.
The process of subroutines calling subroutines is occasionally interrupted by a subroutine body consisting of an expression without subexpressions, a literal that directly denotes a value. Expressions eliminate the need for glue code in functional subroutine programming, thus rendering it pure. The elimination of glue code would leave pure procedural programs with procedure bodies consisting only of procedure calls. In this way the chain of procedures calling procedures can only be stopped by procedures with empty bodies. This is not possible, because their execution would have no effect. Thus the advent of expressions makes functional subroutine programming pure, leaving procedural programming with the need for glue code. Among the devotees of Principled Programming, victory for functional programming.
Yet, with Prolog, pure procedural programming was realized. So there must be procedures with empty bodies, as indeed there are. But calling such procedures causes something to happen because Prolog cheats: when a procedure is called, body execution is not the only thing that happens; it is preceded by unification of the arguments of the call with the parameters of the definition. An argument in a function call is an expression typically containing an unbound variable. Unification with the corresponding parameter in the definition causes this variable to become bound. It is typically bound to an expression containing unassigned variables, making further bindings possible. The variables of Prolog are called logical variables to indicate that they are created unbound and can only be bound once in the same computation thread. A binding can only be undone by backtracking to restore the computation state at the time of binding and continuing the thread some other way.
Three types of memory
Back to reality. Since 1948 imperative programming has evolved to an eclectic mixture of control constructs such as if-, case-, for- and while statements. Memory has differentiated into global, local, and heap memory. In spite of these differences, they are basically memory registers, albeit with a more convenient naming system than raw addresses would provide. The pure kinds of programming admit only the subroutine call as control construct and can be characterized as
FUNCTIONAL PROGRAMMING = SUBROUTINES + EXPRESSION MEMORY LOGIC PROGRAMMING = SUBROUTINES + RATIONAL-TREE MEMORY
Here “rational tree” is the name used by Colmerauer for expressions (that is, trees) in which the leaves are constants or single-assignment pointers to rational trees. Here we see that the two types of programming are made possible by assuming a suitable form of high-level memory, a different type for each type of programming. With this insight, let us review the birth of Lisp.
Graham’s account of what McCarthy did
Graham has written a summary of McCarthy’s original Lisp paper. Great as the original undoubtedly is, it benefits from the thirty years of hindsight that Graham was able to give it. Graham’s summary can be summarized by starting with a number of questions:
- What is a simple and flexible data structure?
- What is a textual expression to represent such a data structure as simply as possible?
- What is a small and complete set of operations and tests on the data structure?
- What is a small and complete set of logical operations?
- How to represent definitions of subroutines in the data structure?
- How to represent calls to these subroutines in the data structure?
McCarthy gave the following answers:
- A possibly empty binary tree.
- A dotted pair or NIL.
- CONS, CAR, CDR, EQ, ATOM
- LAMBDA, LABEL, COND, QUOTE
- As a binary tree. The left subtree denotes the name of function being defined. The right subtree denotes the expression denoting the value of the function. The context provided by reading the source text tells the interpreter which expressions are definitions and which is the expression to be evaluated initially.
- As a binary tree. The left subtree denotes the function, the right subtree denotes its arguments.
Graham then delivers the punch line in the form of McCarthy’s two mutually recursive functions, apply and eval, defined in terms of the primitive functions that specify the value of an arbitrary expression. Eval served as the first Lisp interpreter. The method followed by McCarthy can be generalized to a recipe for an abstract machine. What makes McCarthy’s machine abstract is that it assumes all memory to have the form of binary trees.
The abstract machine consisting of binary-tree memory with CAR, CDR, CONS, ATOM as instructions is interestingly different from the abstract machines for other languages, such as Prolog and Java. In the latter languages an occasional programmer like myself is not capable of writing a procedure or function in abstract machine code. In Lisp, on the other hand, CAR, CDR, CONS, ATOM are the most common functions in every program. This makes Lisp both low-level and high-level, the key to its versatility. In this way we can reconstruct pure functional programming with subroutines from the ground up.
How about pure procedural programming? It is an interesting exercise to accept the above five questions as a recipe for the design and implementation of a logic programming language. What the above way of telling the story of Lisp suggests is that it is the memory model that makes programming high-level. The experience with logic programming and Prolog suggests that rational trees are a suitable starting point. The fact that rational trees enable pure procedural programming makes this interesting enough to apply McCarthy’s recipe to this choice of memory model. As result we should get an interesting reconstruction of logic programming, one of the few truly novel programming paradigms in the, by now, sixty-year history of the art.
[Note HundredYearLanguage] “ The Hundred-Year Language” by Paul Graham.
[Note McCarthyHistory] “History of Lisp” Stanford AI Laboratory memo 1979 by J. McCarthy, page 6.
[Note SchemeRevisit] “The first report on Scheme revisited” Sussman and Steele, 1998.
[Note SussmanSteele] AIM-349 MIT Artificial Intelligence Laboratory, 1975.
[Note TuringACE] “Proposal for the development in the Mathematics Division of an automatic computing engine”, by A.M. Turing, Report E882 Executive Committee NPL 1946. Reprinted April 1972 as NPL report Com.Sci. 57. (Turing proposed subroutines under the name “subsidiary operation”.)
[Note WilkesWheelerGill] Wilkes, Maurice V. et al “The Preparation of Programs for an Electronic Digital Computer”, 1951 This book is organized around the concept of subroutine.