In my 2009 article “Software Engineering: From Craft to Industry?” [8] I ventured to disagree. From the final paragraph:
While the processing of material leaves an irreducible residue of work for humans, in the processing of information any work that is routine instantly vanishes. Extracting the routine part from an information processing task is a creative endeavour. It is called programming. In the building of a software system any time you think you have something routine to be handed over to managed cubicle-dwelling drones [9], you are missing an opportunity for automation. In the building of a software system there is only room for creative work. It is Craft, irreducibly so.
At the time I had read John Allen’s “Whither Software Engineering?”. I found it fascinating, but dismissed it as unrealistic and was not convinced of its urgency. This article explains why I changed my mind.
The standard model in software development is, and has always been, to follow the test-debug cycle. I call it the standard model not because of any virtues, but because of the lack of alternatives, unless one counts proving software correct, a notion universally rejected as utterly unrealistic (but more about this later).
The problem with the standard model can be expressed by a truism that is by now so old that nobody dares any more to utter it, or even to remember it. The problem with truisms is that some of them are true. Driven by today’s dire circumstances I’ll resurrect it here:
Testing can prove the incorrectness of code, but not its correctness.
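A minimal illustration, with names and tests of my own invention: the function below survives the obvious tests, yet all that the tests have done is fail to prove its incorrectness.

```c
#include <limits.h>

// Intended: the midpoint of two nonnegative ints.
// It passes the obvious tests, but the sum a + b overflows
// for large arguments, e.g. a == b == INT_MAX.
int midpoint(int a, int b)
{
    return (a + b) / 2; // undefined behaviour on overflow
}

// A version without the intermediate overflow (still for a, b >= 0).
int midpoint_safe(int a, int b)
{
    return a / 2 + b / 2 + (a % 2 + b % 2) / 2;
}
```

Both functions agree on small inputs; no finite battery of passed tests distinguishes them, but a proof of correctness would.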
Ever since the days of Robert Morris, Jr (he of the worm) and Clifford Stoll (he of “The Cuckoo’s Egg”) the implicit thinking has been: any time now the software/hardware substrate will be good enough to network everything in sight: student records, bank accounts, patient records, to mention just some of the more ransomable things. So why wait? In the blind drive to network everything in sight, ever higher towers are being built without waiting for a foundation: teetering towers in a swamp.
Such was the spectre in front of John Allen’s eyes ten years ago when he wrote his remarkable paper [1], precariously published in an obscure corner of the internet. He faces the inescapable fact that the only alternative to the test-debug cycle is proof of correctness. He sketches certain advanced concepts in logic, advanced only in the sense that they are beyond the current undergraduate curriculum. According to these concepts, proofs can be constructive, constructive in the sense of also being programs in a suitable programming language. Allen mentions the programming language ML as an example.
“Whither software engineering?” [1] was presented at the 2008 Workshop on Philosophy and Engineering. My guess is that it has been read by few, if any, of those who are in a position to do anything about the parlous state of systems software. Those few have probably dismissed the paper as an extreme example of academic lack of realism: to redo operating systems as proofs in constructive type theory! Yet … I will let a recent paper [2] speak for itself:
FSCQ is the first file system with a machine-checkable proof that its implementation meets a specification, even in the presence of fail-stop crashes. FSCQ provably avoids bugs that have plagued previous file systems, such as performing disk writes without sufficient barriers or forgetting to zero out directory blocks. If a crash happens at an inopportune time, these bugs can lead to data loss. FSCQ’s theorems prove that, under any sequence of crashes followed by reboots, FSCQ will recover its state correctly without losing data.
The authors use the Coq proof assistant with a new variant of Hoare logic [3, 4]. The correctness properties are stated as theorems in the Coq variant of constructive type theory. The proofs are written by the authors and are checked by Coq running as proof assistant. The extraction feature of Coq converts the Coq programs to Haskell, the functional programming language. The resulting Haskell code runs as a file server that supports unmodified Unix applications such as vim, emacs, git, gcc, make, and a mail server.
There is considerable variation in performance among the various alternative file systems that can be used with Unix. Among the various benchmarks FSCQ is usually the slowest, though not by more than a factor of two compared to the average of the other file systems. Thus the FSCQ project shows that Allen’s vision is technically realistic.
Is the FSCQ project the harbinger of a trend that might rescue the teetering towers? The six authors of [2] represent an unusual mix of skills, covering both systems programming and constructive type theory. The project suggests that logic can play the same role in software development as mathematics plays in the established branches of engineering. It will take a long time before anything can happen on the required scale. The depth of the change necessary makes one doubt whether this is possible. Allen believes it is. To convince his readers he includes a sketch of the history of engineering.
Allen starts by reviewing the history of what is now called “software engineering”. The term was coined [5] at the 1968 NATO conference that was convened under the pressure of what was perceived at the time as “the software crisis”. It had been noticed that the problems encountered in the established branches of engineering were not nearly as severe as the ones in software development. Therefore, a new discipline called “software engineering” was called into existence, by fiat. Allen’s paper helps us understand the difference between engineering and the simulacrum thus called forth.
Established engineering programs require courses in calculus, physics, and linear algebra in the early part of the curriculum, in spite of the fact that the students cannot yet see their use. These courses are required because of the proven effectiveness of their content in engineering. For example, the behaviour of an antenna can be predicted with the theory of the electromagnetic field, and this theory can only be understood with calculus and vector analysis. If, as often happens, a prospective “software engineer” is required to take calculus, then it is not because calculus can help to make better software.
It is not clear whether any science can help. It is only recently, with papers such as Allen’s and the one on FSCQ, that some inklings have appeared as to what kind of science can be helpful. It will take time before this clarifies and then it will take time before it is lodged in the curriculum as solidly as mathematics is in the established branches of engineering. Only then will software engineering deserve the name.
“Whither software engineering?” describes how it took a long time for the established branches of engineering to become based on mathematics. The first application of calculus in engineering may have occurred as early as 1742 with the publication by Benjamin Robins of New Principles of Gunnery. This book was adopted in the École Royale du Génie, the engineering school founded in the mid-eighteenth century in Mézières in France.
A hundred years after this, the use of mathematical methods was still controversial, at least in Britain. This became apparent when the first transatlantic telegraph cable was laid. The mathematical analysis by the physicist William Thomson indicated that the input signal could be of moderate voltage; the resulting weak output signal could be compensated by making the detector at the receiving end extremely sensitive. The Chief Electrician of the cable company dismissed Thomson’s mathematical analysis as “a fiction of the schools” that was contrary to common sense, common sense which dictated unprecedentedly high input voltages commensurate with the unprecedented length of the cable. Subjected to such an onslaught the cable failed within a few weeks [6]. Although this failure, and publications by William Thomson, led to the dismissal of the Chief Electrician, the battle between “the practical men” and “the theoretical men” for the minds of electrical practitioners continued until the end of the 19th century.
The École Royale du Génie was founded in 1748. This was a step, possibly the first, toward placing engineering on a mathematical foundation. Almost 150 years later the transition was not yet complete: in 1893 William Preece was inaugurated as president of the (British) Institution of Electrical Engineers. From his inaugural address:
True theory does not require the abstruse language of mathematics to make it clear and to render it respectable. … all that is solid and substantial in science and usefully applied in practice, has been made clear by relegating mathematical symbols to their proper store place—the study [7].
In the face of continuing, increasingly disastrous failures, the practical men of today do not seem to be looking for an alternative to software whose only credential is that it has been around the test-debug cycle a number of times. Allen’s paper and the FSCQ system may offer hope for an effective alternative. Do they? If not, is software engineering possible?
Thanks to Paul McJones for pointing me to the FSCQ paper and thus providing the motivation to revisit [8].
[1] “Whither software engineering?” by John Allen. Workshop on Philosophy and Engineering, London, 2008.
[2] “Certifying a File System Using Crash Hoare Logic: Correctness in the Presence of Crashes” by Tej Chajed, Haogang Chen, Adam Chlipala, M. Frans Kaashoek, Nikolai Zeldovich, and Daniel Ziegler. Comm. ACM, vol. 60, no. 4 (April 2017), pp 75–84.
[3] “An axiomatic basis for computer programming” by C.A.R. Hoare. Comm. ACM 12.10 (1969): 576-580.
[4] “Ten years of Hoare’s logic: A survey—Part I” by K.R. Apt. ACM Transactions on Programming Languages and Systems (TOPLAS) 3.4 (1981): 431-483.
[5] Software Engineering: Report on a conference sponsored by the NATO Science Committee, Garmisch, Germany, 7th to 11th October 1968.
[6] Oliver Heaviside: sage in solitude by Paul Nahin. IEEE Press, 1987, page 34.
[7] Journal of the Institution of Electrical Engineers, Volume 22 (1893), Address of the President, page 63.
[8] “Software Engineering: From Craft to Industry?” by M.H. van Emden, wordpress, 2009, http://tinyurl.com/n69ymao
[9] I am aware of the biologically unfortunate analogy: it should be “worker bees in cubicles”. The Oxford English Dictionary recognizes the figurative use of “drone”, but there it means a non-working member of the community. But “drones” as in “drones in cubicles” lodged itself in the contemporary idiom: on May 20, 2017 this search string registered 1430 hits on Google.
Logic programming shares with mechanical theorem proving the use of logic to represent knowledge and the use of deduction to solve problems by deriving logical consequences. [1]
X: Ah, I see—a kind of Artificial Intelligence, which I am not so much interested in. Yet, I find a lot of interesting stuff in Logic Programming, the journal, in its first decade, and in several books of that time, Sterling and Shapiro’s The Art of Prolog and O’Keefe’s The Craft of Prolog. Why listen to Kowalski anyway?
Y: He invented “Logic Programming” as a term and substantiated his definition with his book [2]. Moreover, his “Predicate logic as a programming language” [3], established him as a co-inventor, with Colmerauer, of pure Prolog.
X: It seems that the term “Logic Programming” has been hijacked by people interested in what makes Prolog different from other programming languages and in how logic can be used as starting point in the design of programming languages.
Y: One of the ways in which Prolog is different from other programming languages is that it avoids the correctness problem.
X: Oh? How so?
Y: A program in pure Prolog is a sentence of logic. The results of running it are logical implications of that sentence. So, if you write the program as a specification, then the correctness problem is avoided because you are running the specification.
X: Not even Kowalski believes that. In “Algorithm = Logic + Control” [4] he uses as example the sorting of a list. He begins with the definition saying that the result of sorting is an ordered permutation of the input list. He refers to his 1974 paper [3], the one that launched (what came to be called later) logic programming. In [4] he points out that, though different controls give different algorithms, none is efficient enough to be acceptable. Then he presents a definition that reads just like, say, the C program for quicksort in Kernighan and Ritchie [13], and calls it “the *logic* of quicksort” (I have taken the liberty of adding the emphasis).
Y: The fact remains that the pure specification of sortedness is executable.
X: In case you mean the sorting program in [3], here it is, with the minimal changes to make it run under SWI Prolog anno 2017:
pSort(X,Y) :- perm(X,Y), ord(Y).
perm([],[]).
perm(Z,[X|Y]) :- perm(Z1,Y), del(X,Z,Z1).
del(X,[X|Y],Y).
del(X,[Y|Z],[Y|Z1]) :- del(X,Z,Z1).
ord([]).
ord([X]).
ord([X,Y|Z]) :- le(X,Y), ord([Y|Z]).
le(1,2). le(2,3). le(1,3). le(X,X).
Sure, the definition of pSort is acceptable as a specification, but look at the definition of permutation! It is an algorithm for checking permutedness: keep removing elements from the one list and delete each from the other list. If all these deletions succeed and if the other list ends up empty, then the two lists you started out with are permutations of each other.
Y: What’s wrong with that?
X: A specification of sortedness would have to refer to a specification of permutedness and I see an algorithm instead. To be algorithmically neutral you would have to take the mathematical definition, namely “a permutation of a finite set is an invertible mapping onto itself”. Don’t ask me how you say that in logic.
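The same point can be made from the C side (a sketch of my own, not from the dialogue): even the definition-driven check “every value occurs equally often in both lists” becomes an algorithm the moment it is written down.

```c
#include <stdbool.h>
#include <stddef.h>

// Permutedness via the multiset view: xs and ys of length n are
// permutations of each other iff each value occurs equally often in both.
// Deliberately naive and quadratic: it follows the definition closely,
// yet it is still an algorithm, not an algorithm-free specification.
bool is_permutation(const int *xs, const int *ys, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        size_t in_xs = 0, in_ys = 0;
        for (size_t j = 0; j < n; j++) {
            in_xs += (xs[j] == xs[i]);
            in_ys += (ys[j] == xs[i]);
        }
        if (in_xs != in_ys) return false;
    }
    return true;
}
```

Counting occurrences is closer to the mathematical definition than the deletion procedure of perm, but it is a checking algorithm all the same.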
Y: OK, supposing that the example of sorting shoots down the general notion of logic programs as runnable specifications. Is there any way in which logic programming can alleviate the correctness problem?
X: Funny I should be asked, but actually, there is. Not by logic programming primarily, but by logic generally, yes.
Y: Oh? How so?
X: For that we need to go back in the programming literature, to before Colmerauer’s collaboration with Kowalski, namely to Peter Naur’s “snapshots” [5] and Robert Floyd’s “verification conditions” [6]. Floyd considered properties of program fragments expressed as {P}S{Q} (in current notation), where S is the program fragment. The fragment can comprise almost all of a procedure body’s code or as little as a single assignment statement. P is the precondition and Q is the postcondition. These are conditions (also called “assertions”) in the form of a logic formula expressing a relation between values of variables occurring in S.
The meaning of {P}S{Q} is: if P is true at the start of an execution of S and if this execution terminates, then Q holds upon termination. The notation “{P}S{Q}” is the currently used variant of one introduced by C.A.R. Hoare [7]. We call such an expression a “(Hoare) triple”.
Floyd showed how to combine statements in logic about program fragments into a statement about the larger program fragment that results from combining the fragments.
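Concretely, on a toy fragment of my own: {P}S1{Q} and {Q}S2{R} combine into {P}S1;S2{R}. Runtime asserts stand in for the assertions here; they check, they do not prove.

```c
#include <assert.h>

// {x == a} y = x; {y == a}  and  {y == a} y = y + 1; {y == a + 1}
// combine into  {x == a} y = x; y = y + 1; {y == a + 1}.
int fragment(int x)
{
    int a = x;          // name the initial value, in the role of x0
    int y;
    assert(x == a);     // P
    y = x;
    assert(y == a);     // Q: postcondition of S1, precondition of S2
    y = y + 1;
    assert(y == a + 1); // R
    return y;
}
```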
Y: I trust it works for the examples given by Naur and Floyd, but do you know any other ones?
X: I know lots of other examples. My current favourite is the clever algorithm (due to Josef Stein [10]) for computing in logarithmic time the gcd (greatest common divisor) of two numbers. Here it is in C:
// Program 1
void s(int x, int y, int& z)
{ // precondition s: x == x0 && y == y0 && x0 > 0 && y0 > 0
  // purpose: return z == gcd(x0, y0)
  int fac = 1;
inv:
  if (x%2 == 0) goto a; else goto b;
a:
  if (y%2 == 0) { x /= 2; y /= 2; fac *= 2; goto inv;}
  else { x /= 2; goto d; }
b:
  if (y%2 == 0) { y /= 2; goto b; } else goto c;
c:
  if (x == y) { z = x*fac; return; }
  if (x < y) { y = (y-x)/2; goto b; }
  else { x = (x-y)/2; goto d; }
d:
  if (x%2 == 0) { x /= 2; goto d; } else goto c;
}
Y: I see C code, I see labels, but where are the assertions?
X: Oops. Here it is with assertions inserted.
// Program 1a
void s(int x, int y, int& z)
{ // precondition s: x == x0 && y == y0 && x0 > 0 && y0 > 0
  // purpose: return z == gcd(x0, y0)
  // s: x > 0 && y > 0 && x == x0 && y == y0
  int fac = 1;
inv: // x > 0 && y > 0 && gcd(x0, y0) == fac * gcd(x,y)
  if (x%2 == 0) goto a; else goto b;
a: // inv && even(x)
  if (y%2 == 0) { x /= 2; y /= 2; fac *= 2; goto inv;}
  else { x /= 2; goto d; }
b: // inv && odd(x)
  if (y%2 == 0) { y /= 2; goto b; } else goto c;
c: // inv && odd(x) && odd(y)
  if (x == y) { z = x*fac; return; }
  if (x < y) { y = (y-x)/2; goto b; }
  else { x = (x-y)/2; goto d; }
d: // inv && odd(y)
  if (x%2 == 0) { x /= 2; goto d; } else goto c;
}
Here are examples of the triples hiding here:
{a && even(y)} x /= 2; y /= 2; fac *= 2; {inv}
{a && odd(y)} x /= 2; {d}
Floyd’s theorem applied to this example says that if the atomic triples are true, then the one for the whole function (which says that it returns the gcd) is true.
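The triples can at least be probed mechanically: the following is my own instrumented variant of Program 1, with the invariant asserted at each arrival at inv by means of a reference gcd, and a pointer in place of the int& of the original. A failing run would refute a triple; passing runs, being tests, prove nothing.

```c
#include <assert.h>

// Reference gcd (Euclid), used only to state the invariant.
static int gcd_ref(int a, int b)
{
    while (b != 0) { int t = a % b; a = b; b = t; }
    return a;
}

// Program 1 with the assertion at inv made executable.
void s_checked(int x, int y, int *z)
{
    int x0 = x, y0 = y; // remember the inputs for the invariant
    int fac = 1;
inv:
    assert(x > 0 && y > 0 && gcd_ref(x0, y0) == fac * gcd_ref(x, y));
    if (x % 2 == 0) goto a; else goto b;
a:
    if (y % 2 == 0) { x /= 2; y /= 2; fac *= 2; goto inv; }
    else { x /= 2; goto d; }
b:
    if (y % 2 == 0) { y /= 2; goto b; } else goto c;
c:
    if (x == y) { *z = x * fac; return; }
    if (x < y) { y = (y - x) / 2; goto b; }
    else { x = (x - y) / 2; goto d; }
d:
    if (x % 2 == 0) { x /= 2; goto d; } else goto c;
}
```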
Y: I think I see that your example triples are justified. I suppose that, once the assertions are there, one could check all the triples. But how can I find the necessary assertions? To do that I would need to understand the utterly obscure code that you first presented. That is, I would need to already see that the function is correct before I can even start thinking about assertions.
X: I agree that it’s hard to find useful assertions in existing code. The difficulty was acknowledged by Edsger Dijkstra:
When concern for correctness comes as an afterthought, so that correctness proofs have to be given once the program is already completed, then the programmer can indeed expect severe troubles. If, however, he adheres to the discipline of producing correctness proofs as he writes his program, he will produce program and proof with less effort than just the programming alone would have taken [8].
Y: Discipline indeed! More like black magic: “… producing correctness proofs as he writes his program”. I would like to see examples of that.
X: He doesn’t give any in [8]. I suppose that his subsequent books [11, 12] illustrate his idea of “… producing correctness proofs as he writes his program”.
Y: I’d like to go back to that gcd program. You gave two triples as examples. There should be more. I’d like to see them all.
X: The program has access to variables x,y,z, and fac. On entry the values of x and y are x0 and y0. On exit we want to have z == gcd(x0,y0). First a list of the assertions with their labels:
s:   x == x0 && y == y0 && x0 > 0 && y0 > 0
inv: x > 0 && y > 0 && gcd(x0, y0) == fac * gcd(x,y)
a:   inv && even(x)
b:   inv && odd(x)
c:   inv && odd(x) && odd(y)
d:   inv && odd(y)
h:   z == gcd(x0,y0)
Note that all the assertions except s imply inv. Next, the triples:
{s} fac = 1 {inv}
{inv && even(x)}{a}
{inv && odd(x)}{b}
{a && even(y)} x /= 2; y /= 2; fac *= 2; {inv}
{a && odd(y)} x /= 2; {d}
{b && even(y)} y /= 2; {b}
{b && odd(y)} {c}
{c && x == y} z = fac*x; {h}
{c && x < y} y = (y-x)/2; {b}
{c && x > y} x = (x-y)/2; {d}
{d && even(x)} x /= 2; {d}
{d && odd(x)} {c}
Y: I count twelve. Makes me think of one of the quotes from Alan Perlis: “If you have a procedure with ten parameters, you probably missed some.” How do you know that you have all triples?
X: There is a progression in the order in which they are written. In the beginning new assertions are needed. Then you have to add triples starting from the new assertions. But after the last four triples were added, no new assertions appeared.
You don’t find a label s in the code because s is not in any postcondition. You don’t find a label h in the code because h is not in any precondition.
Y: You say that in {P}S{Q} P is the precondition, Q is the postcondition and S is the code. What does {P}S{Q} say as a formula of logic?
X: If P is true in state T and S transforms T to U, then Q is true in U. It connects to code by associating P and Q with locations in the code. Such a statement is true or false. Of course to be of any use, it has to be true. I wrote them down with a progression in mind, but once they are there, their order does not matter. I like to see them as independent “snippets of truth”.
Y: I like that: “programming by gathering snippets of truth”. I would like to make the generic snippet {P}S{Q} more precise. How about the following.
Q is true in a state T iff there is a satisfactorily terminating computation starting from location Q in state T.
And similarly for P. Then I could translate {P}S{Q} to the Prolog dialect of logic as:
p(X,Y,Fac,Z) :- ..., q(X1,Y1,Fac1,Z1).
Here the … stands for Prolog code that describes the code S of the triple, which takes state (X,Y,Fac,Z) to state (X1,Y1,Fac1,Z1).
X: It looks backward; the opposite of the reading of {P}S{Q} in Floyd and Hoare, who reason from P to Q. Made precise in that forward direction, it reads

P is true in a state T iff there is a computation starting from the start location and ending in location P in state T.

And similarly for Q. Then I could translate {P}S{Q} to the Prolog dialect of logic as:

q(X1,Y1,Fac1,Z1) :- p(X,Y,Fac,Z), ... .
Y: Right. There are apparently two translations: a forward one like yours and a backward one like mine. As one can’t do both at the same time, I’ll pick the backward one. The following is the snippet-by-snippet translation.
% Program 2
s(X, Y, Z) :- inv(X, Y, 1, Z).
inv(X, Y, Fac, Z) :- even(X), !, a(X, Y, Fac, Z).
inv(X, Y, Fac, Z) :- odd(X), b(X, Y, Fac, Z).
a(X, Y, Fac, Z) :- even(Y), !, X1 is X/2, Y1 is Y/2, Fac1 is 2*Fac, inv(X1, Y1, Fac1, Z).
a(X, Y, Fac, Z) :- odd(Y), X1 is X/2, d(X1, Y, Fac, Z).
b(X, Y, Fac, Z) :- even(Y), !, Y1 is Y/2, b(X, Y1, Fac, Z).
b(X, Y, Fac, Z) :- odd(Y), c(X, Y, Fac, Z).
c(X, Y, Fac, Z) :- X =:= Y, !, Z is X*Fac.
c(X, Y, Fac, Z) :- X < Y, !, Y1 is (Y-X)/2, b(X, Y1, Fac, Z).
c(X, Y, Fac, Z) :- X > Y, X1 is (X-Y)/2, d(X1, Y, Fac, Z).
d(X, Y, Fac, Z) :- even(X), !, X1 is X/2, d(X1, Y, Fac, Z).
d(X, Y, Fac, Z) :- odd(X), c(X, Y, Fac, Z).
I’ve been so free as to smooth over the clumsy Prolog arithmetic by making use of the following added definitions:
even(X) :- X mod 2 =:= 0.
odd(X) :- X mod 2 =:= 1.
The result is a Prolog program, ready to run, as in
?- X is 123*4567, Y is 123*5678, s(X,Y,Z).
X = 561741,
Y = 698394,
Z = 123.
X: I see what you did: you just translated every snippet to a procedure.
Y: Not a procedure in the conventional sense. For example, you have to somehow bundle the two snippets for inv into one conventional procedure. In Prolog you don’t have to combine any snippets; you translate each of them as is.
X: If you are willing to overlook this trivial difference, then I can do in C [14] what you just did in Prolog.
// Program 3
// the function declarations (not always necessary)
void s(int x, int y, int& z);
void inv(int x, int y, int fac, int& z);
void a(int x, int y, int fac, int& z);
void b(int x, int y, int fac, int& z);
void c(int x, int y, int fac, int& z);
void d(int x, int y, int fac, int& z);

// the function definitions
void s(int x, int y, int& z)
{ // precondition s: x == x0 && y == y0 && x0 > 0 && y0 > 0
  // purpose: return z == gcd(x0, y0)
  inv(x, y, 1, z);
}
void inv(int x, int y, int fac, int& z)
{ // precondition: inv: x > 0 && y > 0 && fac*gcd(x,y) == gcd(x0,y0)
  if (x%2 == 0) a(x,y,fac,z); else b(x,y,fac,z);
}
void a(int x, int y, int fac, int& z)
{ // precondition: a: inv && even(x)
  if (y%2 == 0) inv(x/2,y/2,fac*2,z); else d(x/2,y,fac,z);
}
void b(int x, int y, int fac, int& z)
{ // precondition: b: inv && odd(x)
  if (y%2 == 0) b(x,y/2,fac,z); else c(x,y,fac,z);
}
void c(int x, int y, int fac, int& z)
{ // precondition: c: inv && odd(x) && odd(y)
  if (x == y) { z = x*fac; return; }
  if (x < y) b(x,(y-x)/2,fac,z); else d((x-y)/2,y,fac,z);
}
void d(int x, int y, int fac, int& z)
{ // precondition: d: inv && odd(y)
  if (x%2 == 0) d(x/2, y, fac, z); else c(x,y,fac,z);
}
Come to think of it, this is a way to structure code that is trivial to verify, given its close correspondence to the snippets of truth. We have discovered how to do better than “… producing correctness proofs as he writes his program” by writing a correctness proof before writing the program!
Y: Still, even such C code is at one remove from truth, because it is not itself logic. Pure Prolog is.
X: Minor quibble: the (essential) use of “is” is not part of pure Prolog, which disqualifies the Prolog program as logic. And what are those cuts doing there? Major quibble: I deny that the Prolog clauses are any more logic than properly written C. The Prolog clause states how a problem can be reduced to sub-problems. It’s great that you can do that in logic. But there is nothing more in the Prolog clause than that problem reduction, which can be expressed in C just as well. In fact, in 1932 Andrej Kolmogorov pointed out [9] that intuitionistic logic can be interpreted as a calculus of problems solved and to be solved.
Y: I see that your translation of the snippets to C expresses just as well as what the logic clauses express. The difference between Prolog and C is that in Prolog you can’t do anything but write sentences of logic. Prolog can’t force you to make these sentences true, but it does force you to write your snippets in such a way that they are true or false according to the formal semantics of logic. In C you can write things that have no logical interpretation—most C programmers do it all the time. It may be that just now you were the first ever to write logic in C.
X: Of course my translation of the snippets to C functions resulted in a ridiculous program. The complex structure of Program 1 is caused by the desire to avoid redundant tests for the parity of x and y. Such a test requires only a single shift, one of the fastest instructions. To save some of these, Program 3 repeatedly unleashes the whole protocol of procedure entry and exit.
However, inspection of that code shows that it is an easy victim of tail recursion optimization, a standard technique. So instead of the procedural version, I should have presented the result of this optimization:
// Program 4
void s(int x, int y, int& z)
{ // precondition s: x == x0 && y == y0 && x0 > 0 && y0 > 0
  // purpose: return z == gcd(x0, y0)
  int fac = 1;
inv:
  if (x%2 == 0) goto a; else goto b;
a:
  if (y%2 == 0) { x /= 2; y /= 2; fac *= 2; goto inv;}
  else { x /= 2; goto d; }
b:
  if (y%2 == 0) { y /= 2; goto b; } else goto c;
c:
  if (x == y) { z = x*fac; return; }
  if (x < y) { y = (y-x)/2; goto b; }
  else { x = (x-y)/2; goto d; }
d:
  if (x%2 == 0) { x /= 2; goto d; } else goto c;
}
Y: Well, well. Quite a transformation! Are you sure that all these changes are necessitated by tail-recursion optimization?
X: Yes. It’s perfectly standard. If you are at all a programmer, you do it half asleep.
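For the record, here is the half-asleep transformation on a toy example of my own: when the recursive call is the last thing a function does, its stack frame can be reused, so the call becomes an assignment to the parameters followed by a jump.

```c
// Before: the recursive call is in tail position.
int sum_to_rec(int n, int acc)
{
    if (n == 0) return acc;
    return sum_to_rec(n - 1, acc + n); // tail call
}

// After tail-recursion optimization by hand:
// update the parameters, then jump back to the top.
int sum_to_iter(int n, int acc)
{
start:
    if (n == 0) return acc;
    acc = acc + n;
    n = n - 1;
    goto start;
}
```

sum_to_rec(n, 0) and sum_to_iter(n, 0) both compute 0 + 1 + … + n; only the second runs in constant stack space.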
Y: Did you know that since around 1980 Prolog compilers perform tail recursion optimization? So what shows as procedure calls in my translation of the snippets is lost in compilation.
X: No, I didn’t know that. Does this mean that your Prolog program runs as fast as my optimized C program?
Y: Hey, wait a moment! That result of your manual tail-recursion optimization is identical to Program 1!
X: Isn’t it to be expected that you return to a similar program by following the route that we did?
Y: Of course, but the code is identical. Even the minor inconsistency in formatting is reproduced. What’s going on?
X: No need to freak out. We act under the illusion of free will while in actual fact we may be puppets in somebody’s game. Any time now the puppet master will close down the show: last chance for a wrap-up.
What’s new to me is that one can start by writing Hoare triples for an as yet unknown algorithm, guided by the goal and the constraints in achieving it. In this case the goal was computing the gcd in logarithmic time under the constraint of not repeating a test for parity. It was clear how the set of triples needed to be expanded and it was clear when this set was complete. We could have skipped the detour via Prolog, gone directly to the C functions, and then performed tail-recursion optimization. The route taken made it clear how to verify the code, because we started with the assertions.
Y: For me this conversation has been useful in clarifying Prolog programming. For the programmer, logic has no advantage that is not available in C. The opposite is true for the language designer. Only by writing, with Kowalski’s input, a theorem-prover could Colmerauer and his team have produced the miracle that is the programming language Prolog.
Thanks to Paul McJones and Richard O’Keefe for their help.
[1] “The early years of logic programming” by R.A. Kowalski, Comm. ACM, January 1988.
[2] Logic for Problem-Solving by R.A. Kowalski. North Holland Elsevier, 1979.
[3] “Predicate logic as a programming language” by R.A. Kowalski. Information Processing 74, North-Holland, 1974, pp 569–574.
[4] “Algorithm = Logic + Control” by R.A. Kowalski. Comm. ACM 22:7(1979), pp 424–436.
[5] “Proof of algorithms by general snapshots” by P. Naur. BIT Numerical Mathematics, 6:4 (1966), pp 310–316.
[6] “Assigning meanings to programs” by R. Floyd. Proc. Symp. Appl. Math. vol. 19: “Mathematical Aspects of Computer Science” (1967), pp 19–32.
[7] “An axiomatic basis for computer programming” by C.A.R. Hoare. Comm. ACM 12:10(1969), pp 576–580.
[8] “Concern for correctness as a guiding principle for program composition” by E.W. Dijkstra. Pages 359–367, The Fourth Generation, Infotech, 1971. See also EWD 288.
[9] “Zur Deutung der Intuitionistischen Logik” by A. Kolmogorov. Mathematische Zeitschrift 35.1 (1932): 58-65.
[10] Elements of Programming by A. Stepanov and P. McJones. Addison-Wesley, 2009.
[11] A Discipline of Programming by E.W. Dijkstra. Prentice-Hall, 1976.
[12] A Method of Programming by E.W. Dijkstra and W.H.J. Feijen. Addison-Wesley, 1988.
[13] The C Programming Language by B.W. Kernighan and D.M. Ritchie. Prentice-Hall, 2nd edition 1988, p. 87.
[14] Strictly, C++ rather than C. Calling it C++ would wrongly call up associations with object-oriented programming. What we want here is C as-it-should-be, that is, with only call-by-reference added.
All branches of knowledge had developed vigorously in the first half of the 20th century. All of this development had been sustained by what I like to call a conversation: an open exchange of knowledge in books and journals. Before World War I this was also true for cryptology; afterwards, traffic on that channel fell silent. By the end of the 20th century the cryptology conversation was intense, wide-ranging, and immensely productive of innovations, of which bitcoin technology is but one example. In this post I trace the chain of events that led cryptology from its dark age, which lasted from 1918 to 1967, to its renaissance. My material is obtained, unless otherwise noted, from Crypto, a book by Steven Levy published in 2001 [2].
The first of these events is the effect of the 1960 defection of Martin and Mitchell on David Kahn, a journalist for Newsday. Although Kahn was, like many others, an avid cryptology hobbyist and although as a journalist he kept eyes and ears open for anything to do with his pet subject, the existence of the NSA, as revealed by the Martin-Mitchell defection, was a revelation to Kahn.
After writing a background article for the New York Times Book Review, Kahn received offers from publishers to write a book. MacMillan, the one selected by Kahn, sent the manuscript to the Department of Defense for review. In his exposé of the NSA, The Puzzle Palace, James Bamford wrote that “innumerable hours of meetings and discussions, involving the highest levels of the agency, including the director, were spent in attempts to sandbag the book”. The reaction of the Department of Defense was that “publication would not be in the national interest”. When MacMillan did not respond by undertaking to refrain from publication, the director of the NSA met with the chairman of MacMillan, the editor, and the legal counsel to make a personal appeal for three specific deletions. Kahn considered these surprisingly inconsequential, and agreed. In return, the book was allowed to include the statement that it had been reviewed by the Department of Defense.
Kahn’s The Codebreakers [1] never became a bestseller, but sales remained steady for a long time. Its importance is due to the second link in the chain of events recounted here: it was found by the one person who desperately needed it and who was destined to change the course of the history of cryptology. That person was Whitfield Diffie.
As a high-school student, Diffie had been fascinated by turning messages into cipher-encrypted mysteries. When he was an undergraduate at MIT, the aura of cryptology was eclipsed by the glamour of modern mathematics. When Diffie graduated in 1965, he found an effective way of evading the draft by taking a job at the Mitre Corporation, also in Cambridge, Massachusetts. His supervisor was a mathematician named Roland Silver. The work was for a project jointly undertaken with the MIT AI Lab, which became Diffie’s work location. This was the time when computer time-sharing systems were still experimental, but already in daily use. CTSS, one of these systems, required users to have passwords. Many users were opposed, with the result that the password file, in the care of the system administrator, kept being hacked. Another time-sharing system, ITS, the Incompatible Timesharing System (the name pokes fun at CTSS, the Compatible Time Sharing System), was confined to an inner circle of hackers and did not require passwords: every file was accessible to anyone.
Diffie was strongly in favour of privacy, but was not satisfied with CTSS, where he had to trust his password to the system administrator. This reminded him of his boyhood hobby, cryptography. But encryption only tells you how to protect your own files. If you want to share a file with someone else, you need to share the key, which could not be done securely in CTSS.
When Diffie discussed this problem with his boss, it transpired that a lot more was known about cryptology than was familiar from the hobbyist literature. Silver could infer this much, without being party to any indiscretions, from his contacts at NSA. Diffie was hit as if by a lightning bolt by the twin insights: cryptography is vital to privacy, clear from his experience with computer time-sharing at the AI lab; and crucial information is being withheld on purpose. In fact, this organization acted as if it were the sole proprietor of the relevant mathematical truths. Diffie was electrified by the challenge to rediscover enough of the mathematics to rescue the privacy of computer users, a category of people that, Diffie felt, would soon include many more than the researchers of the AI lab.
By 1969 Diffie was approaching draft cut-off age, so he no longer needed the shelter of a defense contractor like Mitre. Diffie found a job at John McCarthy’s AI lab at Stanford. It is hard to overrate McCarthy’s stature in computer science. As a fresh PhD in mathematics he had invented the concept of Artificial Intelligence. As a young faculty member at MIT he discovered/invented the unique programming language LISP and pioneered computer time sharing. At SAIL, the Stanford AI Lab, he presided over a wide range of eclectic, path-breaking projects. One of the new arrivals, Diffie, found himself in conversations with the boss in which they explored concepts beyond file encryption, such as key distribution and digital signatures.
Neither McCarthy nor Diffie knew enough about cryptology to gain any idea of how such concepts could be realized. By 1972 Diffie had read The Codebreakers [1]. With his girlfriend Mary Fischer he crisscrossed the country in search of people who knew something or could provide pointers. Kahn responded to his cold call with an invitation to visit and allowed him to copy some reports by William Friedman. A rare event occurred in 1974, when Bill Reeds conducted a seminar on cryptology at Harvard. Being back in Cambridge led to new contacts, such as Bill Mann, who was working on cryptography at BBN on a contract for the ARPAnet. Larry Roberts, the leader of this project at ARPA, had been rebuffed when he approached NSA for help with the necessary cryptography.
Inquiries led to Alan Tritter, a researcher at IBM knowledgeable about Identification Friend or Foe (IFF) devices. The way these devices used encryption nudged Diffie a bit closer to his later joint breakthrough with Hellman. Tritter pointed Diffie to his colleague at IBM, Horst Feistel, who had spent years of research on IFF while at Mitre. When Diffie arrived, it turned out that Feistel had left early to spend the weekend at Cape Cod. Next stop: Alan Konheim, the head of the mathematics group. Konheim knew a lot. For such people it is hard to know what they can say, so he said nothing. As a consolation prize Diffie got the suggestion to get in touch with one Martin Hellman, who had worked briefly in the IBM lab.
As it happened, Hellman was at Stanford. Everything fell into place: Hellman and Diffie got on like a house on fire, and Diffie and Fischer got to live in the house of McCarthy, who had left for a year’s sabbatical. The next year, 1975, Diffie and Hellman made their breakthrough: public-key cryptography was born.
When Kahn started his research in the New York Public Library in 1961, there was a lot to catch up on. Just at the time when publication failed to resume after World War I, a spate of inventions came to fruition. In 1919 Gilbert Vernam was granted a patent on an encrypting teletype, soon enhanced to the truly unbreakable one-time tape method. Independently, four inventors patented rotor machines: Arthur Scherbius (Germany) 1918, Hugo Koch (the Netherlands) 1919, Arvid Damm (Sweden) 1919, and Edward Hebern (US) 1921. Several of these names are associated with multiple patents; the simplest account is in Friedrich Bauer’s book [3]. Given these inventions, the combination of Vernam and rotors was but a small step.
With these breakthroughs the balance between code making and code breaking was gone. Nobody had any idea how to break messages encrypted by rotor machines. Moreover, these machines operated at greater speed and accuracy than the manual methods they replaced. Before World War II, research started on the analysis of rotor machines in Poland and in the US. The work of the Polish group escaped to England just before the German assault on Poland in September 1939 (look under “Rejewski” in [1]). The British started a massive code-breaking operation at that time. Primed by the Polish material and the efforts of top mathematicians such as A.M. Turing and I.J. Good, the British became, in deep secrecy, the most advanced in breaking traffic encrypted by rotor machines. Included in the Polish legacy was the use of “bombes”, mechanical devices for automatically trying out large numbers of hypothetical rotor settings.
Developments in the US between the wars were different, mainly due to one person, William Friedman. He was probably by far the most powerful cryptanalyst in the world. He worked for the military starting in the 1920s. In the 1930s he assembled a small group of well-trained people. By the time the war started in Europe this group was reading traffic encrypted with PURPLE, a rotor machine and the highest-grade cipher of the Japanese. The contrast with the British effort is stark: no help from the Poles, no mechanical aids, and only a small group of people.
The post-World War I developments were not secret in the sense of the UK’s Official Secrets Act, which was to keep the work at Bletchley Park hidden from view. In 1930 rotor machines were for sale by the owners of the Damm and Scherbius patents. These companies may have advertised the excellence of their methods, but not their substance; it was up to qualified organizations to get in touch, and it was they who would be briefed.
Vernam was granted a patent in 1919 for a “Secret Signalling System”. The idea is that one can modify a teletype to transmit the exclusive OR (XOR) of two tapes, one containing the message and the other containing the key. At the receiving end an identical key tape is mounted and combined by XOR with the received encrypted message, yielding the message in the clear. When the key tape is random and is used only once, the Vernam system is secure. The Vernam patent was public, as intended by the founding fathers. Yet its description in The Codebreakers contributed to making this book a dangerous one from the point of view of NSA.
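The XOR combination at the heart of Vernam’s scheme can be sketched in a few lines (a modern rendering, not the teletype mechanics of the patent; the names are mine):

```python
import secrets

# One step of Vernam's scheme: XOR a message stream with a key stream.
# Applying the same operation with the same key recovers the message,
# since (m XOR k) XOR k == m.
def vernam(stream: bytes, key: bytes) -> bytes:
    assert len(key) >= len(stream)
    return bytes(s ^ k for s, k in zip(stream, key))

message = b"ATTACK AT DAWN"
key = secrets.token_bytes(len(message))  # random key tape, used only once

ciphertext = vernam(message, key)
assert vernam(ciphertext, key) == message  # the receiver's identical key tape
```

The security hinges entirely on the key tape being random, at least as long as the message, and never reused; reusing a key tape breaks the system.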
It may well be that Kahn’s book contained the state of the art when its first edition was published. This is a remarkable feat for a book aimed at the general public. The next publication to help end the dark period of cryptology was also aimed at the general public: in May 1973 Scientific American published “Cryptography and Computer Privacy” by Horst Feistel [4]. This article described the first advance in cryptography since 1919: the block cipher. DES, the long-serving federal encryption standard, was a refined and scaled-up version of the device described by Feistel; the present standard, AES, is a block cipher in the same tradition, though different in internal structure.
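The construction Feistel described can be suggested by a toy example (the round function, subkeys, and block size here are illustrative, not Feistel’s or DES’s): the block is split into two halves, and each round mixes a function of one half into the other. Because each round undoes itself when re-applied, decryption is the same procedure with the subkeys in reverse order, and the round function need not be invertible.

```python
# A toy Feistel network on 32-bit blocks (round function and subkeys are
# illustrative, not DES). Each round mixes a function of one half into
# the other; decryption is the same procedure with subkeys reversed.
def F(half: int, subkey: int) -> int:
    # an arbitrary, non-invertible mixing function on 16-bit halves
    return ((half * 2654435761) ^ subkey) & 0xFFFF

def feistel(block: int, subkeys) -> int:
    left, right = block >> 16, block & 0xFFFF
    for k in subkeys:
        left, right = right, left ^ F(right, k)
    return (right << 16) | left  # final swap of the halves

subkeys = [0x1A2B, 0x3C4D, 0x5E6F, 0x7081]
c = feistel(0xDEADBEEF, subkeys)
assert feistel(c, list(reversed(subkeys))) == 0xDEADBEEF
```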
In 1934, when Horst Feistel was twenty years old, he immigrated to the US from Germany and started his studies at MIT. The 1941 declaration of war by Germany on the US turned Feistel into an enemy alien; he was placed under house arrest. This meant he could move around Boston, but needed permission to visit his mother in New York. On January 31, 1944 his fortunes changed abruptly: the restraints were lifted and he became a US citizen. The next day he was given a security clearance and began work at the Air Force Cambridge Research Center [5].
Feistel had been interested in cryptography since his teens and mentioned this shortly after arriving at his new job. After a few years he had built a cryptography research group at AFCRC. According to [5], “over a period of several years it made a major contribution to modern cryptography, developing the first practical block ciphers”. The authors of [5] believe that it was the NSA that succeeded in shutting down the cryptographic work at AFCRC. The same fate befell Feistel’s attempts to set up a cryptographic group at the MIT Lincoln Lab and at the Mitre Corporation, where Feistel moved next. Only when he was hired by IBM Research around 1970 could he pursue his lifelong interest, cryptography, without interference.
When Feistel’s article appeared in 1973, it was only the second publication on the subject, after Kahn’s book, since cryptology entered its dark age fifty years earlier. Soon after, something happened that put cryptography in the limelight: the 1975 promulgation by the National Bureau of Standards (NBS, now NIST, the National Institute of Standards and Technology) of DES, the proposed federal data encryption standard. It turned out that shortly before, NBS had published a competition for an encryption standard. Apparently in that short period entries had closed, had been evaluated, and DES, IBM’s entry, had been declared the winner.
This raised several questions. How could the entries have been solicited and evaluated in so short a period? Why was the key eight bits short of the 64 bits one would expect? And why was no rationale given for the wiring of the S-boxes?
Speculation was rife that the whole thing had been rigged between IBM, NBS, and NSA. Grist to the mill of an investigative journalist, who appeared in the form of Steven Levy, whose articles became his book Crypto, published in 2001 [2]. Some of the questions, though not all, were answered to my satisfaction by his findings.
First a bit of background. Around 1970 banks were increasingly in need of cryptography, what with interbank funds transfer and automated clearing by telex. They needed guidance, which, in the absence of public research in cryptology, could only be supplied by NSA. They needed standardization: banks did not want to have to rely on in-house research groups and were not interested in competing on security. Only NBS could provide a standard.
As it happened, it was not some Bankers’ Association that set the process in motion, but the company that supplied most of them with technology: IBM. And within IBM the impetus was a contract with Lloyds Bank of London to provide automatic teller machines [5, p. 66]. Strong encryption was essential, and IBM was on its own. The only expertise existed at NSA, which probably had plenty of strong encryption algorithms. But all this was classified, and so could not be put in the hands of uncleared users. NSA declined to design a new, unclassified algorithm, possibly concerned that such an algorithm would reveal their design philosophy.
A group at IBM in Kingston, New York, headed by Walter Tuchman, got the task of developing the algorithm. Tuchman learned about Lucifer, Feistel’s block cipher, on a visit to IBM Research in Yorktown Heights and decided to use it, but adapted to the constraints imposed by the need to implement the algorithm in a compact hardware unit. The resulting algorithm became known as DSD-1, which IBM decided to enter in the NBS competition for the federal encryption standard. No matter that the deadline had passed: a call from the right person to the head of NBS sufficed to get the call for entries in the competition re-issued and to get IBM’s DSD-1 accepted.
NBS passed DSD-1 on to NSA, which summoned Tuchman and presented him with a list of demands amounting to the creation of a virtual annex of NSA within IBM, to which all further work was to be confined. IBM had no choice in the matter if it were not to abandon the whole project: deployment of the technology would require export licenses. Ergo, by twisting IBM’s arm, NSA ensured that DES (as DSD-1 was renamed), the federal Data Encryption Standard, was finalised in a process under its control.
Let us summarize by reviewing the above questions: the haste is explained by the call for entries having been re-issued for IBM’s benefit, and the opaqueness by NSA’s control over the final stages of the process.
As of 1975 there were still only Kahn (1967) and Feistel (1973) as lone harbingers of the end of the dark age in cryptology. On 17 March 1975, the proposed DES was published in the Federal Register. Public comments were requested; plenty were received. A furore arose about the opaqueness of the process and the eight missing bits of the key. IBM’s failure to provide a rationale for the wiring of the S-boxes caused critics like Diffie and Hellman to let paranoid interpretations run wild. The Washington Post and the New York Times provided plenty of coverage.
1975 was also the year that Diffie and Hellman got their breakthrough in public-key cryptography, as I noted earlier. The concept was published the next year as “New Directions in Cryptography”, IEEE Transactions on Information Theory, November 1976. It was only the concept; no implementation was provided. Ronald Rivest, Adi Shamir, and Leonard Adleman published one in April 1977 as an MIT technical report. It was what has since become famous as RSA public-key cryptography.
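The RSA idea can be illustrated with toy numbers (a textbook-sized example; real keys use primes hundreds of digits long): publishing the pair (n, e) lets anyone encrypt, while decryption requires the privately held d.

```python
p, q = 61, 53                  # two secret primes (toy-sized)
n = p * q                      # 3233: the public modulus
phi = (p - 1) * (q - 1)        # 3120
e = 17                         # public exponent, coprime with phi
d = pow(e, -1, phi)            # private exponent: e*d = 1 (mod phi)

m = 65                         # a message, encoded as a number below n
c = pow(m, e, n)               # anyone can encrypt with (n, e)
assert pow(c, d, n) == m       # only the holder of d can decrypt
```

The security rests on the difficulty of recovering p and q, and hence d, from n alone.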
The authors took the unusual step of sending a copy of the report to Martin Gardner, who ran the “Mathematical Games” column in Scientific American. Gardner was in the habit of ending his column with some homework for his readers, with feedback in the next issue for selected successful solutions. For the RSA column, which appeared in the August 1977 issue, the puzzle was to solve a brief message encrypted with RSA. Because this time not all of the needed details were in the column, readers were invited to send a stamped, self-addressed envelope to MIT to receive a copy of the report.
Thousands of such requests arrived from all over the world. Before R, S, or A could organize an envelope-stuffing party, things started happening. The program for the IEEE International Symposium on Information Theory at Cornell University, scheduled for October, featured a presentation of the RSA work. IEEE received a letter from one Joseph A. Meyer, not identified by any affiliation, but with a home address and a member number, expressing concern about some of the presentations announced. This was the first time that academics heard of ITAR, the US International Traffic in Arms Regulations, and of the fact that cryptographic devices were classified as munitions. Not only the devices themselves were deemed munitions, but also information facilitating them. And presenting such information in the US with non-US nationals present amounted to export. Violations of ITAR could result in fines, arrests, or jail terms.
Thanking Mr Meyer for the timely warning, IEEE took the position that, as long as they notified the presenters, this was not their problem. The notifications went out. In addition to pondering whether it was prudent to present new work in cryptography with non-US nationals in the room, MIT was presented with the fait accompli of a non-US national, in the form of Adi Shamir, not only having been in the room, but being one of the creators of the new work. And what was to be done about the envelope-stuffing party? Include relevant sections of ITAR? The 35-cent stamps provided by Scientific American readers were not going to be enough.
The administrations of MIT and Stanford decided to stick their necks out and assured the scheduled speakers that they would provide legal defense if needed. In their turn, the speakers stuck their necks out and decided to ignore the Meyer letter. The Cornell meeting went ahead as scheduled in October. In December of 1977 the envelope-stuffing party took place, with pizza (as reported in [2]) and beer (as imagined by me). None of the readers solved the message. By the time it was solved, decades later, the column was no more, and none of R, S, and A could remember what the message was.
The new flood of publications in cryptology had started and has continued unabated to the present day. What has also continued, at least for the period covered in Levy’s book, was harassment. This took several forms: secrecy orders on patent applications, threats of prosecution under export regulations, and pressure on the funding of academic research.
In all these cases the government backed down, but only after a vigorous campaign by the victims, which involved paying for lawyers, engaging the media, and writing letters to representatives in Congress.
Those who continue research in the field profit from these successful counter actions.
[1] The Codebreakers by David Kahn. MacMillan, New York, 1967; revised edition, Scribner, New York, 1996.
[2] Crypto by Steven Levy. Viking Penguin, 2001.
[3] Decrypted Secrets by F.L. Bauer. Springer-Verlag, Berlin-Heidelberg, 1997.
[4] “Cryptography and computer privacy” by Horst Feistel. Scientific American 228 (1973): 15-23.
[5] Privacy on the Line: the Politics of Wiretapping and Encryption by Whitfield Diffie and Susan Landau. MIT Press 1998; second edition 2007.
~
And then the 60s started with an absolute miracle, viz. ALGOL 60. This was a miracle because on the one hand this programming language had been designed by a committee, while on the other hand its qualities were so outstanding that in retrospect it has been characterized as “a major improvement on most of its successors” (C.A.R. Hoare).
…
Several friends of mine, when asked to suggest a date of birth for Computing Science, came up with January 1960, precisely because it was ALGOL 60 that showed the first ways in which automatic computing could and should and did become a topic of academic concern. [1]
Algol was a miracle as a language. It was short-lived, but it left a momentous legacy that acted in two ways: in the way the Revised Report on Algol 60 describes the language and in the way subsequent language designers were influenced by being shown what a programming language could be. In celebration of Algol 60 I refer to these designers as “Children of the Miracle”.
The first Children of the Miracle were the members of the Simula team. Although that language quickly followed Algol 60 into oblivion, its distinguishing feature, classes, survived as object-oriented programming in the hands of Bjarne Stroustrup [2] in his C++ language. Although C++ is no longer the most widely used object-oriented language, it is very much alive.
The Simula team was exposed to Algol 60 pretty much in one place and at the same time. It is remarkable that three of the main contributors to Prolog implementation also had Algol as their formative experience, but independently, scattered in space and time. In the remainder of this article I give an account of how they were influenced by Algol.
In the case of Alain Colmerauer I rely on [3, 4, 5, 6]. In 1963 Colmerauer, as a new graduate student, joined a group at the University of Grenoble whose task was to build an Algol-60 compiler for the IBM 7044. Among the various available parsing techniques, the group was attracted to the method of Edgar Irons. Compared to the other recursive approach, that of Brooker and Morris, the one of Irons was more generally applicable, but required non-deterministic choice among production rules. This was Colmerauer’s first brush with non-determinism.
After Colmerauer completed his PhD work, in parsing, he remained in Grenoble to work on two other topics. One of these was to implement an extension to Algol 60 according to a proposal by Robert Floyd for a general mechanism that is not restricted to such non-determinism as may arise in parsing and that can be added to any imperative language.
The other topic was the two-level grammars of Adriaan van Wijngaarden. Colmerauer wrote both a parser and a generator for such grammars. Van Wijngaarden thought that going beyond context-free grammars was important for a better definition of programming languages. Accordingly, these grammars were used in the definition of Algol 68. If context-free grammars are inadequate for the definition of programming languages, then they are all the more obviously inadequate for natural-language processing.
At the University of Montreal, where he had moved in 1967, Colmerauer implemented “Q systems”, a generalization of context-free grammars that has similarities to two-level grammars as well as to the type-0 grammars of the Chomsky hierarchy. Q systems can be used both in parsing and in generating mode. This property makes them attractive for natural-language translation: parse the source-language text to capture a semantic structure, then generate target-language text on the basis of that structure.
The dual-mode property of Q systems makes them also an attractive choice for question-answering systems in natural language: parse the assertions to capture their information content, update the semantic structure, parse the question to retrieve information from the semantic structure, and generate answers on the basis of that information. Such a use suggests a specific kind of semantic structure, namely a sufficiently expressive logic. That is, view a question-answering system as a front-end to automatic theorem-proving.
When Colmerauer left Montreal in 1970 to take up a faculty position at the University of Aix-Marseille, he assembled a small group to develop a question-answering system in French. For the theorem-prover they selected J.A. Robinson’s resolution logic, possibly inspired by Cordell Green’s question-answering system QA3 [7], where Lisp is the language for the assertions and queries.
The most promising work in resolution theorem-proving was seen to be happening at the University of Edinburgh. Colmerauer invited Robert Kowalski for a brief visit in 1971, followed by a longer visit in 1972. The expectations of the Marseille group were met by what they learned about SL resolution, a technique recently developed in Edinburgh. Beyond these expectations there was a surprise: Boyer, Kowalski, and Moore had noticed that positive Horn clauses play a pivotal role in resolution logic: in their presence SL resolution refrains from misbehaviour, and these clauses can be read as context-free production rules. This suggested that the rules be expressed as positive Horn clauses, with SL resolution acting in parsing or in generating mode, as required by the question-answering system.
When the Marseille group learned these results, the relevance to their project was apparent: the Q systems, which were modeled on type-0 grammars, could be replaced by an SL-resolution theorem-prover based on productions in the form of positive Horn clauses. In spite of a resemblance to type-2 grammars, these logic grammars are more expressive because of the availability of parameters, reminiscent of the rules of the two-level grammars of Algol 68.
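The reading of grammar rules as clauses can be suggested in miniature. A Prolog logic grammar would thread the input through clause parameters; the sketch below renders the same backtracking search in Python, with integer positions instead of difference lists and an illustrative two-rule grammar.

```python
# Reading the grammar  s -> "a" s "b" | (empty)  as clauses: each
# alternative for the nonterminal s is a rule that tries to consume a
# segment of the input, yielding every position where it can stop.
# (Prolog logic grammars thread the positions as clause parameters;
# this Python rendering with generators is illustrative.)
def s(words, i):
    # first alternative: "a" s "b"
    if i < len(words) and words[i] == "a":
        for j in s(words, i + 1):
            if j < len(words) and words[j] == "b":
                yield j + 1
    # second alternative: the empty production
    yield i

def parses(sentence):
    # the sentence is accepted if some derivation consumes all of it
    return any(j == len(sentence) for j in s(sentence, 0))

assert parses(list("aabb")) and not parses(list("aab"))
```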
Thus the Kowalski visits resulted in a drastic re-design of the Marseille question-answering system. Instead of Q systems for parsing assertions and queries and for generating answers, with logic restricted to the semantic structures, it became logic for everything: an SL theorem-prover specialized for positive Horn clauses could parse, generate, and make inferences. This theorem-prover was only a step away from a general-purpose programming language.
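How a theorem-prover for positive Horn clauses begins to behave like a programming language can be suggested with a minimal backward-chaining interpreter, here restricted to the propositional case so that no unification is needed (the rule base and all names are illustrative):

```python
# A minimal backward-chaining prover for propositional Horn clauses
# (no variables, hence no unification; the rule base is illustrative).
# Each entry maps a conclusion to alternative lists of conditions
# under which it holds; a fact is a rule with no conditions.
rules = {
    "answer":   [["parsed", "inferred"]],
    "parsed":   [["tokens"]],
    "inferred": [["tokens"]],
    "tokens":   [[]],                      # a fact
}

def solve(goal: str) -> bool:
    # try each clause whose conclusion matches the goal, much as
    # a call tries the definitions of a procedure
    return any(all(solve(c) for c in conditions)
               for conditions in rules.get(goal, []))

assert solve("answer")
```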
Colmerauer got a grant for a year with the goal to produce a “man-machine communication system in French”. The strategy: first create the programming language, then write the required system. The group adopted “Prolog” as the name of the programming language. The most incisive account of the logic kernel of Prolog is Kowalski’s [8].
In 1975 the action moved to Budapest, Hungary, where Péter Szeredi completed his first Prolog implementation. As a student in mathematics, Szeredi had been programming since 1966. He started in Autocode on the Elliott 803 and used it to write assemblers for this machine. He next became involved with Algol 68 and translated parts of the report into Hungarian. Szeredi is credited with discovering an error in the type system, which was later corrected by introducing “incestuous unions”.
As a result of his involvement with Algol 68, Szeredi became acquainted with the Compiler Definition Language (CDL), developed by Cornelis Koster, one of the authors of the Algol 68 report. CDL is closely related, via affix grammars, to the W-grammars in which Algol 68 is defined. As others did, Szeredi found CDL a congenial medium for software development and he used it for systems programs for a new Hungarian computer.
In 1975 the Fortran source code of the second Marseille Prolog implementation reached Hungary together with a few transparencies by David H.D. Warren explaining the main ideas of this interpreter. By the time another group in Budapest had overcome its problems in porting the Fortran code to the locally available machine, Szeredi had completed a new Prolog implementation written in CDL. He credits the similarity of CDL with Prolog for this quick success. See [9] for more information about Szeredi’s work in connection with Prolog.
As Algol plays such an important part in the causal chain connecting Szeredi’s early programming experience with his Prolog implementations, I count him among the Children of the Miracle.
Starting in 1976 Keith Clark took the lead in implementing a sequence of Prolog-like languages. These languages exploited the non-sequential semantics of Horn clauses, using control structures such as coroutines and guards, or parallelism. Let us look at Clark’s pre-Prolog computing experience.
After completing his undergraduate work in logic and philosophy, Clark continued his academic study by taking the Computer Science Conversion diploma course at Imperial College, London. There were no examinations; one graduated on the basis of a dissertation. At the start of the work he was handed a listing of an implementation of Euler on the IBM 7094, written in Algol 60. “Do something with this.”
Clark extended the language with list-processing capabilities and liberalised the syntax to allow var declarations to appear anywhere in procedure bodies. In addition to introducing new primitives, he had to extend the BNF grammar of Euler (which associated abstract-machine code generation with most of the grammar rules), add extra abstract-machine instructions, and change the definition of the abstract interpreter for code sequences in reverse Polish. He ended up producing a usable implementation of the extended Euler. In this way Clark was exposed to a double dose of Algol 60: by understanding the language well enough to modify the implementation, and by immersing himself in Euler, a language inspired by Algol 60.
The miracle that was Algol 60 exerted its influence through the language itself as well as via derivatives such as Simula, Euler, and Algol 68. For the Prolog pioneers these languages were the formative experience.
Thanks to Keith Clark, Paul McJones, and Péter Szeredi for help with this article.
[1] Edsger W. Dijkstra: “Computing Science: Achievements and Challenges” (EWD1284) http://tinyurl.com/znbzyd7
[2] Stroustrup, Bjarne. The design and evolution of C++. Addison Wesley, 1994.
[3] Cohen, Jacques. “A view of the origins and development of Prolog.” Communications of the ACM 31.1 (1988): 26-36.
[4] Cohen, Jacques. “A tribute to Alain Colmerauer.” Theory and Practice of Logic Programming 1.06 (2001): 637-646.
[5] Colmerauer, Alain, and Philippe Roussel. “The birth of Prolog.” History of programming languages—II. ACM, 1996.
[6] Kowalski, Robert A. “The early years of logic programming.” Communications of the ACM 31.1 (1988): 38-43.
[7] Green, Cordell. “Theorem proving by resolution as a basis for question-answering systems.” Machine intelligence 4 (1969): 183-205.
[8] Kowalski, R. “Predicate logic as a programming language”. Proc. of IFIP Congress ’74, pp. 569-574, North Holland.
[9] P. Szeredi. “The Early Days of Prolog in Hungary”. ALP Newsletter, Vol. 17, No. 4, November 2004.
PS
Alain Colmerauer, January 24, 1941 – May 12, 2017.
We mourn a great scientist and a dear friend.
What is judged to be essence is in the eye of the beholder. I am more interested in what the members of the Algol family have in common and perhaps even with languages not usually considered as members. For example, Prolog. Soon after completion of this language in Marseille, Robert Kowalski wrote a paper (published as [6]) that established a procedural interpretation of a form of first-order predicate logic. Let us examine this interpretation to see whether this gives a hint concerning the essence of Algol as a procedural language.
Kowalski considered logic in the form of clauses, with resolution as inference rule. A clause is a set of atomic formulas, each possibly negated. Here is an example of a clause with three atomic formulas, two of which are negated:
{sibling(X,Y), ~parent(Z,X), ~parent(Z,Y)}
A Horn clause is a clause with at most one unnegated formula. It is a positive Horn clause if there is such an unnegated formula, as is the case here. These are of special interest because they can be read as a rule with a conclusion followed by zero or more conditions. The above example can be read as
sibling(X,Y) if there exists Z such that
parent(Z,X) & parent(Z,Y)
which actually makes sense if one accepts half brothers and half sisters as siblings. Note the “there exists” qualifying the variable Z that occurs in a condition, and not in the conclusion.
Another example, more typical of programming, is the Horn clause
sort(X,Y) if there exist X1, X2, Y1, Y2 such that
split(X, X1, X2) &
sort(X1, Y1) & sort(X2, Y2) &
merge(Y1, Y2, Y)
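Under the procedural reading discussed below, this clause is mergesort: the conclusion is the procedure heading and each condition a call. A deterministic Python transcription (the split chosen here, odd and even positions, is just one of many that satisfy the clause):

```python
# The sort clause transcribed into Python: the conclusion sort(X, Y)
# becomes the procedure heading, each condition becomes a call, and the
# existential variables X1, X2, Y1, Y2 become intermediate results.
def split(x):
    return x[0::2], x[1::2]               # split(X, X1, X2)

def merge(y1, y2):                        # merge(Y1, Y2, Y)
    out = []
    while y1 and y2:
        out.append(y1.pop(0) if y1[0] <= y2[0] else y2.pop(0))
    return out + y1 + y2

def sort(x):                              # sort(X, Y)
    if len(x) <= 1:                       # base case for short lists
        return list(x)
    x1, x2 = split(x)
    return merge(sort(x1), sort(x2))

assert sort([3, 1, 4, 1, 5, 9, 2, 6]) == [1, 1, 2, 3, 4, 5, 6, 9]
```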
In his paper [6] on the procedural interpretation of Horn clauses, Kowalski pointed to the following similarities:
- The conclusion of a positive Horn clause acts as a procedure heading: the predicate symbol is the procedure name, its arguments the formal parameters.
- The conditions act as the procedure body: a set of procedure calls.
- Unification acts as the parameter-passing mechanism.
Kowalski claimed that these similarities showed not only that logic can be a programming language, but even a procedural one. At this point in the argument a typical member of Kowalski’s IFIP 74 audience must have dismissed the claim as preposterous: surely something more than definitions of and calls to procedures is needed to make a procedural programming language. What about branching? What about iteration? According to Kowalski a program would never do anything but call procedures. Of course procedures were widely acknowledged to help in organizing one’s code. But, from the conventional point of view, a procedure is overhead in the sense that it postpones the computation that is the purpose of the code. A logic program would forever be calling procedures; would anything ever happen?
The punch line of Kowalski’s paper [6] is the observation that a language such as the one he described actually existed [1]. By the time the paper was published in 1974, this language had already been used for substantial applications.
That language is Prolog.
Kowalski’s characterization can be regarded as the view that logic is the essence of at least one procedural language. If logic is the essence of one procedural language, could it be that it is the essence of another one, for example Algol? To see if this is the case let us see what happens if one removes inessential features from Algol. In choosing what to eliminate I will be guided by what would make Algol more like Prolog.
In spite of its succinct definition in 34 pages [6a], Algol 60 is a deep and complex affair [7,8]. This is mainly because of its call-by-name parameter mechanism. Here I make a drastic simplification: only allow actual parameters that are primitive data: Booleans, integers, and reals. In this way labels, procedure names, and arrays are excluded. What remains I call “first-order Algol”. Furthermore, to keep things simple, I assume that procedures are closed, that is, that names in procedure bodies name either local variables or formal parameters.
To get at the essence, let us consider what remains of first-order Algol after we omit those of its features that are replaceable.
Is there anything left? Yes, declarations of and calls to procedures. Aren’t they dispensable as well? Of course they are. This is proved by the first Fortran and by other pre-Algol languages. But here I am concerned with procedural languages.
My next step is to replace “if…fi” with the help of Floyd’s proposal [5] for introducing nondeterminism into imperative languages. The following quote summarizes the proposal.
Nondeterministic algorithms resemble conventional programs as represented by flowcharts, programming languages, machine-language programs, etc., except that
- One may use a multiple-valued function, choice(X), whose values are the positive integers less than or equal to X. (Other multiple-valued functions may be used, but this one is adequate.)
- All points of termination are labeled as successes or failures.
Floyd intended his proposal to apply to flowcharts as well as to programs written in machine language or in a high-level language. As we want to do as much as possible with procedures, Floyd’s “points of termination” become returns from procedures. There must be two kinds of return: successful and failed. We identify the conventional return with success. We add a statement of the form
if <condition> FAIL
to return with failure. When a procedure p calls a procedure q that returns with failure, then p itself returns with failure.
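One way to sketch this failure semantics in Python is to model a failed return as an exception, so that failure in a callee automatically propagates to the caller (the names `Fail`, `FAIL_if`, `p`, and `q` are illustrative, not from the text):

```python
class Fail(Exception):
    """A failed return; it propagates to the calling procedure."""

def FAIL_if(condition):
    # The statement 'if <condition> FAIL' as a helper function.
    if condition:
        raise Fail()

def q(n):
    FAIL_if(n < 0)    # q returns with failure on negative input...
    return n * n

def p(n):
    return q(n) + 1   # ...and a Fail raised in q propagates out of p.
```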
After this detour into nondeterministic primitives, we return to what to do with the branching of Algol. We already proposed to replace it by the if…fi of Dijkstra’s guarded-command language. In turn we propose to replace this in a way that we illustrate with the example:
if G1 -> B1 | G2 -> B2 fi
where G1 and G2, the guards, are Boolean expressions and B1 and B2 are statements, possibly compound. To eliminate if…fi, we replace this example of the alternative construct by the procedure call p(…) and add the declarations for the new procedure name p:
p(…) { if (not G1) FAIL; B1 }
p(…) { if (not G2) FAIL; B2 }
Note that these are two declarations with identical procedure headings. The effect of a call to procedure p is the usual one, except that it is left undetermined which of multiple matching declarations responds to the call. In this way we replace Floyd’s nondeterministic operator choice(X) by making the procedure call itself the nondeterministic operator.
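A Python sketch of this construction, in which the two declarations of p are kept as separate clause functions and the call tries the matching declarations (here simply in textual order, one legitimate way of resolving the nondeterminism; the guards and bodies are invented for the illustration):

```python
class Fail(Exception):
    """Signals a failed return from a clause."""

def clause1(x):
    if not (x >= 0):          # guard G1
        raise Fail()
    return ("nonneg", x)      # body B1

def clause2(x):
    if not (x <= 0):          # guard G2
        raise Fail()
    return ("nonpos", x)      # body B2

def p(x):
    # Which of the matching declarations responds is left open;
    # this implementation simply tries them in textual order.
    for clause in (clause1, clause2):
        try:
            return clause(x)
        except Fail:
            continue
    raise Fail()              # the call itself fails if no clause succeeds
```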
With these changes in place, first-order Algol has reached the degree of similarity to Prolog that I aimed at. The above declarations in this version of Algol correspond to the Prolog declarations
p(…) :- G1, B1.
p(…) :- G2, B2.
where the arguments in each heading are pairwise different variables.
The semantics of a deterministic procedure is a function from states to states. The semantics of a nondeterministic procedure is a multivalued function from states to states; that is, it is a relation on states. With the assumption that procedures be closed, the behaviour of a procedure p is characterized by the values of the actual parameters after completion of a successful call. In first-order Algol these values are restricted to be Booleans, integers, or reals. Each possible terminating successful call to procedure p contributes one n-tuple (or, because of the possibility of nondeterminism, more n-tuples) to the relation that is the proposed meaning of p. Thus the special case of Algol 60 represented by first-order Algol was already in 1960 a method of defining n-ary relations by mutual recursion. I’m not sure how one would go about this task when given only conventional mathematical notation and the axioms for set theory.
It may be that Algol 60 represented an innovation in this respect. What, apart from conventional mathematical notation, do we have in addition to Algol as formalism for talking about n-ary relations? There is a simple and widely known such formalism, first-order predicate logic: the semantics of a predicate with n free variables is an n-ary relation over the domain of discourse. This logic therefore seems the natural home for the definition of n-ary relations. Among other things, Kowalski’s paper [6] describes the formal use of first-order predicate logic for defining n-ary relations, possibly recursive, possibly mutually so.
Thus we have at least two formalisms for the definition of relations, Algol 60 and logic, in chronological order. Of course there may be more. But why would one need any formalism? Can’t we just use the informal mathematics in which the Zermelo-Fraenkel axioms for set theory are expressed? Relations are defined as subsets of Cartesian products. To justify the existence of such subsets one appeals to the Axiom of Separation. This axiom uses a formula of first-order predicate logic. Thus it appears that for the definition of relations one can’t get around logic. Apparently Kowalski did not add logic as just another formalism in which relations can be defined: he was down to bedrock.
Yet Kowalski only defined relations over Herbrand universes. His predecessor in the definition of relations, Algol 60, defined relations over Booleans, integers, and reals. That stimulates my interest in defining relations over arbitrary data types. This has resulted in [9], a paper that relies on logic more than turned out to be necessary. Subsequent work gives informal set theory an important role, but formal logic still plays a part.
We started out looking for an Essence of Algol in an inclusive sense, looking for a characterization of Algol that was shared as much as possible by subsequent procedural languages. Prolog is a suitable sample for comparison because it is, as characterized by Kowalski, the most extreme procedural language. I argued that that which unites all procedural languages is that they define n-ary relations in definitions that may be recursive and possibly mutually so. Therefore the essence of Algol is shared with other procedural languages and consists in their being formalisms for the definition of relations. Thus relations play the role in procedural languages that is played by functions in functional programming languages. Procedural languages are relational programming languages.
In this sweeping conclusion I left some loose ends. For example, I considered using Algol without branching and iteration, but what about assignment? I left that in, while Prolog does not have assignment. There is no prospect of replacing assignment by procedure call in Algol: a call only postpones computation, and computation ultimately happens by evaluating an expression and assigning its result to a variable. All I can do about assignment is to push it out of sight. That is, to ban it from my backyard by introducing a procedure where the assignments happen: one can’t tell from a procedure call whether there is an assignment statement in its body.
Relegating assignment statements to somewhere they can’t be seen suggests a style of writing programs that I call “structured procedural programming”. This term is intended to recall the “structured programming” that became de rigueur in the 1970s and that is still the norm today. Structured programming was a response to the sudden and acutely perceived need to eliminate the goto statement. One would expect the procedure call to be the prime candidate for its replacement. Oddly, the replacements were branching and iteration.
Enter a complementary form of structured programming: it not only excludes jumps, but also branching and iteration. I would like to be able to say that this complementary form only uses declarations of and calls to procedures, but, as admitted earlier, I cannot eliminate assignment; I can only push it out of sight.
Structured (procedural) programming avoids jumps, branching, and iteration, as described above. It distinguishes itself by using assignment statements only as far as necessary, as glue code for procedure calls, so that assignments are confined as much as possible to purely imperative procedure declarations.
I am grateful to Paul McJones for his valuable suggestions.
[1] Alain Colmerauer: “Un système de communication homme-machine en Français” http://tinyurl.com/j4fknrf consulted October 14, 2016.
[2] E.W. Dijkstra: “Go to statement considered harmful.” Communications of the ACM 11.3 (1968): 147-148.
[3] E.W. Dijkstra: “Guarded commands, nondeterminacy and formal derivation of programs.” Communications of the ACM 18.8 (1975): 453-457.
[4] E.W. Dijkstra: A discipline of programming. Prentice-Hall, 1976.
[5] Robert W. Floyd: “Nondeterministic algorithms.” Journal of the ACM (JACM) 14.4 (1967): 636-644.
[6] R.A. Kowalski: “Predicate logic as a programming language”. Proceedings of IFIP 1974, North-Holland, pp. 569-574.
[6a] Peter Naur (editor): “Revised report on the algorithmic language Algol 60.” Numerische Mathematik 4.1 (1962): 420-453.
[7] John C. Reynolds: “The Essence of Algol”. In Jaco W. de Bakker and J. C. van Vliet, editors, Algorithmic Languages, pp. 345-372. North-Holland, Amsterdam, 1981.
[8] Gauthier van den Hove: “Dissolving a Half Century Old Problem about the Implementation of Procedures”. To appear in Science of Computer Programming. http://www.fibonacci.org/GHE8.pdf
[9] M.H. van Emden: “Logic programming beyond Prolog”. https://arxiv.org/pdf/1412.3480.pdf
[10] M.H. van Emden: “Kinds of programming languages”. http://tinyurl.com/j99lqlz
I doubt whether Jacquard, with his punched cards for controlling looms, thought of card punchings as utterances in a language. I suspect that it is only with the benefit of hindsight that we recognize them as predecessors of the twentieth-century programming languages. Yet, at some point in time, in some place, there must have been a person to whom it first occurred that the programming tool under consideration was a language. I suspect that this occurred soon after the first stored-program electronic computer was running in 1949. At that time there wasn’t anything suggestive of language: there were lists of instructions in the form of sequences of octal digits to be entered via switches in successive memory locations. A sensible user must have first prepared a sheet of paper with a column of fixed-length octal numerals, one for each instruction.
In the beginning of this period, these numerals may not have been thought of as “words”, nor may sequences of such items have been considered utterances in a language. For example, Konrad Zuse referred to such documents as computation plans expressed in a calculus (hence “Plankalkül”, 1944). A paper by Knuth and Pardo [8] helps pin down the moment when programming aids were first regarded as languages. They quote a 1951 report by Arthur Burks whose title mentions a “program language” [3]. This makes Burks my best bet for having been the first to make the conceptual leap that connects these programming artifacts to the world of language.
Within a quarter century there existed a motley collection of programming languages [12]. One way to get some insight into the collection is to classify, as Sammet does with her 24 categories. In this article I propose several categorizations, each characterized by a question: Is the language designed by a committee?, Is it invented or discovered?, Is it conceived as a single-application language?, and, finally, Does it have an essence?.
This is the criterion used by Frederick Brooks [2] in distinguishing software products more generally. He published two lists, one of products that “excited passionate fans”, the other of unexciting but useful items. In the former category he puts Unix, APL, Pascal, Modula, Smalltalk, and Fortran; in the latter he places Cobol, PL/I, Algol, MVS/370, and MSDOS. Brooks says that the difference is explained by whether the language or operating system was designed by a committee.
I think Brooks was onto something here, though we should add two items. If ever a language excited a devoted fandom, it is Lisp. If ever a language was further away from being committee-designed, it is Lisp: it was basically a one-man show, if one is willing to overlook crucial contributions of Herbert Gelernter and Steven Russell. Close to the one-man show, and still far away from the committee, are the small teams with a dominant designer: Smalltalk (Alan Kay) and Prolog (Alain Colmerauer). I want to add Prolog to Brooks’s list because it is one of those languages people can fall in love with.
Brooks puts “Algol” (presumably Algol 60) in the category of committee-designed. Factually this is correct, but as a classification it is misleading, because Algol 60 is not typical of committee designs. In fact, it is a miracle, a term I borrow from the following quote.
And then the 60s started with an absolute miracle, viz. ALGOL 60. This was a miracle because on the one hand this programming language had been designed by a committee, while on the other hand its qualities were so outstanding that in retrospect it has been characterized as “a major improvement on most of its successors” (C.A.R. Hoare).
…
Several friends of mine, when asked to suggest a date of birth for Computing Science, came up with January 1960, precisely because it was ALGOL 60 that showed the first ways in which automatic computing could and should and did become a topic of academic concern. [5]
The poet William Butler Yeats is said to have remarked that prose is endlessly revisable, while a poem snaps shut, like a box. Lisp and Prolog give me this feeling. These languages seem to have been discovered rather than invented. Algol 60 and Smalltalk are ingenious inventions rather than discoveries. When Paul Graham [6] distinguishes C and Lisp as high points surrounded by the “swampy ground” of other programming languages, I guess he had this aspect in mind. Of course he did not mean Common Lisp (committee-designed), but the interpreter of McCarthy’s 1960 paper [10], possibly cleaned up as in Scheme.
Perhaps I had better start by explaining how to tell routine programming projects from the other ones: if it’s best to use an existing language, then the project is routine. Consider some of the other projects.
The advice taker never saw the light of day. It is not clear whether Dynabook ever did. All four languages escaped from their formative projects. They showed that when a man is confronted with an ambitious project, “it concentrates his mind wonderfully”.
What is judged to be essence is in the eye of the beholder. I am more interested in what the members of the Algol family have in common and perhaps even with languages not usually considered as members. This could be a topic for a future essay.
Thanks to Paul McJones for providing valuable information.
[1] Daniel G. Bobrow: “If Prolog is the Answer, What is the Question?” IEEE Transactions on Software Engineering, vol. SE-11, no. 1, November 1985.
[2] Frederick P. Brooks, Jr.: “No silver bullet”. Computer vol. 20, no. 4, April 1987, pp. 10–19.
[3] Arthur W. Burks: “An intermediate program language as an aid in program synthesis”, Engineering Research Institute, Report for Burroughs Adding Machine Company (Ann Arbor, Michigan: University of Michigan, 1951), ii + 15 pp.
[4] Alain Colmerauer: “Un système de communication homme-machine en Français” http://tinyurl.com/j4fknrf consulted October 14, 2016.
[5] Edsger W. Dijkstra: “Computing Science: Achievements and Challenges” (EWD1284) http://tinyurl.com/znbzyd7
[6] Paul Graham: “The Roots of Lisp”. http://www.paulgraham.com/rootsoflisp.html
[7] “Online Historical Encyclopaedia of Programming Languages”. hopl.info consulted October 14, 2016.
[8] Donald E. Knuth and Luis Trabb Pardo: “The early development of programming languages.” STAN-CS-76-562, August 1976. A history of computing in the twentieth century (1980): 197-273. The Stanford report is available as http://tinyurl.com/hsuwwrl (consulted October 16, 2016). It reports that the paper was commissioned by the Encyclopedia of Computer Science and Technology, Jack Belzer and Allen Kent (eds.)
[9] John McCarthy: “Programs with common sense”. In Mechanization of Thought Processes vol. I. Her Majesty’s Stationery Office, London 1959.
[10] John McCarthy: “Recursive functions of symbolic expressions and their computation by machine, Part I.” Communications of the ACM 3.4 (1960): 184-195.
[11] John C. Reynolds: “The Essence of Algol”. In Jaco W. de Bakker and J. C. van Vliet, editors, Algorithmic Languages, pp. 345-372. North-Holland, Amsterdam, 1981.
[12] Jean Sammet: “Roster of programming languages for 1976-77” ACM SIGPLAN Notices, 11/1978, Volume 13, Issue 11.
A breakthrough in mathematical logic comes with a venerable pedigree. I remember reading about some pundit in the early 19th century reviewing the status of Aristotle’s contributions. Until the 16th century the teachings of Aristotle reigned supreme. Then the Copernican revolution demolished Aristotle’s cosmology. Galileo’s experiments and philosophy demolished Aristotle’s physics. The chemists had debunked Earth-Water-Air-Fire. Reviewing the wreckage, the pundit wondered whether anything survived. The answer was yes: Aristotle’s logic had remained unassailed. Moreover there was no prospect of the new scientific method adding anything to it. That was not surprising, the pundit concluded, because logic concerns nothing less than the laws of thought, and we moderns have no better access to these than Aristotle did.
It was ironic that around the same time Boole published The Laws of Thought [1]. This book not only added to Aristotle’s logic, but it also made further development less inconceivable. For example, although Aristotle and Boole made it possible to reason about Greekness and mortality, it was not clear how formal logic could help with statements such as “for every number there is one that is greater.” For an account of this next step I will follow Robinson’s own account [2] of the history of the field to which he made his contribution.
In 1879 Gottlob Frege published a booklet containing an exposition [3] of “Begriffsschrift”, a German neologism that translates to “concept writing”. The Begriffsschrift was not only expressive enough for “for every number there exists one that is greater”, but had as its goal nothing less than to analyze completely the formal structure of pure thought and to represent this analysis in a systematic and mathematically precise way. In acknowledging all these achievements we need to see through Frege’s idiosyncratic presentation of formulas, which he defended by the observation that the comfort of the typesetter is not the summum bonum. Whitehead and Russell steered the notation back to an algebraic style, as Boole had first done.
Frege’s work opened two lines of research: semantics and proof theory. Semantics inquired into the nature of the concepts themselves, asking questions like “what is a number?” or “what is infinity?”, much debated around the turn of the century. This line led to Whitehead and Russell’s Principia Mathematica and to Zermelo’s axiomatic set theory. The proof-theoretical effects of Frege’s work led to the formalization of processes of deduction and, eventually, to what we now call “algorithms”.
As far as Robinson’s work is concerned we can restrict our attention to proof theory. In this line of research not much happened between 1879 and the work of Löwenheim in 1915, which began a fruitful period of exploration culminating in the fundamental theorem of the predicate calculus: the fact, which Frege took on faith, that his concept notation is a complete system, that it actually can do everything it was intended to do. The intention behind the predicate calculus was that it should provide a formal proof of every sentence in the language that is logically valid, and that this proof should be systematically constructible, given the sentence. This was proved independently around 1930 by Kurt Gödel, Jacques Herbrand, and Thoralf Skolem. It is to these investigators that we owe today’s predicate-calculus proof procedures.
Of course, in the 1930s no computers existed that could execute these proof procedures. In 1953 the mere existence of computers prompted a philosophy professor, Hao Wang, to start writing programs to prove theorems. Computers had struck him as conceptually elegant and a proper home for the obsessive formal precision that characterised mathematical logic, something that mathematicians find irrelevant, pedestrian, and an obstacle to creativity.
Probably the very first theorem-proving program was the one Davis wrote in 1954 to prove theorems in Presburger arithmetic. Other early work included Gilmore’s program implementing Beth tableaus and a new algorithm implemented in 1960 by Davis and Putnam. This early work launched the new area of Automated Theorem Proving, which attracted many new entrants by its novel combination of psychology (how do humans do it?), logic (to tell us what counts as a proof), mathematics (where to find things to be proved), and technology (how to get a computer to do the work).
Wang and the other pioneers found that the proof procedures of the 1930s contained steps that were seen as do-able “in principle”, which was all that had been considered necessary. Only when computers became available were people forced to reduce such steps to algorithms that could be executed in a reasonable amount of time. A campaign of improvement started with Prawitz in 1960 and culminated in the mid-seventies.
This campaign naturally divides into two periods: pre-Robinson and post-Robinson. These periods are separated by Robinson’s “A machine-oriented logic” written in the summer of 1963 [4]. The pre-Robinson contributions of Prawitz and of Davis/Putnam concerned the avoidance of superfluous instantiations. This line of research was closed by Robinson’s resolution inference step which incorporated the unification algorithm with the property of yielding most general substitutions. This property implied that all superfluous instantiations had been avoided. “A machine-oriented logic” was published as [5]. It described resolution logic and gave the main results.
The elegance and simplicity of resolution logic allowed one to see a whole new vista of redundancies. This started the search for restrictions of resolution opportunities that would not compromise completeness of the resolution proof system. An early step in this direction was taken by Robinson himself. He called it hyperresolution and wrote the paper on it later in 1963. I quote from [4]:
During these same months [summer 1963], back at Rice [University], I wrote my paper on hyper-resolution, which appeared in 1964—a year before the resolution paper itself, which it presupposed! It turned out that the 1963 resolution paper had lain, waiting to be refereed, for almost a whole year at MIT. I gather Marvin Minsky had been asked by the JACM to referee it. It was indeed being discussed openly in his group and they had even written about it in their internal reports. The first I knew about all this was about a year after submitting the paper, when I received a phone call from Minsky to ask my permission to use it as one of the source materials in his AI seminar! I thought (and still think) that this long delay in refereeing the paper was very bad, and so did the Editor of the JACM, Richard Hamming. I complained to him, and he got on to Minsky, and my paper was finally returned to me with a request for some small revisions. I quickly sent Hamming a revised version, and it appeared in January 1965.
Robinson had started working in automated theorem-proving during a summer visit to the Argonne National Laboratory, where he built up a group for research in this area in the summer of 1961 and several subsequent summers. Though resolution logic was adopted by some workers in automated theorem-proving elsewhere, it remained one of several alternative methods. Its role in automated theorem-proving remained uneventful in comparison with the role resolution logic was to play in artificial intelligence. There it was embraced, then rejected, and went on to spawn the new research area that came to be called Logic Programming.
Of course the first to know about resolution logic was the Argonne group. Because of Hamming’s irregular choice of Minsky as referee for [5] (irregular because there were several reputable workers in automated theorem proving at the time), it became known at MIT.
The second escape of resolution logic beyond automated theorem proving came about as follows. Quoting Robinson [6]:
It was Cordell [Green] who was a very early adopter [of resolution logic]—in fact it was in late 1963 at Rice [University], in his final undergraduate year there, when resolution figured prominently in my lectures (in 1964 I thought and talked about little else!). I think Cordell probably thought of his QA idea [using resolution logic for a question-answering system] about that time. Off he went to Stanford in mid 1964 to join McCarthy and start his PhD program.
John McCarthy was Mr Artificial Intelligence himself: he had invented the term in 1956. In 1959 he published a paper [7] in which he proposed theorem proving by computer to implement an intelligent agent. This was very different from and independent of what the automated theorem proving people had in mind. The kind of theorem McCarthy had in mind was “I can get to the airport”, to be inferred from a body of premisses including such statements as “I am at my desk” and “From my desk one can get to the airport”. Of course other premisses were needed. McCarthy was interested in what would constitute an adequate set.
In [8] and in his thesis [9] Green works out these ideas of McCarthy’s. He goes on to establish the beginnings of what has since become known as “logic programming”. However, the state of resolution logic at the time of Green’s work prevented it from gaining the recognition it deserved. Two reactions were possible. The first was to conclude that the very idea of solving problems by resolution logic is flawed. This was the reaction of Minsky and his entourage at MIT. The second possible reaction was that resolution logic needed further development.
This development indeed took place. Although resolution had eliminated vast swaths of redundancy in the earlier theorem provers, the simplicity and elegance of resolution revealed new sources of redundancy. Independently of Green’s work several workers had found this a worthwhile challenge [10]. Loveland and Luckham achieved improvement by the introduction of the linear format. Kowalski and Kuehner achieved further restriction (maintaining completeness) with the introduction of a selection function.
With Boyer and Moore, Kowalski studied the representation by logic clauses of formal grammars. In this way the special role of Horn clauses became apparent. Meanwhile in Marseille, France, Colmerauer and Roussel were writing a program for question-answering in natural language [13]. The program was based on a grammar formalism developed by Colmerauer under the name of Q-systems. Resolution had come to their attention via Jean Trudel, a Canadian student in Marseille on a scholarship. The group invited Kowalski for a visit. He instructed them in SL-resolution and in the fact that ancestor resolution is not needed for completeness when all clauses are Horn clauses. This was enough for Colmerauer and Roussel to produce a successor to Q-systems which had the attributes of a programming language, yet was an SL-theorem prover for Horn clauses. They called it Prolog, from programmation en logique, a name suggested by Roussel’s wife Jacqueline [11].
As Prolog and Kowalski’s work became widely known, the term “logic programming” was often used in a confused way. Kowalski [11] characterizes logic programming as follows:
Logic programming shares with mechanical theorem proving the use of [resolution] logic to represent knowledge and the use of deduction to solve problems by deriving logical consequences. However, it differs from mechanical theorem proving in two distinct but complementary ways: (1) It exploits the fact that [resolution] logic can be used to express definitions of computable functions and procedures; and (2) it exploits the use of proof procedures that perform deductions in a goal-directed manner, to run such definitions as programs.
This emphasis on the use of resolution logic for knowledge representation makes logic programming into a paradigm within artificial intelligence. In spite of the antagonism from MIT, it has been acknowledged as such elsewhere.
Independently of logic programming, resolution logic has given rise to interesting developments in programming languages. The first and biggest step is represented by the very existence of Prolog as early as 1973. Kowalski described the use of resolution logic as a programming language [12] in a way that can be regarded as the essence of Prolog. To emphasize its status as a programming language it can be characterized as the most extreme procedure-oriented programming language, one where everything apart from the definition and calling of procedures has been stripped away. Iterative constructs have been omitted because iteration is the tail-recursive special case of procedure call. Branching has been omitted because it is a special case of non-determinism. Non-determinism does not need a specific operator: the procedure call can serve as such an operator in the presence of multiple procedure declarations. Finally, by using unification for the replacement of formal by actual parameters in procedure calls, assignment statements are rendered superfluous. Keith Clark showed how an analysis of unification leads to several modes of parallelism. With a succession of collaborators he implemented such variants of Prolog. For Colmerauer unification provided the starting point of what became known as constraint logic programming, also represented by variants of Prolog.
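The unification mentioned here, which yields most general substitutions, can be sketched minimally in Python (a toy term representation invented for the illustration; the occurs check is omitted for brevity):

```python
# Toy terms: a variable is a string starting with an uppercase letter;
# a compound term is a tuple (functor, arg1, ..., argn);
# anything else is a constant.

def is_var(t):
    return isinstance(t, str) and t[:1].isupper()

def walk(t, s):
    """Follow variable bindings in substitution s."""
    while is_var(t) and t in s:
        t = s[t]
    return t

def unify(a, b, s=None):
    """Return a most general substitution making a and b equal, or None."""
    s = dict(s or {})
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if is_var(a):
        s[a] = b
        return s
    if is_var(b):
        s[b] = a
        return s
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for x, y in zip(a, b):
            s = unify(x, y, s)
            if s is None:
                return None
        return s
    return None
```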
Thus Robinson’s discovery of resolution logic not only made an impact in its native land of automatic theorem-proving, but it also escaped its home to become a significant force in artificial intelligence and in programming languages.
My thanks to Paul McJones for his comments and suggestions for improvement.
[1] George Boole: An Investigation of the Laws of Thought on Which are Founded the Mathematical Theories of Logic and Probabilities. Macmillan, 1854.
[2] J.A. Robinson: “Logic programming—past, present and future”, New Generation Computing, vol. 1, 107-124. Ohmsha and Springer-Verlag, 1983.
[3] Gottlob Frege: Begriffsschrift, eine der Arithmetischen nachgebildete Formelsprache des reinen Denkens, Verlag von Louis Nebert, Halle, 1879. (Special thanks to Wikipedia for including a facsimile of the title page.)
[4] Letter from J.A. Robinson to Wolfgang Bibel, approximately 2010.
[5] J.A. Robinson: “A machine-oriented logic based on the resolution principle”. Journal of the ACM, vol. 12, pp 23-41, 1965.
[6] Robinson, personal communication January 14, 2016.
[7] J. McCarthy: “Programs with common sense” in Proceedings of the Teddington conference on the mechanization of thought processes pp 75-91, Her Majesty’s Stationery Office, 1959.
[8] C. Cordell Green: “Application of theorem-proving to problem-solving”. In Proceedings IJCAI, pages 219–231, 1969.
[9] Cordell Green: “The Application of Theorem-Proving to Question-Answering Systems”. Technical Note no. 8, June 1969, Artificial Intelligence Group, Stanford Research Institute.
[10] D.W. Loveland: “Automated Theorem Proving: a quarter-century review” in Contemporary Mathematics, vol. 29, pp 1–48, American Mathematical Society, 1984.
[11] R.A. Kowalski: “The early days of logic programming”. Comm. ACM, vol. 31(1988), pp 38–43.
[12] R.A. Kowalski: “Predicate logic as programming language”. Information Processing 74, 569–574. North Holland, 1974.
[13] Alain Colmerauer and Philippe Roussel: “The birth of Prolog”. SIGPLAN Notices vol.28 (1993), no. 3, pp. 1–31. Also: History of programming languages—II, pp 331–367, published by ACM, New York 1996.
Since time immemorial, cryptography has been a common feature of puzzle columns in magazines and newspapers. But up until the 1970s, only the military, departments of foreign affairs, and spies were seriously interested in it. This state of affairs changed when banks and other commercial organizations started using computer networks. The need arose for industrial-strength cryptography. Not only that, but the cryptography needed to be standardized and sanctioned by the government. For the bank’s information-technology chief it was even more important for encryption to be standard than to be secure: if a breach occurred that cost the bank a few million, he wouldn’t lose his job as long as he had used the industry-accepted standard. This was the reason for the U.S. National Bureau of Standards (now NIST) to adopt the Data Encryption Standard (DES) in 1977. Its current successor is the Advanced Encryption Standard (AES), adopted in 2001.
In the 1990s a new wave of interest in cryptography arose. Private citizens became concerned about the possibility of surveillance of their personal communications. Phil Zimmermann, who had been an anti-nuclear activist, created an encryption program he named PGP (“Pretty Good Privacy”). It gave people privacy when using bulletin board systems (this was before e-mail was widely available) and security when storing messages and files.
To achieve the highest level of security, PGP uses a public-key protocol implemented by the RSA algorithm. However, this is only computationally feasible for short messages. Systems that can handle long messages are symmetric-key systems, meaning the sender and recipient have to share the same key. PGP solves this problem by using a public-key protocol for transmission of the key needed in some symmetric-key system for bulk encryption, that is, to encrypt the message itself, which can be long.
PGP marks the beginning of wildcat encryption. I call it that to draw attention to the difference between Zimmermann and the data-security people at the banks: he didn’t care whether PGP had a level of security sanctioned by the government; he wanted the best security he could get. Accordingly, for the bulk-encryption part of PGP he used a cipher called IDEA, from “International Data Encryption Algorithm”, a non-standard improvement of DES published in 1990 [1].
From Zimmermann’s point of view IDEA had the advantage that its key length and block size suggested a higher level of security than DES, but it is the same type of algorithm. For example, it is also a block cipher. Its blocks are twice the size of those of DES, but still minuscule compared to plaintext lengths of many thousands of bytes, not uncommon in practice.
To understand how we came to be stuck with short, fixed-sized blocks, let us have a look at the history of DES. In 1971 Horst Feistel patented a new type of encryption algorithm: a substitution-permutation network with 48-bit keys and blocks. In 1973 he published a version where this was increased to 128 bits. These were part of an IBM project named Lucifer. At the time, practical deployment required hardware implementation. The block sizes were small enough to make this feasible in the chip technology of the time.
Let us now move forward to 2001, when AES was adopted as standard to replace DES. Much has changed; what has not changed is that block sizes, though increased, are still minuscule compared to the length of the messages that the system should be able to carry.
What had changed by 2001 was that, for at least a decade, encryption had been implemented in software running on byte-oriented computer architectures. The constraints imposed by hardware implementation had disappeared. And guess what: AES, like Lucifer, is a substitution-permutation network, with 128-bit keys and blocks. The structure of the network, the structure of the boxes, and the algorithm incorporate many details not necessitated by published design criteria. DES had been designed by IBM in collaboration with the U.S. government. Some commentators voiced concerns about the possibility that, in addition to the published design criteria, there was a backdoor that presented a weakened version of the system to an insider. In fact, after two decades some hitherto unpublished design criteria for DES surfaced [7].
In an attempt to lay such suspicions to rest, the designers of AES published a sizeable book [6], with lots of abstract algebra, to convince the public that the design decisions were all in the interest of the users’ security. But no matter how big one makes such a book, it remains possible that it does not contain all design criteria. This is not the authors’ fault: the possibility is inherent in the type of encryption system: a substitution-permutation network with fixed-size blocks of a size much less than that of the larger messages to be carried.
This curious history makes it worth thinking about an encryption algorithm that does not inherit hardware design constraints and freely uses the liberty afforded by software implementation to cobble together an algorithm that stays close to fundamentals of encryption. For these fundamentals it is best to refer to the document mentioned by Whitfield Diffie in the following quote [2].
The literature of cryptography has a curious history. Secrecy, of course, has always played a central role, but until the First World War, important developments appeared in print in a more or less timely fashion and the field moved forward in much the same way as other specialized disciplines. As late as 1918, one of the most influential cryptanalytic papers of the 20th century, William F. Friedman’s monograph The Index of Coincidence and its Applications in Cryptography, appeared as a research report of the private Riverbank Laboratories. And this, despite the fact that the work had been done as part of the war effort. In the same year Edward H. Hebern of Oakland, California filed the first patent for a rotor machine, the device destined to be a mainstay of military cryptography for nearly fifty years [3]. After the First World War, however, things began to change. U.S. Army and Navy organizations, working entirely in secret, began to make fundamental advances in cryptography. During the thirties and forties a few basic papers did appear in the open literature and several treatises on the subject were published, but the latter were farther and farther behind the state of the art. By the end of the war the transition was complete. With one notable exception, the public literature had died. That exception was Claude Shannon’s paper “The Communication Theory of Secrecy Systems,” which appeared in the Bell System Technical Journal in 1949 [4]. It was similar to Friedman’s 1918 paper in that it grew out of wartime work of Shannon’s. After the Second World War ended it was declassified, possibly by mistake.
Shannon’s paper stands out as a lone monument in an empty period of the literature, a period that lasted from Friedman’s monograph to the early 1970s.
Apart from specific results, Shannon’s paper is valuable for introducing a mathematical way of thinking about cryptology, something we now take for granted. What does Shannon’s mathematical view show us? For any practical system the key is shorter than the message, much shorter. Suppose we use a key of 256 bits (reasonably large) to encrypt a message of 100,000 bits (only moderately long at 12,500 bytes). The key selects one among at least 100,000! possible message-to-cryptogram maps. But there are only 2↑256 keys possible. So the keys identify only the tiniest fraction of possible message-to-cryptogram maps [by my computer-aided reckoning 100,000! is about 2↑1,516,704]. The security of a cipher depends on how randomly it sprinkles that tiny fraction over the space of all message-to-cryptogram maps.
Shannon makes this vague criterion more precise by identifying two methods to thwart attempts to break an encrypted message: diffusion and confusion. His description of these methods is too technical to reproduce here. Suffice it to say that Feistel’s invention, the block cipher, incorporated both confusion and diffusion in Shannon’s sense. Feistel’s cipher uses a network of “boxes” of two types: S-boxes and P-boxes. The plaintext is subdivided into fixed-sized blocks. The network subjects each block of text to repeated transformation. An S-box substitutes new text elements for existing text elements. This achieves Shannon’s “confusion”; his “diffusion” is achieved by the P-boxes permuting the text elements within a block. Thus Feistel was right on track according to Shannon’s fundamental principles in aiming at confusion and diffusion. But Feistel had to work under the constraint of hardware implementability, which was constrained by the chip technology of the 1970s.
What I’m proposing here is to follow Feistel in implementing substitution and permutation, but freed from the constraint of using a fixed set of S and P boxes acting on a short and constant block size. Instead, I let the boxes themselves as well as their lengths depend on the key. This results in a block size only limited by the length of the message. Of course some constraints remain. For example, I am not assuming that the entire plaintext is in random-access memory. Accordingly the proposed algorithm buffers the message using a buffer size B, which, of course, can be made as large as one likes. The length of the current block is chosen between B/2 and B in a way that depends on the key. For test runs I set B at 768 bytes (giving block sizes between 3072 and 6144 bits, to be compared to 128 bits in the case of AES). It is these variable-sized blocks that are subjected to a permutation. I have chosen the Fisher-Yates shuffle under control of the pseudo-random number generator.
This takes care of the diffusion part. The confusion part takes some more explaining. To cut to the chase, it is a version of the WWII Enigma machine. Logically, Enigma had a complex structure, a complexity necessitated by hardware constraints (mechanical hardware in this case). Software freedom allows drastic simplification of Enigma’s logical structure. The relatively large memory size available allows an Enigma-like software device where everything is much bigger than in the original. I call this gigantic version of Enigma Giganigma.
In the contemporary cryptographic community Enigma is regarded as no more than an historic curiosity. But still, let us consider its potential security. During WWII many variants of Enigma were in use, with different levels of security. As will be explained below, Enigma had no explicit key, but instead relied on a set-up procedure that was kept secret and that led to a large number of possible combinations. For the high-end Enigma, this number was about 3*(10↑35), equivalent to a key size of about 118 bits (as can be seen from the fact that 10↑3 is approximately 2↑10), rather better than DES. It is intriguing, but probably a coincidence, that Feistel chose a key size of 128 bits for his larger version of Lucifer. What matters is of course how well the uncertainty in the key is transferred to uncertainty in the plaintext when the ciphertext is given. Due to its mechanical constraints this utilization is probably rather poor in the case of Enigma, and in any case hard to assess.
Giganigma has a key size of 256 bits, double that of AES. Key utilization in AES is hobbled by its fixed network, fixed S- and P-boxes, and a short, fixed block length. In Giganigma the structure of the components and the block length are determined by the key only, which I expect to lead to better key utilization, though this is, again, hard to assess.
To explain my choice of Enigma as the starting point for a new approach to bulk encryption, here is a brief description of the device. Enigma was the most widely used family of rotor-based encryption devices. This encryption principle was independently invented by Arthur Scherbius in Germany, Edward Hebern in the U.S.A., Hugo Koch in the Netherlands, and Arvid Damm in Sweden. Their patents are all dated 1918 or 1919. Rotor machines were the mainstay of military cryptography from around 1930 to 1970.
A rotor is an electro-mechanical way of implementing an invertible substitution by a letter for a letter of the same alphabet. In the case of Enigma the alphabet consists of the 26 letters A through Z. Enigma rotors implement such a substitution by means of a circular disk of insulating material the size of a hockey puck. Around the perimeter of each face of the disk there are 26 evenly spaced electrical contacts, each of which is connected to exactly one contact on the other face. Several rotors are mounted in a pack on an axle on which they can rotate independently of each other. Each contact of a rotor facing another rotor makes a connection with a contact of that other rotor. In this way the pack as a whole implements a substitution of one letter for another.
This substitution is used for encrypting one plaintext letter. After this at least one rotor changes position, so that the pack as a whole effects another substitution, which is used for the next letter. Before entering the rotor pack, the signal travels through a plugboard which is set by the operator. The plug board also implements a substitution. At the other end of the rotor pack the signal is reflected and is sent back through the pack. This arrangement effects a simple way of decrypting: entering the ciphertext as a plaintext message as if it were to be enciphered has the effect that the plaintext emerges. In some Enigma models the reflector is field-rewirable.
The secret key to be shared by sender and recipient is not, in the case of rotor machines, in the form of some secret word. Instead, it takes the form of a protocol followed by sender and recipient that leaves an adversary with a large amount of uncertainty about the substitutions used to obtain the ciphertext. Let us calculate the amount of this uncertainty for a high-end Enigma with a basket of 8 rotors, out of which 4 are mounted. The protocol specifies for each day:
1. which 4 of the 8 rotors are mounted, and in what order (8*7*6*5 possibilities);
2. the setting of the plugboard, counted here as an arbitrary substitution of the alphabet (26! possibilities);
3. the initial positions of the 4 mounted rotors (26↑4 possibilities).
This gives a total of about 3*(10↑35) possibilities, the number mentioned earlier, equivalent to a key length of about 118 bits. This is already a respectable length, which increases if there is a field-rewirable reflector to contend with.
To see how Enigma can serve as model for a modern incarnation, let us look at the device from an abstract point of view. In the first place, Enigma is not a single device, but a kit from which any of a large number (8*7*6*5*26!, from items 1 and 2 above) of alternative devices can be assembled in a matter of minutes. Any of these assembled Enigmas can be regarded as a finite-state machine with an input tape (the plaintext) and an output tape (the ciphertext). Such a machine is specified by a set of states (including a distinguished one serving as start state), a next-state function, and an output function.
In the case of Enigma there are as many states as there are combinations of the 26 positions of the four rotors. That is, there are 26↑4 = 456,976 states. Each of these states amounts to a virtual rotor, which is, like the constituent real rotors, a substitution table for the 26 letters of the Enigma’s alphabet. The output function is the effect of this substitution table on the current input symbol. Enigma realizes only 26↑4 substitution tables for any particular letter, but the uncertainty faced by the adversary is compounded with the fact that it is not known which of the 8*7*6*5*26! alternative Enigmas has been assembled.
Giganigma is derived by adopting the same abstract view as finite state machine assembled from a kit. There are several points at which software implementation suggests additional security. But first and foremost the role of the key needs to be changed. As noted above, Enigma did not have a key in the usual sense of a secret word. Instead, the uncertainty faced by the adversary was in the form of not knowing which particular Enigma was assembled and what its initial state was.
Software implementation allows us to use a key in the usual sense of the word, as a sequence of bits. A convenient length for this sequence is 256. The key-equivalent choices in Enigma propagate throughout the transmission of even the longest message. If the key is a sequence of only 256 bits, a mechanism is needed to ensure its continuing contribution throughout encryption of the plaintext or decryption of the cryptogram. In Giganigma this is achieved by a pseudo-random number generator in the form of a finite-state machine with the key as initial state.
Let us start with the assembly stage. Enigma is assembled from a basket of 8 rotors. These rotors had a fixed wiring pattern, so that the resulting substitution tables had to be assumed known by the adversary. For Giganigma we imagine a stage to precede the assembly stage. In this additional stage, which we can think of as a “manufacturing stage”, each of the rotors in the basket is created as a random permutation of the alphabet derived from the key. And, while we’re at it, we might as well put more than 8 rotors in the basket. We ran our examples with a basket of 64 rotors.
Thus in Giganigma, for each message separately, the rotors are wired under control of the key.
The manufacturing stage of Giganigma is also the time when the number of mounted rotors is to be decided. In Enigma the current substitution table is realized by an electrical current traversing the four mounted rotors. In Giganigma each rotor is represented by an array A of eight-bit characters such that A[i] is substituted for character i. For Giganigma our first impulse is to exploit software freedom for a much greater number of mounted rotors, to increase the amount of confusion facing an adversary. How much greater?
We need to realize that the more virtual rotors we mount in Giganigma, the longer it takes to encrypt or decrypt a character, as each rotor corresponds to an array access. In Enigma this is of no concern, as the change in electrical current traverses the half-inch thickness of the rotor at at least half (my guess) the speed of light in vacuum. That is why I set the number of mounted rotors at 16, much smaller than the number in the basket. Decrease it if you want more speed; increase it if you want more confusion.
The state in the finite-state machine [5] modeling Giganigma has the following components:
State transitions in the finite-state machine modeling Giganigma are as follows.
In Giganigma software freedom offers an embarrassment of riches for item 2. An extreme libertarian would even question the use of a basket of rotors to be fixed for the entire message: why not continue wiring new rotors under control of the key stream as encryption or decryption proceeds?
To answer this question we should consider the key stream as generated by a PRNG. Even if the key is selected randomly, the key stream is not random in the sense that each byte comes out as a random choice among all possible bytes. The ideal of next-byte randomness is more closely approached in the early part of the key stream. The role of the basket of rotors is to capture the randomness of the early part of the key stream and to keep it available for encryption of the entire message, however long. Of course the later parts of the key stream are still useful. In Giganigma they are used to control the ongoing change of mounted rotors and to control block sizes.
Thus the next development in Wildcat Crypto is to replace the IDEA component for bulk encryption in PGP with a program that effects Shannon’s diffusion in the manner indicated above and that effects Shannon’s confusion by Giganigma.
[1] Lai, Xuejia, and James L. Massey. “A proposal for a new block encryption standard.” Advances in Cryptology—EUROCRYPT’90. Springer Berlin Heidelberg, 1990.
[2] Foreword for: Schneier, Bruce. “Applied Cryptography”, Wiley 1996.
[3] Diffie, Whitfield, and Martin E. Hellman. “Privacy and authentication: An introduction to cryptography.” Proceedings of the IEEE 67.3 (1979): 397-427.
This paper is also a good short overview of cryptography. I recommend the following books: “The Codebreakers” by David Kahn (MacMillan 1967), “Handbook of Applied Cryptography” by Alfred J. Menezes, Paul C. van Oorschot and Scott A. Vanstone (CRC Press, 1996-2001), and “Cryptography” by Douglas R. Stinson (Chapman and Hall/CRC, 3rd edition 2006).
[4] Shannon, Claude E. “Communication theory of secrecy systems.” Bell system technical journal 28.4 (1949): 656-715.
[5] Also, “finite-state automaton”. There is a veritable zoo of such paper machines, first identified by authors such as Arthur Burks, Edward Moore, and George Mealy. Their writings stand at the birth of computer science. A good early textbook is “Computation: Finite and Infinite Machines” by Marvin Minsky (Prentice-Hall 1967).
[6] “The Design of Rijndael” by Joan Daemen and Vincent Rijmen (Springer 2002).
[7] Example of a secret design criterion: “It was acknowledged that the purpose of certain unpublished design criteria of the S-boxes was to make differential analysis of DES infeasible … it was kept secret for almost 20 years …” page 101, “Cryptography” by D.R. Stinson (Chapman and Hall/CRC, 3rd edition 2006). Here of course the secret design criterion was in the interest of the user’s security.
For most teachers “an introductory programming book with C” is an oxymoron. The extreme wing in this school of thought considers only designedly friendly languages suitable for an introduction to programming. BASIC is an early example. My current favourite friendly language is Python [1]. But the mainstream of teachers of introductory programming has settled on Java as a compromise between friendliness and attractiveness to prospective employers.
These employers will not include Joel Spolsky, as he explains in The Perils of Java Schools:
All the kids who did great in high school writing pong games in BASIC for their Apple II would get to college, take CompSci 101, a data structures course, and when they hit the pointers business their brains would just totally explode and the next thing you knew, they were majoring in Political Science because law school seemed like a better idea. I’ve seen all kinds of figures for drop-out rates in CS and they’re usually between 40% and 70%. The universities tend to see this as a waste; I think it’s just a necessary culling of the people who aren’t going to be happy or successful in programming careers.
What Spolsky isn’t saying (and certainly should not be thinking) is that it is inevitable that there are brains that can’t get themselves around pointers. I think that those seemingly handicapped brains are the result of unfortunate teaching—the kind of teaching that surrounds pointers with that aura of trickiness and disreputability that comes with that other bugaboo of programming, the goto statement. Of course the author of the textbook is not in total control of the instructor, but at least he can introduce pointers in a matter-of-fact way, as just another feature of the C programming language.
This brings me to the top of the pile of books on my desk: Elements of Programming by Maarten van Emden, which recently became available worldwide. It is also available without charge as a pdf file. This book is an example of one where pointers are introduced as just another feature of C. In this respect Programming for Engineers by Aaron Bradley is even better and is the most powerful antidote I know of against the phenomenon noted by Spolsky. Bradley doesn’t start with the usual cutesy “Hello world” program. Instead, the first program, on page 2, has five lines. It declares some variables and performs some assignments. The program is followed by the memory maps for the initial state, and for each of the computational states afterwards. For each cell it shows the address, the identifier of the variable, and its content. On page 4 any mysteries around pointers are dispelled once and for all with a similarly small program. It declares a pointer variable, assigns it a value, and assigns the result of dereferencing it to another variable. Again a memory map is given for each change in state. Bradley shows that, by fearlessly digging down to bedrock, pointers become crystal-clear.
As far as I know, Bradley’s approach is unique. I would recommend his book, were it not for the fact that so little is covered. I’m allergic to fat books, but there is only so much you can say in eight thousand words [2].
The best book on the pile is The C Programming Language by Kernighan and Ritchie (“K&R”). It was the natural choice when we created a new introductory course where the language was mandated to be C. However, K&R assumes that the reader is a programmer and only needs to learn C as a new language. The choice of K&R for the new course would have put an undue burden on the instructor, who would have to provide a considerable amount of introductory material. The instructor, Jason Corless, believed that I could do better than the alternatives to K&R that were on the market. This led to my writing the first edition of Elements of Programming, of which you can now buy the fourth edition.
K&R stands out in several ways. It is not only the best introduction to C, but it is itself part of the history of C. When the first edition of K&R was published in 1978, the language was not standardized. As a result the various C compilers did not implement the same language. In this chaotic landscape “K&R C”, the version of C described in the book, served as the de facto standard.
Another way in which K&R stands out is Kernighan’s excellent writing style. It is of course an indiscretion to enquire into the ways in which co-authors have contributed and even worse than an indiscretion to make guesses. I’m sticking my neck out in this case because I have read four books by Kernighan and six different co-authors [3] in total, and they all have the same crisp and agreeable writing style, better than what is found in your typical non-Kernighan book.
A telling difference between these books is the attention given to the choice of examples. The most basic kind of exposition is to describe the next feature and then to give an example. Logically speaking, all that the example needs to do is to illustrate the feature. But wouldn’t it be more rewarding, if instead of half a dozen lines for such a parsimonious example, a function is given that is worth knowing about, independently of the feature being illustrated? If necessary at the expense of another half dozen lines? In this I have tried to emulate K&R, which is hard to beat in this respect.
An idiosyncratic way to classify these books is according to whether they include in their examples the Quicksort sorting algorithm. It is essential neither for illustrating features of C, nor of programming. Yet it pops up in surprisingly many of the books on my pile. In the case of my Elements of Programming the appearance is not surprising. Soon after I was introduced to any kind of programming, I had the dubious luck of being told about Quicksort, which promptly caused a year of bouts of brain fever. The result has been my induction into Knuth’s The Art of Computer Programming [4], which some regard as a Hall of Fame in programming, though it is more accurate to view it as a monument to Knuth as a sleuth. In Knuth’s book my variant of Quicksort is awarded the dubious distinction of being the only one that has resisted Knuth’s attempts at mathematical analysis of its performance. Thus, the inclusion of Quicksort in The Elements of Programming is excusable. It is intriguing that K&R, with their more mature perspective, also go out of their way to include Quicksort.
The third book on my pile that includes a listing of Quicksort is Engineering Problem Solving with C by Delores M. Etter. The code for partition (called “separate” here) looks unusually complicated. Since an obscure alternative may be the result of clever optimizations, perhaps in array accesses or item comparisons, I entered the code to find out.
To get it to compile it was sufficient to change one opening brace to a closing one. I assume that the result is the intended source code. Apparently it was not mechanically transferred to the book file. A single error is an excellent result when one has to rely on manual retyping and proofreading.
Of course one cannot be absolutely sure that the character substitution I made did indeed result in the intended algorithm. All I can do is to report that the substitution resulted in successful compilation and in the correct sorting of the example, which is a random-looking sequence of eight integers. Exchanging the first two elements of this sequence gives an array that is not sorted correctly. After this baffling observation I generated a thousand randomly selected random permutations of {0,1,2,3,4,5,6,7}. Of these 73% were incorrectly sorted.
Before leaving the topic of Quicksort, I note that C How To Program (8th edition) by Paul and Harvey Deitel explains Quicksort in an exercise that asks the reader to implement it. Those who remember my confession to an allergy to fat books might infer that I disapprove of the Deitel and Deitel text. With its thousand pages and its weight of 1.3 kilograms, the book is not really portable. But the pages are used well: the examples are simple and illustrative and the coverage is impressive (an introduction to C++ is included). The level of expertise is impressive as well. E.g. (a very gratuitous example) decks of cards are shuffled using the Fisher-Yates shuffle. This is a fast algorithm that can be proved to produce permutations as unbiased as the underlying random-number generator allows. The Deitels are in love with programming. Witness Appendix D of the easily downloadable 7th edition, which is a lengthy discourse on Sudoku puzzles: how to solve them and how to generate them.
A Book on C by Kelley and Pohl first appeared in 1984 and is in print in its 4th edition. I am looking at the smaller 2nd edition. Its strength lies in what is left out: after all, a mortal will only read so much, and finding the necessary parts is easier in a small book than in a big one. A Book on C, first edition, was followed by C by Dissection in 1987 by the same authors. It has the same merits as its predecessor: it is more concise and emphasizes teaching by exhibiting program examples of which the structure is analysed by “dissecting” the program.
Most introductions to programming teach the elements first and postpone the packaging of code as a library to be treated as an advanced topic. The reality of software development is the building of an application using an existing library as much as possible. The quick and dirty approach is to write calls to library functions with a minimum amount of glue code. The better approach is to wonder what is missing from the library so that glue code becomes necessary. That may suggest extending the library and then proceeding with the application using the extended library and less glue code. This is the approach chosen by The Art and Science of C by Eric Roberts. Even if one does not want to follow Roberts in his distinctive approach to learning C, his book is worth perusing for the examples and exercises, where his rich professional background shows.
[1] Python started as an offshoot of Geurts, Meertens, and Pemberton’s ABC, a programming language intended as a replacement of BASIC that would be palatable to computer scientists. See also my article on Python.
[2] Estimated from the 180 small pages in the C part of his book.
[3] Listed here, so I don’t have to look them up again, the co-authors are: Aho, Pike, Plauger, Ritchie, and Weinberger.
[4] The Art of Computer Programming, Volume 3 / Sorting and Searching, by Donald E. Knuth, Addison-Wesley, 1973.
The semantics of C operators should be simple to the point where each corresponds to a machine instruction on a typical computer. An exponentiation operator doesn’t meet this criterion. [1], page 247
Stroustrup’s criterion needs to be taken with a grain of salt. For example, the assignment operator does not meet it: a = b takes a LOAD and a STORE. In this case the criterion translates to: “should correspond to no more than a LOAD and a STORE”. But Stroustrup was onto something, because we can tweak his criterion to:
The semantics of C operators should be simple to the point where they each compile to code that is as fast as what one can write in assembler.
The example of the assignment operator in a = b is instructive in another way. One could argue that it would be more in the spirit of C not to have the assignment operator in the first place, but instead have two other operators, one corresponding to LOAD and the other to STORE. But that would require C’s computational model to include registers, which it does not [5]. There is a reason for that.
One of the secrets behind C’s success is that just about any machine architecture has byte-addressable random-access memory. Moreover, pick just about any pair of machine architectures, and they differ in their register structures. By excluding registers from its computational model, C has hit the sweet spot in being close to the machine, yet not too close. So there is no place for LOAD and STORE in C’s computational model, while the assignment operator is an operation on random-access memory in the presence of an unspecified complement of registers.
These are some of the hints that C is unique in the way it manages to be low-level and high-level at the same time. Another hint comes from Alexander Stepanov’s history of STL [3]. In the early 1980s he had been working on an abstract mathematical approach to a library of algorithms and data structures. His striving for abstractness drove him towards the highest possible level in the programming language. Stepanov considered Backus’s FP. In collaboration with Kapur and Musser he designed Tecton, a language of such a high level that its implementability was in doubt.
When Bell Labs hired Stepanov in 1987 for work on a library for C++, Andy Koenig taught him the semantics of C. Stepanov writes: “The abstract machine behind C was a revelation” [3]. Some way past reading this I did a double-take back to this sentence: abstract machine behind C? What’s going on here? Isn’t C the one language not in need of any abstract machine? Isn’t C the one language in which to write abstract machines? Probably Koenig wasn’t talking about an abstract machine, and Stepanov introduced the term to say that the way C relates to its computational model surprised him. In the sequel I’ll follow Stepanov in abusing the term “abstract machine” to refer to the unwieldily named “computational model”. I’ll go further and suggest that “abstract machine” is a valuable addition to one’s conceptual tool kit.
For example, “abstract machine” is a concept that clarifies an otherwise unfathomable utterance in “The Roots of Lisp” by Paul Graham [2].
It seems to me that there have been two really clean, consistent models of programming so far: the C model and the Lisp model. These two seem points of high ground, with swampy lowlands between them. [2]
It is intriguing that Graham, for whom Lisp is the one and only language, singles out C as the only other language worth knowing about. What could these languages possibly have in common so that they stand out in the vast landscape of languages? In the following I will argue that the answer is: the way these two languages relate to their (implied) abstract machines.
In “The Roots of Lisp” [2] Graham gives a précis of John McCarthy’s seminal paper [4] in which McCarthy describes the language he discovered. Graham first introduces the list. Expressions in the Lisp language are lists. At the same time lists are data structures when viewed from the machine end. Seven primitive operations on expressions are introduced: ATOM, QUOTE, EQ, CAR, CDR, CONS, and COND. To these are added a notation for functions by means of the keyword LAMBDA; the keyword LABEL is added to make recursive function definitions possible. I speak of “primitive operations” advisedly, because the term can refer to functions taking a data structure as argument as well as to commands to change the state of a machine, and thus invites comparison of Lisp with languages implemented via abstract machines.
With these nine keywords of Lisp the function EVAL is defined which takes as its argument an expression and yields the value of that expression as result. EVAL computes any computable function; not a surprise, given earlier theoretical results in the lambda calculus. McCarthy presented his discovery as a theoretical result that exhibits a (to his taste) superior alternative to the Turing machine as basis for computing. One of McCarthy’s students, Steve Russell, did not see the theoretical intention as an obstacle to implementing lists and the nine primitives in assembler on the IBM 704, resulting in, presto, the first Lisp implementation.
McCarthy’s paper [4] gives the definition in Lisp of mutually recursive functions APPLY and EVAL with the former as top-level call. Graham’s cleaned-up version only defines EVAL, with the same result. In both cases the definitions occupy about a page of text.
One page of language L to implement L in itself. Of course every language can be implemented in itself. Fortran, Basic, … even COBOL, if you insist. I doubt whether such implementations would be readable. McCarthy’s is, though I prefer Graham’s typography. But what I find even more striking is that the seven primitives occur frequently in all Lisp code. Compare that with languages based on an abstract machine, like Prolog and Java.
This invites a comparison between the complexity of the Java Virtual Machine and that of the software infrastructure needed to support McCarthy’s seven primitives, given today’s typical machine architecture. If these complexities are about the same, then Lisp is a low-level language compared to Java.
Let us return to C. Unlike Lisp, neither the language nor its definition invites us to think in terms of an abstract machine. But Stroustrup’s remark suggests that fragments of an abstract machine can be ferreted out from the semantics of the language. The arithmetic operators correspond to machine instructions, but they are mostly the same as in just about any other language. What is more intriguing is that assignment and dereferencing are frequently used and close to the machine.
From the point of view of primitive operations, the level of Lisp is as low as you can get: if you count function call as primitive, then all Lisp code consists of primitives. C is lower-level than other languages, with the exception of Lisp: in C there are, apart from operators and function calls, many language features that fall into neither category.
This situation suggests a tantalizing project: explicitly define the C abstract machine and define a new language (“TRUE C”?) entirely in terms of operations on this machine.
But even in its current state C is special: it is unique in the imperative part of programming in the same way that Lisp is unique in the functional part. Maybe that is why Graham considered Lisp and C as points of high ground amid the swampy lowlands that make up the landscape of programming languages.
Thanks to Paul McJones for corrections and helpful remarks.
[1] “The Design and Evolution of C++” by Bjarne Stroustrup. Addison-Wesley, 1994.
[2] “The Roots of Lisp” by Paul Graham.
[3] “Short History of STL” by A. Stepanov.
[4] “Recursive functions of symbolic expressions and their computation by machine, Part I.” by John McCarthy. Communications of the ACM 3.4 (1960): 184-195.
[5] Declaring a variable as “register” in C does not imply that registers exist in the computational model of C. It is merely a suggestion to give the variable priority for being kept in the unspecified set of available registers.