In my 2009 article “Software Engineering: From Craft to Industry?” [8] I ventured to disagree. From the final paragraph:
While the processing of material leaves an irreducible residue of work for humans, in the processing of information any work that is routine instantly vanishes. Extracting the routine part from an information processing task is a creative endeavour. It is called programming. In the building of a software system any time you think you have something routine to be handed over to managed cubicle-dwelling drones [9], you are missing an opportunity for automation. In the building of a software system there is only room for creative work. It is Craft, irreducibly so.
At the time I had read John Allen’s “Whither Software Engineering?”. I found it fascinating, but dismissed it as unrealistic and was not convinced of its urgency. This article explains why I changed my mind.
The standard model in software development is, and has always been, to follow the test-debug cycle. I call it the standard model not because of any virtues, but because of the lack of alternatives, unless one counts proving software correct, a notion universally rejected as utterly unrealistic (but more about this later).
The problem with the standard model can be expressed by a truism that is by now so old that nobody dares any more to utter it, or even to remember it. The problem with truisms is that some of them are true. Driven by today’s dire circumstances I’ll resurrect it here:
Testing can prove the incorrectness of code, but not its correctness.
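A minimal illustration, with names and tests of my own invention: the function below survives the obvious tests, yet all that the tests have done is fail to prove its incorrectness.

```c
#include <limits.h>

// Intended: the midpoint of two nonnegative ints.
// It passes the obvious tests, but the sum a + b overflows
// for large arguments, e.g. a == b == INT_MAX.
int midpoint(int a, int b)
{
    return (a + b) / 2; // undefined behaviour on overflow
}

// A version without the intermediate overflow (still for a, b >= 0).
int midpoint_safe(int a, int b)
{
    return a / 2 + b / 2 + (a % 2 + b % 2) / 2;
}
```

Both functions agree on small inputs; no finite battery of passed tests distinguishes them, but a proof of correctness would.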
Ever since the days of Robert Morris, Jr (he of the worm) and Clifford Stoll (he of “The Cuckoo’s Egg”) the implicit thinking has been: any time now the software/hardware substrate will be good enough to network everything in sight: student records, bank accounts, patient records, to mention just some of the more ransomable things. So why wait? In the blind drive to network everything in sight, ever higher towers are being built without waiting for a foundation: teetering towers in a swamp.
Such was the spectre in front of John Allen’s eyes ten years ago when he wrote his remarkable paper [1], precariously published in an obscure corner of the internet. He faces the inescapable fact that the only alternative to the test-debug cycle is proof of correctness. He sketches certain advanced concepts in logic, advanced only in the sense that they are beyond the current undergraduate curriculum. According to these concepts, proofs can be constructive, constructive in the sense of also being programs in a suitable programming language. Allen mentions the programming language ML as an example.
“Whither software engineering?” [1] was presented at the 2008 Workshop on Philosophy and Engineering. My guess is that it has been read by few, if any, of those who are in a position to do anything about the parlous state of systems software. Those few have probably dismissed the paper as an extreme example of academic lack of realism: to redo operating systems as proofs in constructive type theory! Yet … I will let a recent paper [2] speak for itself:
FSCQ is the first file system with a machine-checkable proof that its implementation meets a specification, even in the presence of fail-stop crashes. FSCQ provably avoids bugs that have plagued previous file systems, such as performing disk writes without sufficient barriers or forgetting to zero out directory blocks. If a crash happens at an inopportune time, these bugs can lead to data loss. FSCQ’s theorems prove that, under any sequence of crashes followed by reboots, FSCQ will recover its state correctly without losing data.
The authors use the Coq proof assistant with a new variant of Hoare logic [3, 4]. The correctness properties are stated as theorems in the Coq variant of constructive type theory. The proofs are written by the authors and are checked by Coq running as proof assistant. The extraction feature of Coq converts the Coq programs to Haskell, the functional programming language. The resulting Haskell code runs as a file server that supports unmodified Unix applications such as vim, emacs, git, gcc, make, and a mail server.
There is considerable variation in performance among the various alternative file systems that can be used with Unix. Among the various benchmarks FSCQ is usually the slowest, though not by more than a factor of two compared to the average of the other file systems. Thus the FSCQ project shows that Allen’s vision is technically realistic.
Is the FSCQ project the harbinger of a trend that might rescue the teetering towers? The six authors of [2] represent an unusual mix of skills, covering both systems programming and constructive type theory. The project suggests that logic can play the same role in software development as mathematics plays in the established branches of engineering. It will take a long time before anything can happen on the required scale. The depth of the change necessary makes one doubt whether this is possible. Allen believes it is. To convince his readers he includes a sketch of the history of engineering.
Allen starts by reviewing the history of what is now called “software engineering”. The term was coined [5] at the 1968 NATO conference that was convened under the pressure of what was perceived at the time as “the software crisis”. It had been noticed that the problems encountered in the established branches of engineering were not nearly as severe as the ones in software development. Therefore, a new discipline called “software engineering” was called into existence, by fiat. Allen’s paper helps us understand the difference between engineering and the simulacrum thus called forth.
Established engineering programs require courses in calculus, physics, and linear algebra in the early part of the curriculum, in spite of the fact that the students cannot yet see their use. These courses are required because of the proven effectiveness of their content in engineering. For example, the behaviour of an antenna can be predicted with the theory of the electromagnetic field, and this theory can only be understood with calculus and vector analysis. If, as often happens, a prospective “software engineer” is required to take calculus, then it is not because calculus can help to make better software.
It is not clear whether any science can help. It is only recently, with papers such as Allen’s and the one on FSCQ, that some inklings have appeared as to what kind of science can be helpful. It will take time before this clarifies and then it will take time before it is lodged in the curriculum as solidly as mathematics is in the established branches of engineering. Only then will software engineering deserve the name.
“Whither software engineering?” describes how it took a long time for the established branches of engineering to become based on mathematics. The first application of calculus in engineering may have occurred as early as 1742 with the publication by Benjamin Robins of New Principles of Gunnery. This book was adopted in the École Royale du Génie, the engineering school founded in the mid-eighteenth century in Mézières in France.
A hundred years after this, the use of mathematical methods was still controversial, at least in Britain. This became apparent when the first transatlantic telegraph cable was laid. The mathematical analysis by the physicist William Thomson indicated that the input signal could be of moderate voltage; the resulting weak output signal could be compensated by making the detector at the receiving end extremely sensitive. The Chief Electrician of the cable company dismissed Thomson’s mathematical analysis as “a fiction of the schools” that was contrary to common sense, common sense which dictated unprecedentedly high input voltages commensurate with the unprecedented length of the cable. Subjected to such an onslaught the cable failed within a few weeks [6]. Although this failure, and publications by William Thomson, led to the dismissal of the Chief Electrician, the battle between “the practical men” and “the theoretical men” for the minds of electrical practitioners continued until the end of the 19th century.
The École Royale du Génie was founded in 1748. This was a step, possibly the first, toward placing engineering on a mathematical foundation. Almost 150 years later the transition was not yet complete: in 1893 William Preece was inaugurated as president of the (British) Institution of Electrical Engineers. From his inaugural address:
True theory does not require the abstruse language of mathematics to make it clear and to render it respectable. … all that is solid and substantial in science and usefully applied in practice, has been made clear by relegating mathematical symbols to their proper store place—the study [7].
In the face of continuing, increasingly disastrous failures, the practical men of today do not seem to be looking for an alternative to software whose only credential is that it has been around the test-debug cycle a number of times. Allen’s paper and the FSCQ system may offer hope for an effective alternative. Do they? If not, is software engineering possible?
Thanks to Paul McJones for pointing me to the FSCQ paper and thus providing the motivation to revisit [8].
[1] “Whither software engineering?” by John Allen. Workshop on Philosophy and Engineering, London, 2008.
[2] “Certifying a File System Using Crash Hoare Logic: Correctness in the Presence of Crashes” by Tej Chajed, Haogang Chen, Adam Chlipala, M. Frans Kaashoek, Nikolai Zeldovich, and Daniel Ziegler. Comm. ACM, vol. 60, no. 4 (April 2017), pp 75–84.
[3] “An axiomatic basis for computer programming” by C.A.R. Hoare. Comm. ACM 12.10 (1969): 576-580.
[4] “Ten years of Hoare’s logic: A survey—Part I” by K.R. Apt. ACM Transactions on Programming Languages and Systems (TOPLAS) 3.4 (1981): 431-483.
[5] Software Engineering: Report on a conference sponsored by the NATO Science Committee, Garmisch, Germany, 7th to 11th October 1968.
[6] Oliver Heaviside: sage in solitude by Paul Nahin. IEEE Press, 1987, page 34.
[7] Journal of the Institution of Electrical Engineers, Volume 22 (1893), Address of the President, page 63.
[8] “Software Engineering: From Craft to Industry?” by M.H. van Emden, wordpress, 2009, http://tinyurl.com/n69ymao
[9] I am aware of the biologically unfortunate analogy: it should be “worker bees in cubicles”. The Oxford English Dictionary recognizes the figurative use of “drone”, but there it means a non-working member of the community. But “drones” as in “drones in cubicles” lodged itself in the contemporary idiom: on May 20, 2017 this search string registered 1430 hits on Google.
Logic programming shares with mechanical theorem proving the use of logic to represent knowledge and the use of deduction to solve problems by deriving logical consequences. [1]
X: Ah, I see—a kind of Artificial Intelligence, which I am not so much interested in. Yet, I find a lot of interesting stuff in Logic Programming, the journal, in its first decade, and in several books of that time, Sterling and Shapiro’s The Art of Prolog and O’Keefe’s The Craft of Prolog. Why listen to Kowalski anyway?
Y: He invented “Logic Programming” as a term and substantiated his definition with his book [2]. Moreover, his “Predicate logic as a programming language” [3], established him as a co-inventor, with Colmerauer, of pure Prolog.
X: It seems that the term “Logic Programming” has been hijacked by people interested in what makes Prolog different from other programming languages and in how logic can be used as starting point in the design of programming languages.
Y: One of the ways in which Prolog is different from other programming languages is that it avoids the correctness problem.
X: Oh? How so?
Y: A program in pure Prolog is a sentence of logic. The results of running it are logical implications of that sentence. So, if you write the program as a specification, then the correctness problem is avoided because you are running the specification.
X: Not even Kowalski believes that. In “Algorithm = Logic + Control” [4] he uses as example the sorting of a list. He begins with the definition saying that the result of sorting is an ordered permutation of the input list. He refers to his 1974 paper [3], the one that launched (what came to be called later) logic programming. In [4] he points out that, though different controls give different algorithms, none is efficient enough to be acceptable. Then he presents a definition that reads just like, say, the C program for quicksort in Kernighan and Ritchie [13], and calls it “the *logic* of quicksort” (I have taken the liberty of adding the emphasis).
Y: The fact remains that the pure specification of sortedness is executable.
X: In case you mean the sorting program in [3], here it is, with the minimal changes to make it run under SWI Prolog anno 2017:
pSort(X,Y) :- perm(X,Y), ord(Y).
perm([],[]).
perm(Z,[X|Y]) :- perm(Z1,Y), del(X,Z,Z1).
del(X,[X|Y],Y).
del(X,[Y|Z],[Y|Z1]) :- del(X,Z,Z1).
ord([]).
ord([X]).
ord([X,Y|Z]) :- le(X,Y), ord([Y|Z]).
le(1,2). le(2,3). le(1,3). le(X,X).
Sure, the definition of pSort is acceptable as a specification, but look at the definition of permutation! It is an algorithm for checking permutedness: keep removing elements from the one list and delete each from the other list. If all these deletions succeed and if the other list ends up empty, then the two lists you started out with are permutations of each other.
Y: What’s wrong with that?
X: A specification of sortedness would have to refer to a specification of permutedness and I see an algorithm instead. To be algorithmically neutral you would have to take the mathematical definition, namely “a permutation of a finite set is an invertible mapping onto itself”. Don’t ask me how you say that in logic.
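The same point can be made from the C side (a sketch of my own, not from the dialogue): even the definition-driven check “every value occurs equally often in both lists” becomes an algorithm the moment it is written down.

```c
#include <stdbool.h>
#include <stddef.h>

// Permutedness via the multiset view: xs and ys of length n are
// permutations of each other iff each value occurs equally often in both.
// Deliberately naive and quadratic: it follows the definition closely,
// yet it is still an algorithm, not an algorithm-free specification.
bool is_permutation(const int *xs, const int *ys, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        size_t in_xs = 0, in_ys = 0;
        for (size_t j = 0; j < n; j++) {
            in_xs += (xs[j] == xs[i]);
            in_ys += (ys[j] == xs[i]);
        }
        if (in_xs != in_ys) return false;
    }
    return true;
}
```

Counting occurrences is closer to the mathematical definition than the deletion procedure of perm, but it is a checking algorithm all the same.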
Y: OK, supposing that the example of sorting shoots down the general notion of logic programs as runnable specifications. Is there any way in which logic programming can alleviate the correctness problem?
X: Funny I should be asked, but actually, there is. Not by logic programming primarily, but by logic generally, yes.
Y: Oh? How so?
X: For that we need to go back in the programming literature, to before Colmerauer’s collaboration with Kowalski, namely to Peter Naur’s “snapshots” [5] and Robert Floyd’s “verification conditions” [6]. Floyd considered properties of program fragments expressed as {P}S{Q} (in current notation), where S is the program fragment. The fragment can comprise almost all of a procedure body’s code or as little as a single assignment statement. P is the precondition and Q is the postcondition. These are conditions (also called “assertions”) in the form of a logic formula expressing a relation between values of variables occurring in S.
The meaning of {P}S{Q} is: if P is true at the start of an execution of S and if this execution terminates, then Q holds upon termination. The notation “{P}S{Q}” is the currently used variant of one introduced by C.A.R. Hoare [7]. We call such an expression a “(Hoare) triple”.
Floyd showed how to combine statements in logic about program fragments into a statement about the larger program fragment that results from combining the fragments.
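Concretely, on a toy fragment of my own: {P}S1{Q} and {Q}S2{R} combine into {P}S1;S2{R}. Runtime asserts stand in for the assertions here; they check, they do not prove.

```c
#include <assert.h>

// {x == a} y = x; {y == a}  and  {y == a} y = y + 1; {y == a + 1}
// combine into  {x == a} y = x; y = y + 1; {y == a + 1}.
int fragment(int x)
{
    int a = x;          // name the initial value, in the role of x0
    int y;
    assert(x == a);     // P
    y = x;
    assert(y == a);     // Q: postcondition of S1, precondition of S2
    y = y + 1;
    assert(y == a + 1); // R
    return y;
}
```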
Y: I trust it works for the examples given by Naur and Floyd, but do you know any other ones?
X: I know lots of other examples. My current favourite is the clever algorithm (due to Josef Stein [10]) for computing in logarithmic time the gcd (greatest common divisor) of two numbers. Here it is in C:
// Program 1
void s(int x, int y, int& z)
{ // precondition s: x == x0 && y == y0 && x0 > 0 && y0 > 0
  // purpose: return z == gcd(x0, y0)
  int fac = 1;
inv:
  if (x%2 == 0) goto a; else goto b;
a:
  if (y%2 == 0) { x /= 2; y /= 2; fac *= 2; goto inv;}
  else { x /= 2; goto d; }
b:
  if (y%2 == 0) { y /= 2; goto b; } else goto c;
c:
  if (x == y) { z = x*fac; return; }
  if (x < y) { y = (y-x)/2; goto b; }
  else { x = (x-y)/2; goto d; }
d:
  if (x%2 == 0) { x /= 2; goto d; } else goto c;
}
Y: I see C code, I see labels, but where are the assertions?
X: Oops. Here it is with assertions inserted.
// Program 1a
void s(int x, int y, int& z)
{ // precondition s: x == x0 && y == y0 && x0 > 0 && y0 > 0
  // purpose: return z == gcd(x0, y0)
  // s: x > 0 && y > 0 && x == x0 && y == y0
  int fac = 1;
inv: // x > 0 && y > 0 && gcd(x0, y0) == fac * gcd(x,y)
  if (x%2 == 0) goto a; else goto b;
a: // inv && even(x)
  if (y%2 == 0) { x /= 2; y /= 2; fac *= 2; goto inv;}
  else { x /= 2; goto d; }
b: // inv && odd(x)
  if (y%2 == 0) { y /= 2; goto b; } else goto c;
c: // inv && odd(x) && odd(y)
  if (x == y) { z = x*fac; return; }
  if (x < y) { y = (y-x)/2; goto b; }
  else { x = (x-y)/2; goto d; }
d: // inv && odd(y)
  if (x%2 == 0) { x /= 2; goto d; } else goto c;
}
Here are examples of the triples hiding here:
{a && even(y)} x /= 2; y /= 2; fac *= 2; {inv}
{a && odd(y)} x /= 2; {d}
Floyd’s theorem applied to this example says that if the atomic triples are true, then the one for the whole function (which says that it returns the gcd) is true.
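The triples can at least be probed mechanically: the following is my own instrumented variant of Program 1, with the invariant asserted at each arrival at inv by means of a reference gcd, and a pointer in place of the int& of the original. A failing run would refute a triple; passing runs, being tests, prove nothing.

```c
#include <assert.h>

// Reference gcd (Euclid), used only to state the invariant.
static int gcd_ref(int a, int b)
{
    while (b != 0) { int t = a % b; a = b; b = t; }
    return a;
}

// Program 1 with the assertion at inv made executable.
void s_checked(int x, int y, int *z)
{
    int x0 = x, y0 = y; // remember the inputs for the invariant
    int fac = 1;
inv:
    assert(x > 0 && y > 0 && gcd_ref(x0, y0) == fac * gcd_ref(x, y));
    if (x % 2 == 0) goto a; else goto b;
a:
    if (y % 2 == 0) { x /= 2; y /= 2; fac *= 2; goto inv; }
    else { x /= 2; goto d; }
b:
    if (y % 2 == 0) { y /= 2; goto b; } else goto c;
c:
    if (x == y) { *z = x * fac; return; }
    if (x < y) { y = (y - x) / 2; goto b; }
    else { x = (x - y) / 2; goto d; }
d:
    if (x % 2 == 0) { x /= 2; goto d; } else goto c;
}
```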
Y: I think I see that your example triples are justified. I suppose that, once the assertions are there, one could check all the triples. But how can I find the necessary assertions? To do that I would need to understand the utterly obscure code that you first presented. That is, I would need to already see that the function is correct before I can even start thinking about assertions.
X: I agree that it’s hard to find useful assertions in existing code. The difficulty was acknowledged by Edsger Dijkstra:
When concern for correctness comes as an afterthought, so that correctness proofs have to be given once the program is already completed, then the programmer can indeed expect severe troubles. If, however, he adheres to the discipline of producing correctness proofs as he writes his program, he will produce program and proof with less effort than just the programming alone would have taken [8].
Y: Discipline indeed! More like black magic: “… producing correctness proofs as he writes his program”. I would like to see examples of that.
X: He doesn’t give any in [8]. I suppose that his subsequent books [11, 12] illustrate his idea of “… producing correctness proofs as he writes his program”.
Y: I’d like to go back to that gcd program. You gave two triples as examples. There should be more. I’d like to see them all.
X: The program has access to variables x,y,z, and fac. On entry the values of x and y are x0 and y0. On exit we want to have z == gcd(x0,y0). First a list of the assertions with their labels:
s:   x == x0 && y == y0 && x0 > 0 && y0 > 0
inv: x > 0 && y > 0 && gcd(x0, y0) == fac * gcd(x,y)
a:   inv && even(x)
b:   inv && odd(x)
c:   inv && odd(x) && odd(y)
d:   inv && odd(y)
h:   z == gcd(x0,y0)
Note that all the assertions except s imply inv. Next, the triples:
{s} fac = 1 {inv}
{inv && even(x)}{a}
{inv && odd(x)}{b}
{a && even(y)} x /= 2; y /= 2; fac *= 2; {inv}
{a && odd(y)} x /= 2; {d}
{b && even(y)} y /= 2; {b}
{b && odd(y)} {c}
{c && x == y} z = fac*x; {h}
{c && x < y} y = (y-x)/2; {b}
{c && x > y} x = (x-y)/2; {d}
{d && even(x)} x /= 2; {d}
{d && odd(x)} {c}
Y: I count twelve. Makes me think of one of the quotes from Alan Perlis: “If you have a procedure with ten parameters, you probably missed some.” How do you know that you have all triples?
X: There is a progression in the order in which they are written. In the beginning new assertions are needed. Then you have to add triples starting from the new assertions. But after the last four triples were added, no new assertions appeared.
You don’t find a label s in the code because s is not in any postcondition. You don’t find a label h in the code because h is not in any precondition.
Y: You say that in {P}S{Q} P is the precondition, Q is the postcondition and S is the code. What does {P}S{Q} say as a formula of logic?
X: If P is true in state T and S transforms T to U, then Q is true in U. It connects to code by associating P and Q with locations in the code. Such a statement is true or false. Of course to be of any use, it has to be true. I wrote them down with a progression in mind, but once they are there, their order does not matter. I like to see them as independent “snippets of truth”.
Y: I like that: “programming by gathering snippets of truth”. I would like to make the generic snippet {P}S{Q} more precise. How about the following.
Q is true in a state T iff there is a satisfactorily terminating computation starting from location Q in state T.
And similarly for P. Then I could translate {P}S{Q} to the Prolog dialect of logic as:
p(X,Y,Fac,Z) :- ..., q(X1,Y1,Fac1,Z1).
Here the … stands for Prolog code that describes the code S of the triple, which takes state (X,Y,Fac,Z) to state (X1,Y1,Fac1,Z1).
X: It looks backward; the opposite of the reading of {P}S{Q} in Floyd and Hoare, who reason from P to Q. Made precise in that forward direction, it reads

P is true in a state T iff there is a computation starting from the start location and ending in location P in state T.

And similarly for Q. Then I could translate {P}S{Q} to the Prolog dialect of logic as:

q(X1,Y1,Fac1,Z1) :- p(X,Y,Fac,Z), ... .
Y: Right. There are apparently two translations: a forward one like yours and a backward one like mine. As one can’t do both at the same time, I’ll pick the backward one. The following is the snippet-by-snippet translation.
% Program 2
s(X, Y, Z) :- inv(X, Y, 1, Z).
inv(X, Y, Fac, Z) :- even(X), !, a(X, Y, Fac, Z).
inv(X, Y, Fac, Z) :- odd(X), b(X, Y, Fac, Z).
a(X, Y, Fac, Z) :- even(Y), !, X1 is X/2, Y1 is Y/2, Fac1 is 2*Fac, inv(X1, Y1, Fac1, Z).
a(X, Y, Fac, Z) :- odd(Y), X1 is X/2, d(X1, Y, Fac, Z).
b(X, Y, Fac, Z) :- even(Y), !, Y1 is Y/2, b(X, Y1, Fac, Z).
b(X, Y, Fac, Z) :- odd(Y), c(X, Y, Fac, Z).
c(X, Y, Fac, Z) :- X =:= Y, !, Z is X*Fac.
c(X, Y, Fac, Z) :- X < Y, !, Y1 is (Y-X)/2, b(X, Y1, Fac, Z).
c(X, Y, Fac, Z) :- X > Y, X1 is (X-Y)/2, d(X1, Y, Fac, Z).
d(X, Y, Fac, Z) :- even(X), !, X1 is X/2, d(X1, Y, Fac, Z).
d(X, Y, Fac, Z) :- odd(X), c(X, Y, Fac, Z).
I’ve been so free as to smooth over the clumsy Prolog arithmetic by making use of the following added definitions:
even(X) :- X mod 2 =:= 0.
odd(X) :- X mod 2 =:= 1.
The result is a Prolog program, ready to run, as in
?- X is 123*4567, Y is 123*5678, s(X,Y,Z).
X = 561741,
Y = 698394,
Z = 123.
X: I see what you did: you just translated every snippet to a procedure.
Y: Not a procedure in the conventional sense. For example, you have to somehow bundle the two snippets for inv into one conventional procedure. In Prolog you don’t have to combine any snippets; you translate each of them as is.
X: If you are willing to overlook this trivial difference, then I can do in C [14] what you just did in Prolog.
// Program 3
// the function declarations (not always necessary)
void s(int x, int y, int& z);
void inv(int x, int y, int fac, int& z);
void a(int x, int y, int fac, int& z);
void b(int x, int y, int fac, int& z);
void c(int x, int y, int fac, int& z);
void d(int x, int y, int fac, int& z);

// the function definitions
void s(int x, int y, int& z)
{ // precondition s: x == x0 && y == y0 && x0 > 0 && y0 > 0
  // purpose: return z == gcd(x0, y0)
  inv(x, y, 1, z);
}
void inv(int x, int y, int fac, int& z)
{ // precondition: inv: x > 0 && y > 0 && fac*gcd(x,y) == gcd(x0,y0)
  if (x%2 == 0) a(x,y,fac,z); else b(x,y,fac,z);
}
void a(int x, int y, int fac, int& z)
{ // precondition: a: inv && even(x)
  if (y%2 == 0) inv(x/2,y/2,fac*2,z); else d(x/2,y,fac,z);
}
void b(int x, int y, int fac, int& z)
{ // precondition: b: inv && odd(x)
  if (y%2 == 0) b(x,y/2,fac,z); else c(x,y,fac,z);
}
void c(int x, int y, int fac, int& z)
{ // precondition: c: inv && odd(x) && odd(y)
  if (x == y) { z = x*fac; return; }
  if (x < y) b(x,(y-x)/2,fac,z); else d((x-y)/2,y,fac,z);
}
void d(int x, int y, int fac, int& z)
{ // precondition: d: inv && odd(y)
  if (x%2 == 0) d(x/2, y, fac, z); else c(x,y,fac,z);
}
Come to think of it, this is a way to structure code that is trivial to verify, given its close correspondence to the snippets of truth. We have discovered how to do better than “… producing correctness proofs as he writes his program” by writing a correctness proof before writing the program!
Y: Still, even such C code is at one remove from truth, because it is not itself logic. Pure Prolog is.
X: Minor quibble: the (essential) use of “is” is not part of pure Prolog, which disqualifies the Prolog program as logic. And what are those cuts doing there? Major quibble: I deny that the Prolog clauses are any more logic than properly written C. The Prolog clause states how a problem can be reduced to sub-problems. It’s great that you can do that in logic. But there is nothing more in the Prolog clause than that problem reduction, which can be expressed in C just as well. In fact, in 1932 Andrej Kolmogorov pointed out [9] that intuitionistic logic can be interpreted as a calculus of problems solved and to be solved.
Y: I see that your translation of the snippets to C expresses just as well as what the logic clauses express. The difference between Prolog and C is that in Prolog you can’t do anything but write sentences of logic. Prolog can’t force you to make these sentences true, but it does force you to write your snippets in such a way that they are true or false according to the formal semantics of logic. In C you can write things that have no logical interpretation—most C programmers do it all the time. It may be that just now you were the first ever to write logic in C.
X: Of course my translation of the snippets to C functions resulted in a ridiculous program. The complex structure of Program 1 is caused by the desire to avoid redundant tests for the parity of x and y. Such a test requires only a single shift, one of the fastest instructions. To save some of these, Program 3 repeatedly unleashes the whole protocol of procedure entry and exit.
However, inspection of that code shows that it is an easy victim of tail recursion optimization, a standard technique. So instead of the procedural version, I should have presented the result of this optimization:
// Program 4
void s(int x, int y, int& z)
{ // precondition s: x == x0 && y == y0 && x0 > 0 && y0 > 0
  // purpose: return z == gcd(x0, y0)
  int fac = 1;
inv:
  if (x%2 == 0) goto a; else goto b;
a:
  if (y%2 == 0) { x /= 2; y /= 2; fac *= 2; goto inv;}
  else { x /= 2; goto d; }
b:
  if (y%2 == 0) { y /= 2; goto b; } else goto c;
c:
  if (x == y) { z = x*fac; return; }
  if (x < y) { y = (y-x)/2; goto b; }
  else { x = (x-y)/2; goto d; }
d:
  if (x%2 == 0) { x /= 2; goto d; } else goto c;
}
Y: Well, well. Quite a transformation! Are you sure that all these changes are necessitated by tail-recursion optimization?
X: Yes. It’s perfectly standard. If you are at all a programmer, you do it half asleep.
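For the record, here is the half-asleep transformation on a toy example of my own: when the recursive call is the last thing a function does, its stack frame can be reused, so the call becomes an assignment to the parameters followed by a jump.

```c
// Before: the recursive call is in tail position.
int sum_to_rec(int n, int acc)
{
    if (n == 0) return acc;
    return sum_to_rec(n - 1, acc + n); // tail call
}

// After tail-recursion optimization by hand:
// update the parameters, then jump back to the top.
int sum_to_iter(int n, int acc)
{
start:
    if (n == 0) return acc;
    acc = acc + n;
    n = n - 1;
    goto start;
}
```

sum_to_rec(n, 0) and sum_to_iter(n, 0) both compute 0 + 1 + … + n; only the second runs in constant stack space.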
Y: Did you know that since around 1980 Prolog compilers perform tail recursion optimization? So what shows as procedure calls in my translation of the snippets is lost in compilation.
X: No, I didn’t know that. Does this mean that your Prolog program runs as fast as my optimized C program?
Y: Hey, wait a moment! That result of your manual tail-recursion optimization is identical to Program 1!
X: Isn’t it to be expected that you return to a similar program by following the route that we did?
Y: Of course, but the code is identical. Even the minor inconsistency in formatting is reproduced. What’s going on?
X: No need to freak out. We act under the illusion of free will while in actual fact we may be puppets in somebody’s game. Any time now the puppet master will close down the show: last chance for a wrap-up.
What’s new to me is that one can start by writing Hoare triples for an as yet unknown algorithm, guided by the goal and the constraints in achieving it. In this case the goal was computing the gcd in logarithmic time under the constraint of not repeating a test for parity. It was clear how the set of triples needed to be expanded and it was clear when this set was complete. We could have skipped the detour via Prolog, gone directly to the C functions, and then performed tail-recursion optimization. The route taken made it clear how to verify the code, because we started with the assertions.
Y: For me this conversation has been useful in clarifying Prolog programming. For the programmer, logic has no advantage that is not available in C. The opposite is true for the language designer. Only by writing, with Kowalski’s input, a theorem-prover could Colmerauer and his team have produced the miracle that is the programming language Prolog.
Thanks to Paul McJones and Richard O’Keefe for their help.
[1] “The early years of logic programming” by R.A. Kowalski, Comm. ACM, January 1988.
[2] Logic for Problem-Solving by R.A. Kowalski. North Holland Elsevier, 1979.
[3] “Predicate logic as a programming language” by R.A. Kowalski. Information Processing 74, North-Holland, 1974, pp 569–574.
[4] “Algorithm = Logic + Control” by R.A. Kowalski. Comm. ACM 22:7(1979), pp 424–436.
[5] “Proof of algorithms by general snapshots” by P. Naur. BIT Numerical Mathematics, 6:4 (1966), pp 310–316.
[6] “Assigning meanings to programs” by R. Floyd. Proc. Symp. Appl. Math. vol. 19: “Mathematical Aspects of Computer Science” (1967), pp 19–32.
[7] “An axiomatic basis for computer programming” by C.A.R. Hoare. Comm. ACM 12:10(1969), pp 576–580.
[8] “Concern for correctness as a guiding principle for program composition” by E.W. Dijkstra. Pages 359–367, The Fourth Generation, Infotech, 1971. See also EWD 288.
[9] “Zur Deutung der Intuitionistischen Logik” by A. Kolmogorov. Mathematische Zeitschrift 35.1 (1932): 58-65.
[10] Elements of Programming by A. Stepanov and P. McJones. Addison-Wesley, 2009.
[11] A Discipline of Programming by E.W. Dijkstra. Prentice-Hall, 1976.
[12] A Method of Programming by E.W. Dijkstra and W.H.J. Feijen. Addison-Wesley, 1988.
[13] The C Programming Language by B.W. Kernighan and D.M. Ritchie. Prentice-Hall, 2nd edition 1988, p. 87.
[14] Strictly, C++ rather than C. Calling it C++ would wrongly call up associations with object-oriented programming. What we want here is C as-it-should-be, that is, with only call-by-reference added.
All branches of knowledge had developed vigorously in the first half of the 20th century. All of this development had been sustained by what I like to call a conversation: an open exchange of knowledge in books and journals. Before World War I this was also true for cryptology; afterwards, traffic on that channel fell silent. By the end of the 20th century the cryptology conversation was intense, wide-ranging, and immensely productive of innovations, of which bitcoin technology is but one example. In this post I trace the chain of events that led cryptology from its dark age, which lasted from 1918 to 1967, to its renaissance. My material is obtained, unless otherwise noted, from Crypto, a book by Steven Levy published in 2001 [2].
The first of these events is the effect of the 1960 defection of Martin and Mitchell on David Kahn, a journalist for Newsday. Although Kahn was, like many others, an avid cryptology hobbyist and although as a journalist he kept eyes and ears open for anything to do with his pet subject, the existence of the NSA, as revealed by the Martin-Mitchell defection, was a revelation to Kahn.
After writing a background article for the New York Times Book Review, Kahn received offers from publishers to write a book. MacMillan, the one selected by Kahn, sent the manuscript to the Department of Defense for review. In his exposé of the NSA, The Puzzle Palace, James Bamford wrote that “innumerable hours of meetings and discussions, involving the highest levels of the agency, including the director, were spent in attempts to sandbag the book”. The reaction of the Department of Defense was that “publication would not be in the national interest”. When MacMillan did not respond by undertaking to refrain from publication, the director of the NSA met with the chairman of MacMillan, the editor, and the legal counsel to make a personal appeal for three specific deletions. Kahn considered these surprisingly inconsequential, and agreed. In return, the book was allowed to include the statement that it had been reviewed by the Department of Defense.
Kahn’s The Codebreakers [1] never became a bestseller, but sales remained steady for a long time. Its importance is due to the second link in the chain of events recounted here: it was found by the one person who desperately needed it and who was destined to change the course of the history of cryptology. That person was Whitfield Diffie.
As a high-school student, Diffie had been fascinated by turning messages into cipher-encrypted mysteries. When he was an undergraduate at MIT, the aura of cryptology was eclipsed by the glamour of modern mathematics. When Diffie graduated in 1965, he found an effective way of evading the draft by taking a job at the Mitre Corporation, also in Cambridge, Massachusetts. His supervisor was a mathematician named Roland Silver. The work was for a project jointly undertaken with the MIT AI Lab, which became Diffie’s work location. This was the time when computer time-sharing systems were still experimental, but already in daily use. CTSS, one of these systems, required users to have passwords. Many users were opposed, with the result that the password file, in the care of the system administrator, kept being hacked. Another time-sharing system, ITS, the Incompatible Timesharing System (the name pokes fun at CTSS, the Compatible Time Sharing System), was confined to an inner circle of hackers and did not require passwords: every file was accessible to anyone.
Diffie was strongly in favour of privacy, but was not satisfied with CTSS, where he had to trust his password to the system administrator. This reminded him of his boyhood hobby, cryptography. But encryption only tells you how to protect your own files. If you want to share a file with someone else, you need to share the key, which could not be done securely in CTSS.
When Diffie discussed this problem with his boss, it transpired that a lot more was known about cryptology than was familiar from the hobbyist literature. Silver could infer this much, without being party to any indiscretions, from his contacts at NSA. Diffie was hit as if by a lightning bolt by the twin insights: cryptography is vital to privacy, clear from his experience with computer time-sharing at the AI lab; and crucial information is being withheld on purpose. In fact, this organization acted as if it were the sole proprietor of the relevant mathematical truths. Diffie was electrified by the challenge to rediscover enough of the mathematics to rescue the privacy of computer users, a category of people that, Diffie felt, would soon include many more than the researchers of the AI lab.
By 1969 Diffie was approaching draft cut-off age, so he no longer needed the shelter of a defense contractor like Mitre. Diffie found a job at John McCarthy’s AI lab at Stanford. It is hard to overrate McCarthy’s stature in computer science. As a fresh PhD in mathematics he had invented the concept of Artificial Intelligence. As a young faculty member at MIT he discovered/invented the unique programming language LISP and pioneered computer time sharing. At SAIL, the Stanford AI Lab, he presided over a wide range of eclectic, path-breaking projects. One of the new arrivals, Diffie, found himself in conversations with the boss in which they explored concepts beyond file encryption, such as key distribution and digital signatures.
Neither McCarthy nor Diffie knew enough about cryptology to gain any idea of how such concepts could be realized. By 1972 Diffie had read The Codebreakers [1]. With his girlfriend Mary Fischer he crisscrossed the country in search of people who knew something or could provide pointers. Kahn responded to his cold call with an invitation to visit and allowed him to copy some reports by William Friedman. A rare event occurred in 1974, when Bill Reeds conducted a seminar on cryptology at Harvard. Being back in Cambridge led to new contacts, such as Bill Mann, who was working on cryptography at BBN on a contract for the ARPAnet. Larry Roberts, the leader of this project at ARPA, had been rebuffed when he approached NSA for help with the necessary cryptography.
Inquiries led to Alan Tritter, a researcher at IBM knowledgeable about Identification Friend or Foe (IFF) devices. The way these devices used encryption nudged Diffie a bit closer to his later joint breakthrough with Hellman. Tritter pointed Diffie to his colleague at IBM, Horst Feistel, who had spent years of research on IFF while at Mitre. When Diffie arrived, it turned out that Feistel had left early to spend the weekend at Cape Cod. Next stop: Alan Konheim, the head of the mathematics group. Konheim knew a lot. For such people it is hard to know what they can say, so he said nothing. As a consolation prize Diffie got the suggestion to get in touch with one Martin Hellman, who had worked briefly in the IBM lab.
As it happened, Hellman was at Stanford. Everything fell into place: Hellman and Diffie got on like a house on fire, and Diffie and Fischer got to live in the house of McCarthy, who had left for a year’s sabbatical. The next year, 1975, Diffie and Hellman made their breakthrough: public-key cryptography was born.
When Kahn started his research in the New York Public Library in 1961, there was a lot to catch up on. Just at the time when publication failed to resume after World War I, a spate of inventions came to fruition. In 1919 Gilbert Vernam was granted a patent on an encrypting teletype, soon enhanced to the truly unbreakable one-time tape method. Independently, four inventors patented rotor machines: Arthur Scherbius (Germany) 1918, Hugo Koch (the Netherlands) 1919, Arvid Damm (Sweden) 1919, and Edward Hebern (US) 1921. Several of these names are associated with multiple patents; the simplest account is in Friedrich Bauer’s book [3]. Given these inventions, the combination of Vernam and rotors was but a small step.
With these breakthroughs the balance between code making and code breaking was gone. Nobody had any idea how to break messages encrypted by rotor machines. Moreover, these machines operated at greater speed and accuracy than the manual methods they replaced. Before World War II, research started on the analysis of rotor machines in Poland and in the US. The work of the Polish group escaped to England just before the German assault on Poland in September 1939 (look under “Rejewski” in [1]). The British started a massive code-breaking operation at that time. Primed by the Polish material and the efforts of top mathematicians such as A.M. Turing and I.J. Good, the British became, in deep secrecy, the most advanced in breaking traffic encrypted by rotor machines. Included in the Polish legacy was the use of “bombes”, mechanical devices for automatically trying out large numbers of hypothetical rotor settings.
Developments in the US between the wars were different, mainly due to one person, William Friedman. He was probably by far the most powerful cryptanalyst in the world. He worked for the military starting in the 1920s. In the 1930s he assembled a small group of well-trained people. By the time the war started in Europe this group was reading traffic encrypted with PURPLE, a rotor machine and the highest-grade cipher of the Japanese. The contrast with the British effort is stark: no help from the Poles, no mechanical aids, and only a small group of people.
The post-World War I developments were not secret in the sense of the UK’s Official Secrets Act, which was to keep the work at Bletchley Park hidden from view. In 1930 rotor machines were for sale by the owners of the Damm and Scherbius patents. These companies may have advertised the excellence of their methods, but not their substance; it was up to qualified organizations to get in touch, and it was they who would be briefed.
Vernam was granted a patent in 1919 for a “Secret Signalling System”. The idea is that one can modify a teletype to transmit the exclusive OR (XOR) of two tapes, one containing the message and the other containing the key. At the receiving end an identical key tape is mounted and combined by XOR with the received encrypted message, yielding the message in the clear. When the key tape is random and is used only once, the Vernam system is secure. The Vernam patent was public, as intended by the founding fathers. Yet its description in The Codebreakers contributed to making this book a dangerous one from the point of view of NSA.
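The XOR combination at the heart of Vernam’s scheme can be sketched in a few lines (a modern rendering, not the teletype mechanics of the patent; the names are mine):

```python
import secrets

# One step of Vernam's scheme: XOR a message stream with a key stream.
# Applying the same operation with the same key recovers the message,
# since (m XOR k) XOR k == m.
def vernam(stream: bytes, key: bytes) -> bytes:
    assert len(key) >= len(stream)
    return bytes(s ^ k for s, k in zip(stream, key))

message = b"ATTACK AT DAWN"
key = secrets.token_bytes(len(message))  # random key tape, used only once

ciphertext = vernam(message, key)
assert vernam(ciphertext, key) == message  # the receiver's identical key tape
```

The security hinges entirely on the key tape being random, at least as long as the message, and never reused; reusing a key tape breaks the system.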
It may well be that Kahn’s book contained the state of the art when its first edition was published. This is a remarkable feat for a book aimed at the general public. The next publication to help end the dark period of cryptology was also aimed at the general public: in May 1973 Scientific American published “Cryptography and Computer Privacy” by Horst Feistel [4]. This article described the first advance in cryptography since 1919: the block cipher. DES, the long-serving federal encryption standard, was a refined and scaled-up version of the device described by Feistel; the present standard, AES, is a block cipher in the same tradition, though different in internal structure.
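The construction Feistel described can be suggested by a toy example (the round function, subkeys, and block size here are illustrative, not Feistel’s or DES’s): the block is split into two halves, and each round mixes a function of one half into the other. Because each round undoes itself when re-applied, decryption is the same procedure with the subkeys in reverse order, and the round function need not be invertible.

```python
# A toy Feistel network on 32-bit blocks (round function and subkeys are
# illustrative, not DES). Each round mixes a function of one half into
# the other; decryption is the same procedure with subkeys reversed.
def F(half: int, subkey: int) -> int:
    # an arbitrary, non-invertible mixing function on 16-bit halves
    return ((half * 2654435761) ^ subkey) & 0xFFFF

def feistel(block: int, subkeys) -> int:
    left, right = block >> 16, block & 0xFFFF
    for k in subkeys:
        left, right = right, left ^ F(right, k)
    return (right << 16) | left  # final swap of the halves

subkeys = [0x1A2B, 0x3C4D, 0x5E6F, 0x7081]
c = feistel(0xDEADBEEF, subkeys)
assert feistel(c, list(reversed(subkeys))) == 0xDEADBEEF
```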
In 1934, when Horst Feistel was twenty years old, he immigrated to the US from Germany and started his studies at MIT. The 1941 declaration of war by Germany on the US turned Feistel into an enemy alien; he was placed under house arrest. This meant he could move around Boston, but needed permission to visit his mother in New York. On January 31, 1944 his fortunes changed abruptly: the restraints were lifted and he became a US citizen. The next day he was given a security clearance and began work at the Air Force Cambridge Research Center [5].
Feistel had been interested in cryptography since his teens and mentioned this shortly after arriving at his new job. After a few years he had built a cryptography research group at AFCRC. According to [5], “over a period of several years it made a major contribution to modern cryptography, developing the first practical block ciphers”. The authors of [5] believe that it was the NSA that succeeded in shutting down the cryptographic work at AFCRC. The same fate befell Feistel’s attempts to set up a cryptographic group at the MIT Lincoln Lab and at the Mitre Corporation, where Feistel moved next. Only when he was hired by IBM Research around 1970 could he pursue his lifelong interest, cryptography, without interference.
When Feistel’s article appeared in 1973, it was only the second publication on the subject, after Kahn’s book, since cryptology entered its dark age fifty years earlier. Soon after, something happened that put cryptography in the limelight: the 1975 promulgation by the National Bureau of Standards (NBS, now NIST, the National Institute of Standards and Technology) of DES, the proposed federal data encryption standard. It turned out that shortly before, NBS had published a competition for an encryption standard. Apparently in that short period entries had closed, had been evaluated, and DES, IBM’s entry, had been declared the winner.
This raised several questions. How could the entries have been solicited and evaluated in so short a period? Why was the key eight bits short of the 64 bits one would expect? And why was no rationale given for the wiring of the S-boxes?
Speculation was rife that the whole thing had been rigged between IBM, NBS, and NSA. Grist to the mill of an investigative journalist, who appeared in the form of Steven Levy, whose articles became his book Crypto, published in 2001 [2]. Some of the questions, though not all, were answered to my satisfaction by his findings.
First a bit of background. Around 1970 banks were increasingly in need of cryptography, what with interbank funds transfer and automated clearing by telex. They needed guidance, which, in the absence of public research in cryptology, could only be supplied by NSA. They needed standardization: banks did not want to have to rely on in-house research groups and were not interested in competing on security. Only NBS could provide a standard.
As it happened, it was not some Bankers’ Association that set the process in motion, but the company that supplied most of them with technology: IBM. And within IBM the impetus was a contract with Lloyds Bank of London to provide automatic teller machines [5, p. 66]. Strong encryption was essential, and IBM was on its own. The only expertise existed at NSA, which probably had plenty of strong encryption algorithms. But all this was classified, and so could not be put in the hands of uncleared users. NSA declined to design a new, unclassified algorithm, possibly concerned that such an algorithm would reveal their design philosophy.
A group at IBM in Kingston, New York, headed by Walter Tuchman, got the task of developing the algorithm. Tuchman learned about Lucifer, Feistel’s block cipher, on a visit to IBM Research in Yorktown Heights and decided to use it, but adapted to the constraints imposed by the need to implement the algorithm in a compact hardware unit. The resulting algorithm became known as DSD-1, which IBM decided to enter in the NBS competition for the federal encryption standard. No matter that the deadline had passed: a call from the right person to the head of NBS sufficed to get the call for entries in the competition re-issued and to get IBM’s DSD-1 accepted.
NBS passed DSD-1 on to NSA, which summoned Tuchman and presented him with a list of demands amounting to the creation of a virtual annex of NSA within IBM, to which all further work was to be confined. IBM had no choice in the matter if it were not to abandon the whole project: deployment of the technology would require export licenses. Ergo, by twisting IBM’s arm, NSA ensured that DES (as DSD-1 was renamed), the federal Data Encryption Standard, was finalised in a process under its control.
Let us summarize by reviewing the above questions: the haste is explained by the call for entries having been re-issued for IBM’s benefit, and the opaqueness by NSA’s control over the final stages of the process.
As of 1975 there were still only Kahn (1967) and Feistel (1973) as lone harbingers of the end of the dark age in cryptology. On 17 March 1975, the proposed DES was published in the Federal Register. Public comments were requested; plenty were received. A furore arose about the opaqueness of the process and the eight missing bits of the key. IBM’s failure to provide a rationale for the wiring of the S-boxes caused critics like Diffie and Hellman to let paranoid interpretations run wild. The Washington Post and the New York Times provided plenty of coverage.
1975 was also the year that Diffie and Hellman got their breakthrough in public-key cryptography, as I noted earlier. The concept was published the next year as “New Directions in Cryptography”, IEEE Transactions on Information Theory, November 1976. It was only the concept; no implementation was provided. Ronald Rivest, Adi Shamir, and Leonard Adleman published one in April 1977 as an MIT technical report. It was what has since become famous as RSA public-key cryptography.
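The RSA idea can be illustrated with toy numbers (a textbook-sized example; real keys use primes hundreds of digits long): publishing the pair (n, e) lets anyone encrypt, while decryption requires the privately held d.

```python
p, q = 61, 53                  # two secret primes (toy-sized)
n = p * q                      # 3233: the public modulus
phi = (p - 1) * (q - 1)        # 3120
e = 17                         # public exponent, coprime with phi
d = pow(e, -1, phi)            # private exponent: e*d = 1 (mod phi)

m = 65                         # a message, encoded as a number below n
c = pow(m, e, n)               # anyone can encrypt with (n, e)
assert pow(c, d, n) == m       # only the holder of d can decrypt
```

The security rests on the difficulty of recovering p and q, and hence d, from n alone.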
The authors took the unusual step of sending a copy of the report to Martin Gardner, who ran the “Mathematical Games” column in Scientific American. Gardner was in the habit of ending his column with some homework for his readers, with feedback in the next issue for selected successful solutions. For the RSA column, which appeared in the August 1977 issue, the puzzle was to solve a brief message encrypted with RSA. Because this time not all of the needed details were in the column, readers were invited to send a stamped, self-addressed envelope to MIT to receive a copy of the report.
Thousands of such requests arrived from all over the world. Before R, S, or A could organize an envelope-stuffing party, things started happening. The program for the IEEE International Symposium on Information Theory at Cornell University, scheduled for October, featured a presentation of the RSA work. IEEE received a letter from one Joseph A. Meyer, not identified by any affiliation, but with a home address and a member number, expressing concern about some of the presentations announced. This was the first time that academics heard of ITAR, the US International Traffic in Arms Regulations, and of the fact that cryptographic devices were classified as munitions. Not only the devices themselves were deemed munitions, but also information facilitating them. And presenting such information in the US with non-US nationals present amounted to export. Violations of ITAR could result in fines, arrests, or jail terms.
Thanking Mr Meyer for the timely warning, IEEE took the position that, as long as they notified the presenters, this was not their problem. The notifications went out. In addition to pondering whether it was prudent to present new work in cryptography with non-US nationals in the room, MIT was presented with the fait accompli of a non-US national, in the form of Adi Shamir, not only having been in the room, but being one of the creators of the new work. And what was to be done about the envelope-stuffing party? Include relevant sections of ITAR? The 35-cent stamps provided by Scientific American readers were not going to be enough.
The administrations of MIT and Stanford decided to stick their necks out and assured the scheduled speakers that they would provide legal defense if needed. In their turn, the speakers stuck their necks out and decided to ignore the Meyer letter. The Cornell meeting went ahead as scheduled in October. In December of 1977 the envelope-stuffing party took place, with pizza (as reported in [2]) and beer (as imagined by me). None of the readers solved the message. By the time it was solved, decades later, the column was no more, and none of R, S, and A could remember what the message was.
The new flood of publications in cryptology had started and has continued unabated to the present day. What has also continued, at least for the period covered in Levy’s book, was harassment. This took several forms: secrecy orders on patent applications, threats of prosecution under export regulations, and pressure on the funding of academic research.
In all these cases the government backed down, but only after a vigorous campaign by the victims, which involved paying for lawyers, engaging the media, and writing letters to representatives in Congress.
Those who continue research in the field profit from these successful counter actions.
[1] The Codebreakers by David Kahn. MacMillan, New York, 1967; revised edition, Scribner, New York, 1996.
[2] Crypto by Steven Levy. Viking Penguin, 2001.
[3] Decrypted Secrets by F.L. Bauer. Springer-Verlag, Berlin-Heidelberg, 1997.
[4] “Cryptography and computer privacy” by Horst Feistel. Scientific American 228 (1973): 15-23.
[5] Privacy on the Line: the Politics of Wiretapping and Encryption by Whitfield Diffie and Susan Landau. MIT Press 1998; second edition 2007.
~
And then the 60s started with an absolute miracle, viz. ALGOL 60. This was a miracle because on the one hand this programming language had been designed by a committee, while on the other hand its qualities were so outstanding that in retrospect it has been characterized as “a major improvement on most of its successors” (C.A.R. Hoare).
…
Several friends of mine, when asked to suggest a date of birth for Computing Science, came up with January 1960, precisely because it was ALGOL 60 that showed the first ways in which automatic computing could and should and did become a topic of academic concern. [1]
Algol was a miracle as a language. It was short-lived, but it left a momentous legacy that acted in two ways: in the way the Revised Report on Algol 60 describes the language and in the way subsequent language designers were influenced by being shown what a programming language could be. In celebration of Algol 60 I refer to these designers as “Children of the Miracle”.
The first Children of the Miracle were the members of the Simula team. Although that language quickly followed Algol 60 into oblivion, its distinguishing feature, classes, survived as object-oriented programming in the hands of Bjarne Stroustrup [2] in his C++ language. Although C++ is no longer the most widely used object-oriented language, it is very much alive.
The Simula team was exposed to Algol 60 pretty much in one place and at the same time. It is remarkable that three of the main contributors to Prolog implementation also had Algol as their formative experience, but independently, scattered in space and time. In the remainder of this article I give an account of how they were influenced by Algol.
In the case of Alain Colmerauer I rely on [3, 4, 5, 6]. In 1963 Colmerauer, as a new graduate student, joined a group at the University of Grenoble whose task was to build an Algol-60 compiler for the IBM 7044. Among the various available parsing techniques, the group was attracted to the method of Edgar Irons. Compared to the other recursive approach, that of Brooker and Morris, the one of Irons was more generally applicable, but required non-deterministic choice among production rules. This was Colmerauer’s first brush with non-determinism.
After Colmerauer completed his PhD work, in parsing, he remained in Grenoble to work on two other topics. One of these was to implement an extension to Algol 60 according to a proposal by Robert Floyd for a general mechanism that is not restricted to such non-determinism as may arise in parsing and that can be added to any imperative language.
The other topic was the two-level grammars of Adriaan van Wijngaarden. Colmerauer wrote both a parser and a generator for such grammars. Van Wijngaarden thought that going beyond context-free grammars was important for a better definition of programming languages. Accordingly, these grammars were used in the definition of Algol 68. If context-free grammars are inadequate for the definition of programming languages, then they are all the more obviously inadequate for natural-language processing.
At the University of Montreal, where he had moved in 1967, Colmerauer implemented “Q systems”, a generalization of context-free grammars that has similarities to two-level grammars as well as to the type-0 grammars of the Chomsky hierarchy. Q systems can be used both in parsing and in generating mode. This property makes them attractive for natural-language translation: parse the source-language text to capture a semantic structure, then generate target-language text on the basis of that structure.
The dual-mode property of Q systems makes them also an attractive choice for question-answering systems in natural language: parse the assertions to capture their information content, update the semantic structure, parse the question to retrieve information from the semantic structure, and generate answers on the basis of that information. Such a use suggests a specific kind of semantic structure, namely a sufficiently expressive logic. That is, view a question-answering system as a front-end to automatic theorem-proving.
When Colmerauer left Montreal in 1970 to take up a faculty position at the University of Aix-Marseille, he assembled a small group to develop a question-answering system in French. For the theorem-prover they selected J.A. Robinson’s resolution logic, possibly inspired by Cordell Green’s question-answering system QA3 [7], where Lisp is the language for the assertions and queries.
The most promising work in resolution theorem-proving was seen to be happening at the University of Edinburgh. Colmerauer invited Robert Kowalski for a brief visit in 1971, followed by a longer visit in 1972. The expectations of the Marseille group were met by what they learned about SL resolution, a technique recently developed in Edinburgh. Beyond these expectations there was a surprise: Boyer, Kowalski, and Moore had noticed that positive Horn clauses play a pivotal role in resolution logic: in their presence SL resolution refrains from misbehaviour, and these clauses can be read as context-free production rules. This suggested that the rules be expressed as positive Horn clauses, with SL resolution acting in parsing or in generating mode, as required by the question-answering system.
When the Marseille group learned these results, the relevance to their project was apparent: the Q systems, which were modeled on type-0 grammars, could be replaced by an SL-resolution theorem-prover based on productions in the form of positive Horn clauses. In spite of a resemblance to type-2 grammars, these logic grammars are more expressive because of the availability of parameters, reminiscent of the rules of the two-level grammars of Algol 68.
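The reading of grammar rules as clauses can be suggested in miniature. A Prolog logic grammar would thread the input through clause parameters; the sketch below renders the same backtracking search in Python, with integer positions instead of difference lists and an illustrative two-rule grammar.

```python
# Reading the grammar  s -> "a" s "b" | (empty)  as clauses: each
# alternative for the nonterminal s is a rule that tries to consume a
# segment of the input, yielding every position where it can stop.
# (Prolog logic grammars thread the positions as clause parameters;
# this Python rendering with generators is illustrative.)
def s(words, i):
    # first alternative: "a" s "b"
    if i < len(words) and words[i] == "a":
        for j in s(words, i + 1):
            if j < len(words) and words[j] == "b":
                yield j + 1
    # second alternative: the empty production
    yield i

def parses(sentence):
    # the sentence is accepted if some derivation consumes all of it
    return any(j == len(sentence) for j in s(sentence, 0))

assert parses(list("aabb")) and not parses(list("aab"))
```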
Thus the Kowalski visits resulted in a drastic re-design of the Marseille question-answering system. Instead of Q systems for parsing assertions and queries and for generating answers, with logic restricted to the semantic structures, it became logic for everything: an SL theorem-prover specialized for positive Horn clauses could parse, generate, and make inferences. This theorem-prover was only a step away from a general-purpose programming language.
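How a theorem-prover for positive Horn clauses begins to behave like a programming language can be suggested with a minimal backward-chaining interpreter, here restricted to the propositional case so that no unification is needed (the rule base and all names are illustrative):

```python
# A minimal backward-chaining prover for propositional Horn clauses
# (no variables, hence no unification; the rule base is illustrative).
# Each entry maps a conclusion to alternative lists of conditions
# under which it holds; a fact is a rule with no conditions.
rules = {
    "answer":   [["parsed", "inferred"]],
    "parsed":   [["tokens"]],
    "inferred": [["tokens"]],
    "tokens":   [[]],                      # a fact
}

def solve(goal: str) -> bool:
    # try each clause whose conclusion matches the goal, much as
    # a call tries the definitions of a procedure
    return any(all(solve(c) for c in conditions)
               for conditions in rules.get(goal, []))

assert solve("answer")
```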
Colmerauer got a grant for a year with the goal to produce a “man-machine communication system in French”. The strategy: first create the programming language, then write the required system. The group adopted “Prolog” as the name of the programming language. The most incisive account of the logic kernel of Prolog is Kowalski’s [8].
In 1975 the action moved to Budapest, Hungary, where Péter Szeredi completed his first Prolog implementation. As a student in mathematics, Szeredi had been programming since 1966. He started in Autocode on the Elliott 803 and used it to write assemblers for this machine. He next became involved with Algol 68 and translated parts of the report into Hungarian. Szeredi is credited with discovering an error in the type system, which was later corrected by introducing “incestuous unions”.
As a result of his involvement with Algol 68, Szeredi became acquainted with the Compiler Definition Language (CDL), developed by Cornelis Koster, one of the authors of the Algol 68 report. CDL is closely related, via affix grammars, to the W-grammars in which Algol 68 is defined. As others did, Szeredi found CDL a congenial medium for software development and he used it for systems programs for a new Hungarian computer.
In 1975 the Fortran source code of the second Marseille Prolog implementation reached Hungary together with a few transparencies by David H.D. Warren explaining the main ideas of this interpreter. By the time another group in Budapest had overcome its problems in porting the Fortran code to the locally available machine, Szeredi had completed a new Prolog implementation written in CDL. He credits the similarity of CDL with Prolog for this quick success. See [9] for more information about Szeredi’s work in connection with Prolog.
As Algol plays such an important part in the causal chain connecting Szeredi’s early programming experience with his Prolog implementations, I count him among the Children of the Miracle.
Starting in 1976 Keith Clark took the lead in implementing a sequence of Prolog-like languages. These languages exploited the non-sequential semantics of Horn clauses, using control structures such as coroutines and guards, or parallelism. Let us look at Clark’s pre-Prolog computing experience.
After completing his undergraduate work in logic and philosophy, Clark continued his academic study by taking the Computer Science Conversion diploma course at Imperial College, London. There were no examinations; one graduated on the basis of a dissertation. At the start of the work he was handed a listing of an implementation of Euler on the IBM 7094, written in Algol 60. “Do something with this.”
Clark extended the language with list-processing capabilities and liberalised the syntax to allow var declarations to appear anywhere in procedure bodies. In addition to introducing new primitives, he had to extend the BNF grammar of Euler (which associated abstract-machine code generation with most of the grammar rules), add extra abstract-machine instructions, and change the definition of the abstract interpreter for code sequences in reverse Polish. He ended up producing a usable implementation of the extended Euler. In this way Clark was exposed to a double dose of Algol 60: by understanding the language well enough to modify the implementation, and by immersing himself in Euler, a language inspired by Algol 60.
The miracle that was Algol 60 exerted its influence through the language itself as well as via derivatives such as Simula, Euler, and Algol 68. For the Prolog pioneers these languages were the formative experience.
Thanks to Keith Clark, Paul McJones, and Péter Szeredi for help with this article.
[1] Edsger W. Dijkstra: “Computing Science: Achievements and Challenges” (EWD1284) http://tinyurl.com/znbzyd7
[2] Stroustrup, Bjarne. The design and evolution of C++. Addison Wesley, 1994.
[3] Cohen, Jacques. “A view of the origins and development of Prolog.” Communications of the ACM 31.1 (1988): 26-36.
[4] Cohen, Jacques. “A tribute to Alain Colmerauer.” Theory and Practice of Logic Programming 1.06 (2001): 637-646.
[5] Colmerauer, Alain, and Philippe Roussel. “The birth of Prolog.” History of programming languages—II. ACM, 1996.
[6] Kowalski, Robert A. “The early years of logic programming.” Communications of the ACM 31.1 (1988): 38-43.
[7] Green, Cordell. “Theorem proving by resolution as a basis for question-answering systems.” Machine intelligence 4 (1969): 183-205.
[8] Kowalski, R. “Predicate logic as a programming language”. Proc. of IFIP Congress ’74, pp. 569-574, North Holland.
[9] P. Szeredi. “The Early Days of Prolog in Hungary”. ALP Newsletter, Vol. 17, No. 4, November 2004.
PS
Alain Colmerauer, January 24, 1941 – May 12, 2017.
We mourn a great scientist and a dear friend.
What is judged to be essence is in the eye of the beholder. I am more interested in what the members of the Algol family have in common and perhaps even with languages not usually considered as members. For example, Prolog. Soon after completion of this language in Marseille, Robert Kowalski wrote a paper (published as [6]) that established a procedural interpretation of a form of first-order predicate logic. Let us examine this interpretation to see whether this gives a hint concerning the essence of Algol as a procedural language.
Kowalski considered logic in the form of clauses, with resolution as inference rule. A clause is a set of atomic formulas, each possibly negated. Here is an example of a clause with three atomic formulas, two of which are negated:
{sibling(X,Y), ~parent(Z,X), ~parent(Z,Y)}
A Horn clause is a clause with at most one unnegated formula. It is a positive Horn clause if there is such an unnegated formula, as is the case here. These are of special interest because they can be read as a rule with a conclusion followed by zero or more conditions. The above example can be read as
sibling(X,Y) if there exists Z such that
parent(Z,X) & parent(Z,Y)
which actually makes sense if one accepts half brothers and half sisters as siblings. Note the “there exists” qualifying the variable Z that occurs in a condition, and not in the conclusion.
Another example, more typical of programming, is the Horn clause
sort(X,Y) if there exist X1, X2, Y1, Y2 such that
split(X, X1, X2) &
sort(X1, Y1) & sort(X2, Y2) &
merge(Y1, Y2, Y)
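Under the procedural reading discussed below, this clause is mergesort: the conclusion is the procedure heading and each condition a call. A deterministic Python transcription (the split chosen here, odd and even positions, is just one of many that satisfy the clause):

```python
# The sort clause transcribed into Python: the conclusion sort(X, Y)
# becomes the procedure heading, each condition becomes a call, and the
# existential variables X1, X2, Y1, Y2 become intermediate results.
def split(x):
    return x[0::2], x[1::2]               # split(X, X1, X2)

def merge(y1, y2):                        # merge(Y1, Y2, Y)
    out = []
    while y1 and y2:
        out.append(y1.pop(0) if y1[0] <= y2[0] else y2.pop(0))
    return out + y1 + y2

def sort(x):                              # sort(X, Y)
    if len(x) <= 1:                       # base case for short lists
        return list(x)
    x1, x2 = split(x)
    return merge(sort(x1), sort(x2))

assert sort([3, 1, 4, 1, 5, 9, 2, 6]) == [1, 1, 2, 3, 4, 5, 6, 9]
```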
In his paper [6] on the procedural interpretation of Horn clauses, Kowalski pointed to the following similarities:
- The conclusion of a positive Horn clause acts as a procedure heading: the predicate symbol is the procedure name, its arguments the formal parameters.
- The conditions act as the procedure body: a set of procedure calls.
- Unification acts as the parameter-passing mechanism.
Kowalski claimed that these similarities showed not only that logic can be a programming language, but even a procedural one. At this point in the argument a typical member of Kowalski’s IFIP 74 audience must have dismissed the claim as preposterous: surely something more than definitions of and calls to procedures is needed to make a procedural programming language. What about branching? What about iteration? According to Kowalski a program would never do anything but call procedures. Of course procedures were widely acknowledged to help in organizing one’s code. But, from the conventional point of view, a procedure is overhead in the sense that it postpones the computation that is the purpose of the code. A logic program would forever be calling procedures; would anything ever happen?
The punch line of Kowalski’s paper [6] is the observation that a language such as the one he described actually existed [1]. By the time the paper was published in 1974, this language had already been used for substantial applications.
That language is Prolog.
Kowalski’s characterization can be regarded as the view that logic is the essence of at least one procedural language. If logic is the essence of one procedural language, could it be that it is the essence of another one, for example Algol? To see if this is the case let us see what happens if one removes inessential features from Algol. In choosing what to eliminate I will be guided by what would make Algol more like Prolog.
In spite of its succinct definition in 34 pages [6a], Algol 60 is a deep and complex affair [7,8]. This is mainly because of its call-by-name parameter mechanism. Here I make a drastic simplification: only allow actual parameters that are primitive data: Booleans, integers, and reals. In this way labels, procedure names, and arrays are excluded. What remains I call “first-order Algol”. Furthermore, to keep things simple, I assume that procedures are closed, that is, that names in procedure bodies name either local variables or formal parameters.
To get at the essence, let us consider what remains of first-order Algol after we omit those of its features that are replaceable.
Is there anything left? Yes, declarations of and calls to procedures. Aren’t they dispensable as well? Of course they are. This is proved by the first Fortran and by other pre-Algol languages. But here I am concerned with procedural languages.
My next step is to replace “if…fi” with the help of Floyd’s proposal [5] for introducing nondeterminism into imperative languages. The following quote summarizes the proposal.
Nondeterministic algorithms resemble conventional programs as represented by flowcharts, programming languages, machine-language programs, etc., except that
- One may use a multiple-valued function, choice(X), whose values are the positive integers less than or equal to X. (Other multiple-valued functions may be used, but this one is adequate.)
- All points of termination are labeled as successes or failures.
Floyd intended his proposal to apply to flowcharts as well as to programs written in machine language or in a high-level language. As we want to do as much as possible with procedures, Floyd’s “points of termination” become returns from procedures. There must be two kinds of return: successful and failed. We identify the conventional return with success. We add a statement of the form
if <condition> FAIL
to return with failure. When a procedure p calls a procedure q that returns with failure, then p itself returns with failure.
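One way to sketch this failure semantics in Python is to model a failed return as an exception, so that failure in a callee automatically propagates to the caller (the names `Fail`, `FAIL_if`, `p`, and `q` are illustrative, not from the text):

```python
class Fail(Exception):
    """A failed return; it propagates to the calling procedure."""

def FAIL_if(condition):
    # The statement 'if <condition> FAIL' as a helper function.
    if condition:
        raise Fail()

def q(n):
    FAIL_if(n < 0)    # q returns with failure on negative input...
    return n * n

def p(n):
    return q(n) + 1   # ...and a Fail raised in q propagates out of p.
```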
After this detour into nondeterministic primitives, we return to what to do with the branching of Algol. We already proposed to replace it by the if…fi of Dijkstra’s guarded-command language. In turn we propose to replace this in a way that we illustrate with the example:
if G1 -> B1 | G2 -> B2 fi
where G1 and G2, the guards, are Boolean expressions and B1 and B2 are statements, possibly compound. To eliminate if…fi, we replace this example of the alternative construct by the procedure call p(…) and add the declarations for the new procedure name p:
p(…) { if (not G1) FAIL; B1 }
p(…) { if (not G2) FAIL; B2 }
Note that these are two declarations with identical procedure headings. The effect of a call to procedure p is the usual one, except that it is left undetermined which of multiple matching declarations responds to the call. In this way we replace Floyd’s nondeterministic operator choice(X) by making the procedure call itself the nondeterministic operator.
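A Python sketch of this construction, in which the two declarations of p are kept as separate clause functions and the call tries the matching declarations (here simply in textual order, one legitimate way of resolving the nondeterminism; the guards and bodies are invented for the illustration):

```python
class Fail(Exception):
    """Signals a failed return from a clause."""

def clause1(x):
    if not (x >= 0):          # guard G1
        raise Fail()
    return ("nonneg", x)      # body B1

def clause2(x):
    if not (x <= 0):          # guard G2
        raise Fail()
    return ("nonpos", x)      # body B2

def p(x):
    # Which of the matching declarations responds is left open;
    # this implementation simply tries them in textual order.
    for clause in (clause1, clause2):
        try:
            return clause(x)
        except Fail:
            continue
    raise Fail()              # the call itself fails if no clause succeeds
```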
With these changes in place, first-order Algol has reached the degree of similarity to Prolog that I aimed at. The above declarations in this version of Algol correspond to the Prolog declarations
p(…) :- G1, B1.
p(…) :- G2, B2.
where the arguments in each heading are pairwise different variables.
The semantics of a deterministic procedure is a function from states to states. The semantics of a nondeterministic procedure is a multivalued function from states to states; that is, it is a relation on states. With the assumption that procedures be closed, the behaviour of a procedure p is characterized by the values of the actual parameters after completion of a successful call. In first-order Algol these values are restricted to be Booleans, integers, or reals. Each possible terminating successful call to procedure p contributes one n-tuple (or, because of the possibility of nondeterminism, more n-tuples) to the relation that is the proposed meaning of p. Thus the special case of Algol 60 represented by first-order Algol was already in 1960 a method of defining n-ary relations by mutual recursion. I’m not sure how one would go about this task when given only conventional mathematical notation and the axioms for set theory.
It may be that Algol 60 represented an innovation in this respect. What, apart from conventional mathematical notation, do we have in addition to Algol as formalism for talking about n-ary relations? There is a simple and widely known such formalism, first-order predicate logic: the semantics of a predicate with n free variables is an n-ary relation over the domain of discourse. This logic therefore seems the natural home for the definition of n-ary relations. Among other things, Kowalski’s paper [6] describes the formal use of first-order predicate logic for defining n-ary relations, possibly recursive, possibly mutually so.
Thus we have at least two formalisms for the definition of relations, Algol 60 and logic, in chronological order. Of course there may be more. But why would one need any formalism? Can’t we just use the informal mathematics in which the Zermelo-Fraenkel axioms for set theory are expressed? Relations are defined as subsets of Cartesian products. To justify the existence of such subsets one appeals to the Axiom of Separation. This axiom uses a formula of first-order predicate logic. Thus it appears that for the definition of relations one can’t get around logic. Apparently Kowalski did not add logic as just another formalism in which relations can be defined: he was down to bedrock.
Yet Kowalski only defined relations over Herbrand universes. His predecessor in the definition of relations, Algol 60, defined relations over Booleans, integers, and reals. That stimulates my interest in defining relations over arbitrary data types. This has resulted in [9], a paper that relies on logic more than turned out to be necessary. Subsequent work gives informal set theory an important role, but formal logic still plays a part.
We started out looking for an Essence of Algol in an inclusive sense, looking for a characterization of Algol that was shared as much as possible by subsequent procedural languages. Prolog is a suitable sample for comparison because it is, as characterized by Kowalski, the most extreme procedural language. I argued that that which unites all procedural languages is that they define n-ary relations in definitions that may be recursive and possibly mutually so. Therefore the essence of Algol is shared with other procedural languages and consists in their being formalisms for the definition of relations. Thus relations play the role in procedural languages that is played by functions in functional programming languages. Procedural languages are relational programming languages.
In this sweeping conclusion I left some loose ends. For example, I considered using Algol without branching and iteration, but what about assignment? I left that in, while Prolog does not have assignment. There is no prospect of replacing assignment by procedure call in Algol: a call only postpones computation, and computation ultimately happens by evaluating an expression and assigning its result to a variable. All I can do about assignment is to push it out of sight. That is, to ban it from my backyard by introducing a procedure where the assignments happen: one can’t tell from a procedure call whether there is an assignment statement in its body.
Relegating assignment statements to somewhere they can’t be seen suggests a style of writing programs that I call “structured procedural programming”. This term is intended to recall the “structured programming” that became de rigueur in the 1970s and that is still the norm today. Structured programming was a response to the sudden and acutely perceived need to eliminate the goto statement. One would expect the procedure call to be the prime candidate for its replacement. Oddly, the replacements were branching and iteration.
Enter a complementary form of structured programming: it not only excludes jumps, but also branching and iteration. I would like to be able to say that this complementary form only uses declarations of and calls to procedures, but, as admitted earlier, I cannot eliminate assignment; I can only push it out of sight.
Structured (procedural) programming avoids jumps, branching, and iteration, as described above. It distinguishes itself by using assignment statements only as far as necessary, as glue code for procedure calls, so that assignments are confined as much as possible to purely imperative procedure declarations.
I am grateful to Paul McJones for his valuable suggestions.
[1] Alain Colmerauer: “Un système de communication homme-machine en Français” http://tinyurl.com/j4fknrf consulted October 14, 2016.
[2] E.W. Dijkstra: “Go to statement considered harmful.” Communications of the ACM 11.3 (1968): 147-148.
[3] E.W. Dijkstra: “Guarded commands, nondeterminacy and formal derivation of programs.” Communications of the ACM 18.8 (1975): 453-457.
[4] E.W. Dijkstra: A discipline of programming. Prentice-Hall, 1976.
[5] Robert W. Floyd: “Nondeterministic algorithms.” Journal of the ACM (JACM) 14.4 (1967): 636-644.
[6] R.A. Kowalski: “Predicate logic as a programming language”. Proceedings of IFIP 1974, North-Holland, pp. 569-574.
[6a] Peter Naur (editor): “Revised report on the algorithmic language Algol 60.” Numerische Mathematik 4.1 (1962): 420-453.
[7] John C. Reynolds: “The Essence of Algol”. In Jaco W. de Bakker and J. C. van Vliet, editors, Algorithmic Languages, pp. 345-372. North-Holland, Amsterdam, 1981.
[8] Gauthier van den Hove: “Dissolving a Half Century Old Problem about the Implementation of Procedures”. To appear in Science of Computer Programming. http://www.fibonacci.org/GHE8.pdf
[9] M.H. van Emden: “Logic programming beyond Prolog”. https://arxiv.org/pdf/1412.3480.pdf
[10] M.H. van Emden: “Kinds of programming languages”. http://tinyurl.com/j99lqlz
I doubt whether Jacquard, with his punched cards for controlling looms, thought of card punchings as utterances in a language. I suspect that it is only with the benefit of hindsight that we recognize them as predecessors of the twentieth-century programming languages. Yet, at some point in time, in some place, there must have been a person to whom it first occurred that the programming tool under consideration was a language. I suspect that this occurred soon after the first stored-program electronic computer was running in 1949. At that time there wasn’t anything suggestive of language: there were lists of instructions in the form of sequences of octal digits to be entered via switches in successive memory locations. A sensible user must have first prepared a sheet of paper with a column of fixed-length octal numerals, one for each instruction.
In the beginning of this period, these numerals may not have been thought of as “words”, nor may sequences of such items have been considered utterances in a language. For example, Konrad Zuse referred to such documents as computation plans expressed in a calculus (hence “Plankalkül”, 1944). A paper by Knuth and Pardo [8] helps pin down the moment when programming aids were first regarded as languages. They quote a 1951 report by Arthur Burks whose title mentions a “program language” [3]. This makes Burks my best bet for having been the first to make the conceptual leap that connects these programming artifacts to the world of language.
Within a quarter century there existed a motley collection of programming languages [12]. One way to get some insight into the collection is to classify, as Sammet does with her 24 categories. In this article I propose several categorizations, each characterized by a question: Is the language designed by a committee?, Is it invented or discovered?, Is it conceived as a single-application language?, and, finally, Does it have an essence?.
This is the criterion used by Frederick Brooks [2] in distinguishing software products more generally. He published two lists, one of products that “excited passionate fans”, the other of unexciting but useful items. In the former category he puts Unix, APL, Pascal, Modula, Smalltalk, and Fortran; in the latter he places Cobol, PL/I, Algol, MVS/370, and MSDOS. Brooks says that the difference is explained by whether the language or operating system was designed by a committee.
I think Brooks was onto something here, though we should add two items. If ever a language excited a devoted fandom, it is Lisp. If ever a language was further away from being committee-designed, it is Lisp: it was basically a one-man show, if one is willing to overlook crucial contributions of Herbert Gelernter and Steven Russell. Close to the one-man show, and still far away from the committee, are the small teams with a dominant designer: Smalltalk (Alan Kay) and Prolog (Alain Colmerauer). I want to add Prolog to Brooks’s list because it is one of those languages people can fall in love with.
Brooks puts “Algol” (presumably Algol 60) in the category of committee-designed. Factually this is correct, but as a classification it is misleading, because Algol 60 is not typical of committee designs. In fact, it is a miracle, a term I borrow from the following quote.
And then the 60s started with an absolute miracle, viz. ALGOL 60. This was a miracle because on the one hand this programming language had been designed by a committee, while on the other hand its qualities were so outstanding that in retrospect it has been characterized as “a major improvement on most of its successors” (C.A.R. Hoare).
…
Several friends of mine, when asked to suggest a date of birth for Computing Science, came up with January 1960, precisely because it was ALGOL 60 that showed the first ways in which automatic computing could and should and did become a topic of academic concern. [5]
The poet William Butler Yeats is said to have remarked that prose is endlessly revisable, while a poem snaps shut, like a box. Lisp and Prolog give me this feeling. These languages seem to have been discovered rather than invented. Algol 60 and Smalltalk are ingenious inventions rather than discoveries. When Paul Graham [6] distinguishes C and Lisp as high points surrounded by the “swampy ground” of other programming languages, I guess he had this aspect in mind. Of course he did not mean Common Lisp (committee-designed), but the interpreter of McCarthy’s 1960 paper [10], possibly cleaned up as in Scheme.
Perhaps I had better start by explaining how to tell routine programming projects from the other ones: if it’s best to use an existing language, then the project is routine. Consider some of the other projects.
The advice taker never saw the light of day. It is not clear whether Dynabook ever did. All four languages escaped from their formative projects. They showed that when a man is confronted with an ambitious project, “it concentrates his mind wonderfully”.
What is judged to be essence is in the eye of the beholder. I am more interested in what the members of the Algol family have in common and perhaps even with languages not usually considered as members. This could be a topic for a future essay.
Thanks to Paul McJones for providing valuable information.
[1] Daniel G. Bobrow: “If Prolog is the Answer, What is the Question?” IEEE Transactions on Software Engineering, vol. SE-11, no. 1, November 1985.
[2] Frederick P. Brooks, Jr.: “No silver bullet”. Computer vol. 20, no. 4, April 1987, pp. 10–19.
[3] Arthur W. Burks: “An intermediate program language as an aid in program synthesis”, Engineering Research Institute, Report for Burroughs Adding Machine Company (Ann Arbor, Michigan: University of Michigan, 1951), ii + 15 pp.
[4] Alain Colmerauer: “Un système de communication homme-machine en Français” http://tinyurl.com/j4fknrf consulted October 14, 2016.
[5] Edsger W. Dijkstra: “Computing Science: Achievements and Challenges” (EWD1284) http://tinyurl.com/znbzyd7
[6] Paul Graham: “The Roots of Lisp”. http://www.paulgraham.com/rootsoflisp.html
[7] “Online Historical Encyclopaedia of Programming Languages”. hopl.info consulted October 14, 2016.
[8] Donald E. Knuth and Luis Trabb Pardo: “The early development of programming languages.” STAN-CS-76-562, August 1976. A history of computing in the twentieth century (1980): 197-273. The Stanford report is available as http://tinyurl.com/hsuwwrl (consulted October 16, 2016). It reports that the paper was commissioned by the Encyclopedia of Computer Science and Technology, Jack Belzer and Allen Kent (eds.)
[9] John McCarthy: “Programs with common sense”. In Mechanization of Thought Processes vol. I. Her Majesty’s Stationery Office, London 1959.
[10] John McCarthy: “Recursive functions of symbolic expressions and their computation by machine, Part I.” Communications of the ACM 3.4 (1960): 184-195.
[11] John C. Reynolds: “The Essence of Algol”. In Jaco W. de Bakker and J. C. van Vliet, editors, Algorithmic Languages, pp. 345-372. North-Holland, Amsterdam, 1981.
[12] Jean Sammet: “Roster of programming languages for 1976-77” ACM SIGPLAN Notices, 11/1978, Volume 13, Issue 11.
A breakthrough in mathematical logic comes with a venerable pedigree. I remember reading about some pundit in the early 19th century reviewing the status of Aristotle’s contributions. Until the 16th century the teachings of Aristotle reigned supreme. Then the Copernican revolution demolished Aristotle’s cosmology. Galileo’s experiments and philosophy demolished Aristotle’s physics. The chemists had debunked Earth-Water-Air-Fire. Reviewing the wreckage, the pundit wondered whether anything survived. The answer was yes: Aristotle’s logic had remained unassailed. Moreover there was no prospect of the new scientific method adding anything to it. That was not surprising, the pundit concluded, because logic concerns nothing less than the laws of thought, and we moderns have no better access to these than Aristotle did.
It was ironic that around the same time Boole published The Laws of Thought [1]. This book not only added to Aristotle’s logic, but it also made further development less inconceivable. For example, although Aristotle and Boole made it possible to reason about Greekness and mortality, it was not clear how formal logic could help with statements such as “for every number there is one that is greater.” For an account of this next step I will follow Robinson’s own account [2] of the history of the field to which he made his contribution.
In 1879 Gottlob Frege published a booklet containing an exposition [3] of “Begriffsschrift”, a German neologism that translates to “concept writing”. The Begriffsschrift was not only expressive enough for “for every number there exists one that is greater”, but had as its goal nothing less than to analyze completely the formal structure of pure thought and to represent this analysis in a systematic and mathematically precise way. In acknowledging all these achievements we need to see through Frege’s idiosyncratic presentation of formulas, which he defended by the observation that the comfort of the typesetter is not the summum bonum. Whitehead and Russell steered the notation back to an algebraic style, as Boole had first done.
Frege’s work opened two lines of research: semantics and proof theory. Semantics inquired into the nature of the concepts themselves, asking questions like “what is a number?” or “what is infinity?”, much debated around the turn of the century. This line led to Whitehead and Russell’s Principia Mathematica and to Zermelo’s axiomatic set theory. The proof-theoretical effects of Frege’s work led to the formalization of processes of deduction and, eventually, to what we now call “algorithms”.
As far as Robinson’s work is concerned we can restrict our attention to proof theory. In this line of research not much happened between 1879 and the work of Löwenheim in 1915, which began a fruitful period of exploration culminating in the fundamental theorem of the predicate calculus: the fact, which Frege took on faith, that his concept notation is a complete system, that it actually can do everything it was intended to do. The intention behind the predicate calculus was that it should provide a formal proof of every sentence in the language that is logically valid, and that this proof should be systematically constructible, given the sentence. This was proved independently around 1930 by Kurt Gödel, Jacques Herbrand, and Thoralf Skolem. It is to these investigators that we owe today’s predicate-calculus proof procedures.
Of course, in the 1930s no computers existed that could execute these proof procedures. In 1953 the mere existence of computers prompted a philosophy professor, Hao Wang, to start writing programs to prove theorems. Computers had struck him as conceptually elegant and a proper home for the obsessive formal precision that characterised mathematical logic, something that mathematicians find irrelevant, pedestrian, and an obstacle to creativity.
Probably the very first theorem-proving program was the one Davis wrote in 1954 to prove theorems in Presburger arithmetic. Other early work included Gilmore’s program implementing Beth tableaus and a new algorithm implemented in 1960 by Davis and Putnam. This early work launched the new area of Automated Theorem Proving, which attracted many new entrants by its novel combination of psychology (how do humans do it?), logic (to tell us what counts as a proof), mathematics (where to find things to be proved), and technology (how to get a computer to do the work).
Wang and the other pioneers found that the proof procedures of the 1930s contained steps that were seen as do-able “in principle”, which was all that had been considered necessary. Only when computers became available were people forced to reduce such steps to algorithms that could be executed in a reasonable amount of time. A campaign of improvement started with Prawitz in 1960 and culminated in the mid-seventies.
This campaign naturally divides into two periods: pre-Robinson and post-Robinson. These periods are separated by Robinson’s “A machine-oriented logic” written in the summer of 1963 [4]. The pre-Robinson contributions of Prawitz and of Davis/Putnam concerned the avoidance of superfluous instantiations. This line of research was closed by Robinson’s resolution inference step which incorporated the unification algorithm with the property of yielding most general substitutions. This property implied that all superfluous instantiations had been avoided. “A machine-oriented logic” was published as [5]. It described resolution logic and gave the main results.
The elegance and simplicity of resolution logic allowed one to see a whole new vista of redundancies. This started the search for restrictions of resolution opportunities that would not compromise completeness of the resolution proof system. An early step in this direction was taken by Robinson himself. He called it hyperresolution and wrote the paper on it later in 1963. I quote from [4]:
During these same months [summer 1963], back at Rice [University], I wrote my paper on hyper-resolution, which appeared in 1964—a year before the resolution paper itself, which it presupposed! It turned out that the 1963 resolution paper had lain, waiting to be refereed, for almost a whole year at MIT. I gather Marvin Minsky had been asked by the JACM to referee it. It was indeed being discussed openly in his group and they had even written about it in their internal reports. The first I knew about all this was about a year after submitting the paper, when I received a phone call from Minsky to ask my permission to use it as one of the source materials in his AI seminar! I thought (and still think) that this long delay in refereeing the paper was very bad, and so did the Editor of the JACM, Richard Hamming. I complained to him, and he got on to Minsky, and my paper was finally returned to me with a request for some small revisions. I quickly sent Hamming a revised version, and it appeared in January 1965.
Robinson had started working in automated theorem-proving during a summer visit to the Argonne National Laboratory, where he built up a group for research in this area in the summer of 1961 and several subsequent summers. Though resolution logic was adopted by some workers in automated theorem-proving elsewhere, it remained one of several alternative methods. Its role in automated theorem-proving remained uneventful in comparison with the role resolution logic was to play in artificial intelligence. There it was embraced, then rejected, and went on to spawn the new research area that came to be called Logic Programming.
Of course the first to know about resolution logic was the Argonne group. Because of Hamming’s irregular choice of Minsky as referee for [5] (irregular because there were several reputable workers in automated theorem proving at the time), it became known at MIT.
The second escape of resolution logic beyond automated theorem proving came about as follows. Quoting Robinson [6]:
It was Cordell [Green] who was a very early adopter [of resolution logic]—in fact it was in late 1963 at Rice [University], in his final undergraduate year there, when resolution figured prominently in my lectures (in 1964 I thought and talked about little else!). I think Cordell probably thought of his QA idea [using resolution logic for a question-answering system] about that time. Off he went to Stanford in mid 1964 to join McCarthy and start his PhD program.
John McCarthy was Mr Artificial Intelligence himself: he had invented the term in 1956. In 1959 he published a paper [7] in which he proposed theorem proving by computer to implement an intelligent agent. This was very different from and independent of what the automated theorem proving people had in mind. The kind of theorem McCarthy had in mind was “I can get to the airport”, to be inferred from a body of premisses including such statements as “I am at my desk” and “From my desk one can get to the airport”. Of course other premisses were needed. McCarthy was interested in what would constitute an adequate set.
In [8] and in his thesis [9] Green works out these ideas of McCarthy’s. He goes on to establish the beginnings of what has since become known as “logic programming”. However, the state of resolution logic at the time of Green’s work prevented it from gaining the recognition it deserved. Two reactions were possible. The first was to conclude that the very idea of solving problems by resolution logic is flawed. This was the reaction of Minsky and his entourage at MIT. The second possible reaction was that resolution logic needed further development.
This development indeed took place. Although resolution had eliminated vast swaths of redundancy in the earlier theorem provers, the simplicity and elegance of resolution revealed new sources of redundancy. Independently of Green’s work several workers had found this a worthwhile challenge [10]. Loveland and Luckham achieved improvement by the introduction of the linear format. Kowalski and Kuehner achieved further restriction (maintaining completeness) with the introduction of a selection function.
With Boyer and Moore, Kowalski studied the representation by logic clauses of formal grammars. In this way the special role of Horn clauses became apparent. Meanwhile in Marseille, France, Colmerauer and Roussel were writing a program for question-answering in natural language [13]. The program was based on a grammar formalism developed by Colmerauer under the name of Q-systems. Resolution had come to their attention via Jean Trudel, a Canadian student in Marseille on a scholarship. The group invited Kowalski for a visit. He instructed them in SL-resolution and in the fact that ancestor resolution is not needed for completeness when all clauses are Horn clauses. This was enough for Colmerauer and Roussel to produce a successor to Q-systems which had the attributes of a programming language, yet was an SL-theorem prover for Horn clauses. They called it Prolog, from programmation en logique, a name suggested by Roussel’s wife Jacqueline [11].
As Prolog and Kowalski’s work became widely known, the term “logic programming” was often used in a confused way. Kowalski [11] characterizes logic programming as follows:
Logic programming shares with mechanical theorem proving the use of [resolution] logic to represent knowledge and the use of deduction to solve problems by deriving logical consequences. However, it differs from mechanical theorem proving in two distinct but complementary ways: (1) It exploits the fact that [resolution] logic can be used to express definitions of computable functions and procedures; and (2) it exploits the use of proof procedures that perform deductions in a goal-directed manner, to run such definitions as programs.
This emphasis on the use of resolution logic for knowledge representation makes logic programming into a paradigm within artificial intelligence. In spite of the antagonism from MIT, it has been acknowledged as such elsewhere.
Independently of logic programming, resolution logic has given rise to interesting developments in programming languages. The first and biggest step is represented by the very existence of Prolog as early as 1973. Kowalski described the use of resolution logic as a programming language [12] in a way that can be regarded as the essence of Prolog. To emphasize its status as a programming language it can be characterized as the most extreme procedure-oriented programming language, one where everything apart from the definition and calling of procedures has been stripped away. Iterative constructs have been omitted because iteration is the tail-recursive special case of procedure call. Branching has been omitted because it is a special case of non-determinism. Non-determinism does not need a specific operator: the procedure call can serve as such an operator in the presence of multiple procedure declarations. Finally, by using unification for the replacement of formal by actual parameters in procedure calls, assignment statements are rendered superfluous. Keith Clark showed how an analysis of unification leads to several modes of parallelism. With a succession of collaborators he implemented such variants of Prolog. For Colmerauer unification provided the starting point of what became known as constraint logic programming, also represented by variants of Prolog.
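The unification mentioned here, which yields most general substitutions, can be sketched minimally in Python (a toy term representation invented for the illustration; the occurs check is omitted for brevity):

```python
# Toy terms: a variable is a string starting with an uppercase letter;
# a compound term is a tuple (functor, arg1, ..., argn);
# anything else is a constant.

def is_var(t):
    return isinstance(t, str) and t[:1].isupper()

def walk(t, s):
    """Follow variable bindings in substitution s."""
    while is_var(t) and t in s:
        t = s[t]
    return t

def unify(a, b, s=None):
    """Return a most general substitution making a and b equal, or None."""
    s = dict(s or {})
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if is_var(a):
        s[a] = b
        return s
    if is_var(b):
        s[b] = a
        return s
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for x, y in zip(a, b):
            s = unify(x, y, s)
            if s is None:
                return None
        return s
    return None
```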
Thus Robinson’s discovery of resolution logic not only made an impact in its native land of automatic theorem-proving, but it also escaped its home to become a significant force in artificial intelligence and in programming languages.
My thanks to Paul McJones for his comments and suggestions for improvement.
[1] George Boole: An Investigation of the Laws of Thought on Which are Founded the Mathematical Theories of Logic and Probabilities. Macmillan, 1854.
[2] J.A. Robinson: “Logic programming—past, present and future”, New Generation Computing, vol. 1, 107-124. Ohmsha and Springer-Verlag, 1983.
[3] Gottlob Frege: Begriffsschrift, eine der Arithmetischen nachgebildete Formelsprache des reinen Denkens, Verlag von Louis Nebert, Halle, 1879. (Special thanks to Wikipedia for including a facsimile of the title page.)
[4] Letter from J.A. Robinson to Wolfgang Bibel, approximately 2010.
[5] J.A. Robinson: “A machine-oriented logic based on the resolution principle”. Journal of the ACM, vol. 12, pp 23-41, 1965.
[6] Robinson, personal communication January 14, 2016.
[7] J. McCarthy: “Programs with common sense” in Proceedings of the Teddington conference on the mechanization of thought processes pp 75-91, Her Majesty’s Stationery Office, 1959.
[8] C. Cordell Green: “Application of theorem-proving to problem-solving”. In Proceedings IJCAI, pages 219–231, 1969.
[9] Cordell Green: “The Application of Theorem-Proving to Question-Answering Systems”. Technical Note no. 8, June 1969, Artificial Intelligence Group, Stanford Research Institute.
[10] D.W. Loveland: “Automated Theorem Proving: a quarter-century review” in Contemporary Mathematics, vol. 29, pp 1–48, American Mathematical Society, 1984.
[11] R.A. Kowalski: “The early days of logic programming”. Comm. ACM, vol. 31(1988), pp 38–43.
[12] R.A. Kowalski: “Predicate logic as programming language”. Information Processing 74, 569–574. North Holland, 1974.
[13] Alain Colmerauer and Philippe Roussel: “The birth of Prolog”. SIGPLAN Notices vol.28 (1993), no. 3, pp. 1–31. Also: History of programming languages—II, pp 331–367, published by ACM, New York 1996.
Since time immemorial, cryptography has been a common feature of puzzle columns in magazines and newspapers. But up until the 1970s, only the military, departments of foreign affairs, and spies were seriously interested in it. This state of affairs changed when banks and other commercial organizations started using computer networks. The need arose for industrial-strength cryptography. Not only that, but the cryptography needed to be standardized and sanctioned by the government. For the bank’s information-technology chief it was even more important for encryption to be standard than to be secure: if a breach occurred that cost the bank a few million, he wouldn’t lose his job as long as he had used the industry-accepted standard. This was the reason for the U.S. National Bureau of Standards (now NIST) to adopt the Data Encryption Standard (DES) in 1977. Its current successor is the Advanced Encryption Standard (AES), adopted in 2001.
In the 1990s a new wave of interest in cryptography arose. Private citizens became concerned about the possibility of surveillance of their personal communications. Phil Zimmermann, who had been an anti-nuclear activist, created an encryption program he named PGP (“Pretty Good Privacy”). It gave people privacy when using bulletin board systems (this was before e-mail was widely available) and security when storing messages and files.
To achieve the highest level of security, PGP uses a public-key protocol implemented by the RSA algorithm. However, this is only computationally feasible for short messages. Systems that can handle long messages are symmetric-key systems, meaning the sender and recipient have to share the same key. PGP solves this problem by using a public-key protocol for transmission of the key needed in some symmetric-key system for bulk encryption, that is, to encrypt the message itself, which can be long.
PGP marks the beginning of wildcat encryption. I call it that to draw attention to the difference between Zimmermann and the data-security people at the banks: he didn’t care whether PGP had a level of security sanctioned by the government; he wanted the best security he could get. Accordingly, for the bulk-encryption part of PGP he used a cipher called IDEA, from “International Data Encryption Algorithm”, a non-standard improvement of DES published in 1990 [1].
From Zimmermann’s point of view IDEA had the advantage that its key length and block size suggested a higher level of security than DES, but it is the same type of algorithm. For example, it is also a block cipher. Its blocks are twice the size of those of DES, but still minuscule compared to plaintext lengths of many thousands of bytes, not uncommon in practice.
To understand how we came to be stuck with short, fixed-sized blocks, let us have a look at the history of DES. In 1971 Horst Feistel patented a new type of encryption algorithm: a substitution-permutation network with 48-bit keys and blocks. In 1973 he published a version where this was increased to 128 bits. These were part of an IBM project named Lucifer. At the time, practical deployment required hardware implementation. The block sizes were small enough to make this feasible in the chip technology of the time.
Let us now move forward to 2001, when AES was adopted as standard to replace DES. Much has changed; what has not changed is that block sizes, though increased, are still minuscule compared to the length of the messages that the system should be able to carry.
What had changed by 2001 was that, for at least a decade, encryption had been implemented in software running on byte-oriented computer architectures. The constraints imposed by hardware implementation had disappeared. And guess what: AES, like Lucifer, is a substitution-permutation network, with 128-bit keys and blocks. The structure of the network, the structure of the boxes, and the algorithm incorporate many details not necessitated by published design criteria. DES had been designed by IBM in collaboration with the U.S. government. Some commentators voiced concerns about the possibility that, in addition to the published design criteria, there was a backdoor that presented a weakened version of the system to an insider. In fact, after two decades some hitherto unpublished design criteria for DES surfaced [7].
In an attempt to lay such suspicions to rest, the designers of AES published a sizeable book [6], with lots of abstract algebra, to convince the public that the design decisions were all in the interest of the users’ security. But no matter how big one makes such a book, it remains possible that it does not contain all design criteria. This is not the authors’ fault: the possibility is inherent in the type of encryption system: a substitution-permutation network with fixed-size blocks of a size much less than that of the larger messages to be carried.
This curious history makes it worth thinking about an encryption algorithm that does not inherit hardware design constraints and freely uses the liberty afforded by software implementation to cobble together an algorithm that stays close to fundamentals of encryption. For these fundamentals it is best to refer to the document mentioned by Whitfield Diffie in the following quote [2].
The literature of cryptography has a curious history. Secrecy, of course, has always played a central role, but until the First World War, important developments appeared in print in a more or less timely fashion and the field moved forward in much the same way as other specialized disciplines. As late as 1918, one of the most influential cryptanalytic papers of the 20th century, William F. Friedman’s monograph The Index of Coincidence and its Applications in Cryptography, appeared as a research report of the private Riverbank Laboratories. And this, despite the fact that the work had been done as part of the war effort. In the same year Edward H. Hebern of Oakland, California filed the first patent for a rotor machine, the device destined to be a mainstay of military cryptography for nearly fifty years [3]. After the First World War, however, things began to change. U.S. Army and Navy organizations, working entirely in secret, began to make fundamental advances in cryptography. During the thirties and forties a few basic papers did appear in the open literature and several treatises on the subject were published, but the latter were farther and farther behind the state of the art. By the end of the war the transition was complete. With one notable exception, the public literature had died. That exception was Claude Shannon’s paper “The Communication Theory of Secrecy Systems,” which appeared in the Bell System Technical Journal in 1949 [4]. It was similar to Friedman’s 1918 paper in that it grew out of wartime work of Shannon’s. After the Second World War ended it was declassified, possibly by mistake.
Shannon’s paper stands out as a lone monument in an empty period of the literature, a period that lasted from Friedman’s monograph to the early 1970s.
Apart from specific results, Shannon’s paper is valuable for introducing a mathematical way of thinking about cryptology, something we now take for granted. What does Shannon’s mathematical view show us? For any practical system the key is shorter than the message, much shorter. Suppose we use a key of 256 bits (reasonably large) to encrypt a message of 100,000 bits (only moderately long at 12,500 bytes). The key selects one among at least 100,000! possible message-to-cryptogram maps. But there are only 2↑256 keys possible. So the keys identify only the tiniest fraction of possible message-to-cryptogram maps [by my computer-aided reckoning 100,000! is about 2↑1,516,704]. The security of a cipher depends on how randomly it sprinkles that tiny fraction over the space of all message-to-cryptogram maps.
Shannon makes this vague criterion more precise by identifying two methods to thwart attempts to break an encrypted message: diffusion and confusion. His description of these methods is too technical to reproduce here. Suffice it to say that Feistel’s invention, the block cipher, incorporated both confusion and diffusion in Shannon’s sense. Feistel’s cipher uses a network of “boxes” of two types: S-boxes and P-boxes. The plaintext is subdivided into fixed-sized blocks. The network subjects each block of text to repeated transformation. An S-box substitutes new text elements for existing text elements. This achieves Shannon’s “confusion”; his “diffusion” is achieved by the P-boxes permuting the text elements within a block. Thus Feistel was right on track according to Shannon’s fundamental principles in aiming at confusion and diffusion. But Feistel had to work under the constraint of hardware implementability, which was constrained by the chip technology of the 1970s.
What I’m proposing here is to follow Feistel in implementing substitution and permutation, but freed from the constraint of using a fixed set of S and P boxes acting on a short and constant block size. Instead, I let the boxes themselves as well as their lengths depend on the key. This results in a block size only limited by the length of the message. Of course some constraints remain. For example, I am not assuming that the entire plaintext is in random-access memory. Accordingly the proposed algorithm buffers the message using a buffer size B, which, of course, can be made as large as one likes. The length of the current block is chosen between B/2 and B in a way that depends on the key. For test runs I set B at 768 bytes (giving block sizes between 3072 and 6144 bits, to be compared to 128 bits in the case of AES). It is these variable-sized blocks that are subjected to a permutation. I have chosen the Fisher-Yates shuffle under control of the pseudo-random number generator.
This takes care of the diffusion part. The confusion part takes some more explaining. To cut to the chase, it is a version of the WWII Enigma machine. Logically, Enigma had a complex structure, a complexity necessitated by hardware constraints (mechanical hardware in this case). Software freedom allows drastic simplification of Enigma’s logical structure. The relatively large memory size available allows an Enigma-like software device where everything is much bigger than in the original. I call this gigantic version of Enigma Giganigma.
In the contemporary cryptographic community Enigma is regarded as no more than an historic curiosity. But still, let us consider its potential security. During WWII many variants of Enigma were in use, with different levels of security. As will be explained below, Enigma had no explicit key, but instead relied on a set-up procedure that was kept secret and that led to a large number of possible combinations. For the high-end Enigma, this number was about 3*(10↑35), equivalent to a key size of about 118 bits (as can be seen from the fact that 10↑3 is approximately 2↑10), rather better than DES. It is intriguing, but probably a coincidence, that Feistel chose a key size of 128 bits for his larger version of Lucifer. What matters is of course how well the uncertainty in the key is transferred to uncertainty in the plaintext when the ciphertext is given. Due to its mechanical constraints this utilization is probably rather poor in the case of Enigma, and in any case hard to assess.
Giganigma has a key size of 256 bits, double that of AES. Key utilization in AES is hobbled by its fixed network, fixed S- and P-boxes, and a short, fixed block length. In Giganigma the structure of the components and the block length are determined by the key only, which I expect to lead to better key utilization, though this is, again, hard to assess.
To explain my choice of Enigma as the starting point for a new approach to bulk encryption, here is a brief description of the device. Enigma was the most widely used family of rotor-based encryption devices. This encryption principle was independently invented by Arthur Scherbius in Germany, Edward Hebern in the U.S.A., Hugo Koch in the Netherlands, and Arvid Damm in Sweden. Their patents are all dated 1918 or 1919. Rotor machines were the mainstay of military cryptography from around 1930 to 1970.
A rotor is an electro-mechanical way of implementing an invertible substitution by a letter for a letter of the same alphabet. In the case of Enigma the alphabet consists of the 26 letters A through Z. Enigma rotors implement such a substitution by means of a circular disk of insulating material the size of a hockey puck. Around the perimeter of each face of the disk there are 26 evenly spaced electrical contacts, each of which is connected to exactly one contact on the other face. Several rotors are mounted in a pack on an axle on which they can rotate independently of each other. Each contact of a rotor facing another rotor makes a connection with a contact of that other rotor. In this way the pack as a whole implements a substitution of one letter for another.
This substitution is used for encrypting one plaintext letter. After this at least one rotor changes position, so that the pack as a whole effects another substitution, which is used for the next letter. Before entering the rotor pack, the signal travels through a plugboard which is set by the operator. The plug board also implements a substitution. At the other end of the rotor pack the signal is reflected and is sent back through the pack. This arrangement effects a simple way of decrypting: entering the ciphertext as a plaintext message as if it were to be enciphered has the effect that the plaintext emerges. In some Enigma models the reflector is field-rewirable.
The secret key to be shared by sender and recipient is not, in the case of rotor machines, in the form of some secret word. Instead, it takes the form of a protocol followed by sender and recipient that leaves an adversary with a large amount of uncertainty about the substitutions used to obtain the ciphertext. Let us calculate the amount of this uncertainty for a high-end Enigma with a basket of 8 rotors, out of which 4 are mounted. The protocol specifies for each day:
1. which 4 of the 8 rotors are mounted, and in what order (8*7*6*5 possibilities);
2. the setting of the plugboard, counted here as an arbitrary substitution of the alphabet (26! possibilities);
3. the initial positions of the 4 mounted rotors (26↑4 possibilities).
This gives a total of about 3*(10↑35) possibilities, the number mentioned earlier, equivalent to a key length of about 118 bits. This is already a respectable length, which increases if there is a field-rewirable reflector to contend with.
To see how Enigma can serve as model for a modern incarnation, let us look at the device from an abstract point of view. In the first place, Enigma is not a single device, but a kit from which any of a large number (8*7*6*5*26!, from items 1 and 2 above) of alternative devices can be assembled in a matter of minutes. Any of these assembled Enigmas can be regarded as a finite-state machine with an input tape (the plaintext) and an output tape (the ciphertext). Such a machine is specified by a set of states (including a distinguished one serving as start state), a next-state function, and an output function.
In the case of Enigma there are as many states as there are combinations of the 26 positions of the four rotors. That is, there are 26↑4 = 456,976 states. Each of these states amounts to a virtual rotor, which is, like the constituent real rotors, a substitution table for the 26 letters of the Enigma’s alphabet. The output function is the effect of this substitution table on the current input symbol. Enigma realizes only 26↑4 substitution tables for any particular letter, but the uncertainty faced by the adversary is compounded with the fact that it is not known which of the 8*7*6*5*26! alternative Enigmas has been assembled.
Giganigma is derived by adopting the same abstract view as finite state machine assembled from a kit. There are several points at which software implementation suggests additional security. But first and foremost the role of the key needs to be changed. As noted above, Enigma did not have a key in the usual sense of a secret word. Instead, the uncertainty faced by the adversary was in the form of not knowing which particular Enigma was assembled and what its initial state was.
Software implementation allows us to use a key in the usual sense of the word, as a sequence of bits. A convenient length for this sequence is 256. The key-equivalent choices in Enigma propagate throughout the transmission of even the longest message. If the key is a sequence of only 256 bits, a mechanism is needed to ensure its continuing contribution throughout encryption of the plaintext or decryption of the cryptogram. In Giganigma this is achieved by a pseudo-random number generator in the form of a finite-state machine with the key as initial state.
Let us start with the assembly stage. Enigma is assembled from a basket of 8 rotors. These rotors had a fixed wiring pattern, so that the resulting substitution tables had to be assumed known by the adversary. For Giganigma we imagine a stage to precede the assembly stage. In this additional stage, which we can think of as a “manufacturing stage”, each of the rotors in the basket is created as a random permutation of the alphabet derived from the key. And, while we’re at it, we might as well put more than 8 rotors in the basket. We ran our examples with a basket of 64 rotors.
Thus in Giganigma, for each message separately, the rotors are wired under control of the key.
The manufacturing stage of Giganigma is also the time when the number of mounted rotors is to be decided. In Enigma the current substitution table is realized by an electrical current traversing the four mounted rotors. In Giganigma each rotor is represented by an array A of eight-bit characters such that A[i] is substituted for character i. For Giganigma our first impulse is to exploit software freedom for a much greater number of mounted rotors, to increase the amount of confusion facing an adversary. How much greater?
We need to realize that the more virtual rotors we mount in Giganigma, the longer it takes to encrypt or decrypt a character, as each rotor corresponds to an array access. In Enigma this is of no concern, as the change in electrical current traverses the half-inch thickness of the rotor at at least half (my guess) the speed of light in vacuum. That is why I set the number of mounted rotors at 16, much smaller than the number in the basket. Decrease it if you want more speed; increase it if you want more confusion.
The state in the finite-state machine [5] modeling Giganigma has the following components:
State transitions in the finite-state machine modeling Giganigma are as follows.
In Giganigma software freedom offers an embarrassment of riches for item 2. An extreme libertarian would even question the use of a basket of rotors to be fixed for the entire message: why not continue wiring new rotors under control of the key stream as encryption or decryption proceeds?
To answer this question we should consider the key stream as generated by a PRNG. Even if the key is selected randomly, the key stream is not random in the sense that each byte comes out as a random choice among all possible bytes. The ideal of next-byte randomness is more closely approached in the early part of the key stream. The role of the basket of rotors is to capture the randomness of the early part of the key stream and to keep it available for encryption of the entire message, however long. Of course the later parts of the key stream are still useful. In Giganigma they are used to control the ongoing change of mounted rotors and to control block sizes.
Thus the next development in Wildcat Crypto is to replace the IDEA component for bulk encryption in PGP with a program that effects Shannon’s diffusion in the manner indicated above and that effects Shannon’s confusion by Giganigma.
[1] Lai, Xuejia, and James L. Massey. “A proposal for a new block encryption standard.” Advances in Cryptology—EUROCRYPT’90. Springer Berlin Heidelberg, 1990.
[2] Foreword for: Schneier, Bruce. “Applied Cryptography”, Wiley 1996.
[3] Diffie, Whitfield, and Martin E. Hellman. “Privacy and authentication: An introduction to cryptography.” Proceedings of the IEEE 67.3 (1979): 397-427.
This paper is also a good short overview of cryptography. I recommend the following books: “The Codebreakers” by David Kahn (MacMillan 1967), “Handbook of Applied Cryptography” by Alfred J. Menezes, Paul C. van Oorschot and Scott A. Vanstone (CRC Press, 1996-2001), and “Cryptography” by Douglas R. Stinson (Chapman and Hall/CRC, 3rd edition 2006).
[4] Shannon, Claude E. “Communication theory of secrecy systems.” Bell system technical journal 28.4 (1949): 656-715.
[5] Also, “finite-state automaton”. There is a veritable zoo of such paper machines, first identified by authors such as Arthur Burks, Edward Moore, and George Mealy. Their writings stand at the birth of computer science. A good early textbook is “Computation: Finite and Infinite Machines” by Marvin Minsky (Prentice-Hall 1967).
[6] “The Design of Rijndael” by Joan Daemen and Vincent Rijmen (Springer 2002).
[7] Example of a secret design criterion: “It was acknowledged that the purpose of certain unpublished design criteria of the S-boxes was to make differential analysis of DES infeasible … it was kept secret for almost 20 years …” page 101, “Cryptography” by D.R. Stinson (Chapman and Hall/CRC, 3rd edition 2006). Here of course the secret design criterion was in the interest of the user’s security.
For most teachers “an introductory programming book with C” is an oxymoron. The extreme wing in this school of thought considers only designedly friendly languages suitable for an introduction to programming. BASIC is an early example. My current favourite friendly language is Python [1]. But the mainstream of teachers of introductory programming has settled on Java as a compromise between friendliness and attractiveness to prospective employers.
These employers will not include Joel Spolsky, as he explains in The Perils of Java Schools:
All the kids who did great in high school writing pong games in BASIC for their Apple II would get to college, take CompSci 101, a data structures course, and when they hit the pointers business their brains would just totally explode and the next thing you knew, they were majoring in Political Science because law school seemed like a better idea. I’ve seen all kinds of figures for drop-out rates in CS and they’re usually between 40% and 70%. The universities tend to see this as a waste; I think it’s just a necessary culling of the people who aren’t going to be happy or successful in programming careers.
What Spolsky isn’t saying (and certainly should not be thinking) is that it is inevitable that there are brains that can’t get themselves around pointers. I think that those seemingly handicapped brains are the result of unfortunate teaching—the kind of teaching that surrounds pointers with that aura of trickiness and disreputability that comes with that other bugaboo of programming, the goto statement. Of course the author of the textbook is not in total control of the instructor, but at least he can introduce pointers in a matter-of-fact way, as just another feature of the C programming language.
This brings me to the top of the pile of books on my desk: Elements of Programming by Maarten van Emden, which recently became available worldwide. It is also available without charge as a pdf file. This book is an example of one where pointers are introduced as just another feature of C. In this respect Programming for Engineers by Aaron Bradley is even better and is the most powerful antidote I know of against the phenomenon noted by Spolsky. Bradley doesn’t start with the usual cutesy “Hello world” program. Instead, the first program, on page 2, has five lines. It declares some variables and performs some assignments. The program is followed by the memory maps for the initial state, and for each of the computational states afterwards. For each cell it shows the address, the identifier of the variable, and its content. On page 4 any mysteries around pointers are dispelled once and for all with a similarly small program. It declares a pointer variable, assigns it a value, and assigns the result of dereferencing it to another variable. Again a memory map is given for each change in state. Bradley shows that, by fearlessly digging down to bedrock, pointers become crystal-clear.
As far as I know, Bradley’s approach is unique. I would recommend his book, were it not for the fact that so little is covered. I’m allergic to fat books, but there is only so much you can say in eight thousand words [2].
The best book on the pile is The C Programming Language by Kernighan and Ritchie (“K&R”). It was the natural choice when we created a new introductory course where the language was mandated to be C. However, K&R assumes that the reader is a programmer and only needs to learn C as a new language. The choice of K&R for the new course would have put an undue burden on the instructor, who would have to provide a considerable amount of introductory material. The instructor, Jason Corless, believed that I could do better than the alternatives to K&R that were on the market. This led to my writing the first edition of Elements of Programming, of which you can now buy the fourth edition.
K&R stands out in several ways. It is not only the best introduction to C, but it is itself part of the history of C. When the first edition of K&R was published in 1978, the language was not standardized. As a result the various C compilers did not implement the same language. In this chaotic landscape “K&R C”, the version of C described in the book, served as the de facto standard.
Another way in which K&R stands out is Kernighan’s excellent writing style. It is of course an indiscretion to enquire into the ways in which co-authors have contributed and even worse than an indiscretion to make guesses. I’m sticking my neck out in this case because I have read four books by Kernighan and six different co-authors [3] in total, and they all have the same crisp and agreeable writing style, better than what is found in your typical non-Kernighan book.
A telling difference between these books is the attention given to the choice of examples. The most basic kind of exposition is to describe the next feature and then to give an example. Logically speaking, all that the example needs to do is to illustrate the feature. But wouldn’t it be more rewarding, if instead of half a dozen lines for such a parsimonious example, a function is given that is worth knowing about, independently of the feature being illustrated? If necessary at the expense of another half dozen lines? In this I have tried to emulate K&R, which is hard to beat in this respect.
An idiosyncratic way to classify these books is according to whether they include in their examples the Quicksort sorting algorithm. It is essential neither for illustrating features of C, nor of programming. Yet it pops up in surprisingly many of the books on my pile. In the case of my Elements of Programming the appearance is not surprising. Soon after I was introduced to any kind of programming, I had the dubious luck of being told about Quicksort, which promptly caused a year of bouts of brain fever. The result has been my induction into Knuth’s The Art of Computer Programming [4], which some regard as a Hall of Fame in programming, though it is more accurate to view it as a monument to Knuth as a sleuth. In Knuth’s book my variant of Quicksort is awarded the dubious distinction of being the only one that has resisted Knuth’s attempts at mathematical analysis of its performance. Thus, the inclusion of Quicksort in The Elements of Programming is excusable. It is intriguing that K&R, with their more mature perspective, also go out of their way to include Quicksort.
The third book on my pile that includes a listing of Quicksort is Engineering Problem Solving with C by Delores M. Etter. The code for partition (called “separate” here) looks unusually complicated. Since an obscure alternative may be the result of clever optimizations, perhaps in array accesses or item comparisons, I entered the code to find out.
To get it to compile it was sufficient to change one opening brace to a closing one. I assume that the result is the intended source code. Apparently it was not mechanically transferred to the book file. A single error is an excellent result when one has to rely on manual retyping and proofreading.
Of course one cannot be absolutely sure that the character substitution I made did indeed result in the intended algorithm. All I can do is to report that the substitution resulted in successful compilation and in the correct sorting of the example, which is a random-looking sequence of eight integers. Exchanging the first two elements of this sequence gives an array that is not sorted correctly. After this baffling observation I generated a thousand randomly selected random permutations of {0,1,2,3,4,5,6,7}. Of these 73% were incorrectly sorted.
Before leaving the topic of Quicksort, I note that C How To Program (8th edition) by Paul and Harvey Deitel explains Quicksort in an exercise that asks the reader to implement it. Those who remember my confession to an allergy to fat books might infer that I disapprove of the Deitel and Deitel text. With its thousand pages and its weight of 1.3 kilograms, the book is not really portable. But the pages are used well: the examples are simple and illustrative and the coverage is impressive (an introduction to C++ is included). The level of expertise is impressive as well. E.g. (a very gratuitous example) decks of cards are shuffled using the Fisher-Yates shuffle. This is a fast algorithm that can be proved to produce permutations as unbiased as the underlying random-number generator allows. The Deitels are in love with programming. Witness Appendix D of the easily downloadable 7th edition, which is a lengthy discourse on Sudoku puzzles: how to solve them and how to generate them.
A Book on C by Kelley and Pohl first appeared in 1984 and is in print in its 4th edition. I am looking at the smaller 2nd edition. Its strength lies in what is left out: after all, a mortal will only read so much, and finding the necessary parts is easier in a small book than in a big one. A Book on C, first edition, was followed by C by Dissection in 1987 by the same authors. It has the same merits as its predecessor: it is more concise and emphasizes teaching by exhibiting program examples of which the structure is analysed by “dissecting” the program.
Most introductions to programming teach the elements first and postpone the packaging of code as a library to be treated as an advanced topic. The reality of software development is the building of an application using an existing library as much as possible. The quick and dirty approach is to write calls to library functions with a minimum amount of glue code. The better approach is to wonder what is missing from the library so that glue code becomes necessary. That may suggest extending the library and then proceeding with the application using the extended library and less glue code. This is the approach chosen by The Art and Science of C by Eric Roberts. Even if one does not want to follow Roberts in his distinctive approach to learning C, his book is worth perusing for the examples and exercises, where his rich professional background shows.
[1] Python started as an offshoot of Geurts, Meertens, and Pemberton’s ABC, a programming language intended as a replacement of BASIC that would be palatable to computer scientists. See also my article on Python.
[2] Estimated from the 180 small pages in the C part of his book.
[3] Listed here, so I don’t have to look them up again, the co-authors are: Aho, Pike, Plauger, Ritchie, and Weinberger.
[4] The Art of Computer Programming, Volume 3 / Sorting and Searching, by Donald E. Knuth, Addison-Wesley, 1973.
The semantics of C operators should be simple to the point where each corresponds to a machine instruction on a typical computer. An exponentiation operator doesn’t meet this criterion. [1], page 247
Stroustrup’s criterion needs to be taken with a grain of salt. For example, the assignment operator does not meet it: a = b takes a LOAD and a STORE. In this case the criterion translates to: “should correspond to no more than a LOAD and a STORE”. But Stroustrup was onto something, because we can tweak his criterion to:
The semantics of C operators should be simple to the point where they each compile to code that is as fast as what one can write in assembler.
The example of the assignment operator in a = b is instructive in another way. One could argue that it would be more in the spirit of C not to have the assignment operator in the first place, but instead have two other operators, one corresponding to LOAD and the other to STORE. But that would require C’s computational model to include registers, which it does not [5]. There is a reason for that.
One of the secrets behind C’s success is that just about any machine architecture has byte-addressable random-access memory. Moreover, pick just about any pair of machine architectures, and they differ in their register structures. By excluding registers from its computational model, C has hit the sweet spot in being close to the machine, yet not too close. So there is no place for LOAD and STORE in C’s computational model, while the assignment operator is an operation on random-access memory in the presence of an unspecified complement of registers.
These are some of the hints that C is unique in the way it manages to be low-level and high-level at the same time. Another hint comes from Alexander Stepanov’s history of STL [3]. In the early 1980s he had been working on an abstract mathematical approach to a library of algorithms and data structures. His striving for abstractness drove him towards the highest possible level in the programming language. Stepanov considered Backus’s FP. In collaboration with Kapur and Musser he designed Tecton, a language of such a high level that its implementability was in doubt.
When Bell Labs hired Stepanov in 1987 for work on a library for C++, Andy Koenig taught him the semantics of C. Stepanov writes: “The abstract machine behind C was a revelation” [3]. Some way past reading this I did a double-take back to this sentence: abstract machine behind C? What’s going on here? Isn’t C the one language not in need of any abstract machine? Isn’t C the one language in which to write abstract machines? Probably Koenig wasn’t talking about an abstract machine, and Stepanov introduced the term to say that the way C relates to its computational model surprised him. In the sequel I’ll follow Stepanov in abusing the term “abstract machine” to refer to the unwieldily named “computational model”. I’ll go further and suggest that “abstract machine” is a valuable addition to one’s conceptual tool kit.
For example, “abstract machine” is a concept that clarifies an otherwise unfathomable utterance in “The Roots of Lisp” by Paul Graham [2].
It seems to me that there have been two really clean, consistent models of programming so far: the C model and the Lisp model. These two seem points of high ground, with swampy lowlands between them. [2]
It is intriguing that Graham, for whom Lisp is the one and only language, singles out C as the only other language worth knowing about. What could these languages possibly have in common so that they stand out in the vast landscape of languages? In the following I will argue that the answer is: the way these two languages relate to their (implied) abstract machines.
In “The Roots of Lisp” [2] Graham gives a précis of John McCarthy’s seminal paper [4] in which McCarthy describes the language he discovered. Graham first introduces the list. Expressions in the Lisp language are lists. At the same time lists are data structures when viewed from the machine end. Seven primitive operations on expressions are introduced: ATOM, QUOTE, EQ, CAR, CDR, CONS, and COND. To these are added a notation for functions by means of the keyword LAMBDA; the keyword LABEL is added to make recursive function definitions possible. I speak of “primitive operations” advisedly, because the term can refer to functions taking a data structure as argument as well as to commands to change the state of a machine, and thus invites comparison of Lisp with languages implemented via abstract machines.
With these nine keywords of Lisp the function EVAL is defined which takes as its argument an expression and yields the value of that expression as result. EVAL computes any computable function; not a surprise, given earlier theoretical results in the lambda calculus. McCarthy presented his discovery as a theoretical result that exhibits a (to his taste) superior alternative to the Turing machine as basis for computing. One of McCarthy’s students, Steve Russell, did not see the theoretical intention as an obstacle to implementing lists and the nine primitives in assembler on the IBM 704, resulting in, presto, the first Lisp implementation.
McCarthy’s paper [4] gives the definition in Lisp of mutually recursive functions APPLY and EVAL with the former as top-level call. Graham’s cleaned-up version only defines EVAL, with the same result. In both cases the definitions occupy about a page of text.
One page of language L to implement L in itself. Of course every language can be implemented in itself. Fortran, Basic, … even COBOL, if you insist. I doubt whether such implementations would be readable. McCarthy’s is, though I prefer Graham’s typography. But what I find even more striking is that the seven primitives occur frequently in all Lisp code. Compare that with languages based on an abstract machine, like Prolog and Java.
This invites a comparison between the complexity of the Java Virtual Machine and that of the software infrastructure needed to support McCarthy’s seven primitives, given today’s typical machine architecture. If these complexities are about the same, then Lisp is a low-level language compared to Java.
Let us return to C. Unlike Lisp, neither the language nor its definition invites us to think in terms of an abstract machine. But Stroustrup’s remark suggests that fragments of an abstract machine can be ferreted out from the semantics of the language. The arithmetic operators correspond to machine instructions, but they are mostly the same as in just about any other language. What is more intriguing is that assignment and dereferencing are frequently used and close to the machine.
From the point of view of primitive operations, the level of Lisp is as low as you can get: if you count function call as primitive, then all Lisp code consists of primitives. C is lower-level than other languages, with the exception of Lisp: in C there are, apart from operators and function calls, many language features that fall into neither category.
This situation suggests a tantalizing project: explicitly define the C abstract machine and define a new language (“TRUE C”?) entirely in terms of operations on this machine.
But even in its current state C is special: it is unique in the imperative part of programming in the same way that Lisp is unique in the functional part. Maybe that is why Graham considered Lisp and C as points of high ground amid the swampy lowlands that make up the landscape of programming languages.
Thanks to Paul McJones for corrections and helpful remarks.
[1] “The Design and Evolution of C++” by Bjarne Stroustrup. Addison-Wesley, 1994.
[2] “The Roots of Lisp” by Paul Graham.
[3] “Short History of STL” by A. Stepanov.
[4] “Recursive functions of symbolic expressions and their computation by machine, Part I.” by John McCarthy. Communications of the ACM 3.4 (1960): 184-195.
[5] Declaring a variable as “register” in C does not imply that registers exist in the computational model of C. It is merely a suggestion to give the variable priority for being kept in the unspecified set of available registers.