Job-control language, punched cards, and flowcharts are such relics of the past that those who remember them are starting to die off. There are also research goals of that era that are long forgotten, not because they have been reached or found not useful, but because they are, well, forgotten. In this essay I consider the ancient research goal of *developing code and proof of correctness in parallel*. I find that flowcharts are a good starting point, provided one goes back much further than job-control language.

“Testing can prove a program wrong, but only a logical argument can prove it correct.” This is a powerful incentive to place sufficiently many and sufficiently powerful assertions in the code and to prove that all of them are valid. Alas, it is harder to find those assertions than to write the code in the first place. For most people this sad fact relegates proof of program correctness to the land of laudable but unrealistic goals. Long ago E.W. Dijkstra took the sad fact as an indication that the proof should be developed *in parallel with* the code. He claimed that, developed in this way, the additional effort for the proof would be modest enough to be offset by the time saved in debugging. Not only would programs be written in less time, but one would gain a degree of confidence not achievable by any amount of testing.

The idea seems plausible, but one has to start somewhere. How do you prove correctness of the first step in the development of a program that turns out to require a dozen steps? How can each of these steps be a program by itself that can be proved correct? For a long time I thought that Dijkstra’s 1976 book “A Discipline of Programming” was a preview of the promised land, showing how to do this not with assertions, but with guarded commands and weakest preconditions. However, a recent re-examination disabused me of that notion. Only the first few examples in the chapter of toy examples are derived from the weakest precondition. The remaining toy examples, and all the programs in the meaty chapters, are examples of programming with guarded commands. Though developed in an exemplary, systematic fashion, they are not derived from a specification. As far as I know neither Dijkstra nor anybody else has made progress towards parallel development of proof and code in the decades that have passed since.

All this misery and disappointment plays in the land of *imperative* programming. In the other kind of programming, the functional kind, things are different. A function in a functional program needs no specification: it is its specification. The specification does not need to be implemented: it is its implementation. Functional programming solves the correctness problem by removing the distinction between specification and code. Here I am of course simplifying the situation to the point of caricature: it all depends on what one accepts as a specification. A specification of a function that sorts a list should only have to say that the result of sorting is an ordered permutation. But if the functional program you write sorts in time n log(n), then there is a difference between specification and code. In spite of the unwarranted simplification, functional programmers agree that the caricature “specification = code” is true to a sufficient extent to make functional programming more productive. It does help to be able to ignore computation states and it does help to think at a higher level of abstraction, such as that of functions.

Does this mean we should abstain from imperative programming, where one has to think in terms of states? It seems that one can sweep states under the carpet for a long time in many parts of computer systems, but not all the time and not everywhere. Hence imperative programming cannot be totally replaced by functional programming. This is the conclusion of a long and technical paper by Sussman and Steele [0].

If we need to do imperative programming at all, then we might as well try to do it well: as well as functional programming. What is it that makes functional programming tick? It is the fact that there is a language-independent counterpart to the function definitions in the program: functions in the sense of mathematics. Now, suitable bits of imperative program do have a language-independent representation, namely as a *flowchart*. This is easy to forget, given the utter contempt in which flowcharts have been held for as long as anyone can remember.

Through the years I had been seeing remarks to the effect that flowcharts were invented by von Neumann in the course of his involvement with the first computers in the 1940s. This seemed plausible, matching the anachronism of flowcharts to anachronisms like vacuum tubes and mercury delay lines. Recently, possibly on the wave of nostalgia triggered by this year’s Turing centenary, I started reading about computer history. Goldstine, in his “The Computer from Pascal to von Neumann”, gives an account of the writing of the report with von Neumann in which flowcharts were introduced. On page 267 Goldstine reports: “Again in the spring of 1947 the report was still not complete because of a missing notation.” Von Neumann, writing from Los Alamos, suggested a solution to the difficulty: “Let us introduce a new kind of box, called *assertion box*.”

This surprised me, as I had always assumed that assertion boxes in flowcharts were first introduced by Floyd in 1967 [1]. Reason enough, finally, to consult the report [9]. Yes, there are assertion boxes. But, no, there are no flowcharts. Instead there are *flow diagrams*, and they are different from what “flowchart” had come to mean by the time of Floyd’s paper. Among the five kinds of boxes in flow diagrams, assertion boxes and one other kind of box play the role of the assertion boxes of Floyd.

The details no longer matter. What does matter is that for Goldstine and von Neumann *the proof of correctness was regarded as an integral part of the program*. They do not seem to have considered presenting in isolation the part that is needed for transcription to machine instructions. Goldstine and von Neumann took for granted that program and proof of correctness have to be developed in parallel, something suggested as a desirable future possibility by Dijkstra twenty years later and something towards which no progress was made by him or, as far as I know, by anybody else since then.

Knuth and Pardo [2] trace the history of programming formalisms since 1947, which is the history of using the computer as an intermediary in producing the machine code. For this it is necessary to have machine-readable input to the program that translates to machine code. This machine-readable input started to be regarded as a programming language. My guess is that flow diagrams did not long survive export from Princeton; a simpler notation was wanted. As Floyd ultimately showed, simplification is possible without loss. But the simplification practiced in the 1950s threw out the baby with the bathwater: only that part of flow diagrams was retained that translated to executable code. The assertions and the rest of the proof mechanism were dropped.

Soon after 1947 practice settled into the pattern:

- draw a flowchart,
- translate by hand to a programming language, and
- translate by machine to machine code.

Knuth [6] would point out that this describes only the *creative* use of flowcharts: the initial use before the algorithm is clear in the programmer’s mind. Knuth distinguishes such use from the *expository* use, when the flowchart has to be an accurate description of the algorithm in its final form. Both uses underline the utility of a graphic, language-independent specification mechanism for algorithms.

Dijkstra’s 1968 denunciation of the goto statement caused most people to stop using flowcharts. This happened because the goto statement was the natural primitive for converting the two-dimensional format of the flowchart to the linear format of the programming language. What I propose here is that flowcharts are useful in bringing imperative programming up to the same conceptual level we find in functional programming. The argument will take several steps, as follows.

The above three-stage process is only valid to a first approximation. In the 1960s many programmers had jumped the gun so that the sequence became:

- write in Algol with if-then-else and while-do
- (if pressed by management) translate to flowchart

Was management being incurably obtuse, as usually implied, or is there some merit in a language-independent description?

After 1968, when flowcharts were relegated to the status of the dumb brother of the despised goto, a rearguard action continued in favour of graphical techniques; interesting examples are Johnston’s contour diagrams [7] and the Nassi-Shneiderman diagrams [8].

Functional programming has a declarative aspect and an operational aspect. The declarative aspect is that we can read a function definition in the code as a definition of a function in the mathematical sense. The operational aspect is that there is an evaluation algorithm that can be relied on to produce the function’s value for the arguments supplied by the programmer. The virtues of the evaluation algorithm are that the programmer does not have to write it and that the programmer can rely on it to have been implemented correctly. It would be nice if the programmer did not have to know how it works. This is only true to a first approximation: the programmer does have to know whether evaluation is strict or lazy.

Let’s compare this with imperative programming. Imperative programs have been characterized as *specifications of sets of computations*. “Specifications” may sound abstract, but computations still belong to the operational world. To verify a property of an imperative program is to prove a theorem about a set of computations. Proof belongs in the declarative world.

Are we to consider Fortran, Algol, C, etc. as specification languages for sets of computations? If any of these languages is a good specification language for sets of computations, why are the others so different? Flowcharts have the merit of transcending such differences and for that reason deserve attention in the quest for a suitable specification language for sets of computations. This is the merit of Floyd’s work: he added to the flowchart, as a specification of a set of computations, the possibility of proving theorems about the set. What is still lacking is the flowchart itself as a tractable mathematical object, the way that a function is.

These observations suggest a research project that disregards for the moment any quarrels about details of translating flowcharts to existing programming languages (use of goto’s and all that), and investigates the possibility of transforming the flowchart into a mathematically tractable object that specifies a set of computations and permits theorems to be proved about it. The outcome of such a project can be found in [3]. That paper describes the final result, but does not give the historical background given here. The following paragraphs provide the connection.

A computation of any imperative program is a sequence of states. The states are the states of the abstract machine underlying (usually implicitly) the language definition. One reason for the differences between imperative languages is the difference between their abstract machines. An advantage of flowcharts as a specification language for sets of computations is that their abstract machine abstracts from the differences between abstract machines caused by differences between programming languages.

A good candidate for an abstract machine for flowcharts is the finite-state machine (FSM). The state-transition diagram of an FSM has some resemblance to a flowchart. The nodes in the state-transition diagram represent the states of the FSM. In the flowchart the nodes also stand for states. But there is a difference: transitions in the flowchart often change values of variables, and there is no counterpart of variables in the FSM. This suggests, as an abstract flowchart machine (AFM), a generalization of the FSM: the introduction of a new type of state. In the AFM the state has two components: a *control state* (the one that FSMs have) and a *data state*. Data states are vectors of the values of the variables manipulated by the flowchart. These vectors are indexed by the names of the variables.

Transitions in the AFM are not only transitions between control states, but also between data states. The latter transition can be characterized mathematically by a binary relation on the set of data states. Every edge has as label a binary relation between data states. Thus the AFM has a transition diagram like that of an FSM, except that the directed edges are labeled by binary relations.
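To make the two-component state concrete, here is a minimal Python sketch of an AFM. All names and the encoding are my own illustration, not notation from [3]: a data state is a dict of variable values, and a binary relation on data states is encoded as a function from a data state to the list of its successor states (so nondeterminism and blocking come for free).

```python
# A data state is a dict mapping variable names to values; a binary
# relation on data states is a function from one data state to the
# list of its successors.

def incr_x(s):
    """Relation modeling the assignment x := x + 1 (deterministic)."""
    t = dict(s)
    t["x"] = s["x"] + 1
    return [t]

# The AFM: transitions are triples
# (source control state, relation, target control state).
transitions = [("P", incr_x, "Q")]

def step(control, data):
    """All (control, data) successors of a combined AFM state."""
    return [(c2, d2)
            for (c1, rel, c2) in transitions
            if c1 == control
            for d2 in rel(data)]

print(step("P", {"x": 41}))   # [('Q', {'x': 42})]
```

A combined state is thus a pair (control state, data state), and a computation is a sequence of such pairs produced by repeated calls to `step`.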

The AFM translates only the statement boxes in a flowchart, but not the test boxes. The choice is now: do we adapt the AFM to flowcharts, or do we adapt flowcharts to the AFM? Remember that the purpose of the enterprise is a framework for imperative programs that is as mathematically and conceptually tractable as that of functional programs.

An edge in an AFM labeled by a binary relation modeling an assignment statement has a counterpart in flowcharts: the statement box. But in an AFM there are only edges connecting pairs of nodes. A test box in a flowchart connects *three* nodes: an input node and two output nodes. This is a discrepancy between AFMs and flowcharts. As I regard the former more valuable, the latter have to adapt.

The key to this adaptation is the flexibility of binary relations between data states. One can compose them by relational composition (symbol “;”). They include the identity relation, the relation that does not change the value of any of the variables. They include *subsets* of the identity relation. These are used to model boolean expressions. Thus the test box “if b then S else T”, with input node P and output nodes Q and R, can be represented in AFM by a pair of edges: one from P to Q labeled with the binary relation B;S and one from P to R labeled with (¬B);T. Here B is the subset of the identity relation consisting of all pairs of states in which b evaluates to TRUE and ¬B is the subset of the identity relation consisting of all pairs of states in which b evaluates to FALSE.

AFMs are both a generalization and a simplification of flowcharts. AFMs have in common with FSMs that they are characterized by transition diagrams, which are usually thought of as directed graphs with suitably labeled edges. I prefer to look at these graphs as square matrices in which the rows and columns are labeled with the nodes of the graph and in which the labels of the edges of the graph are the elements of the matrix.

You may wonder why I make a big deal out of a difference in presentation, in this case between graphs and matrices. My reason for preferring matrices is that I think it natural to multiply matrices, but not natural to multiply graphs. I am familiar with matrices representing transformations over vector spaces consisting of vectors of real numbers indexed by integers. In the case of AFMs we have vectors of sets of states (those where the assertions of the AFM are true) indexed by the control states of the AFM. The powers of a code matrix characterize its computations. A vector of assertions that is verified by a code matrix turns out to be a fixpoint of the matrix.

Best of all, matrix code supports parallel development of proof and program. The very idea of such parallel development harbours a paradox: how can you prove anything initially, when there is no program yet? The key lies in the peculiar kind of proof that Floyd introduced: the proof proves that *the program does nothing wrong*. Initially we have a code matrix that does nothing, so it does nothing wrong (whatever the specification). This null code matrix is perfectly well-formed from the mathematical point of view.

We can then start by extending the code matrix so that it handles an easy case in such a way that it does nothing wrong. In [3] you may find two programming problems solved in this way, by easy steps starting from nothing.
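As a toy illustration of these ideas (my own simplified encoding, not the formalism of [3]): represent a code matrix sparsely as a dict from pairs of control states to relations, with absent entries standing for the empty relation, so that `{}` is the null code matrix. One can then check that an assertion vector is preserved, in the "does nothing wrong" sense that everything reachable at a node satisfies that node's assertion.

```python
# Data states are integers; a relation is a set of (before, after)
# pairs; a code matrix maps pairs of control states to relations,
# with absent entries meaning the empty relation.

def image(rel, states):
    """States reachable from `states` through relation `rel`."""
    return {b for (a, b) in rel if a in states}

def apply_matrix(matrix, assertions):
    """One application of the code matrix to an assertion vector:
    collect at each node everything reachable there in one step."""
    out = {q: set() for q in assertions}
    for (p, q), rel in matrix.items():
        out[q] |= image(rel, assertions[p])
    return out

# Two control states; the single edge halves an even number.
halve = {(n, n // 2) for n in range(0, 10, 2)}
code = {("P", "Q"): halve}

# The assertion vector {P: evens below 10, Q: their halves} is
# preserved: nothing reachable violates an assertion.
v = {"P": set(range(0, 10, 2)), "Q": set(range(5))}
print(apply_matrix(code, v)["Q"] <= v["Q"])                 # True
print(all(apply_matrix({}, v)[q] <= v[q] for q in v))       # True: the
# null matrix reaches nothing, so it trivially does nothing wrong.
```

Extending the program then means adding an entry to the dict and re-checking the containment; the check with the null matrix holds for any assertion vector, which is the starting point of the parallel development.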

In summary, we propose matrix code as a formalism for imperative code that supports parallel development of proof and code and combines the operational and declarative aspects in a mathematically satisfactory way. From a mathematical point of view a code matrix is a matrix of binary relations. As binary relations have the algebraic structure of a semiring, AFMs can be seen as matrices representing transformations in the analogue of a vector space over a semiring; such structures are known as *semimodules*. There are so many examples of semirings and semimodules in various parts of mathematics that these structures are of mathematical interest in their own right [4].

#### Acknowledgment

Thanks to Paul McJones for suggestions that helped improve this essay.

#### References

[0] “The Art of the Interpreter” by G.L. Steele and G.J. Sussman. MIT AI memo 453, 1978.

[1] “Assigning meanings to programs” by Robert W. Floyd. In J.T. Schwartz, editor, *Proc. Symp. Appl. Math.*, pages 19-32. American Mathematical Society, 1967.

[2] “The early development of programming languages” by D.E. Knuth and L.T. Pardo. In *A History of Computing in the Twentieth Century*; Nicholas Metropolis, Jack Howlett, and Gian-Carlo Rota, editors. Academic Press, 1980.

[3] “Matrix Code: a Language for Parallel Development of Code and Proof” by M.H. van Emden. http://arxiv.org/pdf/1109.5416.pdf

[4] *Semirings and their Applications* by Jonathan Golan. Springer, 1999.

[6] “Computer-Drawn Flowcharts” by Donald E. Knuth. *Communications of the ACM*, vol. 6(1963), nr. 9, pages 555-563.

[7] “The Contour Model of Block-Structured Processes” by John B. Johnston. *ACM SIGPLAN Notices*, vol. 6 (1971), issue 2, pages 55-82.

[8] “Flowchart techniques for structured programming” by I. Nassi and B. Shneiderman. *ACM SIGPLAN Notices*, August 1973, pages 12-26.

[9] “Planning and coding of problems for an electronic computing instrument”, Part II, volume 1, by H.H. Goldstine and J. von Neumann. Pages 80-151 in volume V of *John von Neumann: Collected Works*, A.H. Taub, editor. Pergamon Press, 1963.

June 25, 2012 at 4:39 pm

“As far as I know neither Dijkstra nor anybody else has made progress towards parallel development of proof and code in the decades that have passed since”

Refinement calculus? Kaldewaij’s “Programming: The Derivation of Algorithms”? Morgan’s “Programming from Specifications”? Gries’s “The Science of Programming”? Feijen and van Gasteren’s “On a Method of Multiprogramming”? Numerous papers by J.M. Morris? Several “Mathematics of Program Construction” conferences? The work done by Hayes et al in Queensland? Countless EWDs by the Great Man himself?