A Programmer's Place

Conceptual Integrity: why it matters and how to get it

In “No Silver Bullet” [1] Frederick Brooks addresses the intriguing question of why some programming languages garner fanatical adherents while others are merely tolerated by their users. Brooks’s answer is that the critical criterion is whether a language has, in his words, “conceptual integrity”. In this article I try to nail down this nebulous concept and see how it can be used as a guide in language design.

Differences in languages of course only matter if there is a choice, and this is the exception rather than the rule. Indeed, students in introductory programming courses are often warned against the misleading impression that in real life they get to write programs. “If you want to get paid, you will likely be debugging or modifying existing code.” In other words there is, in real life, for most computer science graduates, no use for what they have learned in university.

But, given the unsatisfactory state of currently installed software, it is to be hoped that in the future many programs will be written from scratch. This suggests having a critical look at currently existing languages. Such a critical look should eliminate languages that lack conceptual integrity. Like beauty, conceptual integrity is in the eye of the beholder. So when I exclude Java and Fortran, it only means that it is me who fails to see whatever conceptual integrity these languages may have.

When I started my first job as a programmer, I had the good fortune of having to use an expressive language with a compact specification, namely Algol 60. My only previous experience was reading Daniel McCracken’s “A Guide to Algol Programming”, which helped me through the assignments in the numerical analysis course in university. I solved my problems by leafing through the book in search of something that might be similar to whatever it was that I needed.

In my job, which was at the Mathematical Centre in Amsterdam, things were different. In the first place the examples were better. They consisted of a bundle of a few dozen pages of source code for finely honed library procedures, typically less than a page each. Second, I was advised that I should address any questions I might have to Leo Geurts or Lambert Meertens. They would listen to my questions, but, instead of answering, they would open the “Revised Report on the Algorithmic Language Algol 60” [2] (wherever you were at the Mathematical Centre, there always seemed to be a copy within reach), open it at the right page and point to the right place. When I learned to do this, I had graduated beyond copying examples; I had learned to think for myself. Such is the power of a language with conceptual integrity. A way to nail down this nebulous concept is to translate it to: that which makes it possible for an expressive language to have a compact specification.

My next example is Modula-3. I quote from pages 9 and 10 of the book [3], section 1.4.9 “Simplicity”:

In the early days of the Ada project, a general in the Ada Program Office opined that “obviously the Department of Defense is not interested in an artificially simplified language such as Pascal”. Modula-3 represents the opposite point of view. We used every artifice that we could find or invent to make the language simple.

C.A.R. Hoare has suggested that as a rule of thumb a language is too complicated if it can’t be described precisely and readably in fifty pages. The Modula-3 committee elevated this to a design principle: we gave ourselves a “complexity budget” of fifty pages, and chose the most useful features that we could accommodate within this budget. In the end, we were over budget by six lines plus the syntax equations. This policy is a bit arbitrary, but there are so many good ideas in programming language design that some kind of arbitrary budget seems necessary to keep a language from getting too complicated.


What about C? It falls woefully short of the criterion of compact specification, as its standard [4] has over five hundred pages. But in 1978, before there was a standard, all there was to know about C was contained in The C Programming Language [5], a small volume consisting mostly of tutorial material. I guess that the Reference Manual contained in it was at most the fifty pages it occupies in the second edition. So at one time, C was an expressive language with a compact specification. I don’t know what happened between that and the standard of ten times the size.

“Conceptual Integrity” may be implicit or explicit. The above examples qualify implicitly, implied by the small size of their specification. In the case of Algol 60 and Modula-3 there does not seem to be an underlying concept that you can put a name on. The other possibility is that the concept is identified explicitly. This still leaves open the possibility that the connection with the concept is tenuous and that it evolved over time.

Take for example Lisp. If ever there is a language with fanatical adherents, this is one. An example is the late and great Alan Robinson. He had welcomed Fortran as a God-sent gift after wrestling with assembler on the Univac of the Dupont company. He subsequently left industry and went on to discover the resolution principle for logic. Robinson was familiar with computer implementations of resolution, and found them sadly lacking the elegance of the principle itself. During a visit to Stanford, John McCarthy knocked on his door to show Robinson the resolution theorem-prover he had written in Lisp in a few hours.

A few years later some excited evangelists brought Robinson the news that resolution had led to a new programming language. Robinson was delighted to see some sample programs and an explanation of how resolution was used to execute them. The evangelists took it for granted that henceforth Robinson would program in Prolog. They were disappointed to learn that no, for Robinson Lisp was the one and only programming language and that it could not even be displaced by an elegant embodiment of his own brain child.

A rational reconstruction of the origin of Lisp would be that McCarthy intended to implement lambda calculus and that lists were secondary to this purpose. Actually lists were primary, as a vehicle for symbolic expressions. Of lambda calculus only the lambda notation was adopted, not the calculus. See McCarthy’s own account in [6].

To use functions as arguments, one needs a notation for functions, and it seemed natural to use the lambda notation of Church (1941). I didn’t understand the rest of his book, so I wasn’t tempted to try to implement his more general mechanism for defining functions.

In view of this history, it is surprising how much of Lisp can be construed as lambda calculus. But its dynamic binding of the free variables in lambda expressions does not conform to lambda calculus (nor does the program feature, nor do association lists or property lists). MacLisp, Interlisp, and Common Lisp can be construed as a lambda-calculus skeleton padded with a rich variety of ad-hoc features to create a useful tool.
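The difference between dynamic binding and the lexical binding of lambda calculus can be made concrete. The sketch below is in Python (none of the Lisps discussed here); the explicit binding stack is a hypothetical device chosen for illustration, not how any Lisp was implemented.

```python
# Python closures are lexically scoped, as lambda calculus prescribes:
def make_adder(n):
    return lambda x: x + n          # n is captured from the defining scope

add5 = make_adder(5)
n = 100                              # a different, global n
assert add5(3) == 8                  # lexical: the n at definition time wins

# Early Lisps instead looked a free variable up in the *caller's*
# environment. A sketch of dynamic binding via an explicit stack:
env = []                             # innermost binding last

def lookup(name):
    for frame in reversed(env):
        if name in frame:
            return frame[name]
    raise NameError(name)

def dyn_adder():                     # free variable n, resolved at call time
    return lambda x: x + lookup("n")

add_dyn = dyn_adder()
env.append({"n": 5})
assert add_dyn(3) == 8               # n = 5 is dynamically visible
env[-1]["n"] = 100
assert add_dyn(3) == 103             # same closure, different answer
```

The last two assertions show why dynamic binding does not conform to lambda calculus: the same lambda expression denotes different functions depending on the bindings in force at the call.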

The key to success in creating such tools is to maintain the right balance between the concept and the add-ons. The splitting up of Lisp into the variants mentioned above can be regarded as an indication that this balance had been lost. Scheme (the programming language) can be viewed as a rational reconstruction of Lisp that gives conformance to lambda calculus a higher priority and builds out from there.

Among the languages classified by Brooks as attracting fanatical adherents is APL [7]. It is a language with an identifiable concept. During the 19th century mathematics developed not only matrices, but also related formalisms such as quaternions, vector algebra, and tensors. Iverson showed that a common generalization covers a surprisingly large part of what application programmers in the 1970s wanted to do.
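Two of Iverson's uniform array operations give the flavour of that generalization. A pure-Python sketch (Python is my choice of illustration language, not APL; the APL fragments appear only in the comments):

```python
# In APL, +/⍳6 sums the first six integers ("reduction"), and
# x ∘.× y forms an outer product; both are instances of one uniform
# array vocabulary applied with any operator.
from functools import reduce
import operator

def reduce_by(op, xs):               # APL's op/xs
    return reduce(op, xs)

def outer(op, xs, ys):               # APL's xs ∘.op ys
    return [[op(x, y) for y in ys] for x in xs]

assert reduce_by(operator.add, range(1, 7)) == 21
times_table = outer(operator.mul, [1, 2, 3], [1, 2, 3])
assert times_table == [[1, 2, 3], [2, 4, 6], [3, 6, 9]]
```

The point is that one operator schema, parameterized by an arbitrary binary operation, subsumes sums, products, multiplication tables, and much of what matrix algebra offers.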

Next to matrix algebra and lambda calculus, predicate logic is worth considering as a candidate for the conceptual kernel of a programming language. By around 1970 every September in the United States at least a hundred courses would start, with textbooks such as [8,9,10] treating first-order predicate logic as a formal language with a well-defined semantics. Many of the students, and maybe some of the instructors, may have thought formal logic to be as old as the hills, what with Euclid and his axioms. Actually, the logic described by these textbooks is not all that much older than computers: the syntax was consolidated in 1928 [11] and the semantics in 1933 [12]. And even with thousands (assuming an average of ten students per course) being introduced to the real thing, logic was slow in being used.

Take Euclid’s system of axioms. After they had been accepted as God-given for over twenty centuries, the discovery of non-Euclidean geometries not only vindicated Euclid’s decision to posit his fifth axiom, but also raised suspicions that the famous five left loopholes that admitted unintended geometries. By the time the dust had settled there was Hilbert’s system of twenty axioms [13]. Of course Hilbert’s axioms were informally stated: the required formal system was still decades away. And even after it appeared it became an object of study rather than a tool to be used.

When computers appeared it became a natural project to lodge a bunch of axioms in the memory of one and to use the processor to discover theorems following from them. It was disappointing to find that the theorems were not interesting if they were easy to find. The challenge became to find proofs of known interesting theorems.

In the early 1970s Alain Colmerauer and Robert Kowalski discovered a setting in which easy-to-find theorems could be useful. They gave the ancient theorems-from-axioms paradigm a new twist: their axioms were analogous to declarations of procedures such as those found in programming languages and the theorems were analogous to results of calls to the procedures thus declared. From a mathematical point of view such “theorems” were trivial. They were concerned with things like sortedness of lists rather than with defining properties of geometries or algebraic structures like monoids or rings. Colmerauer and his group built a programming language around this idiosyncratic theorem-prover. The resulting language, Prolog, joined the select group of languages that gather fanatical adherents.
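The analogy can be made concrete with the sortedness example. The Prolog clauses below are the standard textbook definition (not taken from this article); the Python transcription shows the axioms-as-procedures reading:

```python
# Prolog axioms for sortedness (standard textbook clauses):
#   sorted([]).
#   sorted([_]).
#   sorted([X,Y|T]) :- X =< Y, sorted([Y|T]).
# The query ?- sorted([1,2,3]). asks whether sorted([1,2,3]) is a theorem.
# Read as procedure declarations, the clauses transcribe directly:

def is_sorted(xs):
    if len(xs) <= 1:                  # first two axioms: the base cases
        return True
    # third axiom: head pair in order, and the tail from Y onward sorted
    return xs[0] <= xs[1] and is_sorted(xs[1:])

assert is_sorted([1, 2, 3])           # the "theorem" sorted([1,2,3]) holds
assert not is_sorted([2, 1, 3])       # no proof exists: the query fails
```

A successful call corresponds to a proof that the queried sentence follows from the axioms; a failed call corresponds to the absence of such a proof.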

Thus the first use of predicate logic as a programming language was an example of the use of logic to formalize axiomatic theories. This is not the only use of logic. An axiom is a sentence, that is, it is a formula without free variables. In logic programming the task is to find theorems, that is, other sentences logically implied by the axioms. Another use of logic is to formalize a constraint satisfaction problem. The result of such a formalization is a formula F of logic that is not a sentence, having a non-empty set V of free variables. Here the task of the computer is to find the values, if any, of the variables in V that make F true.

The term “constraint satisfaction problem” needs some explanation. It is an AI-ish term to denote the current state of a long development that started with the symbolic algebra invented in the 17th century and that is still taught to children in school. A typical task is to solve an equation, which is a pair of numerical expressions joined by an equality sign. From a logic point of view such an equation is a formula F with the equality sign as predicate symbol and the two expressions as arguments. The expressions are terms made up of constants, the variables in V, and function symbols denoting numerical operations.

In the beginning of algebra the variables ranged over the reals as universe of discourse and the repertoire of predicates was restricted to the equality symbol. That repertoire was expanded to include inequalities. Algebra was extended to include other universes of discourse: integers, complex numbers, vectors. Various ad-hoc small finite universes of discourse were considered in conjunction with disequality as a predicate symbol. As a result constraint satisfaction problems expanded to include, for example, graph-colouring problems.
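A graph-colouring instance shows the scheme in miniature: a formula F over free variables V, each ranging over a small finite universe, with the computer left to search for satisfying values. The sketch below is in Python, by exhaustive search; the four-node graph is a hypothetical example chosen for illustration.

```python
# F is a conjunction of disequality constraints, one per edge;
# V is the set of nodes; the universe of discourse is the colours.
from itertools import product

colours = ["red", "green", "blue"]            # the universe of discourse
nodes = ["a", "b", "c", "d"]                  # the free variables V
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]

def F(assignment):                            # the formula to satisfy
    return all(assignment[u] != assignment[v] for u, v in edges)

solutions = [dict(zip(nodes, vals))
             for vals in product(colours, repeat=len(nodes))
             if F(dict(zip(nodes, vals)))]
assert solutions                              # at least one colouring exists
assert all(F(s) for s in solutions)
```

Exhaustive enumeration over the product of the universes is of course the crudest possible solver, but it makes the logical task plain: the answer is not a yes/no theorem but a set of satisfying values for the variables in V.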

Both logic programming and constraint programming, in their different ways, use logic as a programming language. They both raise the question whether the language of the above-mentioned textbooks exhausts the possibilities of what can be given the semantics of Tarski. Apt and his various co-authors have investigated this in a series of papers [14,15,16]. In “Formulas as programs” [16] a programming language, Alma-0, is described. It is an extension of the language of logic that is inspired by programming languages. It provides arrays and logic versions of the for statement. As far as I know, these useful extensions to logic have not been proposed in the context of logic programming.
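The flavour of such extensions can be suggested with bounded quantification over array indices, in the spirit of [14]. The sketch below is Python, not Alma-0 syntax; the function names are my own. The point is that a bounded quantifier is at once a logical formula and an executable loop.

```python
# ∃ i ∈ [0, n): a[i] = x  — a search; Python's any() is exactly this loop.
# ∀ i ∈ [0, n-1): a[i] ≤ a[i+1]  — sortedness as a universally quantified
# formula; all() is the corresponding loop.

def exists_in(a, x):
    return any(a[i] == x for i in range(len(a)))

def nondecreasing(a):
    return all(a[i] <= a[i + 1] for i in range(len(a) - 1))

assert exists_in([4, 7, 9], 7)
assert nondecreasing([1, 1, 2, 5])
assert not nondecreasing([2, 1])
```

Both functions can be read declaratively, as formulas with Tarskian semantics, and operationally, as loops that terminate because the quantifier is bounded; that dual reading is what makes such constructs efficiently implementable in an Algol-like setting.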

A motivation for Alma-0 was the fact that in logic programming repetition has to be implemented by recursion. In Prolog practice repetition is often implemented by a combination of a failure-driven loop and side effects. This is a nice example of how Prolog is a logical core padded out with whatever is necessary to make it into a useful programming tool. Still, Alma-0 is a valuable example of the alternative design strategy that consists of modifying logic itself in such a way that it can still be given the Tarskian semantics and is also as efficiently implementable as the other Algol-like languages.

Alma-0 lacks recursion, an omission justified by the fact that adequate iteration primitives are available. Still, it seems a worthwhile project to investigate whether recursion can be fitted into the Formulas As Programs approach.

I have subdivided programming languages with conceptual integrity into two main categories: those where this valuable property is implied by the existence of a compact specification (Algol 60, pre-standard C, and Modula-3) and those where it is explicitly present as a formalism that is independent of computing (matrix algebra, lambda calculus, and predicate logic). I have devoted a disproportionate amount of space to the last of these. I hope to have convinced the reader that it is here that most of the unexploited opportunities lie.

Acknowledgments

Thanks to Paul McJones for pointing out the omission of APL in an earlier draft and for numerous other suggestions for improvement.

References

[1] “No silver bullet: essence and accidents in software engineering” by F.P. Brooks, Jr. IEEE Computer magazine, April 1987.
[2] Revised Report on the Algorithmic Language Algol 60, J.W. Backus, …, M. Woodger. Edited by Peter Naur. A/S Regnecentralen, Copenhagen, 1964.
[3] Systems Programming with Modula-3 edited by Greg Nelson. Prentice-Hall, 1991.
[4] The C Standard, by BSI (The British Standards Institution), 558 pages. Wiley, 2003. ISBN-13: 978-0470845738.
[5] The C Programming Language by Brian Kernighan and Dennis Ritchie. First edition, 228 pages. Prentice-Hall, 1978.
[6] “History of Lisp” by John McCarthy. Artificial Intelligence Laboratory, Stanford University, 12 February 1979. Published in History of Programming Languages edited by R. Wexelblat. Academic Press, 1981.
[7] “Notation as a tool of thought” by K. Iverson. Communications of the ACM Volume 23 Issue 8, Aug. 1980, pp. 444-465.
[8] Introduction to Logic by P. Suppes. Van Nostrand, 1957.
[9] An Introduction to Mathematical Logic by E. Mendelson. Van Nostrand, 1964.
[10] Mathematical Logic by J. Shoenfield. Addison-Wesley, 1967.
[11] Grundzüge der theoretischen Logik by D. Hilbert and W. Ackermann. J. Springer, Berlin, 1928.
[12] “Der Wahrheitsbegriff in den formalisierten Sprachen”, by A. Tarski. Studia Philosophica, 1, 1935: 261–405. Translation of “Pojęcie prawdy w językach nauk dedukcyjnych” (1933) with a postscript added.
[13] Grundlagen der Geometrie by D. Hilbert. Originally co-published by B.G. Teubner, Leipzig, 1899, with Grundlagen der Elektrodynamik by E. Wiechert under the title “Festschrift zur Feier der Enthüllung des Gauss-Weber-Denkmals in Göttingen”. Many re-issues of Hilbert’s monograph under the title “Foundations of Geometry”.
[14] “Arrays, bounded quantification and iteration in logic and constraint logic programming” by K. Apt. Science of Computer Programming, 1996, pp 133–148.
[15] “Search and imperative programming” by K. Apt and A. Schaerf. Proc. 24th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, 1997, pp 67–79.
[16] “Formulas as programs” by K. Apt and M. Bezem. The Logic Programming Paradigm, K. Apt, V. Marek, M. Truszczynski, and D.S. Warren, editors. Springer, 1999, pp 75–107.