<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>A Programmers Place</title>
	<atom:link href="http://vanemden.wordpress.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://vanemden.wordpress.com</link>
	<description>Observations, Reviews, and Essays</description>
	<lastBuildDate>Wed, 11 Jan 2012 05:12:46 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='vanemden.wordpress.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://s2.wp.com/i/buttonw-com.png</url>
		<title>A Programmers Place</title>
		<link>http://vanemden.wordpress.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://vanemden.wordpress.com/osd.xml" title="A Programmers Place" />
	<atom:link rel='hub' href='http://vanemden.wordpress.com/?pushpress=hub'/>
		<item>
		<title>McCarthy&#8217;s recipe for a programming language</title>
		<link>http://vanemden.wordpress.com/2011/10/31/mccarthys-recipe-for-a-programming-language/</link>
		<comments>http://vanemden.wordpress.com/2011/10/31/mccarthys-recipe-for-a-programming-language/#comments</comments>
		<pubDate>Mon, 31 Oct 2011 10:24:42 +0000</pubDate>
		<dc:creator>Maarten van Emden</dc:creator>
				<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://vanemden.wordpress.com/?p=451</guid>
		<description><![CDATA[On October 23, 2011 John McCarthy died. He is remembered as the inventor of Lisp, for contributions to the theory of computation, and for contributions to artificial intelligence. In this essay I argue that he did more than invent Lisp, even more than originate the concept of high-level programming: he showed to those willing to [...]]]></description>
			<content:encoded><![CDATA[<p>On October 23, 2011 John McCarthy died. He is remembered as the inventor of Lisp, for contributions to the theory of computation, and for contributions to artificial intelligence. In this essay I argue that he did more than invent Lisp, even more than originate the concept of high-level programming: he showed to those willing to see it <em>a recipe</em> for designing high-level languages. I support my argument by turning the clock back to 1950 and imagining how McCarthy&#8217;s recipe could have been followed at that point in time. By taking this unusual point of view I hope to bring out more clearly McCarthy&#8217;s contribution to programming languages.</p>
<p><span id="more-451"></span></p>
<p>How can it be relevant to go back so far in history when computing changes so fast? The impression of rapid development is entirely caused by the breakneck speed of hardware development. Compared to this, software moves but slowly. By the mid-1970s the main programming paradigms had been born: imperative, functional, object-oriented, and logic programming. Most of the languages in use today are descendants of these four paradigms. Paul Graham [Note HundredYearLanguage] explains the difference between hardware and software by observing that a programming language is not a technology but a notation. A notation reflects human thought habits and these change slowly.</p>
<p>Functional programming is regarded as a more or less pure form of the lambda calculus; logic programming as a more or less pure form of the predicate calculus. In this essay I imagine us back in 1950. Although the calculi existed at the time I imagine us (busy engineers) ignorant of such esoteric matters. But I do imagine us fired up by the invention of the <em>subroutine</em> so that we want to use this invention to the hilt. I will show that we would have invented the high-level <em>languages</em> in the form of subroutines acting on high-level <em>memories</em>.</p>
<p>To get started let me describe how I see the main protagonists of the story.</p>
<ul>
<li>Imperative programming</li>
</ul>
<p>All computation consists of the execution of commands. These are of two types: (1) those that change a state consisting of contents of memory registers and (2) those that transfer control. This branch of the evolutionary tree began with machine code and developed into more and more sophisticated assemblers, Fortran, Algol, and C. A side branch from Algol became Simula, Smalltalk, and later object-oriented languages, where the state is articulated into the separate states of dynamically created objects.</p>
<ul>
<li>Functional programming</li>
</ul>
<p>All computation consists of the evaluation of expressions of the lambda calculus. These are either constants or composite expressions specifying the application of a function to the values of its arguments. The program consists of function definitions and a single expression whose evaluation initiates program execution.</p>
<ul>
<li>Logic programming</li>
</ul>
<p>All computation consists of attempting to prove an instance of a formula (in the sense of predicate logic) with free variables. In case of success the result of the computation is available as a substitution for the variables. A program consists of definitions of predicates and a single formula to be proved.</p>
<p>Of course these brief characterizations omit a lot of detail. But it is remarkable that the rapid development of logic programming in 1971 – 1974 was from the beginning based on predicate calculus. The development of functional programming, which I reckon to have started with Lisp in 1959 and which only reached the ideal of functional programming with Scheme in 1975, followed a more tortuous route.</p>
<p>In 1958, when McCarthy embarked on his new programming language, he had decided on the use of lambda notation, but he ignored the lambda calculus itself. &#8220;To use functions as arguments, one needs a notation for functions, and it seems natural to use the lambda-notation of Church. I didn&#8217;t understand the rest of the book, so I wasn&#8217;t tempted to try to implement his more general mechanism for defining functions.&#8221; [Note McCarthyHistory]</p>
<p>Thus the Lisp implementation of 1959 was not intended to be an implementation of the lambda calculus; it is surprising how close it later turned out to be. When it became clear what an implementation of the lambda calculus could look like, Sussman and Steele discovered, by a roundabout route [Note SussmanSteele], that Lisp, modified by replacing dynamic scope with lexical scope, is an implementation of lambda calculus. Hence my dating the origin of functional programming at 1975, preceded by sixteen years of close approximations in the form of the various and widely used dialects of Lisp.</p>
<p>Landin had identified lambda calculus as the basis of a programming language (ISWIM) independently of, and earlier than, Scheme. Although Landin had supplied SECD as an abstract machine for it, I believe that the Scheme of 1975 was the first implementation of functional programming. The SECD machine appears in one of the papers that Landin published in 1964 and 1966. I believe the first implementation based on it was Henderson&#8217;s Lispkit Lisp of 1980.</p>
<p>By the end of the century functional programming had a variety of widely used forms: Scheme, ML, Haskell, CAML. Yet all were recognized as being based on lambda calculus, a mathematical formalism that existed before computers.</p>
<h4>The discrete charm of the subroutine</h4>
<p>In spite of this overwhelming display of the power of abstract mathematics over programming I like to imagine how functional and logic programming could have been invented independently of lambda or predicate calculus before even Fortran existed. It could have happened like this. Just as E.W. Dijkstra launched a crusade against the goto statement in 1968, a similarly charismatic guru could have done this in 1948. Just as Dijkstra did in 1968, this guru could have written</p>
<blockquote><p><em>For a number of years I have been familiar with the observation that the quality of programmers is a decreasing function of the density of go to statements in the programs they produce.</em></p></blockquote>
<p>In 1948 the implied message would have been that the <em>closed subroutine</em> should be the first choice for transfer of control, with &#8220;closed&#8221; meaning that the subroutine only interacts with its environment through its parameters.</p>
<p>Turing&#8217;s ACE report of 1946 [Note TuringACE] proposes subroutines under the name &#8220;subsidiary operation&#8221;. What may be the first book on programming [Note WilkesWheelerGill] is organized around the concept of the subroutine. As the book was published in 1951 I can imagine this early guru to have had the insight in 1948 that <em>programs should be structured around subroutines:</em> that they should consist of subroutine definitions with the sole exception of a single call to initiate the entire program; that each subroutine definition should consist of subroutine calls embedded in a minimum of glue code for the marshalling of arguments in preparation for each call. Let us suppose that the guru would have called this <em>Principled Programming</em>.</p>
<p>There are two ways in which a closed subroutine can interact with its environment: modify the arguments (&#8220;procedure&#8221; in Algol or &#8220;void function&#8221; in C terminology) or cause a single result of the body&#8217;s computation to replace the subroutine call (&#8220;function&#8221; in Algol or &#8220;nonvoid function&#8221; in C). It is in the spirit of Principled Programming to require that a subroutine use only one of these interaction modes. It is in line with human nature that the Principled Programming movement would split up into two warring factions, each insisting on the superiority of one of these modes: functional versus procedural programming.</p>
<p>When the first Fortran implementation was delivered in 1957, the main innovation was that of the <em>expression</em>. It was this feature that convinced McCarthy of the superiority of Fortran over IPL-V. To continue our minimalist fantasy, let us imagine that the concept of subroutine were only enriched with expressions. In functional programming this eliminates the need for any glue code between subroutine calls in a subroutine body. The body is changed to a single expression and the subroutine calls are changed to subexpressions. Program execution consists of expression evaluation only. In effect the entire program consists of a single big expression, broken up into mentally digestible chunks by subroutine definitions. The process of subroutines calling subroutines is occasionally interrupted by a subroutine body consisting of an expression without subexpressions, a literal that directly denotes a value.</p>
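<p>As a minimal sketch of what such a program might look like, here is the idea transcribed into Python, a language chosen only for illustration and obviously not available in 1948. The names <em>one</em>, <em>square</em>, <em>sum_of_squares</em>, <em>factorial</em>, and <em>main</em> are hypothetical. Every body is a single expression, and the only glue left is the <em>return</em> that Python&#8217;s syntax requires.</p>

```python
# Illustrative sketch only: each subroutine body is one expression,
# and subroutine calls appear as subexpressions of that expression.

def one():
    # A body without subexpressions: a literal that directly denotes a value.
    return 1

def square(x):
    return x * x

def sum_of_squares(a, b):
    return square(a) + square(b)

def factorial(n):
    # Choice is itself an expression (a conditional expression), not a statement.
    return one() if n == 0 else n * factorial(n - 1)

def main():
    # The single call that initiates the entire program.
    return sum_of_squares(3, 4) + factorial(5)
```

<p>Evaluating <em>main()</em> here yields 145. Nothing is assigned and no register is changed along the way: the definitions merely cut one big expression into digestible chunks.</p>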
<p>Expressions eliminate the need for glue code in functional subroutine programming, thus rendering it pure. The elimination of glue code would leave pure procedural programs with procedure bodies consisting only of procedure calls. In this way the chain of procedures calling procedures can only be stopped by procedures with empty bodies. This is not possible, because their execution would have no effect. Thus the advent of expressions makes functional subroutine programming pure, leaving procedural programming with the need for glue code. Among the devotees of Principled Programming, victory for functional programming.</p>
<p>Yet, with Prolog, pure procedural programming was realized. So there must be procedures with empty bodies, as indeed there are. But calling such procedures causes something to happen because Prolog cheats: when a procedure is called, body execution is not the only thing that happens; it is preceded by unification of the arguments of the call with the parameters of the definition. An argument in a procedure call is an expression typically containing an unbound variable. Unification with the corresponding parameter in the definition causes this variable to become bound. It is typically bound to an expression containing unbound variables, making further bindings possible.</p>
<p>The variables of Prolog are called <em>logical</em> variables to indicate that they are created unbound and can only be bound once in the same computation thread. A binding can only be undone by backtracking to restore the computation state at the time of binding and continuing the thread some other way.</p>
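<p>The mechanism can be sketched in a few lines of Python. This is an illustration under my own assumptions, not the way any particular Prolog system is built: terms are atoms (strings or numbers), tuples of the form (functor, argument, ...), or instances of a hypothetical class <em>Var</em>; the helper names <em>walk</em>, <em>unify</em>, <em>undo</em>, and <em>resolve</em> are likewise invented here; and there is no occurs check. Each binding is recorded on a trail so that backtracking can undo it.</p>

```python
class Var:
    """A logical variable: created unbound, bound at most once per thread."""
    def __init__(self, name):
        self.name = name
        self.ref = None          # None means "unbound"

def walk(t):
    """Follow bindings until an unbound variable or a non-variable is reached."""
    while isinstance(t, Var) and t.ref is not None:
        t = t.ref
    return t

def unify(a, b, trail):
    """Try to unify two terms; every new binding is appended to the trail."""
    a, b = walk(a), walk(b)
    if a is b:
        return True
    if isinstance(a, Var):
        a.ref = b
        trail.append(a)
        return True
    if isinstance(b, Var):
        b.ref = a
        trail.append(b)
        return True
    if isinstance(a, tuple) and isinstance(b, tuple):
        if len(a) != len(b):
            return False
        return all(unify(x, y, trail) for x, y in zip(a, b))
    return a == b                # two atoms, or an atom and a compound term

def undo(trail, mark):
    """Backtrack: unbind every variable bound since the trail had length mark."""
    while len(trail) > mark:
        trail.pop().ref = None

def resolve(t):
    """Replace bound variables by their values, for inspection."""
    t = walk(t)
    if isinstance(t, tuple):
        return tuple(resolve(x) for x in t)
    return t
```

<p>Take the procedure with empty body whose definition has parameters ("append", "nil", L, L), with L a fresh variable. Unifying the call ("append", "nil", ("cons", 1, "nil"), Z) with it binds Z to ("cons", 1, "nil"), although no body was executed. Calling <em>undo(trail, 0)</em> afterwards leaves Z unbound again, which is all that backtracking needs.</p>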
<h4>Three types of memory</h4>
<p>Back to reality. Since 1948 imperative programming has evolved to an eclectic mixture of control constructs such as if-, case-, for-, and while-statements. Memory has differentiated into global, local, and heap memory. In spite of these differences, they are basically memory registers, albeit with a more convenient naming system than raw addresses would provide. The pure kinds of programming admit only the subroutine call as control construct and can be characterized as</p>
<pre>     FUNCTIONAL PROGRAMMING = SUBROUTINES + EXPRESSION MEMORY
          LOGIC PROGRAMMING = SUBROUTINES + RATIONAL-TREE MEMORY</pre>
<p>Here &#8220;rational-tree&#8221; is the name used by Colmerauer for expressions (that is, trees) in which the leaves are constants or single-assignment pointers to rational trees. Here we see that the two types of programming are made possible by assuming a suitable form of high-level memory, a different type for each type of programming. With this insight, let us review the birth of Lisp.</p>
<h4>Graham&#8217;s account of what McCarthy did</h4>
<p>Graham has written a summary of McCarthy&#8217;s original Lisp paper. Great as the original undoubtedly is, it benefits from the thirty years of hindsight that Graham was able to give it. The summary can be summarized by starting with a number of questions:</p>
<ol>
<li>What is a simple and flexible data structure?</li>
<li>What is a textual expression to represent such a data structure as simply as possible?</li>
<li>What is a small and complete set of operations and tests on the data structure?</li>
<li>What is a small and complete set of logical operations?</li>
<li>How to represent definitions of subroutines in the data structure?</li>
<li>How to represent calls to these subroutines in the data structure?</li>
</ol>
<p>McCarthy gave the following answers:</p>
<ol>
<li>A possibly empty binary tree.</li>
<li>A dotted pair or NIL.</li>
<li>CONS, CAR, CDR, EQ, ATOM</li>
<li>LAMBDA, LABEL, COND, QUOTE</li>
<li>As a binary tree. The left subtree denotes the name of the function being defined. The right subtree denotes the expression denoting the value of the function. The context provided by reading the source text tells the interpreter which expressions are definitions and which is the expression to be evaluated initially.</li>
<li>As a binary tree. The left subtree denotes the function, the right subtree denotes its arguments.</li>
</ol>
<p>Graham then delivers the punch line: McCarthy&#8217;s two mutually recursive functions, <em>apply</em> and <em>eval</em>, which are defined in terms of the primitive functions and which specify the value of an arbitrary expression. Eval served as the first Lisp interpreter.</p>
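<p>To give the flavour of that punch line, here is a rough sketch in Python of an evaluator over binary-tree memory. It is neither McCarthy&#8217;s definition nor Graham&#8217;s; several things are assumptions of mine: a cell is a Python pair, NIL is None, atoms are strings, any value other than NIL counts as true in COND, and the names <em>eval_</em>, <em>apply_</em>, <em>evlis</em>, <em>evcon</em>, <em>lookup</em>, and <em>lst</em> are invented for the occasion.</p>

```python
NIL = None                                   # the empty tree

def cons(a, d): return (a, d)                # a cell is a Python pair
def car(p): return p[0]
def cdr(p): return p[1]
def atom(x): return not isinstance(x, tuple)
def eq(a, b): return atom(a) and atom(b) and a == b

def lst(*items):
    """Helper, not a primitive: build a proper list out of cells."""
    out = NIL
    for item in reversed(items):
        out = cons(item, out)
    return out

def lookup(name, env):
    """env is a list of (name . value) cells."""
    while env is not NIL:
        if eq(car(car(env)), name):
            return cdr(car(env))
        env = cdr(env)
    raise NameError(name)

def evlis(args, env):
    """Evaluate each element of an argument list."""
    if args is NIL:
        return NIL
    return cons(eval_(car(args), env), evlis(cdr(args), env))

def evcon(clauses, env):
    """COND: value of the first clause whose test is not NIL."""
    while clauses is not NIL:
        clause = car(clauses)
        if eval_(car(clause), env) is not NIL:
            return eval_(car(cdr(clause)), env)
        clauses = cdr(clauses)
    return NIL

def eval_(e, env):
    if atom(e):
        return lookup(e, env)
    op = car(e)
    if eq(op, "QUOTE"):
        return car(cdr(e))
    if eq(op, "COND"):
        return evcon(cdr(e), env)
    return apply_(op, evlis(cdr(e), env), env)

def apply_(f, args, env):
    if atom(f):
        if f == "CAR": return car(car(args))
        if f == "CDR": return cdr(car(args))
        if f == "CONS": return cons(car(args), car(cdr(args)))
        if f == "ATOM": return "T" if atom(car(args)) else NIL
        if f == "EQ": return "T" if eq(car(args), car(cdr(args))) else NIL
        return apply_(lookup(f, env), args, env)
    if eq(car(f), "LAMBDA"):
        params, body = car(cdr(f)), car(cdr(cdr(f)))
        while params is not NIL:             # bind parameters to arguments
            env = cons(cons(car(params), car(args)), env)
            params, args = cdr(params), cdr(args)
        return eval_(body, env)
    if eq(car(f), "LABEL"):                  # name the function for recursion
        name, fn = car(cdr(f)), car(cdr(cdr(f)))
        return apply_(fn, args, cons(cons(name, f), env))
    raise ValueError("cannot apply")
```

<p>With these definitions the tree for ((LAMBDA (X) (CONS X (QUOTE (B)))) (QUOTE A)), built with <em>lst</em> and evaluated by <em>eval_</em> in the empty environment NIL, gives the tree for (A B). Program and data live in the same binary-tree memory, and the evaluator itself uses nothing but the five operations on it.</p>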
<p>The method followed by McCarthy can be generalized to a recipe for an abstract machine. What makes McCarthy&#8217;s machine abstract is that it assumes all memory to have the form of binary trees. The abstract machine consisting of binary-tree memory with CAR, CDR, CONS, ATOM as instructions is interestingly different from the abstract machines for other languages, such as Prolog and Java. In the latter languages an occasional programmer like myself is not capable of writing a procedure or function in abstract machine code. In Lisp, on the other hand, CAR, CDR, CONS, ATOM are the most common functions in <em>every</em> program. This makes Lisp both low-level and high-level, the key to its versatility.</p>
<p>In this way we can reconstruct pure functional programming with subroutines from the ground up. How about pure procedural programming? It is an interesting exercise to accept the above six questions as a recipe for the design and implementation of a logic programming language.</p>
<p>What the above way of telling the story of Lisp suggests is that <em>it is the memory model that makes programming high-level</em>. The experience with logic programming and Prolog suggests that rational trees are a suitable starting point. The fact that rational trees enable pure procedural programming makes this interesting enough to apply McCarthy&#8217;s recipe to this choice of memory model. As a result we should get an interesting reconstruction of logic programming, one of the few truly novel programming paradigms in the, by now, sixty-year history of the art.</p>
<h4>Notes</h4>
<p>[Note HundredYearLanguage]<br />
&#8220;<a href="http://www.paulgraham.com/hundred.html"> The Hundred-Year Language</a>&#8221; by Paul Graham.<br />
[Note McCarthyHistory]<br />
&#8220;History of Lisp&#8221; Stanford AI Laboratory memo 1979 by J. McCarthy, page 6.<br />
[Note SchemeRevisit]<br />
&#8220;The first report on Scheme revisited&#8221; Sussman and Steele, 1998.<br />
[Note SussmanSteele]<br />
AIM-349 MIT Artificial Intelligence Laboratory, 1975.<br />
[Note TuringACE]<br />
&#8220;Proposal for the development in the Mathematics Division of an automatic computing engine&#8221;, by A.M. Turing, Report E882 Executive Committee NPL 1946. Reprinted April 1972 as NPL report Com.Sci. 57. (Turing proposed subroutines under the name &#8220;subsidiary operation&#8221;.)<br />
[Note WilkesWheelerGill]<br />
Wilkes, Maurice V. et al. &#8220;The Preparation of Programs for an Electronic Digital Computer&#8221;, 1951. This book is organized around the concept of subroutine.</p>
]]></content:encoded>
			<wfw:commentRss>http://vanemden.wordpress.com/2011/10/31/mccarthys-recipe-for-a-programming-language/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/e28602c14607fe4f92e85f6850e35a93?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">Maarten van Emden</media:title>
		</media:content>
	</item>
		<item>
		<title>Scruffies and Neats in Artificial Intelligence</title>
		<link>http://vanemden.wordpress.com/2011/09/11/scruffies-and-neats-in-artificial-intelligence/</link>
		<comments>http://vanemden.wordpress.com/2011/09/11/scruffies-and-neats-in-artificial-intelligence/#comments</comments>
		<pubDate>Mon, 12 Sep 2011 05:51:46 +0000</pubDate>
		<dc:creator>Maarten van Emden</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://vanemden.wordpress.com/?p=444</guid>
		<description><![CDATA[In a previous essay [0] I traced the Lighthill Affair to the tension between the scruffies and the neats in Artificial Intelligence. As a reminder, the official [1] definition of these terms: &#8220;&#8230; the neats — those who think that AI theories should be grounded in mathematical rigor — versus the scruffies — those [...]]]></description>
			<content:encoded><![CDATA[<p>In a previous essay [<a href="#lighthill">0</a>] I traced the Lighthill Affair to the tension between the scruffies and the neats in Artificial Intelligence. As a reminder, the official [<a href="#russellNorvig">1</a>] definition of these terms:</p>
<blockquote><p>&#8220;&#8230; the neats — those who think that AI theories should be grounded in mathematical rigor — versus the scruffies — those who would rather try out lots of ideas, write some programs, and then assess what seems to be working.&#8221;</p></blockquote>
<p>For a few lines this is a pretty good characterization. But I think it only scratches the surface. In this essay I will explore the contrast in temperament and attitude that exists along several dimensions and is found elsewhere in science.</p>
<p><span id="more-444"></span></p>
<p>One of these dimensions is the one in which <em>pious</em> contrasts with <em>naughty</em>. The choice of these terms bears explanation. In reading a scientist like Dijkstra I get the impression that for him science is the highest, most noble calling that a human can aspire to. Hence, &#8220;pious&#8221;. The other type can be no less excellent as a scientist, but does not exude the pious aura. Consider Richard Feynman who started to wonder why, or whether, the human sense of smell is so much less sensitive than a dog&#8217;s. After some encouraging results that showed he was a better sniffer than humans are generally believed to be, he formed the hypothesis that dogs are only better at sniffing trails because they have their nose closer to the ground. So he crawled around the rug on his hands and knees, sniffing, to find out whether he could tell the difference between where he had walked and where he hadn&#8217;t [<a href="#surely">2</a>]. Because of this episode I don&#8217;t classify Feynman among the pious scientists. (By the way, the hypothesis was rejected, even when he had done the walking barefoot. In line with popular belief, the dog <em>was</em> a better sniffer than its owner.)</p>
<p>I might have used &#8220;prankish&#8221; rather than &#8220;naughty&#8221;, were it not for a passage in an essay by my favourite venture capitalist. The essay I have in mind [<a href="#Graham">3</a>] discusses what to look for in a start-up company in search of funding. Graham lists five points, among which number four is <em>naughtiness</em>.</p>
<blockquote><p>Though the most successful founders are usually good people, they tend to have a piratical gleam in their eye. They&#8217;re not Goody Two-Shoes type good. Morally, they care about getting the big questions right, but not about observing proprieties. That&#8217;s why I&#8217;d use the word naughty rather than evil. They delight in breaking rules, but not rules that matter.</p></blockquote>
<p>It is this that I recognize in accounts of Feynman. As another example in science, consider Irving John (&#8220;Jack&#8221;) Good. He was the General Editor of a 413-page volume with the title <em>The Scientist Speculates: an Anthology of Partly-Baked Ideas</em> [<a href="#pbi">4</a>]. Many of the 123 entries have intriguing titles. A sampling:</p>
<ul>
<li>pbi #48 Explosive Telepathic Fields</li>
<li>pbi #85 Life in the Sun [i.e. on the surface of]</li>
<li>pbi #79 Multipurpose Plants</li>
<li>pbi #76 Steak from Sawdust</li>
<li>pbi #67 A Problem for the Hedonist</li>
<li>pbi #67 A Method for Encouraging Clairvoyance in Rats</li>
<li>pbi #40 Self-Organizing Pumps and Barges</li>
<li>pbi #39 A Splendid National Investment</li>
</ul>
<p>Would any self-respecting person deign to be associated with such a prankish enterprise? Look at this partial list of contributors: Bruno de Finetti, Dennis Gabor, Wassily Leontief, J.E. Littlewood, N.W. Pirie, John Maynard Smith, C.H. Waddington.</p>
<p>And, not surprisingly, we find contributions from Marvin Minsky and Donald Michie. The latter started off the present article indirectly via its predecessor about Michie and Longuet-Higgins, two very different temperaments, of whom it was hoped that their complementary talents would result in fruitful collaboration.</p>
<p>Michie was an experimentalist, but an unusual one. His lack of computer access spurred him in 1965 to invent an experimental set-up named MENACE (&#8220;Matchbox Educable Noughts-And-Crosses Engine&#8221;), a demonstration of the BOXES learning algorithm in the form of a contraption that, on closer inspection, turns out to be an agglutination of 256 salvaged matchboxes requiring the operator to transfer coloured beads between boxes [<a href="#menace">5</a>].</p>
<p>As I argue in [<a href="#vEwordpress">6</a>], scruffy in AI got its imprint from the MIT hackers as described by Steven Levy [<a href="#hackers">7</a>]. They were not only AI pioneers, but also computer pioneers as well as pranksters. This last aspect showed in their allergy to passwords and locks (of the hardware kind). Back to Feynman, who was their soul brother in this respect [<a href="#surely">2</a>].</p>
<p>Enough about naughty versus pious. In physics there is a lively animosity between experimentalists and theorists. Here follow some illustrations.</p>
<p>When Edsger Dijkstra was asked how he got into computing, he would say that he had been a student in &#8220;<em>theoretical</em> physics&#8221;. I suspect that the University of Leiden did not have a theoretical option in its physics program, but that some students had already differentiated in their minds.</p>
<p>When Christopher Longuet-Higgins descended into the Department of Machine Intelligence in Edinburgh, he had earned the freedom to do so by important results in <em>theoretical </em>chemistry.</p>
<p>One of my favourite pop-physics books is the one by Leon Lederman [<a href="#lederman">8</a>]. I recommend it, in spite of the god-awful title (no doubt a concoction of the publisher&#8217;s). I learned some physics from it. I also enjoyed his characterization of theorist versus experimentalist. Lederman keeps track of points scored by both sides: theorists anticipated the positron, the pion, the antiproton, and the neutrino, particles detected soon after in experiments. These predictions fed the already considerable arrogance of the theorists (&#8220;They need us to tell them what they are seeing.&#8221;). Theorists failed to anticipate the tau-lepton, the upsilon and the muon. As regards the latter, Isidor Rabi is famous for having remarked, in arch-theorist mode, &#8220;Who ordered <em>that</em>?&#8221; Lederman, after a dutiful homily about theorists and experimentalists using their complementary skills to uncover the secrets of the universe, is more convincing about the more mundane aspects of the differences between the two tribes:</p>
<blockquote><p>Today we have two groups of physicists both with the common aim of understanding the universe but with a large difference in cultural outlook, skills, and work habits. Theorists tend to come in late to work, attend grueling symposiums on Greek islands or Swiss mountaintops, take real vacations, and are at home to take out the garbage more often. [...] They tend to worry about insomnia. [...] Experimentalists don&#8217;t come in late: they never went home. [...] Sleep is when you can curl up on the accelerator floor for an hour.</p></blockquote>
<p>You get the flavor. Yet the poor experimentalists are not drop-out theorists. Nor do they leave for easier lives in the financial sector. It looks like experimentalists can&#8217;t help being what they are.</p>
<p>It will suffice to remind the reader of the notorious &#8220;Pauli effect&#8221; [<a href="#pauliEff">9</a>], the conclusive evidence of the vast difference between theoretical and experimental physicists.</p>
<p>Could it be that the difference is one of temperament and that a similar difference is found in computer science and AI? I think so, and I think that it explains the vast difference between the Longuet-Higginses and someone like Minsky, who not only did a lot of important things in AI, but also invented, built, and patented a new kind of microscope [<a href="#minskyMicro">10</a>].</p>
<p>But, as one of my esteemed correspondents objects, Longuet-Higgins <em>loved</em>, and did, experiments; elegant little experiments. That illustrates my point: for a theorist experiments have to be elegant. If a question can&#8217;t be settled this way, the theorist thinks the time is not ripe. An experimentalist can&#8217;t wait. If nobody knows how to keep it elegant and little, then the experimentalist tries to get the money, the facilities, and the people for a big messy experiment.</p>
<p>Sooner or later discussions about clashing temperaments will return to the Snow-Leavis Affair, alias the Two Cultures debate. It started with the 1959 Rede lecture at Cambridge University in which scientist, novelist, and civil servant Charles Snow described and deplored the fact that there seemed to be an unbridgeable gap between the scientific and literary scholars. The literary critic F.R. Leavis reacted to Snow&#8217;s lecture with a savage attack. Although fellow literary scholars thought the tone of Leavis&#8217;s article deplorable, they tended to agree that Snow&#8217;s analysis was shallow.</p>
<p>After half a century it is difficult to appreciate the furore arising from Snow&#8217;s lecture. Yet it is worth revisiting in the form of [<a href="#twoCultures">11</a>], especially for Stefan Collini&#8217;s preface, from which I quote:</p>
<blockquote><p>The literary critic, habitually attending to the fine texture of verbal detail, can at times barely be persuaded that something is being said at all if it is being said badly. It is almost a truism of the critic&#8217;s working practice that the conventional distinction between form and content is misleading in literature: a work is those words in that order — one cannot blithely assume some &#8220;meaning&#8221; behind them which failed to get itself expressed properly but which is nonetheless the &#8220;message&#8221; of the text.</p></blockquote>
<p>This fastidiousness has persisted into the twenty-first century. Many of the humanities lectures I have recently attended consist of the lecturer <em>reading</em> the paper, not &#8220;talking&#8221; to a rapid succession of PowerPoint slides. The verbatim record of the talk (if one would ever come to light) would be a jumble of words depending on a great deal of goodwill for any intelligibility. The text that was read is a joy to read after the occasion. In such fastidiousness I recognise the theorist, the neat: if an experiment cannot be done neatly, then it should not be done at all; we are not ready for it.</p>
<p>Dijkstra&#8217;s THE operating system [<a href="#THEsystem">12</a>] was an elegant little experiment. Of course it took half a dozen people a few years of hard work to write the assembler code for the Electrologica X8. But compared to contemporary operating systems (other than Unix) THE was both elegant and little. In physics there is a constant interaction between the experimentalists and theorists in the sense that a physics experiment only counts as such when theorists agree that it makes sense. In software the scruffies not only dominate, but there do not seem to be any neats around even to criticize, let alone restrain, the ubiquitous big, messy experiments. The reason is of course that, unlike in physics, some of the big, messy experiments generate big, neat piles of money.</p>
<h4>References</h4>
<p>[<a name="lighthill"></a>0] <a href="http://vanemden.wordpress.com"> A Programmer&#8217;s Place</a> February 18, 2011.<br />
[<a name="russellNorvig"></a>1] <em>Artificial Intelligence: A Modern Approach</em> by Stuart Russell and Peter Norvig. Prentice-Hall, First edition 1995, page 21.<br />
[<a name="surely"></a>2] <em>&#8220;Surely, You&#8217;re Joking, Mr Feynman!&#8221;</em> by Richard P. Feynman, as told to Ralph Leighton. W.W. Norton, 1985.<br />
[<a name="Graham"></a>3] <a href="http://www.paulgraham.com/founders.html"> What We Look for in Founders</a><br />
[<a name="pbi"></a>4] <em>&#8220;The Scientist Speculates: an Anthology of Partly-Baked Ideas&#8221;</em> Irving John Good, ed. Heinemann, 1962.<br />
[<a name="menace"></a>5] &#8220;BOXES: an experiment in adaptive control&#8221; D. Michie and R.A. Chambers. In: <em>Machine Intelligence</em> vol. 2. Ella Dale and Donald Michie, eds. Oliver and Boyd 1968.<br />
[<a name="vEwordpress"></a>6] <a href="http://vanemden.wordpress.com"> The MIT style in Artificial Intelligence 1958 – 1985 </a><br />
[<a name="hackers"></a>7] <em>Hackers</em> by Steven Levy. Doubleday, 1984.<br />
[<a name="lederman"></a>8] <em>The God Particle</em> by Leon Lederman with Dick Teresi. Dell, 1993.<br />
[<a name="pauliEff"></a>9] <a href="http://en.wikipedia.org/wiki/Pauli_effect"> The Pauli Effect </a><br />
[<a name="minskyMicro"></a>10] <a href="http://web.media.mit.edu/~minsky/papers/ConfocalMemoir.html"> <em>Memoir</em></a> <em>on Inventing the Confocal Scanning Microscope</em> by Marvin Minsky.<br />
[<a name="twoCultures"></a>11] <em>The Two Cultures</em> by C.P. Snow, with introduction by Stefan Collini. Cambridge University Press, 1993.<br />
[<a name="THEsystem"></a>12] <a href="http://en.wikipedia.org/wiki/THE_multiprogramming_system"> THE multiprogramming system</a></p>
]]></content:encoded>
			<wfw:commentRss>http://vanemden.wordpress.com/2011/09/11/scruffies-and-neats-in-artificial-intelligence/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/e28602c14607fe4f92e85f6850e35a93?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">Maarten van Emden</media:title>
		</media:content>
	</item>
		<item>
		<title>The MIT style in Artificial Intelligence 1958 – 1985</title>
		<link>http://vanemden.wordpress.com/2011/06/07/the-mit-style-in-artificial-intelligence-1958-%e2%80%93-1985/</link>
		<comments>http://vanemden.wordpress.com/2011/06/07/the-mit-style-in-artificial-intelligence-1958-%e2%80%93-1985/#comments</comments>
		<pubDate>Tue, 07 Jun 2011 13:46:23 +0000</pubDate>
		<dc:creator>Maarten van Emden</dc:creator>
				<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://vanemden.wordpress.com/?p=418</guid>
		<description><![CDATA[The MIT style in Artificial Intelligence 1958 – 1985 In a previous article I illustrated the tension in Artificial Intelligence (AI) between two mentalities, generally referred to as &#8220;scruffy&#8221; and &#8220;neat&#8221;. I started by accepting the characterization by Norvig and Russell [1]. &#8220;&#8230; the neats — those who think that AI theories should be grounded [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=vanemden.wordpress.com&amp;blog=3462521&amp;post=418&amp;subd=vanemden&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<h4>The MIT style in Artificial Intelligence 1958 – 1985</h4>
<p>In a previous <a href="http://tinyurl.com/3s8k8f4">article</a> I illustrated the tension in Artificial Intelligence (AI) between two mentalities, generally referred to as &#8220;scruffy&#8221; and &#8220;neat&#8221;. I started by accepting the characterization by Norvig and Russell [<a href="#russellNorvig">1</a>].</p>
<blockquote><p>&#8220;&#8230; the neats — those who think that AI theories should be grounded in mathematical rigor — versus the scruffies — those who would rather try out lots of ideas, write some programs, and then assess what seems to be working.&#8221;</p></blockquote>
<p>Critics of my article have opened up so many dimensions of scruffy versus neat that it is hard to know where to start. From my disorganized thoughts one recurrent theme emerged: the distinctive MIT style during the formative years of AI.</p>
<p><span id="more-418"></span></p>
<p>In my previous article I explained the Lighthill affair as a conflict between AI proper and those who were stuck in cybernetics. That was in 1972. But cybernetics was there before there was AI and it&#8217;s not surprising that that was where some AI people started.</p>
<p>The year was 1946, the place was Cambridge, Massachusetts. The chronicler [<a href="#histComp">2</a>] was Garrett Birkhoff (1911 – 1996). Birkhoff made his reputation in the 1930s as a brilliant young mathematician of the pure variety. His talent was mobilized during World War II and applied to fluid dynamics. Cambridge was abuzz with the developments that Norbert Wiener (1894 – 1964) was soon to synthesize under the label &#8220;cybernetics&#8221;. As a Harvard mathematician, Birkhoff was not part of this, but confesses himself [<a href="#histComp">2</a>] a fascinated &#8220;voyeur&#8221;. He identifies as the seminal event a public lecture featuring Wiener, John von Neumann (1903 – 1957), and Nicolas Rashevsky (1899 – 1972). The latter was a Chicago professor known for promoting the concept of mathematical biology. The confident organizers had booked an auditorium seating five hundred. Birkhoff estimated an overflow crowd of a thousand. Such was the start of cybernetics. It is safe to assume that the publication of &#8220;Cybernetics&#8221; by Wiener in 1948 and of &#8220;The Mathematical Theory of Communication&#8221; by Claude Shannon and Warren Weaver in 1949 were eagerly anticipated.</p>
<p>In the US the second world war had mobilized scientific talent and resources on a large scale. This effort launched novel developments such as operations research, game theory, automata theory, time series analysis, stochastic processes, information theory. By 1945 the innovation machine had gotten up to speed and was to continue unabated for over a decade. Electronics was the technology that made many things practical that had hitherto been confined to theory. In turn, theoretical studies were stimulated by the technological developments. The impact of electronics ranged widely, from the automation of computing to experimental investigation of the nervous systems of animals and humans. To some farsighted researchers there were many interconnections between these new developments. &#8220;Cybernetics&#8221; was the term coined to characterise the heady brew of synergies.</p>
<p>Hard on the heels of these novelties another wave was coming. Turing talked (and Shannon wrote [<a href="#shannChess">3</a>]) about <em>programming a computer</em> to play chess, rather than to compute the trajectory of an artillery shell. Turing famously wrote [<a href="#turingThink">4</a>] about the corollary of a computer being programmed to <em>think</em>. In the early 1950s Shannon and McCarthy wanted to convene a symposium, not about cybernetics, but about the new New Thing: programming a computer to display human-level intelligence. McCarthy proposed as title &#8220;Artificial Intelligence&#8221;. Shannon could hardly disagree, as author of the first article about programming a computer to play chess. But he argued for a less provocative term and got his way: &#8220;Automata Studies&#8221;. As McCarthy later ruefully observed, &#8220;&#8230;, and guess what we got &#8230; automata studies&#8221; [<a href="#TVdebate">5</a>].</p>
<p>When the next opportunity presented itself, McCarthy had learned not to fly any false flags. The grant application proposed a &#8220;Summer Research Conference on Artificial Intelligence&#8221;. It was held in Dartmouth College in 1956 [<a href="#dartmouth">14</a>]. &#8220;Artificial Intelligence&#8221; stuck. Then, and ever since, many were unhappy with the term, but ended up using it for lack of a better alternative. This time the meeting was successful. In many of the participants and their discussions we recognize the real thing, in addition to some cybernetics and the inevitable automata studies.</p>
<p>From our current vantage point it is difficult to appreciate how insanely ambitious the idea of <em>thinking computers</em> was compared to the hardware and software technology of the time. In 1956 computers were established as machines for scientific and for administrative applications. Such applications thrived: they were of interest to organizations with lots of money and these applications fitted comfortably inside the constraints imposed by the technology of the time. The hardware and software were developed at Univac and IBM for these purposes and delivered to corporate and university customers accordingly. For people like those who came to the Dartmouth Summer School the software of the time and the way the computers were run were intensely frustrating.</p>
<p>Before anything could happen in the direction of AI much practical development had to be done. In Dartmouth McCarthy was not only aware of this, but had the most advanced software research program. He had correctly identified the incipient Fortran compiler at IBM as the most promising starting point. He extended the compiler to handle the list data structure invented by Newell and Simon at RAND in Santa Monica.</p>
<p>While Professor Birkhoff was a fascinated bystander at Harvard, there was in his department an equally fascinated person, an undergraduate, who was not content to remain a bystander: he built SNARC, the first randomly wired neural network learning machine. This undergraduate, Marvin Minsky, continued to a PhD in mathematics at Princeton, with the thesis &#8220;Neural Nets and the Brain Model Problem&#8221;. He returned to Cambridge to work in Oliver Selfridge&#8217;s cybernetics group at the MIT Lincoln Laboratory in 1956 [<a href="#comtex">7</a>].</p>
<p>In 1958 the AI project was launched at MIT, with Minsky and McCarthy as founding faculty. McCarthy had come to the conclusion that souped-up Fortran was not satisfactory for the Advice-Taker project, nor was the Algol 58 that McCarthy was involved in. He needed to develop his own language. Upon his arrival at MIT he found the resources to do so: an IBM 704 and support for his programmers S. Russell and K. Maling [<a href="#stoyan">8</a>].</p>
<p>That researchers at MIT had to depend on a machine bought from IBM suggests that MIT was behind IBM in computer development at the time. This is only true of the nonclassified part of MIT. Behind the veil of military security MIT had developed the most advanced computers of its time: Whirlwind, followed by TX0. The latter machine was a test vehicle for novel technology: transistors for the processor and ferrite cores for memory. When the test was successful, a more powerful version was developed, the TX2. The redundant TX0 was transferred in 1958 to the MIT Research Laboratory of Electronics, the home of the AI project.</p>
<p>Up till that time the Tech Model Railroad Club was just one of the student clubs on the MIT campus. TMRC had evolved an elaborate model railroad installation. For one part of the club the installation was a relief from technology. These students delighted in making pretty landscapes and in neatly painting the engines. The invisible part of the installation attracted the other part of the club: the freshest of freshmen and the nerdiest of nerds. They developed the underside of the installation, an insanely elaborate switching system with the complexity of a telephone switching office and indeed built with such equipment [<a href="#hackers">9</a>]. Russell, though not an MIT student, became a member of TMRC.</p>
<p>When the TX0 emerged in 1958 from its security-enforced seclusion it was not long before the more tech-crazed TMRC members discovered it as a playground that was even more interesting than the underside of the TMRC installation. And it was not long before their single-minded obsession with complex systems made some of them into virtuoso programmers and systems engineers at a level as high as anywhere else. Such expertise was in demand, as the typical prof and even grad student at the time was clueless in this area [<a href="#hackers">9</a>].</p>
<p>The IBM installation only allowed experimentation at the level of user programs, which included McCarthy&#8217;s work. But it did not allow hooking up cameras or robotic hardware, or the experimentation with systems software that would make this useful. In the TMRC punks hanging out around the TX0 Minsky saw expertise badly needed for the AI laboratory. The hackers, undergraduates in danger of dropping out of their academic program, represented the extreme end of scruffy. They competed for computer access with the representatives of neat: graduate students working on their theses and profs looking for opportunities to publish in the scholarly literature [<a href="#hackers">9</a>].</p>
<p>As an illustration of the kinds of things going on at MIT around 1960, consider David Silver, one of the hackers. He joined the group as a fourteen-year-old drop-out from grade six, after twice having skipped a grade backwards. The hackers first tolerated Silver as a sort of mascot. It wasn&#8217;t long before he had cobbled together a floor-roving robot controlled from his software that performed tasks that the graduate students were writing about in their theses and pronouncing infeasible on the basis of solid surveys of the scholarly literature. The graduate students were most upset. Experiments without underlying theory were bad enough. And this punk was gobbling up precious machine time for activities that weren&#8217;t even experiments (where were the designs? where were the records?). But Minsky felt that the hacker contributions were vital to keeping AI real [<a href="#hackers">9</a>].</p>
<p>McCarthy was not impressed with the TMRC hackers hanging around at the TX0. The feeling was mutual: McCarthy had an ulterior motive behind his programming. For the hackers programming was its own justification; a silly hack was as good as anything else to enjoy programming and to display virtuosity. McCarthy likened them to &#8220;ski bums&#8221; [<a href="#hackers">9</a>]. Yet, as McCarthy acknowledges, one of the hackers made an important contribution to the implementation of Lisp. But then, this was McCarthy&#8217;s own Steve Russell, who had become a hacker in the sense of joining TMRC. The AI project was struggling with a compiler for Lisp, with no end in sight. McCarthy had shown Russell his paper about Lisp as an alternative formalism for specifying computable functions, one based on symbolic expressions rather than numbers. The required universal function was EVAL. Russell suggested programming it on the IBM 704.</p>
<blockquote><p>&#8230; This EVAL was written and published in the paper and Steve Russell said, look, why don&#8217;t I program this EVAL and you remember the interpreter, and I said to him, ho, ho, you&#8217;re confusing theory with practice, this EVAL is intended for reading not for computing. But he went ahead and did it. That is, he compiled the EVAL in my paper into 704 machine code fixing bugs and then advertised this as a LISP interpreter which it certainly was, so at that point LISP had essentially the form that it has today, the S-expression form &#8230; (McCarthy quoted in [<a href="#stoyan">8</a>])</p></blockquote>
<p>I infer that the compiler project went on the back burner for a while.</p>
<p>This feat, destined to become History, did not deflect Russell from his involvement with hacks. In 1962 he programmed the PDP-1 so his friends could play Space War. At the time, a neat hack to show off the graphic display of the PDP-1 and to try out a new symbolic interactive debugger. In retrospect, the neat hack was also the invention of the very concept of something destined to grow into an industry in its own right.</p>
<p>The hackers were totally absorbed in the act of programming and in the exploration of the capabilities of the computer hardware they found and which they urgently needed to modify. They found in their own small group all the audience they needed. No need to publish anything. This attitude, though the exact opposite of the typical academic&#8217;s, was congenial to Minsky, who was later to reminisce:</p>
<blockquote><p>&#8230;when it was our strategy in those early days to be unscholarly; we tended to assume, for better or for worse, that everything we did was so likely to be new that there was little need for caution or for reviewing literature or for double-checking anything. As luck would have it, that almost always turned out to be true [<a href="#comtex">7</a>].</p></blockquote>
<p>Could it be that Minsky was infected by the hackers&#8217; example? It looks like Minsky&#8217;s attitude was the root of the specific MIT style in AI, a style that was later to be, if not denounced, at least branded, as &#8220;scruffy&#8221;.</p>
<p>Compare McCarthy&#8217;s Advice-Taker project [<a href="#mccarthyNPL">12</a>]. His approach was the epitome of &#8220;neat&#8221;: its knowledge base was to be expressed in formulas of predicate logic; the planning was to be the outcome of logical inference. McCarthy considered the existing programming languages, insofar as there were any, unsuitable for implementing such a system of logic. Hence his effort in developing a new language, the one that became Lisp. McCarthy must have assumed that somehow a suitable inference system would show up once Lisp was available. It didn&#8217;t. However, Lisp opened up so many exciting new possibilities that lack of progress in the Advice-Taker didn&#8217;t matter. These early years of Lisp were the years of the undocumented firsts that Minsky mentions in [<a href="#comtex">7</a>]; there is also documented work [<a href="#semIP">10</a>].</p>
<p>It was during this period that resolution inference appeared [<a href="#robinson">11</a>]. At first sight it seemed exactly what Advice-Taker had been waiting for. The possibilities opened by resolution set off a widespread effort to realize the goal of Advice-Taker: given a description in logic of a world state and of a goal to be achieved, generate a constructive proof that the goal is achievable. The constructive nature of the proof implies that the sequence of actions to be performed can be picked out from the proof. In short, a proof as a plan. And automatically generated. That was the dream.</p>
<p>At MIT, however, the AI group may not have tried too hard to harness resolution to this task [<a href="#robinsonIntvw">13</a>]. After all, if you have Lisp, why would you let logic stand in the way of programming a planning agent? Carl Hewitt&#8217;s answer to this question was Planner, untypically for MIT, not a Lisp program, but the design for a <em>language</em>. Planner was intended for implementation in Lisp, and a subset named Micro-Planner was implemented. It was an extension of Lisp, enriching the control structure with the addition of automatic backtracking. This was also the choice in Prolog, which appeared soon after. Sussman, one of the implementers of Micro-Planner, came to the conclusion that the fully automatic backtracking implied in &#8220;planning&#8221; was unsatisfactory and thought some <em>conniving</em> on the part of the programmer would be a more realistic approach for the automatic generation of plans; hence the language Conniver, created in collaboration with Drew McDermott. When conniving also turned out to be too ambitious, backtracking was thrown out altogether, so that the programmer had to resort to <em>scheming</em>. With Guy Steele, Sussman designed Schemer to achieve the desired level of programmer control. And guess what, Schemer was essentially Lisp, except that it was the long overdue lexically scoped version of Lisp. Thus the planning problem was brought back to what it should have been all the time according to the characteristic MIT approach: a problem to be solved in Lisp. The ITS operating system that Sussman and Steele were working with did not allow filenames longer than six characters, so that &#8220;Schemer&#8221; was truncated to &#8220;Scheme&#8221;, now a standardized and much-loved programming language, whose roots in planning have mostly been forgotten.</p>
<p>It is of course not wise to try to tie down the heyday of the MIT style in AI to an exactly specified period. In the title I chose 1958 for the start because that was the year that Minsky started the AI project at MIT. That was also the year when he presented a wide-ranging overview of AI [<a href="#minskyNPL">6</a>]. In the beginning Minsky spoke mainly through his students: in 1968 &#8220;Semantic Information Processing&#8221; [<a href="#semIP">10</a>] appeared, edited by Minsky and with an article contributed by him. The bulk of the book consists of papers based on the PhD theses of Minsky&#8217;s first batch of students. The next milestone is an AI Memo co-authored with Seymour Papert [<a href="#aimemo">15</a>]. Reading these two, the extreme of Scruffy, makes it hard to realize that Minsky could be as Neat as any. Not long before <em>Semantic Information Processing</em> he had published <em>Computation: Finite and Infinite Machines</em> and the year after, with Seymour Papert, <em>Perceptrons: an Introduction to Computational Geometry</em>. As an undergraduate Minsky had studied with Andrew Gleason, a mathematician&#8217;s mathematician [<a href="#gleason">17</a>]. My guess is that Minsky could imagine becoming a mathematician like Gleason, but set his sights, if not higher, at least further.</p>
<p>For the end date in the title I picked that of the publication of Minsky&#8217;s &#8220;The Society of Mind&#8221; [<a href="#socMind">16</a>]. It starts like this:</p>
<blockquote><p>This book tries to explain how minds work. How can intelligence emerge from nonintelligence? To answer that, we&#8217;ll show that you can build a mind from many little parts, each mindless by itself. I&#8217;ll call &#8220;Society of Mind&#8221; this scheme in which each mind is made of many smaller processes. These we&#8217;ll call <em>agents</em>. Each mental agent by itself can only do some simple thing that needs no mind or thought at all. Yet when we join these agents in societies &#8212; in certain very special ways &#8212; this leads to true intelligence.</p></blockquote>
<p>Of course, when you write a book &#8220;that tries to explain how minds work&#8221;, then you get in the way of the psychologists. For the better part of a century, psychology had tried very hard to be a real science. Physics and chemistry had dominated the intellectual scene so much that &#8220;real science&#8221; was identified with these fields. Physics and chemistry got great by never accepting apparent complexity at face value: there always had to be something simple hiding behind it. As a result, for almost half a century, behaviourism reigned supreme in psychology. Only shortly before the appearance of AI, psychology freed itself from this stranglehold. Minsky dared to ask: what if the mind is not inherently simple? What if, behind this apparent complexity, there is &#8230; complexity? What if this complexity takes the form of a changing collection of many, many agents connected in intricate and changing patterns?</p>
<p>Behaviourist psychology, with its urge to conform to &#8220;real science&#8221; and its emphasis on the Scientific Method, was Neat. Minsky, in his bid to discover how the mind works, no holds barred, was Scruffy.</p>
<h4>Acknowledgments</h4>
<p>Thanks to Paul McJones, Alan Robinson, and Steve Russell for suggestions and corrections.</p>
<h4>References</h4>
<p>[<a name="russellNorvig"></a>1] <em>Artificial Intelligence: A Modern Approach</em> by Stuart Russell and Peter Norvig. Prentice-Hall, First edition 1995, page 21.<br />
[<a name="histComp"></a>2] Article by Garrett Birkhoff in <em>History of Computing</em> edited by N. Metropolis, J. Howlett, and Gian-Carlo Rota. Academic Press, 1980.<br />
[<a name="shannChess"></a>3] Claude E. Shannon: Programming a Computer for Playing Chess, Philosophical Magazine, Ser.7, Vol. 41, No. 314, March 1950.<br />
[<a name="turingThink"></a>4] Turing, A.M. (1950). Computing machinery and intelligence. Mind, 59, 433-460.<br />
[<a name="TVdebate"></a>5] <a href="http://www.aiai.ed.ac.uk/events/lighthill1973/1973-BBC-Lighthill-Controversy.mov">BBC television 1973</a>.<br />
[<a name="minskyNPL"></a>6] &#8220;Some methods of artificial intelligence and heuristic programming&#8221; in <em>Mechanisation of Thought Processes</em> Symposium No. 10, National Physical Laboratory, Her Majesty&#8217;s Stationery Office, 1959.<br />
[<a name="comtex"></a>7] &#8220;Introduction to the COMTEX Microfiche Edition of the Early MIT Artificial Intelligence Memos&#8221; by Marvin L. Minsky, AI Magazine, 1983: 19 – 22.<br />
[<a name="stoyan"></a>8] &#8220;Early Lisp History (1956 — 1959)&#8221; Herbert Stoyan. LFP &#8217;84: Proceedings of the 1984 ACM Symposium on LISP and functional programming.<br />
[<a name="hackers"></a>9] &#8220;<em>Hackers</em>&#8221; by Steven Levy. Doubleday, 1984.<br />
[<a name="semIP"></a>10] Marvin Minsky (ed.): &#8220;<em>Semantic Information Processing</em>&#8221;, MIT Press, 1968. See also Slagle&#8217;s <a href="http://dspace.mit.edu/handle/1721.1/11997">thesis</a>.<br />
[<a name="robinson"></a>11] J.A. Robinson: &#8220;A Machine-Oriented Logic Based on the Resolution Principle&#8221;, Journal of the ACM, 1965.<br />
[<a name="mccarthyNPL"></a>12] &#8220;Programs with common sense&#8221; in <em>Mechanisation of Thought Processes</em> Symposium No. 10, National Physical Laboratory, Her Majesty&#8217;s Stationery Office, 1959.<br />
[<a name="robinsonIntvw"></a>13] But see an <a href="http://www.aarinc.org/Newsletters/089-2010-10.html"> interview </a> with Robinson for an interesting peek behind the scenes.<br />
[<a name="dartmouth"></a>14] <a href="http://www-formal.stanford.edu/jmc/history/dartmouth/dartmouth.html"> Dartmouth 1956 proposal </a><br />
[<a name="aimemo"></a>15] &#8220;Artificial Intelligence: Progress Report&#8221; by Marvin Minsky and Seymour Papert, Artificial Intelligence Memo No. 252, January 1, 1972.<br />
[<a name="socMind"></a>16] <em>The Society of Mind</em> by Marvin Minsky. Simon and Schuster, 1985.<br />
[<a name="gleason"></a>17] See the interview with Andrew M. Gleason in <em>More Mathematical People</em> edited by D.J. Albers, G.L. Alexanderson, and C. Reid, Harcourt Brace Jovanovich, 1990.</p>
]]></content:encoded>
			<wfw:commentRss>http://vanemden.wordpress.com/2011/06/07/the-mit-style-in-artificial-intelligence-1958-%e2%80%93-1985/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
<enclosure url="http://www.aiai.ed.ac.uk/events/lighthill1973/1973-BBC-Lighthill-Controversy.mov" length="169265179" type="video/quicktime" />
	
		<media:content url="http://0.gravatar.com/avatar/e28602c14607fe4f92e85f6850e35a93?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">Maarten van Emden</media:title>
		</media:content>
	</item>
		<item>
		<title>From the Chronicles of Scruffy Versus Neat: the Lighthill Affair</title>
		<link>http://vanemden.wordpress.com/2011/02/18/from-the-chronicles-of-scruffy-versus-neat-the-lighthill-affair/</link>
		<comments>http://vanemden.wordpress.com/2011/02/18/from-the-chronicles-of-scruffy-versus-neat-the-lighthill-affair/#comments</comments>
		<pubDate>Sat, 19 Feb 2011 06:43:21 +0000</pubDate>
		<dc:creator>Maarten van Emden</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://vanemden.wordpress.com/?p=389</guid>
		<description><![CDATA[The 1973 Lighthill Affair was an Affair in the sense of the Dreyfus and Profumo Affairs. And although it was scaled down to teacup size, it was big enough to make it into a textbook published twenty years later: &#8230; the Lighthill Report, which formed the basis for the decision by the British government to [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=vanemden.wordpress.com&amp;blog=3462521&amp;post=389&amp;subd=vanemden&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>The 1973 Lighthill Affair was an Affair in the sense of the Dreyfus and Profumo Affairs. And although it was scaled down to teacup size, it was big enough to make it into a textbook published twenty years later:</p>
<blockquote><p>&#8230; the Lighthill Report, which formed the basis for the decision by the British government to end support of AI research in all but two universities. (Oral tradition paints a somewhat different and more colorful picture, with political ambitions and personal animosities that cannot be put into print.) [<a href="#russellNorvig">1</a>]</p></blockquote>
<p>In this article I will put in print some of the things hinted at here, and elaborate on the issues that have remained topical.</p>
<p><span id="more-389"></span></p>
<p>I first heard the word “Lighthill” in November 1972, soon after I started my research fellowship with the Department of Machine Intelligence at the University of Edinburgh. As Jean Michie said to me: “What, you haven&#8217;t heard of <em>Lighthill</em>? That man who wants to do away with us?” I hadn&#8217;t, but it wasn&#8217;t long before I learned that Jean was right about his wanting to do away with the Department of Machine Intelligence and, what was more, about his having the clout to bring this about.</p>
<p>The “Lighthill” referred to in the conversation with Jean came to mean the publication of a report, followed by a <a href="http://www.aiai.ed.ac.uk/events/lighthill1973/1973-BBC-Lighthill-Controversy.mov">televised debate</a> at the Royal Institution in London. The proceedings gave the impression of a movie Crown Prosecutor presenting his case. Sir James Lighthill, FRS, Lucasian Professor at Cambridge University, pronounces from behind a lectern on a platform elevated above the other participants in the debate, seated in the pit below: Donald Michie, John McCarthy, and Richard Gregory. The debate was actually only the second half of the proceedings; the first half consisted of an oration by Lighthill standing on his platform. Science policy or theater?</p>
<p>At the time the Department of Machine Intelligence, Michie&#8217;s creation, had become a world centre of AI. To give you an idea, I&#8217;ll list (as far as memory serves) the visitors to the department I met over the years I was there [<a href="#people">2</a>]. A year later the Department of Machine Intelligence at the University of Edinburgh was reduced to three people: Donald Michie, professor of Machine Intelligence, one secretary, and one technician. The remainder joined the newly created Department of Artificial Intelligence or left for jobs elsewhere. All this on the basis of the report of a single person, ignorant of the research area. It is extraordinary that a body dispensing public funds, such as the UK Science Research Council, could have proceeded in this manner.</p>
<p>If a report were necessary, it should have been written by an American expert. Only in the US was a suitable depth of expertise available. Moreover, the search for a British expert would have been complicated by the difficulty of finding someone not connected via the old boy network. Not only was the reporting “expert” British, but also the old boy connection was present with a vengeance: Lighthill was one of what I shall call “The Winchester Four”, described in an <a href="http://tinyurl.com/4atx9kw">interview</a> with Freeman Dyson:</p>
<blockquote><p><em> IOPScience: </em> You excelled in mathematics at school. Were those happy days?<br />
<em> Freeman Dyson: </em> Yes, on the whole. We were very lucky because everything was screwed up by the war [World War II] — I remember in my last year at Winchester having only seven hours of classes a week. It was wonderful — we were free to get our own education. The teaching was fairly good, but didn&#8217;t make much difference — we learned much more from each other than we did from the teachers. There were four of us, who were about the same age, who became fellows of the Royal Society — the Longuet-Higgins brothers, Sir James Lighthill and I.</p></blockquote>
<p><!-- The Winchester Four stayed close in their interests: all became mathematical physicists.  Freeman Dyson would have shared, with Sin-Itiro Tomonaga, Julian Schwinger, and Richard Feynman the 1965 Nobel Prize for their contributions to quantum electrodynamics if the rules for the prize would have allowed to share the prize among more than three.  Christopher Longuet-Higgins established his scientific reputation by applying the mathematics of quantum theory to chemistry.  Michael Longuet-Higgins distinguished himself with contributions to fluid dynamics and the physics of waves.  I shall keep them apart by referring to them, public-school style, as Longuet-Higgins major and Longuet-Higgins minor.  --> The Winchester Four stayed close in their interests: all became mathematical physicists. All four became Fellows of the Royal Society. If there had been something like Nobel Nominees, then Freeman Dyson and Christopher Longuet-Higgins would have been among them; Dyson in quantum electrodynamics and Longuet-Higgins major (public-school style, to avoid confusing him with the younger brother) in quantum chemistry.</p>
<p>Longuet-Higgins major joined Donald Michie and Richard Gregory in founding the Department of Machine Intelligence in Edinburgh. Soon after this, Gregory left Edinburgh. By the early seventies an acrimonious dispute had developed between Longuet-Higgins (from now on all references to L-H will be to Longuet-Higgins major) and Michie. The latter, who had taken the initiative in the department&#8217;s founding, believed that it was time for Longuet-Higgins to assume the admittedly onerous duties as the head of the department. Longuet-Higgins denied that this could be expected of him.</p>
<p>The dispute with Michie was not the only reason for Longuet-Higgins to be dissatisfied with the Department of Machine Intelligence. Artificial Intelligence (from now on “AI”) suffered under the tension between two schools of thought, briefly denoted as the Neats and the Scruffies, described as follows by Russell and Norvig [<a href="#russellNorvig">1</a>]:</p>
<blockquote><p>“&#8230; the neats — those who think that AI theories should be grounded in mathematical rigour — versus the scruffies — those who would rather try out lots of ideas, write some programs, and then assess what seems to be working.”</p></blockquote>
<p>His mastery of mathematics had given Longuet-Higgins a deep understanding of quantum mechanics, which, in turn, made it possible for him to explain certain chemical phenomena that had been missed by earlier followers of Linus Pauling&#8217;s program of explaining the nature of the chemical bond in terms of quantum mechanics [<a href="#nature">8</a>]. In AI or out, Longuet-Higgins was the epitome of Neat. By the early seventies Scruffy had become the dominant mode in AI. To the distress of Longuet-Higgins, Machine Intelligence in Edinburgh was not all Neat.</p>
<p>Such was the situation when it transpired that the UK Science Research Council had selected for its grand review of AI not an AI researcher, nor even a computer scientist, but, astonishingly, one of the Winchester buddies of Longuet-Higgins. One can imagine that we (that is, those whose jobs were about to be terminated) were not unbiased readers of the, to us, notorious Lighthill Report.</p>
<p>Many years have passed. Most of those forced out found other jobs, mostly far away from Edinburgh. I trust that for the others these exciting events have become a distant memory, as they have for me. A memory consisting of Sir James Lighthill as a pompous idiot who lent himself to produce a flaky report to serve as a blatantly inadequate cover for a hatchet job.</p>
<p>Last Christmas I was reading the autobiography of Laurent Schwartz (1915–2002), the great French mathematician. Schwartz was an amazing phenomenon. As a mathematician he was unusually versatile, spanning the entire range from probability and applied mathematics to Bourbaki (he even <em>was</em> a Bourbaki). For his efforts in opposing the Algerian and Vietnam wars I regard him as a French version of Pauling, or Chomsky.</p>
<p>Schwartz is mainly remembered for his discovery of <em>distributions</em>, a theory that mathematically modelled certain hitherto pseudo-mathematical objects that were found indispensable in physics; the Dirac delta function, for example. Schwartz made the discovery in 1944, in recently liberated Paris. It was a few years before the new theory gained a following. As usual it was among the best young mathematicians that it was first appreciated. Schwartz mentions one <em>James Lighthill</em>.</p>
<p>This, and the passage of time, caused me to review my 1973 impression of Sir James as a “pompous idiot”. I found the book by Lighthill [<a href="#genFun">4</a>] on distributions, purchased in 1958 by Victoria College (now University of Victoria). It is a gem: brief, elegant, accessible; not only a great way to learn the advanced concept of distribution, but also a good introduction to basics like Fourier series and integrals. A biography I dug up outlined an impressive career including practical applications in aerodynamics and directing the Royal Aircraft Establishment.</p>
<p>And I got the [<a href="http://www.chilton-computing.org.uk/inf/literature/reports/lighthill_report/p001.htm">Lighthill Report</a>]. Did I find valuable insights that I missed in 1973? Alas not. Yet, with the wisdom of hindsight, and knowing how Neat versus Scruffy has played out over the decades, it was worth re-reading the report. What I now read is a tract by a Neat on the attack, setting out to root out Scruff.</p>
<p>Allow me to elaborate on Neat and Scruffy as described by Russell and Norvig. Yes, the Neats believed that AI theories needed to be grounded mathematically. But I think it is more enlightening to say that to be Neat was to be <em>stuck in cybernetics</em>. “Cybernetics” is the term coined by Norbert Wiener and used as the title of his 1948 book introducing the concept. What inspired the new word was the recently discovered parallel between feedback control in electrical circuits and animal nervous systems. That Wiener mentions digital computers at all in a 1948 book shows that he was unusually well informed and perceptive (at the time “computer” meant, if not a person, an <em>analog</em> electronic computer).</p>
<p>In 1948, Wiener&#8217;s book was a conceptual breakthrough. The problem was that by 1970 the next conceptual breakthrough — the computer as new universe — had already happened, but the majority of mathematicians and scientists were still living in a world bounded by cybernetics. Admittedly, by then computers (now unambiguously digital) were vastly more useful than they were in 1948, but they still played a minor role in cybernetic research. For the Neats, the significance of improvements in computer technology was that electrical engineering systems could be automated in a more sophisticated way. This is laid out in the Advanced Automation section (“area A”) of the Lighthill Report. As befits a cybernetician, Lighthill applauds the use of computers for studying natural nervous systems (area C in the Lighthill Report). The report places some AI activities in areas A or C. The Neats had been imaginative enough to include post-Wiener additions to the mathematical arsenal, like resolution theorem-proving as a method for problem-solving and planning. Not surprisingly the report was forced to include a third category, which it named area B. Here went everything that didn&#8217;t fit the cybernetics framework. Like robotics.</p>
<p>The Neats continued to regard the computer as an instrument useful only for the advancement of Wiener&#8217;s vision of cybernetics. I think the Scruffies deserve more credit than their characterization by Russell and Norvig. What distinguishes them is their sense of the computer being more than an instrument subservient to existing disciplines. To them the computer represents a phenomenon in its own right, one that promises to shed new light on endlessly debated conundrums like the nature of intelligence, thought, and consciousness. As Archimedes is reputed to have said: “Give me a lever long enough and a fulcrum on which to place it, and I shall move the world,” so the implied motto of the Scruffy was “Give me a PDP-10 with Lisp and I will blast wormholes through the space of the traditional sciences.”</p>
<p>McCarthy disdained decades of psychology research in learning. “I am interested in the phenomenon where you teach, not by multiple training actions, but by <em>telling, once</em>, and the subject then knowing it forever — something that only happens in humans.” Michie exulted in bypassing control theory, much approved of by Lighthill as a gloriously successful and sophisticated branch of applied mathematics, to produce controllers for unstable systems via the admittedly stupid BOXES algorithm.</p>
<p>As a whole, AI at MIT was Scruffy and particularly so when it came to representing and using knowledge in computers. In this respect Stanford and Edinburgh were Neat: they embraced mathematical logic as a formal framework. They welcomed resolution because it was within mathematical logic as well as easy to implement on a computer. MIT viewed mathematical logic as a distraction. They believed that the resolution system could not accommodate the necessary pragmatics. Accordingly they advocated “procedural embedding of knowledge” by bypassing mathematical logic and creating a new language [<a href="#planner">6</a>].</p>
<p>The disagreement led to a confrontation at the 1970 Machine Intelligence workshop in Edinburgh. The workshop programme announced “The Irrelevance of Resolution”, a talk by Seymour Papert, one of the leaders of the AI group at MIT. Instead, one Gerry Sussman showed up with the message that he was sent by Papert to give the talk in his place. It looked like Sussman had no experience giving talks in faraway countries. He seemed nervous about going into the bastion of resolution theorem-proving and telling the assembled experts how totally misguided their work was. As a result he started by overdoing the confrontational aspect. A few minutes into the talk, Longuet-Higgins walked out of the room, pensively puffing his pipe.</p>
<p>In 1972 Minsky and Papert published what I view as the manifesto of Scruffy [AIM-252]. The summary on page 2 intones:</p>
<blockquote><p><tt> Thinking is based on the use of<br />
SYMBOLIC DESCRIPTIONS<br />
and description-manipulating processes<br />
to represent a variety of kinds of KNOWLEDGE<br />
— about facts, about processes,<br />
about problem-solving, and about computation itself,<br />
in ways that are subject to<br />
HETERARCHICAL CONTROL STRUCTURES<br />
— systems in which control<br />
of the problem-solving programs<br />
is affected by heuristics<br />
that depend on the meaning of events.<br />
</tt></p></blockquote>
<p>The words are verbatim from AIM-252; I have been so free as to add ventilation [<a href="#ventProse">7</a>], as called for by the solemnity of the occasion. Since that time we have learned that this is compatible with Neat; at the time it seemed the essence of anti-Neat.</p>
<p>The manifesto brings the computer beyond the role of enabler of cybernetics, which is where Neat had become stuck. It is a bold step putting the computer at the centre of a new world (let&#8217;s call it AI) where, for instance, it becomes possible to think (productively) about thinking. To establish AI as a subject in its own right it was initially necessary to repudiate heritage, even where it could be useful. In the early 1970s the Scruffies had to be unnecessarily confrontational, just as a teenager needs to be unnecessarily obnoxious. But the author of the Lighthill report did not have the strength of a wise parent. He was enraged by what he saw and constructed a report calculated to kill as much of Scruffy as possible.</p>
<h4>Acknowledgments</h4>
<p>Thanks to Alan Robinson for helpful criticisms, to Michael Levy for finding the Lighthill Report, and to André Vellino for finding the televised debate. The text benefited greatly from editorial input by Eva van Emden (http://editing.vanemden.com/).</p>
<h4>References</h4>
<p>[<a name="russellNorvig"></a>1] <em>Artificial Intelligence: A Modern Approach</em> by Stuart Russell and Peter Norvig. Prentice-Hall, First edition 1995, page 21.<br />
[<a name="people"></a>2] Here follows a list of AI researchers whom I remember meeting at the Department of Machine Intelligence between 1969 and 1975: Harry Barrow, Woodrow Bledsoe, Daniel Bobrow, Robert Boyer, Peter Buneman, Alan Bundy, Alain Colmerauer, John Darlington, Edward Elcock, Cordell Green, Patrick Hayes, Carl Hewitt, Gérard Huet, Robert Kowalski, Christopher Longuet-Higgins, David Luckham, John McCarthy, Zohar Manna, Donald Michie, Robin Milner, Ugo Montanari, J Moore, Nils Nilsson, Seymour Papert, Ira Pohl, Alan Robinson, Laurent Siklossy, Aaron Sloman, Gerald Sussman, Austin Tate, Richard Waldinger, and David Warren.<br />
[<a name="wormer"></a>3] I asked a friend whose life is in quantum chemistry, unsullied by AI, whether he had heard of H.C. Longuet-Higgins. His reply: “In my opinion Longuet-Higgins has made a very important contribution (1963) to theoretical chemistry by explaining that the symmetry of a molecule is not described by a point group, but by a permutation-inversion group. The latter group is in general unnecessarily large, and L-H has shown that only the subgroup of &#8216;feasible operations&#8217; (a concept invented by L-H) is significant. This is a very subtle piece of theory that remains a mystery to most chemists. &#8230; The point of L-H is that you should see the nuclei of atoms as quantum mechanical particles, and most chemists cannot do that, they are so used to the structure of molecules that they cannot understand that nuclei can tunnel through a potential barrier (whereas most chemists can understand that for electrons). Chemists say: you have trans-butadiene and cis-butadiene and those molecules are different. L-H says, no they are two wave functions with maxima in the trans and the cis configuration and the wave functions overlap: trans-butadiene is a &#8216;little bit&#8217; cis and vice versa.”<br />
[<a name="genFun"></a>4] <em>An Introduction to Fourier Analysis and Generalised Functions</em> by M.J. Lighthill. Cambridge University Press, 1958.<br />
[<a name="planner"></a>6] See Wikipedia article.<br />
[<a name="ventProse"></a>7] See <a href="http://tinyurl.com/46vfzyt">Ventilated Prose</a>.<br />
[<a name="nature"></a>8] <em>The Nature of the Chemical Bond</em> by Linus Pauling, 1939.</p>
]]></content:encoded>
			<wfw:commentRss>http://vanemden.wordpress.com/2011/02/18/from-the-chronicles-of-scruffy-versus-neat-the-lighthill-affair/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
<enclosure url="http://www.aiai.ed.ac.uk/events/lighthill1973/1973-BBC-Lighthill-Controversy.mov" length="169265179" type="video/quicktime" />
	
		<media:content url="http://0.gravatar.com/avatar/e28602c14607fe4f92e85f6850e35a93?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">Maarten van Emden</media:title>
		</media:content>
	</item>
		<item>
		<title>Another Scoop by Dijkstra?</title>
		<link>http://vanemden.wordpress.com/2011/01/15/another-scoop-by-dijkstra/</link>
		<comments>http://vanemden.wordpress.com/2011/01/15/another-scoop-by-dijkstra/#comments</comments>
		<pubDate>Sun, 16 Jan 2011 00:27:38 +0000</pubDate>
		<dc:creator>Maarten van Emden</dc:creator>
				<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://vanemden.wordpress.com/?p=366</guid>
		<description><![CDATA[Edsger W. Dijkstra (1930&#8211;2002) is known and remembered for many things in programming and in computing. But it seems that the problem of efficiently generating the sequence of prime numbers (2, 3, 5, 7, &#8230;) is not among them. I recently re-read his 1972 Notes on Structured Programming [5] and noticed for the first time, [...]]]></description>
			<content:encoded><![CDATA[<p>Edsger W. Dijkstra (1930&#8211;2002) is known and remembered for many things in programming and in computing.  But it seems that the problem of efficiently generating the sequence of prime numbers (2, 3, 5, 7, &#8230;) is not among them.  I recently re-read his 1972 <em>Notes on Structured Programming</em> [<a href="#dijkstra">5</a>] and noticed for the first time, tucked away in a few lines, a remarkable insight that resolves the stand-off between the Sieve of Eratosthenes (efficient in terms of time, but not memory) and the method of Trial Division (efficient in terms of memory, but not time).</p>
<p><span id="more-366"></span></p>
<h4>Primes: from tables to programs</h4>
<p>First some background.  Prime numbers have been bugging (some) humans for many centuries.  The great Gauss himself was among those who fell under the spell of the primes.  For lack of a published table of primes he made one up for himself; he pored over its columns in search of a pattern.  In despair he gave up, concluding that it looked like the numbers were generated by some demon throwing dice.  And, bingo, this probabilistic model led him [<a href="#dusautoy">7</a>] to the famous conjecture about the distribution of primes that turned out to be correct, but took almost a century to be proved.</p>
<p>Given this history, it is not surprising that those bitten by the bug of the primes welcomed the computer as an alternative to printed tables.  Jacob Philip Kulik (1793 &#8211; 1863) devoted twenty years to a manuscript from which a printed table of the prime numbers up to eleven million was prepared [<a href="#sierpinski">1</a>].  Errors were corrected in the process; unkind souls allege that many remain.  By the time computers arrived, more dependable sources existed, though few and far between.  The process by which they were created inspired more confidence than poor Kulik slaving away in solitude, but worries about accuracy remained.</p>
<p>With the computer, a new phenomenon arose: the possibility of a <em>program</em> that generated in a matter of minutes (now seconds) all primes up to eleven million, the range of Kulik&#8217;s table.  Now, rather than rely on authority, we can convince ourselves of the correctness of the table by studying the page or so of code that it takes to generate the first <em>N</em> primes.  Still, even now there are trusting souls who put their faith in tables.  For example, I found on the web an applet claiming to list the primes up to a hundred million, with that range served up in convenient chunks of a thousand at a time.  But wouldn&#8217;t you rather rely on twenty lines of code that do the same job?  (Appendix <a href="#appendixA">A</a>).</p>
<p>The function in Appendix A is simple, and wasteful.  For example, soon after eliminating all factors 2, it tries to divide by 4.  It continues spending much of its time performing such superfluous operations.  Yet it is quite effective for undemanding tasks like serving up the factorizations of, say, a thousand numbers.  For reasons of uniformity, the code in Appendix A has been dumbed down to just indicating primality.</p>
<h4>Trial Division versus Sieve of Eratosthenes</h4>
<p>When we eliminate the redundancy of the function in Appendix A, we arrive at the method of Trial Division, which only tries the successive primes as potential divisors.  The primes that it needs are generated by itself in an earlier iteration.  Briefly, Trial Division works like this.  Starting from the last prime found, generate as candidate primes the successive odd numbers and test them for divisibility by all the primes in succession.  If no divisor is found by the time you reach the prime that is just greater than the square root of the candidate, you can stop and add the candidate as next prime to the table (Appendix <a href="#appendixB">B</a>).</p>
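<p>For readers who like to see the description in code right away, here is a minimal C++ sketch of Trial Division as outlined above. It is not the code of Appendix B; the name <code>first_primes</code> and the choice of <code>unsigned long</code> are assumptions made for this illustration only.</p>

```cpp
#include <cstddef>
#include <vector>

// Returns the first `count` primes by Trial Division.
// (first_primes is a hypothetical name chosen for this sketch.)
std::vector<unsigned long> first_primes(std::size_t count) {
    std::vector<unsigned long> primes;
    if (count == 0) return primes;
    primes.push_back(2);
    // Candidates are the successive odd numbers after the last prime found.
    for (unsigned long candidate = 3; primes.size() < count; candidate += 2) {
        bool is_prime = true;
        for (unsigned long p : primes) {
            // Once p exceeds the square root of the candidate, no divisor can follow.
            if (p * p > candidate) break;
            if (candidate % p == 0) { is_prime = false; break; }
        }
        if (is_prime) primes.push_back(candidate);
    }
    return primes;
}
```

<p>Calling <code>first_primes(10)</code> yields 2, 3, 5, 7, 11, 13, 17, 19, 23, 29. The table of primes is both the output and the source of divisors, which is why the method needs so little memory beyond the result itself.</p>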
<p>Trial Division is one of the stock examples in programming courses.  Another is the Sieve of Eratosthenes, which also generates the sequence of primes.  It is one of the earliest algorithms, invented more than two thousand years ago.  The name of the algorithm suggests that you imagine a sieve into which you drop all the numbers.  You shake the sieve, and then the prime numbers come falling through.  It helps to imagine the sieve as a superposition of sub-sieves: S2, S3, S5, S7, &#8230;  Sub-sieve S2 catches all multiples of 2, S3 catches all multiples of 3, and so on for 5, 7, 11, &#8230;  Each sub-sieve keeps back the multiples of the prime it is dedicated to, so that only primes get through.</p>
<p>The process can be illustrated by the following bit of ASCII art, depicting a sort of Morse-code rendering of the Sieve of Eratosthenes.  <a name="Figure1"> </a></p>
<pre>   ____________________________________________________________________________

       0000000011111111112222222222333333333344444444445555555555666666666677
       2345678901234567890123456789012345678901234567890123456789012345678901

   S2  -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.
   S3  .-..-..-..-..-..-..-..-..-..-..-..-..-..-..-..-..-..-..-..-..-..-..-..
   S5  ...-....-....-....-....-....-....-....-....-....-....-....-....-....-.
   S7  .....-......-......-......-......-......-......-......-......-......-.
   S11 .........-..........-..........-..........-..........-..........-.....

   Figure 1.
   The Sieve of Eratosthenes consisting of superposition of the sieves S2, S3,
   S5, S7, and S11. Each column consisting of dots only corresponds to a prime.
   ____________________________________________________________________________</pre>
<p>Read the top two lines vertically, giving the numbers to be sieved: 02, 03, &#8230;, 10, 11, &#8230;, 70, 71.  By default, numbers are rendered as dots.  In the line for S2 the dot at each multiple of 2 has been replaced by a dash; the one after that has a dash at each multiple of 3, and so on for 5, 7, and 11.  As a result, any column with dots only indicates that the number labeling the column is a prime.  The sieving primes 2, 3, 5, 7, and 11 are the exception: each carries a single dash, in its own line, because it counts as a multiple of itself.</p>
<p>The number theorists Derrick Lehmer (father and son) built devices based on bicycle chains with rods at certain significant intervals.  The assembly could be set up in such a way that it acted as a sieve of Eratosthenes, or as a generator of solutions of a Diophantine equation [<a href="#lehmers">2</a>].</p>
<p>To simulate the Sieve on a computer you could use an array starting with all zero bits and write ones every <em>p</em> locations, for <em>p</em> equal to 2, 3, 5, 7, and so on.  Where Trial Division uses divisions, the Sieve generates the various sieves by means of additions.  The Sieve used to be preferred because divisions were slower than additions.  In the old days not all computers had hardware to perform the divisions required by Trial Division, so there could be a huge difference in speed.</p>
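<p>The array simulation just described can be sketched in a few lines of C++. This is an illustration under my own assumptions, not code from the appendices: the name <code>sieve_primes</code> is made up, and the marking for each prime <em>p</em> starts at 2<em>p</em>, so that <em>p</em> itself stays unmarked.</p>

```cpp
#include <cstddef>
#include <vector>

// Returns all primes <= limit, using one bit per number as the sieve.
// (sieve_primes is a hypothetical name chosen for this sketch.)
std::vector<std::size_t> sieve_primes(std::size_t limit) {
    std::vector<bool> crossed(limit + 1, false);  // the array of zero bits
    std::vector<std::size_t> primes;
    for (std::size_t p = 2; p <= limit; ++p) {
        if (crossed[p]) continue;  // hit by a smaller prime, hence composite
        primes.push_back(p);
        // Write ones every p locations; only additions are needed.
        for (std::size_t m = p + p; m <= limit; m += p) {
            crossed[m] = true;
        }
    }
    return primes;
}
```

<p>For example, <code>sieve_primes(71)</code> returns the twenty primes in the range of Figure 1. Note that the array has <code>limit + 1</code> entries, which is the memory cost that the next section is concerned with.</p>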
<h4>The Assembly-Line algorithm</h4>
<p>If you want to generate all primes up to <em>N</em>, then the Sieve method requires an amount of memory proportional to <em>N</em>.  Trial Division needs memory for the table of the first primes, and this table need only have a size of the square root of <em>N</em>. So here is the stand-off: the Sieve is faster, while Trial Division needs less memory.</p>
<p>The difference between the two methods is an example of a phenomenon that is not restricted to computing.  Imagine that you have to manufacture a large number <em>N</em> of items.  Suppose that each item requires <em>M</em> operations A, B, C, &#8230; to be performed on it.  Imagine <em>N</em> in the thousands and <em>M</em> to be about the square root of <em>N</em>.  Suppose furthermore that each operation requires some start-up time. For example you need to consult the cheat sheet on how to perform the operation and you have to get the tools ready.  As a result it takes considerably less time to repeat the same operation on successive items than the same number of different operations on the same item.</p>
<p>This situation suggests two strategies:</p>
<ol>
<li> minimize time &#8212; line up all <em>N</em> items and perform on each in turn A, then on each in turn operation B, and so on.</li>
<li> minimize number of items in process &#8212; perform operations A, B, C, &#8230; on the same item, and get it out of the way.</li>
</ol>
<p>Of course we want a method that combines the advantages of both approaches.</p>
<p>Primes to be generated can be likened to items to be manufactured.  The operations are ensuring non-divisibility by 2, by 3, by 5, and so on.  The Sieve method is like laying out all items simultaneously.  As a result, ensuring non-divisibility can be done fast: only additions are needed.  Trial Division is like performing all operations on a single item before proceeding to the next.  Because ensuring non-divisibility by a given prime is done in isolation from the same operation on the other items, the division operation is needed, which is more expensive than addition.  As in manufacturing, the disadvantages of both strategies suggest looking for a way that combines the advantages of both approaches.</p>
<p>In manufacturing the <em>assembly line</em> has been invented to combine speed with reducing the number of in-process items.  In an assembly line there is one worker for each operation, who only performs that operation on the items as they are successively presented by a traveling belt.  Operations are performed at maximum speed.  The device  of the assembly line reduces the number of in-process items from <em>N</em> to <em>M</em>.</p>
<p>Let us translate the idea of the assembly line to the Sieve of Eratosthenes as illustrated in Figure <a href="#Figure1">1</a>.  Suppose we want to check whether N = 36 is a prime.  The workers on the assembly line are represented as the set of multiples of 2, 3, 5, 7, and 11 that are in column N or the first such to the right.  For N = 36 these are 36, 40, 42, and 44.  As one of the multiples is equal to N, N is dismissed as non-prime.  N is incremented.  Any multiples that are to the left of N are incremented by the corresponding prime; in this case 36 is incremented to 38 in its role as multiple of 2 and to 39 in its role as multiple of 3.  Now we have that N = 37 is less than the least multiple under consideration; this proves it to be prime.</p>
<p>The operations here are addition and identification of the least of the multiples.  This latter operation is potentially significant: for N  at eleven million there are over three thousand multiples. Therefore the multiples should be organized as a heap.</p>
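<p>The step for N = 36 can be sketched in a few lines.  This is only an illustration under stated assumptions: the helper name <code>nextPrimeFrom</code> is hypothetical, the multiples are scanned linearly instead of being kept in a heap, and the caller is assumed to supply multiples that are not too far to the right (each one minus its prime lies to the left of N).</p>

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef unsigned nat; // natural number

// Hypothetical helper: starting at candidate n, return the first number
// that is less than or unequal to every multiple, using additions only.
// Assumes multiples[i] is a multiple of primes[i] with
// multiples[i] - primes[i] < n, and that the primes listed suffice
// for the range examined.
nat nextPrimeFrom(nat n, const std::vector<nat>& primes,
                  std::vector<nat>& multiples) {
  assert(primes.size() == multiples.size());
  for (;;) {
    bool hit = false;
    for (std::size_t i = 0; i < primes.size(); i++) {
      // A multiple to the left of n catches up by adding its prime.
      while (multiples[i] < n) multiples[i] += primes[i];
      if (multiples[i] == n) hit = true; // n is a multiple: not prime
    }
    if (!hit) return n; // no multiple equals n
    n++;                // n dismissed; move to the next column
  }
}
```

<p>Called with the primes 2, 3, 5, 7, 11, the multiples 36, 36, 40, 42, 44, and N = 36, the helper dismisses 36, advances the first two multiples to 38 and 39, and returns 37.</p>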
<p>See Appendix <a href="#appendixC">C</a> for further explanation and a C++ implementation.</p>
<h4>Whence the Assembly-Line algorithm?</h4>
<p>What should we acknowledge as the source of the Assembly-Line algorithm?  Before I noticed Dijkstra&#8217;s hidden gem, the only source I knew for the assembly-line algorithm was a 2008 paper by Melissa O&#8217;Neill [<a href="#ONeill">3</a>].  O&#8217;Neill refers to the analogue of the assembly line as &#8220;just-in-time&#8221;.  This sounds like an analogy with manufacturing, but it is not.  &#8220;Just-in-time&#8221; manufacturing refers to coordination between factories, not to what happens within a single factory.  Apart from this discrepancy, &#8220;just-in-time&#8221; is an apt characterization of a balanced assembly line.</p>
<p>O&#8217;Neill gives no source for the Assembly-Line algorithm, so I assume she is one of the inventors of it.  The book by Crandall and Pomerance [<a href="#cranPom">4</a>] seems unaware of it.  Their pseudo-code for the Sieve is unclear about details.  I surmise they were not aware of the Assembly-Line algorithm because of their comment on page 121 &#8220;The biggest computer limitation on sieves is the enormous amount of space they can consume.&#8221;</p>
<p>The earliest description of the Assembly-Line algorithm that I know of is Dijkstra&#8217;s essay &#8220;Notes on Structured Programming&#8221; [<a href="#dijkstra">5</a>].  Dijkstra&#8217;s discovery may not have received the attention it deserved because it was not presented as a novel solution for an important problem in prime-number generation.  Dijkstra just wanted to describe in detail how to go about systematically developing an algorithm, <em>some</em> algorithm.  He picked the generating of prime numbers as an example that is somewhat interesting, yet short enough to fit in the report he was writing.</p>
<p>For his example of systematic program development Dijkstra chooses the Trial Division algorithm.  In an aside he remarks that the friends to whom he showed the draft objected that &#8220;Everybody knows that the Sieve is more efficient&#8221;, and refused to read further.  After meticulously motivating each tiniest step, Dijkstra almost completes an algorithm for the Trial Division method.  The last step is the actual division.</p>
<p>At the last step, testing the candidate for the next prime for divisibility, Dijkstra fills in the obvious modulo instruction.  This would have resulted, if transposed to the C programming language, in something like the code in Appendix <a href="#appendixB">B</a>.  But then he says (page 37): &#8220;To give the algorithm an unexpected turn we shall assume the absence of a convenient remainder computation.&#8221; He continues with a brief outline of how he could continue the development of his algorithm to one that maintains a location for each prime and ensures that each of these locations contains the least multiple of the prime that is not less than the number under consideration.  These multiples need only an addition to be updated; a division is never needed.  Yet the number of multiples he needs to keep in memory never exceeds the square root of the size of the table to be produced.</p>
<p>There it is: among many other things, the Master invented prime-number generation according to the assembly-line principle.  The only thing that is missing is a mention of the need to repeatedly find the smallest of these multiples; that is, the need of a suitable data structure such as a heap.</p>
<h4>The Assembly-Line algorithm in action</h4>
<p>It may be that in the early seventies the expected speed of the Sieve was only attributed to the speed of addition.  Since then all algorithms have been subjected to asymptotic analyses.  As [<a href="#ONeill">3</a>] describes, the Sieve is not just faster by a constant factor, but is also asymptotically faster.  Encouraged by this scientific nugget, I implemented (Appendix <a href="#appendixC">C</a>) the Assembly-Line algorithm, using STL, the standard library for C++, for the required heap data structure.  I compared its results with those of the program in Appendix <a href="#appendixB">B</a> on the re-computation of Kulik&#8217;s table, all prime numbers up to 11 million.</p>
<p>To my surprise, Trial Division, though asymptotically slower, does it about five times as fast as the Assembly-Line algorithm.  Of course, being asymptotically faster does not mean faster on <em>small</em> inputs.  Apparently, for this problem, 11 million is still a <em>small</em> input, too small for the asymptotic difference in speed to prevail.</p>
<h4>Winding up</h4>
<p>In winding up I need to report that, to prevent this from getting even longer, I omitted another fascinating skirmish of Dijkstra&#8217;s with prime numbers.  This is Chapter 17 (&#8220;An exercise attributed to R.W. Hamming&#8221;) in <em>A Discipline of Programming</em> [<a href="#discipline">6</a>].  Here he uses again what I call the assembly-line idea, but without acknowledging any connection with prime numbers and missing the opportunity to use addition.</p>
<p>Another time, perhaps.</p>
<h4>References</h4>
<p>[<a name="sierpinski">1</a>] Waclaw Sierpinski (Andrzej Schinzel, ed.): <em>Elementary theory of numbers</em>.  Elsevier, 1988, page 119.<br />
[<a name="lehmers">2</a>] http://en.wikipedia.org/wiki/Lehmer_sieve (December 31, 2010)<br />
[<a name="ONeill">3</a>] O&#8217;Neill, Melissa E., &#8220;The Genuine Sieve of Eratosthenes&#8221;, Journal of Functional Programming, Published online by Cambridge University Press 9 October 2008.<br />
[<a name="cranPom">4</a>] Richard E. Crandall and Carl Pomerance: <em>Prime numbers: a computational perspective</em>.  Springer-Verlag, 2005.<br />
[<a name="dijkstra">5</a>] O.J. Dahl, E.W. Dijkstra, and C.A.R. Hoare: <em> Structured Programming</em>.  Academic Press, 1972.<br />
[<a name="discipline">6</a>] E.W. Dijkstra: <em>A Discipline of Programming</em>.  Prentice-Hall, 1976.<br />
[<a name="dusautoy">7</a>] Marcus du Sautoy: <em>The Music of the Primes</em>.  HarperCollins, 2003.</p>
<h4><a name="appendixA">Appendix A</a></h4>
<p>A C++ program to print the prime numbers in a block of one thousand.</p>
<pre>#include &lt;iostream&gt;

using namespace std;
typedef unsigned nat; // natural number

bool prime(nat n) {
  nat n0 = n, factor = 2, product = 1;
  while (factor*factor &lt;= n) {
    while (n%factor == 0) {
      n /= factor; product *= factor;
    }
    factor++;
  }
  return ((n != 1) &amp;&amp; (n == n0));
}
void primes(nat n) {
  if (n &gt;= 1000)
    for (nat i = n-999; i &lt;= n; i++)
      if (prime(i)) cout &lt;&lt; i &lt;&lt; endl;
}</pre>
<h4><a name="appendixB">Appendix B</a></h4>
<pre>#include &lt;cassert&gt;

typedef unsigned nat; // natural number

void prTable(nat p[], nat n) {
// Place the first n prime numbers in p[0..n-1]
// using Trial Division.
  nat k;
  assert(n &gt; 1);
  p[0] = 2; p[1] = 3;
  k = 2;
  while (k &lt; n) {
  // p[0..k-1] are first k primes
    nat cand = p[k-1]+2;
    nat j = 0;
    while (p[j]*p[j] &lt;= cand) {
    // p[0..j] do not divide cand
      j++;
      if (cand%p[j] == 0) {
        j = 0; cand += 2;
      }
    }
    p[k++] = cand;
  }
  // p[0..k-1] are first k primes
  // k == n
}</pre>
<h4><a name="appendixC">Appendix C</a></h4>
<p>The Assembly-Line algorithm.</p>
<p>To determine whether N is a prime, we maintain as a heap S(N), a set containing a multiple of each of the primes up to the ceiling of the square root of N.  We can increment or decrement a multiple; this means increment or decrement (if possible) by the prime corresponding to the multiple.  We say that a multiple in S(N) is &#8220;too big&#8221; (with respect to N) if it can be decremented without making it less than N.  We say that a multiple in S(N) is &#8220;too small&#8221; (with respect to N) if it is less than N.  The theorem that is at the basis of the Assembly-Line algorithm says that N is prime if no multiple in S(N) is too big and if N is less than the least multiple; that is, less than the top of the heap.</p>
<p>An algorithm that starts with an S(N) that does not contain a multiple that is too big can maintain this property by only incrementing multiples that are too small.  This suggests the following algorithm for determining the next prime at or after N.</p>
<pre>Ensuring none of its multiples is too big,
create S(N) as a heap; // smallest multiple at top

while (N &gt;= top of heap) {
// There may be a multiple that is too small.
  while (N &gt; top of heap) {
    // There is a multiple that is too small.
    increment top of heap
  } // No multiple is too big.
  if (N == top of heap) {
    // N is not a prime; try next N.
    N = N+2;
    // No multiple is too big.
  }
}
// No multiple is too big and N &lt; top of heap
// Therefore N is prime.</pre>
<p>C++ code:</p>
<pre>#include &lt;iostream&gt;
#include &lt;vector&gt;
#include &lt;algorithm&gt;
#include &lt;utility&gt;

using namespace std;
typedef unsigned nat; // natural number
typedef pair&lt;nat,nat&gt; pnn;

bool gt(pnn x, pnn y) {
  return x.first &gt; y.first;
}
void primes(nat table[], nat n, nat lim) {
// place first n primes in table[0..n-1]
// caller ensures that the n-th prime, table[n-1], is &lt;= lim
  table[0] = 2;
  table[1] = 3;
  vector&lt;pnn&gt; heap; vector&lt;pnn&gt;::iterator i;
  nat max = 0; // max prime for heap
  while (max*max &lt;= lim) max++;
  heap.push_back(pnn(2*2,2));
  heap.push_back(pnn(3*3,3));
  push_heap(heap.begin(), heap.end(), gt);
  nat N = 5, count = 2;
  while (count &lt; n) {
    while (N &gt;= heap.begin() -&gt; first) {
      while (N &gt; heap.begin() -&gt; first) {
        // increment least multiple by corresponding prime
        pop_heap(heap.begin(), heap.end(), gt);
        vector&lt;pnn&gt;::iterator last = heap.end()-1;
        last -&gt; first +=  last -&gt; second;
        push_heap(heap.begin(), heap.end(), gt);
      }
      if (N == heap.begin() -&gt; first)
      // N not a prime; try next candidate
        N+=2;
    }
    // N &lt; heap.begin() -&gt; first
    // N is prime
    table[count++] = N;
    if (N &lt;= max) {
      heap.push_back(pnn(N*N,N));
      push_heap(heap.begin(), heap.end(), gt);
      // new pair absorbed in heap
    }
    // N is a prime; try next candidate anyway
    N += 2;
  }
}
</pre>
<p>Thanks to Paul McJones and to Tomas Bednar for pointing out an error in the first posted version.</p>
]]></content:encoded>
			<wfw:commentRss>http://vanemden.wordpress.com/2011/01/15/another-scoop-by-dijkstra/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/e28602c14607fe4f92e85f6850e35a93?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">Maarten van Emden</media:title>
		</media:content>
	</item>
		<item>
		<title>Unprotected Numerical Computation and Its Safe Alternatives</title>
		<link>http://vanemden.wordpress.com/2010/12/11/unprotected-numerical-computation-and-its-safe-alternatives/</link>
		<comments>http://vanemden.wordpress.com/2010/12/11/unprotected-numerical-computation-and-its-safe-alternatives/#comments</comments>
		<pubDate>Sun, 12 Dec 2010 04:57:36 +0000</pubDate>
		<dc:creator>Maarten van Emden</dc:creator>
				<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://vanemden.wordpress.com/?p=346</guid>
		<description><![CDATA[Conventional numerical computation is marvelously cheap, and it seems to work most of the time. The computing profession has long neglected to develop methods appropriate for those situations where cheapness is not an overriding concern and where &#8220;seems to work most of the time&#8221; is not good enough. Many textbooks in numerical analysis start by [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=vanemden.wordpress.com&amp;blog=3462521&amp;post=346&amp;subd=vanemden&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>Conventional numerical computation is marvelously cheap, and it seems to work most of the time.  The computing profession has long neglected to develop methods appropriate for those situations where cheapness is not an overriding concern and where &#8220;seems to work most of the time&#8221; is not good enough.</p>
<p>Many textbooks in numerical analysis start by showing that in certain situations the small errors caused by the use of floating-point arithmetic can rapidly grow and render the result of the computation meaningless.  Such textbooks describe conditions under which mathematical algorithms can safely be transplanted to floating-point hardware: that the problem be well-conditioned and that the algorithm be stable.  In the early days computing centres employed numerical analysts to make sure that none of the scarce processor cycles got wasted on meaningless results caused by ill-conditioned problems or unstable algorithms.</p>
<p>Much has changed since then.  Thousands of scientists and engineers have on their desks gigaflops of computing power, and there is not a numerical analyst in sight.  Although an even larger part of problem-solving is entrusted to the computer, the same fragile methodology is followed.  And still the only justification is that conventional numerical computation seems to work most of the time.</p>
<p><span id="more-346"></span></p>
<p>Robust alternatives have existed for decades.  These are not magical cures for ill-conditioned problems and unstable algorithms.  They are robust in the sense that it is impossible, or near-impossible (depending on the alternative), that an incorrect result is returned without a clear warning.  A responsible response in such a case is to critically examine the model (is it well-conditioned?) and the algorithm (is it stable?).</p>
<p>Such robustness comes at the price of increased demands on the processor(s) and memory.  So far practitioners have invariably preferred to use the abundant increase in computing capacity for running the same problem with a finer mesh rather than for the security attainable by robust methods.</p>
<p>In science this preference is not likely to change: by nature, scientists feel at home with Rube Goldberg devices held together by sealing wax and string.  Engineers operate in a different world.  In a court of law, a professional engineer can only appeal to his &#8220;best effort&#8221; in using software that is not worse than what everybody else uses.  In turn, these widely accepted packages can only appeal to being best efforts in selecting algorithms that are numerically stable.  It is up to the user to select an algorithm appropriate to the problem and to ensure that the problem is well-conditioned.  Even for a numerical analyst this is sometimes hard to tell; harder than solving the problem itself.  The time may eventually come that these facts penetrate to the engineer&#8217;s mind.</p>
<p>The above should not be a revelation.  A promotional document of Intel says it well:</p>
<blockquote><p><em> The standard doesn&#8217;t provide a guarantee that the answers are the &#8220;right&#8221; answer. That may require careful error analysis. However the standard provides many mechanisms that make it easier for a program to get the &#8220;right&#8221; answer on conforming computers from different vendors. &#8212; John Crawford [<a href="#intel">1</a>] </em></p></blockquote>
<p>Like many manufacturers who warn that you &#8220;<em>may</em> need to consult your physician&#8221; for safe use of the product, Intel says that your problem &#8220;<em>may</em> require careful error analysis&#8221;.  Yes, and who can tell you whether your problem actually does fall into this dread category?  That would have to be one of those rare and expensive numerical experts.</p>
<p>Before we continue, let me try to state briefly what I mean by &#8220;conventional numerical computation&#8221;.  Let&#8217;s say we are modeling an equilibrium state.  It could be the state of a petroleum refinery or of an aircraft in flight.  Although these are dynamical systems, suppose we are interested in the static state of equilibrium.  The state variables have to satisfy a system of equations (let&#8217;s say, nonlinear algebraic equations), and we need to solve the system.  Unless the system is trivial, an algorithm for solving it will be iterative.  That is, the algorithm itself is a <em>dynamical system</em>.  For the algorithm to be considered at all, it has to be a dynamical system that evolves by its own dynamic law to an equilibrium state, and this needs to coïncide with the solution of the system of equations.</p>
<p>This is what numerical analysts do for a living: invent such dynamical systems in the form of algorithms.  In our example, this could be the n-dimensional version of Newton&#8217;s method.  A proper textbook will not only describe the algorithm, but also conditions under which it will converge.  What&#8217;s missing from the conditions is the assumption that holds for most of the entire textbook: all variables range over the real numbers and all operations are the ones defined for the real numbers.  That is, the algorithm itself is an <em>abstract</em> dynamical system; its properties hold in the abstract setting.  The properties have been proved with some manipulation of formulas.  In this manipulation it is taken for granted that certain laws, like associativity of addition and multiplication, hold. And they do: the proof is given for the abstract dynamical system.</p>
<p>So far so good.  Now comes the sleight of hand: every real variable is replaced by a floating-point variable.  Moreover, every <em>operation</em> on the reals is replaced by &#8230;  by what? That&#8217;s hard to say more exactly than: the operation on floating-point numbers that <em>has the same name</em> as one of the operations on the reals.  This is the task you then face: to prove that the resulting concrete dynamical system, the one that takes the computer through its paces, converges, and converges to a state near that of its abstract counterpart.</p>
<p>In the most favourable case, the abstract dynamical system has been proved correct.  Most people don&#8217;t even realize that what goes on in the computer is not what has been proved correct.  It is not just that the abstract dynamical system&#8217;s variables are jiggled a bit by rounding; even the operations themselves are perturbed.  This is what I call <em>unprotected numerical computation</em>.  The fact that it works most of the time is a bloody marvel.</p>
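<p>A small experiment shows how one of the laws taken for granted in the proofs, associativity of addition, fails for the floating-point operation of the same name.  The function names below are made up for the illustration, and the sketch assumes IEEE double precision with the default round-to-nearest mode.</p>

```cpp
#include <cassert>
#include <cmath>

// (a + b) + c: grouping to the left.
double sumLeftFirst(double a, double b, double c) {
  assert(std::isfinite(a) && std::isfinite(b) && std::isfinite(c));
  double ab = a + b;
  return ab + c;
}

// a + (b + c): the same value over the reals,
// but not necessarily in floating-point arithmetic.
double sumRightFirst(double a, double b, double c) {
  assert(std::isfinite(a) && std::isfinite(b) && std::isfinite(c));
  double bc = b + c;
  return a + bc;
}
```

<p>With a = 1e20, b = -1e20, and c = 1, the first grouping yields 1 and the second yields 0: in the second, c is absorbed without trace when it is added to b, because it is smaller than half the spacing between neighbouring doubles of that magnitude.</p>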
<p>Before this gets too long, I should do two things.  One is to show some concrete examples of computations going off the rails.  The other is to give you some idea of safe alternatives to unprotected numerical computation.</p>
<p>All users appreciate that answers can be off because rounding errors, though small, can accumulate.  Some reason like &#8220;My computation takes about half an hour, say a thousand seconds. At a gigaflop per second that adds up to 10^12 operations.  Double-length arithmetic gives 10^-15 relative error.  Hmm &#8230; could I get a total relative error of 10^-3?  Not good &#8230; but wait!  These errors cancel each other most of the time, so I&#8217;ll end up not much worse than 10^-15 relative error&#8221;.  As chapter one of the numerical analysis textbook explains, this error model is not satisfactory.  Appendix <a href="#appA">A</a> shows an example (attributed to S. Rump of Hamburg) where a few <em>dozen</em> double-length operations are enough to wreak total havoc.</p>
<p>What gives away the evil expression in Appendix A is the fact that the results for single length and double length are different.  However, as Parker [<a href="#mca">2</a>] shows, this is no guarantee against error.  See Appendix <a href="#appB">B</a>.</p>
<p>Are there alternatives to the Conventional Method?  I&#8217;m aware of two: interval arithmetic [<a href="#baker">4</a>] and Monte Carlo arithmetic [<a href="#mca">2</a>, <a href="#mca1">3</a>].  Let&#8217;s start with the latter.  Remember that what motivated the competing computer manufacturers to collaborate on a standard for floating-point arithmetic was that their products gave different answers for the same problem while none of the manufacturers was confident that their own were any better.  As long as everyone&#8217;s results were the same, all would be well, at least on the sales front.</p>
<p>The big boys in numerics in pre-standard time were CDC, IBM, and Cray.  All three had bad floating-point arithmetic compared with the standard to come, and bad in different ways. At least theoretically it was possible for an inmate of an IBM installation to go over to the neighbours with a CDC and ask for the favour of running the program.  If the results didn&#8217;t differ too much, then you could be reasonably confident that the results were not far from correct.  But now your IBM is IEEE standard and whatever the neighbours have, theirs is IEEE standard too.  To the industry&#8217;s great relief, it speaks with One Voice.</p>
<p>Admittedly, running over to the neighbours for a check is not practical.  And, as the Intel quote above says, the standard makes it easier to develop safe alternatives to unprotected computation.  Monte Carlo arithmetic [<a href="#mca">2</a>,<a href="#mca1">3</a>] is a case in point.  It uses the idea of running over to the neighbours for a check, but thanks to the standard, you can get a mild version of a different arithmetic right in your own processor, even without interrupting your computation.  Monte Carlo arithmetic makes use of the standard&#8217;s provision to change rounding mode under program control while the program is running.  Repeated runs can then be regarded as a sample and analysed statistically.  I count this as a safe alternative because an unduly high standard deviation casts doubt on the validity of the result.</p>
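<p>The mechanism of changing the rounding mode under program control can be sketched with the standard <code>&lt;cfenv&gt;</code> header.  This is not Monte Carlo arithmetic as described in [<a href="#mca">2</a>]; it is only a crude stand-in that repeats a computation once under each of the four rounding directions and reports the spread of the results.  The names <code>sampleRoundingModes</code> and <code>spread</code> are hypothetical.  Whether a compiler honours the mode changes may depend on its settings (for example <code>#pragma STDC FENV_ACCESS ON</code>, or the <code>-frounding-math</code> option of GCC).</p>

```cpp
#include <algorithm>
#include <cassert>
#include <cfenv>
#include <vector>

// Run the computation f once under each rounding direction
// and collect the results as a (very small) sample.
template<class F>
std::vector<double> sampleRoundingModes(F f) {
  const int modes[] = {FE_TONEAREST, FE_DOWNWARD, FE_UPWARD, FE_TOWARDZERO};
  const int saved = std::fegetround();
  std::vector<double> results;
  for (int mode : modes) {
    int rc = std::fesetround(mode); // returns 0 on success
    assert(rc == 0);
    (void)rc;
    results.push_back(f());
  }
  std::fesetround(saved); // restore the caller's rounding mode
  return results;
}

// Difference between the largest and the smallest result in the sample.
double spread(const std::vector<double>& xs) {
  assert(!xs.empty());
  return *std::max_element(xs.begin(), xs.end()) -
         *std::min_element(xs.begin(), xs.end());
}
```

<p>A spread that is large relative to the results themselves plays the role of the unduly high standard deviation: it casts doubt on the computation without saying what the right answer is.</p>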
<p>The other safe alternative is interval arithmetic [<a href="#baker">4</a>].  Here one operates on intervals of numbers rather than on individual numbers.  Let [x,y], with x not greater than y, stand for the interval of all numbers from x to y.  Then, for example, interval subtraction turns out to be [a,b]-[c,d] = [a-d,b-c].  The bounds a-d and b-c in the result are computed in floating-point arithmetic and are in general subject to rounding error.  The standard makes it possible to compute a-d in rounding mode towards minus infinity and b-c in rounding mode towards plus infinity.  Rounding introduces uncertainty about a result.  By picking the rounding modes in this way, we are assured that, in spite of the unavoidably introduced uncertainty, the result interval contains all possible values.  In this sense interval arithmetic is a safe alternative.</p>
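<p>Interval subtraction with directed rounding might look as follows.  This is a minimal sketch, not the interface of any interval library: the type <code>Interval</code> and the function <code>subtract</code> are hypothetical, and the <code>volatile</code> qualifiers are only there to discourage the compiler from moving the arithmetic across the changes of rounding mode.</p>

```cpp
#include <cassert>
#include <cfenv>

// Hypothetical interval type; lo <= hi is required.
struct Interval { double lo, hi; };

// [a,b] - [c,d] = [a-d, b-c], with the lower bound rounded towards
// minus infinity and the upper bound rounded towards plus infinity.
Interval subtract(Interval x, Interval y) {
  assert(x.lo <= x.hi && y.lo <= y.hi);
  const int saved = std::fegetround();
  volatile double a = x.lo, b = x.hi, c = y.lo, d = y.hi;
  Interval r;
  std::fesetround(FE_DOWNWARD);
  r.lo = a - d;            // cannot exceed the exact value of a-d
  std::fesetround(FE_UPWARD);
  r.hi = b - c;            // cannot fall below the exact value of b-c
  std::fesetround(saved);  // restore the caller's rounding mode
  return r;
}
```

<p>For bounds whose differences are exactly representable, such as [1,2]-[0.25,0.5] = [0.5,1.75], the directed rounding changes nothing.  It matters when a difference has to be rounded: the two bounds then move apart by one unit in the last place instead of silently collapsing to the nearest number.</p>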
<p>Of course, in a problematic problem, neither Monte Carlo nor interval arithmetic can give you the solution.  But either will tell you that you have a problem: in Monte Carlo arithmetic in the form of an excessive standard deviation; in interval arithmetic in the form of an excessively wide interval.  For example, interval evaluation of function f1 in Appendix A results in an interval with a width in the order of 10^21; a clear warning, if nothing else.</p>
<p>The excessive width is due to catastrophic cancellation of figures.  Suppose we do not have significant cancellation, but the typical large number of operations that we have our gigaflops for.  Then the accumulation of the small rounding errors does become a problem, because in interval arithmetic they are all in the same direction for each of the bounds.  As a consequence, the result interval, though correct, is much too wide.  It is so wide because it reflects the (unlikely) worst case of the unprotected version of the same computation.  In this respect Monte Carlo arithmetic is better: it gives a single result with an error under statistical control.</p>
<p>I would not recommend merely intervalized versions of conventional algorithms.  Interval arithmetic is of great interest, but only because of the algorithms it makes possible that are not even dreamt of in conventional computation.  For example, algorithms that give intervals for all zeros of a system of nonlinear equations; algorithms for the hardest case of mathematical programming: constrained global optimization for objective functions with large numbers of local extrema.  When used with the variant of interval arithmetic that is known as <em>interval constraints</em> [<a href="#numerica">5</a>] one can even mix in integer variables in the type of optimization just mentioned.</p>
<p>
It is time to dump the fiction that we can perform the error analysis that would be necessary to justify unprotected numerical computation. The safe alternatives are safe in that they warn of trouble and that they can be relied on in the absence of warning. To get this certainty we have to live with the possibility that the safe method gives a warning while unprotected execution gives a correct result.</p>
<h3>References</h3>
<p>[<a name="intel">1</a>] www.intel.com/standards/floatingpoint.pdf<br />
[<a name="mca">2</a>] &#8220;Monte Carlo Arithmetic: exploiting randomness in floating-point arithmetic&#8221; by D. Stott Parker, Research Report CSD-970002, UCLA Computer Science Department, 1997.<br />
[<a name="mca1">3</a>] &#8220;Monte Carlo arithmetic: how to gamble with floating-point and win&#8221; by D. Stott Parker, Brad Pierce, and Paul R. Eggert; Computing in Science and Engineering, Vol. 2, No. 4, pp. 58-68, July/August 2000.<br />
[<a name="baker">4</a>] <em>Introduction to Interval Analysis</em> by R.E. Moore, R.B. Kearfott, and M.J. Cloud; SIAM, 2009.<br />
[<a name="numerica">5</a>] <em>Numerica: A Modeling Language for Global Optimization</em> by P. Van Hentenryck, Y. Deville, L. Michel; MIT Press, 1997.</p>
<h3>Acknowledgements</h3>
<p>Thanks to Dr E.J. van Kampen of the Technical University in Delft for checking out some results in interval arithmetic.</p>
<p><a name="appA"></a></p>
<h3><a name="appA">Appendix A</a></h3>
<p>In the C++ program below the function f1 requires a few dozen floating-point operations. The expression in f1 happens to simplify to the one in function f0. If the two functions give different results, as they do here, then the latter is more likely correct.</p>
<pre>#include &lt;stdio.h&gt;

template&lt;class flpt&gt;
flpt f1(flpt x, flpt y) {
  flpt x2 = x*x;
  flpt y2 = y*y, y4 = y2*y2, y6 = y2*y4, y8 = y4*y4;
  return (333.75-x2)*y6 + x2*(11*x2*y2-121*y4-2) +
         5.5*y8 + x/(2*y);
}
template&lt;class flpt&gt;
flpt f0(flpt x, flpt y) { return (-2+x/(2*y)); }
int main() {
  printf("%e  %e  %e\n",
         f1&lt;float&gt;(77617,33096),
         f1&lt;double&gt;(77617,33096),
         f0&lt;double&gt;(77617,33096)
        );
}
/* Output:
-2.341193e+29  1.172604e+00  -8.273961e-01
*/</pre>
<p><a name="appB"></a></p>
<h3><a name="appB">Appendix B</a></h3>
<pre>#include &lt;iostream&gt;
#include &lt;iomanip&gt;
using namespace std;

template&lt;class flpt&gt;
flpt f(flpt x0, flpt x1) {
  return (111.0 - 1130.0/x1 + 3000/(x0*x1));
}

int main() {
  cout &lt;&lt; setiosflags(ios::fixed) &lt;&lt; setprecision(8);
  float f0, f1, tempf; double d0, d1, tempd;
  const int N = 30;
  f0 = d0 = 2;
  f1 = d1 = -4.0;
  for (int i = 1; i &lt;= N; i++) {
    tempf = f1; tempd = d1;
    f1 = f&lt;float&gt;(f0,f1); d1 = f&lt;double&gt;(d0,d1);
    f0 = tempf; d0 = tempd;
  }
  cout &lt;&lt; f1 &lt;&lt; "  " &lt;&lt; d1 &lt;&lt; endl;
}
/* Output:
100.00000000  100.00000000
*/</pre>
<p>This C++ program computes the same short iteration in both single and double length, with good agreement. However, the true value is</p>
<p>990176025870222717970867 / 164874117215934539909207 =</p>
<p>6.0056484877&#8230;</p>
<p>See Parker [<a href="#mca">2</a>].</p>
]]></content:encoded>
			<wfw:commentRss>http://vanemden.wordpress.com/2010/12/11/unprotected-numerical-computation-and-its-safe-alternatives/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/e28602c14607fe4f92e85f6850e35a93?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">Maarten van Emden</media:title>
		</media:content>
	</item>
		<item>
		<title>A Standard of Excellence</title>
		<link>http://vanemden.wordpress.com/2010/12/08/a-standard-of-excellence/</link>
		<comments>http://vanemden.wordpress.com/2010/12/08/a-standard-of-excellence/#comments</comments>
		<pubDate>Thu, 09 Dec 2010 04:56:37 +0000</pubDate>
		<dc:creator>Maarten van Emden</dc:creator>
				<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://vanemden.wordpress.com/?p=338</guid>
		<description><![CDATA[When an industry-wide standard gets established, it is often lamented as a regression to the mediocre, if not outright bad. VHS vs Sony Beta comes to mind. This time, however, I come to praise a standard, not to bury one. I am going to talk about a success from that era: IEEE standard 754 for [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=vanemden.wordpress.com&amp;blog=3462521&amp;post=338&amp;subd=vanemden&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>When an industry-wide standard gets established, it is often lamented as a regression to the mediocre, if not outright bad. VHS vs Sony Beta comes to mind.  This time, however, I come to praise a standard, not to bury one.  I am going to talk about a success from that era: IEEE standard 754 for binary floating-point arithmetic.</p>
<p><span id="more-338"></span></p>
<p>&nbsp;</p>
<p>Though the standard was only ratified in 1985, most of the work had been done several years before.  Up till the late 1970s the computing community was resigned to the situation in which the manufacturers that dominated the market for scientific computers fielded processors with execrably bad implementations of the main tool: floating-point arithmetic.  Users had to live with &#8220;features&#8221; such as:</p>
<ul>
<li> Numbers that tested as nonzero in a comparison, but behaved as zero in division.</li>
<li> Numbers that underwent non-negligible change when multiplied by 1.0.</li>
<li> X-Y could evaluate to zero for different X and Y.</li>
</ul>
<p>The whole idea of floating-point arithmetic was that you could just replace the real variables in a mathematically verified algorithm by floating-point numbers.  In reality programmers had to insert code like X := (X+X)-X; at critical spots [<a href="#oldMan">1</a>].</p>
<p>At the time there was no shortage of experts who could tell what was wrong and how to improve it.  One of the best was William Kahan at Berkeley, who was in fact consulting for IBM, with little effect on the egregious shortcomings of their processors, as quality of floating-point processing was not regarded as a competitive advantage.  The major players at the time, IBM and CDC, both had bad floating-point arithmetic.</p>
<p>Here we have a glimpse of the situation:</p>
<blockquote><p>The [CDC] 6600 was the first commercial supercomputer, outperforming everything then available by a wide margin. While expensive, for those that needed the absolutely fastest computer available there was nothing else on the market that could compete. When other companies (namely IBM) attempted to create machines with similar performance, he [Seymour Cray] increased the challenge by releasing the 5-fold faster CDC 7600.  [<a href="#wikipedia">2</a>]</p></blockquote>
<p>As you can see, <em>speed</em> was all that mattered.  Speed was something that the VPs in charge of multimillion dollar purchases could understand.  They neither knew nor cared what was done at that speed.</p>
<p>To find state-of-the-art floating-point arithmetic one had to look elsewhere, in the unlikely direction of a cheap mini-computer, in this case the PDP-11 from DEC.  To the relief of those who cared about quality in computer arithmetic, the subsequent VAX (bigger and faster) had arithmetic that was no worse.  But the big players like CDC and IBM, and, by this time, also Cray Computer, felt no need to do anything about their deficient floating-point arithmetic.</p>
<p>While DEC could safely be ignored around 1980, Intel was not even noticed as a potential source of processors.  One of the few companies that took Intel seriously was Intel.  They planned a floating-point co-processor and retained Kahan as consultant.  He advised them to adopt the specifications of the arithmetic of the DEC VAX.  This was state-of-the-art, and much better than that of the big players. It was also insanely ambitious for a single chip with a mere 40,000 transistors, which is what Intel had available.</p>
<p>However, Intel declined to follow Kahan&#8217;s advice.  They did not want the state of the art: they wanted the <em>best possible</em>.  Kahan, working with J. Coonen and H. Stone, produced a specification of a floating-point arithmetic that was the experts&#8217; dream.  By the early eighties, when the dinosaurs still roamed the earth [<a href="#cray">4</a>], Intel had implemented this in 40,000 transistors.</p>
<p>By 1985, when the IEEE standard 754 was finally ratified, there were half a dozen implementations on the market.  Ominously for the supercomputers of the day, all of these were single-chip CMOS (co-)processors.</p>
<p>In this success story Kahan [<a href="#oldMan">1</a>] and other participants emphasize the altruism displayed by their colleagues.  The IBM representative was supportive, even though the products of his company at the time were nowhere near compliant.  But I can&#8217;t help taking a less rosy view of the situation: all participants had a vested interest in avoiding the situation that the same numerical program would give different results on their different computers.  I can&#8217;t help thinking that they were more worried about results <em>seen</em> to be wrong than about them actually <em>being</em> wrong.  As long as the machines wouldn&#8217;t contradict each other, all would be fine.</p>
<p>A program giving significantly different results on different machines is likely to be unsound numerically; it may well be that the different results are all wrong, including the result obtained by an IEEE-754 compliant machine.  Such a program needs to be analyzed by an expert and rewritten.  This would be expensive. The &#8220;old anarchy&#8221;, much deplored by those who celebrate IEEE 754, could actually be a valuable warning that the algorithm underlying the program is not stable or that the problem is ill-conditioned.  Not welcome news.  The Brave New World of IEEE 754 prevents such warnings from ever arising.</p>
<p>The Standard Methodology is:</p>
<blockquote><p>Design an algorithm in terms of real numbers that is stable. Replace the reals by floating-point numbers; replace operations on reals by the ones on the floating-point numbers that have the same name.  Apply to a model that is numerically well-conditioned.</p></blockquote>
<p>The problems with this are (a) that the stability depends on properties of the reals (like associativity of operators) that their floating-point counterparts do not have and (b) that it is often not easy, even for an expert, to determine whether a model is well-conditioned.  For example, for some computations the condition depends on a certain eigenvalue of a matrix.  And it may be that determining that eigenvalue is harder than the problem you want to solve in the first place.</p>
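<p>Point (a) can be made concrete with a few lines of C.  What follows is a minimal sketch, assuming IEEE 754 double-length arithmetic and a compiler that does not reorder floating-point operations; the function names are made up for this illustration.</p>

```c
/* Sum three doubles, grouping from the left: (a + b) + c. */
double sum_left(double a, double b, double c) {
  double ab = a + b;
  return ab + c;
}

/* Sum the same three doubles, grouping from the right: a + (b + c). */
double sum_right(double a, double b, double c) {
  double bc = b + c;
  return a + bc;
}
```

<p>With a = 1e30, b = -1e30 and c = 1, sum_left gives 1 and sum_right gives 0: the 1 disappears when it is added to -1e30 first.  Over the reals both groupings give 1.</p>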
<p>Thus, a valuable supplement to the Standard Methodology may be to run your program on a CDC 6600 as well as on an IBM 360, both bad, and in different ways.  If the results do not differ significantly, then this can be used to boost confidence.  Otherwise, it should be &#8220;back to the drawing board&#8221;.  Intel, in a document celebrating their heroic role in IEEE 754, includes a very apt warning:</p>
<blockquote><p>&#8220;The standard doesn&#8217;t provide a guarantee that the answers are the &#8220;right&#8221; answer. That may require careful error analysis. However the standard provides many mechanisms that make it easier for a program to get the &#8220;right&#8221; answer on conforming computers from different vendors.&#8221; &#8212; John Crawford [<a href="#intel">3</a>]</p></blockquote>
<p>Yes, and you need an expert for the required careful analysis.  Every engineer has gigaflops on his desktop, but there&#8217;s not an expert in sight.</p>
<p>Enough of these dark thoughts.  What is so wonderful about IEEE 754?  Admirers of the standard may give different answers; here I&#8217;ll give mine.</p>
<ul>
<li> The reasonable accuracy achievable by a guard digit, which had been known for decades, but ignored by most of the industry, is guaranteed.  The guarantee can be expressed as: if the floating-point result of an operation coincides with the real-valued result, then that is the result; otherwise it is one of the bounds of the smallest floating-point interval containing the real-valued result.  And you can pick which of the bounds you want; see next.</li>
<li> Rounding mode is not fixed.  It can be upward, downward, towards zero, or to nearest.  This can be changed at runtime.</li>
<li> One of the quirks of number systems is that you have to have an asymmetric collection of numbers if you, correctly, want a single zero.  Then you necessarily have a zero in the middle and symmetrically on either side the positive and negative numbers.  If you want symmetry, then you must have either no zeroes, or two, a positive and negative one.  This latter option seems to be inherent in the notion of a floating-point number system, so it&#8217;s not surprising that IEEE 754 has a positive and a negative zero.  But the standard provides for the natural counterparts of the zeros, which are the infinities, positive and negative, so that division by a zero of a nonzero finite number gives you infinity of the right sign and so that division of such a number by infinity gives the zero of the right sign.  In this way computation can continue after division by zero if that makes sense in your model, or can be trapped if not.</li>
</ul>
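<p>The behaviour of the zeros and infinities in the last item can be observed directly.  The following is a minimal sketch, assuming a C implementation whose double follows IEEE 754 and which runs with the default non-trapping treatment of division by zero; the helper names are invented for the occasion.</p>

```c
#include <float.h>

/* x/y as computed by the hardware; no check for a zero divisor. */
double quotient(double x, double y) { return x / y; }

/* Infinities are the only values beyond the largest finite double. */
int is_pos_inf(double v) { return v > DBL_MAX; }
int is_neg_inf(double v) { return v < -DBL_MAX; }

/* The two zeros compare equal; dividing by them tells them apart. */
int is_neg_zero(double v) { return v == 0.0 && quotient(1.0, v) < 0.0; }
```

<p>Under these assumptions quotient(1.0, 0.0) is the positive infinity, quotient(1.0, -0.0) is the negative one, and dividing 1.0 by the negative infinity gives a zero for which is_neg_zero holds.</p>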
<p>In a future post I hope to be able to say something about the Standard Methodology of numerical computation and its alternatives.  In the <a href="#appndx">Appendix</a> I present in 27 lines the salient details of IEEE 754.  I feel that these are also the most important details of IEEE 754-2008, the current version.</p>
<h3>References</h3>
<p>[<a name="oldMan">1</a>] http://www.cs.berkeley.edu/~wkahan/ieee754status/754story.html<br />
[<a name="wikipedia">2</a>] http://en.wikipedia.org/wiki/Seymour_Cray<br />
[<a name="intel">3</a>] www.intel.com/standards/floatingpoint.pdf<br />
[<a name="cray">4</a>] The last of Seymour Cray&#8217;s companies, Cray Computer Corporation, filed for bankruptcy in 1995 (from Wikipedia article &#8220;Seymour Cray&#8221;).  Behind the current &#8220;Cray&#8221; computers is a succession of other companies that passed on the rights to the &#8220;Cray&#8221; marque.  <a name="appndx"><br />
</a></p>
<h3><a name="appndx"> Appendix </a></h3>
<p><a name="appndx"> </a> By now the 1985 IEEE 754 has been replaced by IEEE 754-2008.  It&#8217;s a fat wad of paper not particularly encouraging for those whose curiosity has been aroused by the story so far.  Apart from some niggling details, the original is a subset of the current standard.  It so happens that I advocate a model of computing, about which more later, that allows one even to ignore most of the much smaller old standard.  All that I want to know about floating-point arithmetic is the part of the standard described in the following few lines of ASCII-art, lifted straight out of my code:</p>
<pre>/* Precis of the IEEE standard 754 format
Layout of fields:
Single-length  31-31: sign  30-23: exponent  22-0: significand
Double-length  63-63: sign  62-52: exponent  51-0: significand
Least significant bit: 0
Most significant bit: 31 (single-length), 63 (double-length)

Single-length                Double-length               Value
------------------------   ---------------------------
 exponent    significand   exponent        significand
U     U-Bias              U      U-Bias
=====================================================================
0     -127        0       0      -1023         0         0
0     -127      nonzero   0      -1023        nonzero   see Denormalized
1-254 -126-+127 anything  1-2046 -1022-+1023  anything  see Normal
255   +128       0        2047   +1024         0        infinity
255   +128      nonzero   2047   +1024        nonzero   NaN

Sign bit:  1 ~ negative    0 ~ positive
Significand: the bits after the binary point
Bias:    single-length, 127       double-length, 1023
Emin:    single-length, -126      double-length, -1022
Columns U: value of exponent field as unsigned integer
Columns U-Bias: as above, with Bias subtracted
Normal: (1+0.significand)*2^(exponent-bias)
Denormalized: (0.significand)*2^Emin
*/</pre>
<p>The single-length format occupies 32 bits, numbered from 31 for the most significant to 0 for the least.  Double length has 64 bits, numbered similarly.  Emin is the least value of the exponent.  Bias is the number to be subtracted from the exponent field (interpreted as unsigned number) to get the exponent.</p>
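<p>The five rows of the table translate almost literally into code.  The sketch below assumes that float is the 32-bit single-length format; the type Kind and the functions from_bits and classify are names introduced here, not part of any library.</p>

```c
#include <stdint.h>
#include <string.h>

typedef enum {
  KIND_ZERO, KIND_DENORMALIZED, KIND_NORMAL, KIND_INFINITY, KIND_NAN
} Kind;

/* Reinterpret a 32-bit pattern as a single-length float. */
float from_bits(uint32_t bits) {
  float f;
  memcpy(&f, &bits, sizeof f);
  return f;
}

/* Decide which row of the table a single-length float belongs to. */
Kind classify(float f) {
  uint32_t bits;
  memcpy(&bits, &f, sizeof bits);
  uint32_t exponent = (bits >> 23) & 0xffu;  /* bits 30-23, column U */
  uint32_t significand = bits & 0x7fffffu;   /* bits 22-0 */
  if (exponent == 0)
    return significand == 0 ? KIND_ZERO : KIND_DENORMALIZED;
  if (exponent == 255)
    return significand == 0 ? KIND_INFINITY : KIND_NAN;
  return KIND_NORMAL;
}
```

<p>The sign bit plays no role in the classification, so both zeros are reported as KIND_ZERO and both infinities as KIND_INFINITY.</p>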
<p>The exciting thing about this little table is that it lets you have a peek under the hood of the computer on your desk.  Let&#8217;s have a look at what one tenth looks like in IEEE 754.  In the following C program one tenth in single-length floating-point is aliased to an unsigned integer and then printed out in hexadecimal.  This way we get to see the bits, which we then interpret with the table above.</p>
<pre>#include&lt;stdio.h&gt;

int main() {
union fltInt {float F; unsigned U;} x;
x.F = 0.1;
printf("%f %x\n", x.F, x.U);
}
/* Output:
0.100000 3dcccccd
*/</pre>
<p>Hm &#8230; can we make sense of 3dcccccd?  First lay it out in bits:</p>
<pre> 3    d    c    c    c    c    c    d
0011 1101 1100 1100 1100 1100 1100 1101</pre>
<p>Then re-arrange according to single-length floating-point format:</p>
<pre>sign  exponent    significand
0     01111011    10011001100110011001101</pre>
<p>To make sense of the significand (the fractional part after the binary point) we have to realize that what we see here is <em>not</em> the exact significand of one tenth, because one tenth cannot be represented as a finite binary numeral.  As one tenth is a rational number, it is an infinitely repeating binary numeral.  As ten only has small prime factors, there have to be several repetition cycles here.  Hence 3dcccccd has to be the rounded version of</p>
<pre>sign  exponent    significand
0     01111011    10011001100110011001100110011001...ad inf</pre>
<p>The sign bit is 0, which means positive (see table).  The exponent is 123 as unsigned integer; -4 after subtracting the bias of 127 (see table).  According to formula Normal in the table, the value is</p>
<pre>    + 2^{-4} * 1.10011001100110011001100110011001...ad inf
=   + 2^{-4} * (1 + 9*(1 + 1/16 + 1/(16^2) + ...)/16)
=   + 2^{-4} * (1 + 3/5)
=   + 1/10.</pre>
<p>The infinite binary expansion of 1/10 is between 3dcccccc and 3dcccccd.  It is closer to the latter.  The default rounding mode is &#8220;round to nearest&#8221;.  As the program did not change the rounding mode, we should find 3dcccccd, which we did.</p>
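<p>The two formulas Normal and Denormalized can also be applied mechanically.  Here is a minimal sketch that turns a single-length bit pattern into its value as a double; value_of is a name chosen for this illustration, and the infinities and NaNs (exponent field 255) are deliberately left out.</p>

```c
#include <assert.h>
#include <stdint.h>

/* Value of a single-length bit pattern, according to the formulas
   Normal and Denormalized of the table.  Every step is exact in
   double-length arithmetic. */
double value_of(uint32_t bits) {
  int negative = (int)(bits >> 31);
  int exponent = (int)((bits >> 23) & 0xffu);
  assert(exponent != 255);              /* infinity and NaN not handled */
  /* 0.significand: the 23 bits read as a fraction; 8388608 = 2^23 */
  double fraction = (double)(bits & 0x7fffffu) / 8388608.0;
  double magnitude;
  int power;
  if (exponent == 0) { magnitude = fraction;       power = -126; }
  else               { magnitude = 1.0 + fraction; power = exponent - 127; }
  for (; power > 0; power--) magnitude *= 2.0;
  for (; power < 0; power++) magnitude /= 2.0;
  return negative ? -magnitude : magnitude;
}
```

<p>Comparing value_of(0x3dcccccc) and value_of(0x3dcccccd) with the double-length 0.1 is one way to check that one tenth lies between the two and is closer to the upper one.</p>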
]]></content:encoded>
			<wfw:commentRss>http://vanemden.wordpress.com/2010/12/08/a-standard-of-excellence/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/e28602c14607fe4f92e85f6850e35a93?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">Maarten van Emden</media:title>
		</media:content>
	</item>
		<item>
		<title>From Formulas to Algorithms</title>
		<link>http://vanemden.wordpress.com/2010/11/29/from-formulas-to-algorithms/</link>
		<comments>http://vanemden.wordpress.com/2010/11/29/from-formulas-to-algorithms/#comments</comments>
		<pubDate>Tue, 30 Nov 2010 05:18:00 +0000</pubDate>
		<dc:creator>Maarten van Emden</dc:creator>
				<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://vanemden.wordpress.com/?p=295</guid>
		<description><![CDATA[Before computers, computation went like this: a scientist invented a formula; the user plugged in suitable values and evaluated the resulting variable-free expression. The user typically did not have the expertise to judge the correctness of the formula. Though the compilers of formula books tried to cover all important cases, it was common that a [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=vanemden.wordpress.com&amp;blog=3462521&amp;post=295&amp;subd=vanemden&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>Before computers, computation went like this: a scientist invented a formula; the user plugged in suitable values and evaluated the resulting variable-free expression.  The user typically did not have the expertise to judge the correctness of the formula.  Though the compilers of formula books tried to cover all important cases, it was common that a user failed to find one for the situation at hand.  In such compilations one can find formulas for Annuity Whose Present Value Is One, Return on Investment, Moment of Inertia (for circular sheet, hollow circular cylinder, and dozens of other shapes), Curved Surface of Cone, Net Present Value, and many others (but maybe not the one you need right now).</p>
<p>Initially, computers reflected the age of the formula: FORTRAN comes from FORmula TRANslator. But FORTRAN was found more useful for writing algorithms in which formulas played a minor role.   Used in the right way, an algorithm can be self-explanatory in a way that the old-style formulas were not.  The formulaic paradigm was authoritarian in the sense that the user did not get an explanation of the formula found in the book.  Thus the algorithmic paradigm is less authoritarian: access to the source code implies access to an explanation, not of a formula, but of the number you are getting.  Yet relics from the age of formulas linger.  In this article I discuss two of these and use them to illustrate the power of the algorithmic approach.</p>
<p><span id="more-295"></span></p>
<p>Calendrical calculations are, more than those in other areas, encrusted in ancient lore and antiquated methods.  Such anachronisms were the motivation for Dershowitz and Reingold to write <a href="#DandR"><em>Calendrical Calculations</em></a>.  They set out to make the subject computational and use an algorithmic language.  Yet they remain to some extent under the thrall of the formulaic paradigm.  For example, for the standard (&#8220;Gregorian&#8221;) calendar, they derive from more complex formulas, after considerable algebraic manipulation, that the number of days in months 1,&#8230;,m-1 is given by the floor of (367m-362)/12. To prevent all this algebra becoming more complicated, the assumption was made that February always has 30 days, so this still needs to be corrected later.</p>
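<p>For readers who want to see the formula at work, here is a minimal sketch in C.  The correction for February (subtract two days after February, or one in a leap year) is my reading of what &#8220;corrected later&#8221; has to amount to, given that February really has 28 or 29 days; the function names are invented for this illustration.</p>

```c
#include <assert.h>

/* Days in months 1..m-1 by the closed formula;
   is_leap is nonzero in a leap year. */
int days_before_month_formula(int m, int is_leap) {
  assert(1 <= m && m <= 12);
  int days = (367 * m - 362) / 12;      /* as if February had 30 days */
  if (m > 2) days -= is_leap ? 1 : 2;   /* take back the surplus days */
  return days;
}

/* The same quantity obtained by adding up the month lengths. */
int days_before_month_counted(int m, int is_leap) {
  static const int len[12] =
    {31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};
  assert(1 <= m && m <= 12);
  int days = 0;
  for (int i = 1; i < m; i++)
    days += len[i - 1] + ((i == 2 && is_leap) ? 1 : 0);
  return days;
}
```

<p>The two functions agree for all twelve months in leap and normal years alike, which is reassuring, but it does not make the formula any easier to explain.</p>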
<p>Let us compare this approach with the algorithmic alternative.  To start with, the formula for the number of days in the completed months before a given date was needed as stepping stone to the serial number of the day in the year, starting with January 1 as day 1. In the algorithmic approach we might as well tackle that directly, for example with:</p>
<pre>int dayNum(int d, int m, int y) {
  while (m &gt; 1) { m--; d += monthLen(m, y); }
  return d;
}</pre>
<p>My claim is that this is easier to explain than the formula.  By &#8220;explain&#8221; I don&#8217;t mean a mathematical correctness proof.  Many pages of formalism have been expended on justifying such code mathematically, with formally defined Variants and Invariants.  The formalism tends to be more forbidding than the algebra of <em>Calendrical Calculations</em>.  No, the power of the algorithmic approach is that the thinking behind the algorithm does not need to be formalized. For a programmer the justification does not even become conscious as the code is written.  If the reader needs help, it can be given in the form of prose in the vein of:</p>
<blockquote><p>The sum of d and the number of days in the months 1,&#8230;,m-1 is constant.  Before entry into the loop this constant is the number that we want the function to compute.  The loop maintains the value of the constant while driving m down.  As the loop exits we have m equal to one, so that d is equal to the constant.</p></blockquote>
<p>Of course the algorithm needs a function for the lengths of the months.  Our grandparents (as did <em>their</em> grandparents before them) remembered these by:</p>
<blockquote><p>Thirty days hath September,<br />
April, June, and November;<br />
All the rest have thirty-and-one,<br />
Save February, with twenty-eight days clear,<br />
And twenty-nine in each leap year,</p></blockquote>
<p>which is almost C:</p>
<pre>    int monthLen(int m, int y) {
      switch (m) {
        case 4: case 6: case 9: case 11: return 30;
        case 2: return leap(y) ? 29 : 28;
        default: return 31;
      }
    }</pre>
<p>Of course we still need a function for telling leap years.  Even after deciding to use C rather than algebra, we have a rearguard action to fight.  In textbooks one often finds something like</p>
<pre>int leap(int y) {
  return (y%400 == 0) || (y%4 == 0 &amp;&amp; y%100 != 0);
}</pre>
<p>Whether it is helpful to express it this way depends on how one remembers the rule.  A common way to remember it is as a rule with exceptions, which in turn have exceptions.  To be specific: a year is normally not a leap year, except when its number is divisible by 4, but it&#8217;s normal again when divisible by 100, except when divisible by 400.  The following is a direct translation of this way of remembering the Gregorian calendar:</p>
<pre>   int leap(int year) {
     if (year%400 == 0) return 1;
     if (year%100 == 0) return 0;
     if (year%4 == 0) return 1;
     return 0;
   }</pre>
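<p>The pieces so far fit together as follows.  This is a self-contained sketch that repeats the functions above under the names leap_textbook and leap_by_exceptions (the renaming is mine, so that the two versions can coexist) and adds a hypothetical helper, disagreements, to compare them.</p>

```c
/* The one-line rule found in textbooks. */
int leap_textbook(int y) {
  return (y % 400 == 0) || (y % 4 == 0 && y % 100 != 0);
}

/* The rule remembered as exceptions to exceptions. */
int leap_by_exceptions(int year) {
  if (year % 400 == 0) return 1;
  if (year % 100 == 0) return 0;
  if (year % 4 == 0) return 1;
  return 0;
}

/* Number of years in first..last on which the two versions differ. */
int disagreements(int first, int last) {
  int n = 0;
  for (int y = first; y <= last; y++)
    if (leap_textbook(y) != leap_by_exceptions(y)) n++;
  return n;
}

int monthLen(int m, int y) {
  switch (m) {
    case 4: case 6: case 9: case 11: return 30;
    case 2: return leap_by_exceptions(y) ? 29 : 28;
    default: return 31;
  }
}

/* Serial number of day d of month m in year y; January 1 is day 1. */
int dayNum(int d, int m, int y) {
  while (m > 1) { m--; d += monthLen(m, y); }
  return d;
}
```

<p>For example, dayNum(1, 3, 1900) is 60 and dayNum(1, 3, 2000) is 61, as 1900 was not a leap year and 2000 was.</p>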
<p>This completes what I want to say about an algorithmic alternative to that particular formula of <em>Calendrical Calculations</em>.  That formula at least has an easily accessible justification.  In my next example this is not the case, though the temptation to use a formula is more understandable.  Googling &#8220;Zeller&#8217;s Congruence&#8221; gives evidence of the use in programs of a formula as a quick and easy way to determine the day of the week for a given date in &lt;day,month,year&gt; format.</p>
<p>In the 19th century Christian Zeller [<a href="#G">G</a>] invented such a formula to compute the day of the week with y the year, m the month, and d the day of the month:</p>
<pre>    ((d+(((m+1)*26)/10)+k+(k/4)+(j/4))-(2*j))%7.</pre>
<p>Here k = y%100 and j = y/100; all divisions are integer divisions.  The value of the formula represents Saturday as 0, and the other days of the week accordingly.  (Actually, the month and year may need to be tweaked a bit; see the code further on.) As the formula maps an infinity of dates to a finite domain of seven values, it can be thought of as establishing an <em>equivalence</em> relation among dates.  It&#8217;s not clear where &#8220;congruence&#8221; comes from, but that is what it seems to be called.</p>
<p>Zeller&#8217;s formula represents the formulaic paradigm: it solves the problem with a minimum of computation, it is not easy to prove correct, and, I guess, it required considerable ingenuity to invent. In the formulaic paradigm this effort is locked up in a black box to be preserved as intellectual capital for future use. Its use requires mindless and obedient substitution of values followed by the prescribed arithmetical operations.</p>
<p>I propose to use algorithms in two ways, minor and major.  The minor one is to prove the correctness of Zeller&#8217;s formula; the major one is to do away with the formula and substitute an algorithm for it.  This substitution has the advantage that the algorithm is more intelligible than the formula.  It will turn out to have the disadvantage that more computation is needed to obtain the answer.  But the difference in computational requirement is not noticeable when a computer is used to execute the algorithm.</p>
<p>How can I hope to prove the correctness of Zeller&#8217;s formula when I despair of even understanding it? It turns out to be easy to show that only a finite number of dates need to be considered and that it is, for contemporary computers, a negligible computational task to evaluate the formula for each of these dates.</p>
<p>To see that the checking of the formula can be reduced to a finite number of cases, it is useful to view the rule for leap years in the Gregorian calendar as a system of nested cycles rather than nested exceptions:</p>
<ul>
<li> The innermost cycle consists of three normal years followed by a leap year; let&#8217;s call this a <em>quad</em>.  The length of the quad is 4*365+1, which is 1461 days.</li>
<li> Wrapped around the quads is the next cycle: the century.  It consists of 24 normal quads followed by one that is exceptional in being one day shorter. Let&#8217;s call this cycle a <em>cent</em>.  The length of the cent is 25*|quad|-1, which is 36524 days.</li>
<li> The outermost cycle consists of three normal cents followed by an exceptional one, which is one day longer. I call this cycle a <em>clav</em>, in honour of Clavius [<a href="#G">G</a>], the astronomer of Pope Gregory XIII.  Clavius was, I suspect, the brains behind the Gregorian calendar.  The length of the clav is 4*|cent|+1, which is 146097 days.</li>
</ul>
<p>Let me note, if I may indulge in a short digression, that this gives 146097/400 = 365.2425 days as the length of a year averaged over a clav, and, by the cyclical nature of the calendar, this is also the average over all eternity.  As the average length of the year is actually 365.2422 days, we see that Clavius, with his simple rule, managed to create a calendar that is, on average, only off by 0.0003 days per year: it takes around three thousand years to be off by a single day.</p>
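<p>The arithmetic of the nested cycles is easily left to the machine.  A minimal sketch, in which the enumeration constants and the two functions are names of my own choosing and 365.2422 is the figure quoted above for the actual length of the year:</p>

```c
enum {
  YEAR_LEN = 365,                 /* normal year */
  QUAD_LEN = 4 * YEAR_LEN + 1,    /* three normal years and a leap year */
  CENT_LEN = 25 * QUAD_LEN - 1,   /* 24 normal quads and a short one */
  CLAV_LEN = 4 * CENT_LEN + 1     /* three normal cents and a long one */
};

/* Length in days of the calendar year, averaged over one clav. */
double average_year(void) { return CLAV_LEN / 400.0; }

/* Years it takes the calendar to drift one day away from
   a year that is actually true_len days long. */
double years_per_day_of_drift(double true_len) {
  return 1.0 / (average_year() - true_len);
}
```

<p>Called with 365.2422, years_per_day_of_drift returns a little over 3300.</p>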
<p>After this digression I am ready to state</p>
<p><strong>Theorem 1</strong>: The formula of Zeller gives the day of week for every day of the Gregorian calendar.</p>
<p><strong>Proof:</strong><br />
We first reduce the theorem, which makes an assertion for an infinite number of dates, to a finite number of special cases.  This can be done by observing that the length of a clav is divisible by 7. Hence two dates fall on the same day of the week if they differ only in the year and if this is in such a way that the difference in year numbers is divisible by 400 (you can call this Van Emden&#8217;s Congruence, if you must).  Thus the correctness of Zeller&#8217;s Congruence only needs to be checked for the 146097 days of a single clav.</p>
<p>In mathematics it is a well-known stratagem to make a proof easier by splitting into special cases, like n&lt;0, n=0, and n&gt;0.  But mathematicians take a dim view of, say, a <em>dozen</em> special cases.  When Appel and Haken [<a href="#G">G</a>] split the Four-Color Theorem into 1936 special cases (each to be checked with a computer program), there were lots of people who objected. You can imagine what these people will think of puny Theorem 1 and the need for <em>146097 cases</em> to be checked by a computer.</p>
<p>After noting this objection, the proof: as the program below terminates normally with a single line of output containing &#8220;146097&#8221;, the Theorem is proved.</p>
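<pre>#include&lt;cassert&gt;
#include&lt;cstdio&gt;
// Uses monthLen() and leap() as defined above.

int Zeller(int day, int month, int year) {
  int d = day, m = month, y = year;
  assert(!(m&lt;3 &amp;&amp; year == 0));
  // January and February count as months 13 and 14 of the previous year
  if (m &lt; 3) { m += 12; y -= 1; }
  int k = y%100, j = y/100;
  // h: 0 ~ Saturday, 1 ~ Sunday, etc.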
  int h =
    ((d+(((m+1)*26)/10)+k+(k/4)+(j/4))-(2*j))%7;
/* From Wikipedia article "Zeller's Congruence"
   observed September 15, 2010.
   Convert from h to 1 ~ Monday, ..., 7 ~ Sunday
*/
  return (h+5)%7 + 1;
}
int proof() {
/* Returns the number of days for which the Zeller formula
   gives the true value as obtained by counting backwards
   from a known day. Fails at first case, if any, in which
   the formula does not give the true value.
*/
  int year0 = 1, year1 = 400;
  int dayOfWk = 7; // Dec 31, 400 was a Sunday
  int count = 0;
  for(int year=year1; year&gt;=year0; year--)
    for(int month=12; month&gt;=1; month--)
      for(int day = monthLen(month, year);
          day &gt;= 1; day--
         ) { assert(Zeller(day,month,year) == dayOfWk);
             dayOfWk--; if (dayOfWk &lt; 1) dayOfWk += 7;
             count++;
           }
  return count;
}
int main() {
  printf("%d\n", proof());
  return 0;
}</pre>
<p>My reason for preferring the algorithmic to the formulaic paradigm is that I find writing the proof a straightforward exercise, whereas understanding Zeller&#8217;s formula seems a daunting task to me. I trust that my readers will not find my code unduly difficult to understand and will find it less mysterious than Zeller&#8217;s formula. Alan Robinson has called for &#8220;intelligibility engineering&#8221;; I hope he accepts this as a contribution.</p>
<p>So far the <em>minor</em> application of the algorithmic paradigm in matters calendrical.  Minor, because the day of the week for distant dates belongs in <a href="#Smith">trivia-land</a>, like the storming of the Bastille happening on a Tuesday.  Perhaps a bit more significant is that Zeller&#8217;s formula gives Sunday for December 7, 1941.  The formula may also be used to assure someone born on December 13, 1973 that this day was not a Friday, so that subsequent mishaps have to be attributed to other causes.</p>
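<p>These three dates can be checked with a stand-alone version of the formula.  The sketch below is not the function used in the proof: it replaces the term -(2*j) by +(5*j), which has the same remainder on division by 7 but keeps the sum non-negative for later centuries.  The name zeller_weekday is invented here.</p>

```c
#include <assert.h>

/* Day of the week by Zeller's formula: 1 ~ Monday, ..., 7 ~ Sunday.
   Intended for Gregorian dates with year >= 1. */
int zeller_weekday(int day, int month, int year) {
  int d = day, m = month, y = year;
  assert(1 <= m && m <= 12 && y >= 1);
  /* January and February count as months 13 and 14 of the previous year */
  if (m < 3) { m += 12; y -= 1; }
  int k = y % 100, j = y / 100;
  /* h: 0 ~ Saturday, 1 ~ Sunday, ...; 5*j stands in for -2*j (mod 7) */
  int h = (d + ((m + 1) * 26) / 10 + k + k / 4 + j / 4 + 5 * j) % 7;
  return (h + 5) % 7 + 1;
}
```

<p>With this numbering zeller_weekday(14, 7, 1789) is 2 (Tuesday), zeller_weekday(7, 12, 1941) is 7 (Sunday), and zeller_weekday(13, 12, 1973) is 4 (Thursday).</p>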
<p>My major example of demonstrating the algorithmic alternative to formulas is to <em>replace</em> Zeller&#8217;s formula by an algorithm.  The reason for doing so is that the algorithm, although occupying many times as many lines as the formula, is <em>intelligible</em>: it can be read straight through, whereas the single line of the formula has to be puzzled over.</p>
<p>What is an algorithmic alternative to a single-line formula for determining the day of the week?  A promising starting point is the essence of a calendar: a <em> system to name the days</em>.  In an algorithmic approach, a calendar uses the fact that the days form a sequence. This suggests using integers for the names.  In this way the calendar not only <em>names</em> the days, but also <em>counts</em> them.  And it counts not only the days themselves, but also the number of days between any two dates.</p>
<p>I follow this approach. The result is a building block that can be used for many calendrical computations.  The algorithm for numbering the days will be more intelligible than Zeller&#8217;s formula.  The algorithm will also be more widely applicable, while the day of week is just one of many uses of converting days to integers.  Thus the algorithmic alternative to Zeller&#8217;s formula will take the form of a function that converts a date in &lt;day,month,year&gt; format to an integer.  As a side effect, knowing the day of the week of just one date (like, say, today) implies the equivalent of Zeller&#8217;s formula via the remainder on division by seven.</p>
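<p>The side effect mentioned last can be sketched independently of how the days are numbered.  The function below is a hypothetical helper, not part of the program developed in this article: it assumes only that consecutive days have consecutive integer names and that weekdays are numbered 1 for Monday up to 7 for Sunday.</p>

```c
#include <assert.h>

/* Weekday (1 ~ Monday, ..., 7 ~ Sunday) of the day named n, given that
   the day named known_n is known to fall on weekday known_weekday. */
int weekday_from_count(long n, long known_n, int known_weekday) {
  assert(1 <= known_weekday && known_weekday <= 7);
  /* days elapsed since the known day, reduced to 0..6
     even when n lies before known_n */
  long shift = ((n - known_n) % 7 + 7) % 7;
  return (int)((known_weekday - 1 + shift) % 7) + 1;
}
```

<p>If day 3 is a Monday, for instance, then weekday_from_count(10, 3, 1) is again 1 and weekday_from_count(2, 3, 1) is 7, the Sunday before.</p>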
<p>I follow Dershowitz and Reingold in calling the integer name of a day <em>rata diem</em> and I abbreviate it to rd.  These authors assign the rd of 1 to January 1 of year 1.  Perhaps the idea is that the symmetry that the integers have with respect to 0 can be used in the rd naming scheme.  This would be a promising approach, were it not for the fact that the Gregorian calendar, with its positive and negative years, is <em>not</em> symmetric: for positive years time moves away from the origin; for negative years it moves in the same absolute direction, which is <em>towards</em> the origin.  As computer integers have a limited range anyway, a computerized calendar has to have an earliest date as well as a latest date.  Accordingly, I use unsigned integers for the rata diem.  As 32 bits is a commonly available size of unsigned integers and gives a generous number of years into past and future, I choose this size.  As there is an even number of integers, there are two midpoints: 7fffffff and 80000000 in hexadecimal.  I choose the later one of these as the rd of January 1 of year 1.  I call this date the <em>epoch</em>, adopting another bit of calendrical terminology from [<a href="#DandR">DR97</a>].</p>
<p>Here is the first bit of the program:</p>
<pre>typedef unsigned uint;
const uint epoch = 0x80000000; // Rata Diem of January 1, 1 C.E.

const int yLen = 365;               // length of normal year
const int quadLen = 4*yLen + 1;     // length of normal quad
const int centLen = 25*quadLen - 1; // length of normal century
const int clavLen = 4*centLen + 1;  // length of clav</pre>
<p>Notice that I have switched to the C++ compiler.  Not because I want to do any object-oriented programming, but because of the minor civilities that this improved form of C affords.  Remember that I relied on my shaky pencil-on-back-of-the-envelope calculations to come to the conclusion that a clav has 146097 days? A man shouldn&#8217;t have to do that in this day &#8216;n age, but C forbids anything except a literal in the initialization of a constant.  (I didn&#8217;t actually mind using a pencil. The real reason is that I want to replace magic numbers by their provenance, as documentation).</p>
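<p>Once the constants are written as expressions, the compiler can also be asked to confirm the pencil arithmetic.  The following is only a sketch: it assumes a compiler that supports C++11, which nothing else in this article requires, and static_assert is the sole addition to the declarations above.</p>

```cpp
const int yLen = 365;               // length of normal year
const int quadLen = 4*yLen + 1;     // length of normal quad
const int centLen = 25*quadLen - 1; // length of normal century
const int clavLen = 4*centLen + 1;  // length of clav

// Compile-time checks (C++11): compilation fails if the arithmetic is off.
static_assert(quadLen == 1461, "a quad has 1461 days");
static_assert(centLen == 36524, "a normal century has 36524 days");
static_assert(clavLen == 146097, "a clav has 146097 days");
// A clav is a whole number of weeks, so weekdays repeat every 400 years.
static_assert(clavLen % 7 == 0, "a clav is a multiple of 7 days");
```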
<p>A promising strategy for converting &lt;day,month,year&gt; to Rata Diem is to split the rd into the days coming from the completed years (if any) in the past and the days coming from the current year.  The latter number is computed by our old friend:</p>
<pre>int dayNum(int d, int m, int y) {
  while (m &gt; 1) { m--; d += monthLen(m, y); }
  return d;
}</pre>
<p>Function dayNum() was written with positive years in mind.  For negative years the completed years are greater than the current year and the days in the current year are not counted forward from the beginning, but backwards from the end of the year.  Rather than tweak dayNum() to a form that works in negative years as well as positive years, I use the fact that the Gregorian calendar is the same for every clav.  So I shift every date, irrespective of whether the year is positive, negative, or in between, to the clav that begins on January 1, 1 C.E.  and ends on December 31, 400 C.E.  I implement this shift by:</p>
<pre>while (year &lt; 1) {
  year += 400; rd -= clavLen;
}
while (year &gt; 400) {
  year -= 400; rd += clavLen;
}</pre>
<p>This takes time linear in the number of clavs between the given year and the standard clav.  Yes, it could be done in constant time.  But the rule for modulo a negative number is less clear to me, so I prefer the slower alternative.  After all, the goal is intelligibility.</p>
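<p>For readers curious about the constant-time alternative, here is one possible sketch.  It sidesteps the question of what % does for negative operands by correcting a negative remainder by hand.  The struct Shift and the function normalizeYear are hypothetical names introduced for this illustration only; they do not occur in the Appendix.</p>

```cpp
// Hypothetical names; not part of the article's program.
struct Shift {
  int year;   // the year moved into the range 1..400
  int clavs;  // number of clavs between the standard clav and the
              // original year (negative for years before 1)
};

Shift normalizeYear(int year) {
  int y0 = year - 1;  // 0-based year, so that 1..400 maps to 0..399
  int q = y0 / 400;
  int r = y0 % 400;
  if (r < 0) {        // the remainder may be negative when y0 is negative
    r += 400;
    q -= 1;
  }
  Shift s;
  s.year = r + 1;
  s.clavs = q;
  return s;
}
```

<p>Under these assumptions the two loops would be replaced by one call, followed by adding clavs times clavLen to the rd.  Whether the result is as easy to read as the loops is a matter of taste.</p>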
<p>See the <a href="#appendix">Appendix</a> for a complete set of C++ functions.</p>
<p>Zeller&#8217;s formula, though unintelligible, has the advantage of being computationally efficient, more so than my algorithmic alternative.  In my experience the most extreme example of an unintelligible formula is Matijasevic&#8217;s polynomial [<a href="#G">G</a>] that is claimed to be a representation of the prime numbers.  Not only is it a mystery to all except the few initiated why the polynomial should do so, but it is also computationally inefficient in the extreme. The polynomial is of degree 25 and has 26 variables.  It is claimed to represent the prime numbers in the sense that, as the 26 variables range over the non-negative integers, all positive values of the polynomial are prime and vice versa.  Contrast this, for example, with Dijkstra&#8217;s prime-number generator [<a href="#DDH72">DDH72</a>], which is intelligible as well as reasonably efficient.</p>
<p>In this article I have concentrated on examples where algorithms are preferable to formulas.  I&#8217;m not forgetting that there is a whole world of programming that exists on the strength of the opposite: in functional programming one is forced to abstain from algorithms and to express everything in formulas.  Of course, the formulas allowed in functional programming form a less restrictive language than the one to which Zeller was limited.  A modern language of formulas is enriched with conditionals and with recursion.  These facilities introduce possibilities for explanation that were not open to Zeller.</p>
<h3>References</h3>
<p><a name="G">[1] Information on this was found with Google.</a></p>
<p><a name="DandR"> [2] <em>Calendrical Calculations</em> </a> by Nachum Dershowitz and Edward Reingold; Cambridge University Press 1997.</p>
<p><a name="DDH72"> [3] <em>Structured Programming</em></a> by O.-J. Dahl, E.W. Dijkstra, and C.A.R. Hoare; Academic Press 1972.</p>
<p><a name="Smith"> [4] </a> The ability to tell the day of week of distant dates occurs in several calculating prodigies; see <em>The Great Mental Calculators</em> by Steven Smith, Columbia University Press 1983.  Another study is an essay by Oliver Sacks (&#8220;The Twins&#8221;, in <em>The Man Who Mistook His Wife for a Hat</em>, Picador, 1986) describing twin brothers with this faculty.  Sacks reports (page 187) that the men had an IQ of sixty, could not do addition or subtraction with any accuracy, and could not comprehend the meaning of multiplication or division.  This rules out Zeller&#8217;s formula as their secret weapon.  In the Postscript on page 200 Sacks quotes one Israel Rosenfeld, apparently consulted for his supposedly superior knowledge of matters numerical:</p>
<blockquote><p>Their ability to determine the days of the week within an eighty-thousand-year period suggests a rather simple algorithm. One divides the total number of days between &#8216;now&#8217; and &#8216;then&#8217; by seven. &#8230;</p></blockquote>
<p>One wonders whether Mr Rosenfeld has tried out his &#8220;rather simple algorithm&#8221;.</p>
<h3><a name="appendix"> Appendix </a></h3>
<pre>#include &lt;cassert&gt; // for assert
#include &lt;cstdio&gt;  // for printf

typedef unsigned uint;
typedef struct{int day; int month; int year;} Date;
void printDate(Date d) {
  printf("%d, %d, %d\n", d.day, d.month, d.year);
}
Date mDate(int d, int m, int y) {
  Date date;
  date.day = d; date.month = m; date.year = y;
  return date;
}

const int yLen = 365;               // length of normal year
const int quadLen = 4*yLen + 1;     // length of normal quad
const int centLen = 25*quadLen - 1; // length of normal century
const int clavLen = 4*centLen + 1;  // length of clav
const uint epoch = 0x80000000; // Rata Diem of January 1, 1 C.E.

int leap(int year) {
  if (year%400 == 0) return 1;
  if (year%100 == 0) return 0;
  if (year%4 == 0) return 1;
  return 0;
}
int monthLen(int m, int y) {
  switch (m) {
    case 4: case 6: case 9: case 11: return 30;
    case 2: return leap(y) ? 29 : 28;
    default: return 31;
  }
}
int dayNum(int d, int m, int y) {
  while (m &gt; 1) { m--; d += monthLen(m, y); }
  return d;
}
uint Date2RD(Date argDate) {
  int day = argDate.day, month = argDate.month;
  int year = argDate.year;
// Converts date to Rata Diem.
  uint rd = epoch-1; // Rata Diem of December 31, 0; unsigned, as a signed int would overflow below
// Normalize date to one in standard clav.
  while (year &lt; 1) {
    year += 400; rd -= clavLen;
  }
  while (year &gt; 400) {
    year -= 400; rd += clavLen;
  }
  assert(1 &lt;= year &amp;&amp; year &lt;= 400);
  rd += dayNum(day, month, year);
  year --;
  // year contains number of whole years, if any
  if (year == 0) return rd;
  assert(0 &lt; year);
  rd += (year/100)*centLen; year %= 100;
  rd += (year/4)*quadLen;    year %= 4;
  rd += year*yLen;
  return rd;
}
int dayOfWeek(Date date) {
  uint y = 6+Date2RD(date)%7; // day 0 was a Saturday
  return y&gt;7 ? y-7 : y;
}
int main() {
  printf("%d\n", dayOfWeek(mDate(14,7,1789)));
  return 0;
}</pre>
]]></content:encoded>
			<wfw:commentRss>http://vanemden.wordpress.com/2010/11/29/from-formulas-to-algorithms/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/e28602c14607fe4f92e85f6850e35a93?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">Maarten van Emden</media:title>
		</media:content>
	</item>
		<item>
		<title>An Interview with Paul McJones</title>
		<link>http://vanemden.wordpress.com/2010/10/27/an-interview-with-paul-mcjones-2/</link>
		<comments>http://vanemden.wordpress.com/2010/10/27/an-interview-with-paul-mcjones-2/#comments</comments>
		<pubDate>Wed, 27 Oct 2010 22:05:49 +0000</pubDate>
		<dc:creator>Maarten van Emden</dc:creator>
				<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://vanemden.wordpress.com/?p=283</guid>
		<description><![CDATA[In my quest for pioneer programmers I was fortunate to get in touch with one who has a possibly unrivalled record of participation in projects that are important in the development of computing over the decades. I am speaking of Paul McJones, who was part of the team assembled by Butler Lampson to develop Cal [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=vanemden.wordpress.com&amp;blog=3462521&amp;post=283&amp;subd=vanemden&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>In my quest for pioneer programmers I was fortunate to get in touch with one who has a possibly unrivalled record of participation in projects that are important in the development of computing over the decades. I am speaking of Paul McJones, who was part of the team assembled by Butler Lampson to develop <a href="http://research.microsoft.com/en-us/um/people/blampson/15-ReflectionsOnOS/Abstract.html">Cal TSS</a>, one of the first operating system designs to tackle the problems of protected and even mutually-suspicious subsystems. At IBM San Jose Research he worked with <a href="http://www.mcjones.org/dustydecks/archives/2007/04/01/60/">John Backus</a> on his RED languages and with the <a href="http://www.mcjones.org/System_R/">System R</a> group on the first relational database system.  At Xerox  he participated in the development of the Star office automation system (based on PARC&#8217;s Alto personal computer).  Via Tandem Computer he went to do research at DEC SRC in Palo Alto.  After various Silicon Valley start-ups he went to work at Adobe. This included a collaboration with Alexander Stepanov. A spin-off of this research is <em><a href="http://www.elementsofprogramming.com/">Elements of Programming</a></em> (by A. Stepanov and P. McJones, Addison-Wesley, 2009), which presents a novel methodology that spans the gap between abstract mathematics and efficient algorithms.</p>
<p>In the following interview we try to cover at least a little bit of this wide and varied terrain.</p>
<p><span id="more-283"></span></p>
<p><em>APP</em><br />
I first heard of you as part of the team of the CAL time-sharing system in Berkeley.  That must have been around 1969. Is that right?  Were you still a student at that time?</p>
<p><em>McJones</em><br />
Yes, I came to Berkeley in the fall of 1967 as a freshman.</p>
<p><em>APP</em><br />
Did you get started on programming in connection with a course?  Which?</p>
<p><em>McJones</em><br />
No, it wasn&#8217;t in a university course.  At high school a dedicated physics teacher taught me programming.</p>
<p><em>APP</em><br />
Ah! I&#8217;m sure that&#8217;s a great way to get started. Tell me more.</p>
<p><em>McJones</em><br />
Actually, my first taste of programming was even earlier, again because of a dedicated teacher. In the seventh grade, a teacher (now Dr. Marilyn Fendrick; then Mrs. Benefield) encouraged me to pursue my curiosity about digital logic and computers. I read a book called <em>Basic Computer Programming</em> by Theodore G. Scott. This exposed me to the basic ideas of computers, but without a real machine to program, I explored analog and digital electronics for the next few years.</p>
<p>In high school, I took a course called Analytical Techniques in the Physical Sciences,  designed and taught by physics teacher Carl Duzen. It included some linear algebra, solving differential equations on an analog computer, and programming a digital computer in FORTRAN IV and assembly language. We used keypunches at a company nearby and an IBM 7094 at an aerospace company in Los Angeles. Duzen only had time to drive to the aerospace company about once a week, so the turn-around time was very long!</p>
<p>Later that year,  Jack Perrine, an independent software consultant, visited Duzen&#8217;s class and hired me and another student as after-school interns.  Perrine had been a member of the team at Computer Sciences Corporation that wrote the original UNIVAC 1107 FORTRAN compiler. Eventually Perrine tired of  corporate life, and founded ATHENA Programming to support the compiler. My assignments were to fix bugs in the runtime and to write utility routines such as snapshot dumps &#8212; this was my real introduction to assembly language programming. So by the time I entered Berkeley that fall, my experience at ATHENA helped me get an entry-level programming job at the campus Computer Center.</p>
<p><em>APP</em><br />
What kind of work did you do there?</p>
<p><em>McJones</em><br />
At first it was similar to what I&#8217;d done at ATHENA: writing a library subroutine to read very long magnetic tape records, and a utility program to catalog the contents of a tape. In the spring of my freshman year, I got interested in Snobol3. It so happened that Charles Simonyi had entered Berkeley at the same time as I did. He already had a great deal of <a href="http://programmersatwork.wordpress.com/programmers-at-work-charles-simonyi/">experience</a> writing compilers in Hungary and at <a href="http://www.datamuseum.dk/site_dk/20040213/cs/">Regnecentralen</a> in Denmark.  At Berkeley he proposed to the Computer Center that he write a from-scratch implementation of Bell Labs&#8217; brand-new Snobol4.  The Bell implementation used a virtual machine implemented in macros that was very portable and elegant, but too inefficient in memory and cycles for running student jobs.  Charles&#8217;s plan was to write in assembly language (for the CDC 6400), base the compiler on the GIER design, and keep the entire system &#8220;lean and mean&#8221;.  Someone suggested I work with Charles; he agreed, and I jumped at the opportunity.</p>
<p><em>APP</em><br />
So you were off and running, working with one of the greats in programming, and that in the first year of University!  Did you have any time to take courses?</p>
<p><em>McJones</em><br />
I was enrolled full-time in the electrical engineering program, enjoying the social life in a dormitory with new friends with diverse backgrounds, riding my motorcycle around the Bay Area, and spending lots of time programming. To be honest, my grades suffered the first year. Late in the year, I met my future wife (at the Xerox machine, while copying the Snobol3 manual), and her good influence helped my grades recover.</p>
<p><em>APP</em><br />
What was the fate of the Snobol system?  Such projects often get lost in management muddles.</p>
<p><em>McJones</em><br />
We built a successful system &#8212; if I remember correctly, it was limping along that fall &#8212; that was used at Berkeley and other CDC installations well into the 1970s because of its efficiency and despite its lack of full compatibility. Charles created a sound architecture and wrote a prodigious amount of code. (It just occurred to me that he wrote a prototype of the pattern matcher in ALGOL 60). I wrote I/O routines that took advantage of the asynchronous I/O capability of the SCOPE operating system APIs, so a one-line  copy program written in our Snobol would keep tapes running at full speed, unlike some of the standard utilities. While our implementation omitted a number of standard Snobol4 features (e.g., the TABLE data type), it also added some new ideas. For example, we wanted to be able to write a symbolic debugger in Snobol4, as one could do in LISP. This required being able to decompose every compound data value, such as the elaborate Snobol4 patterns. So we added appropriate built-in predicates and selectors.</p>
<p>Snobol4 was used in computer science courses since it allowed creating data structures &#8212; a role  later subsumed by Pascal. It was also used in a course for humanities students since it had good tools for processing text; the <a href="http://www.robertgaskins.com/files/gaskins-gould-cal-snobol4-1972-IMAGE.pdf">lecture notes</a><em> </em>included an appendix serving as the language reference manual.</p>
<p>A couple of years ago, through my interest in the history of software, I was able to look through the electronic archives of the late Ralph Griswold, one of the Snobol designers. I was amused and pleased to find a copy of the source code to our Snobol system, and even more amused when I was able to assemble and run it on an emulator for the CDC 6600.</p>
<p><em>APP</em><br />
When you tell of keypunch days with long turnarounds, I wondered what else you did in addition to working over listings.  Text with paper and pencil, of course. Did you use diagrams? Blackboards?</p>
<p><em>McJones</em><br />
Schooled as I was in Fortran and assembly code, Algol control structures were a revelation. I wrote Algol-like pseudo code for my initial tasks at the Berkeley Computer Center and inserted it as comments interspersed with the actual assembly language.  We drew lots of data structure diagrams with boxes and arrows. Somewhere I still have a photocopy of the ones Charles did for our Snobol&#8217;s runtime system: strings, numbers, patterns, and so on.  We probably used blackboards some, but I think we often just wrote on the back of an old listing. <a href="http://www.mcjones.org/paul/CRMS/CRMS_APL-Reference.pdf">Here</a>&#8217;s an example of a finished design from those days, complete with lots of diagrams.</p>
<p><em>APP</em><br />
After interactive, paper-based teletypes and such the video terminal was an important transition for coders.  Was there composition on the keyboard right away?  Or was it still via pencil and paper first?</p>
<p><em>McJones</em><br />
I first used video terminals at IBM, around 1975. They were in a common terminal room, so I&#8217;m pretty sure I often composed ahead of time via pencil and paper. I did have an IBM 2741 Selectric-style terminal in my office, but I think I tended to walk down to the terminal room most of the time &#8212; it was a useful way to &#8220;share state&#8221; with other team members.  In late 1976 I went to Xerox. We used Altos for program development from the beginning. There was a brief attempt to have us share them, but we quickly achieved one Alto per person.  I still wrote out my programs on paper; among other things, that meant I could work on them in the evening, many miles away from Xerox.</p>
<p><em>APP</em><br />
Was the transition to IDEs significant?</p>
<p><em>McJones</em><br />
Most contemporary programmers love them, but I think IDEs always have a struggle dealing with the size and complexity of the largest systems. For example, it may still take 30 minutes to rebuild a complex operating system, even after making a small change.</p>
<p><em>APP</em><br />
Because the tools of work have changed so much, one wonders how that influenced productivity, quality, or even thought itself?</p>
<p><em>McJones</em><br />
In many programming shops today there is such a focus on speed: very frequent releases; programmers are measured by how much code they write or checkins they perform &#8212; do people have time to think?  Software developers have become very good at testing, but notions of specification and correctness at design time often seem to be ignored.</p>
<p><em>APP</em><br />
Speaking of tools, how about languages? In the first decades, there seem to have been two great transitions: in numerical work from assembler to FORTRAN, and in systems programming from assembler to C. How do these transitions compare?</p>
<p><em>McJones</em><br />
They are hardly comparable.  The transition to FORTRAN for numerics was extremely fast. <a href="http://archive.computerhistory.org/resources/text/Fortran/102653977.05.01.acc.pdf">Here</a>&#8217;s a September 1957 memo from Backus to his management with statistics showing a very impressive adoption rate only about six months after FORTRAN I shipped.  FORTRAN allowed scientists and engineers to work at a fairly natural level of abstraction (numbers and arrays) without paying a performance penalty because of excellent code generation. FORTRAN quickly spread to most computers (rarely with the same quality of code generation), giving rise to the notion of portability. The first edition of McCracken&#8217;s book on FORTRAN programming came out in 1961; college courses taught the language to engineers and scientists. All this created a pretty strong positive-feedback loop.</p>
<p><em>APP</em><br />
And why is the transition from assembler to C in systems programming hardly comparable?</p>
<p><em>McJones</em><br />
It took much longer &#8212; roughly two decades &#8212; and involved many intermediate languages. I&#8217;m not sure quite why, but I&#8217;ll speculate that some of the sources of the delay were:</p>
<ol>
<li>Evolution of machine architectures  toward the eventual standard byte-addressed memory, uniform integer and floating data formats, and banks of general-purpose registers (the C machine!).</li>
<li>Evolution of language features allowing expressive, efficient programming of algorithms (e.g., iteration and selection statements), data structures (record/structure definitions; pointer types and pointer arithmetic, etc.), and management of complexity (preprocessors, modules, abstract data types, polymorphism of various kinds, etc.).</li>
<li>Realization by professional programmers (compared with scientists and engineers who program) of the advantages of a standard, portable programming language.</li>
</ol>
<p>Along the way, people tried out <a href="http://www.softwarepreservation.org/projects/ALGOL">ALGOL</a> 58 dialects (NELIAC, <a href="http://www.multicians.org/thvv/7094.html">MAD</a>, and JOVIAL), ALGOL 60 extensions (Burroughs ESPOL,  SAIL, etc.), ALGOL 60 follow-ons (ALGOL W, Pascal, Mesa, Modula), <a href="http://www.multicians.org/pl1.html">PL/I</a>, Bliss, BCPL, and even FORTRAN. By the early 1980s, C was emerging as the standard: it was a good match for the standard machine architecture, and expressive enough for disciplined programmers to use  for a wide variety of applications and systems. It spread along with Unix, and was relatively simple to port or reimplement on  non-Unix operating systems including MS-DOS and Macintosh OS. Since then, C++ has joined C both by being a &#8220;better C&#8221; and by providing powerful tools for user-defined types.</p>
<p><em>APP</em><br />
How about the transition to Object-Oriented Programming?</p>
<p><em>McJones</em><br />
While Simula-67 introduced classes, inheritance and virtual procedures, Smalltalk popularized them. In the 1980s, many programmers wanted to try object-oriented programming and many language designers obliged them. Pundits convinced management this would solve all their problems. Java and object-oriented scripting languages came along in the 1990s with garbage collection and huge libraries: for a time, programming appeared to be a simple matter of copy-and-paste. But I think the pendulum is swinging the other way: people once again realize they need to focus on algorithms and data structures as well as &#8220;glue&#8221;.  Inheritance is  a programming technique, good for some things but not others.</p>
]]></content:encoded>
			<wfw:commentRss>http://vanemden.wordpress.com/2010/10/27/an-interview-with-paul-mcjones-2/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/e28602c14607fe4f92e85f6850e35a93?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">Maarten van Emden</media:title>
		</media:content>
	</item>
		<item>
		<title>In Defense of &#8220;Beautiful Code&#8221;</title>
		<link>http://vanemden.wordpress.com/2010/10/05/in-defense-of-beautiful-code-2/</link>
		<comments>http://vanemden.wordpress.com/2010/10/05/in-defense-of-beautiful-code-2/#comments</comments>
		<pubDate>Tue, 05 Oct 2010 09:57:48 +0000</pubDate>
		<dc:creator>Maarten van Emden</dc:creator>
				<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://vanemden.wordpress.com/?p=262</guid>
		<description><![CDATA[In my May 2008 APP article &#8220;Beauty Is Our Business?&#8221; I argued that neither in mathematics nor in programming the concept of Beauty is a useful one. But the B-word is hard to suppress. A kind reader of APP directed my attention to &#8220;Beautiful Code: Leading Programmers Explain How They Think&#8221;, a collection edited by [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=vanemden.wordpress.com&amp;blog=3462521&amp;post=262&amp;subd=vanemden&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>In my May 2008 <em>APP</em> article &#8220;Beauty Is Our Business?&#8221; I argued that neither in mathematics nor in programming is the concept of Beauty a useful one.  But the B-word is hard to suppress.  A kind reader of <em>APP</em> directed my attention to &#8220;Beautiful Code: Leading Programmers Explain How They Think&#8221;, a collection edited by Andy Oram and Greg Wilson.  This book already existed when I wrote my article, so I could have known about it.  It&#8217;s good that I didn&#8217;t, because we now have the benefit of <a href="http://tinyurl.com/2vu9mr8">Jeff Atwood&#8217;s comments</a> [seen Sept. 30, 2010].  His critique is that an <em>idea</em> can be beautiful, or an <em>algorithm</em>.  But the code embodying them is not beautiful, Atwood maintains, and code stands in the way of the appreciation of the beauty of the underlying idea or algorithm.  In this article I argue that handwaving is not enough, that such an abstract thing as an idea or an algorithm needs <em>embodiment</em> to be appreciated, and that for an algorithm the natural embodiment is in the form of code.</p>
<p><span id="more-262"></span></p>
<p>For the gist of Atwood&#8217;s criticism I quote:</p>
<blockquote><p>Instead, many chapters just reprint a few pages of code and conclude &#8212; see, it is beautiful!  Many times I was unable to grasp the problem &#8212; what was it that required that so-called beauty to emerge?  I couldn&#8217;t see the whole picture, but the authors presume I do.  Any possible appreciation of beauty requires deep understanding.</p></blockquote>
<p>I can believe that, about the deep understanding!  So here we have a software guru, like Brian Kernighan, or Jon Bentley, pointing at a piece of gobbledygook, and saying: &#8220;This is beautiful!&#8221; They are like many other gurus, through the ages, pointing at this:</p>
<blockquote><p>μῆνιν ἄειδε θεὰ Πηληϊάδεω Ἀχιλῆος<br />
οὐλομένην, ἣ μυρί&#8217; Ἀχαιοῖς ἄλγε&#8217; ἔθηκε,<br />
πολλὰς δ&#8217; ἰφθίμους ψυχὰς Ἄϊδι προί̈αψεν<br />
ἡρώων, αὐτοὺς δὲ ἑλώρια τεῦχε κύνεσσιν<br />
οἰωνοῖσί τε πᾶσι, Διὸς δ&#8217; ἐτελείετο βουλή,<br />
ἐξ οὗ δὴ τὰ πρῶτα διαστήτην ἐρίσαντε<br />
Ἀτρεί̈δης τε ἄναξ ἀνδρῶν καὶ δῖος Ἀχιλλεύς.</p></blockquote>
<p>and saying &#8220;This is beautiful!&#8221;.  In both cases I&#8217;m not likely ever to undertake the arduous journey to the required deep understanding.  But in both cases I am willing to believe the gurus and to respect those who do make the effort.</p>
<p>Atwood likens code to the paint of the painting.  If you can only see the paint of Van Gogh&#8217;s Irises, and not the painting, then you don&#8217;t experience beauty.  Similarly, if you only see the code, and not what&#8217;s behind it, then you don&#8217;t see beauty.  A difference is that most people are endowed with an innate capacity to appreciate &#8220;Irises&#8221;, but do not have this capacity for text.  Text needs to be worked at, to a greater or lesser extent.</p>
<p>If Van Gogh were to <em>talk</em> to us about the idea in &#8220;Irises&#8221; it&#8217;s unlikely we would experience beauty.  Beauty needs to be embodied, and for paintings embodiment takes the form of paint on canvas.  But the paint can get in the way if you look at it the wrong way.  Similarly, if the authors collected in &#8220;Beautiful Code&#8221; were only to talk about &#8220;the idea, the algorithm&#8221;, then it wouldn&#8217;t be enough.  The code is needed, just as the paint is needed.</p>
<p>With &#8220;Irises&#8221; the artist can only present the work on a take-it-or-leave-it basis: you either see it or you don&#8217;t.  When asked what the meaning of a painting was, Picasso is reputed to have said: &#8220;If I could tell you, Madam, I would be a writer.  As it happens, I am a painter.&#8221; Code does not need to be presented on a take-it-or-leave-it basis.  The author happens to be a writer (she had better be), so she can work at facilitating appreciation of the code.  In this article I will try to demonstrate the phenomenon of beauty in code.  Taking Atwood&#8217;s criticisms to heart, I make the examples drastically smaller and I add plenty of context.</p>
<p>I&#8217;m going to experiment, using two related examples, with the possibility of supplementing beautiful code with the required understanding.  The first is the task of reversing an array segment a[p..q].  To be concrete, the task is that of writing a function declared as</p>
<pre>void rev(int a[], int p, int q);</pre>
<p>assuming one has available</p>
<pre>void swap(int a[], int i, int j) {
  int temp = a[i]; a[i] = a[j]; a[j] = temp;
}</pre>
<p>To reverse an array segment, one swaps each element in the left half of the segment with its mirror image in the right half, for example like this:</p>
<pre>void rev(int a[], int p, int q) {
  for(int i = (p+q)/2; i &gt;= p; i--) swap(a, i, q-(i-p));
}</pre>
<p>Even in this simple example the student author must have experienced a few annoyances.  Is the halfway index OK for both odd and even segment lengths?  Should I introduce another variable to save recomputation in the argument expression in the call to swap?</p>
<p>Compare this student&#8217;s effort with that of another:</p>
<pre>void rev(int a[], int p, int q) {
  for(; p&lt;q; p++,q--) swap(a, p, q);
}</pre>
<p>This was my favourite until I saw the comment from Samuel Tardieu:</p>
<pre>void rev(int a[], int p, int q) {
  while(p&lt;q) swap(a, p++, q--);
}</pre>
<p>Isn&#8217;t this more than just better code?  Don&#8217;t you experience this &#8220;more&#8221; as, er, <em>beauty</em>?  The function definitions embody the same idea and the same algorithm.  According to Atwood one cannot be more beautiful than the other.  But just <em>look</em> &#8230;  code <em>can</em> make a difference.</p>
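<p>The first student's worry about the halfway index can be checked mechanically. The following is a minimal sketch, not part of the original exercise: it repeats the three versions under the hypothetical names rev_mid, rev_for and rev_while, and adds a hypothetical helper versions_agree that runs all three on copies of the same array (assumed to hold at most MAXLEN elements) and reports whether the results coincide.</p>

```c
#include <assert.h>
#include <string.h>

#define MAXLEN 16  /* assumed upper bound on the test array length */

/* Exchange a[i] and a[j], as in the article. */
void swap(int a[], int i, int j) {
  int temp = a[i]; a[i] = a[j]; a[j] = temp;
}

/* First student's version: walk from the midpoint down to p. */
void rev_mid(int a[], int p, int q) {
  for (int i = (p+q)/2; i >= p; i--) swap(a, i, q-(i-p));
}

/* Second version: two indexes closing in on each other. */
void rev_for(int a[], int p, int q) {
  for (; p < q; p++, q--) swap(a, p, q);
}

/* Third version: the same idea written as a while loop. */
void rev_while(int a[], int p, int q) {
  while (p < q) swap(a, p++, q--);
}

/* Returns 1 if all three versions turn a copy of src[0..n-1] into the
   same array when asked to reverse the segment [p..q], else 0.
   Assumes 0 <= p <= q < n <= MAXLEN. */
int versions_agree(const int src[], int n, int p, int q) {
  int x[MAXLEN], y[MAXLEN], z[MAXLEN];
  size_t bytes = (size_t)n * sizeof(int);
  memcpy(x, src, bytes);
  memcpy(y, src, bytes);
  memcpy(z, src, bytes);
  rev_mid(x, p, q);
  rev_for(y, p, q);
  rev_while(z, p, q);
  return memcmp(x, y, bytes) == 0 && memcmp(y, z, bytes) == 0;
}
```

<p>With src = {1,2,3,4,5,6,7}, the call versions_agree(src, 7, 1, 4) covers a segment of even length and versions_agree(src, 7, 1, 5) one of odd length. One difference lies outside these cases: for the empty segment at the start of the array, rev(a, 0, -1), the midpoint (0 + -1)/2 truncates to 0 in C99, so the first version would call swap with index -1, whereas the other two versions do nothing.</p>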
<p>As the other example I choose the task of cyclically shifting the contents of a one-dimensional array.  I used this example in the first course I ever taught, which was for novice programmers.  Besides some obvious solutions, the problem has an interesting one. How could I keep my students away from the obvious ones?  Well, I showed them these in advance, so they would know what I did not want them to do.</p>
<p>I explained that the array can be copied, in shifted position, to a temporary array and that the contents can then be copied back, like this:</p>
<pre>void cycShift(int a[], int n, int k) {
  int b[n];
  for(int j=0; j&lt;n; j++)
    if (j-k &lt; 0) b[j-k+n] = a[j];
    else b[j-k] = a[j];
  for(int j=0; j&lt;n; j++) a[j] = b[j];
}</pre>
<p>The novice needs to be made aware of the dire disadvantages of this simplistic solution: all that wasted space.  OK, so how about <em>no</em> extra space, like here:</p>
<pre>void cycShift(int a[], int n, int k) {
  while(k-- &gt; 0) {
    int temp = a[0];
    for(int i=1; i&lt;n; i++) a[i-1] = a[i];
    a[n-1] = temp;
  }
}</pre>
<p>The novice needs to be made aware of the dire disadvantages of this simplistic solution: all that wasted <em>time</em>.</p>
<p>By denying the novice these easy ways out, I was hoping to light the spark of inspiration.  I suggested using mathematics, showing that there is a kind of algebra of sequences.  What the operations of such an algebra are depends on who you talk to.  But all would agree that concatenation and reversal are worth including, if only because of the following property: (x.y)&#8217; = y&#8217;.x&#8217; where the dot is concatenation and the prime is reversal.  And note how the order of x and y is reversed.</p>
<p>I left it at this, hoping that they would continue with something like the following.  Let s be the sequence to be shifted by k places to the left.  Define x and y such that s = x.y with the length of x equal to k.  Let us denote by s|k the result of cyclical shift of s by k places to the left.  Then we have s|k = y.x.</p>
<p>How do we get y.x? Let&#8217;s set s = x.y so that s&#8217; = (x.y)&#8217; = y&#8217;.x&#8217;.  Then an algorithm for cyclical shift is obtained by</p>
<pre>s|k = (x.y)|k = y.x = (y.x)'' = (x'.y')'</pre>
<p>It is an algorithm because the latter expression consists of easily implementable operations.  The algorithm is apparently: first reverse left and right segments separately, then reverse the entire sequence.  In code:</p>
<pre>void cycShift(int a[], int n, int k) {
  rev(a, 0, k-1); rev(a, k, n-1); rev(a, 0, n-1);
}</pre>
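<p>To see the identity (x'.y')' = y.x at work on concrete data, here is a small sketch on a character string rather than an int array. The names rev_chars and shift_by_reversal and the extra parameter mid are hypothetical; mid, if not NULL, receives a copy of the intermediate result x'.y'. The string s is assumed to be NUL-terminated with length n, and 0 &lt;= k &lt;= n.</p>

```c
#include <assert.h>
#include <string.h>

/* Reverse the segment s[p..q] in place (char version of rev). */
void rev_chars(char s[], int p, int q) {
  while (p < q) {
    char t = s[p];
    s[p++] = s[q];
    s[q--] = t;
  }
}

/* Shift s[0..n-1] cyclically to the left by k places using three reversals.
   If mid is not NULL it must have room for n+1 chars; it receives x'.y'. */
void shift_by_reversal(char s[], int n, int k, char mid[]) {
  rev_chars(s, 0, k-1);                             /* x.y   becomes x'.y  */
  rev_chars(s, k, n-1);                             /* x'.y  becomes x'.y' */
  if (mid != NULL) memcpy(mid, s, (size_t)n + 1);   /* record x'.y' and the NUL */
  rev_chars(s, 0, n-1);                             /* (x'.y')' equals y.x */
}
```

<p>For s = abcdefghij and k = 3 we have x = abc and y = defghij. After the first two reversals the string reads cbajihgfed, and reversing the whole of it gives defghijabc.</p>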
<p>This steers the way between the Scylla of excessive memory use and the Charybdis of excessive processing.  It needs a constant amount of memory beyond that for the array to be shifted, so it is well clear of Scylla.  It needs at most n swaps, each of which requires 4 array accesses (2 reads, 2 writes), so it is well clear of Charybdis.  By showing my students two ways how not to do it, I hoped to give them a push on the way to finding this piece of beautiful code.</p>
<p>I wasn&#8217;t sure whether I could expect any of the students to get this, the bonus question.  After all, one needs some experience and sophistication to relate algebraic manipulations with code.  I was in for a surprise.  Actually, a double surprise.  One part of the surprise was that there was a student who submitted what he claimed was working code that needed only a constant amount of extra memory and an amount of processing that was independent of k.  The other surprise was that he did it not in the way I believed to be the one right and elegant way.  What did he do?</p>
<p>I found the code puzzling.  By the time I figured out what he had been doing, I realized that his code used only about half as many array accesses as mine.  The idea that I found was that if one has merely one free location in the array, then one can have this free location jump all around the array to receive every element in turn at the new location required by the cyclic shift.  Is this idea beautiful?  Maybe it can only become so by being realized in every detail so that a machine can do it this way fully automatically.  That is, by being coded.  But first I need to make the above hand-waving sketch of the idea more concrete by an example.</p>
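<p>The comparison of array accesses can be made explicit with a little arithmetic. The sketch below is an illustration under stated assumptions, with hypothetical function names: reversing a segment of length m takes m/2 swaps (integer division) at 4 accesses each, while the free-location method reads every element once and writes it once. It assumes 0 &lt;= k &lt;= n.</p>

```c
#include <assert.h>

/* Swaps performed by the three reversals of lengths k, n-k and n. */
int reversal_swaps(int n, int k) {
  return k/2 + (n-k)/2 + n/2;
}

/* Array accesses of the reversal method: 2 reads and 2 writes per swap. */
int reversal_accesses(int n, int k) {
  return 4 * reversal_swaps(n, k);
}

/* Array accesses of the free-location method: each of the n elements
   is read once and written once. */
int free_location_accesses(int n) {
  return 2 * n;
}
```

<p>For n = 10 and k = 3 this gives 9 swaps, hence 36 accesses, for the three reversals, against 20 accesses for the free-location method.</p>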
<p>Here is an array with indexes 0..9 containing the first ten letters of the alphabet:</p>
<pre>          0 1 2 3 4 5 6 7 8 9
         ---------------------
          a b c d e f g h i j</pre>
<p>We are asked to shift it by three locations to the left. Index 0 is the initial free space. It is &#8220;free&#8221; because its content is saved away outside the array:</p>
<pre>          0 1 2 3 4 5 6 7 8 9
         ---------------------
   a        b c d e f g h i j</pre>
<p>In the shifted situation this location has to have the d in it.  So we move it there, thereby freeing the location where the d was before.  We continue in this way until all nine remaining elements are in their new locations:</p>
<pre>          0 1 2 3 4 5 6 7 8 9
         ---------------------
   a        b c d e f g h i j
   a      d b c   e f g h i j
   a      d b c g e f   h i j
   a      d b c g e f j h i
   a      d b   g e f j h i c
   a      d b f g e   j h i c
   a      d b f g e i j h   c
   a      d   f g e i j h b c
   a      d e f g   i j h b c
   a      d e f g h i j   b c
          d e f g h i j a b c</pre>
<p>In the penultimate line all nine remaining elements have been moved to their shifted positions. The location of the hole in the penultimate line has to be the place for the element that was initially shunted away.  So that&#8217;s where it went in the last line.</p>
<p>This example is simple because the distance to be shifted, 3, and the length of the array, 10, are relatively prime. As a result, everything is moved in a single cycle. In general, gcd(n,k) cycles are needed for shifting by k in an array of length n. Here is the example with shifting by 2:</p>
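<p>The number of cycles can also be found by brute force, which gives a way to check the gcd(n,k) claim on small cases. The following sketch uses a hypothetical function count_cycles that follows the index map i to (i+k) % n and marks every index it visits; it assumes 0 &lt; n &lt;= MAXLEN and k &gt;= 0.</p>

```c
#include <assert.h>

#define MAXLEN 64  /* assumed upper bound on the array length */

/* Count the cycles of the index map i -> (i+k) % n on 0..n-1. */
int count_cycles(int n, int k) {
  int visited[MAXLEN] = {0};
  int cycles = 0;
  for (int start = 0; start < n; start++) {
    if (visited[start]) continue;   /* index already belongs to a counted cycle */
    cycles++;
    /* walk the cycle through start, marking each index on the way */
    for (int i = start; !visited[i]; i = (i+k) % n) visited[i] = 1;
  }
  return cycles;
}
```

<p>For the two traces shown here, count_cycles(10, 3) is 1 and count_cycles(10, 2) is 2, in agreement with gcd(10,3) = 1 and gcd(10,2) = 2.</p>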
<pre>          0 1 2 3 4 5 6 7 8 9
         ---------------------
   a        b c d e f g h i j
   a      c b   d e f g h i j
   a      c b e d   f g h i j
   a      c b e d g f   h i j
   a      c b e d g f i h   j
          c b e d g f i h a j    end of first cycle
   b      c   e d g f i h a j
   b      c d e   g f i h a j
   b      c d e f g   i h a j
   b      c d e f g h i   a j
   b      c d e f g h i j a
          c d e f g h i j a b    end of second cycle</pre>
<p>Time for coding.</p>
<p>Executing a single cycle is a self-contained task:</p>
<pre>void cycle(int a[], int n, int k, int start) {
// executes single cycle starting from "start"
    int i = start;       // index of the free location
    int j = (i+k)%n;     // index of the element that belongs at i
    int saved = a[i];    // save element at first location of cycle
    while (j != start) {
      a[i] = a[j]; // shift element
      i = j; j = (j+k)%n; // shift indexes
    }
    a[i] = saved;        // saved element fills the last free location
}</pre>
<p>To complete the cyclic shift, gcd(n,k) cycles are needed; hence the function:</p>
<pre>void cycShift(int a[], int n, int k) {
// shift a[0..n-1] to the left by k places, cyclically
  assert((0 &lt; k) &amp;&amp; (k &lt; n) &amp;&amp; (n &gt; 0));
  int start = 0; // index where cycle begins
  for (int i = gcd(n,k); i &gt; 0; i--)
    cycle(a, n, k, start ++);
}</pre>
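<p>The function gcd is used above but not shown. A minimal sketch using Euclid's algorithm follows; this particular gcd is an assumption, since the article does not say which implementation was used. The functions cycle and cycShift are repeated, unchanged in substance, so that the block compiles on its own.</p>

```c
#include <assert.h>

/* Greatest common divisor by Euclid's algorithm.
   Assumed helper; expects m > 0 and n >= 0. */
int gcd(int m, int n) {
  while (n != 0) {
    int r = m % n;
    m = n;
    n = r;
  }
  return m;
}

/* Executes a single cycle starting from "start". */
void cycle(int a[], int n, int k, int start) {
  int i = start;
  int j = (i+k) % n;
  int saved = a[i];            /* frees location i */
  while (j != start) {
    a[i] = a[j];               /* shift element into the free location */
    i = j; j = (j+k) % n;      /* the free location moves on */
  }
  a[i] = saved;
}

/* Shift a[0..n-1] to the left by k places, cyclically. */
void cycShift(int a[], int n, int k) {
  assert((0 < k) && (k < n));
  int start = 0;               /* index where cycle begins */
  for (int i = gcd(n, k); i > 0; i--)
    cycle(a, n, k, start++);
}
```

<p>Starting with a = {0,1,...,9}, the call cycShift(a, 10, 3) leaves a[i] equal to (i+3) % 10. Starting successive cycles at indexes 0, 1, ..., gcd(n,k)-1 works because each cycle visits exactly the indexes that have the same remainder modulo gcd(n,k).</p>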
<p>Is this Beautiful Code? In this case again I agree with Atwood that beauty is not an attribute that applies to code.  &#8220;Beauty is in the eye of the beholder&#8221;.  A more precise formulation of this piece of wisdom is that beauty is a <em>binary relation</em>, one that may or may not exist between the beholder and the object beheld.  Beauty as a unary predicate is erroneously derived from this binary relation.  The patter that preceded the code in this article is intended to establish that relation.</p>
<p>This brings me back, way back, to when I first used this exercise.  C did not exist; the language was Algol 60.  The student who got this solution was Dik Winter.  His code ended with the comment:</p>
<blockquote><p><em> Like as the waves make towards the pebbled shore,<br />
So do our minutes hasten to their end,<br />
Each changing place with that which goes before<br />
In sequent toil all forwards do contend. </em></p></blockquote>
<p>From sonnet LX by William Shakespeare</p>
<p>Dik was a famously uncommunicative person.  I take it that in this way he wanted to tell me that he had experienced beauty.</p>
<p>Let this article be dedicated to the memory of this remarkable man, who died on December 28, 2009.</p>
<p>PS Paul McJones was so kind as to provide some scholarly background to this article:</p>
<blockquote><p>Your reverse and cycShift examples are nice. I believe that the earliest published version of the algorithm that Dik Winter discovered was Fletcher and Silver in 1966 (Algorithm 284: Interchange of two blocks of data. CACM 9(5): 326). Actually, Dik&#8217;s is slightly better: his cycle uses n+1 assignments, whereas theirs uses n swaps. Note Dik&#8217;s algorithm requires general indexing, whereas the algorithm based on rev requires only ++ and -- on indices. There is another efficient variation that requires only ++ (e.g., it will work on a singly-linked list); it was first published by Gries and Mills in 1981 (Swapping Sections. Tech Report 81-452, Department of Computer Science, Cornell University). Perhaps unsurprisingly, GCD comes up again with it. Our chapter 10 (&#8220;Rearrangements&#8221;) covers these things.</p></blockquote>
<p>PPS The &#8220;Chapter 10&#8221; in the PS is in &#8220;Elements of Programming&#8221; by Alexander Stepanov and Paul McJones; Addison-Wesley, 2009.</p>
]]></content:encoded>
			<wfw:commentRss>http://vanemden.wordpress.com/2010/10/05/in-defense-of-beautiful-code-2/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/e28602c14607fe4f92e85f6850e35a93?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">Maarten van Emden</media:title>
		</media:content>
	</item>
	</channel>
</rss>
