The MIT style in Artificial Intelligence 1958 – 1985

In a previous article I illustrated the tension in Artificial Intelligence (AI) between two mentalities, generally referred to as “scruffy” and “neat”. I started by accepting the characterization by Norvig and Russell [1].

“… the neats — those who think that AI theories should be grounded in mathematical rigor — versus the scruffies — those who would rather try out lots of ideas, write some programs, and then assess what seems to be working.”

Critics of my article have opened up so many dimensions of scruffy versus neat that it is hard to know where to start. From my disorganized thoughts one recurrent theme emerged: the distinctive MIT style during the formative years of AI.

In my previous article I explained the Lighthill affair as a conflict between AI proper and those who were stuck in cybernetics. That was in 1972. But cybernetics was there before there was AI and it’s not surprising that that was where some AI people started.

The year was 1946, the place was Cambridge, Massachusetts. The chronicler [2] was Garrett Birkhoff (1911 – 1996). Birkhoff made his reputation in the 1930s as a brilliant young mathematician of the pure variety. His talent was mobilized during World War II and applied to fluid dynamics. Cambridge was abuzz with the developments that Norbert Wiener (1894 – 1964) was soon to synthesize under the label “cybernetics”. As a Harvard mathematician, Birkhoff was not part of this, but confesses himself [2] a fascinated “voyeur”. He identifies as the seminal event a public lecture featuring Wiener, John von Neumann (1903 – 1957), and Nicolas Rashevsky (1899 – 1972). The latter was a Chicago professor known for promoting the concept of mathematical biology. The confident organizers had booked an auditorium seating five hundred. Birkhoff estimated an overflow crowd of a thousand. Such was the start of cybernetics. It is safe to assume that the publication of “Cybernetics” by Wiener in 1948 and of “The Mathematical Theory of Communication” by Claude Shannon and Warren Weaver in 1949 was eagerly anticipated.

In the US the Second World War had mobilized scientific talent and resources on a large scale. This effort launched novel developments such as operations research, game theory, automata theory, time series analysis, stochastic processes, and information theory. By 1945 the innovation machine had gotten up to speed and was to continue unabated for over a decade. Electronics was the technology that made many things practical that had hitherto been confined to theory. In turn, theoretical studies were stimulated by the technological developments. The impact of electronics ranged widely, from the automation of computing to experimental investigation of the nervous systems of animals and humans. To some farsighted researchers there were many interconnections between these new developments. “Cybernetics” was the term coined to characterise the heady brew of synergies.

Hard on the heels of these novelties another wave was coming. Turing talked (and Shannon wrote [3]) about programming a computer to play chess, rather than to compute the trajectory of an artillery shell. Turing famously wrote [4] about the corollary of a computer being programmed to think. In the early 1950s Shannon and McCarthy wanted to convene a symposium, not about cybernetics, but about the new New Thing: programming a computer to display human-level intelligence. McCarthy proposed as title “Artificial Intelligence”. Shannon could hardly disagree, as author of the first article about programming a computer to play chess. But he argued for a less provocative term and got his way: “Automata Studies”. As McCarthy later ruefully observed, “…, and guess what we got … automata studies” [5].

When the next opportunity presented itself, McCarthy had learned not to fly any false flags. The grant application proposed a “Summer Research Conference on Artificial Intelligence”. It was held at Dartmouth College in 1956 [14]. “Artificial Intelligence” stuck. Then, and ever since, many were unhappy with the term, but ended up using it for lack of a better alternative. This time the meeting was successful. In many of the participants and their discussions we recognize the real thing, in addition to some cybernetics and the inevitable automata studies.

From our current vantage point it is difficult to appreciate how insanely ambitious the idea of thinking computers was compared to the hardware and software technology of the time. In 1956 computers were established as machines for scientific and for administrative applications. Such applications thrived: they were of interest to organizations with lots of money and these applications fitted comfortably inside the constraints imposed by the technology of the time. The hardware and software were developed at Univac and IBM for these purposes and delivered to corporate and university customers accordingly. For people like those who came to the Dartmouth Summer School the software of the time and the way the computers were run were intensely frustrating.

Before anything could happen in the direction of AI much practical development had to be done. At Dartmouth McCarthy was not only aware of this, but had the most advanced software research program. He had correctly identified the incipient Fortran compiler at IBM as the most promising starting point. He extended the compiler to handle the list data structure invented by Newell and Simon at RAND in Santa Monica.

While at Harvard Professor Birkhoff was a fascinated bystander, there was in his department an equally fascinated person, an undergraduate, who was not content to remain a bystander: he built SNARC, the first randomly wired neural network learning machine. Marvin Minsky continued to a PhD in mathematics in Princeton, with the thesis “Neural Nets and the Brain Model Problem”. He returned to Cambridge to work in Oliver Selfridge’s cybernetics group at the MIT Lincoln Laboratory in 1956 [7].

In 1958 the AI project was launched at MIT, with Minsky and McCarthy as founding faculty. McCarthy had come to the conclusion that souped-up Fortran was not satisfactory for the Advice-Taker project, nor was the Algol 58 that McCarthy was involved in. He needed to develop his own language. Upon his arrival at MIT he found the resources to do so: an IBM 704 and support for his programmers S. Russell and K. Maling [8].

That researchers at MIT had to depend on a machine bought from IBM suggests that MIT was behind IBM in computer development at the time. This is only true of the nonclassified part of MIT. Behind the veil of military security MIT had developed the most advanced computers of its time: Whirlwind, followed by TX0. The latter machine was a test vehicle for novel technology: transistors for the processor and ferrite cores for memory. When the test was successful, a more powerful version was developed, the TX2. The redundant TX0 was transferred in 1958 to the MIT Research Laboratory for Electronics, the home of the AI project.

Up till that time the Tech Model Railroad Club was just one of the student clubs on the MIT campus. TMRC had evolved an elaborate model railroad installation. For one part of the club the installation was a relief from technology. These students delighted in making pretty landscapes and in neatly painting the engines. The invisible part of the installation attracted the other part of the club: the freshest of freshmen and the nerdiest of nerds. They developed the underside of the installation, an insanely elaborate switching system with the complexity of a telephone switching office and indeed built with such equipment [9]. Russell, though not an MIT student, became a member of TMRC.

When the TX0 emerged in 1958 from its security-enforced seclusion it was not long before the more tech-crazed TMRC members discovered it as a playground that was even more interesting than the underside of the TMRC installation. And it was not long before their single-minded obsession with complex systems made some of them into virtuoso programmers and systems engineers at a level as high as anywhere else. Such expertise was in demand, as the typical prof and even grad student at the time was clueless in this area [9].

The IBM installation only allowed experimentation at the level of user programs, which included McCarthy’s work. But it did not allow hooking up cameras or robotic hardware, or the experimentation with systems software that would make this useful. In the TMRC punks hanging out around the TX0 Minsky saw expertise badly needed for the AI laboratory. The hackers, undergraduates in danger of dropping out of their academic program, represented the extreme end of scruffy. They competed for computer access with the representatives of neat: graduate students working on their theses and profs looking for opportunities to publish in the scholarly literature [9].

As an illustration of the kinds of things going on at MIT around 1960, consider David Silver, one of the hackers. He joined the group as a fourteen-year-old drop-out from grade six, after twice having skipped a grade backwards. The hackers first tolerated Silver as a sort of mascot. It wasn’t long before he had cobbled together a floor-roving robot controlled from his software that performed tasks that the graduate students were writing about in their theses and pronouncing infeasible on the basis of solid surveys of the scholarly literature. The graduate students were most upset. Experiments without underlying theory were bad enough. And this punk was gobbling up precious machine time for activities that weren’t even experiments (where were the designs? where were the records?). But Minsky felt that the hacker contributions were vital to keeping AI real [9].

McCarthy was not impressed with the TMRC hackers hanging around at the TX0. The feeling was mutual: McCarthy had an ulterior motive behind his programming. For the hackers programming was its own justification; a silly hack was as good as anything else to enjoy programming and to display virtuosity. McCarthy likened them to “ski bums” [9]. Yet, as McCarthy acknowledges, one of the hackers made an important contribution to the implementation of Lisp. But then, this was McCarthy’s own Steve Russell, who had become a hacker in the sense of joining TMRC. The AI project was struggling with a compiler for Lisp, with no end in sight. McCarthy had shown Russell his paper about Lisp as an alternative formalism for specifying computable functions, one based on symbolic expressions rather than numbers. The required universal function was EVAL. Russell suggested programming it on the IBM 704.

… This EVAL was written and published in the paper and Steve Russell said, look, why don’t I program this EVAL and you remember the interpreter, and I said to him, ho, ho, you’re confusing theory with practice, this EVAL is intended for reading not for computing. But he went ahead and did it. That is, he compiled the EVAL in my paper into 704 machine code fixing bugs and then advertised this as a LISP interpreter which it certainly was, so at that point LISP had essentially the form that it has today, the S-expression form … (McCarthy quoted in [8])

I infer that the compiler project went on the back burner for a while.
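To make the idea of a universal function concrete, here is a minimal sketch in Python of an evaluator for S-expressions. It is not McCarthy’s EVAL and not Russell’s 704 code: the nested-list representation, the names `evaluate` and `GLOBAL_ENV`, and the small set of primitives are assumptions chosen for illustration. One deliberate simplification is that `lambda` here captures its defining environment, whereas early Lisp was dynamically scoped.

```python
# Illustrative sketch only: S-expressions are nested Python lists,
# symbols are Python strings. All names below are hypothetical.

def evaluate(expr, env):
    """Evaluate an S-expression in the environment env (a dict)."""
    if isinstance(expr, str):              # a symbol: look up its value
        return env[expr]
    op, *args = expr
    if op == "quote":                      # (quote x) yields x unevaluated
        return args[0]
    if op == "cond":                       # (cond (test value) ...)
        for test, value in args:
            if evaluate(test, env):
                return evaluate(value, env)
        return []                          # no clause matched: empty list
    if op == "lambda":                     # (lambda (params) body)
        params, body = args
        # Simplification: the closure keeps env (lexical scope).
        return lambda *vals: evaluate(body, {**env, **dict(zip(params, vals))})
    fn = evaluate(op, env)                 # otherwise apply a function
    return fn(*[evaluate(a, env) for a in args])

# A handful of primitives, again chosen only for the example.
GLOBAL_ENV = {
    "car": lambda x: x[0],
    "cdr": lambda x: x[1:],
    "cons": lambda x, y: [x] + y,
    "atom": lambda x: not isinstance(x, list) or x == [],
    "eq": lambda x, y: x == y,
}

# ((lambda (x) (cons x (quote (b)))) (quote a))
program = [["lambda", ["x"], ["cons", "x", ["quote", ["b"]]]], ["quote", "a"]]
result = evaluate(program, GLOBAL_ENV)
```

The point of the sketch is the one Russell saw: a definition written for reading, once transcribed into something a machine can run, is an interpreter.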

This feat, destined to become History, did not deflect Russell from his involvement with hacks. In 1962 he programmed the PDP-1 so his friends could play Space War. At the time, a neat hack to show off the graphic display of the PDP-1 and to try out a new symbolic interactive debugger. In retrospect, the neat hack was also the invention of the very concept of something destined to grow into an industry in its own right.

The hackers were totally absorbed in the act of programming and in the exploration of the capabilities of the computer hardware they found and which they urgently needed to modify. They found in their own small group all the audience they needed. No need to publish anything. This attitude, though the exact opposite of the typical academic’s, was congenial to Minsky, who was later to reminisce:

…when it was our strategy in those early days to be unscholarly; we tended to assume, for better or for worse, that everything we did was so likely to be new that there was little need for caution or for reviewing literature or for double-checking anything. As luck would have it, that almost always turned out to be true [7].

Could it be that Minsky was infected by the hackers’ example? It looks like Minsky’s attitude was the root of the specific MIT style in AI, a style that was later to be, if not denounced, at least branded, as “scruffy”.

Compare McCarthy’s Advice-Taker project [12]. His approach was the epitome of “neat”: its knowledge base was to be expressed in formulas of predicate logic; the planning was to be the outcome of logical inference. McCarthy considered the existing programming languages, insofar as there were any, unsuitable for implementing such a system of logic. Hence his effort in developing a new language, the one that became Lisp. McCarthy must have assumed that somehow a suitable inference system would show up once Lisp was available. It didn’t. However, Lisp opened up so many exciting new possibilities that lack of progress in the Advice-Taker didn’t matter. These early years of Lisp were the years of the undocumented firsts that Minsky mentions in [7]; there is also documented work [10].

It was during this period that resolution inference appeared [11]. At first sight it seemed exactly what Advice-Taker had been waiting for. The possibilities opened by resolution set off a widespread effort to realize the goal of Advice-Taker: given a description in logic of a world state and of a goal to be achieved, generate a constructive proof that the goal is achievable. The constructive nature of the proof implies that the sequence of actions to be performed can be picked out from the proof. In short, a proof as a plan. And automatically generated. That was the dream.

At MIT, however, the AI group may not have tried too hard to harness resolution to this task [13]. After all, if you have Lisp, why would you let logic stand in the way of programming a planning agent? Carl Hewitt’s answer to this question was Planner, untypically for MIT, not a Lisp program, but the design for a language. Planner was intended for implementation in Lisp, and a subset named Micro-Planner was implemented. It was an extension of Lisp, enriching the control structure with the addition of automatic backtracking. This was also the choice in Prolog, which appeared soon after. Sussman, one of the implementers of Micro-Planner, came to the conclusion that the fully automatic backtracking implied in “planning” was unsatisfactory and thought some conniving on the part of the programmer would be a more realistic approach for the automatic generation of plans; hence the language Conniver, created in collaboration with Drew McDermott. When conniving also turned out to be too ambitious, backtracking was thrown out altogether, so that the programmer had to resort to scheming. With Guy Steele, Sussman designed Schemer to achieve the desired level of programmer control. And guess what, Schemer was essentially Lisp, except that it was the long overdue lexically scoped version of Lisp. Thus the planning problem was brought back to what it should have been all the time according to the characteristic MIT approach: a problem to be solved in Lisp. The ITS operating system that Sussman and Steele were working with did not allow filenames longer than six characters, so that “Schemer” was truncated to “Scheme”, now a standardized and much-loved programming language, whose roots in planning have mostly been forgotten.
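For readers who have not met automatic backtracking, a minimal sketch in Python may help. It is not Planner or Micro-Planner syntax, and it says nothing about how those systems were implemented; the function `backtrack`, the toy colouring problem, and all the names in it are assumptions made up for the example. What it shows is the control structure: the search commits to a choice, and when a later step fails it silently undoes that choice and tries the next alternative, without the programmer saying when or how.

```python
# Illustrative sketch only: depth-first search with automatic backtracking.
# The problem and every name below are hypothetical.

def backtrack(assignment, variables, domains, consistent):
    """Extend assignment one variable at a time; undo choices that fail."""
    if len(assignment) == len(variables):
        return dict(assignment)            # every variable has a value
    var = variables[len(assignment)]
    for value in domains[var]:
        assignment[var] = value            # tentatively commit to a choice
        if consistent(assignment):
            result = backtrack(assignment, variables, domains, consistent)
            if result is not None:
                return result
        del assignment[var]                # backtrack: undo and try the next
    return None                            # no value works: fail upwards

# Toy problem: colour three mutually adjacent regions differently.
ADJACENT = [("a", "b"), ("b", "c"), ("a", "c")]

def no_clash(assignment):
    """True if no two adjacent regions assigned so far share a colour."""
    return all(assignment[x] != assignment[y]
               for x, y in ADJACENT
               if x in assignment and y in assignment)

variables = ["a", "b", "c"]
domains = {"a": ["red", "green"],
           "b": ["red", "green", "blue"],
           "c": ["red"]}
solution = backtrack({}, variables, domains, no_clash)
```

In this run the first choice for region a, red, looks fine locally and only fails two levels down, when c has nothing left but red. The search then retracts it on its own initiative. That convenience is also the complaint: the programmer has no say in which choice gets reconsidered, which is roughly the control that Conniver and later Scheme handed back.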

It is of course not wise to try to tie down the heyday of the MIT style in AI to an exactly specified period. In the title I chose 1958 for the start because that was the year that Minsky started the AI project at MIT. That was also the year when he presented a wide-ranging overview of AI [6]. In the beginning Minsky spoke mainly through his students: in 1968 “Semantic Information Processing” appeared [10], edited by Minsky and with an article contributed by him. The bulk of the book consists of papers based on the PhD theses of Minsky’s first batch of students. The next milestone is an AI Memo co-authored with Seymour Papert [15]. Reading these two, the extreme of Scruffy, makes it hard to realize that Minsky could be as Neat as any. Not long before Semantic Information Processing he had published Computation: Finite and Infinite Machines and the year after, with Seymour Papert, Perceptrons: an Introduction to Computational Geometry. As an undergraduate Minsky had studied with Andrew Gleason, a mathematician’s mathematician [17]. My guess is that Minsky could imagine becoming a mathematician like Gleason, but set his sights, if not higher, at least further.

For the end date in the title I picked that of the publication of Minsky’s “The Society of Mind” [16]. It starts like this:

This book tries to explain how minds work. How can intelligence emerge from nonintelligence? To answer that, we’ll show that you can build a mind from many little parts, each mindless by itself. I’ll call “Society of Mind” this scheme in which each mind is made of many smaller processes. These we’ll call agents. Each mental agent by itself can only do some simple thing that needs no mind or thought at all. Yet when we join these agents in societies — in certain very special ways — this leads to true intelligence.

Of course, when you write a book “that tries to explain how minds work”, then you get in the way of the psychologists. For the better part of a century, psychology had tried very hard to be a real science. Physics and chemistry had dominated the intellectual scene so much that “real science” was identified with these fields. Physics and chemistry got great by never accepting apparent complexity at face value: there always had to be something simple hiding behind. As a result, for almost half a century, behaviourism reigned supreme in psychology. Only shortly before the appearance of AI, psychology freed itself from this stranglehold. Minsky dared to ask: what if the mind is not inherently simple? What if, behind this apparent complexity, there is … complexity? What if this complexity takes the form of a changing collection of many, many agents connected in intricate and changing patterns?

Behaviourist psychology, with its urge to conform to “real science” and its emphasis on the Scientific Method, was Neat. Minsky, in his bid to discover how the mind works, no holds barred, was Scruffy.

Acknowledgments

Thanks to Paul McJones, Alan Robinson, and Steve Russell for suggestions and corrections.

References

[1] Artificial Intelligence: A Modern Approach by Stuart Russell and Peter Norvig. Prentice-Hall, First edition 1995, page 21.
[2] Article by Garrett Birkhoff in History of Computing edited by N. Metropolis, J. Howlett, and Gian-Carlo Rota. Academic Press, 1980.
[3] Claude E. Shannon: Programming a Computer for Playing Chess, Philosophical Magazine, Ser.7, Vol. 41, No. 314, March 1950.
[4] Turing, A.M. (1950). Computing machinery and intelligence. Mind, 59, 433-460.
[5] BBC television 1973.
[6] “Some methods of artificial intelligence and heuristic programming” in Mechanisation of Thought Processes Symposium No. 10, National Physical Laboratory, Her Majesty’s Stationery Office, 1959.
[7] “Introduction to the COMTEX Microfiche Edition of the Early MIT Artificial Intelligence Memos” by Marvin L. Minsky, AI Magazine, 1983: 19 – 22.
[8] “Early Lisp History (1956 — 1959)” Herbert Stoyan. LFP ’84: Proceedings of the 1984 ACM Symposium on LISP and functional programming.
[9] “Hackers” by Steven Levy. Doubleday, 1984.
[10] Marvin Minsky: “Semantic Information Processing”, MIT Press, 1968. Slagle’s thesis.
[11] J.A. Robinson: “A Machine-Oriented Logic Based on the Resolution Principle”, Journal of the ACM, 1965.
[12] “Programs with common sense” in Mechanisation of Thought Processes Symposium No. 10, National Physical Laboratory, Her Majesty’s Stationery Office, 1959.
[13] But see an interview with Robinson for an interesting peek behind the scenes.
[14] Dartmouth 1956 proposal
[15] “Artificial Intelligence: Progress Report” by Marvin Minsky and Seymour Papert, Artificial Intelligence Memo No. 252, January 1, 1972.
[16] The Society of Mind by Marvin Minsky. Simon and Schuster, 1985.
[17] See the interview with Andrew M. Gleason in More Mathematical People edited by D.J. Albers, G.L. Alexanderson, and C. Reid, Harcourt Brace Jovanovich, 1990.
