With his “Pretty Good Privacy” (PGP), Phil Zimmermann launched the era of what I call Wildcat Crypto. Up to that point discussions on cryptology had focussed on whether DES, the US government standard, was secure and whether government involvement in its development might have jeopardized its security. To avoid government involvement, PGP used IDEA as alternative to DES for its bulk encryption. Yet IDEA is a close relative of DES and shares its weakness: small, constant-size blocks. True Wildcat Crypto calls for a radical departure in the form of blocks that are much larger, and moreover, vary in length under control of the key. In this article I trace the relevant history and outline my implementation of such a radical alternative.
Since time immemorial, cryptography was a common feature of puzzle columns in magazines and newspapers. But up until the 1970s, only the military, departments of foreign affairs, and spies were seriously interested in it. This state of affairs changed when banks and other commercial organizations started using computer networks. The need arose for industrial-strength cryptography. Not only that, but the cryptography needed to be standardized and sanctioned by the government. For the bank’s information-technology chief it was even more important for encryption to be standard than to be secure: if a breach occurred that cost the bank a few million, he wouldn’t lose his job as long as he had used the industry-accepted standard. This was the reason for the U.S. National Bureau of Standards (now NIST) to adopt the Data Encryption Standard (DES) in 1977. Its current successor is the Advanced Encryption Standard (AES), adopted in 2001.
In the 1990s there was a new wave of interest in cryptography arose. Private citizens became concerned about the possibility of surveillance of their personal communications. Phil Zimmermann, who had been an anti-nuclear activist, created an encryption program he named PGP (“Pretty Good Privacy”). It gave people privacy when using bulletin board systems (this was before e-mail was widely available) and security when storing messages and files.
To achieve the highest level of security, PGP uses a public-key protocol implemented by the RSA algorithm. However, this is only computationally feasible for short messages. Systems that can handle long messages are symmetric-key systems, meaning the sender and recipient have to share the same key. PGP solves this problem by using a public-key protocol for transmission of the key needed in some symmetric-key system for bulk encryption, that is, to encrypt the message itself, which can be long.
PGP marks the beginning of wildcat encryption. I am calling it so to draw attention to the difference between Zimmermann and the data security people at the banks: he didn’t care about PGP having a level of security sanctioned by the government; he wanted the best security he could get. Accordingly, for the bulk encryption part of PGP he used a cipher called IDEA, from “International Data Encryption Algorithm”, a non-standard improvement of DES published in 1990 .
From the point of view of Zimmermann IDEA had the advantage of key length and block size suggesting a higher level of security than DES, but it is the same type of algorithm. For example it is also a block cipher. Its blocks are twice the size of those of DES, but still minuscule compared to plaintext lengths of many thousands of bytes, not uncommon in practice.
To understand how we came to be stuck with short, fixed-sized blocks, let us have a look at the history of DES. In 1971 Horst Feistel patented a new type of encryption algorithm: a substitution-permutation network with 48-bit keys and blocks. In 1973 he published a version where this was increased to 128 bits. These were part of an IBM project named Lucifer. At the time, practical deployment required hardware implementation. The block sizes were small enough to make this feasible in the chip technology of the time.
Let us now move forward to 2001, when AES was adopted as standard to replace DES. Much has changed; what has not changed is that block sizes, though increased, are still minuscule compared to the length of the messages that the system should be able to carry.
What had changed in 2001 was that since at least a decade encryption was implemented in software and was running on byte-oriented computer architectures. The constraints imposed by hardware implementation had disappeared. And guess what, AES, like Lucifer, is a substitution-permutation network, with 128-bit keys and blocks. The structure of the network, the structure of the boxes, and the algorithm incorporate many details not necessitated by published design criteria. DES had been designed by IBM in collaboration with the U.S. government. Some commentators voiced concerns about the possibility that, in addition to the published design criteria, there was a backdoor that presented a weakened version of the system to an insider. In fact, after two decades some hitherto unpublished design criteria for DES surfaced .
In an attempt to lay such suspicions to rest, the designers of AES published a sizeable book with lots of abstract algebra in an attempt to convince the public that the design decisions were all in the interest of the users’ security. But no matter how big one makes such a book, it remains possible that it does not contain all design criteria. This is not the authors’ fault: the possibility is inherent in the type of encryption system: substitution-permutation network, with fixed-size blocks of a size much less than that of the larger messages to be carried.
This curious history makes it worth thinking about an encryption algorithm that does not inherit hardware design constraints and freely uses the liberty afforded by software implementation to cobble together an algorithm that stays close to fundamentals of encryption. For these fundamentals it is best to refer to the document mentioned by Whitfield Diffie in the following quote .
The literature of cryptography has a curious history. Secrecy, of course, has always played a central role, but until the First World War, important developments appeared in print in a more or less timely fashion and the field moved forward in much the same way as other specialized disciplines. As late as 1918, one of the most influential cryptanalytic papers of the 20th century, William F. Friedman’s monograph The Index of Coincidence and its Applications in Cryptography, appeared as a research report of the private Riverbank Laboratories. And this, despite the fact that the work had been done as part of the war effort. In the same year Edward H. Hebern of Oakland, California filed the first patent for a rotor machine, the device destined to be a mainstay of military cryptography for nearly fifty years . After the First World War, however, things began to change. U.S. Army and Navy organizations, working entirely in secret, began to make fundamental advances in cryptography. During the thirties and forties a few basic papers did appear in the open literature and several treatises on the subject were published, but the latter were farther and farther behind the state of the art. By the end of the war the transition was complete. With one notable exception, the public literature had died. That exception was Claude Shannon’s paper “The Communication Theory of Secrecy Systems,” which appeared in the Bell System Technical Journal in 1949 . It was similar to Friedman’s 1918 paper in that it grew out of wartime work of Shannon’s. After the Second World War ended it was declassified, possibly by mistake.
Shannon’s paper stands out as lone monument in an empty period of the literature, a period that lasted from Friedmann’s monograph to the early 1970s.
Apart from specific results, Shannon’s paper is valuable for introducing a mathematical way of thinking about cryptology, something we now take for granted. What does Shannon’s mathematical view show us? For any practical system the key is shorter than the message, much shorter. Suppose we use a key of 256 bits (reasonably large) to encrypt a message of 100,000 bits (only moderately long at 12,500 bytes). The key selects one among at least 100,000! possible message-to-cryptogram maps. But there are only 2↑256 keys possible. So the keys identify only the tiniest fraction of possible message-to-cryptogram maps [by my computer-aided reckoning 100,000! is about 2↑1,516,704]. The security of a cipher depends on how randomly it sprinkles that tiny fraction over the space of all message-to-cryptogram maps.
Shannon makes this vague criterion more precise by identifying two methods to thwart attempts to break an encrypted message: diffusion and confusion. His description of these methods is too technical to reproduce here. Suffice it to say that Feistel’s invention, the block cipher, incorporated both confusion and diffusion in Shannon’s sense. Feistel’s cipher uses a network of “boxes” of two types: S-boxes and P-boxes. The plaintext is subdivided into fixed-sized blocks. The network subjects each block of text to repeated transformation. An S-box substitutes text elements for existing text elements. This achieves Shannon’s “confusion”; his “diffusion” is achieved by the P-boxes permuting the text elements within a box. Thus Feistel was right on track according to Shannon’s fundamental principles in aiming at confusion and diffusion. But Feistel had to work under the constraint of hardware implementability, which was constrained by the chip technology of the 1970s.
What I’m proposing here is to follow Feistel in implementing substitution and permutation, but freed from the constraint of using a fixed set of S and P boxes acting on a short and constant block size. Instead, I let the boxes themselves as well as their lengths depend on the key. This results in a block size only limited by the length of the message. Of course some constraints remain. For example, I am not assuming that the entire plaintext is in random-access memory. Accordingly the proposed algorithm buffers the message using a buffer size B, which, of course, can be made as large as one likes. The length of the current block is chosen between B/2 and B in a way that depends on the key. For test runs I set B at 768 bytes (giving block sizes between 3072 and 6144 bits, to be compared to 128 bits in the case of AES). It is these variable-sized blocks that are subjected to a permutation. I have chosen the Fisher-Yates shuffle under control of the pseudo-random number generator.
This takes care of the diffusion part. The confusion part takes some more explaining. To cut to the chase, it is a version of the WWII Enigma machine. Logically, Enigma had a complex structure, a complexity necessitated by hardware constraints (mechanical hardware in this case). Software freedom allows drastic simplification of Enigma’s logical structure. The relatively large memory size available allows an Enigma-like software device where everything is much bigger than in the original. I call this gigantic version of Enigma Giganigma.
In the contemporary cryptographic community Enigma is regarded as no more than an historic curiosity. But still, let us consider its potential security. During WWII many variants of Enigma were in use, with different levels of security. As will be explained below, Enigma had no explicit key, but instead relied on a set-up procedure that was kept secret and that led to a large number of possible combinations. For the high-end Enigma, this number was about 3*(10↑35), equivalent to a key size of between 118 and 119 bits (as can be seen from the fact that 10↑3 is approximately 2↑10), rather better than DES. It is intriguing, but probably a coincidence, that Feistel chose a key size of 128 bits for his larger version of Lucifer. What matters is of course how well the uncertainty in the key is transferred to uncertainty in the plaintext when the ciphertext is given. Due to its mechanical constraints this utilization is probably rather poor in the case of Enigma, and in any case hard to assess.
Giganigma has a key size of 256, which makes it double that of AES. Key utilization in AES is hobbled by it fixed network, fixed S- and P-boxes, and a short, fixed block length. In Giganigma the structure of the components and the block length are determined by the key only, which I expect to lead to better key utilization, but, again, hard to assess.
To explain my choice of Enigma as starting point for a new approach to bulk encryption, a brief description of the device. Enigma was the most widely used family of rotor-based encryption devices. This encryption principle was independently invented by Arthur Scherbius in Germany, Edward Hebern in the U.S.A., Hugo Koch in the Netherlands, and Arvid Damm in Sweden. Their patents are all dated 1918 or 1919. Rotor machines were the mainstay of military cryptography from around 1930 to 1970.
A rotor is an electro-mechanical way of implementing an invertible substitution by a letter for a letter of the same alphabet. In the case of Enigma the alphabet consists of the 26 letters A through Z. Enigma rotors implement such a substitution by means of a circular disk of insulating material the size of a hockey puck. Around the perimeter of each face of the disk there are 26 evenly spaced electrical contacts, each of which is connected to exactly one contact on the other face. Several rotors are mounted in a pack on an axle on which they can rotate independently of each other. Each contact of a rotor facing another rotor makes a connection with a contact of that other rotor. In this way the pack as a whole implements a substitution of one letter for another.
This substitution is used for encrypting one plaintext letter. After this at least one rotor changes position, so that the pack as a whole effects another substitution, which is used for the next letter. Before entering the rotor pack, the signal travels through a plugboard which is set by the operator. The plug board also implements a substitution. At the other end of the rotor pack the signal is reflected and is sent back through the pack. This arrangement effects a simple way of decrypting: entering the ciphertext as a plaintext message as if it were to be enciphered has the effect that the plaintext emerges. In some Enigma models the reflector is field-rewirable.
The secret key to be shared by sender and recipient is not, in the case of rotor machines, in the form of some secret word. Instead, it is effected by the form of a protocol followed by sender and recipient that leaves an adversary with a large amount of uncertainty about the substitutions used to obtain the ciphertext. Let us calculate the amount of this uncertainty for a high-end Enigma with a basket of 8 rotors, out of which 4 are mounted. The protocol specifies for each day:
- the rotors to be mounted, and the order in which this is to be done (8*7*6*5 = 1680 possibilities)
- the setting of the plugboard (26! possibilities)
- the initial settings of the rotors (26↑4 possibilities)
This gives as total number of possibilities about 3*(10↑35), the number mentioned earlier, equivalent to a key length between 118 and 119 bits. This is already a respectable length, which increases if there is a field-rewirable reflector to contend with.
To see how Enigma can serve as model for a modern incarnation, let us look at the device from an abstract point of view. In the first place, Enigma is not a single device, but a kit from which any of a large number (8*7*6*5*26!, from items 1 and 2 above) of alternative devices can be assembled in a matter of minutes. Any of these assembled Enigma’s can be regarded as a finite-state machine with an input tape (the plaintext) and an output tape (the ciphertext). Such a machine is specified by a set of states (including a distinguished one serving as start state), a next-state function, and an output function.
In the case of Enigma there are as many states as there are combinations of the 26 positions of the four rotors. That is, there are 26↑4 = 456,976 states. Each of these states amounts to a virtual rotor, which is, like the constituent real rotors, a substitution table for the 26 letters of the Enigma’s alphabet. The output function is the effect of this substitution table on the current input symbol. Enigma realizes only 26↑4 substitution tables for any particular letter, but the uncertainty faced by the adversary is compounded with the fact that it is not known which of the 8*7*6*5*26! alternative Enigma’s has been assembled.
Giganigma is derived by adopting the same abstract view as finite state machine assembled from a kit. There are several points at which software implementation suggests additional security. But first and foremost the role of the key needs to be changed. As noted above, Enigma did not have a key in the usual sense of a secret word. Instead, the uncertainty faced by the adversary was in the form of not knowing which particular Enigma was assembled and what its initial state was.
Software implementation allows us to use a key in the usual sense of the word, as a sequence of bits. A convenient length for this sequence is 256. The key-equivalent choices in Enigma propagate throughout the transmission of even the longest message. If the key is a sequence of only 256 bits, a mechanism is needed to ensure its continuing contribution throughout encryption of the plaintext or decryption of the cryptogram. In Giganigma this is achieved by a pseudo-random number generator in the form of a finite-state machine with the key as initial state.
Let us start with the assembly stage. Enigma is assembled from a basket of 8 rotors. These rotors had a fixed wiring pattern, so that the resulting substitution tables had to be assumed known by the adversary. For Giganigma we imagine a stage to precede the assembly stage. In this additional stage, which we can think of as a “manufacturing stage”, each of the basket of rotors is created as a random permutation of the alphabet derived from the key. And, while we’re at it, we might as well get more than 8 rotors in the basket. We ran our examples with a basket of 64 rotors.
Thus in Giganigma, for each message separately, the rotors are wired under control of the key.
The manufacturing stage of Giganigma is also the time where the number of mounted rotors is to be decided. In Engima the current substitution table is realized by an electrical curent traversing the four mounted rotors. In Giganigma each rotor is represented by an array A of eight-bit characters such A[i] is substituted for character i. For Giganigma our first impulse is to exploit software freedom for a much greater number of mounted rotors, to increase the amount confusion facing an adversary. How much greater?
We need to realize that the more virtual rotors we mount in Giganigma, the longer it takes to encrypt or decrypt a character, as each rotor correspohds to an array access. In Enigma this is of no concern, as the change in electrical current traverses the half-inch thickness of the rotor at at least half (my guess) the speed of light in vacuum. Which is why I set the number of mounted rotors at 16, much smaller than the number in the basket. Decrease it if you want more speed; increase it if you want more confusion.
The state in the finite-state machine  modeling Giganigma has following components:
- In Enigma the identity and position of each mounted rotor has to remain unchanged during transmission of a message. In software this can easily be changed, so for Giganigma the position and identity of each mounted rotor is part of the state.
- As in Enigma, the phase of each mounted rotor is part of the state.
- Each rotor includes a counter called its “age”. In certain transitions it gets incremented. Reaching a certain value causes an additional change in the state.
- As Giganigma incorporates a pseudo-random number generator, its state is part of the state of Giganigma.
State transitions in the finite-state machine modeling Giganigma are as follows.
- the phase of at least one mounted rotor changes. This ensures that for each successive character a translation table is picked from all the possibilities inherent in the 16 mounted rotors.
- To ensure that it’s not the same rotor at every character that rotates, one could incorporate a mechanism analogous to an odometer. Even for Enigma the high level of predictability of a straightforward odometer was considered unacceptable, so that considerable mechanical ingenuity was expended to introduce variations on an odometer-like mechanism.
In Giganigma software freedom offers an embarrassment of riches for item 2. An extreme libertarian would even question the use of a basket of rotors to be fixed for the entire message: why not continue wiring new rotors under control of the key stream as encryption of decryption proceeds?
To answer this question we should consider the key stream as generated by a PRNG. Even if the key is selected randomly, the key stream is not random in the sense that each byte comes out as a random choice among all possible bytes. The ideal of next-byte randomness is more closely approached in the early part of the key stream. The role of the basket of rotors is to capture the randomness of the early part of the key stream and to keep it available for encryption of the entire message, however long. Of course the later parts of the key stream are still useful. In Giganigma they are used to control the ongoing change of mounted rotors and to control block sizes.
Thus the next development in Wildcat Crypto is to replace in PGP the IDEA component for bulk encryption by a program that effects Shannon’s diffusion in the manner indicated above and that effects Shannon’s confusion by Giganigma.
 Lai, Xuejia, and James L. Massey. “A proposal for a new block encryption standard.” Advances in Cryptology—EUROCRYPT’90. Springer Berlin Heidelberg, 1990.
 Foreword for: Schneier, Bruce. “Applied Cryptography”, Wiley 1996.
 Diffie, Whitfield, and Martin E. Hellman. “Privacy and authentication: An introduction to cryptography.” Proceedings of the IEEE 67.3 (1979): 397-427.
This paper is also a good short overview of cryptography. I recommend the following books: “The Codebreakers” by David Kahn (MacMillan 1967), “Handbook of Applied Cryptography” by Alfred J. Menezes, Paul C. van Oorschot and Scott A. Vanstone (CRC Press, 1996-2001), and “Cryptography” by Douglas R. Stinson (Chapman and Hall/CRC, 3rd edition 2006).
 Shannon, Claude E. “Communication theory of secrecy systems.” Bell system technical journal 28.4 (1949): 656-715.
 Also, “finite-state automaton”. There is a veritable zoo of such paper machines, first identified by authors such as Arthur Burks, Edward Moore, and George Mealy. Their writings stand at the birth of computer science. A good early textbook is “Computation: Finite and Infinite Machines” by Marvin Minsky (Prentice-Hall 1967).
 “The Design of Rijndael” by Joan Daemen and Vincent Rijmen (Springer 2002).
 Example of a secret design criterion: “It was acknowledged that the purpose of certain unpublished design criteria of the S-boxes was to make differential analysis of DES infeasible … it was kept secret for almost 20 years …” page 101, “Cryptography” by D.R. Stinson (Chapman and Hall/CRC, 3rd edition 2006). Here of course the secret design criterion was in the interest of the user’s security.