Rewriting a Utility

By Maarten van Emden

I can’t leave well enough alone. The most recent manifestation of this problem is the following program from Kernighan and Ritchie (“The C Programming Language”), which is perfectly OK code for converting a number to the equivalent decimal numeral.


void itoa(int n, char s[])
{
    int i, sign;

    if ((sign = n) < 0)  /* record sign */
        n = -n;          /* make n positive */
    i = 0;
    do {       /* generate digits in reverse order */
        s[i++] = n % 10 + '0';   /* get next digit */
    } while ((n /= 10) > 0);     /* delete it */
    if (sign < 0)
        s[i++] = '-';
    s[i] = '';
    reverse(s);
} 

This requires something presumably like the function below, which I list so that lengths of code can be compared.


void reverse(char s[]) {
  int i = 0, j = 0;
  while (s[j++] != '')
    ;
  j -= 2;
  while (i < j) {
    char temp = s[i];
    s[i] = s[j]; s[j] = temp;
    i++; j--;
  }
}

What is it that I don’t like about it? For one, it is left up to the user to supply a sufficiently long string. And wouldn’t it be more natural for a routine with one input and one output to deliver that output as the function value? As in

char* itoa(int n)

?

Then, the conversion is such that the digits come out in the wrong order, so that in a separate pass they have to be reversed. Moreover, reverse is called in such a way that, it seems to me, it has to make two passes over the string.

These thoughts need to be quelled right away: OK, we have to guess at a suitable length for the numeral, but one can hardly guess very wrong: after all, an int can only have a very limited range of sizes. And, as for the digits coming out in the wrong order, isn’t it a convenient order and isn’t it a good problem decomposition to have them come out in the convenient order first and use the easiest possible piece of code to obtain the desired order?

Yes, indeed I should have left it alone. One should use the most un-subtle solution to a simple problem, and get on with the job. But just before conceding the point, I would like to point out a subtlety in this supposedly straightforward code. The convention for numerals is not to have leading zeros. This rule has an exception: the numeral for the integer zero does have a leading zero, otherwise it would be the empty string.

The empty string is the mathematically correct representation of the number zero. The problem is that we don’t have a symbol for the empty string. Through the recent centuries the public learned to accept the mathematical subtlety of a digit to represent zero (“nothing”), but unfortunately stopped short of accepting the analogous need for a symbol for an empty string. When you write the usual “while … do” in itoa you will get the empty string for the integer zero. Only by using the unusual “do … while” do you get the required exception. In the interest of straightforward coding, I would prefer to catch the exceptional case up front and let the rest of the code follow what is natural from a mathematical point of view.

I rewrote itoa to accommodate the above gripes, knowing that I can’t charge the time so spent to the Project. The result may strike you as unnecessary show-off: harder thinking than necessary for a task that can be as simple as demonstrated by Kernighan and Ritchie. Compare it to a flip-kick in soccer: necessary in tight situations, with success meriting a big applause. But if a player would do one without due necessity, he would deserve a Boo. What I’m doing in the sequel is like an uncalled-for flip-kick. It is educational (ambitious soccer players practice flip-kicks), so get students to write it. Or add it to your repertoire for an in-depth interview.
In his comment of March 9, Michael Levy suggested a preliminary pass to find out how many digits the numeral has, then allocate, then the digits can be written from right to left, as in the following:


char* mrl_itoa(int n) {
  char* result;
  if (n == 0) {
    result = calloc(2,sizeof(char));
    result[0] = '0';
    return result;
  }
  int neg; if (neg = n0
  int m = n, numDgts = 0, k;
  do {m = m/10; numDgts++;} while (m > 0);
  // numDgts is the number of digits of positive n
  result =
    calloc(k = numDgts  // for digits of n
             + neg       // for sign, if present
             + 1         // for terminating null
           , sizeof(char));
  m = n;
  for(k = k-2; k >= neg; k--) {
    result[k] = '0' + m%10; m = m/10;
  }
  if (neg) result[0] = '-';
  return result;
}

But do we really have to make two passes? First, why do the digits seem to want to come out in the wrong order? The answer is another question: Why do we think that iteration is the only way to do repetition? There is another way: recursion. It is has the advantage in this case that we do have control over the order in which the digits are printed.

To make them come out the wrong way, write


void itoa(int n) {
  if (n == 0) return;
  print(n%10); itoa(n/10);
}

This reads like: to print out the digits in reverse order, print the last followed by the remaining digits in reverse order. Now it’s obvious what to do out about it:


void mitoa(int n) {
  if (n == 0) return;  mitoa(n/10); print(n%10);
}

This reads like: to print out the digits in order, print in order all digits except the last followed by the last digit. The attraction of this scheme is that the digits are immediately deposited in the right place.

Let us now take care of catching the digits in a string instead of printing them as soon as they are generated. mitoa() has the additional advantage that at the end of the recursion it is known how many digits there are. At that point we can allocate the right number of characters. We modify the function to return the location in the string where the next character is to be deposited. Or, rather, we define an auxiliary function to do this.


int mitoa1(int n, int depth, char** s) {
// write numeral from s[depth] onward
// return index for next digit
  if (n == 0) {
    *s = (char *) malloc(sizeof(char)*(depth+1));
    *(*s + depth) = '';
    return 0;
  }
  int i = mitoa1(n/10, ++depth, s);
  *(*s + i) = '0' + n%10;
  return i+1;
}
char* mitoa(int n) {
  char* t;
  mitoa1(n, 0, &t);
  return t;
}

Look: nice and short; no reversing. So far the pretty part. Now for the uglies: the minus sign in case of a negative number. Here I admit that writing the digits first in reverse order has an advantage: you can just stick the minus sign on at the end, before the reversal. In my approach the minus sign has to be deposited once and for all in the right place. So, for the first character I need to distinguish between printing a negative or a positive numeral. Another special case.


int mitoa1(int neg, int n, int depth, char** s) {
// write numeral from s[depth] onward
// return index for next digit
  if (n == 0) {
    if (neg) {
      *s = (char *) malloc(sizeof(char)*(depth+2));
      *(*s + depth+1) = '';
      return 1; // leave room for minus sign at index 0
    } else {
      *s = (char *) malloc(sizeof(char)*(depth+1));
      *(*s + depth) = '';
      return 0;
    }
  }
  int i = mitoa1(neg, n/10, ++depth, s);
  *(*s + i) = '0' + n%10;
  return i+1;
}
char* mitoa(int n) {  char* t;
// will point to the digits of n
  if (n == 0) { // the special case
    t = (char*) malloc(sizeof(char) * 2);
    t[0] = '0'; t[1] = '';
    return t;
  }   // the general case
  int neg = (n<0)?1:0;
  if (neg) n = -n;
  mitoa1(neg, n, 0, &t);
  if (neg) *t = '-';
  return t;
}

The following improvements have been implemented.

  1. The storage allocation is done by the function rather than by the user.
  2. The function gives its result as the returned value rather than as
    a modification to an output parameter.
  3. Exactly the right number of characters is allocated.
  4. The computed digits are put down in the right place once and for all.

But is this really a one-pass algorithm? It relies on finding the length of the numeral on the way down in the recursion and on writing the digits on the way back up. And a function call per digit may be more expensive than Michael Levy’s division per digit for finding the length.

One Response to “Rewriting a Utility”

  1. Michael Says:

    We assume that division is slow, but is it? Why not compute the size required directly? (Also, name the function so that the caller is alerted to release the memory):


    char* createStringFrom(int n)
    {
    int neg = n0) ++digits;
    if(neg)++digits;
    char* result = malloc(sizeof(char)*(digits+1));
    result[digits] = 0;
    int k;
    int e = neg?1:0;
    for(k=digits-1;k>=e;--k)
    {
    char c = '0' + n%10;
    n = n/10;
    result[k] = c;
    }
    if(neg)
    {
    result[0] = '-';
    }
    return result;
    }

Leave a Reply