Rewriting a Utility

I can’t leave well enough alone. The most recent manifestation of this problem is the following program from Kernighan and Ritchie (“The C Programming Language”), which is perfectly OK code for converting a number to the equivalent decimal numeral.

void itoa(int n, char s[]) {
  int i, sign;
  if ((sign = n) < 0) /* record sign */
  n = -n; /* make n positive */ i = 0;
  do { /* generate digits in reverse order */
    s[i++] = n % 10 + '0'; /* get next digit */ }
  while ((n /= 10) > 0);     /* delete it */
  if (sign < 0) s[i++] = '-';
  s[i] = '\0';
  reverse(s);
}

 

Presumably this requires something like the function below, which I list so that lengths of code can be compared.

void reverse(char s[]) {
  int i = 0, j = 0;
  while (s[j++] != '\0') ;
  j -= 2;
  while (i < j) {
    char temp = s[i];
    s[i] = s[j]; s[j] = temp;
    i++; j--;
  }
}

What is it that I don’t like about it? For one, it is left up to the user to supply a sufficiently long string. And wouldn’t it be more natural for a routine with one input and one output to deliver that output as the function value? As in

char* itoa(int n)

?

Then, the conversion is such that the digits come out in the wrong order, so that in a separate pass they have to be reversed. Moreover, reverse is called in such a way that, it seems to me, it has to make two passes over the string.

These thoughts need to be quelled right away: OK, we have to guess at a suitable length for the numeral, but one can hardly guess very wrong: after all, an int can only have a very limited range of sizes. And, as for the digits coming out in the wrong order, isn’t it a convenient order and isn’t it a good problem decomposition to have them come out in the convenient order first and use the easiest possible piece of code to obtain the desired order?

Yes, indeed I should have left it alone. One should use the most un-subtle solution to a simple problem, and get on with the job. But just before conceding the point, I would like to point out a subtlety in this supposedly straightforward code. The convention for numerals is not to have leading zeros. This rule has an exception: the numeral for the integer zero does have a leading zero, otherwise it would be the empty string.

The empty string is the mathematically correct representation of the number zero. The problem is that we don’t have a symbol for the empty string. Through the recent centuries the public learned to accept the mathematical subtlety of a digit to represent zero (“nothing”), but unfortunately stopped short of accepting the analogous need for a symbol for an empty string. When you write the usual “while … do” in itoa you will get the empty string for the integer zero. Only by using the unusual “do … while” do you get the required exception. In the interest of straightforward coding, I would prefer to catch the exceptional case up front and let the rest of the code follow what is natural from a mathematical point of view.

I rewrote itoa to accommodate the above gripes, knowing that I can’t charge the time so spent to the Project. The result may strike you as unnecessary show-off: harder thinking than necessary for a task that can be as simple as demonstrated by Kernighan and Ritchie. Compare it to a flip-kick in soccer: necessary in tight situations, with success meriting a big applause. But if a player would do one without due necessity, he would deserve a Boo. What I’m doing in the sequel is like an uncalled-for flip-kick. It is educational (ambitious soccer players practice flip-kicks), so get students to write it. Or add it to your repertoire for an in-depth interview.

In his comment of March 9, Michael Levy suggested a preliminary pass to find out how many digits the numeral has, then allocate, then the digits can be written from right to left, as in the following:

char* mrl_itoa(int n) {
  char* result;
  if (n == 0) {
    result = calloc(2,sizeof(char));
    result[0] = '0';
    result[1] = '\0';
    return result;
  }
  int neg = 0>n; if (neg) n = -n;
  int m = n, numDgts = 0, k;
  do {m = m/10; numDgts++;} while (m > 0);
  // numDgts is the number of digits of positive n
  result =
    calloc(k = numDgts  // for digits of n
           + neg        // for sign, if present
           + 1          // for terminating null
          , sizeof(char));
  m = n;
  for(k = k-2; k >= neg; k--) {
    result[k] = '0' + m%10; m = m/10;
  }
  if (neg) result[0] = '-';
  return result;
}

But do we really have to make two passes? First, why do the digits seem to want to come out in the wrong order? The answer is another question: Why do we think that iteration is the only way to do repetition? There is another way: recursion. It has the advantage in this case that we have control over the order in which the digits are printed.

To make them come out the wrong way, write

void itoa(int n) {
  if (n == 0) return;
  print(n%10); itoa(n/10);
}

This reads like: to print out the digits in reverse order, print the last followed by the remaining digits in reverse order. Now it’s obvious what to do about it:

void mitoa(int n) {
  if (n == 0) return;
  mitoa(n/10); print(n%10);
}

This reads like: to print out the digits in order, print in order all digits except the last followed by the last digit. The attraction of this scheme is that the digits are immediately deposited in the right place.

Let us now take care of catching the digits in a string instead of printing them as soon as they are generated. mitoa() has the additional advantage that at the end of the recursion it is known how many digits there are. At that point we can allocate the right number of characters. We modify the function to return the location in the string where the next character is to be deposited. Or, rather, we define an auxiliary function to do this.

int mitoa1(int n, int depth, char** s) {
// write numeral from s[depth] onward
// return index for next digit
  if (n == 0) {
    *s = (char *) malloc(sizeof(char)*(depth+1));
    *(*s + depth) = '\0';
    return 0;
  }
  int i = mitoa1(n/10, ++depth, s);
  *(*s + i) = '0' + n%10;
  return i+1;
}
char* mitoa(int n) {
  char* t;
  mitoa1(n, 0, &t);
  return t;
}

Look: nice and short; no reversing. So far the pretty part. Now for the uglies: the minus sign in case of a negative number. Here I admit that writing the digits first in reverse order has an advantage: you can just stick the minus sign on at the end, before the reversal. In my approach the minus sign has to be deposited once and for all in the right place. So, for the first character I need to distinguish between printing a negative or a positive numeral. Another special case.

int mitoa1(int neg, int n, int depth, char** s) {
// write numeral from s[depth] onward
// return index for next digit
  if (n == 0) {
    if (neg) {
      *s = (char *) malloc(sizeof(char)*(depth+2));
      *(*s + depth+1) = '\0';
      return 1; // leave room for minus sign at index 0
    } else {
      *s = (char *) malloc(sizeof(char)*(depth+1));
      *(*s + depth) = '\0';
      return 0;
    }
  }
  int i = mitoa1(neg, n/10, ++depth, s);
  *(*s + i) = '0' + n%10;
  return i+1;
}
char* mitoa(int n) { char* t;
// will point to the digits of n
  if (n == 0) { // the special case
    t = (char*) malloc(sizeof(char) * 2);
    t[0] = '0'; t[1] = '\0';
    return t;
  }   // the general case
  int neg = (0>n)?1:0;
  if (neg) n = -n;
  mitoa1(neg, n, 0, &t);
  if (neg) *t = '-';
  return t;
}

The following improvements have been implemented.

  1. The storage allocation is done by the function rather than by the user.
  2. The function gives its result as the returned value rather than as a modification to an output parameter.
  3. Exactly the right number of characters is allocated.
  4. The computed digits are placed in the right spot once and for all.
  5. The anomalous status of the integer zero has been made explicit.

But is this really a one-pass algorithm? It relies on finding the length of the numeral on the way down in the recursion and on writing the digits on the way back up. And a function call per digit may be more expensive than Levy’s division per digit for finding the length.

2 Responses to “Rewriting a Utility”

  1. Michael Says:

    We assume that division is slow, but is it? Why not compute the size required directly? (Also, name the function so that the caller is alerted to release the memory):


    char* createStringFrom(int n)
    {
    int neg = n0) ++digits;
    if(neg)++digits;
    char* result = malloc(sizeof(char)*(digits+1));
    result[digits] = 0;
    int k;
    int e = neg?1:0;
    for(k=digits-1;k>=e;--k)
    {
    char c = '0' + n%10;
    n = n/10;
    result[k] = c;
    }
    if(neg)
    {
    result[0] = '-';
    }
    return result;
    }

  2. Yigal rachman Says:

    Elegant indeed!

    My one reservation is your use of malloc, which is, IMHO, an invitation to memory leaks, because a lot of callers are going to forget to call ‘free’, often because the result is passed to another place in the program where it is soon impossible to know when its memory becomes reclaimable. I have no doubt that this was a consideration when K and R coded their version.

    With a modern c compiler (but not with the K and R versions), one can instead return by *value* a struct containing the result, particularly as the maximum size is known and is just a few bytes. I have often used this method and it works very elegantly. As a bonus, it is very fast.

    The really serious beef I have with the K and R version is that it is highly unsafe because, as with many of their original library functions, the caller cannot pass the size of the buffer to the function, so the function writes into memory that is presumed to be occupied by the buffer, but has no way of verifying when it has reached its limit, which, when exceed, will cause all hell to break loose, often leaving behind so much corruption that the originating bug is almost impossible to find.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s


%d bloggers like this: