The Joys of Character Processing
(or, Why Character Processing is Annoying)
 
 

Background

Characters such as letters, punctuation marks, and digits are represented using what is known as the ASCII character set.  ASCII assigns each character a unique number from 0 .. 127.  Many systems extend the set to 0 .. 255, but the codes above 127 are not part of standard ASCII and differ from system to system.  For example, the number 48 is used to represent the digit '0', 65 represents the character 'A', and 97 represents the character 'a'.  A full listing of the first 128 characters of the ASCII character set can be found in Appendix D on page 891 of the textbook.

Formatting characters are also represented in the ASCII character set.  The blank space character is ASCII 32, the tab character is ASCII 9, and the carriage return is ASCII 13.  (On most systems a C program actually receives the <enter> key as the newline character '\n', which is ASCII 10.  This handout uses "carriage return" loosely to mean whatever character the <enter> key produces.)  When you look at a text file, every symbol you see and every non-visible formatting effect is represented by an ASCII character.  In what follows, we assume that data is being read from the keyboard.

Every keystroke that you type is recorded in an area of memory called the keyboard buffer.  This is where data is held until a program needs it.  When an input function such as getchar or scanf is called in a C program, the program first checks the keyboard buffer to see if any data is present.  If data is present, the program takes the data it needs out of the keyboard buffer, and the input request is satisfied.  If, however, there is no data in the buffer, the program will pause and wait for the user to type something.  Some of you may have experienced a situation where you are typing so quickly that you "get ahead" of a program - you are typing characters faster than the program can process them.  This might occur when entering data into a database, for example.  The characters you are typing are being stored in the keyboard buffer until the program is able to process them.
 

Character Data

There are two types of data that are input into a program:  numeric data and text (character) data.  Reading numeric data is easy, because when the program is told to retrieve numeric data, it will ignore white space to find the numbers that have been typed.  Formatting characters such as tabs, carriage returns, and blank spaces are considered white space.  So, for example, if your program contains the command

scanf("%d %f",&i, &x);

you could type: 15<space>7.8<enter> and the integer 15 would be assigned to i, and 7.8 would be assigned to x.  You could also type: 15<tab>7.8<enter> and you would have the same result.   Or you could even type: 15<enter>7.8<enter> and it would still work.  The scanf function is ignoring the white space in search of the numbers you have told it to read.

However, reading text data is a different story.  In particular, difficulties arise when reading text data one character at a time.  Since the carriage return is itself a character, when you type 15<enter> you have actually typed one number (15) and one character (the carriage return).  If you are only retrieving numeric data, then the carriage return is ignored, since white space is ignored.  However, if you are retrieving character data, the carriage return counts as a character and will be read as input.  You will also encounter problems if you are reading both numeric and character data in the same program.  You cannot avoid the problem by not hitting the <enter> key; the <enter> key must be hit to let the program know that data is available to be read.  Luckily, as long as you understand when you will have the problem, it is very easy to deal with.
 

When Does This Occur?

The key is to understand when the problem will occur.  Any time you type the <enter> key, you have typed the ASCII carriage return character.  As was said a moment ago, you cannot avoid hitting the <enter> key.  So you have to make sure your program does not interpret the <enter> key as a valid character input.  Whenever character input follows any other kind of input (either numeric or character), this problem is going to occur.  In order for your program to recognize the first input, you had to hit the <enter> key.  But your program will then think that the carriage return has been supplied as the character input that followed the first input.  To illustrate, let's look at the following  program.

 
#include <stdio.h>

int main(void)
{
   int i;
   char ch;

   printf("Enter a number:  ");
   scanf("%d",&i);      /* read an integer */
   printf("Now enter a character: ");
   scanf("%c",&ch);     /* read a single character */
   printf("You entered the number %d and the character %c\n",i,ch);
   return 0;
}

When this program is run, the user will be prompted to enter a number.  The scanf function then causes the program to wait for the user to type something (say, for example, the user enters 15).  From looking at the program, you would expect that the user will next be prompted to enter a character.  The line "Now enter a character" will appear on the screen, but the program will not pause to allow the user to enter anything.  It will keep right on going and display the final line of output, which will look like this: You entered the number 15 and the character
followed by a line break, because the character that was read is the one produced by the <enter> key.

Why has this occurred?  Because the user provided a character by typing: 15<enter>.  Thus, when the second scanf is encountered, there is already a character sitting in the keyboard buffer.  The scanf function will retrieve the character out of the keyboard buffer and thus be "fed", so there is no need to pause and wait for the user to type something.
 

How Do We Fix This?

You need to eliminate the extra characters that result when the user hits the enter key.  In a sense you tell your program to read those characters so that they are eliminated from the keyboard buffer.  You can do this in a few different ways:

Method #1: Use the assignment suppression option with scanf
When using scanf, there is an option that tells the function to read a character or a number as usual, but to discard it instead of storing it in a variable.  This is called assignment suppression, and it is written by placing a * between the % and the conversion letter:

 scanf("%d%*c",&i);

This command will read one integer and assign it to the variable i.  It will also read and discard whatever ASCII character directly follows the integer in the keyboard buffer.  Thus, when the user types: 15<enter> the carriage return is discarded.  Note that exactly one character is discarded, so this works when <enter> is typed immediately after the number.

Method #2: Use a dummy character variable
Simply declare a dummy character variable (for example, char dummy;), then store the carriage return in that dummy variable.  The only cost is that the dummy variable uses a small amount of memory to hold a value you will never use.

scanf("%d%c",&i,&dummy);



You will need to use one of these techniques every time your program performs character input after any other kind of input.  So you may need to make use of this technique several times in a program.  You will never need to worry about this if your program requires only numeric input.