How to Debug Shell Scripts

Posted on 7:00 PM by Bharathvn

Although this section does not contain a true script per se, it's a good place to spend a few pages talking about some of the basics of debugging and developing shell scripts, because it's a sure bet that bugs are going to creep in!

The best debugging strategy I have found is to build scripts incrementally. Some script programmers have a high degree of optimism that everything will work right the first time, but I find that starting small, on a modest scale, can really help move things along. Additionally, liberal use of echo statements to track variables, and using the -x flag to the shell for displaying debugging output, are quite useful. To see these in action, let's debug a simple number-guessing game.
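
As a minimal sketch of the -x flag mentioned above, tracing can be switched on for just the suspect lines with set -x and off again with set +x. The seed variable and its value are illustrative stand-ins and are not part of the game. While tracing is on, the shell writes each command to standard error after expansion, prefixed with a plus sign:

```shell
#!/bin/sh
# Illustrative only: trace a small calculation with set -x.

biggest=100
seed=4242                      # hypothetical stand-in for a process ID

set -x                         # start tracing; trace lines go to stderr
number=$(( $seed % $biggest ))
set +x                         # stop tracing

echo "number is $number"
```

A whole script can also be traced without editing it by passing the flag to the shell, as in sh -x hilow.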

The Code
#!/bin/sh
# hilow -- A simple number-guessing game

biggest=100 # maximum number possible
guess=0 # guessed by player
guesses=0 # number of guesses made
number=$(($$ % $biggest) # random number, between 1 and $biggest

while [ $guess -ne $number ] ; do
  echo -n "Guess? " ; read answer
  if [ "$guess" -lt $number ] ; then
    echo "... bigger!"
  elif [ "$guess" -gt $number ] ; then
    echo "... smaller!
  fi
  guesses=$(($guesses + 1))
done

echo "Right!! Guessed $number in $guesses guesses."

exit 0


Running the Script
The first step in debugging this game is to test and ensure that the number generated will be sufficiently random. To do this, we take the process ID of the shell in which the script is run, using the $$ notation, and reduce it to a usable range using the % (modulus) operator. To test the expression, enter the command into the shell directly:

$ echo $(($$ % 100))
5
$ echo $(($$ % 100))
5
$ echo $(($$ % 100))
5

It worked, but it's not very random. A moment's thought reveals why: when the command is run directly on the command line, $$ is the PID of the same interactive shell every time. When the expression is in a script, each run of the script starts a new shell process, so the PID varies from run to run.
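
A quick way to see the difference, as a sketch: each sh -c below starts a fresh shell process and reports its own PID, while $$ within a single shell stays fixed, even inside a command substitution. Note also that $$ % 100 produces values from 0 through 99. The pick_number helper is hypothetical and not part of the original script; it shows one way to shift the result into the range 1 through the maximum.

```shell
#!/bin/sh
# Sketch: where $$ changes and where it does not.

# Each sh -c is a separate process with its own PID.
sh -c 'echo "child shell PID: $$"'
sh -c 'echo "child shell PID: $$"'

# Inside one shell, $$ never changes, even in a command substitution.
here=$$
inside=$(echo $$)
echo "this shell: $here, inside \$( ): $inside"

# pick_number SEED BIGGEST (hypothetical helper)
# Maps any nonnegative integer seed into the range 1..BIGGEST.
pick_number() {
    echo $(( $1 % $2 + 1 ))
}

pick_number $$ 100
```

The PID is still a weak source of randomness, since consecutive runs often get nearby values, but it is enough for a toy game.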

The next step is to add the basic logic of the game. A random number between 1 and 100 is generated, the player makes guesses at the number, and after each guess the player is told whether the guess is too high or too low until he or she figures out what number it is. After entering all the basic code, it's time to run the script and see how it goes, using exactly the code just shown, warts and all:

$ hilow
./013-hilow.sh: line 19: unexpected EOF while looking for matching `"'
./013-hilow.sh: line 22: syntax error: unexpected end of file

Ugh; the bane of shell script developers: an unexpected EOF. To understand what this message means, recall that quoted passages can contain newlines, so just because the error is flagged on line 19 doesn't mean that it's actually there. It simply means that the shell read merrily along, matching quotes (incorrectly) until it hit the very last quote, at which point it realized something was amiss. In fact, line 19 is perfectly fine:

$ sed -n 19p hilow
echo "Right!! Guessed $number in $guesses guesses."

The problem, therefore, must be earlier in the script. The only really good thing about the error message from the shell is that it tells you which character is mismatched, so I'll use grep to try to extract all lines that have a quote and then screen out those that have two quotes:

$ grep '"' 013-hilow.sh | egrep -v '.*".*".*'
echo "... smaller!


That's it: The close quote is missing. It's easily fixed, and we're ready to go:

$ hilow
./013-hilow.sh: line 7: unexpected EOF while looking for matching `)'
./013-hilow.sh: line 22: syntax error: unexpected end of file

Nope. Another problem. Because there are so few parenthesized expressions in the script, I can eyeball this problem and see that the arithmetic expansion that assigns the random number, which should end with two closing parentheses, has lost one of them, as the following line shows:

number=$(( $$ % $biggest ) # random number, between 1 and $biggest

This is fixed by adding the closing parenthesis. Now are we ready to try this game? Let's find out:

$ hilow
Guess? 33
... bigger!
Guess? 66
... bigger!
Guess? 99
... bigger!
Guess? 100
... bigger!
Guess? ^C

Because 100 is the maximum possible value, there seems to be a logic error in the code. These errors are particularly tricky because there's no fancy grep or sed invocation to identify the problem. Look back at the code and see if you can identify what's going wrong.

To try to debug this, I'm going to add a few echo statements in the code to output the number chosen and verify that what I entered is what's being tested. The relevant section of the code is:

echo -n "Guess? " ; read answer
if [ "$guess" -lt $number ] ; then

In fact, as I modified the echo statement and looked at these two lines, I realized the error: The variable being read is answer, but the variable being tested is called guess. A bonehead error, but not an uncommon one (particularly if you have oddly spelled variable names). To fix this, I change read answer to read guess.
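
The kind of temporary echo statements described above might look like the following sketch. The debug function and the DEBUG variable are hypothetical additions, not part of the original script, and the fixed number and canned answer stand in for a real run. Writing to standard error keeps the diagnostics separate from the game's own output. With the mismatched variable names still in place, the message makes the problem visible: answer holds what was typed, while guess is still 0.

```shell
#!/bin/sh
# Sketch: temporary debug output for the guessing loop.
# debug() and DEBUG are illustrative names, not from the original script.

DEBUG=${DEBUG:-1}              # set DEBUG=0 to silence the messages

debug() {
    # Print to stderr so normal output stays clean.
    [ "$DEBUG" -eq 1 ] && echo "DEBUG: $*" >&2
    return 0
}

number=79                      # fixed value so the run is repeatable
guess=0

# Simulate one pass through the buggy loop with a canned answer.
answer=100
debug "number=$number answer=$answer guess=$guess"

if [ "$guess" -lt "$number" ] ; then
    echo "... bigger!"
fi
```

Once the bug is fixed, the debug calls can be left in place and silenced by running the script with DEBUG=0 in the environment.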

The Results
Finally, it works as expected.

$ hilow
Guess? 50
... bigger!
Guess? 75
... bigger!
Guess? 88
... smaller!
Guess? 83
... smaller!
Guess? 80
... smaller!
Guess? 77
... bigger!
Guess? 79
Right!! Guessed 79 in 7 guesses.
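
In hindsight, both syntax errors found earlier could have been flagged without starting the game at all. As a sketch, the -n option tells the shell to read and parse a script without executing any of its commands. The check_syntax function and the temporary file name are assumptions made for this illustration.

```shell
#!/bin/sh
# Sketch: syntax-check a script without running it.
# check_syntax is a hypothetical helper name.

check_syntax() {
    # sh -n parses the file but executes nothing; a nonzero
    # exit status means the parser found a problem.
    if sh -n "$1" 2>/dev/null ; then
        echo "ok"
    else
        echo "syntax error"
    fi
}

tmp=${TMPDIR:-/tmp}/hilow-check.$$

# A fragment with the same missing close quote as the game.
printf '%s\n' 'echo "... smaller!' 'echo done' > "$tmp"
check_syntax "$tmp"

# The same fragment with the quote restored.
printf '%s\n' 'echo "... smaller!"' 'echo done' > "$tmp"
check_syntax "$tmp"

rm -f "$tmp"
```

The first call should report a syntax error and the second should report ok. A clean result only means the script parses; logic errors such as the answer and guess mix-up still need tracing or echo statements to find.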

Hacking the Script
The most grievous bug lurking in this little script is that there's no checking of input. Enter anything at all other than an integer and the script spews up bits and fails. Including a rudimentary test could be as easy as adding the following lines of code:

if [ -z "$guess" ] ; then
echo "Please enter a number. Use ^C to quit"; continue;