Experts Exchange: How to Validate Input: Alphanumeric Only Using a Shell Script

Users are constantly ignoring directions and entering data that's inconsistent or incorrectly formatted, or that uses incorrect syntax. As a shell script developer, you need to intercept and correct these errors before they become problems.

A typical situation you may encounter in this regard involves filenames or database keys. You prompt the user for a string that's supposed to be made up exclusively of uppercase characters, lowercase characters, and digits. No punctuation, no special characters, no spaces. Did they enter a valid string or not? That's what this script tests.

The Code
#!/bin/sh
# validAlphaNum - Ensures that input consists only of alphabetical
# and numeric characters.

validAlphaNum()
{
  # Validate arg: returns 0 if all upper+lower+digits, 1 otherwise

  # Remove all unacceptable chars
  compressed="$(printf '%s\n' "$1" | sed -e 's/[^[:alnum:]]//g')"

  # Compare against the argument itself, not the caller's variable
  if [ "$compressed" != "$1" ] ; then
    return 1
  else
    return 0
  fi
}

# Sample usage of this function in a script

# printf is used for the prompt because echo -n is not portable under /bin/sh
printf "Enter input: "
read input

if ! validAlphaNum "$input" ; then
  echo "Your input must consist of only letters and numbers." >&2
  exit 1
else
  echo "Input is valid."
fi

exit 0

How It Works
The logic of this script is straightforward. First, it runs the input through sed to create a filtered copy, and then it compares that copy with the original. If the two versions are the same, all is well. If not, the filter removed characters that weren't part of the acceptable alphanumeric (alphabetic plus numeric) character set, and the input was unacceptable.

Specifically, the sed substitution deletes any character not in the set [:alnum:], the POSIX shorthand for the current locale's definition of all upper- and lowercase letters and digits (alnum stands for alphanumeric). If this new, compressed value doesn't match the value passed to the function, the removal of the nonalphanumeric characters has changed the string, which means the input contained illegal characters, and the function returns a nonzero result, indicating a problem.
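To make the filter-and-compare step concrete, the following minimal sketch prints what sed leaves behind for a few sample strings. The helper name strip_nonalnum and the sample values are illustrative assumptions, not part of the original script.

```shell
#!/bin/sh
# Sketch: show what the sed filter leaves behind for a few sample inputs.
# strip_nonalnum is a hypothetical helper name used only for illustration.

strip_nonalnum()
{
  # Delete every character outside the locale's alphanumeric set
  printf '%s\n' "$1" | sed -e 's/[^[:alnum:]]//g'
}

for sample in "valid123SAMPLE" "not valid, 12345" "under_score" ; do
  stripped="$(strip_nonalnum "$sample")"
  if [ "$stripped" = "$sample" ] ; then
    echo "'$sample' -> '$stripped' (unchanged, so valid)"
  else
    echo "'$sample' -> '$stripped' (changed, so invalid)"
  fi
done
```

Only the first sample survives the filter unchanged; the other two lose their spaces, comma, and underscore, which is exactly what the comparison detects.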

Running the Script
This particular script is self-contained. It prompts for input and then informs you whether the input is valid or not. A more typical use of this function, however, would be to include it at the top of another shell script or in a library, as shown in Script #12, Building a Shell Script Library.
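One common way to reuse a function like this is to keep it in a separate file and read it into a script with the dot command. The sketch below assumes a hypothetical library file name (validlib.sh) and writes a temporary copy only so that the example runs on its own; it is not necessarily how Script #12 organizes its library.

```shell
#!/bin/sh
# Sketch: keep validAlphaNum in a library file and source it from a script.
# The name validlib.sh is hypothetical; a temporary copy is created here
# only to keep the example self-contained.

lib="${TMPDIR:-/tmp}/validlib.$$.sh"

cat > "$lib" << 'EOF'
validAlphaNum()
{
  compressed="$(printf '%s\n' "$1" | sed -e 's/[^[:alnum:]]//g')"
  [ "$compressed" = "$1" ]
}
EOF

# The dot command reads the function definitions into the current shell
. "$lib"
rm -f "$lib"

if validAlphaNum "key42" ; then
  echo "key42 is valid"
fi
```

In a real project the library would live at a fixed path, and each script would need only the single dot line.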

This script is a good example of a general shell script programming technique: write your functions and then test them before you integrate them into larger, more complex scripts. It'll save lots of headaches.

The Results
$ validalnum
Enter input: valid123SAMPLE
Input is valid.
$ validalnum
Enter input: this is most assuredly NOT valid, 12345
Your input must consist of only letters and numbers.

Hacking the Script
This "remove the bad characters and then see whether anything changed" approach is nice because it's tremendously flexible. Want to force uppercase letters but also allow spaces, commas, and periods? Simply change the substitution pattern:

sed 's/[^[:upper:] ,.]//g'

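A simple test for valid phone number input (allowing digits, spaces, parentheses, and dashes) could be

sed 's/[^[:digit:]() -]//g'

Note that the dash comes last inside the brackets so that it is taken literally rather than as part of a range.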
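Because only the set of allowed characters changes between these variations, the test can be wrapped in one function that takes the set as a parameter. This is a sketch under stated assumptions: validChars is a hypothetical name, and its second argument is assumed to be the contents of a bracket expression that contains no slash or closing bracket and places any dash last.

```shell
#!/bin/sh
# Sketch: one validator, with the allowed character set passed as $2.
# validChars is a hypothetical helper name. Assumption: $2 holds valid
# bracket-expression contents with no "/" or "]" and any "-" placed last.

validChars()
{
  # $1 = string to test, $2 = allowed characters (bracket-expression contents)
  filtered="$(printf '%s\n' "$1" | sed -e "s/[^$2]//g")"
  [ "$filtered" = "$1" ]
}

# Uppercase letters plus spaces, commas, and periods
validChars "HELLO, WORLD." "[:upper:] ,." && echo "uppercase text accepted"

# Digits, parentheses, spaces, and dashes for a phone number
validChars "(555) 123-4567" "[:digit:]() -" && echo "phone number accepted"
validChars "555.1234" "[:digit:]() -" || echo "phone number with a period rejected"
```

Like the original function, this checks only which characters appear, not their order, so a string such as ")(--" would still pass the phone number test.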

To force integer values only, though, beware of a pitfall. You might try the following:

sed 's/[^[:digit:]]//g'

But what if you want to permit entry of negative numbers? If you just add the minus sign to the valid character set, -3-4 would be a valid input, though it's clearly not a legal integer. The particular issue of handling negative numbers is addressed in Script #5, Validating Integer Input.
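One possible way around the pitfall, sketched here without claiming it is the approach taken in Script #5, is to strip a single leading minus sign first and then apply the digits-only filter to what remains. The function name validInt and the sample values are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: accept an optional leading minus sign followed only by digits.
# validInt is a hypothetical name; this is not necessarily Script #5's method.

validInt()
{
  number="${1#-}"                 # drop one leading minus sign, if present
  [ -n "$number" ] || return 1    # reject an empty string or a lone "-"
  digits="$(printf '%s\n' "$number" | sed -e 's/[^[:digit:]]//g')"
  [ "$digits" = "$number" ]       # any leftover "-" makes the two differ
}

for candidate in "42" "-42" "-3-4" "--5" "-" ; do
  if validInt "$candidate" ; then
    echo "'$candidate' is a valid integer"
  else
    echo "'$candidate' is not a valid integer"
  fi
done
```

With this ordering, a minus sign is tolerated only in the first position, so -3-4 is rejected while -42 is accepted. The sketch does not handle a leading plus sign or surrounding whitespace.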