How to Normalize Date Formats Using a Shell Script

Posted on 7:10 PM by Bharathvn

One problematic issue with shell script development is the number of inconsistent data formats; normalizing them can range from tricky to quite difficult. Date formats are some of the most challenging to work with because a date can be specified in several different ways. Even if you prompt for a specific format, like "month day year," you're likely to be given inconsistent input: a month number instead of a month name, an abbreviation for a month name, or a full name in all uppercase letters.

For this reason, a function that normalizes dates, though rudimentary on its own, will prove to be a very helpful building block for subsequent script work, especially Script #7, Validating Date Formats.

The Code
#!/bin/sh
# normdate -- Normalizes month field in date specification
# to three letters, first letter capitalized. A helper
# function for Script #7, valid-date. Exits w/ zero if no error.

monthnoToName()
{
  # Sets the variable 'month' to the appropriate value
  case $1 in
    1 ) month="Jan" ;; 2 ) month="Feb" ;;
    3 ) month="Mar" ;; 4 ) month="Apr" ;;
    5 ) month="May" ;; 6 ) month="Jun" ;;
    7 ) month="Jul" ;; 8 ) month="Aug" ;;
    9 ) month="Sep" ;; 10) month="Oct" ;;
    11) month="Nov" ;; 12) month="Dec" ;;
    * ) echo "$0: Unknown numeric month value $1" >&2; exit 1
  esac
  return 0
}

## Begin main script

if [ $# -ne 3 ] ; then
  echo "Usage: $0 month day year" >&2
  echo "Typical input formats are August 3 1962 and 8 3 2002" >&2
  exit 1
fi

# Reject one- and two-digit years (99 included)
if [ "$3" -le 99 ] ; then
  echo "$0: expected four-digit year value." >&2; exit 1
fi

if [ -z $(echo $1|sed 's/[[:digit:]]//g') ]; then
  monthnoToName "$1"
else
  # Normalize to first three letters, first upper, rest lowercase
  month="$(echo $1|cut -c1|tr '[:lower:]' '[:upper:]')"
  month="$month$(echo $1|cut -c2-3 | tr '[:upper:]' '[:lower:]')"
fi

echo "$month $2 $3"

exit 0

How It Works
Notice the third conditional in this script:

if [ -z $(echo $1|sed 's/[[:digit:]]//g') ]; then

It strips out all the digits and then uses the -z test to see whether the result is blank. If it is, the first input field must consist entirely of digits, so it's mapped to a month name with a call to monthnoToName. Otherwise, a sequence of cut and tr pipes builds the value of month from two command substitutions (that is, sequences surrounded by $( and ) so that the enclosed command is invoked and its output substituted). The first substitution extracts just the first character and forces it to uppercase with tr. (The sequence echo $1|cut -c1 could also be written as ${1%${1#?}} in the POSIX manner, as seen earlier.) The second extracts the second and third characters and forces them to lowercase:

month="$(echo $1|cut -c1|tr '[:lower:]' '[:upper:]')"
month="$month$(echo $1|cut -c2-3 | tr '[:upper:]' '[:lower:]')"
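
As a rough illustration of the two ideas above, here is a minimal sketch that tests for an all-digit argument with a case pattern (a swapped-in technique, not the sed approach the script uses) and pulls out the first character with the ${1%${1#?}} expansion instead of cut -c1. The helper names is_numeric and capitalize3 are made up for this example and are not part of normdate. Unlike the script's -z test, is_numeric also rejects an empty string.

```shell
#!/bin/sh
# Sketch only: is_numeric and capitalize3 are hypothetical helper names,
# not part of the normdate script.

is_numeric()
{
  # Succeeds only for a non-empty string made up entirely of digits.
  case "$1" in
    '' | *[!0-9]* ) return 1 ;;
    * ) return 0 ;;
  esac
}

capitalize3()
{
  # ${1#?} is $1 without its first character; removing that suffix
  # from $1 with % leaves only the first character.
  first="${1%"${1#?}"}"
  rest="$(printf '%s\n' "$1" | cut -c2-3)"
  first="$(printf '%s\n' "$first" | tr '[:lower:]' '[:upper:]')"
  rest="$(printf '%s\n' "$rest" | tr '[:upper:]' '[:lower:]')"
  printf '%s%s\n' "$first" "$rest"
}

for arg in AUGUST march 8 ; do
  if is_numeric "$arg" ; then
    echo "$arg: numeric month"
  else
    echo "$arg: $(capitalize3 "$arg")"
  fi
done
```

Quoting "$1" throughout keeps the helpers safe if an argument happens to contain spaces or wildcard characters.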

Running the Script
To ensure maximum flexibility with future scripts that incorporate the normdate functionality, this script was designed to accept input as three fields entered on the command line. If you expected to use this script only interactively, by contrast, you'd prompt the user for the three fields, though that would make it more difficult to invoke normdate from other scripts.
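
To show why command-line arguments make the script easy to reuse, here is a small sketch of a caller that captures the normalized date and checks the exit status. The report_date function is a hypothetical caller, and the normdate function defined here is only a stand-in so the example runs by itself; in practice you would delete the stand-in and let the shell find the real normdate script on your PATH.

```shell
#!/bin/sh
# Sketch of a caller. The normdate function below is a stand-in so that
# this block runs on its own; it is NOT the real script and handles
# only a few numeric months.

normdate()
{
  case "$1" in
    1 ) m="Jan" ;; 8 ) m="Aug" ;; 12 ) m="Dec" ;;
    * ) echo "normdate: Unknown numeric month value $1" >&2; return 1 ;;
  esac
  echo "$m $2 $3"
}

# report_date is a hypothetical caller: it captures the normalized
# output and checks the exit status before using the result.
report_date()
{
  if normalized="$(normdate "$1" "$2" "$3")" ; then
    echo "Normalized: $normalized"
  else
    echo "Could not normalize: $1 $2 $3" >&2
    return 1
  fi
}

report_date 8 3 1962
report_date 13 3 1962 || echo "Rejected as expected"
```

Because the error messages go to standard error, the command substitution captures only the normalized date, and a nonzero exit status tells the caller not to trust the result.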

The Results
This script does what we hoped, normalizing date formats as long as the input meets a relatively simple set of criteria (month name known, month value between 1 and 12, and a four-digit year value). For example:

$ normdate 8 3 62
normdate: expected four-digit year value.
$ normdate 8 3 1962
Aug 3 1962
$ normdate AUGUST 3 1962
Aug 3 1962

Hacking the Script
Before you get too excited about the many extensions you can add to this script to make it more sophisticated, check out Script #7, which uses normdate to validate input dates. One modification you could make, however, would be to allow the script to accept dates in the format MM/DD/YYYY or MM-DD-YYYY by adding the following snippet immediately before the test to see if three arguments are specified:

if [ $# -eq 1 ] ; then # try to compensate for / or - formats
  set -- $(echo $1 | sed 's/[\/\-]/ /g')
fi

With this modification, you could also enter the following common formats and normalize them too:

$ normdate March-11-1911
Mar 11 1911
$ normdate 8/3/1962
Aug 3 1962
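
To see what the set -- step does in isolation, here is a minimal sketch that splits a single argument into three positional parameters and prints them. The split_date name is made up for this example, and the sed command is an equivalent form of the substitution written with | as the delimiter so that the slash needs no escaping.

```shell
#!/bin/sh
# Sketch: split a single MM/DD/YYYY or MM-DD-YYYY argument into three
# positional parameters. split_date is a hypothetical name used only here.

split_date()
{
  if [ $# -eq 1 ] ; then
    # Replace each / or - with a space, then let the shell re-split the
    # result into $1, $2, and $3. The substitution is deliberately
    # left unquoted so that word splitting happens.
    set -- $(printf '%s\n' "$1" | sed 's|[/-]| |g')
  fi
  echo "fields=$# month=$1 day=$2 year=$3"
}

split_date 8/3/1962
split_date March-11-1911
split_date 8 3 1962
```

All three calls report three fields, so the rest of the script can carry on as if the date had been typed as separate arguments.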