#!/bin/sh
# spelldict - Uses the 'aspell' feature and some filtering to allow easy
# command-line spell-checking of a given input file.
# Inevitably you'll find that there are words it flags as wrong but
# you think are fine. Simply save them in a file, one per line, and
# ensure that the variable 'okaywords' points to that file.
okaywords="$HOME/.okaywords"
tempout="/tmp/spell.tmp.$$"
spell="aspell" # tweak as needed
trap '/bin/rm -f "$tempout"' EXIT
if [ -z "$1" ] ; then
  echo "Usage: spelldict file ..." >&2; exit 1
elif [ ! -f "$okaywords" ] ; then
  echo "No personal dictionary found. Create one and rerun this command" >&2
  echo "Your dictionary file: $okaywords" >&2
  exit 1
fi
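for filename
do
  # Drop the version banner, strip apostrophes, keep the flagged word from
  # each report line, then remove dictionary words, acronyms, and digits.
  $spell -a < "$filename" | \
    grep -v '@(#)' | sed "s/'//g" | \
    awk '{ if (length($0) > 15 && length($2) > 2) print $2 }' | \
    grep -vif "$okaywords" | \
    grep '[[:lower:]]' | grep -v '[[:digit:]]' | sort -u | \
    sed 's/^/ /' > "$tempout"
  if [ -s "$tempout" ] ; then
    # Prefix each word with the filename; printf copes with slashes in paths.
    while IFS= read -r word ; do
      printf '%s: %s\n' "$filename" "$word"
    done < "$tempout"
  fi
done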
exit 0
How It Works
Following the model of the Microsoft Office spell-checking feature, this script not only supports a user-defined dictionary of correctly spelled words that the spell-checking program would otherwise think are wrong, but also ignores words that are in all uppercase (because they're probably acronyms) and words that contain a digit.
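The three filters can be tried on their own, away from the spell checker. The following is a minimal sketch rather than part of the script: the function name filter_candidates and the sample words are made up for illustration, and the dictionary is a scratch file created just for the demonstration.

```shell
#!/bin/sh
# filter_candidates: hypothetical helper for illustration only.
# Reads one word per line on stdin and applies the same three filters
# the script uses. $1 is a dictionary file with one word per line.
filter_candidates() {
  grep -vif "$1" |            # drop words listed in the dictionary (any case)
    grep '[[:lower:]]' |      # keep only words with at least one lowercase letter
    grep -v '[[:digit:]]' |   # drop words containing a digit
    sort -u
}

# Demonstration with a scratch dictionary holding a single entry.
dict=$(mktemp)
printf '%s\n' 'Gryphon' > "$dict"
printf '%s\n' 'herrself' 'NASA' 'x86abc' 'gryphon' 'clamour' |
  filter_candidates "$dict"
rm -f "$dict"
```

Of the five sample words, only clamour and herrself survive: gryphon matches the dictionary entry regardless of case, NASA has no lowercase letter, and x86abc contains digits. Note that grep -f matches substrings, so a dictionary entry also suppresses any longer word that contains it.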
This particular script is written to use aspell, which interprets the -a flag to mean that it's running in pass-through mode, in which it reads stdin for words, checks them, and writes a result for each one (for example, a lone * for a word it accepts and a line beginning with & that names a word it believes is misspelled). The script's grep, sed, and awk filters then keep only the words flagged as misspelled. The ispell command also requires the -a flag, and many other spell-check commands are smart enough to automatically ascertain that stdin isn't the keyboard and therefore should be scanned. If you have a different spell-check utility on your system, read the man page to identify which flag or flags are necessary.
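To see what the first half of the pipeline does with the checker's replies, it can be run against canned text. This is a sketch under stated assumptions: the sample below is hand-written in the ispell-compatible reply format (a banner line containing @(#), a lone * for an accepted word, and & word count offset: suggestions for a flagged one), it is not captured aspell output, and the names sample_output and extract_misses are hypothetical.

```shell
#!/bin/sh
# sample_output: hand-written stand-in for the replies of "aspell -a".
# The banner text, counts, and offsets are illustrative, not real output.
sample_output() {
  cat <<'EOF'
@(#) International Ispell Version 3.1.20 (but really Aspell 0.60.8)
*
& herrself 3 10: herself, her self, her-self
*
& xz 3 40: ex, oz, zed
& teh 3 25: the, tea, ten
EOF
}

# extract_misses: hypothetical name for the first three pipeline stages.
extract_misses() {
  grep -v '@(#)' |     # drop the version banner
    sed "s/'//g" |     # strip apostrophes
    awk '{ if (length($0) > 15 && length($2) > 2) print $2 }'
}

sample_output | extract_misses
```

The lone * lines are too short to pass the length($0) > 15 test, and xz is dropped because the flagged word must be longer than two characters, which leaves herrself and teh.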
Running the Script
This script requires one or more filenames to be specified on the command line.
The Results
First off, with an empty personal dictionary and the excerpt from Alice in Wonderland seen previously, here's what happens:
$ spelldict ragged.txt
ragged.txt: herrself
ragged.txt: teacups
ragged.txt: Gryphon
ragged.txt: clamour
Two of those are not misspellings, so I'm going to add them to my personal spelling dictionary by using the echo command to append them to the okaywords file:
$ echo "Gryphon" >> ~/.okaywords
$ echo "teacups" >> ~/.okaywords
Here are the results of checking the file with the expanded spelling dictionary:
$ spelldict ragged.txt
ragged.txt: herrself
ragged.txt: clamour
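Appending with echo works, but running it twice for the same word leaves duplicate entries in the dictionary. One possible refinement, sketched here with a hypothetical helper named add_okayword that is not part of the original script, is to append a word only when no identical line is already present:

```shell
#!/bin/sh
# add_okayword: hypothetical helper, not part of the original script.
# Appends word $2 to dictionary file $1 unless an identical line
# (ignoring case) is already there.
add_okayword() {
  dict="$1"
  word="$2"
  touch "$dict"    # make sure the file exists before searching it
  grep -qixF -- "$word" "$dict" || printf '%s\n' "$word" >> "$dict"
}

# Demonstration against a scratch file rather than the real dictionary.
scratch="${TMPDIR:-/tmp}/okaywords.demo.$$"
add_okayword "$scratch" Gryphon
add_okayword "$scratch" teacups
add_okayword "$scratch" gryphon    # skipped: already listed, ignoring case
cat "$scratch"
rm -f "$scratch"
```

In everyday use the first argument would be the file that the okaywords variable points to.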