As a result, the script must cache the returned page and then examine it in two steps: first to see whether it contains a list of matches and, if it does not, a second time to treat it as a summary of the film in question.
The Code
#!/bin/sh
# moviedata - Given a movie title, returns a list of matches, if
# there's more than one, or a synopsis of the movie if there's
# just one. Uses the Internet Movie Database (imdb.com).
imdburl="http://us.imdb.com/Tsearch?restrict=Movies+only&title="
titleurl="http://us.imdb.com/Title?"
tempout="/tmp/moviedata.$$"
summarize_film()
{
# Produce an attractive synopsis of the film
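# Title line: assumes the page title sits on a line starting with <title>
grep "^<title>" $tempout | sed 's/<[^>]*>//g'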
grep 'Plot Outline:' $tempout | \
sed 's/<[^>]*>//g;s/(more)//;s/(view trailer)//' |fmt|sed 's/^/ /'
exit 0
}
trap "rm -f $tempout" 0 1 15
if [ $# -eq 0 ] ; then
echo "Usage: $0 {movie title | movie ID}" >&2
exit 1
fi
fixedname="$(echo $@ | tr ' ' '+')" # for the URL
if [ $# -eq 1 ] ; then
nodigits="$(echo $1 | sed 's/[[:digit:]]*//g')"
if [ -z "$nodigits" ] ; then
lynx -source "$titleurl$fixedname" > $tempout
summarize_film
fi
fi
url="$imdburl$fixedname"
lynx -source $url > $tempout
if [ ! -z "$(grep "IMDb title search" $tempout)" ] ; then
grep 'HREF="/Title?' $tempout | \
sed 's/^.*HREF="//' | \
sed 's/">/ -- /;s/<.*//;s/\/Title?//' | \
sort -u | \
more
else
summarize_film
fi
exit 0
How It Works
This script builds a different URL depending on whether the command argument specified is a film name or an IMDb film ID number, and then it saves the lynx output from the web page to the $tempout file.
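The test that separates a film ID from a film name can be sketched on its own. This is a minimal illustration, not the script's exact code: the helper name is_film_id is hypothetical, and unlike the inline check in the script it also rejects an empty argument.

```shell
#!/bin/sh
# is_film_id: succeeds when the argument consists only of digits.
# (Hypothetical helper; the script performs the same sed test inline.)
is_film_id()
{
  # Delete every digit; anything left over means it is not a pure number
  nodigits="$(echo "$1" | sed 's/[[:digit:]]*//g')"
  [ -n "$1" ] && [ -z "$nodigits" ]
}

for arg in 0056172 "monsoon wedding" 2001space ; do
  if is_film_id "$arg" ; then
    echo "$arg: treat as a film ID"
  else
    echo "$arg: treat as a title search"
  fi
done
```

Only the first argument in the loop is classified as an ID; the other two contain characters that survive the digit deletion.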
If the command argument is a film name, the script then examines $tempout for the string "IMDb title search" to see whether the file contains a list of film names (when more than one movie matches the search criteria) or the description of a single film. Using a complex series of sed substitutions that rely on the source code organization of the IMDb site, it then displays the output appropriately for each of those two possible cases.
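To see what the sed substitutions do in the list case, it helps to run them against a single line. The sketch below uses an invented line of HTML that only approximates what the search page might return, and the function name format_match is hypothetical.

```shell
#!/bin/sh
# format_match: reduce one list-item line of HTML to "ID -- Title".
# The markup fed to it below is made up for illustration.
format_match()
{
  # 1. drop everything up to and including HREF="
  # 2. turn the "> that closes the link tag into " -- "
  # 3. drop the trailing markup, starting at the first <
  # 4. drop the /Title? prefix, leaving the bare ID
  sed 's/^.*HREF="//;s/">/ -- /;s/<.*//;s/\/Title?//'
}

echo '<LI><A HREF="/Title?0056172">Lawrence of Arabia (1962)</A></LI>' | \
  format_match
```

With this sample input, the result is the same ID, separator, and title format that the script shows in its list of matches.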
Running the Script
Though short, this script is quite flexible with input formats: You can specify a film title in quotes or as separate words. If more than one match is returned, you can then specify the seven-digit IMDb ID value to select a specific match.
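The reason quoted and unquoted titles behave the same is that all arguments are joined and the spaces are then turned into plus signs. A small sketch of that step, with the hypothetical function name build_query standing in for the script's fixedname assignment:

```shell
#!/bin/sh
# build_query: join all arguments into one string with + between words
build_query()
{
  echo "$@" | tr ' ' '+'
}

build_query lawrence of arabia      # three separate arguments
build_query "lawrence of arabia"    # one quoted argument
```

Both calls print lawrence+of+arabia, so the URL sent to the site is identical either way.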
The Results
$ moviedata lawrence of arabia
0056172 -- Lawrence of Arabia (1962)
0099356 -- Dangerous Man: Lawrence After Arabia, A (1990) (TV)
0194547 -- With Allenby in Palestine and Lawrence in Arabia (1919)
0245226 -- Lawrence of Arabia (1935)
0363762 -- Lawrence of Arabia: A Conversation with Steven Spielberg (2000) (V)
0363791 -- Making of 'Lawrence of Arabia', The (2000) (V)
$ moviedata 0056172
Lawrence of Arabia (1962)
Plot Outline: British lieutenant T.E. Lawrence rewrites the political
history of Saudi Arabia.
$ moviedata monsoon wedding
Monsoon Wedding (2001)
Plot Outline: A stressed father, a bride-to-be with a secret, a
smitten event planner, and relatives from around the world create
much ado about the preparations for an arranged marriage in India.
Hacking the Script
The most obvious hack to this script would be to get rid of the ugly IMDb movie ID numbers, which are unfriendly and prone to mistyping. It would be straightforward to hide them and have the shell script output a simple menu with unique index values (e.g., 1, 2, 3) that can then be typed in to select a particular film.
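One possible shape for such a menu is sketched here. It assumes the list of matches is already in the "ID -- Title" format the script prints, and the function names show_menu and id_for_choice are hypothetical. The choice is hardcoded so the example runs unattended; a real version would presumably collect it with read.

```shell
#!/bin/sh
# show_menu: read "ID -- Title" lines on stdin, print "N) Title"
show_menu()
{
  sed 's/^[0-9]* -- //' | awk '{ printf "%d) %s\n", NR, $0 }'
}

# id_for_choice: read the same lines on stdin, print the ID on line $1
id_for_choice()
{
  case "$1" in
    ''|*[!0-9]*) return 1 ;;    # reject anything that is not a number
  esac
  sed -n "${1}p" | sed 's/ -- .*//'
}

# Sample data copied from the results shown earlier
matches='0056172 -- Lawrence of Arabia (1962)
0245226 -- Lawrence of Arabia (1935)'

echo "$matches" | show_menu
choice=2
echo "Selected ID: $(echo "$matches" | id_for_choice "$choice")"
```

The ID recovered from the menu choice can then be handed to the same code path that already handles a numeric argument.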
A problem with this script, as with most scripts that scrape values from a third-party website, is that if IMDb changes its page layout, the script will break and you'll need to rebuild the script sequence. It's a lurking bug, but with a site like IMDb that hasn't changed in years, probably not a dramatic or dangerous one.
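One way to make that kind of breakage easier to notice is to check the saved page for the strings the script depends on before parsing it. The following is only a sketch under that assumption: the function name page_looks_familiar is hypothetical, and the two markers are simply the ones the script already searches for.

```shell
#!/bin/sh
# page_looks_familiar: succeeds if stdin contains either marker
# that the script relies on to recognize a results page.
page_looks_familiar()
{
  grep -q -e "IMDb title search" -e "Plot Outline:"
}

# Simulate a page whose layout no longer matches
if echo '<html><body>Something new</body></html>' | page_looks_familiar
then
  echo "Page recognized"
else
  echo "Unexpected page layout; the sed patterns may need updating" >&2
fi
```

A check like this does not repair anything, but it turns silent empty output into an explicit warning.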