While the ls command is a cornerstone of working with the Unix command line, there's one element of the command that's always seemed pointless to me: indicating the size of a directory. When a directory is listed, the program either lists the directory's contents file by file or shows the number of bytes used to store the directory data itself, which is typically a multiple of 1,024. A typical entry in an ls -l output might be
drwxrwxr-x 2 taylor taylor 4096 Oct 28 19:07 bin
But that's really not very useful, because what I want to know is how many files are in the specified directory. That's what this script accomplishes, generating a nice multicolumn listing of files and directories that shows file size with file entries and the number of files with directory entries.
The Code
#!/bin/sh

# formatdir - Outputs a directory listing in a friendly and useful format.

gmk()
{
  # Given input in Kb, output in Kb, Mb, or Gb for best output format.
  # scriptbc is an external helper script and must be in your PATH.
  if [ "$1" -ge 1000000 ] ; then
    echo "$(scriptbc -p 2 $1 / 1000000)Gb"
  elif [ "$1" -ge 1000 ] ; then
    echo "$(scriptbc -p 2 $1 / 1000)Mb"
  else
    echo "${1}Kb"
  fi
}

if [ $# -gt 1 ] ; then
  echo "Usage: $0 [dirname]" >&2; exit 1
elif [ $# -eq 1 ] ; then
  # Stop if the directory is unreachable rather than list the wrong one
  cd "$1" || exit 1
fi

for file in *
do
  if [ -d "$file" ] ; then
    # Directory: count its (nonhidden) entries
    size=$(ls "$file" | wc -l | sed 's/[^[:digit:]]//g')
    if [ "$size" -eq 1 ] ; then
      echo "$file ($size entry)|"
    else
      echo "$file ($size entries)|"
    fi
  else
    # File: report its size in kilobytes, scaled by gmk
    size="$(ls -sk "$file" | awk '{print $1}')"
    echo "$file ($(gmk $size))|"
  fi
done | \
  sed 's/ /^^^/g' | \
  xargs -n 2 | \
  sed 's/\^\^\^/ /g' | \
  awk -F\| '{ printf "%-39s %-39s\n", $1, $2 }'

exit 0
How It Works
One of the most interesting parts of this script is the gmk function, which, given a number in kilobytes, outputs that value in kilobytes, megabytes, or gigabytes, depending on which unit is most appropriate. Instead of having the size of a very large file shown as 2083364Kb, for example, this function will instead show a size of 2.08Gb. Note that gmk is called with the $() notation in the following line:
echo "$file ($(gmk $size))|"
This works because the arguments within the $() sequence are given to a subshell of the running script shell, and subshells automatically inherit any functions defined in the running shell, so gmk is available there too.
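The gmk function leans on scriptbc, a separate helper that isn't shown in this section. If you want to experiment without it, here's a minimal sketch that swaps in awk for the division. That substitution is my assumption, not what the script itself does, and awk's %.2f rounds where a bc-based helper may truncate, so the last digit can occasionally differ. The sketch also shows a function defined in the script being called from inside $().

```shell
#!/bin/sh
# Sketch only: gmk reimplemented with awk in place of the scriptbc helper.
# The awk substitution is an assumption made so the example runs on its own.

gmk()
{
  # Given input in Kb, output in Kb, Mb, or Gb for best output format
  if [ "$1" -ge 1000000 ] ; then
    awk -v kb="$1" 'BEGIN { printf "%.2fGb\n", kb / 1000000 }'
  elif [ "$1" -ge 1000 ] ; then
    awk -v kb="$1" 'BEGIN { printf "%.2fMb\n", kb / 1000 }'
  else
    echo "${1}Kb"
  fi
}

# The function is visible inside the command substitution's subshell.
size=2083364
echo "RedHat 7.2 ($(gmk $size))|"
```

Run as is, this should print RedHat 7.2 (2.08Gb)| which matches the entry format the main loop emits.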
Near the top of the script, there is also a shortcut that allows users to specify a directory other than the current directory and then changes the current working directory of the running shell script to the desired location, using cd. This follows the mantra of good shell script programming, of course: Where there's a shortcut, there's a better way.
The main logic of this script involves organizing the output into two neat, aligned columns. You can't break the output stream at spaces, because file and directory names can contain spaces. To get around this problem, the script first replaces each space with a sequence of three carets (^^^). Then it uses the xargs command to merge paired lines so that every two lines become one line separated by a space. Finally, it turns the carets back into spaces and uses the awk command to output the columns in the proper alignment. (The paste command could also pair up lines, but it would just intersperse a tab, which rarely, if ever, works out properly because paste doesn't take into account variation in entry width.)
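To watch the column logic on its own, here's a small sketch that wraps the same four pipeline stages in a function and feeds it a few hand-typed entries. The function name pair_columns and the sample entries are mine, chosen purely for illustration.

```shell
#!/bin/sh
# Sketch: the script's pairing pipeline, isolated in a hypothetical
# function named pair_columns so it can be fed sample input.

pair_columns()
{
  sed 's/ /^^^/g' | \
  xargs -n 2 | \
  sed 's/\^\^\^/ /g' | \
  awk -F\| '{ printf "%-39s %-39s\n", $1, $2 }'
}

# Four entries in, two aligned rows out. Note the embedded spaces.
printf '%s\n' \
  'fire aliases (4Kb)|' \
  'games (3 entries)|' \
  'junk (4Kb)|' \
  'mail (2 entries)|' | pair_columns
```

Try removing the two sed stages to see why they're needed: xargs then splits "fire aliases (4Kb)|" into three separate words and the pairing falls apart.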
Notice how the number of (nonhidden) entries in a directory is easily calculated, with a quick sed invocation cleaning up the output of the wc command:
size=$(ls "$file" | wc -l | sed 's/[^[:digit:]]//g')
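Here's a quick way to convince yourself that the count skips hidden files. This is only a sketch: the function name count_entries and the sample filenames are made up for the demonstration, and it assumes your system has mktemp with the -d flag.

```shell
#!/bin/sh
# Sketch: count the visible entries in a directory the way the script does.

count_entries()
{
  ls "$1" | wc -l | sed 's/[^[:digit:]]//g'
}

# Build a throwaway directory (hypothetical names) to try it on.
demo=$(mktemp -d)
touch "$demo/notes.txt" "$demo/two words" "$demo/.hidden"

# Two visible files and one dot file: only the visible ones are counted.
echo "$(count_entries "$demo") entries"

rm -rf "$demo"
```

The sed step matters because some versions of wc pad their output with leading spaces, which would otherwise end up inside the parentheses in the listing.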
Running the Script
For a listing of the current directory, invoke the command without arguments. For information about the contents of a particular directory, specify a directory name as the sole command argument.
The Results
$ formatdir ~
Applications (0 entries)                Classes (4Kb)
DEMO (5 entries)                        Desktop (8 entries)
Documents (38 entries)                  Incomplete (9 entries)
IntermediateHTML (3 entries)            Library (38 entries)
Movies (1 entry)                        Music (1 entry)
NetInfo (9 entries)                     Pictures (38 entries)
Public (1 entry)                        RedHat 7.2 (2.08Gb)
Shared (4 entries)                      Synchronize! Volume ID (4Kb)
X Desktop (4Kb)                         automatic-updates.txt (4Kb)
bin (31 entries)                        cal-liability.tar.gz (104Kb)
cbhma.tar.gz (376Kb)                    errata (2 entries)
fire aliases (4Kb)                      games (3 entries)
junk (4Kb)                              leftside navbar (39 entries)
mail (2 entries)                        perinatal.org (0 entries)
scripts.old (46 entries)                test.sh (4Kb)
testfeatures.sh (4Kb)                   topcheck (3 entries)
tweakmktargs.c (4Kb)                    websites.tar.gz (18.85Mb)
Hacking the Script
The GNU version of ls has an -h flag that offers similar functionality. If you have that version of ls available, adding that flag and removing the call to gmk will speed up this script.
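As a rough sketch of that change, the file branch could ask ls for the scaled size directly. This assumes an ls that accepts -h alongside -s, as GNU ls does; other versions may ignore or reject the flag. The function name human_size is hypothetical, and the unit labels ls prints won't look the same as gmk's.

```shell
#!/bin/sh
# Sketch: let ls pick the unit itself instead of calling gmk.
# Assumes an ls that accepts -h with -s (GNU ls does).

human_size()
{
  ls -sh "$1" | awk '{print $1}'
}

# Create a sample file of about 300Kb to try it on.
demo=$(mktemp)
dd if=/dev/zero of="$demo" bs=1024 count=300 2>/dev/null

# Same entry format as the main loop, minus the gmk call.
echo "$(basename "$demo") ($(human_size "$demo"))|"

rm -f "$demo"
```

The exact figure depends on how your filesystem allocates blocks, so don't be surprised if the same file reports slightly different sizes on different machines.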
The other issue worth considering with this script is whether you happen to have a user who likes to use sequences of three carets in filenames, which could cause some confusion in the output. This naming convention is pretty unlikely, however. A 116,696-file Linux install that I spot-tested didn't have even a single caret within any of its filenames. However, if you really are concerned, you could address this potential pitfall by translating spaces into another sequence of characters that's even less likely to occur in user filenames.
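One possibility, sketched below, is to hide the spaces behind a control character rather than a printable sequence. The choice of ASCII SOH (octal 001) is my assumption, as is the function name pair_columns; any byte that xargs leaves alone and that never shows up in your filenames would serve.

```shell
#!/bin/sh
# Sketch: same pairing pipeline, but spaces are hidden behind the ASCII SOH
# control character (octal 001) instead of three carets. SOH is an assumed
# choice, not something the original script uses.

pair_columns()
{
  tr ' ' '\001' | \
  xargs -n 2 | \
  tr '\001' ' ' | \
  awk -F\| '{ printf "%-39s %-39s\n", $1, $2 }'
}

# A filename that really does contain three carets now survives intact.
printf '%s\n' 'odd^^^name (4Kb)|' 'two words (1 entry)|' | pair_columns
```

Because tr swaps single characters, both substitutions stay one-for-one, and a name with carets in it passes through untouched.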