I was writing an email at work one day and got to thinking – is this too wordy? I eventually remembered that prose readability is often measured with something called a "fog index" (how foggy is your writing?).
A search on Google turned up Merlyn's article "Discovering incomprehensible documentation" for his Nov 2000 Linux Magazine column. His article describes "fog indexing" his manpages using Lingua::EN::Fathom (one of the wonders of CPAN), a module for computing the fog index of a piece of text. But I wanted something more – a simple, general-purpose utility.
So, here is "fog", a simple command-line program for discovering the fog index of text. fog (a thin, tasty wrapper around Lingua::EN::Fathom) can work on a string, standard input, or a set of files. It is a little rough around the edges, so any suggestions are welcome.
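For the curious, the Gunning fog index itself is just 0.4 times the sum of the average sentence length (in words) and the percentage of "complex" words of three or more syllables; the hard part, which Fathom handles, is doing the counting. A back-of-the-envelope sketch of the arithmetic, with made-up counts:

#!/usr/bin/perl
use strict;
use warnings;

# Gunning fog index from raw counts.  Lingua::EN::Fathom derives these
# counts from the text itself; this only shows the final arithmetic.
sub gunning_fog {
    my ($words, $sentences, $complex_words) = @_;   # complex = 3+ syllables
    return 0.4 * ($words / $sentences + 100 * $complex_words / $words);
}

printf "%.1f\n", gunning_fog(1000, 55, 90);   # prints 10.9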
Here is the source for fog:
#!/usr/bin/perl -w
# List various measures of readability -- "fog indexes".
#
# Just a thin, tasty wrapper around Lingua::EN::Fathom.

# ------ pragmas
use warnings;
use strict;
use Lingua::EN::Fathom;

# ------ variables
my $file_count = 0;     # count of files to analyze
my $file_name  = "";    # current filename
my $fog_index  = "";    # fog indexing object (a Lingua::EN::Fathom)
my $ifh        = undef; # input file handle
my $input      = "";    # string to analyze
my @input      = ();    # array of strings to analyze
my $more_files = 0;     # TRUE when more than one file to analyze

# ------ process command-line arguments
if (@ARGV < 1) {
    die "usage: fog -|-f FILE1 ...|STRING\n";
}
if ($ARGV[0] eq "-") {
    no strict 'subs';
    $ifh = *STDIN;
    @input = <$ifh>;
    $input = join("\n", @input);
    $file_count = 1;
    use strict;
} elsif ($ARGV[0] eq "-f" && @ARGV >= 2) {
    $file_count = scalar(@ARGV) - 1;
    if ($file_count > 1) {
        $more_files++;
    }
    shift;
    open($ifh, $ARGV[0]) || die "cannot open $ARGV[0]: $!\n";
    @input = <$ifh>;
    close($ifh);
    $input = join("\n", @input);
    $file_name = $ARGV[0];
} else {
    $input = $ARGV[0];
    $file_count = 1;
}

while ($file_count > 0) {
    $fog_index = new Lingua::EN::Fathom;
    $fog_index->analyse_block($input, 1);
    if ($more_files > 0) {
        print $file_name, ":\n";
    }
    print $fog_index->report, "\n";
    $file_count--;
    if ($file_count > 0) {
        shift;
        $file_name = $ARGV[0];
        open($ifh, $file_name) || die "cannot open $file_name: $!\n";
        @input = <$ifh>;
        close($ifh);
        $input = join("\n", @input);
    }
}
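A few example invocations, matching the usage line above (the file names are just placeholders):

# fog index of a literal string
fog "I was writing an email at work one day."

# fog index of whatever arrives on standard input
cat report.txt | fog -

# fog index of one or more files
fog -f report.txt chapter2.txt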
Your code could be compacted quite a bit if you change the calling syntax slightly and rely on the while (<>) construct and the surrounding Perl magic. Something like the following:
if (@ARGV < 1) {
    die "usage: fog -|FILE1...|-s STRING\n";
}

my $fog_index = new Lingua::EN::Fathom;

if ($ARGV[0] =~ /^-s$/) {
    # string argument
    $fog_index->analyse_block($ARGV[1], 1);
    print $fog_index->report, "\n";
    exit;
}

local $/;        # slurp whole files
while (<>) {     # opens each file in @ARGV in turn, or STDIN for '-'
    $fog_index->analyse_block($_, 1);
    if ($ARGV ne '-') {
        print "$ARGV: ";
    }
    print $fog_index->report, "\n";
}
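The calling syntax then becomes something like (file names made up):

fog -s "Is this sentence too foggy?"
fog report.txt chapter2.txt
cat report.txt | fog -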
Re:compact that code
stramineous on 2006-02-28T03:02:51
Just curious--what was it in the diction/style package that you wanted to improve?

Re:compact that code
Blixtor on 2006-03-01T11:36:48
Well, there were a couple of points; some of them are probably a matter of personal taste/preferences...

- Consistency - I wanted to stick to Mark's command line arguments and allow the string argument. Personally I would have chosen a "clean" while (<>) type of interface, meaning if I wanted to know the fog index of a string, I would pass it in via the shell with

    echo "string" | fog

  This way the script behaves like a normal unix command line tool in that it can work as a filter for piping or be used to work on (one or more) files given on the command line. The while (<>) was made for exactly that purpose, so I try to use it whenever possible for my scripts. That results in the mentioned consistency - and makes the script even shorter ;-).

- Doing more work than necessary - where is the benefit of doing

    $ifh = *STDIN;
    @input = <$ifh>;
    $input = join("\n", @input);

  over

    local $/;
    $input = <STDIN>;

  ?

- Showing the usefulness of $ARGV in connection with the while (<>) loop.

- Removing unnecessary 'cruft' - why do you need $file_count and $more_files? Why handle the opening, closing, etc. of files given on the command line manually when while (<>) does that automatically for you (including the error messages and more (*))? Why is this handling of files written twice in the code?

(*) The while (<>) loop has built-in error handling: it skips files it can't open (with an error message) and continues processing the rest of the files.
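A stripped-down illustration of that while (<>) behaviour, independent of Fathom (the script name and file names are hypothetical):

#!/usr/bin/perl
# slurp_demo.pl -- what while (<>) plus "local $/" buys you
use strict;
use warnings;

local $/;        # slurp mode: one read grabs a whole file
while (<>) {     # opens each file in @ARGV in turn, or STDIN for '-'
    my $chars = length;    # length of the slurped file, now in $_
    print "$ARGV: $chars characters\n";    # $ARGV is the current file name
}

Run it as "perl slurp_demo.pl file1 file2" or "echo hello | perl slurp_demo.pl -"; a file that can't be opened produces a warning and is skipped, exactly as described above.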
Thanks for noticing Lingua::EN::Fathom
As for that tiny wrapper, it doesn't look like it's trying to prove anything more about the module's capabilities (compared to Merlyn's article); everything besides Fathom's "report" is just "bloat".
Some simple one-liners could have been enough to demonstrate Lingua::EN::Fathom's usefulness:
# fog report for english_textfile:
perl -MLingua::EN::Fathom -le 'print Lingua::EN::Fathom->new->analyse_file(shift)->report;' english_textfile
# fog report, "pipe" style:
perl -MLingua::EN::Fathom -le 'undef $/; print Lingua::EN::Fathom->new->analyse_block(<>)->report;' < english_textfile
# fog report for multiple files:
perl -MLingua::EN::Fathom -le 'print Lingua::EN::Fathom->new->analyse_file($_)->report for @ARGV' english_textfile1 english_textfile2 english_textfile3
etc.
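And when the full report is more than you need, Fathom also exposes the individual scores as methods (fog, flesch, kincaid, if I remember the docs correctly), e.g.:

# just the Gunning fog score for a file:
perl -MLingua::EN::Fathom -le '$f = Lingua::EN::Fathom->new; $f->analyse_file(shift); printf "%.1f\n", $f->fog' english_textfile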