I was writing an email at work one day and got to thinking – is this too wordy? I eventually remembered that prose readability is often measured with something called a "fog index" (how foggy is your writing?).
A search on Google turned up Merlyn's article "Discovering incomprehensible documentation" for his Nov 2000 Linux Magazine column. His article describes "fog indexing" his manpages using Lingua::EN::Fathom (one of the wonders of CPAN), a module for computing the fog index of a piece of text. But I wanted something more – a simple, general-purpose utility.
So, here is "fog", a simple command-line program for discovering the fog index of text. fog (a thin, tasty wrapper around Lingua::EN::Fathom) can work on a string, standard input, or a set of files. It is a little rough around the edges, so any suggestions are welcome.
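For the curious, the Gunning fog index itself is just 0.4 times the sum of the average sentence length (in words) and the percentage of "complex" words of three or more syllables; the hard part, which Fathom handles, is doing the counting. A back-of-the-envelope sketch of the arithmetic, with made-up counts:

#!/usr/bin/perl
use strict;
use warnings;

# Gunning fog index from raw counts.  Lingua::EN::Fathom derives these
# counts from the text itself; this only shows the final arithmetic.
sub gunning_fog {
    my ($words, $sentences, $complex_words) = @_;   # complex = 3+ syllables
    return 0.4 * ($words / $sentences + 100 * $complex_words / $words);
}

printf "%.1f\n", gunning_fog(1000, 55, 90);   # prints 10.9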
Here is the source for fog:
#!/usr/bin/perl -w
# List various measures of readability -- "fog indexes".
#
# Just a thin, tasty wrapper around Lingua::EN::Fathom.

# ------ pragmas
use warnings;
use strict;
use Lingua::EN::Fathom;

# ------ variables
my $file_count = 0;     # count of files to analyze
my $file_name  = "";    # current filename
my $fog_index  = "";    # fog indexing object (a Lingua::EN::Fathom)
my $ifh        = undef; # input file handle
my $input      = "";    # string to analyze
my @input      = ();    # array of strings to analyze
my $more_files = 0;     # TRUE when more than one file to analyze

# ------ process command-line arguments
if (@ARGV < 1) {
    die "usage: fog -|-f FILE1 ...|STRING\n";
}
if ($ARGV[0] eq "-") {
    no strict 'subs';
    $ifh = *STDIN;
    @input = <$ifh>;
    $input = join("\n", @input);
    $file_count = 1;
    use strict;
} elsif ($ARGV[0] eq "-f" && @ARGV >= 2) {
    $file_count = scalar(@ARGV) - 1;
    if ($file_count > 1) {
        $more_files++;
    }
    shift;
    open($ifh, $ARGV[0]) || die "cannot open $ARGV[0]: $!\n";
    @input = <$ifh>;
    close($ifh);
    $input = join("\n", @input);
    $file_name = $ARGV[0];
} else {
    $input = $ARGV[0];
    $file_count = 1;
}

while ($file_count > 0) {
    $fog_index = new Lingua::EN::Fathom;
    $fog_index->analyse_block($input, 1);
    if ($more_files > 0) {
        print $file_name, ":\n";
    }
    print $fog_index->report, "\n";
    $file_count--;
    if ($file_count > 0) {
        shift;
        $file_name = $ARGV[0];
        open($ifh, $file_name) || die "cannot open $file_name: $!\n";
        @input = <$ifh>;
        close($ifh);
        $input = join("\n", @input);
    }
}
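A few example invocations, matching the usage line above (the file names are just placeholders):

# fog index of a literal string
fog "I was writing an email at work one day."

# fog index of whatever arrives on standard input
cat report.txt | fog -

# fog index of one or more files
fog -f report.txt chapter2.txt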
Your code could be compacted quite a bit if you change the calling syntax slightly and rely on the while (<>) construct and the surrounding Perl magic. Something like the following:
if (@ARGV < 1) {
    die "usage: fog -|FILE1...|-s STRING\n";
}

my $fog_index = new Lingua::EN::Fathom;

if ($ARGV[0] =~ /^-s$/) {
    # string argument
    $fog_index->analyse_block($ARGV[1], 1);
    print $fog_index->report, "\n";
    exit;
}

local $/;        # slurp whole files
while (<>) {     # opens each file in @ARGV in turn, or STDIN for '-'
    $fog_index->analyse_block($_, 1);
    if ($ARGV ne '-') {
        print "$ARGV: ";
    }
    print $fog_index->report, "\n";
}
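The calling syntax then becomes something like (file names made up):

fog -s "Is this sentence too foggy?"
fog report.txt chapter2.txt
cat report.txt | fog -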
Re:compact that code
stramineous on 2006-02-28T03:02:51
Just curious--what was it in the diction/style package that you wanted to improve?

Re:compact that code
Blixtor on 2006-03-01T11:36:48
Well, there were a couple of points; some of them are probably a matter of personal taste/preferences...

- Consistency - I wanted to stick to Mark's command line arguments and allow the string argument. Personally I would have chosen a "clean" while (<>) type of interface, meaning if I wanted to know the fog index of a string, I would pass it in via the shell with

    echo "string" | fog

  This way the script behaves like a normal unix command line tool in that it can work as a filter for piping or be used to work on (one or more) files given on the command line. The while (<>) was made for exactly that purpose, so I try to use it whenever possible for my scripts. That results in the mentioned consistency - and makes the script even shorter ;-).

- Doing more work than necessary - where is the benefit of doing

    $ifh = *STDIN;
    @input = <$ifh>;
    $input = join("\n", @input);

  over

    local $/;
    $input = <STDIN>;

  ?

- Showing the usefulness of $ARGV in connection with the while (<>) loop.

- Removing unnecessary 'cruft' - why do you need $file_count and $more_files? Why handle the opening, closing, etc. of files given on the command line manually when while (<>) does that automatically for you (including the error messages and more (*))? Why is this handling of files written twice in the code?

(*) The while (<>) loop has built-in error handling: it skips files it can't open (with an error message) and continues processing the rest of the files.
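A stripped-down illustration of that while (<>) behaviour, independent of Fathom (the script name and file names are hypothetical):

#!/usr/bin/perl
# slurp_demo.pl -- what while (<>) plus "local $/" buys you
use strict;
use warnings;

local $/;        # slurp mode: one read grabs a whole file
while (<>) {     # opens each file in @ARGV in turn, or STDIN for '-'
    my $chars = length;    # length of the slurped file, now in $_
    print "$ARGV: $chars characters\n";    # $ARGV is the current file name
}

Run it as "perl slurp_demo.pl file1 file2" or "echo hello | perl slurp_demo.pl -"; a file that can't be opened produces a warning and is skipped, exactly as described above.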
Thanks for noticing Lingua::EN::Fathom
As for that tiny wrapper, it doesn't look like it's trying to prove anything more about the module's capabilities (compared to Merlyn's article); everything besides Fathom's "report" is just "bloat".
Some simple one-liners could have been enough to demonstrate Lingua::EN::Fathom's usefulness:
# fog report for english_textfile:
perl -MLingua::EN::Fathom -le 'print Lingua::EN::Fathom->new->analyse_file(shift)->report;' english_textfile
# fog report, "pipe" style:
perl -MLingua::EN::Fathom -le 'undef $/; print Lingua::EN::Fathom->new->analyse_block(<>)->report;' < english_textfile
# fog report for multiple files:
perl -MLingua::EN::Fathom -le 'print Lingua::EN::Fathom->new->analyse_file($_)->report for @ARGV' english_textfile1 english_textfile2 english_textfile3
etc.
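And when the full report is more than you need, Fathom also exposes the individual scores as methods (fog, flesch, kincaid, if I remember the docs correctly), e.g.:

# just the Gunning fog score for a file:
perl -MLingua::EN::Fathom -le '$f = Lingua::EN::Fathom->new; $f->analyse_file(shift); printf "%.1f\n", $f->fog' english_textfile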