fog - Discover Your Writing's Infathomability

brian_d_foy on 2006-02-25T18:46:00

I was writing an email at work one day and got to thinking: is this too wordy? I finally remembered that prose readability is often measured with something called a "fog index" (how foggy is your writing?).

A search on Google turned up Merlyn's article "Discovering incomprehensible documentation" from his Nov 2000 Linux Magazine column. His article describes "fog indexing" his manpages using Lingua::EN::Fathom (one of the wonders of CPAN), a module for computing the fog index of a piece of text. But I wanted something more: a simple, general-purpose utility.
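For reference, the classic Gunning fog index is 0.4 × (average words per sentence + percentage of "complex" words of three or more syllables). Here is a rough back-of-the-envelope sketch of that arithmetic; the function name and the vowel-group syllable heuristic are my own simplifications, not anything from the module, which does this far more carefully:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Rough Gunning fog index: 0.4 * (words/sentence + 100 * complex/words).
# Syllables are approximated by counting vowel groups, so "complex"
# (3+ syllables) is only a crude estimate -- Lingua::EN::Fathom is
# considerably more careful than this.
sub rough_fog {
    my ($text) = @_;
    my @sentences = grep { /\S/ } split /[.!?]+/, $text;
    my @words     = $text =~ /([A-Za-z']+)/g;
    return 0 unless @sentences and @words;
    my $complex = grep {
        my $syllables = () = /[aeiouy]+/gi;
        $syllables >= 3;
    } @words;
    return 0.4 * (@words / @sentences + 100 * $complex / @words);
}

printf "%.2f\n", rough_fog("The cat sat on the mat. The dog barked.");
```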

So, here is "fog", a simple command-line program for discovering the fog index of text. fog (a thin, tasty wrapper around Lingua::EN::Fathom) can work on a string, standard input, or a set of files. It is a little rough around the edges, so any suggestions are welcome.

Here is the source for fog:

#!/usr/bin/perl -w
# List various measures of readability -- "fog indexes".
#
# Just a thin, tasty wrapper around Lingua::EN::Fathom.

# ------ pragmas
use warnings;
use strict;
use Lingua::EN::Fathom;

# ------ variables
my $file_count = 0;                     # count of files to analyze
my $file_name  = "";                    # current filename
my $fog_index  = undef;                 # fog indexing object (a Lingua::EN::Fathom)
my $ifh        = undef;                 # input file handle
my $input      = "";                    # string to analyze
my @input      = ();                    # array of lines to analyze
my $more_files = 0;                     # TRUE when more than one file to analyze

# ------ process command-line arguments
if (@ARGV < 1) {
    die "usage: fog -|-f FILE1 ...|STRING\n"
}
if ($ARGV[0] eq "-") {
    $ifh = \*STDIN;                     # glob ref, so no need to relax strict 'subs'
    @input = <$ifh>;
    $input = join("", @input);          # lines keep their newlines; "\n" would double-space
    $file_count = 1;
} elsif ($ARGV[0] eq "-f" && @ARGV >= 2) {
    $file_count = scalar(@ARGV) - 1;
    if ($file_count > 1) {
        $more_files++;
    }
    shift;
    open($ifh, '<', $ARGV[0]) || die "cannot open $ARGV[0]: $!\n";
    @input = <$ifh>;
    close($ifh);
    $input = join("", @input);
    $file_name = $ARGV[0];
} else {
    $input = $ARGV[0];
    $file_count = 1;
}

while ($file_count > 0) {
    $fog_index = Lingua::EN::Fathom->new;
    $fog_index->analyse_block($input, 1);
    if ($more_files > 0) {
        print $file_name, ":\n";
    }
    print $fog_index->report, "\n";

    $file_count--;
    if ($file_count > 0) {
        shift;
        $file_name = $ARGV[0];
        open($ifh, '<', $file_name) || die "cannot open $file_name: $!\n";
        @input = <$ifh>;
        close($ifh);
        $input = join("", @input);
    }
}


compact that code

Blixtor on 2006-02-27T13:50:19

Your code could be compacted quite a bit if you change the calling syntax slightly and rely on the while () construct - and the surrounding Perl magic :)

Something like the following:

if (@ARGV < 1) {
    die "usage: fog -|FILE1 ...|-s STRING\n"
}

my $fog_index = Lingua::EN::Fathom->new;

if ($ARGV[0] =~ /^-s$/) {
    # string argument
    $fog_index->analyse_block($ARGV[1], 1);
    print $fog_index->report, "\n";
    exit;
}

local $/; # slurp whole files

while (<>) {
    $fog_index->analyse_block($_, 1);

    if ($ARGV ne '-') {
        print "$ARGV: ";
    }
    print $fog_index->report, "\n";
}

Re:compact that code

stramineous on 2006-02-28T03:02:51

Just curious--what was it in the diction/style package that you wanted to improve?

Re:compact that code

Blixtor on 2006-03-01T11:36:48

Well, there were a couple of points; some of them are probably a matter of personal taste/preferences ...

  • Consistency - I wanted to stick to Mark's command-line arguments and allow the string argument. Personally I would have chosen a "clean" 'while ()' type of interface, meaning that if I wanted the fog index of a string, I would pass it in via the shell with echo "string" | fog. That way the script behaves like a normal Unix command-line tool: it can work as a filter for piping, or operate on one or more files given on the command line. The while () construct was made for exactly that purpose, so I try to use it whenever possible in my scripts. That results in the mentioned consistency - and makes the script even shorter ;-).
  • Doing more work than necessary - where is the benefit of doing
    $ifh = *STDIN;
    @input = <$ifh>;
    $input = join("\n", @input);
    over
    local $/;
    $input = <STDIN>;
    ?
  • Showing the usefulness of $ARGV in connection with the while () loop.
  • Removing unnecessary 'cruft' - why do you need $file_count and $more_files? Why handle the opening, closing, etc. of files given on the command line manually when while () does that automatically for you (including the error messages and more (*))? Why is this handling of files written twice in the code?

(*) The while () loop has built-in error handling: it skips files that it can't open (with a warning) and continues processing the rest of the files.
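To make the while () / $ARGV magic concrete, here is a small self-contained sketch; the temp file exists only so the example runs on its own, and the file name is whatever tempfile() hands back:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a throwaway file so the example is self-contained.
my ($tmp, $name) = tempfile(UNLINK => 1);
print $tmp "Some sample text.\n";
close $tmp;

# while (<>) opens each file named in @ARGV in turn (or reads STDIN when
# @ARGV is empty or contains "-"); $ARGV always holds the name of the
# file currently being read. With $/ undefined, each read slurps a
# whole file at once.
@ARGV = ($name);
local $/;    # slurp mode
while (<>) {
    printf "%s: %d characters\n", $ARGV, length;
}
```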

RE: Lingua::EN::Fathom

altblue on 2006-02-28T05:39:55

Thanks for noticing Lingua::EN::Fathom ;-)

As for that tiny wrapper, it doesn't seem to prove anything more about the module's capabilities (compared to Merlyn's article); everything besides Fathom's "report" is just "bloat".

Some simple one-liners could have been enough to demonstrate Lingua::EN::Fathom's usefulness:

# fog report for english_textfile:
perl -MLingua::EN::Fathom -le 'print Lingua::EN::Fathom->new->analyse_file(shift)->report;' english_textfile

# fog report, "pipe" style:
perl -MLingua::EN::Fathom -le 'undef $/; print Lingua::EN::Fathom->new->analyse_block(<>)->report;' < english_textfile

# fog report for multiple files:
perl -MLingua::EN::Fathom -le 'print Lingua::EN::Fathom->new->analyse_file($_)->report for @ARGV' english_textfile1 english_textfile2 english_textfile3

etc