nth

ziggy on 2005-08-10T13:52:28

Here's a little script I whipped up the other day. I'm grepping through a zillion files, all of which have the same structure but different data. Therefore, an operation like

$ grep '' *.html

produces about a dozen "interesting" lines of output per file. One thing I wanted to do was grab the 3rd such value from each file, and then compare them. Thus was born nth:

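#!/usr/bin/perl -w

use strict;

my %values;    # key (the filename grep prints) => lines seen for that key
my @keys;      # keys in order of first appearance

# The first argument is the 1-based line to pick; convert it to a 0-based index.
my $n = shift(@ARGV) - 1;

while (<>) {
    # The key is everything before the first colon, as in grep's "file:line" output.
    my ($key) = m/^(.*?):/;

    push(@keys, $key) unless exists $values{$key};
    push(@{$values{$key}}, $_);
}

# Print the nth line for each key, or an empty line if the key has fewer than n lines.
print map {$_->[$n] or "\n"} @values{@keys};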

Extracting the interesting bits of data is now as simple as:

$ grep '' *.html | nth 3 | sed -e 'whatever'


defined-or

mary.poppins on 2005-08-10T16:48:28

Rather than "or", you should be using defined-or, or a length test on the given file's array.

Otherwise, strings like "0" and "0.0" will not be printed.

Re:defined-or

vsergu on 2005-08-10T18:28:26

He's not chomping, so the newline on the end will make the string true. (Also, "0.0" as a string would be true.)

Re:defined-or

mary.poppins on 2005-08-10T20:31:24

I stand corrected on both counts. < goes to hide in basement in shame >

Re:defined-or

ziggy on 2005-08-10T20:44:33

Correct. The bug I'm trying to fix is that nth 100 /dev/null doesn't print anything. If you're trying to correlate the results from a list of files, and some files produce fewer than n lines of data, then you should at least print something for them, or else the values will no longer line up with the files they came from. (Specifically, the results from $ join filenames results will be way off...)
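
For illustration, here is a rough shell sketch of the "always print something" idea. It is not how nth does it: the function name nth_per_file, the tab-separated output, and the "-" placeholder are all made up for this example. The point is only that every file yields exactly one output line, so the results stay aligned with the list of filenames.

```sh
# Hypothetical helper (not part of nth): print the nth line matching a
# pattern for each file, or a "-" placeholder when a file has fewer than
# n matching lines, so that every file yields exactly one output line.
nth_per_file() {
    n=$1
    pattern=$2
    shift 2
    for f in "$@"; do
        # sed -n "Np" prints only line N of grep's output, or nothing at all.
        value=$(grep -e "$pattern" "$f" | sed -n "${n}p")
        # Substitute the placeholder when there was no nth match.
        [ -n "$value" ] || value=-
        printf '%s\t%s\n' "$f" "$value"
    done
}
```

Something like nth_per_file 3 'pattern' *.html would then give one filename/value pair per file, which should be safe to feed to join or paste.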

Before I knew Perl I would have ...

Limbic Region on 2005-08-10T17:09:05

http://use.perl.org/user/ziggy/journal/26219

$ for file in *.html
> do
> grep '' "$file" | head -n 3 | tail -1
> done | sort -u | sed -e 'whatever'

Or used awk
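
One way the awk route might look, as a sketch only: it assumes the same "file:line" input that grep prints for multiple files, and the wrapper name nth_awk is invented here. Like the Perl version, it prints an empty line for any file with fewer than n lines.

```sh
# Hypothetical awk take on nth: read grep-style "file:line" input and print
# the nth line seen for each file, in order of first appearance.
nth_awk() {
    awk -F: -v n="$1" '
        # Remember each file (the text before the first colon) the first time it appears.
        !($1 in seen)  { seen[$1] = 1; order[++files] = $1 }
        # Count lines per file and keep the nth one.
        { count[$1]++ }
        count[$1] == n { line[$1] = $0 }
        # Files with fewer than n lines have no entry, so they print as an empty line.
        END { for (i = 1; i <= files; i++) print line[order[i]] }
    '
}
```

It would slot into the same pipeline as the original: grep '' *.html | nth_awk 3 | sed -e 'whatever'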