Extracting email addresses, again

brian_d_foy on 2004-07-03T19:18:26

I figured there must be an easier way to extract email addresses than sorting through a file full of emails. The problem is more process than technology. I want to be involved in the process as little as possible without letting a program possibly clobber a bunch of data.

Since I use PINE, I went looking for a way to pipe messages to external programs, and indeed it has one. I needed to turn on the enable-unix-pipe-cmd feature, and then with the | key I can send the entire message to whatever program I like.

I created this little program I named "f" to extract the From address and store it in a file. I'm not ready to let it add the address directly to my white list, so for now I leave the addresses in a different file.

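#!/usr/bin/perl
use strict;
use warnings;

# Read the message PINE pipes in, find the first From: header,
# and append its address to a holding file for later review.
while( <> )
	{
	next unless /^From:/;
	chomp;
	s/^From:\s+//; s/.*<(.*@.*)>.*/$1/;   # drop the header name, keep what is inside <...>
	if( open my( $fh ), '>>', "$ENV{HOME}/mail/fix-addresses" )
		{
		print $fh "$_\n";
		}
	else
		{
		print "$_: Problem: $!\n";
		}
	last;   # only the first From: line matters
	}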
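To see what the two substitutions in f do to a header line, the same logic can be pulled into a subroutine and tried on a few sample values. This is only a sketch for experimenting: the name extract_address and the sample addresses are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: the same two substitutions f applies to a From: line
sub extract_address {
	my( $line ) = @_;
	chomp $line;
	$line =~ s/^From:\s+//;          # drop the header name
	$line =~ s/.*<(.*@.*)>.*/$1/;    # keep the part inside <...>, if there is one
	return $line;
}

for my $header (
	'From: Jane Doe <jane@example.com>',
	'From: bob@example.org',
	'From: "Smith, Al" <al@example.net> (work)',
) {
	print extract_address( $header ), "\n";
}
```

A bare address with no angle brackets passes through untouched, because the second substitution simply doesn't match.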

I named the program "f" so I wouldn't have to type much when telling PINE which program to use. PINE does remember the last program I specified, but it forgets it once I quit. Oh well.

Now I don't have to set aside the messages falsely tagged as spam and deal with them later. I can extract the From address right away and add it to the white list later. If I wanted to get fancy, there would be a database somewhere in all of this, but I have real work to do.
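The later step of moving collected addresses into the white list might look something like the following. It is only a sketch: the post doesn't say how the white list is stored, so this assumes a plain list of addresses, one per line, and merge_addresses is a made-up name. Reading and writing the actual files is left out.

```perl
use strict;
use warnings;

# Hypothetical helper: combine the existing white list with newly collected
# addresses, dropping blanks and duplicates (compared case-insensitively)
# while keeping the order in which addresses first appear.
sub merge_addresses {
	my( $existing, $new ) = @_;
	my %seen;
	my @merged;
	for my $addr ( @$existing, @$new ) {
		( my $clean = $addr ) =~ s/^\s+|\s+$//g;   # trim stray whitespace on a copy
		next unless length $clean;
		push @merged, $clean unless $seen{ lc $clean }++;
	}
	return @merged;
}

# Example use with made-up addresses
my @merged = merge_addresses(
	[ 'jane@example.com', 'bob@example.org' ],
	[ 'Jane@Example.com', ' al@example.net ', '' ],
);
print "$_\n" for @merged;
```

Lowercasing the whole address is a simplification; strictly, only the domain part is case-insensitive, but in practice treating the whole thing that way avoids near-duplicate entries.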

Excuse the nitpicking, I can't help it

Aristotle on 2004-07-04T00:45:33

The following bit still irks me :)

s/.*<(.*@.*)>.*/$1/;

It's one of the things that constantly annoys me, like a pebble in the shoe, when I write sed scripts. What I believe it really should be phrased as is

$_ = $1 if /<(.*@.*)>/;
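A small side-by-side check of the two spellings follows; the sub names are illustrative. For an ordinary header with a single <...> pair they give the same result, and both leave $_ alone when there are no angle brackets. They can differ when the line holds more than one "<": the leading greedy .* in the substitution makes it pick the last "<", while the unanchored match starts at the first one.

```perl
use strict;
use warnings;

# Substitution form: match the whole line and replace it with the capture
sub by_substitution {
	local $_ = shift;
	s/.*<(.*@.*)>.*/$1/;
	return $_;
}

# Conditional-assignment form: only touch $_ when the pattern matches
sub by_assignment {
	local $_ = shift;
	$_ = $1 if /<(.*@.*)>/;
	return $_;
}

for my $input ( 'Jane Doe <jane@example.com>', 'bob@example.org', 'a <b> <c@d>' ) {
	printf "%-30s %s | %s\n", $input, by_substitution( $input ), by_assignment( $input );
}
```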

Re:Excuse the nitpicking, I can't help it

brian_d_foy on 2004-07-04T03:44:43
"Should" is strong phrasing. If that's the way you like to do things in sed, that's fine, but around these parts there is more than one way to do it.

But this is open source, so if you decide to use the script, you can change it to anything that you like. :)

Re:Excuse the nitpicking, I can't help it

Aristotle on 2004-07-04T11:47:31
What I'm saying is that I can't do it this way in sed, so I'm forced to repeat myself: "find anything followed by the bit I want followed by anything and replace it by the bit I want". At times, I've pined for a crop() function (in sed more so than in Perl, of course, but the Perl verbiage can get old as well).
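Perl has no built-in crop(), but a hypothetical one along the lines described is short to sketch: return the first capture group when the pattern matches, and the string unchanged otherwise. It assumes the pattern contains at least one capture group.

```perl
use strict;
use warnings;

# Hypothetical crop(): keep only the first captured group if $re matches,
# otherwise return the string unchanged. $re must contain a capture group.
sub crop {
	my( $string, $re ) = @_;
	return $string =~ $re ? $1 : $string;
}

print crop( 'Jane Doe <jane@example.com>', qr/<(.*@.*)>/ ), "\n";
print crop( 'bob@example.org', qr/<(.*@.*)>/ ), "\n";
```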

Ready-cooked software

Simon on 2004-07-04T08:38:55

I use a program called the "little brother database" (lbdb in Debian) to handle this for me.

'Pine' Piping

Smylers on 2004-07-09T19:37:40

To get round 'Pine' only remembering one pipe command, and forgetting that when you quit, you can use its print functionality instead.

The definition of a printer in 'Pine' can just be a command to pipe stuff to; obviously you're supposed to put commands like lpr in there, but there's no reason why you can't set up anything else as a printer.

When I was a 'Pine' user I had gvim as my default printer.

Smylers

Re:'Pine' Piping

brian_d_foy on 2004-07-09T19:55:09
Ah, very cool indeed. Thanks :)

I never thought of that because my PINE machine has no printer I can use, so I never bothered to look into it.