There are many, many modules on CPAN to parse emails. So of course I wrote my own.
I've had this email parser written for a while now, and while it deals nicely with RFC-compliant mail, I've wanted for a long time to add support for all those annoying RFC deviations - binary headers, binary body parts, etc.
So this week I bumped the perl requirement up to 5.8 and started hacking on a test suite that would include lots of problematic emails. Once I'd got the tests together, the rest became fairly easy (though not very easy!). So now I have a mail parser that nicely handles binary headers and decodes them to UTF-8 (as well as doing the same for MIME-encoded headers). Unfortunately I can't reveal exactly how it's done, as it's proprietary technology (it involves a lot of heuristics, and a lot of evil test emails that we see). But it's been an interesting project.
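For readers wondering what "decode binary headers to UTF-8" might look like in general terms, here is a minimal sketch. It is not the parser described above (that one is proprietary and its heuristics are not shown here), and it is written in Python with only the standard library purely for illustration. The function names and the fallback charset order are assumptions made up for this example. It decodes RFC 2047 encoded words with their declared charset, and for raw 8-bit bytes it simply tries a short list of likely charsets.

```python
from email.header import decode_header

# Hypothetical fallback order; a real parser would need far richer heuristics.
FALLBACK_CHARSETS = ("utf-8", "windows-1252", "latin-1")


def bytes_to_text(raw, declared=None):
    """Decode bytes, trying the declared charset first, then the fallbacks."""
    candidates = ([declared] if declared else []) + list(FALLBACK_CHARSETS)
    for charset in candidates:
        try:
            return raw.decode(charset)
        except (UnicodeDecodeError, LookupError):
            # Wrong or unknown charset: move on to the next candidate.
            continue
    # latin-1 accepts every byte, so this line is only a safety net.
    return raw.decode("latin-1", errors="replace")


def decode_header_value(raw_header):
    """Turn a raw header value (bytes) into a Python str."""
    # latin-1 maps each byte to one code point, so no information is lost
    # before the header is scanned for RFC 2047 encoded words.
    as_text = raw_header.decode("latin-1")
    pieces = []
    for part, charset in decode_header(as_text):
        if isinstance(part, str):
            # No encoded words were found; recover the original bytes.
            part = part.encode("latin-1")
        pieces.append(bytes_to_text(part, charset))
    return "".join(pieces)
```

With this sketch, a MIME-encoded value such as `=?ISO-8859-1?Q?Andr=E9?=` and a header containing raw UTF-8 or Windows-1252 bytes both come out as ordinary text. Trying UTF-8 first is a common choice because arbitrary 8-bit text rarely happens to be valid UTF-8, but that is a rule of thumb rather than a guarantee.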
Interestingly this system now displays more broken emails as the original author intended than any email client I have access to. Perl++!
Re:Oooh...
Matts on 2003-09-14T07:03:26
Unfortunately I don't think so. I can ask about it, but the heuristics are probably considered IP.
The other tragic thing is the API sucks. I wish I could start again with a sane API, but now I have so many applications that use the code I think I'd have to start again with a new module name.
Re:Oooh...
jesse on 2003-09-14T07:34:38
Simon's Email::MIME never really got much past the "Design the first part of an API" stage. Perhaps that would be the right vehicle to usurp and make go.