message de-duping in exim filters?

nicholas on 2007-01-03T15:52:44

Dear lazyweb...

There's a rather simple recipe in the procmail examples for filtering out duplicate e-mail messages using their message ID:

:0 Whc: msgid.lock
| formail -D 8192 msgid.cache

:0 a:
duplicates

I'm wondering (and Google fails me): is there an easy way to achieve this same effect* in an Exim filter?

* Or something close: specifically, the ability to remember which message IDs have already been seen and, if a message ID has been seen before, to deliver the duplicate message somewhere else.


Something like the following....

Tony Finch on 2007-01-03T18:48:22

# MBM had the idea of using a filter log file for lsearch lookups.
logfile $home/.msgid.log
if ${lookup{$h_message-id:}lsearch{$home/.msgid.log}} is "seen"
then
    seen
    finish
else
    save $home/inbox
    logwrite "$h_message-id: seen"
endif

Re:Something like the following....

nicholas on 2007-01-03T19:55:24

I had been thinking that something structured like this might work. But lsearch will be O(n), won't it? And if most message IDs aren't repeated, then most searches will scan the whole file, whereas a lookup in a DBM file would be O(1), wouldn't it? That, though, would require writing a custom program to insert seen message IDs into the DBM file, which costs a fork per message.
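
To make the DBM idea concrete, here is a minimal sketch of what such a helper might look like, using Python's standard dbm module. The function name and the database path are hypothetical, and whether a lookup is really close to O(1) depends on which DBM backend Python picks on the system. There is no locking here, so concurrent deliveries could race.

```python
import dbm


def seen_before(db_path: str, message_id: str) -> bool:
    """Return True if message_id is already recorded.

    Otherwise record it and return False. db_path is a hypothetical
    location such as ~/.msgid.db; the backend may add its own suffix.
    """
    key = message_id.strip().encode("utf-8")
    # "c" opens the database read-write and creates it if it is missing.
    with dbm.open(db_path, "c") as db:
        if key in db:
            return True
        db[key] = b"seen"
        return False
```

Run once per message, this is exactly the fork cost mentioned above, which is the argument for a long-running process instead.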

Re:Something like the following....

Tony Finch on 2007-01-04T10:34:00

You can write a persistent program to look after the database and query it with the ${readsocket{...}} expansion. Or link your Exim with Perl and use ${perl{...}} :-)
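
A rough sketch of what such a persistent program might look like, in Python. Everything here is an assumption for illustration: the socket path, the one-line request protocol (a Message-ID followed by a newline) and the replies "seen" and "new" are made up, and the IDs are held only in memory, so they are forgotten on restart. The filter side would pass the Message-ID plus a newline as the ${readsocket} request string and compare the reply with "seen".

```python
import os
import socketserver

# Hypothetical socket location; the filter would name the same path.
SOCKET_PATH = os.path.expanduser("~/.msgid.sock")


class DedupHandler(socketserver.StreamRequestHandler):
    """Answer one query per connection: read a Message-ID line, reply, close."""

    def handle(self):
        message_id = self.rfile.readline().decode("utf-8", "replace").strip()
        seen_ids = self.server.seen_ids
        if message_id and message_id in seen_ids:
            reply = b"seen"
        else:
            if message_id:
                seen_ids.add(message_id)
            reply = b"new"  # a message without an ID is treated as new
        # No trailing newline, so the caller can compare the reply directly.
        self.wfile.write(reply)


def make_server(path):
    """Create the listening server, removing any stale socket file first."""
    if os.path.exists(path):
        os.unlink(path)
    server = socketserver.UnixStreamServer(path, DedupHandler)
    server.seen_ids = set()  # in-memory only; lost when the program exits
    return server


def main():
    # Requests are handled one at a time, so the set needs no locking.
    with make_server(SOCKET_PATH) as server:
        server.serve_forever()
```

Calling main() starts the server. Swapping the in-memory set for a DBM file would keep the IDs across restarts while still avoiding a fork per message.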

Re:Something like the following....

nicholas on 2007-01-07T16:23:01

I'm not root, so I'm not in a position to link with Perl. Even if I were, I prefer the decoupling provided by a separate program, so I went with ${readsocket{...}}. Thanks for the help.