I'm looking for a name for my new module, which automatically detects, from the strings, the best (most conservative) encoding to use in an email message.
This is useful for encoding an email message in iso-2022-jp if all its content is Japanese, in iso-2022-kr if it is Korean, and so on. Gmail does this by default: http://mail.google.com/support/bin/answer.py?ctx=%67mail&hl=en&answer=22841
I'm thinking of Encode::Email::Best or Encode::Mail::Traditional. Any suggestions?
Re:Coordinate with RJBS
miyagawa on 2007-03-24T01:20:55
Well, I was thinking about the Email:: namespace at first, but the actual code wouldn't do anything specific to email messages.
It tries to encode the message into each of a given set of encodings, ordered from narrowest to widest, and checks whether every character is encoded safely, using Encode and possibly Dan's Encode::InCharset.
Anyway I'll think about it more.
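The trial-encoding approach described above might look roughly like this. It is a minimal sketch that uses only the core Encode module (not Encode::InCharset); the function name detect_best_encoding, its (string, array reference) signature and the candidate list are hypothetical. Encodings are tried narrowest first, and the first one that represents every character wins. The decode-and-compare step is an added precaution against any encoder that substitutes characters instead of dying under FB_CROAK.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: return the name of the first encoding in @$encodings
# that can represent every character of $string, or undef if none can.
sub detect_best_encoding {
    my ($string, $encodings) = @_;
    for my $enc (@$encodings) {
        # Work on a copy: with CHECK set, encode() may clear its source argument.
        my $copy = $string;
        # FB_CROAK makes encode() die on the first unmappable character.
        my $octets = eval { Encode::encode($enc, $copy, Encode::FB_CROAK()) };
        next unless defined $octets;
        # Extra safeguard: accept only if decoding gives back the same characters.
        return $enc if Encode::decode($enc, $octets) eq $string;
    }
    return undef;
}

my @candidates = qw(us-ascii iso-8859-1 iso-2022-jp utf-8);
print detect_best_encoding("hello",            \@candidates), "\n";    # us-ascii
print detect_best_encoding("caf\x{e9}",        \@candidates), "\n";    # iso-8859-1
print detect_best_encoding("\x{65e5}\x{672c}", \@candidates), "\n";    # iso-2022-jp
```

Returning only the name keeps the encoding step in the caller's hands, at the cost of encoding the winning candidate twice.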
Re:Encode::First
miyagawa on 2007-03-24T02:52:37
Oh yeah, I like that interface. Maybe I'll provide a utility function that takes the string and an array reference of encodings and returns the best encoding, and also an encode()-compatible function just as you described. Thanks!
It seems to me that email is just what you want to use the module for. I don’t see how the module’s operation actually has anything whatsoever to do with email. “Best” doesn’t really say anything; maybe Encode::MinCharsetPicker?
(Btw, I’d have the module only suggest the minimal applicable charset, but not actually do the encoding itself (or only if you ask for it by way of a convenience function). Probably the main function should simply take a list of encodings and then try to pick the applicable encoding with the smallest index.)
Re:Does it really have anything to do with mail?
miyagawa on 2007-03-24T02:54:13
As I said in my other replies, the actual code has nothing to do with email, except that the default "list of encodings known to be safe in email" is obviously specific to email (which is the point of this module).
I'd probably make two functions: one compatible with encode() (which does the encoding itself) and another, like detect_best_encoding(), which returns the name of the encoding but doesn't encode anything.
Re:Does it really have anything to do with mail?
Juerd on 2007-03-24T11:53:56
The easiest way to detect the "best" encoding would be to just encode the string, with a CHECK argument that makes encoding fail if it's impossible. Why create a utility function that throws away the encoded string, when the user can easily choose to do so himself?
my ($enc) = encode_first(...);
Or, have you found another efficient way of finding a suitable encoding?
Re:Does it really have anything to do with mail?
miyagawa on 2007-03-24T12:03:54
Yes, I was thinking of the exact same logic, as well as of using charset tables like the one in Encode::InCharset. I prefer the easiest approach, if not the most efficient, so I guess it'll be the same as what you described.
The reason we want the name of the encoding back is that we'd like to use it in the email header. If we return only the encoded string, the caller doesn't know which encoding it is actually in.
Re:Does it really have anything to do with mail?
Aristotle on 2007-03-24T12:49:43
The reason I suggested that sort of interface is that some APIs expect to receive character strings that they will then encode themselves; XML serialisers come to mind. In such a case, giving the caller an encoded string is pretty useless.
Re:Does it really have anything to do with mail?
Juerd on 2007-03-24T11:56:12
Oh, and a list of several email-safe country-specific encodings is of course more common than latin1:utf8, and would make a better default.
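One way to combine the ideas in this thread is an encode_first-style function that returns the encoding name together with the octets, falling back to a default list of email-safe encodings. Because the name comes first, `my ($enc) = encode_first(...)` yields just the name for the header. This is a sketch under assumptions: the argument order (string first, then encodings), the default list, and the round-trip check are illustrative choices, not an agreed API.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical default list: narrow, email-safe encodings first,
# UTF-8 as the last resort.
my @DEFAULT_ENCODINGS = qw(us-ascii iso-8859-1 iso-2022-jp iso-2022-kr utf-8);

# Hypothetical encode_first(): try each encoding in order and return
# (encoding name, octets) for the first one that represents $string
# without loss, or an empty list if none can.
sub encode_first {
    my ($string, @encodings) = @_;
    @encodings = @DEFAULT_ENCODINGS unless @encodings;
    for my $enc (@encodings) {
        my $copy = $string;    # encode() may clear its source when CHECK is set
        my $octets = eval { Encode::encode($enc, $copy, Encode::FB_CROAK()) };
        next unless defined $octets;
        # Round-trip check guards against encoders that substitute instead of dying.
        return ($enc, $octets) if Encode::decode($enc, $octets) eq $string;
    }
    return;
}

# The first element of the returned list is the encoding name,
# ready for the Content-Type header; the octets go into the body.
my ($enc, $octets) = encode_first("\x{d55c}\x{ad6d}\x{c5b4}");    # Korean text
print "Content-Type: text/plain; charset=$enc\n" if defined $enc;
```

A caller that hands character strings to an API that does its own encoding (an XML serialiser, say) can keep only the name and discard the octets.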