Mail charset autoconversion

darobin on 2003-04-03T10:05:25

It seems to me that mail.pm.org has a strange behaviour. You feed it:

  Content-Type: text/plain; charset=UTF-8; format=flowed

and you get back:

  Content-Type: text/plain; charset=ISO-8859-1; format=flowed
  Content-Transfer-Encoding: quoted-printable
  X-MIME-Autoconverted: from 8bit to quoted-printable by mail.pm.org id h339XIO14283

Is there any reason for doing that? I'm not criticising the fine folks that make this run as it may be a good option, but it is mildly annoying at times so I'd like to know :)
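
For illustration, here is a minimal Python sketch of what such a conversion does to the bytes, assuming the body really is transcoded to ISO-8859-1 (the headers above only show the label changing). The helper names `qp` and `fits_latin1` are made up for this example.

```python
import quopri


def qp(text: str, charset: str) -> bytes:
    """Encode text in the given charset, then as quoted-printable."""
    return quopri.encodestring(text.encode(charset))


def fits_latin1(text: str) -> bool:
    """Return True if every character exists in ISO-8859-1."""
    try:
        text.encode("iso-8859-1")
        return True
    except UnicodeEncodeError:
        return False


sample = "caf\u00e9"                  # "café"
as_utf8 = qp(sample, "utf-8")         # é becomes two escaped bytes
as_latin1 = qp(sample, "iso-8859-1")  # é becomes one escaped byte

print(as_utf8, as_latin1)
print(fits_latin1("\u20ac"))          # the euro sign is not in ISO-8859-1
```

The point of the second helper is that a UTF-8 message can contain characters that ISO-8859-1 simply cannot represent, so a charset rewrite is not always lossless.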


MTAs and RFCs

jsmith on 2003-04-03T14:32:35

One reason it might convert is that only the lower 7 bits of each byte can be expected to survive all MTAs. One solution would be to uuencode the UTF-8 content.

The most recent e-mail RFC (2822) states that the body is only US-ASCII (section 2.3, pg. 8).
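
A rough way to check whether a body is safe for a 7-bit path is to look for any byte with the high bit set. This is only a sketch: the function name `is_7bit_clean` is hypothetical, and it ignores the other constraints the RFCs place on a body, such as line length.

```python
def is_7bit_clean(body: bytes) -> bool:
    """Return True if no byte in the body has its high bit set."""
    return all(byte < 0x80 for byte in body)


print(is_7bit_clean(b"plain ascii\r\n"))
print(is_7bit_clean("caf\u00e9".encode("utf-8")))  # raw UTF-8 for "café"
print(is_7bit_clean(b"caf=C3=A9"))                 # the same text after QP
```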

Good and bad

Matts on 2003-04-03T19:41:18

As jsmith points out - the body is supposed to be US-ASCII, so converting it to QP is the right thing to do.

However, converting it to ISO-8859-1 is bad bad bad. It has no right to do that IMHO ;-)
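
The two steps are independent: the transfer encoding can change while the charset stays put. A small sketch using Python's standard `email` package, which has nothing to do with whatever mail.pm.org actually runs, shows a UTF-8 body made 7-bit safe with QP and still labelled UTF-8.

```python
from email.message import EmailMessage

text = "na\u00efve caf\u00e9"  # "naïve café"

msg = EmailMessage()
# Keep the charset as UTF-8; only the transfer encoding is quoted-printable.
msg.set_content(text, charset="utf-8", cte="quoted-printable")

wire = msg.as_bytes()  # what would go over the wire

print(msg["Content-Type"])
print(msg["Content-Transfer-Encoding"])
print(wire.decode("ascii"))
```

Decoding on the receiving side with `msg.get_content()` gives back the original text (plus a trailing newline), with no charset conversion involved.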

Re:Good and bad

darobin on 2003-04-04T08:50:07

Is there any hope for the mail infrastructure to be fixed one of these days? I don't remember hitting a 7-bit MTA since '97 or so. Will we also have to QP encode addresses when IRIs start appearing all over?

Re:Good and bad

Matts on 2003-04-04T09:43:24

> Is there any hope for the mail infrastructure to be fixed one of these days?

You imply it's broken. By the same theory IPv4 is also broken. But I don't see much of a rush to switch to IPv6, and I very much doubt we'd see a big rush to a new mail infrastructure. The pain would be worse than the gain.

> Will we also have to QP encode addresses when IRIs start appearing all over?

I suspect we'll see more and more base64 and QP encoded emails over time, yes.
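
Header text already works this way today: non-ASCII text such as a display name travels as a MIME encoded-word, which is base64 or QP wrapped in `=?charset?...?=`. A minimal Python sketch of the round trip follows; the name is an invented example, and this covers header text only, not the address itself.

```python
from email.header import Header, decode_header

name = "Jos\u00e9"  # "José", an invented display name

# Encode as an encoded-word; the library picks base64 or QP.
encoded = Header(name, "utf-8").encode()

# Decode it again, as a receiving client would.
(raw, charset), = decode_header(encoded)
decoded = raw.decode(charset)

print(encoded)
print(decoded)
```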

There are many email clients that send out broken crap, and sadly MTAs often just accept their garbage. This leads to the current situation.

It's very much analogous to the HTML/XML situation. Would you have XML parsers try their best to accept any old crap just because you want to produce it?

Re:Good and bad

darobin on 2003-04-04T09:56:40

I knew you'd say that :-p. What I'm implying is broken is that mail is supposed to be a text-based protocol, yet it's impossible to use text in it without encoding it down to US-ASCII through QP/B64. I can see the point in making UTF-8 the standard mail content (perhaps alongside UTF-16, just as it is for XML), so that all languages are supported without special encodings and we only resort to ugly hacks for binary content.

My email is sent out in UTF-8 with clear and clean headers saying this is text/plain with a UTF-8 charset. MTAs should leave that alone rather than trying to apply some "fix" to it. As you say, I would much rather have MTAs send stuff back to me instead of munging it in a way that confuses some people's MUAs. Of course, that would be fixing the infrastructure (:p) and while we're at it we could fix the RFCs to be I18N-proactive.

IRI-based addresses are really going to look ugly :(