MySQL Default Collation

Ovid on 2007-01-22T15:32:02

In researching a database problem, I just discovered that MySQL's default collation is latin1_swedish_ci. Apparently, this is considered OK because English doesn't have accented characters and will therefore sort the same way. Still, sounds a bit dodgy to me.


Unicode, please...

Dom2 on 2007-01-22T22:17:02

Any default that isn't UTF-8 is wrong. Yes, really.

Re:Unicode, please...

Aristotle on 2007-02-08T10:33:05

Uhm, that makes no sense. UTF-8 is an encoding, Unicode is a charset, and neither is a collation. For example, Ü will sort to different places depending on whether the collation is English or German (and may sort yet another way in one of the Scandinavian languages, or in Turkish, or what have you). Whether you represent this character in Latin-1 or Unicode (it happens to be the same codepoint in both charsets), and whether you encode the Unicode codepoint using UTF-8 or another encoding, has nothing to do with what the sort order of codepoints is.
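To make the distinction concrete, here is a small Python sketch. The two key functions are toy stand-ins for collations, invented for illustration: one treats Ü like U (roughly what a German dictionary order does), the other treats Ü like Y (loosely modelled on Swedish practice). Neither is a real collation implementation, and the word list is made up.

```python
# One code point, two encodings, two (hypothetical) collations.
u_umlaut = "\u00dc"  # the character Ü

# The code point is the same number in Latin-1 and in Unicode ...
code_point = ord(u_umlaut)                  # 0xDC
# ... but the bytes differ depending on the encoding chosen.
latin1_bytes = u_umlaut.encode("latin-1")   # one byte
utf8_bytes = u_umlaut.encode("utf-8")       # two bytes

def german_like_key(word):
    # Toy rule (assumption): sort Ü as if it were U.
    return word.upper().replace("\u00dc", "U")

def swedish_like_key(word):
    # Toy rule (assumption): sort Ü as if it were Y.
    return word.upper().replace("\u00dc", "Y")

words = ["Uz", "\u00dcb", "Va", "Ya"]

# Same strings, same encoding in memory, different orderings.
german_order = sorted(words, key=german_like_key)
swedish_order = sorted(words, key=swedish_like_key)

print(german_order)
print(swedish_order)
```

The byte representation never enters into the comparison; only the key function (the "collation") decides where Üb lands.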

English characters

drhyde on 2007-01-23T12:18:31

Arguably we have at least three accented characters - the o with diaeresis, o with circumflex and e with acute. At least, they occur in Modern English words in the OED, even if in practice most people omit them. But we undeniably have two non-ASCII letters, both of which occur capitalised and uncapitalised - the ae and oe, which I dare not type here because your browser will almost certainly get them wrong.

AE occurs in words like encyclopaedia and aesc, and OE in, for example, the proper name OEdipus and a few somewhat obscure variant spellings.

Re:English characters

Ovid on 2007-01-23T12:28:43

And even then, we have words like résumé which also have accents, depending on who is spelling them.

Re:English characters

drhyde on 2007-01-23T13:06:05

Yes, that's e with acute. Some would argue that furrin words adopted into English lose the accents though. For example, when we stole théâtre from the French it became theatre and café is normally cafe.

Re:English characters

Aristotle on 2007-02-08T10:35:00

Don’t forget naïve. Æsthetics and co. work just fine in browsers, btw.

Re:English characters

drhyde on 2007-02-08T12:08:12

It's worth noting that the funny dots in ï and ö in English are diaeresis marks, not umlauts. I wonder if they're spelt differently in Unicode ...
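One way to check how Unicode spells them is to ask Python's standard `unicodedata` module for the character names and decompositions. This is only a quick sketch for inspecting the character database, not a statement about how any particular language ought to treat the marks.

```python
import unicodedata

i_dots = "\u00ef"  # ï as in naïve
o_dots = "\u00f6"  # ö as in coöperate, or German schön

# Official character names from the Unicode character database.
i_name = unicodedata.name(i_dots)
o_name = unicodedata.name(o_dots)

# Canonical decompositions: base letter followed by a combining mark.
i_parts = unicodedata.decomposition(i_dots)
o_parts = unicodedata.decomposition(o_dots)

# Name of the combining mark both characters decompose to.
mark_name = unicodedata.name("\u0308")

print(i_name, i_parts)
print(o_name, o_parts)
print(mark_name)
```

Both letters decompose to the same combining mark, so at the character level Unicode does not distinguish a diaeresis from an umlaut; the difference lives in the language, not the codepoint.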