In researching a database problem, I just discovered that MySQL's default collation is latin1_swedish_ci. Apparently this is considered OK because English doesn't have accented characters and will therefore sort the same way. Still, it sounds a bit dodgy to me.
Re:Unicode, please...
Aristotle on 2007-02-08T10:33:05
Uhm, that makes no sense. UTF-8 is an encoding, Unicode is a charset, and neither is a collation. For example, Ü will sort to different places depending on whether the collation is English or German (and may sort differently again in one of the Scandinavian languages, or in Turkish, or what have you). Whether you represent this character in Latin-1 or Unicode (it happens to be the same codepoint in both charsets), and whether you encode the Unicode codepoint using UTF-8 or another encoding, has nothing to do with how the codepoints sort.
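A small Python sketch of the three separate layers: one character, two byte encodings, and three different orderings. The "German" and "Swedish" keys are deliberately simplified toys, assumed for illustration (Ü treated as U, and Ü alphabetized with Y, respectively). They are not real locale collations, which handle far more cases.

```python
# Character vs. encoding vs. collation, using Ü as the example.
ch = "Ü"

codepoint = ord(ch)                 # 0xDC: U+00DC, also 0xDC in Latin-1
latin1_bytes = ch.encode("latin-1")  # b'\xdc' (one byte)
utf8_bytes = ch.encode("utf-8")      # b'\xc3\x9c' (two bytes, same character)

words = ["Zeit", "Über", "Yacht", "Uhr"]

def by_codepoint(w: str) -> str:
    # Python's default string order: compare code points one by one.
    return w

def german_key(w: str) -> tuple[str, str]:
    # Toy German-style rule: sort Ü/ü as if it were U/u (ties broken by the word).
    return (w.translate({ord("Ü"): "U", ord("ü"): "u"}), w)

def swedish_key(w: str) -> tuple[str, str]:
    # Toy Swedish-style rule: alphabetize Ü/ü together with Y/y.
    return (w.translate({ord("Ü"): "Y", ord("ü"): "y"}), w)

print(hex(codepoint), latin1_bytes, utf8_bytes)
print(sorted(words, key=by_codepoint))  # ['Uhr', 'Yacht', 'Zeit', 'Über']
print(sorted(words, key=german_key))    # ['Über', 'Uhr', 'Yacht', 'Zeit']
print(sorted(words, key=swedish_key))   # ['Uhr', 'Yacht', 'Über', 'Zeit']
```

The bytes change with the encoding while the orderings do not. The orderings change only with the collation rule, whichever encoding stored the text.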
AE occurs in words like encyclopaedia and aesc, and OE in, for example, the proper name OEdipus and a few somewhat obscure variant spellings.
Re:English characters
Ovid on 2007-01-23T12:28:43
And even then, we have words like résumé, which also have accents, depending on who is spelling them.
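Whether résumé and resume count as "the same" is exactly the sort of thing a collation decides. A rough Python sketch of an accent- and case-insensitive comparison, assuming the simple approach of NFD decomposition followed by dropping nonspacing combining marks (real collations are more sophisticated than this):

```python
import unicodedata

def fold_accents(s: str) -> str:
    # NFD splits é into e + U+0301 COMBINING ACUTE ACCENT;
    # then drop every nonspacing combining mark (category "Mn").
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(c for c in decomposed if unicodedata.category(c) != "Mn")

def accent_insensitive_equal(a: str, b: str) -> bool:
    # Ignore both accents and case when comparing.
    return fold_accents(a).casefold() == fold_accents(b).casefold()

print(fold_accents("résumé"))                        # resume
print(accent_insensitive_equal("Résumé", "resume"))  # True
print(fold_accents("encyclopædia"))                  # encyclopædia
```

Note the last line: æ is a letter in its own right with no canonical decomposition, so this trick leaves it alone.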
Re:English characters
drhyde on 2007-01-23T13:06:05
Yes, that's e with acute. Some would argue that furrin words adopted into English lose their accents, though. For example, when we stole théâtre from the French it became theatre, and café is normally cafe.
Re:English characters
Aristotle on 2007-02-08T10:35:00
Don’t forget naïve. Æsthetics and co. work just fine in browsers, btw.
Re:English characters
drhyde on 2007-02-08T12:08:12
It's worth noting that the funny dots in ï and ö in English are diaeresis marks, not umlauts. I wonder if they're spelt differently in Unicode...
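As far as Unicode is concerned they aren't spelt differently: the precomposed letters are all named "... WITH DIAERESIS" and decompose to the same U+0308 COMBINING DIAERESIS, whether the dots are an English diaeresis or a German umlaut. A quick Python check using the standard `unicodedata` module:

```python
import unicodedata

def describe(ch: str) -> tuple[str, str, str]:
    # Split a precomposed letter into base letter + combining mark via NFD.
    base, mark = unicodedata.normalize("NFD", ch)
    return unicodedata.name(ch), base, unicodedata.name(mark)

for ch in ("ï", "ö", "ü"):  # ï as in naïve, ü as in German über
    name, base, mark_name = describe(ch)
    print(f"U+{ord(ch):04X} {name} = {base} + {mark_name}")
```

Each line ends in "COMBINING DIAERESIS"; there is no separate umlaut mark to choose.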