Unicode

acme on 2004-07-22T14:12:24

See, I like the idea of Unicode. However, at the Italian Perl Workshop almost every talk had a little bit about how to get Unicode working in a module / project / database. And there are various CPAN modules with Unicode issues.

I think we've basically screwed up Unicode. We shouldn't have to talk about it; it should just work. And BOMs being optional is madness...


Unicode encodings

drhyde on 2004-08-03T11:19:02

The problem with Unicode is the bloody encodings. Do you want UTF-8, UTF-16, or iso-latin-57.3? And do you want fries with that? Given a chunk of text, how do you *know* what the encoding is? Guessing isn't good enough.
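To make the guessing problem concrete, here is a minimal Python sketch. The helper name decode_attempts and the sample string are made up for illustration; only the standard bytes.decode call is used. It shows the same six bytes decoding "successfully" under three different encodings, so a clean decode proves nothing about which one was intended.

```python
def decode_attempts(data, encodings=("utf-8", "latin-1", "utf-16-le")):
    """Try each encoding in turn; None marks a decode failure."""
    results = {}
    for name in encodings:
        try:
            results[name] = data.decode(name)
        except UnicodeDecodeError:
            results[name] = None
    return results


# Six bytes that started life as UTF-8 (hypothetical sample text).
data = "caf\u00e9!".encode("utf-8")
attempts = decode_attempts(data)

for name, text in attempts.items():
    print(name, repr(text))
```

All three attempts return a string without raising an error, and only the UTF-8 one is the text that was written. Without out-of-band information (a header, a BOM, a convention), the bytes alone don't say which reading is right.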

It gets worse: there have been N different versions of Unicode now, capable of encoding different numbers of characters. E.g., the version of Unicode that Java 1.0 supported only permitted 65,536 different characters (a character was 16 bits), but Unicode now defines a code space of 1,114,112 code points (U+0000 to U+10FFFF), which needs 21 bits.
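A short Python sketch of what the 16-bit limit means in practice. The helper utf16_units is a hypothetical name, and the musical G clef (U+1D11E) is just a convenient example of a character above U+FFFF. Anything above that limit no longer fits in one 16-bit unit and takes two in UTF-16.

```python
import sys


def utf16_units(ch):
    """Number of 16-bit code units needed to store ch in UTF-16."""
    return len(ch.encode("utf-16-le")) // 2


clef = "\U0001D11E"  # MUSICAL SYMBOL G CLEF, outside the first 65,536 code points

print(hex(ord(clef)))        # the code point itself
print(utf16_units("A"))      # fits in a single 16-bit unit
print(utf16_units(clef))     # needs a surrogate pair
print(hex(sys.maxunicode))   # highest code point Python will represent
```

So code written on the assumption that "one 16-bit char is one character" miscounts any text containing such characters.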

Different versions I could just about cope with; it's a truism that you should never use anything before version 3 anyway. But I have yet to find libraries which handle them all both correctly and, most importantly, SIMPLY.

I suppose that handling variable-width character sets is hard, which partly explains a lot of the broken Unicode software. The solution is to admit that variable-width charsets are a stupid idea, and ditch them. I'd happily use Unicode if characters were guaranteed to be a fixed width. But they're not, and so I don't.
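For anyone who wants to see the width difference rather than take it on trust, a rough Python sketch follows. The function name widths and the four sample characters are picked for illustration. It compares bytes per character in UTF-8 with UTF-32, which is a fixed-width encoding of code points, at the cost of four bytes for everything.

```python
def widths(text):
    """Return (char, UTF-8 byte count, UTF-32 byte count) for each character."""
    return [
        (ch, len(ch.encode("utf-8")), len(ch.encode("utf-32-le")))
        for ch in text
    ]


# a, e-acute, euro sign, musical G clef
sample = "a\u00e9\u20ac\U0001D11E"
table = widths(sample)

for ch, utf8_len, utf32_len in table:
    print(f"U+{ord(ch):04X}  utf-8: {utf8_len} bytes  utf-32: {utf32_len} bytes")
```

In UTF-8 the four characters take 1, 2, 3 and 4 bytes, so finding the Nth character means scanning from the start. In UTF-32 each takes exactly 4 bytes, so indexing is simple arithmetic, but plain ASCII text quadruples in size.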