inkdroid on 2003-05-01T14:13:01

The third part of Tim Bray's four-part "trilogy" on text processing concerns programming languages and text. Java has boasted Unicode support for a while now, so it's interesting to hear about the wrinkles. These criticisms are relevant to Perl's Unicode handling as well:

use charnames ':full';
print length("\N{LATIN CAPITAL LETTER A}\N{COMBINING ACUTE ACCENT}"), "\n";

prints 2, not 1, because length counts code points rather than the characters a reader actually sees.
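If what you want is the number of characters as a reader would count them, one option is to count matches of \X, which the regex engine treats as a single base character plus any combining marks. This is only a rough sketch: grapheme_count is a helper name made up for illustration, not something Perl ships with.

```perl
use strict;
use warnings;

# Hypothetical helper: count user-perceived characters instead of code points.
# \X matches one base character together with its combining marks; assigning
# the match list to () in scalar context yields the number of matches.
sub grapheme_count {
    my ($string) = @_;
    my $count = () = $string =~ /\X/g;
    return $count;
}

# LATIN CAPITAL LETTER A followed by COMBINING ACUTE ACCENT
my $a_acute = "A\x{301}";

print length($a_acute), "\n";          # prints 2 (code points)
print grapheme_count($a_acute), "\n";  # prints 1 (base letter plus accent)
```

Normalizing to NFC first would also give 1 for this particular pair, since a precomposed Á exists, but that trick fails for sequences with no precomposed form.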

Unicode is cool, but the issues are huge, especially for a programming language that is so text-oriented. I'm still amazed that the Perl 5 folks were able to graft Unicode onto Perl 5 without having to wait for Perl 6.