UTF-8 on search.cpan.org

darobin on 2002-02-17T17:25:36

It would seem that search.cpan.org (and possibly other parts of the CPAN system, as this particular problem may have its roots elsewhere in the chain) is having a few problems with some characters outside the Latin-1 range.

If you go to http://search.cpan.org/search?mode=module&query=SOAP::Client, you will note that the second entry seems rather garbled. I don't know the author, but judging from his entries his name may well be Ethiopian.

Not that this is a real problem (it's hardly even a buglet, as the site works regardless), but it shows once more how hard it is to deal properly with encodings, even for experienced programmers. At first this mostly affected XML, and people made fun of us for the trouble we'd thrown ourselves into to get encodings right. But now people are expecting, and rightly so, the entire web to be Unicode-safe. This puts quite a few of us in trouble, as few tools are ready for it yet and few people know how to handle it properly. I've been burnt many times already, so I can only advise anyone who is doing web work to pay attention to these issues, as it won't be possible to dodge them for long :)


there aren't that many

hfb on 2002-02-17T18:02:59

of those, and you can see them for yourself on the who's who list at the bottom. I'm not sure what the cause of the mangling is, as it would appear to be more than just an encoding issue.

Re:there aren't that many

darobin on 2002-02-17T18:25:36

I'm not sure either, especially as they appear fine on the who's who list. It could still be an encoding problem, as a single error anywhere in the pipeline is enough to rot the whole thing (and it doesn't take much to have just that one bug :-). It might simply have been that search.cpan.org was sending correctly UTF-8-encoded content but labelling it with charset iso-8859-1. Anyway, it looks like you fixed it by using his englishified name now :)
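To illustrate the "one bug is enough" point, here is a minimal Python sketch of the charset mislabelling suspected above. The function names (`send_page`, `browser_decode`, `double_encode`) are hypothetical stand-ins for the server, the browser and a buggy intermediate step, and the sample string is an arbitrary Ethiopic word, not the author's actual name. The UTF-8 bytes themselves are fine; only the declared charset is wrong, and that alone turns 3 characters into 9 garbled ones.

```python
# Hypothetical sketch: correct UTF-8 bytes, wrong charset label.

def send_page(text: str) -> bytes:
    """Server side: encode the text as UTF-8 (the content is correct)."""
    return text.encode("utf-8")

def browser_decode(body: bytes, declared_charset: str) -> str:
    """Client side: trust whatever charset the server declared."""
    return body.decode(declared_charset)

def double_encode(text: str) -> bytes:
    """Another one-bug pipeline: UTF-8 bytes misread as Latin-1,
    then re-encoded as UTF-8 before being stored or sent on."""
    return text.encode("utf-8").decode("iso-8859-1").encode("utf-8")

if __name__ == "__main__":
    name = "\u1230\u120b\u121d"  # sample Ethiopic word (stand-in only)
    body = send_page(name)
    print(len(name), len(body))  # 3 characters, 9 bytes (3 bytes each)
    print(browser_decode(body, "utf-8") == name)  # right label: True
    garbled = browser_decode(body, "iso-8859-1")  # wrong label: mojibake
    print(repr(garbled))
    # The damage is reversible as long as no bytes were lost on the way:
    print(garbled.encode("iso-8859-1").decode("utf-8") == name)  # True
```

Since Latin-1 maps every byte to some character, the mislabelled page never raises an error; it just quietly displays junk, which is exactly why such bugs slip through.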

PS: I should have said this more clearly in my journal: I'm not trying to say anything bad about search.cpan.org :)

Re:there aren't that many

hfb on 2002-02-17T18:42:35

Hmm? They appear the same on the list and on search.cpan in my browser. I didn't fix anything :)

Re:there aren't that many

darobin on 2002-02-17T18:58:55

Now that's strange... When I posted earlier, Daniel Yacob (DYACOB) was listed on search.cpan as a garbled string of characters, but he is now listed as "Daniel Yacob". On the who's who list I see him listed as <something I had to delete to get this post to be accepted> (i.e. a name in what looks like Hebrew, but which could be something else, as the fonts on this box are very unclear). search.cpan has definitely changed over the past few hours (as seen from here, at least)!

Re:there aren't that many

hfb on 2002-02-17T19:07:39

I swear, I didn't change anything :)