Testing everything that can possibly break, part 2

jonasbn on 2008-01-21T22:05:49

Another problem I have been working on for some days now is a good old UTF-8 problem. Perhaps it is just me, but I am simply being haunted by UTF-8 problems. UTF-8 seems so simple, but it can really be a pain.

I was given a web form application, quite simple, which uses a simple framework that was developed for this purpose. This time, however, the application talks to an external back-end via XML over HTTP instead of sending mails or sticking stuff in a database.

So the form collects some simple data, a name among other things. The name of course reflects normal Danish names, so special characters are involved. The data did not seem to arrive as UTF-8, so I was asked to encode it as UTF-8.

I have attempted several different approaches without luck. I wrote up some tests. These tests do, however, pass nicely with UTF-8 working properly, but they are running outside the mod_perl environment, which is what so nicely demonstrates the problem.

So unit tests alone would not have pointed to the problem, but the application should have been tested with some proper data during development. Instead of strings like 'hest', 'test' and 'fest', words like 'kødpålæg', 'masseødelæggelsesvåben' and 'æblegrød' should be the test data candidates. Way too often we run into these problems too late in the project.
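To make the test data point concrete, here is a minimal sketch of a test that uses those Danish words instead of plain ASCII strings. It only uses the core Encode and Test::More modules and assumes the test file itself is saved as UTF-8. It is not the application's actual test suite, and like my own tests it runs outside mod_perl, so it says nothing about what happens there.

```perl
use strict;
use warnings;
use utf8;                      # assumption: this file is saved as UTF-8
use Encode qw(encode decode);
use Test::More;

# Test data from the post: words with æ, ø and å rather than plain ASCII
my @names = ('kødpålæg', 'masseødelæggelsesvåben', 'æblegrød');

for my $name (@names) {
    # Character string -> UTF-8 octets
    my $bytes = encode('UTF-8', $name);

    # æ, ø and å each take two octets in UTF-8, so the octet string is longer
    cmp_ok(length $bytes, '>', length $name,
        'encoded form is longer than the character string');

    # Decoding the octets must give back the original characters
    is(decode('UTF-8', $bytes), $name, 'round trip preserves the name');
}

done_testing();
```

With 'hest', 'test' and 'fest' the first check could not even be written, since ASCII strings have the same length before and after encoding. That is exactly why they make poor test data here.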

So tomorrow it is back to the drawing board, where I have to find out how to stop XML::Simple from eating my XML when it encounters UTF-8 encoded strings.

So my chapter in the developer's handbook should also contain something on test data.

My real problem is that it would be great if the tests could be run under mod_perl, so I could demonstrate the bug and test in the proper environment, as I wrote in part 1.

UTF-8 is not a pain,

Aristotle on 2008-01-24T19:58:24

… encodings are a pain.