Wiki Replication - Unicode problems solved

jplindstrom on 2005-06-28T12:26:21

In replicating a waaay old MoinMoin wiki to the latest version (there are like ten versions in between, and I doubt it's feasible to upgrade through all of them) I found that some pages didn't get stored right.

That was strange, because this was an existing wiki replicator that worked fine in the past.

After some detective work I figured out that all the failing pages contained åäö or £ or something like that, and remembered reading that they've moved to utf8 for everything in the new MoinMoin version. Ahh!

So in the POST, I tried to declare the charset as utf-8 in the content type, like this:

use HTTP::Request::Common qw(POST);

# Declare the form data as UTF-8 in the request header
my $req = POST($url,
    Content_Type => 'application/x-www-form-urlencoded; charset=utf-8',
    Content => [
        action      => "savepage",
        datestamp   => $timestamp,
        savetext    => $markup,
        comment     => "WiMi by $author",
        button_save => "Save Changes",
    ]);


That didn't work either, obviously because the old string is in Latin-1 and so the high-byte values were misinterpreted by the MoinMoin Python code. The header only says which encoding the bytes are in; it doesn't convert them.
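
To see why the header alone can't help, here is a small sketch that checks whether Latin-1 bytes happen to be valid UTF-8. The sample string is made up for illustration, and this says nothing about what MoinMoin actually does with such input; it only shows that the bytes themselves aren't UTF-8.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Made-up sample: "åäö" as Latin-1 bytes, one byte per character
my $latin1 = "\xE5\xE4\xF6";

# Strict decoding dies on malformed input; eval turns that into a flag.
# LEAVE_SRC keeps decode() from modifying $latin1 in place.
my $ok = eval {
    decode("UTF-8", $latin1, Encode::FB_CROAK() | Encode::LEAVE_SRC());
    1;
};

print $ok ? "valid UTF-8\n" : "not valid UTF-8\n";   # prints "not valid UTF-8"
```

So a receiver that trusts the charset=utf-8 header gets bytes it can't decode properly.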

Next step: perldoc utf8, which pointed to the Encode module. This little thing worked, even without the extra header:

use Encode;  # exports encode()
my $utf8Markup = encode("utf8", $markup);
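
For the curious, a minimal sketch of what that conversion does to the bytes. The sample string and variable names are invented for the example, and it spells out the Latin-1 decode step that the one-liner above leaves implicit (Perl treats an undecoded byte string as Latin-1 characters, so the result is the same).

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Made-up sample: "åäö" as Latin-1 bytes, like the old wiki pages
my $latin1 = "\xE5\xE4\xF6";

# Say explicitly what the input is, then encode the characters as UTF-8
my $chars = decode("iso-8859-1", $latin1);
my $utf8  = encode("UTF-8", $chars);

# Each of these characters becomes two bytes in UTF-8
print join(" ", map { sprintf "%02X", ord($_) } split //, $utf8), "\n";
# prints "C3 A5 C3 A4 C3 B6"
```

The encoded string is then what goes into savetext in the POST.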


Et voilà!