DB_File and MLDBM

ambs on 2004-07-26T18:56:37

I really love ties (Perl ties, of course), especially DB_File and MLDBM. I use them very often (especially the second, which implies I use the first too!).

Every time I need persistence, I call on MLDBM. I just hate relational databases. I know some SQL and I have written some code to interact with MySQL, but I hate them.

Today, it was another MLDBM day. I was preparing a system for webpapers. WebPaper is a game where you have to answer a set of questions using the Internet, ICQ, MSN, Jabber, MOO, MUD, IRC and so on.

The problem is: I have a set of questions which use Unicode characters, things like an s with an upside-down ^, and such. The same happens with the answers. I was storing those answers in an MLDBM file, but it stores Unicode as a sequence of bytes. That means that when I get the value back, Perl treats the string as a plain byte string instead of a UTF-8 string. Does anybody have any idea of how I can solve this (easily, please)?


Turn on UTF8 flag

grantm on 2004-07-26T21:24:16

I guess the 'right' answer would be to have the DBM layer preserve the UTF8 flag on the strings. In the absence of that solution, if you have a string containing UTF-8 byte sequences that is not flagged as UTF8, you can turn on the flag like this (Perl 5.8 required):

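use Encode qw(_utf8_on);

my $string = "\xE2\x82\xAC";  # The Euro symbol, as three UTF-8 bytes

print length($string), "\n";  # length in bytes

_utf8_on($string);            # flag the bytes as UTF-8; nothing is converted or validated

print length($string), "\n";  # length in characters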

Which prints:

3
1

This is documented in the Encode man page.
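As a possible alternative to flipping the flag by hand, the same Encode module also offers encode and decode, which convert between character strings and UTF-8 bytes and check the bytes along the way. The following is only a minimal sketch of the round trip, with the DBM store itself left out; the variable names are made up for illustration:

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# A character string containing the Euro sign (U+20AC).
my $original = "\x{20AC}";

# What a byte-oriented store ends up holding: the UTF-8 encoding.
my $stored = encode('UTF-8', $original);

# On the way back out, decode the bytes into a character string again.
my $restored = decode('UTF-8', $stored);

print length($stored), "\n";    # length in bytes
print length($restored), "\n";  # length in characters
print $restored eq $original ? "same\n" : "different\n";
```

The idea would be to call encode just before storing a value and decode just after fetching it.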

Re:Turn on UTF8 flag

ambs on 2004-07-26T21:28:08

Thanks. This can do the trick, especially if it can be used on strings not containing UTF-8 characters (I think it can). I'll try it tomorrow morning.
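On the question of strings that are not UTF-8: a plain ASCII string is already valid UTF-8, so it should come through unchanged, but a string with other high bytes might not be valid. A cautious sketch, assuming a hypothetical helper called bytes_to_text (it is not part of Encode or MLDBM), tries a strict decode and falls back to the original bytes when that fails:

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# Hypothetical helper: return a character string if the bytes are
# valid UTF-8, otherwise return the bytes unchanged.
sub bytes_to_text {
    my ($bytes) = @_;
    # FB_CROAK makes decode die on malformed input; eval catches that.
    my $text = eval { decode('UTF-8', $bytes, FB_CROAK | LEAVE_SRC) };
    return defined $text ? $text : $bytes;
}

print length(bytes_to_text("plain ascii")), "\n";   # ASCII is valid UTF-8
print length(bytes_to_text("\xE2\x82\xAC")), "\n";  # Euro sign, decoded
print length(bytes_to_text("\xE9")), "\n";          # lone high byte, left as-is
```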