I really love ties (Perl ties, of course), especially DB_File and MLDBM. I use them very often (especially the second, which implies I use the first too!).
Every time I need persistence, I call MLDBM. I just hate relational databases. I know some SQL and I have written some code to interact with MySQL, but I hate them.
Today was another MLDBM day. I was preparing a system for WebPapers. A WebPaper is a game where you have to answer a set of questions using the Internet, ICQ, MSN, Jabber, MOO, MUD, IRC and so on.
The problem is: I have a set of questions which use Unicode characters. Things like an s with an upside-down ^, and so on. The same happens with the answers. I was storing those answers in an MLDBM file, but it stores Unicode as a sequence of bytes. That means that when I fetch the value again, Perl treats it as a plain byte string instead of a UTF-8 string. Does anybody have any idea how I can solve this (easily, please)?
I guess the 'right' answer would be to have the DBM layer preserve the UTF-8 flag on the strings. In the absence of that solution, if you have a string containing UTF-8 byte sequences that is not flagged as UTF-8, you can turn on the flag like this (Perl 5.8 required):
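use Encode qw(_utf8_on);
my $string = "\xE2\x82\xAC";   # The Euro symbol, as three UTF-8 bytes
print length($string), "\n";   # length counts bytes here
_utf8_on($string);             # flag the same bytes as UTF-8 (no validity check)
print length($string), "\n";   # length now counts characters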
Which prints:
3
1
This is documented in the Encode man page.
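Since MLDBM values are usually nested structures rather than single strings, the same idea can be sketched for a whole record. This is only a minimal sketch under stated assumptions: `decode_deep` is a hypothetical helper name, the data is assumed to come back from the tied hash as UTF-8 bytes, and it uses `Encode::decode` (which checks the bytes and returns a new character string) in place of `_utf8_on`.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: walk a value fetched from the tied hash and decode
# every plain string in it from UTF-8 bytes to Perl characters.
sub decode_deep {
    my ($value) = @_;
    my $type = ref $value;
    if ($type eq 'HASH') {
        my %out;
        $out{ decode_deep($_) } = decode_deep($value->{$_}) for keys %$value;
        return \%out;
    }
    if ($type eq 'ARRAY') {
        return [ map { decode_deep($_) } @$value ];
    }
    # Leave undef and any other kind of reference untouched.
    return $value if $type or !defined $value;
    # FB_CROAK dies on malformed input; LEAVE_SRC keeps the source intact.
    return Encode::decode('UTF-8', $value,
        Encode::FB_CROAK() | Encode::LEAVE_SRC());
}

# Stand-in for a record read back from the MLDBM file (assumed shape).
my $stored = { answer => "\xE2\x82\xAC", tags => ["caf\xC3\xA9"] };
my $fixed  = decode_deep($stored);
print length($fixed->{answer}), "\n";    # 1
print length($fixed->{tags}[0]), "\n";   # 4
```

Calling such a helper right after each fetch keeps the byte-versus-character conversion in one place; strings would need the matching `Encode::encode` step before being stored again.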
Re:Turn on UTF8 flag
ambs on 2004-07-26T21:28:08
Thanks. This can do the trick, especially if it can be used on strings not containing UTF-8 characters (I think it can). I'll try it tomorrow morning.
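On the question of strings without UTF-8 characters: plain ASCII is already valid UTF-8, so flagging or decoding it changes nothing visible, but `_utf8_on` does not check its input, so bytes that are not well-formed UTF-8 would end up as a malformed string. A hedged sketch of a more cautious variant follows; `decode_if_utf8` is a hypothetical name, and falling back to the raw bytes is just one possible policy.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: return the decoded character string, or the original
# bytes unchanged when they are not well-formed UTF-8.
sub decode_if_utf8 {
    my ($bytes) = @_;
    my $chars = eval {
        Encode::decode('UTF-8', $bytes,
            Encode::FB_CROAK() | Encode::LEAVE_SRC());
    };
    return defined $chars ? $chars : $bytes;
}

print length(decode_if_utf8("plain ascii")), "\n";    # 11
print length(decode_if_utf8("\xE2\x82\xAC")), "\n";   # 1
print length(decode_if_utf8("caf\xE9")), "\n";        # 4 (not UTF-8, left as is)
```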