Attention, anyone using DBD::SQLite with the $dbh->{unicode} attribute set to 1.
This module has a long-standing bug: when inserting values into the database, it assumes that the internal encoding of the strings passed to it is UTF-8. I'm trying to fix it.
use DBI;
use Encode;
my $utf8_string = "This is \x{30c6}\x{30b9}\x{30c8}"; # "Test" in Japanese
my $utf8_bytes = encode_utf8($utf8_string);
my $lat1_string = "H\xe9llo World"; # Héllo
my $dbh = DBI->connect("DBI:SQLite:...", ...);
$dbh->{unicode} = 1;
my $sth = $dbh->prepare("INSERT INTO foo (bar) VALUES (?)");
$sth->execute($utf8_string); # (1) Good
$sth->execute($utf8_bytes); # (2) ???
$sth->execute($lat1_string); # (3) ???
The current version of DBD::SQLite (prior to 1.21) assumes that the INTERNAL encoding of the given string is UTF-8 and stores that octet stream in the database without calling encode_utf8 or utf8::upgrade. This makes #2 PASS and #3 FAIL (an invalid UTF-8 octet ends up in the database), which is not correct.
My patch solves this: #2 ($utf8_bytes) will now be double-encoded and FAIL, while #3 will PASS with a correct UTF-8 octet stream.
The #2 FAIL might break your (potentially already broken) app if you save UTF-8 encoded byte strings into the database under the 'unicode' option, but I believe making it FAIL is the right fix.
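To make the difference concrete without touching a database, here is a minimal pure-Perl sketch. The helpers internal_octets, encoded_octets and hex_of are hypothetical names made up for illustration; they only approximate what the old and the patched code do with a bound value and are not the actual XS code.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);

# Same sample values as in the snippet above.
my $utf8_string = "This is \x{30c6}\x{30b9}\x{30c8}";   # character string, UTF8 flag on
my $utf8_bytes  = encode_utf8($utf8_string);            # byte string, UTF8 flag off
my $lat1_string = "H\xe9llo World";                     # character string, UTF8 flag off

# Hypothetical helper: the octets of Perl's internal buffer, which is
# roughly what the old behaviour stored as-is.
sub internal_octets {
    my ($str) = @_;
    my $copy = $str;
    utf8::encode($copy) if utf8::is_utf8($copy);
    return $copy;
}

# Hypothetical helper: what the patched behaviour amounts to, i.e. always
# encoding the characters of the string as UTF-8.
sub encoded_octets { return encode_utf8($_[0]) }

# Hypothetical helper: hex dump of a byte string.
sub hex_of { return join ' ', map { sprintf '%02x', ord } split //, $_[0] }

for my $case ([utf8_string => $utf8_string],
              [utf8_bytes  => $utf8_bytes],
              [lat1_string => $lat1_string]) {
    my ($name, $value) = @$case;
    printf "%-12s internal: %s\n", $name, hex_of(internal_octets($value));
    printf "%-12s encoded:  %s\n", $name, hex_of(encoded_octets($value));
}
```

For $lat1_string the internal octets contain a bare e9, which is not valid UTF-8, while the encoded octets contain c3 a9. For $utf8_bytes the encoded form is longer than the internal one, which is the double encoding described above.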
<http://svn.ali.as/cpan/trunk/DBD-SQLite/t/rt_25371_asymmetric_unicode.t> is a failing test by Juerd and
Note that if you REALLY want to save the octets without having them encoded into UTF-8, you can still define the column with the BLOB type and use the 3-arg bind_param, as explained in the DBD::SQLite POD. The 'unicode' section there continues to be entirely correct with this patch.
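A minimal sketch of that BLOB route, assuming an in-memory database and the same foo/bar names as the snippet above (the :memory: DSN, the table definition and the RaiseError setting are my assumptions for illustration, not part of the patch):

```perl
use strict;
use warnings;
use DBI qw(:sql_types);          # exports SQL_BLOB
use Encode qw(encode_utf8);

# Assumption: an in-memory database, just for the demonstration.
my $dbh = DBI->connect("dbi:SQLite:dbname=:memory:", "", "", { RaiseError => 1 });
$dbh->{unicode} = 1;

# The column is declared as BLOB so the value is treated as raw octets.
$dbh->do("CREATE TABLE foo (bar BLOB)");

my $octets = encode_utf8("This is \x{30c6}\x{30b9}\x{30c8}");

my $sth = $dbh->prepare("INSERT INTO foo (bar) VALUES (?)");
$sth->bind_param(1, $octets, SQL_BLOB);   # 3-arg bind_param: bind as a blob
$sth->execute;

# Read the value back to check that the octets came through unchanged.
my ($stored) = $dbh->selectrow_array("SELECT bar FROM foo");
print $stored eq $octets ? "octets unchanged\n" : "octets differ\n";
```

Binding with SQL_BLOB is the documented way to tell the driver that the value is binary data rather than text, so it should not be subject to the UTF-8 encoding step applied to ordinary text parameters.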
Let me know your input in #dbd-sqlite on irc.perl.org. Testing your app with my patch and reporting it back would be highly appreciated too.
There's now also a dev release with these changes:
http://svn.ali.as/cpan/releases/DBD-SQLite-1.22_01.tar.gz
Hi, thanks for fixing this bug. I can now get rid of many hacky utf8::upgrade calls.
Does this also encode/upgrade the query itself, in case someone uses a literal value instead of a placeholder? I forgot to add that to the test.
Re:A big thank you!
miyagawa on 2009-04-08T09:15:28
I just fixed the method to encode text passed to sth_execute (that's what I think my patch does at least, based on my low XS-fu :)). Could you update/patch the test to include the query itself so that we can improve it? Thanks!