DBD::SQLite and Unicode

miyagawa on 2009-04-07T23:51:22

Attention, anyone using DBD::SQLite with the $dbh->{unicode} attribute set to 1.

This module has a long-standing bug: when inserting values into the database, it assumes that the internal encoding of the strings passed to it is UTF-8. I'm trying to fix it.

use DBI;
use Encode;

my $utf8_string = "This is \x{30c6}\x{30b9}\x{30c8}"; # "Test" in Japanese
my $utf8_bytes = encode_utf8($utf8_string);
my $lat1_string = "H\xe9llo World"; # Héllo

my $dbh = DBI->connect("DBI:SQLite:...", ...);
$dbh->{unicode} = 1;

my $sth = $dbh->prepare("INSERT INTO foo (bar) VALUES (?)");

$sth->execute($utf8_string); # (1) Good
$sth->execute($utf8_bytes); # (2) ???
$sth->execute($lat1_string); # (3) ???


Current versions of DBD::SQLite (prior to 1.21) assume that the given string's INTERNAL encoding is UTF-8 and store its octets in the database without calling either encode_utf8 or utf8::upgrade. This makes #2 PASS and #3 FAIL (an invalid UTF-8 octet ends up in the database), which is not correct.
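To see why #3 goes wrong, it may help to look at the buffer Perl keeps for a string. The following is a minimal sketch using only core modules; internal_hex is a hypothetical helper written for this illustration (it is not part of DBD::SQLite), and it only approximates what XS code sees when it reads a string's buffer without upgrading it first.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper (not part of DBD::SQLite): hex dump of the raw buffer
# Perl holds for a string, ignoring the UTF8 flag.
sub internal_hex {
    my ($copy) = @_;            # copying keeps the internal representation
    Encode::_utf8_off($copy);   # reinterpret the buffer as plain bytes
    return unpack 'H*', $copy;
}

my $lat1_string = "H\xe9llo";   # no character above 0xFF, so the UTF8 flag is off

# Buffer as stored by Perl: the e-acute is the single byte e9,
# which is not valid UTF-8 on its own.
my $before = internal_hex($lat1_string);

# utf8::upgrade keeps the same characters but switches the internal
# representation to UTF-8.
my $upgraded = $lat1_string;
utf8::upgrade($upgraded);
my $after = internal_hex($upgraded);

printf "same string:    %s\n", ($lat1_string eq $upgraded ? 'yes' : 'no');
printf "before upgrade: %s\n", $before;
printf "after upgrade:  %s\n", $after;
```

The two strings compare equal in Perl, yet only the upgraded buffer is a valid UTF-8 octet stream, which is why writing the buffer as-is produces bad data for Latin-1 strings.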

My patch solves this: #2 ($utf8_bytes) will now be double-encoded and FAIL, while #3 will PASS with a correct UTF-8 octet stream.
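The double encoding in case (2) can be reproduced without a database. This is only a sketch of the effect, assuming the driver encodes every string it is given as if it were character data; it uses Encode directly rather than anything from DBD::SQLite.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8 decode_utf8);

my $utf8_string = "\x{30c6}\x{30b9}\x{30c8}";    # 3 characters
my $utf8_bytes  = encode_utf8($utf8_string);      # 9 octets, already UTF-8

# Encoding the octets again treats each one as a character in the
# range 0x80-0xFF, so every octet becomes two.
my $double = encode_utf8($utf8_bytes);

# Decoding once, as a reader of the database would, gives back the
# 9 octets as characters rather than the original 3 characters.
my $read_back = decode_utf8($double);

printf "octets after one encode:  %d\n", length $utf8_bytes;
printf "octets after two encodes: %d\n", length $double;
printf "round trip intact:        %s\n",
    ($read_back eq $utf8_string ? 'yes' : 'no');
```

Passing the decoded character string (case 1) instead of the pre-encoded octets avoids the problem.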

Making #2 FAIL might break your (potentially already broken) app if it saves UTF-8 encoded byte strings into the database under the 'unicode' option, but I believe making it FAIL is the right fix.

http://svn.ali.as/cpan/trunk/DBD-SQLite/t/rt_25371_asymmetric_unicode.t is a failing test by Juerd, and my patch fixes it. The patch still passes all the other tests, including 12_unicode.t and 20_blobs.t, and it makes DBD::SQLite's unicode option consistent with what DBD::mysql's mysql_enable_utf8 option does.

Note that if you REALLY want to save octets without having them encoded into UTF-8, you can still define the table with a BLOB column type and use the 3-arg bind_param, as explained in the DBD::SQLite POD. That 'unicode' section continues to be entirely correct with this patch.
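As a rough illustration of the BLOB route, here is a sketch that assumes DBD::SQLite is installed. The in-memory database and the CREATE TABLE statement are assumptions made for the example, and the attribute is spelled sqlite_unicode, the name used by later DBD::SQLite releases for the option this post calls 'unicode'.

```perl
use strict;
use warnings;
use DBI qw(:sql_types);

# Assumptions for this sketch: an in-memory database and a table foo
# whose bar column is declared as BLOB.
my $dbh = DBI->connect("dbi:SQLite:dbname=:memory:", "", "",
                       { RaiseError => 1, PrintError => 0 });
$dbh->{sqlite_unicode} = 1;   # later spelling of the 'unicode' attribute
$dbh->do("CREATE TABLE foo (bar BLOB)");

my $octets = "H\xe9llo";      # raw octets that should be stored untouched

my $sth = $dbh->prepare("INSERT INTO foo (bar) VALUES (?)");
$sth->bind_param(1, $octets, SQL_BLOB);   # 3-arg form marks the value as binary
$sth->execute;

# Ask SQLite what it stored: the storage type and the octets in hex.
my ($type, $hex) = $dbh->selectrow_array("SELECT typeof(bar), hex(bar) FROM foo");
print "$type $hex\n";
```

Binding with SQL_BLOB is what tells the driver to leave the value alone; a plain execute($octets) would treat it as text.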

Let me know what you think in #dbd-sqlite on irc.perl.org. Testing your app with my patch and reporting back would be highly appreciated too.


Release tarball

Alias on 2009-04-08T02:09:25

There's now also a dev release for these changes:

http://svn.ali.as/cpan/releases/DBD-SQLite-1.22_01.tar.gz

A big thank you!

Juerd on 2009-04-08T09:01:29

Hi, thanks for fixing this bug. I can now get rid of many hacky utf8::upgrade calls :).

Does this also encode/upgrade the query itself, in case someone uses a literal value instead of a placeholder? I forgot to add that to the test.

Re:A big thank you!

miyagawa on 2009-04-08T09:15:28

I just fixed the method to encode text passed to sth_execute (that's what I think my patch does at least, based on my low XS-fu :))

Could you update/patch the test to include the query itself so that we can improve it? Thanks!