UTF-8, perl and Microsoft SQL Server

ddick on 2009-06-09T00:49:22

Attempting to note things down without going insane from frustration. SQL Server cannot handle UTF-8. It can, however, handle UCS-2 (great for Windows or Java programmers), but in a bizarre fashion.

  • First, char, varchar and text fields do not hold UCS-2 data; nvarchar, nchar and ntext do.
  • Secondly, quoting of data is somewhat different. Instead of quoting data like so, 'foo', it becomes N'foo'.
  • Finally, when connecting with DBD::Sybase over the underlying FreeTDS libraries, you need to set "Client Charset = UTF-8" in freetds.conf AND use the Encode module to encode your statement as 'UTF-8' before passing it to $dbh->prepare (a sketch follows below).

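Putting those three together, a minimal sketch of the write path looks something like this (the server entry, database, table and column names are hypothetical, and it assumes the [mssql] entry in freetds.conf carries "Client Charset = UTF-8"):

use DBI;
use Encode ();

# "mssql" is a hypothetical server entry in freetds.conf with Client Charset = UTF-8
my $dbh = DBI->connect('dbi:Sybase:server=mssql;database=example',
    'username', 'password', { RaiseError => 1 });

# the column is an nvarchar, the literal uses the N'...' form, and the
# statement is encoded to UTF-8 bytes before it reaches prepare
my $sql = Encode::encode('UTF-8',
    "INSERT INTO widgets (name) VALUES (N'\x{20AC}10 voucher')");
my $sth = $dbh->prepare($sql);
$sth->execute();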
Salutations to the DBD::Sybase and FreeTDS teams for their excellent work in allowing perl to talk UTF-8 (at least the UCS-2-compatible portion of it) to a database that doesn't even support it.

As a side note, I think that when constructing a test suite for a program that uses a database as a backend, it's essential to test the database itself for edge case conditions: in this case, making sure that you can fill a varchar(10) with ten three-byte Unicode characters AND retrieve it AND that the retrieved value matches your expected result, because your column definitions may be behaving slightly differently than you thought.
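A round-trip test in that spirit might look like the following sketch (the server entry, table and column are hypothetical; it assumes the encode-before-prepare and decode-after-fetch approach above):

use DBI;
use Test::More tests => 1;
use Encode ();

my $dbh = DBI->connect('dbi:Sybase:server=mssql;database=example',
    'username', 'password', { RaiseError => 1 });

# ten three-byte characters: the euro sign encodes to three bytes in UTF-8
my $value = "\x{20AC}" x 10;

# round_trip(val) is a hypothetical table; declare val with whatever type
# your schema really uses, because that is exactly what is under test
$dbh->do(Encode::encode('UTF-8',
    "INSERT INTO round_trip (val) VALUES (N'$value')"));

my $sth = $dbh->prepare(Encode::encode('UTF-8', 'SELECT val FROM round_trip'));
$sth->execute();
my ($got) = $sth->fetchrow_array();
is(Encode::decode('UTF-8', $got), $value,
    'ten three-byte characters survive the round trip');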


Help :)

jdavidb on 2009-06-09T15:15:52

I've been having some difficulties with this. In my case, I have trouble pulling data out of the database when it uses Windows special characters. Any idea how to go the opposite direction?

Re:Help :)

ddick on 2009-06-09T23:38:32

What sort of trouble are you having? In my case, I Encode::encode the statement before calling $dbh->prepare, and when retrieving $row = $sth->fetchrow_hashref() I run Encode::decode on each field in $row.
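Roughly like this, with hypothetical query and column names (the important part is decoding every field before using it):

use DBI;
use Encode ();

my $dbh = DBI->connect('dbi:Sybase:server=mssql;database=example',
    'username', 'password', { RaiseError => 1 });

my $sth = $dbh->prepare(Encode::encode('UTF-8', 'SELECT id, name FROM widgets'));
$sth->execute();
while (my $row = $sth->fetchrow_hashref()) {
    # the values arrive as UTF-8 bytes, so decode each defined field back
    # into perl characters before doing anything else with them
    $row->{$_} = Encode::decode('UTF-8', $row->{$_})
        for grep { defined $row->{$_} } keys %$row;
    print "$row->{id}: $row->{name}\n";
}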

Re:Help :)

jdavidb on 2009-06-10T13:36:36

It appears to blow up for me on fetchrow_arrayref. I'm not sure I get a chance to use Encode on what came back. I might be able to hack into DBD::Sybase and do it in there.

I haven't had a need to do anything to the statement; everything I'm passing is straight ASCII. The only problems I face are when I deal with records where people have inserted characters from the Windows character set. Then I get this message:

DBD::Sybase::st fetchrow_arrayref failed: OpenClient message: LAYER = (0) ORIGIN = (0) SEVERITY = (9) NUMBER = (99)

Server , database

Message String: WARNING! Some character(s) could not be converted into client's character set. Unconverted bytes were changed to question marks ('?').

Re:Help :)

ddick on 2009-06-10T18:51:52

The client charset is set in freetds.conf and defaults to ISO-8859-1 (I think), which would die if Windows characters are used. Set "Client Charset" to "UTF-8" and see what happens. Also, check the TDSDUMP file. Mine has the following lines at the start:

log.c:190:Starting log file for FreeTDS 0.82
        on 2009-06-11 04:50:38 with debug flags 0x4fff.
iconv.c:363:iconv to convert client-side data to the "UTF-8" character set
iconv.c:516:tds_iconv_info_init: converting "UTF-8"->"UCS-2LE"
iconv.c:516:tds_iconv_info_init: converting "ISO-8859-1"->"UCS-2LE"
net.c:210:Connecting to 192.168.122.35 port 1433 (TDS version 8.0)
net.c:264:tds_open_socket: connect(2) returned "Operation now in progress"
net.c:303:tds_open_socket() succeeded
util.c:162:Changed query state from DEAD to IDLE
net.c:779:Sending packet
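For what it's worth, the matching freetds.conf entry looks roughly like this ([mssql] is just a placeholder name; the host, port and TDS version are the ones from the log above):

[mssql]
        host = 192.168.122.35
        port = 1433
        tds version = 8.0
        Client Charset = UTF-8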

Re:Help :)

jdavidb on 2010-05-06T21:11:52

A year later, I'm still struggling with this issue. Basically, when I add "client charset = UTF-8" to my freetds.conf file, I get "Out of memory!" thrown at me for no apparent reason.

And strangely enough, my query runs just fine from the tsql freetds command-line program.

Do you happen to have any ideas?

Re:Help :)

ddick on 2010-05-06T23:25:06

First off, if you are hitting memory issues, are you trying to retrieve a text (as distinct from a /(?:var)?char/) field?

Secondly, can you set the TDSDUMP environment variable and post the output? Check out http://www.freetds.org/userguide/logging.htm

Re:Help :)

jdavidb on 2010-05-11T21:47:46

Found my answer. It was a text field, and the default length was massively too long. Apparently SQL Server defaults to something huge. I could change it in freetds.conf, and that fixed the problem.
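(For anyone hitting the same wall: FreeTDS caps returned text columns via the "text size" entry in the server's freetds.conf section; I assume that's the setting involved here, and the value below is only an illustration.)

[mssql]
        text size = 64512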

Re:Help :)

jdavidb on 2010-05-11T21:48:17

Also, TDSDUMP is really helping me ... thanks!

Re:Help :)

ddick on 2010-05-06T23:28:51

Even better, use the TDSDUMP variable when running against DBD::Sybase (the failure) and tsql (the success) and compare the debug logs to find the difference.

Re:Help :)

ddick on 2009-06-09T23:42:34

Also, I found setting

export TDSDUMP=/tmp/freetds.log

to be a huge help, as you can then see how your data is encoded as it travels over the network.
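If exporting it in the shell is awkward, setting it from inside the script before the connection is opened should work just as well (the path and server entry are only examples):

use DBI;

# FreeTDS reads TDSDUMP from the environment, so set it before connecting
$ENV{TDSDUMP} = '/tmp/freetds.log';

my $dbh = DBI->connect('dbi:Sybase:server=mssql;database=example',
    'username', 'password', { RaiseError => 1 });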