UTF-8, perl and Microsoft SQL Server

ddick on 2009-06-09T00:49:22

Attempting to note things down without going insane from frustration. SQL Server cannot handle UTF-8. It can, however, handle UCS-2 (great for Windows or Java programmers), but in a bizarre fashion.

  • First, char, varchar and text fields do not hold UCS-2 data; nvarchar, nchar and ntext do.
  • Secondly, quoting of data is somewhat different. Instead of quoting data like so, 'foo', it becomes N'foo'.
  • Finally, when connecting with DBD::Sybase over the underlying FreeTDS libraries, you need to set "Client Charset = UTF-8" in freetds.conf AND use the Encode module to encode your statement as 'UTF-8' before passing it to $dbh->prepare (a sketch follows below).

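Putting those three together, a minimal sketch of the write path looks something like this (the server entry, database, table and column names are hypothetical, and it assumes the [mssql] entry in freetds.conf carries "Client Charset = UTF-8"):

use DBI;
use Encode ();

# "mssql" is a hypothetical server entry in freetds.conf with Client Charset = UTF-8
my $dbh = DBI->connect('dbi:Sybase:server=mssql;database=example',
    'username', 'password', { RaiseError => 1 });

# the column is an nvarchar, the literal uses the N'...' form, and the
# statement is encoded to UTF-8 bytes before it reaches prepare
my $sql = Encode::encode('UTF-8',
    "INSERT INTO widgets (name) VALUES (N'\x{20AC}10 voucher')");
my $sth = $dbh->prepare($sql);
$sth->execute();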
Salutations to the DBD::Sybase and FreeTDS teams for their excellent work in allowing perl to talk UTF-8 (at least the UCS-2-compatible portion of it) to a database that doesn't even support it.

As a side note, I think that when constructing a test suite for a program that uses a database as a backend, it's essential to test the database itself for edge case conditions: in this case, making sure that you can fill a varchar(10) with ten three-byte Unicode characters AND retrieve it AND that the retrieved value matches your expected result, because your column definitions may be behaving slightly differently than you thought.
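A round-trip test in that spirit might look like the following sketch (the server entry, table and column are hypothetical; it assumes the encode-before-prepare and decode-after-fetch approach above):

use DBI;
use Test::More tests => 1;
use Encode ();

my $dbh = DBI->connect('dbi:Sybase:server=mssql;database=example',
    'username', 'password', { RaiseError => 1 });

# ten three-byte characters: the euro sign encodes to three bytes in UTF-8
my $value = "\x{20AC}" x 10;

# round_trip(val) is a hypothetical table; declare val with whatever type
# your schema really uses, because that is exactly what is under test
$dbh->do(Encode::encode('UTF-8',
    "INSERT INTO round_trip (val) VALUES (N'$value')"));

my $sth = $dbh->prepare(Encode::encode('UTF-8', 'SELECT val FROM round_trip'));
$sth->execute();
my ($got) = $sth->fetchrow_array();
is(Encode::decode('UTF-8', $got), $value,
    'ten three-byte characters survive the round trip');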


Help :)

jdavidb on 2009-06-09T15:15:52

I've been having some difficulties with this. In my case, I have trouble pulling data out of the database when it uses Windows special characters. Any idea how to go the opposite direction?

Re:Help :)

ddick on 2009-06-09T23:38:32

What sort of trouble are you having? In my case, I Encode::encode the statement before calling $dbh->prepare, and when retrieving $row = $sth->fetchrow_hashref() I run Encode::decode on each field in $row.
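Roughly like this, with hypothetical query and column names (the important part is decoding every field before using it):

use DBI;
use Encode ();

my $dbh = DBI->connect('dbi:Sybase:server=mssql;database=example',
    'username', 'password', { RaiseError => 1 });

my $sth = $dbh->prepare(Encode::encode('UTF-8', 'SELECT id, name FROM widgets'));
$sth->execute();
while (my $row = $sth->fetchrow_hashref()) {
    # the values arrive as UTF-8 bytes, so decode each defined field back
    # into perl characters before doing anything else with them
    $row->{$_} = Encode::decode('UTF-8', $row->{$_})
        for grep { defined $row->{$_} } keys %$row;
    print "$row->{id}: $row->{name}\n";
}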

Re:Help :)

jdavidb on 2009-06-10T13:36:36

It appears to blow up for me on fetchrow_arrayref. I'm not sure I get a chance to use Encode on what came back. I might be able to hack into DBD::Sybase and do it in there.

I haven't had a need to do anything to the statement; everything I'm passing is straight ASCII. The only problems I face are when I deal with records where people have inserted characters from the Windows character set. Then I get this message:

DBD::Sybase::st fetchrow_arrayref failed: OpenClient message: LAYER = (0) ORIGIN = (0) SEVERITY = (9) NUMBER = (99)

Server , database

Message String: WARNING! Some character(s) could not be converted into client's character set. Unconverted bytes were changed to question marks ('?').

Re:Help :)

ddick on 2009-06-10T18:51:52

The client charset is set in freetds.conf and defaults to ISO-8859-1 (I think), which would die if Windows characters are used. Set "Client Charset" to "UTF-8" and see what happens. Also, check the TDSDUMP file. Mine has the following lines at the start:

log.c:190:Starting log file for FreeTDS 0.82
        on 2009-06-11 04:50:38 with debug flags 0x4fff.
iconv.c:363:iconv to convert client-side data to the "UTF-8" character set
iconv.c:516:tds_iconv_info_init: converting "UTF-8"->"UCS-2LE"
iconv.c:516:tds_iconv_info_init: converting "ISO-8859-1"->"UCS-2LE"
net.c:210:Connecting to 192.168.122.35 port 1433 (TDS version 8.0)
net.c:264:tds_open_socket: connect(2) returned "Operation now in progress"
net.c:303:tds_open_socket() succeeded
util.c:162:Changed query state from DEAD to IDLE
net.c:779:Sending packet
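For what it's worth, the matching freetds.conf entry looks roughly like this ([mssql] is just a placeholder name; the host, port and TDS version are the ones from the log above):

[mssql]
        host = 192.168.122.35
        port = 1433
        tds version = 8.0
        Client Charset = UTF-8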

Re:Help :)

jdavidb on 2010-05-06T21:11:52

A year later, I'm still struggling with this issue. Basically, when I add "client charset = UTF-8" to my freetds.conf file, I get "Out of memory!" thrown at me for no apparent reason.

And strangely enough, my query runs just fine from the tsql freetds command-line program.

Do you happen to have any ideas?

Re:Help :)

ddick on 2010-05-06T23:25:06

First off, if you are hitting memory issues, are you trying to retrieve a text (as distinct from a /(?:var)?char/) field?

Secondly, can you set the TDSDUMP environment variable and post the output? Check out http://www.freetds.org/userguide/logging.htm

Re:Help :)

jdavidb on 2010-05-11T21:47:46

Found my answer. It was a text field, and the default length was massively too long. Apparently SQL Server defaults to something huge. I could change it in freetds.conf, and that fixed the problem.
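(For anyone hitting the same wall: FreeTDS caps returned text columns via the "text size" entry in the server's freetds.conf section; I assume that's the setting involved here, and the value below is only an illustration.)

[mssql]
        text size = 64512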

Re:Help :)

jdavidb on 2010-05-11T21:48:17

Also, TDSDUMP is really helping me ... thanks!

Re:Help :)

ddick on 2010-05-06T23:28:51

Even better, use the TDSDUMP variable when running against DBD::Sybase (the failure) and tsql (the success) and compare the debug logs to find the difference.

Re:Help :)

ddick on 2009-06-09T23:42:34

Also, I found setting

export TDSDUMP=/tmp/freetds.log

to be a huge help, as you can then see how your data is encoded as it travels over the network.
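If exporting it in the shell is awkward, setting it from inside the script before the connection is opened should work just as well (the path and server entry are only examples):

use DBI;

# FreeTDS reads TDSDUMP from the environment, so set it before connecting
$ENV{TDSDUMP} = '/tmp/freetds.log';

my $dbh = DBI->connect('dbi:Sybase:server=mssql;database=example',
    'username', 'password', { RaiseError => 1 });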