Attempting to note things down without going insane with frustration. SQL Server cannot handle UTF-8. It can, however, handle UCS-2 (great for Windows or Java programmers), but in a bizarre fashion.
Salutations to the DBD::Sybase and FreeTDS teams for their excellent work in allowing Perl to talk UTF-8 (at least the UCS-2-compatible portion of it) to a database that doesn't even support it.
As a side note, I think that when constructing a test suite for a program that uses a database as a backend, it's essential to test the database itself for edge cases, such as, in this case, making sure that you can fill a varchar(10) with ten three-byte Unicode characters AND retrieve it AND have the retrieved value match your expected result. Because maybe your column definitions are behaving slightly differently than you thought.
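A sketch of that edge-case check. The character arithmetic below runs on its own; the database round trip is shown as hedged comments, since it needs a live connection (the table and column names there are hypothetical, not from the original post):

```perl
use strict;
use warnings;
use Encode ();

# Ten copies of U+20AC (EURO SIGN): one character each, but three
# bytes each when encoded as UTF-8.
my $value = "\x{20AC}" x 10;
print length($value), "\n";                               # 10 characters
print length( Encode::encode( 'UTF-8', $value ) ), "\n";  # 30 bytes

# The actual round trip needs a connected $dbh; hypothetical names:
# $dbh->do( 'INSERT INTO test_table (name) VALUES (?)',
#     undef, Encode::encode( 'UTF-8', $value ) );
# my ($got) = $dbh->selectrow_array('SELECT name FROM test_table');
# die 'round trip mismatch'
#     unless Encode::decode( 'UTF-8', $got ) eq $value;
```

The point of asserting both lengths is that a column sized in bytes rather than characters will truncate exactly this kind of value.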
Help :)
I've been having some difficulties with this. In my case, I have trouble pulling data out of the database when it uses Windows special characters. Any idea how to go the opposite direction?
Re:Help :)
ddick on 2009-06-09T23:38:32
What sort of trouble are you having? In my case, I Encode::encode the statement before calling $dbh->prepare, and when retrieving $row = $sth->fetchrow_hashref() I run Encode::decode on each field in $row.
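The decode half of that pattern can be wrapped in a small helper. This is just a sketch of the approach described above; `decode_row` is my name for it, not part of DBD::Sybase:

```perl
use strict;
use warnings;
use Encode ();

# Decode every defined field of a fetched row in place, assuming the
# bytes coming off the wire (via FreeTDS) are UTF-8.
sub decode_row {
    my ($row) = @_;
    for my $field ( keys %{$row} ) {
        $row->{$field} = Encode::decode( 'UTF-8', $row->{$field} )
            if defined $row->{$field};
    }
    return $row;
}

# Usage (hypothetical $dbh and $sql):
# my $sth = $dbh->prepare( Encode::encode( 'UTF-8', $sql ) );
# $sth->execute();
# while ( my $row = $sth->fetchrow_hashref() ) {
#     decode_row($row);
#     ...
# }
```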
Re:Help :)
jdavidb on 2009-06-10T13:36:36
It appears to blow up for me on fetchrow_arrayref. I'm not sure I get a chance to use Encode on what came back. I might be able to hack into DBD::Sybase and do it in there.
I haven't had a need to do anything to the statement; everything I'm passing is straight ASCII. The only problems I face are when I deal with records where people have inserted Windows-charset characters. Then I get this message:
DBD::Sybase::st fetchrow_arrayref failed: OpenClient message: LAYER = (0) ORIGIN = (0) SEVERITY = (9) NUMBER = (99)
Server , database
Message String: WARNING! Some character(s) could not be converted into client's character set. Unconverted bytes were changed to question marks ('?').
Re:Help :)
ddick on 2009-06-10T18:51:52
The client charset is set in freetds.conf and defaults to ISO-8859-1 (I think), which would die if Windows characters are used. Set "client charset" to "UTF-8" and see what happens. Also, check the TDSDUMP file. Mine has the following lines at the start:
log.c:190:Starting log file for FreeTDS 0.82
on 2009-06-11 04:50:38 with debug flags 0x4fff.
iconv.c:363:iconv to convert client-side data to the "UTF-8" character set
iconv.c:516:tds_iconv_info_init: converting "UTF-8"->"UCS-2LE"
iconv.c:516:tds_iconv_info_init: converting "ISO-8859-1"->"UCS-2LE"
net.c:210:Connecting to 192.168.122.35 port 1433 (TDS version 8.0)
net.c:264:tds_open_socket: connect(2) returned "Operation now in progress"
net.c:303:tds_open_socket() succeeded
util.c:162:Changed query state from DEAD to IDLE
net.c:779:Sending packet
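A minimal freetds.conf sketch of the change described above ("client charset" and "dump file" are standard freetds.conf keys; the section name, TDS version, and path are examples matching the log, not prescriptions):

```ini
[global]
    ; TDS version matching the log above
    tds version = 8.0
    ; ask FreeTDS to convert server-side data to UTF-8 for the client
    client charset = UTF-8
    ; equivalent to setting the TDSDUMP environment variable
    dump file = /tmp/freetds.log
```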
Re:Help :)
jdavidb on 2010-05-06T21:11:52
A year later, I'm still struggling with this issue. Basically, when I add client charset = UTF-8 to my freetds.conf file, I get "Out of memory!" thrown at me, for no apparent reason.
And strangely enough, my query runs just fine from the tsql freetds command-line program.
Do you happen to have any ideas?
Re:Help :)
ddick on 2010-05-06T23:25:06
First off, if you are hitting memory issues, are you trying to retrieve a text (as distinct from a /(?:var)?char/) field? Secondly, can you set the TDSDUMP environment variable and post the output? Check out http://www.freetds.org/userguide/logging.htm
Re:Help :)
jdavidb on 2010-05-11T21:47:46
Found my answer. It was a text field, and the default length was massively too long. Apparently SQL Server defaults to something huge. I could change it in freetds.conf, and that fixed the problem.
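The knob in question is presumably the "text size" setting in freetds.conf, which caps how many bytes the client will accept for a text/image column. A sketch (the value shown is only an example; check your own freetds.conf for its actual default):

```ini
[global]
    ; maximum bytes returned for a text/image column; pick a value
    ; large enough for your data but small enough not to exhaust memory
    text size = 64512
```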
Re:Help :)
jdavidb on 2010-05-11T21:48:17
Also, TDSDUMP is really helping me... thanks!
Re:Help :)
ddick on 2010-05-06T23:28:51
Even better, use the TDSDUMP variable when running against DBD::Sybase (the failure) and tsql (the success) and compare the debug logs to find the difference.
Re:Help :)
ddick on 2009-06-09T23:42:34
Also, I found setting
export TDSDUMP=/tmp/freetds.log
to be a huge help, as you can then see how your data is encoded as it travels over the network.