Dear Log,
When write-opening to a Unicode file, it is behoovy to then emit a byte-order mark before anything else:
print OUT "\x{feff}"; # Byte Order Mark
It even works happily with UTF8 files -- many applications correctly interpret the resulting byte sequence as meaning "YES, THIS IS UTF8!".
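A minimal sketch of the idea, assuming Perl 5.8 or later, where an :encoding(UTF-8) layer can be put on the output handle. The file name bom_demo.txt and the variable names are made up for illustration; the script writes the BOM first, then reads the leading bytes back raw to show what landed on disk.

```perl
use strict;
use warnings;

# Hypothetical output path, chosen only for this sketch.
my $path = "bom_demo.txt";

# The :encoding(UTF-8) layer turns characters into UTF-8 bytes on output.
open(my $out, '>:encoding(UTF-8)', $path) or die "Cannot open $path: $!";
print {$out} "\x{feff}";            # Byte Order Mark, before anything else
print {$out} "Hello, Unicode\n";
close($out) or die "Cannot close $path: $!";

# Read the first three bytes back without any decoding.
open(my $in, '<:raw', $path) or die "Cannot reopen $path: $!";
read($in, my $head, 3);
close($in);

# Format each byte as two hex digits, separated by spaces.
my $hex = join ' ', map { sprintf '%02x', $_ } unpack 'C*', $head;
print "$hex\n";
```

Run as-is, this should print ef bb bf, which is the UTF-8 encoding of character FEFF.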
Re:Something fishy
TorgoX on 2002-03-13T21:33:17
Note that I have \x{feff} and not \xfe\xff. \x{feff} means "character FEFF", which usually (i.e., until we have more and varied handle disciplines) gets expressed as UTF8.
> perl -e "print map sprintf(q{%02x },$_), unpack q{C*}, qq{\x{feff}}"
ef bb bf
Re:Something fishy
Matts on 2002-03-13T23:58:42
Ah! You caught me napping. Nice one;-)
Re:Something fishy
TorgoX on 2002-03-14T04:56:28
BTW, when does a UTF8 DOM screw things up?
Re:Something fishy
Matts on 2002-03-14T10:00:35
Never. But a UTF-8 B.O.M. screws things up;-). For example on POSIX systems it screws up the shebang line, and also screws up interaction with the file magic type command. This is according to the UTF-8/Unicode on Linux FAQ.
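The shebang problem can be illustrated with a small sketch. The kernel only honours "#!" when those are the first two bytes of the file, so three BOM bytes in front hide it. The helper name strip_utf8_bom and the sample script text are hypothetical, made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: return the byte string with a leading
# UTF-8 BOM (bytes EF BB BF) removed, if one is present.
sub strip_utf8_bom {
    my ($bytes) = @_;
    $bytes =~ s/\A\xEF\xBB\xBF//;
    return $bytes;
}

# Sample script content as raw bytes, with a BOM in front of the shebang.
my $with_bom = "\xEF\xBB\xBF#!/bin/sh\necho hi\n";
my $clean    = strip_utf8_bom($with_bom);

# Check whether "#!" is really at byte offset 0 in each version.
for my $case ([before => $with_bom], [after => $clean]) {
    my ($label, $bytes) = @$case;
    my $state = substr($bytes, 0, 2) eq '#!' ? 'shebang visible' : 'shebang hidden';
    print "$label: $state\n";
}
```

The same check could be applied to a real file by reading it with a :raw layer before deciding whether to strip anything.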
"You feffin' kids take your feffin' skateboards and your feffin' boomboxes somewhere else before I rip your feffin' heads open and take a feffin' coredump in the 0xdeadbeef inside!"
--Nat