Recently, while testing perl-5.10.1-RC0 at $work, I found that it worked seamlessly for everything except our custom debianized build. It turns out that http://cpansearch.perl.org/src/ADAMK/Parse-CPAN-Meta-1.39/t/data/utf_16_le_bom.yml, a UTF-16LE encoded file with a BOM (byte order mark), breaks my build of any .deb that includes it.
debuild makes both the binary and source packages, which is nice for presenting the whole picture to someone trying to follow along later. debuild uses dpkg-source to generate a diff between the vendor's source package and the source tree used in the actual Debian build. dpkg-source in turn uses your system's diff, which probably comes from diffutils-2.8.1, the last stable release, made in 2002 (http://ftp.gnu.org/pub/gnu/diffutils).
It looks like development continued until 2008, counting the dev releases (ftp://alpha.gnu.org/pub/gnu/diffutils, http://git.savannah.gnu.org/cgit/diffutils.git). Somewhere along the way it gained support for multibyte characters, but it still doesn't work for that UTF-16LE text file with a BOM.
Oh well. Anyway, consider this a tale of basic tools just not working nicely when your source code is in Unicode.
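For anyone who wants to poke at a file like this, here is a minimal sketch of the kind of check involved. It is written in Python only for brevity. The names sniff_bom and looks_binary are made up for this illustration, the sample bytes are invented rather than taken from the real test file, and the NUL-byte test is my assumption about why a line-oriented tool would give up on UTF-16 text, not a copy of what diff or dpkg-source actually do.

```python
import codecs

# BOM signatures, longest first so that UTF-32LE (FF FE 00 00)
# is not mistaken for UTF-16LE (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes):
    """Return the encoding implied by a leading BOM, or None if there is none."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None


def looks_binary(data: bytes, window: int = 4096) -> bool:
    """Crude heuristic (an assumption, see above): a NUL byte near the start."""
    return b"\x00" in data[:window]


# Invented sample: a tiny YAML-ish document encoded as UTF-16LE with a BOM.
sample = codecs.BOM_UTF16_LE + "---\nname: demo\n".encode("utf-16-le")

if __name__ == "__main__":
    print(sniff_bom(sample))
    print(looks_binary(sample))
    print(looks_binary("---\nname: demo\n".encode("utf-8")))
```

The point of the sketch is that plain ASCII text stored as UTF-16 has a zero byte after every character, so any tool that equates "contains NUL" with "binary" will refuse to treat it as text, BOM or no BOM.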
It took me a couple of years to get BOM support into PPI. It frustrates me that we still have to fight basic battles like this. The Java world has it pretty good, with Unicode from day 1. I hope Parrot/Rakudo does that well.
Re:Agreed
jjore on 2009-07-27T02:36:41
I've been on the sidelines of the effort to get proper Unicode support into perl, but I'd think sussing this out is the domain of the PerlIO layers.