After an excellent talk on the Encode module by Stephen Edmonds and a close encounter with the õ symbol, I've been playing with the various Unicode symbols a lot.
A missing component seems to be putting multi-byte Unicode into a PDF document. Try this little beggar 狗 on for size.
PDF::API2, htmldoc and html2ps all seem to have problems with characters whose encoding uses more than 1 byte to represent 1 character.
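To make "more than 1 byte" concrete, here is a minimal Python sketch that counts the bytes the two characters mentioned above need in UTF-8 and UTF-16BE. The helper name `encoded_lengths` is made up for illustration; it says nothing about how any of the tools above handle the bytes internally.

```python
# The two characters from the post, written as escapes so the snippet
# does not depend on the encoding of the source file.
SAMPLES = ["\u00f5", "\u72d7"]  # õ and 狗


def encoded_lengths(ch):
    """Return how many bytes ch occupies in UTF-8 and in UTF-16BE."""
    return {
        "utf-8": len(ch.encode("utf-8")),
        "utf-16-be": len(ch.encode("utf-16-be")),
    }


for ch in SAMPLES:
    # Print the code point rather than the glyph to stay terminal-safe.
    print(f"U+{ord(ch):04X}", encoded_lengths(ch))
```

Both characters need more than one byte in UTF-8 (two for õ, three for 狗), so any tool that assumes one byte per character will mangle them.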
Anyone know a tool that can do the job?
I spent a few days just this week pinning down a bug in PDF::API2 that prints some characters from Unicode strings (in this case, just the Polish character set) on top of each other.
I nailed it. I just have to feed it back to its maintainers.
I have not tried it with your character; I don't even have a clue as to what character set it fits in (something CJK, but that's it) and thus what font I ought to use.
Re:Funny that you bring this up now...
srezic on 2009-03-15T18:07:18
I spent a few days just this week pinning down a bug in PDF::API2 that prints some characters from Unicode strings (in this case, just the Polish character set) on top of each other.
I nailed it. I just have to feed it back to its maintainers.
Just stumbled over the same problem, using Croatian characters. Do you have a patch/solution/workaround? If so, can you send it to me?
Re:Funny that you bring this up now...
bart on 2009-03-17T07:51:11
When is the last time that you updated PDF::API2? Because it is fixed in the latest release on CPAN (0.72).
Re:Funny that you bring this up now...
srezic on 2009-03-18T22:24:07
Indeed. I first tried Debian's current package, which has $PDF::API2::VERSION set to 2.015, just like the current CPAN version, so I thought it was the current one. But Debian's libpdf-api2-perl is only based on 0.69.
The maintainer of PDF::Reuse accepted my patch to add this functionality earlier this year.
It's my understanding that if you stick to the built-in PDF fonts you're stuck with characters in the Latin-1 range (roughly speaking). You have to use embedded fonts to get at Unicode characters outside that range.
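As a rough way to see which characters fall outside that range, here is a small Python sketch. It treats "the Latin-1 range" literally as ISO-8859-1, which is only an approximation of what the built-in PDF fonts cover, and the function name `outside_latin1` is made up for illustration.

```python
def outside_latin1(text):
    """Return the characters of text that cannot be encoded as ISO-8859-1."""
    missing = []
    for ch in text:
        try:
            ch.encode("latin-1")
        except UnicodeEncodeError:
            # Not representable in Latin-1, so (under the assumption above)
            # it would need an embedded font.
            missing.append(ch)
    return missing


# õ (U+00F5) fits in Latin-1; 狗 (U+72D7) does not and is the only one reported.
print(outside_latin1("\u00f5 \u72d7"))
```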
Re:PDF::Reuse can do Unicode
ChrisDolan on 2008-11-04T05:50:43
That's correct. Appendix D of the PDF Reference explicitly lists the minimum glyphs that must be supported in the 14 standard fonts.
That said, I would not be surprised if non-Latin-1 Unicode characters worked fine in one of the basic fonts on a recent mainstream OS. To get Unicode in strings, you may need to employ the hex notation (angle brackets).
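For what the hex notation looks like, here is a minimal Python sketch that writes a string as a PDF hex string in UTF-16BE with a leading byte-order mark. The function name `pdf_hex_text_string` is made up for illustration. As far as I know, this form applies to PDF "text strings" such as document info entries and bookmark titles; for text drawn on the page, the bytes are still interpreted through the font's encoding, so this alone will not make a glyph appear.

```python
import codecs


def pdf_hex_text_string(text):
    """Encode text as a PDF hex string: UTF-16BE bytes prefixed with the FE FF marker."""
    data = codecs.BOM_UTF16_BE + text.encode("utf-16-be")
    # Hex strings are written between angle brackets in PDF syntax.
    return "<" + data.hex().upper() + ">"


print(pdf_hex_text_string("\u72d7"))  # prints <FEFF72D7>
```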