PDF

TorgoX on 2003-08-17T11:04:28

Dear Log,

I've been looking at PDF as a format lately. Whereas I could sort of wrap my mind around PostScript, and had no trouble with RTF, PDF is not so friendly.

PDF is an odd format, internally. It's a strange blend of simplicity, human-readability, optimization, and inscrutability. For example, it's very much like some kind of simple data-dumping format where you just have:

defineobject "Blorch" => {
    key1 => val1,
    key2 => val2,
    key3 => val3,
    ...
}

...over and over, with the understanding that there's some special semantics imputed to magically-named objects and keys to get the ball rolling.

Except that PDF is only sort of like that -- the object IDs aren't alphanumeric symbol names, they're integers, and there are all sorts of byte-counts and byte-offsets being tossed around. It's as if an XML file ended with a lookup table of all the root's children's byte-offsets. PDF is complicated enough that a simple "Hello World" document is several hundred bytes, because of all the per-document overhead.
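To make the integer IDs and byte-offsets concrete, here is a minimal Python sketch that assembles a one-page "Hello World" PDF by hand. The function names (build_pdf, hello_world_pdf) are made up for illustration, and the page size, font, and text position are arbitrary choices. It shows only the classic layout of numbered objects followed by a cross-reference table of byte-offsets, not everything the spec allows.

```python
def build_pdf(objects):
    """Serialize object bodies (bytes) into a minimal PDF.

    objects[i] becomes object number i + 1, generation 0.
    Object 1 is assumed to be the document catalog.
    """
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte-offset of this object's "N 0 obj" line
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"

    # The cross-reference table: one fixed-width 20-byte entry per object.
    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # entry 0 is always the free-list head
    for off in offsets:
        out += b"%010d 00000 n \n" % off

    # The trailer names the root object and where the xref table starts.
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)


def hello_world_pdf():
    """Return the bytes of a one-page PDF that draws 'Hello World'."""
    content = b"BT /F1 24 Tf 72 720 Td (Hello World) Tj ET"
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Contents 4 0 R /Resources << /Font << /F1 5 0 R >> >> >>",
        # The stream dictionary carries the byte-count of the stream data.
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
    ]
    return build_pdf(objects)


if __name__ == "__main__":
    print(len(hello_world_pdf()), "bytes")
```

Writing the result of hello_world_pdf() to a file in binary mode should give something a PDF viewer will open. Nearly all of the bytes are bookkeeping rather than the text itself.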

Here, this explains it pretty well.

PDF is not a crazy format by any means, although there were some odd decisions made. For example, in the original spec, all the document-indexing data is at the very end of the file, whereas clearly if you were thinking about streaming a file to web clients, you'd probably put it at the beginning instead. (What they were thinking of was not that, but the problem of "fast saves" -- changing a disk file by just adding new objects to it and appending a new cross-reference table.) Of course, in time, they did think of streaming, so for that and other reasons, the PDF spec has become a massive massive document.
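The "fast save" idea can be sketched in a few lines of Python. The helper name append_object is hypothetical, and the regex-based trailer parsing is a shortcut that assumes a simple file with a classic cross-reference table and an uncompressed trailer. The original bytes are left untouched. A new copy of the object goes on the end, followed by a one-entry cross-reference section whose trailer points back at the old one via /Prev.

```python
import re


def append_object(pdf, obj_num, body):
    """Append a new version of object obj_num to pdf as an incremental update.

    Sketch only: assumes a classic xref table and that the last /Size and
    /Root in the file belong to the most recent trailer.
    """
    # Locate the previous xref section and read what the old trailer declared.
    prev_xref = int(pdf[pdf.rindex(b"startxref") + 9:].split()[0])
    size = int(re.findall(rb"/Size\s+(\d+)", pdf)[-1])
    root = re.findall(rb"/Root\s+(\d+\s+\d+\s+R)", pdf)[-1]

    out = bytearray(pdf)
    if not out.endswith(b"\n"):
        out += b"\n"

    # The new object body goes after everything that was already on disk.
    offset = len(out)
    out += b"%d 0 obj\n" % obj_num + body + b"\nendobj\n"

    # A new xref section covering only the object that changed.
    xref_pos = len(out)
    out += b"xref\n%d 1\n%010d 00000 n \n" % (obj_num, offset)
    out += b"trailer\n<< /Size %d /Root %s /Prev %d >>\n" % (
        max(size, obj_num + 1), root, prev_xref)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)
```

A reader starts from the last startxref in the file, so it finds the new entry first and falls back to the older table through /Prev for everything else. That is why the index sits at the end: saving a change never requires rewriting what came before.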

Suddenly the MIDI File Format sounds friendly in comparison.

Long story short: Thank God there's CPAN modules for dealing with PDFs!


And C-libs!

jjohn on 2003-08-17T13:02:29

Yes, PDF is a bear. I believe most of the Perl modules rely on a C library (whose name escapes me). Oddly enough, PHP comes bundled with support for PDF writing and it uses that same C-lib. I believe you can embed fonts in PDF, which partially addresses my long-standing beef with typesetting in Unix (it hasn't changed in 20 years).

Unix Typesetting

Dom2 on 2003-08-17T18:57:57

But, what's wrong with groff? :-)

-Dom

Re:Unix Typesetting

jhi on 2003-08-18T07:34:50

groff? GROFF?! That's for GNU weenies. Real Men use troff.

Re:Unix Typesetting

Dom2 on 2003-08-18T08:58:25

Nah, it's just another implementation. Now as soon as I finish my implementation of troff in perl...

No, I would have to be seriously wacked out to even consider that.

-Dom