PDF::API2 Considered Dangerous

Alias on 2008-02-12T05:51:45

In between doing interesting toolchainy stuff like building a system for safe database migrations, making Perl product rpm'ification sane on RHEL5 and implementing a magic inside+outside-aware web testing module, I've also been tasked by $work to make some upgrades to our online configurable product system.

Which, to sum it up, means business cards.

Business cards built from preset web-based templates: company employees fill in their own error-checked details and have cards appear on their desk a week later.

There are letterheads and other easier stuff too, but mostly nobody pays attention to that, since business cards are both the trickiest and have the highest emotional factor for clients.

Business cards are also the print industry's loss leader. If you can get a company's business cards right, then you have a decent chance of getting the REST of their print business too.

So print companies ALSO care a lot about business cards.

Now doing layout logic for something small like a business card doesn't seem THAT complex, until you discover that it's all being done to print industry standards, which adds insane (but necessary) complexity to the problem.

Concepts like "colour" and "dimensions" and "font" and "position" and "background" and a myriad of other fundamentals need to be redefined in entirely different ways.

All of this complexity results in about 50,000 SLOC and 30-35 tables, to not only produce you a business card correctly, but let you see it in advance on screen absolutely and provably identical to the way it will come back from the print shop. WYSIWYG for custom print jobs.

The current implementation, integrated into the large 200,000 SLOC system, is a port of an implementation originally created by a startup that $work bought years ago. Their implementation was 100% Microsoft, and couldn't be integrated directly, so it was ported instead.

The current version uses PDF::API2 under the covers to provide the primary layout rendering to PDF, from which we then cook it into a gif for screen previewing.

Unfortunately, PDF::API2 in the default usage does not appear to live up to print industry standards.

Oh sure, it prints basic documents, letterheads, invoices and so on well enough.

But for business cards, it's a Big Deal when a company executive has the hyphen in his last name sitting too far to the left.

Why doesn't it work right? To be honest, I'm not entirely sure yet. I have a week allocated some time in the next month to work out exactly why.

But the short version seems to be that PDF::API2 "doesn't do fonts like any other Windows or Mac program that the rest of the industry uses".

A font taken from Windows or Mac and put into PDF::API2 comes out different. This isn't simply a matter of differences between programs. They appear to work EVERYWHERE else, except for in "our product".

Although I'm not entirely sure, we suspect the work of some form of proprietary Adobe auto-hinter magic that is licensed in "everything else", but which isn't available to us.

So PDF::API2 is just using the font metrics available in the font, which can be minimal because they assume the presence of the auto-hinter.
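To make that concrete, here is a rough sketch (in Python, purely for illustration) of how glyph positions fall out of raw font metrics alone, and how a single extra adjustment applied by some other program would shift one character. Every number and name here is hypothetical, including the per-pair adjustment table; this is not PDF::API2's actual code, and it is not a diagnosis of the real cause.

```python
# Hypothetical metrics in font units; real values would come from the font file.
UNITS_PER_EM = 1000
ADVANCE = {"K": 667, "-": 333, "T": 611}

# Hypothetical per-pair adjustment that another renderer might apply.
PAIR_ADJUST = {("K", "-"): -60}


def glyph_positions(text, font_size, use_adjustments):
    """Return the x offset (in points) of each glyph's origin."""
    positions = []
    x = 0          # running position in font units
    prev = None
    for ch in text:
        if use_adjustments and prev is not None:
            x += PAIR_ADJUST.get((prev, ch), 0)
        positions.append(x * font_size / UNITS_PER_EM)
        x += ADVANCE[ch]
        prev = ch
    return positions


metrics_only = glyph_positions("K-T", 10, use_adjustments=False)
adjusted = glyph_positions("K-T", 10, use_adjustments=True)

# Points by which the hyphen differs between the two layouts.
hyphen_shift = adjusted[1] - metrics_only[1]
```

With these made-up values the hyphen lands about 0.6pt further left in one layout than in the other at a 10pt font size, which is the sort of difference that gets noticed on a business card.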

So suffice it to say that while PDF::API2 works just fine, be very cautious if you plan to use it for some form of document that has to replicate a design to high print standards.

You may not get out what the designer put in...

(More details as I investigate further, at which time I'll also hopefully have some proper bugs to submit)

There is also PDFLib

Stevan on 2008-02-12T13:33:02

Lemme start by saying that before I got into programming, I did desktop publishing and typeset many a business card using antiquated versions of Quark and/or PageMaker. So I know very well the horrors of which you speak.

If I were you I would look into replacing PDF::API2 with PDFLib (or the free version PDFLib-lite). Sure, it is not on CPAN and so annoying to install, but the level of quality in the output and control is top notch. We have been using it at $work for creating very high quality reports and graphs for almost 5 years now and I am very happy with it. The API is very C-ish and procedural and not OO like PDF::API2, but if you know the basics about Postscript and how PDF documents are structured it makes sense. And since the library itself is in C it's pretty fast as well.

- Stevan

Re:There is also PDFLib

Alias on 2008-02-13T00:06:31
Good.

We did find PDFLib, and identified it as our last-resort strategy (we'd prefer not to have to replace the renderer).

As for the horrid API you mention, I was just planning to write a CPAN'ified object-oriented wrapper around the C-inspired one.

Chalk one up for inflexibility to Java here.

Every other language they support uses the same ugly hacked API, except for Java, which doesn't support that style of API, so they HAD to provide a "real" OO API.

magic inside+outside-aware web testing module

stu42j on 2008-02-12T16:27:58

Any chance you can share more on your magic web testing module?

Re:magic inside+outside-aware web testing module

Alias on 2008-02-13T00:03:21
The inside+outside code isn't anything particularly amazing I'm afraid.

The web app here has a structure vaguely similar to Catalyst/Jifty in that it has a context object that holds the session/user/data etc.

First attempts at a pure LWP-based web testing library had some problems with user accounts and debugging. We could instantiate a controller manually (including creating user accounts on the fly) but lacked similar flexibility in the web version.

So the idea of inside+outside was that the master testing object for a single "website session" contains BOTH a web interface (based on Test::WWW::Mechanize) as well as a full instantiation of the codebase.

This lets us easily deal with things like per-test dynamic user creation for web testing, and simplifies the testing of the results of calls, as we can do something via the web interface and then examine the data model to see if it made the change we expected.

There's nothing particularly magic there, just a collection of state to allow for the tighter integration of both the inside (loaded classes) and the outside (www).
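As a rough sketch of the shape of it (written in Python for illustration, with every class and method name made up and toy stand-ins in place of the real web client and codebase), one session object holds both sides, so a test can set up data on the inside, act through the outside, and then check the data model directly:

```python
class App:
    """Toy stand-in for the loaded codebase (the 'inside')."""

    def __init__(self):
        self.users = {}

    def create_user(self, name, email):
        self.users[name] = {"email": email}

    def handle_request(self, path, params):
        # One toy endpoint that changes a user's email address.
        if path == "/profile/email" and params.get("user") in self.users:
            self.users[params["user"]]["email"] = params["email"]
            return 200
        return 404


class WebClient:
    """Toy stand-in for the HTTP side (the 'outside')."""

    def __init__(self, app):
        self._app = app

    def post(self, path, params):
        # A real client would make an HTTP request here.
        return self._app.handle_request(path, params)


class SiteSession:
    """One 'website session' holding BOTH the inside and the outside."""

    def __init__(self):
        self.app = App()
        self.web = WebClient(self.app)


session = SiteSession()

# Inside: create the user on the fly for this test.
session.app.create_user("alice", "old@example.com")

# Outside: make the change through the web interface.
status = session.web.post(
    "/profile/email", {"user": "alice", "email": "new@example.com"}
)

# Inside again: examine the data model to see if the change happened.
new_email = session.app.users["alice"]["email"]
```

The point of keeping both halves on one object is that the test never has to guess which user or which state the web side is talking to.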

The database migration tool, on the other hand, is something special :)

I'll speak more about it later, as it currently has some company-specific aspects (for configuration management, as well as being Oracle-only) that I'll need to work out how to factor out before I try for a CPAN release.