The site that got me interested in PDF was Stas Bekman's site, www.stason.org. He gave a talk to Melbourne.pm last year. During his talk on mod_perl 2 he showed the notes from his site in html, with PDF downloads of the same pages.
So I tried to re-create this html->ps->pdf chain, using perl+DBI+TT2, so that I too could have a printable version of a project I'm working on called Ratpile. (Ratpile makes a directory that has *stuff* stored in it searchable by stuffing information about it into a relational database; data mining, some may call it.) The template I created is a *bare bones* html page sans images. This is the technique Stas is using with his docset.
The point I guess I'm trying to make is that I've only used text, not images. I've done a bit of research and this is what I've come up with...
Now, given what we have found above, I would suggest using the following (unless anyone has a better idea):
The real problem may be rendering the tables generated from Word. A complicated layout in Word (re-rendered to html) will have to be translated into PostScript and then rendered to PDF. So the problem comes down to converting the html tables to PDF.
It is not rocket science to write a bit of code that extracts the data from the table and re-creates the table using PDF-API (and its child modules).
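As a rough illustration of the "extract the data from the table" half, here is a minimal sketch. It is in Python with only the standard library, not the perl used elsewhere in this post, and the names `TableExtractor` and `extract_tables` are made up for the example. It assumes flat (non-nested) tables and does nothing about colspan/rowspan, which Word-generated html may well contain.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the text of every cell, grouped by row, for each <table>.

    Hypothetical helper for illustration; nested tables are not handled.
    """

    def __init__(self):
        super().__init__()
        self.tables = []   # one entry per table: a list of rows of cell strings
        self._row = None   # cells of the row being read, None outside <tr>
        self._cell = None  # text fragments of the current cell, None outside <td>/<th>

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the fragments and collapse runs of whitespace to single spaces.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None

    def handle_data(self, data):
        # Only keep text that appears inside a cell.
        if self._cell is not None:
            self._cell.append(data)


def extract_tables(html):
    """Return a list of tables, each a list of rows, each a list of cell strings."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.tables
```

The rows that come back are plain lists of strings, so whatever draws the PDF table only has to loop over them and place the text; working out column widths and page breaks is left to that side.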
But is there a shortcut? Of course, you could forget all the above, take your chances with Michael Frankl's HTML-HTMLDOC, and convert your html files directly to PDF :)
The rocket science bit is trying to get this to work on cygwin or win32.
credits
damn I love cpan.
Re:HtmlDoc
goon on 2004-01-26T04:20:07
Thanks matt. Had a look at the site and I'm pretty impressed with the license and docs, and I'll check out the ax plugin. I found a gnu tool available for multiple platforms: Antiword. It claims to convert "... Word documents to plain text, to PostScript and to XML/DocBook ...". Might be interesting to have a showdown between the two and see the results.