TPC presentations

brian_d_foy on 2003-11-14T21:32:04

Ziggy sent me 300+ MB of O'ReillyCon presentations, so I have a lot of reading to do.

I tried to download some others, but a lot of them had been converted to HTML and wanted me to click through a lot of pages. I cannot really do that here. I download a lot of stuff, then look at it offline. Bah, humbug!

Somewhere I have a PowerPoint-to-PDF converter that I wrote at the first MacOSXCon. The PDF turned out to be huge since every page had to reproduce the background image. There is probably some PostScripty way around that, though.


I used to use a program...

jordan on 2003-11-14T23:01:29

That would spider a site, download all the pages, and change the links to point at your on-disk copy. Those were really common back when people browsed the web over slow 36K modem links, but I haven't used one in years and can't recall the names of any such programs. My naive Google searching hasn't immediately turned up anything. Anybody know of a good one? Seems like a fairly easy Perl program to write, actually.

Also, I used to use Plucker for Palm devices, which does this, but downloads the result to a Palm device. I say "used to" because my work gave me a fairly tricked-out Pocket PC to use. Bah! I really like the Palm better, but this Pocket PC is SOO much better than my old Palm in terms of sound and memory and display and wirelessness. I sadly gave my old Palm III to my daughter.

Re:I used to use a program...

ziggy on 2003-11-14T23:11:38

> Seems like a fairly easy Perl program to write, actually.
Actually, it's harder to write than you'd think. There are lots of edge cases to handle: not only do you need to fetch images and munge the <img ...> tags, but also all of the frames, iframes, CSS stylesheets, media files and so on. Oh, and don't forget to rationalize all of the URLs. LWP can convert relative to absolute URLs for you, but you still need to find both forms and replace them with relative paths on your filesystem.
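Just to show how the URL rationalization piece might look, here's a minimal sketch using the URI module (which ships with LWP). The function name and the host-plus-path mapping scheme are my own illustrative choices, not anything from an existing tool; the fetching and tag-munging are left out entirely.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use URI;

# Turn a (possibly relative) link found in a fetched page into an
# absolute URL, then into a path usable in the on-disk mirror.
# This handles only the URL bookkeeping, none of the other edge cases.
sub localize_url {
    my( $link, $base ) = @_;   # $base is the URL of the page the link came from

    my $abs = URI->new_abs( $link, $base );  # resolve relative against base

    # Build a filesystem-ish path from host + path
    my $path = $abs->host . $abs->path;
    $path .= 'index.html' if $path =~ m{/$};  # directory URLs need a filename

    return $path;
}

print localize_url( '../style/main.css', 'http://example.com/docs/page.html' ), "\n";
```

Even this tiny piece glosses over query strings, fragments, and name collisions, which is rather the point.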

I tried it a few times. It's not the 30-minute hack it appears to be. You're probably better off using GNU wget instead of rewriting it from scratch in Perl.
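For the record, the wget incantation for this sort of thing looks something like the following (the URL is a placeholder; check your wget version's docs for the exact flags):

```shell
# Mirror a doc tree for offline reading:
#   -m   mirror (recursion plus timestamping)
#   -k   convert links in the saved pages to point at the local copies
#   -p   also grab page requisites (images, stylesheets)
#   -np  don't wander up into parent directories
wget -m -k -p -np http://example.com/docs/
```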

Re:I used to use a program...

jordan on 2003-11-15T14:46:28

Hey, cool, I didn't know wget would convert links for you. Thanks!

Actually, I can think of a good use for this. There are a lot of web-based docs I've used that don't come as one big HTML file. It's sometimes slow to browse these from work. You get the idea...

Hmmmm... Can I maintain a bunch of pages on my local hard drive compressed, such that they uncompress when I access them from a browser? I could run Apache on my desktop, sure, but how would I build something that supports the compression I'm thinking of?
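One low-tech way to do this with Apache is to store the pages pre-gzipped and let the server tag them so the browser decompresses on the fly. A sketch of the httpd.conf fragment (the directory path is made up; directive names are standard Apache mod_mime/mod_negotiation fare):

```apache
# Store pages as foo.html.gz under the docs tree. MultiViews lets a
# request for foo.html find foo.html.gz, and AddEncoding sends a
# Content-Encoding header so the browser gunzips it transparently.
<Directory "/home/jordan/docs">
    Options +MultiViews
    AddEncoding x-gzip .gz
</Directory>
```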

I do maintain several large documentation sets locally: the Perl CD Bookshelf (version 2; is version 3 worth it? Discuss, please) and the OpenVMS system documentation (one of the hats I wear includes supporting OpenVMS). Both of these are pretty large. OTOH, maybe I'm too focused on their size. Taken together they use 0.5 GB, not much on a 40 GB disk when you weigh their value when you need them. I can't really imagine I could grab another 500 MB of docs off the net that I would use often enough to require them to be on disk.

Re:I used to use a program...

brian_d_foy on 2003-11-18T06:33:42

With a little mod_perl URL translation and content filtering, you could easily read some compressed files and present them uncompressed. I think there even used to be a browser that could gunzip the right sort of response.
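The decompression core of such a filter is tiny. Here's a sketch using Compress::Zlib, with all of the mod_perl plumbing (the handler, the URL translation) deliberately left out; the function name is just illustrative:

```perl
use strict;
use warnings;
use Compress::Zlib qw(memGzip memGunzip);

# Inside a mod_perl content handler you'd read the .gz file that the
# translated URL points at, then hand the browser the uncompressed
# bytes. The decompression step itself is a single call:
sub gunzip_bytes {
    my( $gz ) = @_;
    my $plain = memGunzip( $gz );
    defined $plain or die "not a gzip stream\n";
    return $plain;
}

# Round-trip demonstration:
my $page = "<html><body>hello</body></html>\n";
print gunzip_bytes( memGzip( $page ) );
```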

Re:I used to use a program...

brian_d_foy on 2003-11-18T07:07:57

Well, getting 80% of the problem done takes about 30 minutes, give or take a day. Since I tend to stay away from sites with goofy features, that 80% of the problem is 99% of my experience, so I don't have to do the other 20% of the problem just to get the 1% benefit.

In fact, I had to do this to convert some IE web archives so I could read them.

Re:I used to use a program...

brian_d_foy on 2003-11-15T03:49:56

I have such programs on my computer, but I don't always get to use my computer.

On MacOS X I use either SiteSucker or WebDevil.

Last night, however, I was stuck on an Army computer.

PDF Conversions

ziggy on 2003-11-14T23:06:32

Hey, I already used a PowerPoint to PDF converter on those slides. It's called Keynote. ;-)

Keynote 1.0 had that same problem. The 1.1 update improved the PDF output somewhat, but it's still a little weighty. I think they managed to include the background image once (vs. once per slide), but that's just guesswork based on how the file size went down 5x-10x.

The PDF files that Keynote generates are still pretty big. I'm just guessing here, but Keynote may be using TIFF for all of its images, instead of something more svelte, like PNG or JPEG.