I have a lot of web scrapers that pull down information and display it to me so I don't have to do a lot of pointing, clicking, and scrolling.
My Mac has not been on the network for a while, so when I tried out my scrapers the other day, most of them failed because their regular expressions no longer matched anything.
Debugging them while I am paying $6/hour costs more than I want to spend, so I have a new trick.
All of these scrapers cache a lot of intermediate results in a hidden directory, so much so that some may think I overdo it a bit. Despite that, I have never saved the original web page.
I thought I could just save the web page from my browser, but I saved it as "Web Page Complete", which munges the HTML so that when I look at it offline, it finds all of the right supporting files on my computer. The HTML in "Web Page Complete" is not the HTML my regexen see.
I modified these programs to save the real HTML so I can look at it later, but now I am a bit sad because this used to be so easy, and now there are more layers between me and the plain source.
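
For anyone who wants to do the same, here is a minimal sketch of the idea in Python. It is not the code from my scrapers: the function names, the SHA-1 file naming, and the `offline` flag are all made up for illustration. The point is only that the bytes get written to the cache before any regex touches them.

```python
import hashlib
from pathlib import Path
from urllib.request import urlopen


def cache_path(url, cache_dir):
    """Map a URL to a stable file name inside the cache directory.

    Hashing the URL is an assumption of this sketch; any scheme that
    yields a unique, filesystem-safe name would do.
    """
    digest = hashlib.sha1(url.encode("utf-8")).hexdigest()
    return Path(cache_dir) / f"{digest}.html"


def save_raw_html(url, raw_bytes, cache_dir):
    """Write the response body exactly as received, with no rewriting."""
    path = cache_path(url, cache_dir)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(raw_bytes)
    return path


def fetch(url, cache_dir, offline=False):
    """Return the raw HTML for a URL.

    Online: download it, save the untouched bytes, and return them.
    Offline: return the previously saved copy, so the regexes can be
    debugged without a network connection.
    """
    if offline:
        return cache_path(url, cache_dir).read_bytes()
    with urlopen(url) as response:
        raw_bytes = response.read()
    save_raw_html(url, raw_bytes, cache_dir)
    return raw_bytes
```

Run it once with the network up, then call `fetch(url, cache_dir, offline=True)` while debugging, and the regexes see the same bytes the server sent rather than a browser's rewritten copy.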
Re:Save as HTML
brian_d_foy on 2004-01-23T20:23:56
Indeed, and that is what I am doing.