Finally, I can release my iPhoto shell

brian_d_foy on 2004-02-20T20:40:14

The bits can now fly safely from here to the satellite, so I released the things I have been saving up:

grepurl - a program to extract the URLs from an HTML document and filter the list in various ways, suitable for use with xargs.
webreaper - my program to scrape web sites so I can look at them offline. Now it has more ways to get around tricky web site things, like throttling.
Test::Prereq, Test::Manifest - now fat free, with fewer dependencies, and documented ways to show how you can make their use optional.
iphoto shell - control iPhoto from this little shell, using Mac::Glue behind the scenes.

grepurl === urifind

dlc on 2004-02-23T13:43:47

grepurl looks a lot like my urifind.

Re:grepurl === urifind

brian_d_foy on 2004-02-23T16:19:50
It is a lot like your urifind!

I do not get enough time on the net to research things as much as I would like. I figured there was someone who must have done this before.

I had to sit at a desk for 12 hours doing pretty much nothing, so I made grepurl. Oh well, I had fun with it :)

Re:grepurl === urifind

dlc on 2004-02-23T16:27:10

For the first few weeks after I put this on CPAN, I kept waiting for the same message you just got, and it never came, which leads me to believe that there are now exactly two things that do this.

We should start a club. :)

urifind uses URI::Find, which, in my benchmarks, is slower than HTML::SimpleLinkExtor, and relies on regexes rather than document structure. Which means grepurl will most likely be faster and more accurate (urifind cannot do relative URIs, for example). Then again, I wrote urifind specifically to extract URIs from my mail, none of which is HTML.

So they are only superficially related. I hereby disband our club!
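
To make the regex-versus-document-structure difference concrete, here is a rough sketch in Python using only the standard library. It is not how URI::Find or HTML::SimpleLinkExtor work internally. The pattern, the class, the function names, and the sample strings are all made up for illustration. It only shows why a text scan misses relative links in HTML while a tag walker finds nothing in plain-text mail.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical pattern: matches only absolute http(s)/ftp URLs in raw text.
ABSOLUTE_URL = re.compile(r"""(?:https?|ftp)://[^\s"'<>]+""")


def urls_by_regex(text):
    """Scan raw text for anything that looks like an absolute URL."""
    return ABSOLUTE_URL.findall(text)


class LinkCollector(HTMLParser):
    """Collect href/src attribute values by walking the HTML tags."""

    def __init__(self, base=None):
        super().__init__()
        self.base = base
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                # Resolve relative links against the base URL, if one was given.
                self.links.append(urljoin(self.base, value) if self.base else value)


def urls_by_structure(html, base=None):
    """Extract links from HTML markup rather than from the raw text."""
    parser = LinkCollector(base)
    parser.feed(html)
    parser.close()
    return parser.links


# Made-up sample inputs.
HTML_SAMPLE = '<p><a href="/about.html">About</a> <a href="http://example.com/x">X</a></p>'
MAIL_SAMPLE = "See http://example.com/y for details"

if __name__ == "__main__":
    print(urls_by_regex(HTML_SAMPLE))      # the relative link is not matched
    print(urls_by_structure(HTML_SAMPLE, base="http://example.com/"))
    print(urls_by_regex(MAIL_SAMPLE))      # plain text is where the regex helps
    print(urls_by_structure(MAIL_SAMPLE))  # no tags, so nothing is collected
```

Which one fits depends on the input: the structural version needs markup to walk, and the regex version only sees strings that already look like full URLs.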

Re:grepurl === urifind

brian_d_foy on 2004-02-23T17:26:49
I looked around for URI::Find for about a week, because I knew something like it existed, but I kept thinking it was called something like Text::Extract::*. I have a minicpan on my computer, but I don't have a good way to search it (like search.cpan.org---the cpan script assumes you have a pretty good idea to start).

I've got urifind now and I'm going to go back to my tent and look at it. One of the things I wanted to add was different data sources (HTML, text, XML, and so on), and I have a lot of time here recently. :)