The bits can now fly safely from here to the satellite, so I released the things I have been saving up:
grepurl looks a lot like my urifind
Re:grepurl === urifind
brian_d_foy on 2004-02-23T16:19:50
It is a lot like your urifind!
I do not get enough time on the net to research things as much as I would like. I figured there was someone who must have done this before.
I had to sit at a desk for 12 hours doing pretty much nothing, so I made grepurl. Oh well, I had fun with it :)
Re:grepurl === urifind
dlc on 2004-02-23T16:27:10
For the first few weeks after I put this on CPAN, I kept waiting for the same message you just got, and it never came, which leads me to believe that there are now exactly two things that do this.
We should start a club. :)
urifind uses URI::Find, which, in my benchmarks, is slower than HTML::SimpleLinkExtor, and relies on regexes rather than document structure. That means grepurl will most likely be faster and more accurate (urifind cannot do relative URIs, for example). Then again, I wrote urifind specifically to extract URIs from my mail, none of which is HTML.
So they are only superficially related. I hereby disband our club!
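To make the regex-versus-structure difference concrete, here is a minimal sketch. It is written in Python with only the standard library, and it is not how URI::Find or HTML::SimpleLinkExtor are actually implemented. The pattern, the function names, and the sample strings are all assumptions made up for illustration.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical, deliberately simple pattern: absolute http/https/ftp URIs only.
ABSOLUTE_URI = re.compile(r"""\b(?:https?|ftp)://[^\s<>"']+""")


def find_uris_by_regex(text):
    """Scan raw text for anything that looks like an absolute URI."""
    return ABSOLUTE_URI.findall(text)


class LinkCollector(HTMLParser):
    """Collect href/src attribute values, resolved against a base URL."""

    def __init__(self, base):
        super().__init__()
        self.base = base
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                # urljoin turns a relative link into an absolute one.
                self.links.append(urljoin(self.base, value))


def find_links_by_structure(html, base):
    """Walk the document's tags instead of pattern-matching its text."""
    parser = LinkCollector(base)
    parser.feed(html)
    parser.close()
    return parser.links


# Made-up sample inputs.
SAMPLE_HTML = (
    '<p><a href="/docs/index.html">Docs</a> '
    '<a href="http://www.example.com/a">A</a></p>'
)
SAMPLE_MAIL = "See http://www.example.com/a and /docs/index.html for details."
```

On the HTML sample, the regex scan sees only the absolute link, while the structural pass also picks up the relative one and resolves it against the base. On the plain-text mail sample, the structural pass finds nothing because there are no tags, and the regex scan still finds the absolute URI. That matches the split described above: each approach suits a different kind of input.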
Re:grepurl === urifind
brian_d_foy on 2004-02-23T17:26:49
I looked around for URI::Find for about a week, because I knew something like it existed, but I kept thinking it was called something like Text::Extract::*. I have a minicpan on my computer, but I don't have a good way to search it the way search.cpan.org does (the cpan script assumes you have a pretty good idea of the name to start with).
I've got urifind now and I'm going to go back to my tent and look at it. One of the things I wanted to add was different data sources (HTML, text, XML, and so on), and I have had a lot of time here recently. :)