back to searching

TeeJay on 2003-06-01T20:21:22

After getting distracted by the wonderful elegance and sheer funkiness of Hilbert's space-filling curve, I have poked about in my todo pile and released a documented (but apparently still hard to understand) Data::Iterator::EasyObj, and am now back to work on my restaurant search / portal site.

Hopefully I have about a week of evenings' worth of work left before I can put it on my server and start testing it in anger.

Right now I am playing with the searching, which, as anybody who reads my journal (in between bitching about the Middle East, ASP and HTML::Template, the last of which hasn't bitten me recently) will know, I find ever so interesting.

Following links from that funky vector search article on perl.com, I have been reading through some of the research and plan to add a funky vector-based 'similar to this' search to the site.

This means heavily customising the example code in the perl.com article and basically doing the extra stuff: local / global term weighting (already mostly done for the reverse index), and word finding and indexing (again mostly already done with the reverse index, but now with the added magic of Lingua::EN::Tagger, which is now available and in version 0.02, so clearly ready for production use).
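To make the weighting idea concrete, here is a rough sketch of local / global term weighting feeding a 'similar to this' lookup. It is written in Python purely for illustration and is not the perl.com article's code or what the site runs. The weighting scheme (log-scaled term frequency times inverse document frequency), the cosine ranking, and every name in it (`tokenize`, `build_vectors`, `cosine`, `similar_to`) are assumptions made for the example.

```python
import math
from collections import Counter


def tokenize(text):
    """Very naive word finding: lowercase, split on whitespace, keep alphabetic tokens."""
    return [w for w in text.lower().split() if w.isalpha()]


def build_vectors(docs):
    """Turn {name: text} into {name: {term: weight}}.

    Local weight  = 1 + log(term frequency in the document)
    Global weight = log(number of documents / documents containing the term)
    Both are assumed choices; other schemes would slot in the same way.
    """
    counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    n_docs = len(docs)

    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for c in counts.values():
        df.update(c.keys())

    vectors = {}
    for name, c in counts.items():
        vectors[name] = {
            term: (1 + math.log(tf)) * math.log(n_docs / df[term])
            for term, tf in c.items()
        }
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similar_to(name, vectors):
    """Rank every other document by similarity to the named one, best first."""
    target = vectors[name]
    scored = [
        (cosine(target, vec), other)
        for other, vec in vectors.items()
        if other != name
    ]
    return sorted(scored, reverse=True)
```

With a few made-up restaurant descriptions, `similar_to("luigis", build_vectors(docs))` returns a list of `(score, name)` pairs. Terms that appear in every document get a global weight of zero, which is the point of the global part of the weighting.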

I am also considering using Lingua::Stem, but I don't trust stemming of words to give the right results. The results seem a little over-zealous; hopefully it has a 'minimal' mode or option.

On a side note, I am fairly happy with my Debian installation apart from one small detail: I can't get graphviz to install. This is annoying, as graphviz is something I use a lot and I want to try out network *mumble* searching, which looks really interesting.


Lingua::Stem feedback

rob_au on 2003-06-02T12:38:17

When I posted on PerlMonks about user experience with Lingua::Stem here, there were some really good replies and links offered, including some which profiled differences in the incidence of stemming errors between stemming methods. There may be something there of interest for you too, including this link, which references an article that compares stemmer performance. Notably, the results of this paper support the use of the Porter stemming technique, the same one implemented in the Lingua::Stem module.