Just write a simple reverse index

TeeJay on 2004-10-04T09:36:27

Heard any good strategies for super-fast catalog-database search results?

Yes - just write a simple or weighted reverse index. It's trivial to build in any relational database, it works across tables, and it can be weighted by key columns or object attributes.
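A minimal sketch of such a weighted reverse index, written in Python against an in-memory SQLite database rather than in Perl. The table names (`product`, `category`, `search_index`) and the column weights are hypothetical, chosen only for illustration. The idea is that the index is just one extra table of (word, source table, row id, weight) rows, so a search across several tables becomes a single `GROUP BY` query.

```python
import re
import sqlite3

# Hypothetical column weights: a word in a title counts more than one in a description.
COLUMN_WEIGHTS = {"title": 3, "description": 1}

def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(conn):
    """Fill search_index(word, source_table, row_id, weight) from both source tables."""
    conn.execute("CREATE TABLE search_index ("
                 "word TEXT, source_table TEXT, row_id INTEGER, weight INTEGER)")
    # Table names come from this fixed tuple, so interpolating them is safe here.
    for table in ("product", "category"):
        rows = conn.execute(f"SELECT id, title, description FROM {table}").fetchall()
        for row_id, title, description in rows:
            for column, text in (("title", title), ("description", description)):
                # One row per occurrence, so repeated words add up (term frequency).
                for word in tokenize(text):
                    conn.execute("INSERT INTO search_index VALUES (?, ?, ?, ?)",
                                 (word, table, row_id, COLUMN_WEIGHTS[column]))
    conn.execute("CREATE INDEX idx_word ON search_index (word)")

def search(conn, query):
    """Return (source_table, row_id, score) tuples, best match first."""
    words = tokenize(query)
    if not words:
        return []
    placeholders = ", ".join("?" for _ in words)
    return conn.execute(
        f"SELECT source_table, row_id, SUM(weight) AS score FROM search_index "
        f"WHERE word IN ({placeholders}) "
        f"GROUP BY source_table, row_id ORDER BY score DESC, source_table, row_id",
        words).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, title TEXT, description TEXT)")
conn.execute("CREATE TABLE category (id INTEGER PRIMARY KEY, title TEXT, description TEXT)")
conn.executemany("INSERT INTO product VALUES (?, ?, ?)", [
    (1, "Perl Cookbook", "Recipes for Perl programmers"),
    (2, "Camel mug", "A mug with a camel on it"),
])
conn.executemany("INSERT INTO category VALUES (?, ?, ?)", [
    (1, "Books", "Perl and other programming books"),
])
build_index(conn)
print(search(conn, "perl books"))
```

Here the query "perl books" ranks the category row (title hit plus two description hits, score 5) above product 1 (title and description hits on "perl", score 4), even though the two rows live in different tables.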

Don't use your RDBMS's native full-text search or SQL LIKE unless you want your site to be as unsearchable as the millions of others out there.

Which reminds me - perl.com, and many perl sites (including search.cpan and perldoc.com) still have sucky searches - who else is interested in getting together and trying to improve them?

I am planning to port my current searchable class to Class::DBI, which might help. I will probably also look at a module that provides similar functionality procedurally, for non-OO code.


Perl.com

chromatic on 2004-10-04T17:31:07

I'm interested in improvements to Perl.com; I'm exploring ways to improve the searching and findability of articles across the entire O'Reilly Network. Please feel free to send me ideas.

Re:Perl.com

TeeJay on 2004-10-05T09:12:30

See my response on the blog: I couldn't even find my article on searching by using perl.com's own search, but found it straight away with Google.

It would be nice if perl.com could showcase good perl solutions.

If I knew what perl.com was using I would be happy to make suggestions.

Maybe these can help

melo on 2004-10-04T21:25:57

http://search.cpan.org/user/tjmather/DBIx-FullTextSearch-0.73/
http://search.cpan.org/user/tmtm/Class-DBI-mysql-FullTextSearch-0.09/
http://search.cpan.org/user/snowhare/Search-InvertedIndex-1.14/

I used the last one with good results.

Re:Maybe these can help

TeeJay on 2004-10-05T12:00:14

Yup, the last one is a reverse index. They work nicely and are trivial to write and customise.

I wrote a handy class that implements cross-object, context-specific searching: Class-Indexed-0.01

It's based on the article I wrote for perl.com.

The others aren't a solution. Native RDBMS full-text searching is only useful for really trivial cases: your data is in a single table with a very simple structure, you are only interested in one or two columns, and those columns don't change. Move beyond that and native RDBMS full-text searching is of little use.

Re:Maybe these can help

perrin on 2004-10-06T14:55:25

It depends on how the database implements the full text search. Often it is just a reverse index under the covers, with some stemming and phrase support bolted on.

I agree that a reverse index is typically the way to go for custom searching. It scales well and is very customizable.
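To illustrate how phrase support can be bolted onto a plain reverse index, here is a small Python sketch of a positional index: each word maps to the documents it appears in *and* its positions there, so a phrase matches only where its words occur consecutively. This is a generic illustration, not how any particular database implements it; the sample documents and function names are hypothetical.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_positional_index(docs):
    """Map each word to {doc_id: [positions]} so phrases can be checked."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, word in enumerate(tokenize(text)):
            index[word][doc_id].append(pos)
    return index

def phrase_search(index, phrase):
    """Return sorted doc ids where the phrase's words appear consecutively."""
    words = tokenize(phrase)
    if not words:
        return []
    # Plain reverse-index step: candidates must contain every word of the phrase.
    candidates = set(index.get(words[0], {}))
    for word in words[1:]:
        candidates &= set(index.get(word, {}))
    hits = []
    for doc_id in sorted(candidates):
        # Phrase step: some start p must have words[i] at position p + i for every i.
        starts = index[words[0]][doc_id]
        if any(all(p + i in index[w][doc_id] for i, w in enumerate(words))
               for p in starts):
            hits.append(doc_id)
    return hits

docs = {
    1: "full text search in Perl",
    2: "search the full text",
    3: "text search is full of tricks",
}
index = build_positional_index(docs)
print(phrase_search(index, "full text"))
```

All three documents contain both "full" and "text", but only documents 1 and 2 contain them as a phrase, so the positional check is what separates phrase matching from simple word matching.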

Re:Maybe these can help

TeeJay on 2004-10-12T10:33:42

I have yet to see a native RDBMS full-text search that is as good as a handwritten one: all are held back by a total lack of customisation or weighting, and they can only apply to a single table. Very few provide features like stemming, although they can provide some simple phrase matching and global weightings (e.g. common words may score less).

As I said, even a good native RDBMS search can be improved on trivially with a competent custom reverse index, since stemming and the like are trivial in Perl. Native RDBMS full-text search is only useful if you happen to have one or two columns you wish to index in a single table, along with a primary key in the same table. Outside that domain, even the native full-text search in 'enterprise' databases like Oracle and SQL Server is next to useless. Hell, if you use SQL Server you need to write your own anyway, as its full-text search requires manual reindexing and obscure, archaic syntax (both in the queries and in the stored procedures that use it).
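As a rough illustration of the stemming and global weighting mentioned above, the Python sketch below combines a deliberately crude suffix-stripping stemmer (an assumption for illustration; it is not the Porter algorithm or any library's stemmer) with inverse document frequency (IDF), so that a word appearing in every document contributes nothing to the score. The class name `StemmedIndex` and the sample documents are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical, crude suffix list; a real index would use a proper stemmer.
SUFFIXES = ("ing", "es", "ed", "s")

def stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def tokenize(text):
    """Lower-case, split into words, and stem each word."""
    return [stem(w) for w in re.findall(r"[a-z0-9]+", text.lower())]

class StemmedIndex:
    def __init__(self, docs):
        # stem -> Counter({doc_id: term frequency})
        self.postings = defaultdict(Counter)
        for doc_id, text in docs.items():
            for term in tokenize(text):
                self.postings[term][doc_id] += 1
        self.n_docs = len(docs)

    def idf(self, term):
        """log(N / df): terms found in many documents weigh less, in all of them zero."""
        df = len(self.postings.get(term, ()))
        return math.log(self.n_docs / df) if df else 0.0

    def search(self, query):
        """Return doc ids with a positive tf * idf score, highest first."""
        scores = Counter()
        for term in tokenize(query):
            weight = self.idf(term)
            for doc_id, tf in self.postings.get(term, {}).items():
                scores[doc_id] += tf * weight
        return [doc_id for doc_id, score in scores.most_common() if score > 0]

docs = {
    1: "Indexing Perl modules",
    2: "Perl search indexes",
    3: "Perl and the web",
}
idx = StemmedIndex(docs)
print(idx.search("perl indexes"))
```

Here "indexing" and "indexes" both reduce to "index", so the query matches documents 1 and 2, while "perl" occurs in every document and therefore adds nothing to any score.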