object oriented fulltext indexing and searching

TeeJay on 2004-04-20T13:57:29

I am having fun porting my Class::Indexed code to our Tangram-backed and very-OO system at $work.

I have had moderate success making the superclass handle relationships, etc., and now it works well with 1 to (1 or 0) relationships.

The system allows you to specify what is indexed for a particular class by giving an attribute name and a weighting for that attribute. It also allows you to specify a relationship (along with a method to call instead of an accessor), or a lookup from the index database (a non-OO route for orthogonal information, using direct lookups for speed and flexibility; it requires that you have the relevant information in the same database as the reverse index).
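To make the attribute-plus-weighting idea concrete, here is a minimal sketch. The `%index_fields` hash and the `word_scores` sub are hypothetical names made up for illustration, not Class::Indexed's real interface, and the object is assumed to be a plain hash of attribute values.

```perl
use strict;
use warnings;

# Hypothetical per-class index configuration: attribute name => weighting.
# Illustrative only; this is not Class::Indexed's actual interface.
my %index_fields = (
    title => 3,
    body  => 1,
);

# Build a word => score hash for one object (assumed to be a plain hash).
sub word_scores {
    my ($object, $fields) = @_;
    my %score;
    for my $attr (keys %$fields) {
        my $text = $object->{$attr};
        next unless defined $text;
        # Each occurrence of a word adds that attribute's weighting.
        $score{ lc $_ } += $fields->{$attr} for $text =~ /(\w+)/g;
    }
    return \%score;
}

my $scores = word_scores(
    { title => 'Perl indexing', body => 'Indexing objects with Perl and Tangram' },
    \%index_fields,
);
# "perl" scores 4 here: 3 from the title plus 1 from the body.
```

A word in a heavily weighted attribute outranks the same word in a lightly weighted one, which is presumably the point of having a weighting per attribute.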

On the whole I am rather pleased. I am working on fixing the gaps in our system and finishing the indexing class so that it can cope with 1-or-more to many relationships (i.e. if an object on the many side is updated, it is able to find and update the word index for the objects related to it; so far this works for 1 to 1, 1 to 0 and many to 1).

I have also added some limiting to word scores (so that if a word occurs frequently in 1 place the score doesn't skew results), class-level stopwords (hopefully later I will be able to add object-level stopwords), and normalisation (so that all scores fit on a curve between 0 and 1).
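Roughly, those three steps could look like the sketch below. The stopword list, the cap of 5 and the `tidy_scores` name are assumptions for illustration, and dividing by the highest score is just one simple way to land between 0 and 1; the real code may well use a different curve.

```perl
use strict;
use warnings;
use List::Util qw(min max);

# Assumed values, for illustration only.
my %stopwords = map { $_ => 1 } qw(the and of a);   # class-level stopwords
my $cap       = 5;                                  # maximum raw score per word

sub tidy_scores {
    my ($raw) = @_;

    # Drop stopwords and cap each remaining score.
    my %capped;
    for my $word (keys %$raw) {
        next if $stopwords{$word};
        $capped{$word} = min($raw->{$word}, $cap);
    }

    # Scale by the highest capped score so everything lies between 0 and 1.
    my %normalised;
    my $top = max(values %capped);
    return \%normalised unless $top;    # nothing left to score
    $normalised{$_} = $capped{$_} / $top for keys %capped;
    return \%normalised;
}

my $scores = tidy_scores({ the => 40, perl => 12, tangram => 2 });
# "the" is dropped, "perl" is capped at 5 and scales to 1, "tangram" scales to 0.4
```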

A lot of this should be back-ported into Class::Indexed some time soon; at least the class-level stopwords, normalisation and word score capping will be anyway. The Tangram-related stuff is heavily dependent on our Tangram hacks and isn't much use. Hopefully I can hack an alternative, more general solution into Class::Indexed, possibly subclassing for modules like Class::DBI.

edited:

Is there a better way of generating placeholders for DBI queries than:

 my $placeholders = join (',' ,map('?',@array));

There has to be a nice way to do something that is so frequent. I was also hoping that there is a fast and elegant way to sum the contents of a hash or array somewhere in CPAN or Perl 6.


Sum of an array

mir on 2004-04-20T15:08:22

sum in List::Util.
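A quick sketch of how that covers both cases; the variable names and values are made up for illustration.

```perl
use strict;
use warnings;
use List::Util qw(sum);

my @counts = (3, 1, 4);
my %score  = (perl => 2, tangram => 5);

my $array_total = sum(@counts);          # 8
my $hash_total  = sum(values %score);    # 7

# sum returns undef for an empty list; a leading 0 gives a safe default.
my @empty      = ();
my $safe_total = sum(0, @empty);         # 0
```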

Generating placeholders

runrig on 2004-04-20T15:53:03

From the insert_hash example in the DBI docs:
my $sql = sprintf "insert into %s (%s) values (%s)",
    $table, join(",", @fields), join(",", ("?")x@fields);
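The same `x` trick works for an IN list. A small sketch, where the `documents` table and `@ids` are made up for illustration and the DBI call is left commented out because it needs a connected `$dbh`:

```perl
use strict;
use warnings;

my @ids = (3, 7, 11);

# ('?') x @ids repeats the one-element list once per element of @ids.
my $placeholders = join ',', ('?') x @ids;
my $sql = "select * from documents where id in ($placeholders)";
# $sql is now: select * from documents where id in (?,?,?)

# With a connected DBI handle, the values go in as bind parameters:
# my $rows = $dbh->selectall_arrayref($sql, undef, @ids);
```

An empty `@ids` would produce `in ()`, which most databases reject, so that case is worth checking before building the query.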

Re:Generating placeholders

TeeJay on 2004-04-20T17:14:03

d'oh

but, of course