DBIx::Class::Indexed

LTjake on 2007-04-05T18:34:31

It's pretty much a given that a modern web app will have some sort of database backend. With a modern web app comes a modern web framework. This also means that you'll probably use some sort of ORM. You can then tie your ORM to your web framework and create a CRUD interface.

For the unaware, CRUD stands for Create, Read, Update and Delete: four primitive but essential operations for managing the data in the store. An alternative acronym is BREAD -- Browse, Read, Edit, Add, Delete. This acronym incorporates the notion that, in order to do read, edit, and delete operations, you need to be able to find a particular record. This typically comes in the form of an x-records-per-page list that you can navigate until you find what you want.

Finding what you want has become increasingly important. This is why I'd like to add a new letter to the BREAD/CRUD acronyms -- S, for Search.

Although we could live with whatever Google provides, I have no doubt that most people would like a more domain-focused search facility for whatever application they're working with. I've seen many interfaces with a bunch of boxes and pull-downs for boolean operations. In such cases, the server-side application is probably translating that into SELECT * FROM foo WHERE x LIKE '%y%' or some such.

That may be good enough for some -- but even though the user knows they're not using Google, they still expect it to work the same way! Unfortunately, SQL-style searches typically mean that the user must know, to the letter, what they're looking for -- meaning sub-par search results and frustrated users.

This is where indexing (sometimes referred to as fulltext indexing) comes in. For the uninitiated, indexing can be generalized as the process by which data (text in our case) is optimized for document retrieval relevant to a search query. That's a really poor definition, but I won't go into any more detail than that, so check out Wikipedia for more information on the subject.
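To make that definition a little more concrete, here is a minimal sketch of an inverted index in plain Perl. It is only an illustration of the general idea, not how Lucene, KinoSearch or any other real indexer is implemented, and the sample data and the tokenize, build_index and search helpers are all made up for this example. Note that a query for "cat orange" finds both cat records, whereas LIKE '%cat orange%' would match neither, because that exact phrase appears in no record.

```perl
use strict;
use warnings;

# Hypothetical sample data: document id => text.
my %docs = (
    1 => 'Fluffy the orange tabby cat',
    2 => 'An orange cat named Tiger',
    3 => 'Rex, a black dog',
);

# Split text into lowercase word tokens.
sub tokenize {
    my ($text) = @_;
    return map { lc($_) } $text =~ /(\w+)/g;
}

# Build the inverted index: term => { doc id => occurrence count }.
sub build_index {
    my ($docs) = @_;
    my %index;
    for my $id ( keys %$docs ) {
        $index{$_}{$id}++ for tokenize( $docs->{$id} );
    }
    return \%index;
}

# Return the ids of the documents that contain every term in the query,
# sorted numerically. The order of the words in the query does not matter.
sub search {
    my ( $index, $query ) = @_;
    my @terms = tokenize($query);
    return () unless @terms;

    my %hits;
    for my $term (@terms) {
        $hits{$_}++ for keys %{ $index->{$term} || {} };
    }
    return sort { $a <=> $b } grep { $hits{$_} == @terms } keys %hits;
}

my $index = build_index( \%docs );
print join( ',', search( $index, 'cat orange' ) ), "\n";    # prints "1,2"
```

A real indexer adds a great deal on top of this (stemming, relevance ranking, query syntax, on-disk storage), but the term-to-documents lookup is the core of it.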

A lot of databases have, or are getting, the ability to do said indexing -- but I've not experimented with that so I won't comment.

We're using the Lucene WebService to do our indexing. Other indexing systems with Perl bindings include KinoSearch, Xapian, Hyper Estraier and even the C port of Lucene. All of these indexers should allow you to use Google-like syntax to search whatever data you've indexed.

Now, I've gone through all of that back-story to ask the following question: Wouldn't it be nice if every time I added/updated/deleted data with my ORM, my indexer of choice would sync itself accordingly?

Well, if your ORM is DBIx::Class, then I might have something for you. I've started creating a component to do just that -- DBIx::Class::Indexed.

Here's a sample usage:

package PetShop::Cat;

use strict;
use warnings;
use base qw( DBIx::Class );

# Load the Indexed component alongside Core
__PACKAGE__->load_components( qw( Indexed Core ) );
# Pick the indexing backend and pass along its options
__PACKAGE__->set_indexer(
    'KinoSearch',
    {
        invindex => '/path/to/my/index/'
    }
);
__PACKAGE__->table('cat');
__PACKAGE__->add_columns(
    cat_id => {
        data_type         => 'integer',
        is_auto_increment => 1,
    },
    name => {
        data_type => 'varchar',
        size      => 512,
        indexed   => 1,
    },
    color => {
        data_type   => 'varchar',
        size        => 100,
        is_nullable => 1,
        indexed     => 1,
    },
    age => {
        data_type   => 'integer',
        indexed     => 1,
        is_nullable => 1,
    },
);
__PACKAGE__->set_primary_key( qw( cat_id ) );

1;

So each time I add, update or delete a Cat, KinoSearch will follow suit. Currently, we only have components for indexing, but I've been chatting with the DBIx::Class devs to figure out the best way to integrate searching into the mix.
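For the curious, here is a rough, self-contained sketch of how a component could hook into row operations to keep an indexer in sync. It is not the actual DBIx::Class::Indexed code: every Sketch::* package below is a hypothetical stand-in (including the fake row class and the fake indexer, which only logs what it is asked to do), so that the example runs without DBIx::Class or KinoSearch installed. The one thing borrowed from DBIx::Class is the general approach of wrapping a method and passing control along with next::method.

```perl
use strict;
use warnings;

# Stand-in for a real indexer backend: it only logs what it is asked to do.
package Sketch::Indexer;

sub new    { my ($class) = @_; return bless { log => [] }, $class }
sub add    { my ( $self, $row ) = @_; push @{ $self->{log} }, "add $row->{name}";    return }
sub remove { my ( $self, $row ) = @_; push @{ $self->{log} }, "remove $row->{name}"; return }

# Stand-in for the ORM's row class; a real one would talk to the database.
package Sketch::Row;

sub new    { my ( $class, %cols ) = @_; return bless {%cols}, $class }
sub insert { my ($self) = @_; return $self }

sub update {
    my ( $self, %changes ) = @_;
    $self->{$_} = $changes{$_} for keys %changes;
    return $self;
}

sub delete { my ($self) = @_; return $self }

# The component: wraps each write and forwards the row to the indexer.
package Sketch::Indexed;
use mro 'c3';

my %indexer_for;    # class name => indexer object

sub set_indexer {
    my ( $class, $indexer ) = @_;
    $indexer_for{$class} = $indexer;
    return;
}

sub insert {
    my $self = shift;
    my $row  = $self->next::method(@_);
    $indexer_for{ ref $self }->add($self);
    return $row;
}

sub update {
    my $self = shift;

    # Drop the stale entry before the columns change, then re-add the row.
    $indexer_for{ ref $self }->remove($self);
    my $row = $self->next::method(@_);
    $indexer_for{ ref $self }->add($self);
    return $row;
}

sub delete {
    my $self = shift;
    my $row  = $self->next::method(@_);
    $indexer_for{ ref $self }->remove($self);
    return $row;
}

# A row class that pulls in the component ahead of the base row class.
package Sketch::Cat;
use mro 'c3';
use parent -norequire, 'Sketch::Indexed', 'Sketch::Row';

package main;

my $indexer = Sketch::Indexer->new;
Sketch::Cat->set_indexer($indexer);

my $cat = Sketch::Cat->new( name => 'Tom', color => 'grey' );
$cat->insert;
$cat->update( name => 'Thomas' );
$cat->delete;

print "$_\n" for @{ $indexer->{log} };
```

Because Sketch::Indexed comes first in the inheritance list, its insert, update and delete run before the base row methods and can decide when to call the indexer around them.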

If this project interests you, leave a comment or send me an email. I'm starting with lucene-ws and KinoSearch as targets. My knowledge of KinoSearch is slim, so I would really appreciate some help there.