Project Stepping Stones

LTjake on 2010-02-03T21:04:44

Ever looked back at something you've worked on and thought: "Gee, it's too bad that project didn't get to its ultimate goal, but I've learned a lot from it"? I have one of those projects. Such is the world of technology: your toolset is constantly evolving and shape-shifting; even "The Next Big Thing" can become obsolete. We move on to the next "Next Big Thing."

One such area in the Web Developer's toolset is "Search." I'm sure we can all relate to the experience of our first textbox with some behind-the-scenes code doing SQL "SELECT ... LIKE" statements. Perhaps at first it was raw DBI calls; maybe moving on to an abstraction layer (ORM and whatnot) shortly thereafter. Here's where things get interesting.

What happens when this is no longer Good Enough(tm)? Google being essentially ubiquitous, people expect to plunk some words in the box and magically get what they want out the other side. I put in "Cat Hat" -- why didn't it give me "Cat in the Hat"? Okay, no problem. We can do some field and query normalization: removing stopwords, adding term parsing ... wait, wait, wait. There has to be prior art for this.
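For the curious, here is roughly what that hand-rolled normalization amounts to. This is only a minimal sketch in Python, not anything we actually shipped: the tiny stopword list and the `normalize` and `matches` names are made up for illustration.

```python
import re

# Hypothetical, deliberately tiny stopword list; a real one would be much longer.
STOPWORDS = {"a", "an", "and", "in", "of", "the"}


def normalize(text):
    """Lowercase the text, split it into alphanumeric tokens, drop stopwords."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def matches(query, title):
    """True when every normalized query term appears in the normalized title."""
    title_terms = set(normalize(title))
    return all(term in title_terms for term in normalize(query))


print(matches("Cat Hat", "Cat in the Hat"))     # True
print(matches("Cat Hat", "The Cat's Pajamas"))  # False
```

Applying the same normalization to both the query and the stored field is what lets "Cat Hat" find "Cat in the Hat", and it is exactly the sort of thing a real search library already does for you.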

In 2004, the options were somewhat limited as far as Free/Open Source search software went, especially in Perl land. Swish-e looked pretty neat. We actually did some prototyping with it. It was definitely a step ahead of plain old SQL. Then Plucene came on the scene. Unfortunately, its poor performance was a bit of a non-starter for us. The fact that it was modeled after the Lucene Java library, however, caught my eye.

I wanted to harness the Lucene project and its community, and bring it into our little Perl world. Luckily for me, someone else had already started down that road. The Lucene Web Service was a project by Robert Kaye, sponsored by CD Baby, which allowed users to talk to Lucene via an XML-based web service. After using it for a while, we developed some patches for bug fixes and enhancements. Because of our momentum with the project, we were eventually given total control over its development.

We attempted to strengthen the project by hooking into some existing standards. We leveraged the Atom Publishing Protocol as an analogy for dealing with indexes and documents. Search results were returned as an OpenSearch document. A document's field-value pairs were listed in the XOXO microformat. Creating a client for this setup meant a bunch of glue between the existing components (XML::Atom::Client and WWW::OpenSearch).

Almost in parallel, the Solr project emerged. Similar idea, much more support behind it. In the end, our idea never got very far, and Solr has turned out to be a fabulous product -- which we now use.

To that end, the Lucene WebService website will (finally) be shutting down in about a week's time. I've moved the pertinent code and wiki data to GitHub in case anyone wants it. I still think it has some niche applications, but without some serious revamping of the Java code, it will likely just rot.

At least it's a project that has led me to bigger and better things.