Search Engines

pudge on 2004-09-16T17:27:56

We are looking for a new search engine for Slashdot. Right now we use live searching in MySQL, which, well, sucks. I've been told to look into Plucene and Swish-e. Any other suggestions, or comments about those?

Plucene looks like a great, flexible system, though I am concerned about performance and scalability (those are concerns with any system, but Simon says Plucene is "much slower" than its cousin Lucene, and I can find no info on scalability). I am beginning to look into Swish-e now, and have no comments about it yet.


Swish-e

vsergu on 2004-09-16T17:49:05

I looked briefly at Plucene but never got very far and got discouraged by all the comments about slowness.

I had heard of Swish-e but hadn't even considered it because I didn't think it could handle gigabyte collections of documents. After seeing Josh Rabinowitz's talk at YAPC I decided to try it out, and we've been quite surprised at how fast it is. We're in the process of junking the Windows-based search software we were using (with a jury-rigged Perl/MySQL system for splitting up searches and distributing them across several servers) and replacing it with Swish-e.

Swish-e has some limitations, some of which make it tricky to highlight matches or even give the total number of hits in a document, but the speed is worth it.

Lucene

Dom2 on 2004-09-16T18:00:04

I know it's not Perl, but have you considered looking at Lucene, together with some Inline-Java glue? That way, you'd get the fairly nice Lucene architecture, and possibly better performance.

-Dom

swish-e

da on 2004-09-17T18:17:43

I can't comment on scalability for the likes of Slashdot, but swish-e certainly worked well for the site we deployed it on. Its parsing is quite good (HTML, XML, and anything you can parse programmatically, so it can pull data from SQL), and the output is quite flexible (your choice of templating systems); it wasn't too tough to make very nice-looking output.

We're not using SWISH::API under mod_perl, so I'm curious whether it can stand up to Slashdot's load.

new search engine for slashdot

tf23 on 2004-10-07T11:53:57

Pudge,

Whatever came of this? I'm curious whether you found a suitable one or are still looking.

Have you looked at Texis' Webinator?