More search magic with MySQL

TeeJay on 2002-11-20T15:10:10

One of the problems I have found with MySQL 4 is the way it scores results in the boolean mode of fulltext search.

For example, a row that matches '"quite a long phrase"' scores lower than one that matches '+word other'. So scores are not comparable between queries: they vary according to how many distinct tokens the query contains.

The way around this is to calculate a maximum score for each query. This snippet of code is quite handy for that:

#!/usr/bin/perl -w
use strict;

print "\nstarting...\n";
my @strings = ('"quite long phrase"', '+must optional', '"short phrase" word', '"quite long phrase" word');
foreach (@strings) {
    my $max = 0;
    print "string:$_\n";
    # a quoted phrase is one token; everything else splits on whitespace
    my @tokens = m/(\"[\s\S]+\"|\S+)/g;
    print "tokens:\n";
    print join(":", @tokens), "\n\n";
    foreach (@tokens) {
        # phrases and +required words count for 0.8, plain words for 0.3
        $max += (m/[\"\+]/) ? 0.8 : 0.3;
    }
    print "max score : $max\n";
}

print "done...\n";
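
One way the max score might then be used, sketched under assumptions: divide the raw relevance score MySQL returns by the estimated maximum, so that results from different queries land in roughly the same 0..1 range. The helper name normalise_score and the example numbers are made up for illustration; nothing here comes from MySQL itself.

```perl
use strict;
use warnings;

# Hypothetical helper: scale a raw relevance score by the estimated
# maximum for the query that produced it.
sub normalise_score {
    my ($raw, $max) = @_;
    return 0 if $max <= 0;               # no tokens, nothing to scale by
    my $scaled = $raw / $max;
    return $scaled > 1 ? 1 : $scaled;    # the max is only an estimate, so clamp
}

# Example values are invented, not real MySQL output.
printf "%.2f\n", normalise_score(0.55, 1.1);   # prints 0.50
```

The clamp is there because the 0.8 and 0.3 weights are rough guesses, so a real score could in principle exceed the estimate.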


Bug?

vsergu on 2002-11-21T19:51:03

my @tokens = m/(\"[\s\S]+\"|\S+)/g;

Shouldn't that regex be m/(\"[^"]+\"|\S+)/g? Or, as I'd normally write it, /("[^"]+"|\S+)/g. The string might include two quoted phrases, and the greedy [\s\S]+ would then match everything from the first quote to the last.

Re:Bug?

TeeJay on 2002-11-21T21:07:24

That's a very good point.

The second regex would do very nicely.
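
For anyone who wants to see the difference, here is a small sketch comparing the two regexes on a query with two quoted phrases. The sub names and the test query are made up for the example; the 0.8 and 0.3 weights are the ones from the original snippet.

```perl
use strict;
use warnings;

# Original tokenizer: [\s\S]+ is greedy, so it runs to the LAST quote.
sub tokens_greedy { my ($q) = @_; return $q =~ m/("[\s\S]+"|\S+)/g; }

# Suggested tokenizer: [^"]+ cannot cross a quote, so each phrase stands alone.
sub tokens_fixed  { my ($q) = @_; return $q =~ m/("[^"]+"|\S+)/g; }

# Same weighting as the original snippet: 0.8 for phrases and +words, 0.3 otherwise.
sub max_score {
    my $max = 0;
    $max += /["+]/ ? 0.8 : 0.3 for @_;
    return $max;
}

my $query = '"first phrase" word "second phrase"';

my @greedy = tokens_greedy($query);
my @fixed  = tokens_fixed($query);

print scalar(@greedy), " token(s): ", join(':', @greedy), "\n";
print scalar(@fixed),  " token(s): ", join(':', @fixed),  "\n";
print "max score (greedy): ", max_score(@greedy), "\n";
print "max score (fixed) : ", max_score(@fixed),  "\n";
```

With the greedy version the whole query comes back as a single token, so the max score is underestimated; the fixed version yields three tokens.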