One of the problems I have found with MySQL 4 is the way it scores results in the boolean mode of fulltext search.
This means that something that matches '"quite a long phrase"' scores lower than '+word other'. So the scores for different queries are not comparable: they vary according to how many distinct tokens the query contains.
The way around this is to calculate a max score for a query. This snippet of code is quite handy for this:
#!/usr/bin/perl -w
print "\nstarting...\n";
my @strings = ('"quite long phrase"', '+must optional', '"short phrase" word', '"quite long phrase" word');
foreach (@strings) {
    my $max = 0;
    print "string:$_\n";
    my @tokens = m/(\"[\s\S]+\"|\S+)/g;
    print "tokens:\n";
    print join(":", @tokens), "\n\n";
    foreach (@tokens) {
        $max += (m/[\"\+]/) ? 0.8 : 0.3;
    }
    print "max score : $max\n";
}
print "done...\n";
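One way the max score might then be used is to divide the raw score by it, so that queries with different numbers of tokens become roughly comparable. The following is only a sketch under assumptions: `max_score` and `normalised_score` are hypothetical names, the 0.8 and 0.3 weights are simply the ones from the snippet, the tokenizer uses a negated character class so that each quoted phrase is its own token, and the raw score passed in is a made-up value standing in for whatever MATCH ... AGAINST returns.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Weights taken from the snippet: 0.8 for a quoted phrase or a
# +required word, 0.3 for a plain optional word.
sub max_score {
    my ($query) = @_;
    my $max = 0;
    # Negated character class so each quoted phrase is its own token.
    for my $token ($query =~ /("[^"]+"|\S+)/g) {
        $max += ($token =~ /["+]/) ? 0.8 : 0.3;
    }
    return $max;
}

# Hypothetical helper: express a raw score as a fraction of the
# estimated maximum for that query.
sub normalised_score {
    my ($raw, $query) = @_;
    my $max = max_score($query);
    return $max > 0 ? $raw / $max : 0;
}

# 0.8 is a made-up raw score, not a real MySQL result.
printf "%.2f\n", normalised_score(0.8, '"short phrase" word');
```

Whether the result really stays between 0 and 1 depends on how well the weights match what MySQL actually returns, so treat the weights as tunable guesses.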
my @tokens = m/(\"[\s\S]+\"|\S+)/g;
Shouldn't that regex be m/(\"[^"]+\"|\S+)/g? Or, as I'd normally write it, /("[^"]+"|\S+)/g. The string might include two quoted phrases, and the greedy [\s\S]+ would then match everything from the first quote to the last.
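A small sketch of the difference, using a made-up query string that contains two quoted phrases (the variable names are just for illustration):

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $query = '"first phrase" word "second phrase"';

# Greedy [\s\S]+ runs on to the last quote in the string.
my @greedy = $query =~ /("[\s\S]+"|\S+)/g;

# [^"]+ stops at the next quote, so each phrase is matched separately.
my @negated = $query =~ /("[^"]+"|\S+)/g;

print scalar(@greedy),  " token(s): ", join(" : ", @greedy),  "\n";
print scalar(@negated), " token(s): ", join(" : ", @negated), "\n";
```

With the greedy version the whole string comes back as a single token, whereas the negated character class yields the two phrases and the word as three separate tokens.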
Re:Bug?
TeeJay on 2002-11-21T21:07:24
That's a very good point. The second regex would do very nicely.