I like general-case solutions, like Plucene and DBM::Deep, but sometimes you just have to get slightly lower-level and do it all yourself. Why, you ask? For speed, of course. I've just redone my recipe website, which has a lot of recipes, is used a lot (mostly by American housewives), and is hosted on a bitty box.
Enter CDB_File, a "Perl extension for access to CDB databases". Now, ignoring the fact that CDB is djbware, it rocks. The files are small, and they are very fast to read. So I do lots of processing and indexing in my build scripts, and then suddenly you can search at lightning speed for orange wine and find Orange-wine wafers. Note the related recipes on the right, generated with Search::ContextGraph.
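The build-then-read pattern is simple: write everything out once at build time, then serve lookups from the finished file. A minimal sketch using CDB_File's documented API (the filenames, key, and recipe text here are made up):

```perl
use strict;
use warnings;
use CDB_File;

# Build phase: write to a temporary file, which finish() atomically
# renames into place, so readers never see a half-written database.
my $cdb = CDB_File->new( 'recipes.cdb', 'recipes.cdb.tmp' )
    or die "Cannot create recipes.cdb: $!";
$cdb->insert( 'orange-wine-wafers', 'Cream the butter and sugar...' );
$cdb->finish;

# Read phase: tie a hash to the file and look keys up directly.
tie my %recipes, 'CDB_File', 'recipes.cdb'
    or die "Cannot tie recipes.cdb: $!";
print $recipes{'orange-wine-wafers'}, "\n";
```

Because the file is immutable once built, readers need no locking at all, which is a large part of why reads are so fast.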
Mmmmm, food and Perl. Happy birthday richardc!
Re:Back to CDB?
acme on 2005-01-15T11:05:48
The reason at the time was that CDB_File was leaking memory, so I switched to BerkeleyDB. However, I checked out CDB_File again and, lo and behold, it doesn't leak; it does what it says on the tin, and it's nice and fast. After doing some more performance tests, I found that one of the nice things about CDB over BDB is that my CDB files are 2.5x smaller.
Re:BerkeleyDB
acme on 2005-01-15T11:30:52
And lo did I do some more benchmarking. Bear in mind that my data has numeric keys and largish objects serialised via Storable. 'deep' is DBM::Deep, 'bdb' is BerkeleyDB::Hash, 'bdb_btree' is BerkeleyDB::Btree with a pack "N" key filter, and 'cdb' is CDB_File. The first time I run the benchmark:

               Rate      deep       bdb bdb_btree       cdb
    deep    0.810/s        --      -49%      -49%      -62%
    bdb      1.58/s       96%        --       -1%      -25%
    bdb_btree 1.59/s      97%        1%        --      -25%
    cdb      2.12/s      162%       34%       33%        --

And when everything is cached in memory, all of them are about the same speed:

                  Rate bdb_btree      deep       cdb       bdb
    bdb_btree 280082/s        --       -1%       -1%       -1%
    deep      282205/s        1%        --       -1%       -1%
    cdb       283829/s        1%        1%        --       -0%
    bdb       284326/s        2%        1%        0%        --

Another interesting thing to note is the file sizes. The BerkeleyDB files are much larger:

    -rw-r--r-- 1 acme acme 7544832 2005-01-15 11:01 test.bdb
    -rw-r--r-- 1 acme acme 7532544 2005-01-15 11:01 test_btree.bdb
    -rw-r--r-- 1 acme acme 4476905 2005-01-15 11:01 test.deep
    -rw-r--r-- 1 acme acme 4421730 2005-01-15 11:01 test.cdb

In summary: because my server isn't very powerful, I reckon CDB is the best way to go. It seems to work well in low memory and has smaller files too. Any other suggestions?

Re:BerkeleyDB
perrin on 2005-01-17T18:41:22
I feel a bit silly trying to optimize a recipe search, but... I suspect your use of BerkeleyDB is not optimal. Take a look at this benchmark code for an example of high-speed use of BerkeleyDB.
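For the record, a common reason BerkeleyDB underperforms out of the box is its tiny default cache. What follows is only a sketch of the kind of tuned setup perrin may have in mind, not his actual benchmark code: an explicit environment with a larger -Cachesize, using BerkeleyDB.pm's documented API (the environment path and cache size are made-up examples):

```perl
use strict;
use warnings;
use BerkeleyDB;

# An explicit environment with a generous in-memory cache (16 MB here);
# the default cache is very small, a common cause of slow lookups.
mkdir '/tmp/bdb-env';    # hypothetical environment directory
my $env = BerkeleyDB::Env->new(
    -Home      => '/tmp/bdb-env',
    -Cachesize => 16 * 1024 * 1024,
    -Flags     => DB_CREATE | DB_INIT_MPOOL,
) or die "Cannot create env: $BerkeleyDB::Error";

my $db = BerkeleyDB::Btree->new(
    -Filename => 'test.bdb',
    -Env      => $env,
    -Flags    => DB_CREATE,
) or die "Cannot open test.bdb: $BerkeleyDB::Error";

# pack "N" keeps numeric keys in byte order, as in the benchmark above.
$db->db_put( pack( 'N', 42 ), 'some serialised value' );
my $value;
$db->db_get( pack( 'N', 42 ), $value );
print "$value\n";
```

With a cache sized to hold the working set, BerkeleyDB reads should close much of the gap seen in the cold-cache numbers above.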