Everything is optimized for writing, so reading can be a little slow. I've been fixing this while trying not to ruin the features that make it so powerful.
Any web application that serves dynamic content to more than one user per second needs a good caching scheme, or at least ridiculous amounts of money spent on hardware. The classic rule-of-thumb tradeoff applies: to run faster, you'll probably have to use more memory.
(Some of my recent tricks have actually been designed to improve speed and reduce the memory footprint at the same time. Any time you can take advantage of shared memory pages, do it! I think the parent process size is now larger, but much, much more of it is actually shared across the children, so the overall effect is a big improvement.)
The node cache was an earlier attempt to save memory and time. Any time the engine needed to fetch a node, it would check the cache first. The simple approach was to keep a cache in each child process, using an LRU scheme with a tunable maximum cache size. Because each child kept its own cache, each child could potentially cache the same nodes, and each child paid the memory cost of caching them. Worse, there's no child affinity, so the cache only really paid off when several users were all accessing the same nodes multiple times.
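Here's a minimal sketch of that per-child scheme. The names (get_node, load_node_from_db, the size limit) are illustrative, not the engine's actual API:

    package NodeCache;

    use strict;
    use warnings;

    my %cache;             # node_id => [ node hashref, last-used tick ]
    my $tick = 0;
    my $MAX_NODES = 1000;  # the tunable maximum cache size

    sub get_node {
        my ($node_id) = @_;

        # cache hit: refresh this node's LRU position and return it
        if (my $entry = $cache{$node_id}) {
            $entry->[1] = ++$tick;
            return $entry->[0];
        }

        # cache full: evict the least recently used node
        if (keys %cache >= $MAX_NODES) {
            my ($oldest) = sort { $cache{$a}[1] <=> $cache{$b}[1] } keys %cache;
            delete $cache{$oldest};
        }

        my $node = load_node_from_db($node_id);
        $cache{$node_id} = [ $node, ++$tick ];
        return $node;
    }

    # stand-in for the real database fetch
    sub load_node_from_db {
        my ($node_id) = @_;
        return { node_id => $node_id };
    }

    1;

Because %cache is an ordinary per-process variable, every child builds its own copy, so twenty children can hold twenty copies of the same hot node.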
My clever "let's make it a little faster without rewriting too much code" scheme, compil-o-cache, compiled dynamic content and saved it in otherwise-unused keys of cached nodes. For frequently accessed nodes, this saved the fetching, parsing, and compiling steps, reducing the cost of a hit to a cache lookup.
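To illustrate the idea (the doctext and _compiled keys here are stand-ins, not the engine's actual fields), a cached node can carry its own compiled code:

    use strict;
    use warnings;

    sub run_node_code {
        my ($node) = @_;

        # a previous request may already have paid for parsing and compiling
        my $code = $node->{_compiled};

        unless ($code) {
            $code = eval "sub { $node->{doctext} }"
                or die "compile failed: $@";
            $node->{_compiled} = $code;   # persists as long as the node stays cached
        }

        return $code->($node);
    }

    # the first call compiles; later calls on the same cached node reuse the code ref
    my $node = { node_id => 7, doctext => 'return "node " . $_[0]->{node_id}' };
    print run_node_code($node), "\n";   # prints "node 7"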
Of course, that eliminated one entire class of optimizations. Sharing hashes between children is not difficult with some of the caching modules out there. Sharing actual code references is harder. Even if you can do it, at some point it defeats the purpose: you'll have to recompile on each cache fetch, keep a secondary cache, or something along those lines.
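Storable is a good example of why: it can freeze a CODE ref only by deparsing it back to source, and thawing means eval'ing (recompiling) that source all over again. A small demonstration, assuming a reasonably recent Storable:

    use strict;
    use warnings;
    use Storable qw(freeze thaw);

    $Storable::Deparse = 1;   # serialize CODE refs via B::Deparse
    $Storable::Eval    = 1;   # rebuild them by eval'ing the source on thaw

    my $frozen = freeze({ doubler => sub { 2 * shift } });

    # every fetch from a shared cache would repeat this eval step,
    # which is exactly the compilation the cache was supposed to avoid
    my $thawed = thaw($frozen);
    print $thawed->{doubler}->(21), "\n";   # prints 42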
I've largely eliminated the need to store compiled code in the cache, so now we can consider other options for improving it.
One good option remains the caching modules. I've never used them, but they have an appealing simplicity. Keeping the cache in sync with updates going on in the database could be tricky, though. Another option is moving the caching to the database layer. We could use DBD::Proxy to connect to a layer around the database that caches nodes. That might mean parsing SQL, however, to determine which queries request nodes and which update them. It also means running another process on the server, which isn't onerous so much as it's just one more thing to install and monitor.
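For the curious, the engine's side of the DBD::Proxy approach might look something like this sketch. The host, port, and inner DSN are placeholders, and a dbiproxy server (or a custom caching proxy) would have to be listening at the other end:

    use strict;
    use warnings;
    use DBI;

    # route all queries through the proxy instead of straight to MySQL
    my $dsn = 'dbi:Proxy:hostname=localhost;port=3334;'
            . 'dsn=dbi:mysql:database=everything';

    my $dbh = DBI->connect($dsn, 'user', 'password', { RaiseError => 1 });

    # the engine issues the same SQL as before; the proxy layer is where
    # the "is this fetching a node or updating one?" decision would live
    my $sth = $dbh->prepare('SELECT * FROM node WHERE node_id = ?');
    $sth->execute(42);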
It may turn out that we don't really need improved caching, though I really doubt that's the case.
These are the reasons I don't fall asleep well. Scary, isn't it? Suggestions that don't involve graham crackers and warm milk are welcome.
Memcached

Seriously, for caching of simple key-value pairs, it rocks. It's superfast and dead simple.
Re:Memcached
chromatic on 2003-08-29T04:45:58
Jay Bonci's looking into this. Can I store objects as the values?
Re:Memcached
jamiemccarthy on 2003-08-29T13:43:52
The server only stores scalars. But the MemCachedClient.pm API automatically does a Storable::freeze() if you pass in any nonscalar value, and automatically thaw()s it on the flipside.
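In practice, round-tripping a node hash looks something like this; the server address is a placeholder, and the constructor details are my reading of the MemCachedClient.pm docs, so treat it as a sketch:

    use strict;
    use warnings;
    use MemCachedClient;

    my $memc = MemCachedClient->new({ servers => ['127.0.0.1:11211'] });

    # a nonscalar value: the client runs it through Storable::freeze()
    # before sending it to the server
    $memc->set('node:42', { title => 'caching', author => 'chromatic' });

    # ... and thaw()s it back into a hashref on the way out
    my $node = $memc->get('node:42');
    print $node->{title}, "\n";   # prints "caching"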