Caching Design Decisions in Everything

chromatic on 2003-08-12T03:58:53

Everything is optimized for writing, so reading can be a little slow. I've been fixing this while trying not to ruin the features that make it so powerful.

Any web application that serves dynamic content to more than one user per second needs a good caching scheme, or else ridiculous amounts of money spent on hardware. The classic rule-of-thumb tradeoff applies: to run faster, you'll probably have to use more memory.

(Some of my recent tricks have actually improved speed and reduced the memory footprint at the same time. Any time you can take advantage of shared memory pages, do it! I think the parent process is now larger, but much, much more of it is actually shared across children, so the overall effect is a big improvement.)

The node cache was an earlier attempt to save memory and time. Any time the engine needed to fetch a node, it would check the cache first. The simple approach is to keep a cache in each child process, using an LRU scheme with a tunable maximum cache size. Because each child kept its own cache, each child could potentially cache the same nodes, and each child paid the memory cost of caching that many nodes, too. Worse, there's no child affinity, so the cache only really paid off when several users were all accessing the same nodes multiple times.
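A per-child LRU cache like that is only a few lines of code. Here's a minimal sketch in Python (Everything itself is Perl; the class and method names here are illustrative, not the engine's actual API):

```python
from collections import OrderedDict

class NodeCache:
    """Per-process LRU cache with a tunable maximum size."""

    def __init__(self, max_size=300):
        self.max_size = max_size
        self._cache = OrderedDict()

    def get(self, node_id):
        node = self._cache.get(node_id)
        if node is not None:
            self._cache.move_to_end(node_id)  # mark as most recently used
        return node

    def put(self, node_id, node):
        self._cache[node_id] = node
        self._cache.move_to_end(node_id)
        while len(self._cache) > self.max_size:
            self._cache.popitem(last=False)  # evict the least recently used
```

Every child process holds its own instance, which is exactly why popular nodes end up cached once per child instead of once per machine.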

My clever "let's make it a little faster without rewriting too much code" scheme, compil-o-cache, compiled dynamic content and saved it in otherwise-unused keys in cached nodes. For frequently accessed nodes, this would save the fetching, parsing, and compiling steps, at the expense of looking in the cache.
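The idea reduces to "compile once, stash the code object alongside the cached node." A hedged sketch in Python of that shape (the key name and node layout are made up for illustration; Everything's actual scheme lives in Perl):

```python
def run_node(node, env):
    """Compile a node's dynamic content once and stash the code object
    in an otherwise-unused key of the cached node dict."""
    code = node.get("_compiled")
    if code is None:
        code = compile(node["source"], f"node-{node['node_id']}", "eval")
        node["_compiled"] = code  # later hits skip fetching, parsing, compiling
    return eval(code, env)
```

The catch described below is that `_compiled` holds a live code object, which is trivial to keep in one process and hard to share between processes.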

Of course, that eliminated one entire class of optimizations: sharing the cache between children. Sharing plain hashes between children is not difficult with some of the caching modules out there. Sharing actual code references is harder. Even if you can do it, at some point it defeats the purpose, as you'll have to recompile on each cache fetch, keep a secondary cache, or something along those lines.

I've largely solved the need to store compiled code in the cache, so now we can consider other options for improving the cache.

One good option remains the caching modules. I've never used them, but there's a simplicity and appeal there. Keeping the cache in sync with updates to the database could be tricky, though. Another option is moving the caching to the database layer. We could use DBD::Proxy to connect through a layer around the database that caches nodes. That might mean parsing SQL, however, to determine which queries request nodes and which queries update them. It also means running another process on the server, which isn't so much onerous as it is just one more thing to install and monitor.

It may turn out that we don't really need improved caching, though I really doubt that's the case.

These are the reasons I don't fall asleep well. Scary, isn't it? Suggestions that don't involve graham crackers and warm milk are welcome.


Simple Caching modules

ask on 2003-08-12T04:21:22

I tend to use a variation of this module:

http://svn.develooper.com/combust/trunk/lib/Combust/Cache.pm

(use "anon" for username, no password).

It's quite fast, not a lot of code, and supports complex data structures, metadata, and a couple of other things.

(Develooper::DB::db_open is an Apache::DBI-like dbh caching thing.)

    - ask

caching

inkdroid on 2003-08-12T13:37:08

This would make a nice topic for an article, chromatic.

caching modules

perrin on 2003-08-12T18:41:11

I did a talk on caching modules at OSCON a couple of years back. I've been meaning to write it up into an article for a while now. The basic results of my benchmarking were that IPC::MM, BerkeleyDB (native locking), Cache::Mmap, and MLDBM::Sync are the fastest at sharing data. DBD::Proxy was a neat idea, but has terrible performance. You'd be better off just doing an extra fetch to check a cache table.
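That "extra fetch" is just a read-through cache table consulted before the expensive work. A sketch of the pattern in Python with SQLite (the table names, columns, and `expensive_fetch` helper are invented for illustration):

```python
import pickle
import sqlite3

def expensive_fetch(db, node_id):
    # Stand-in for the slow path: fetching, parsing, and assembling a node.
    row = db.execute("SELECT title FROM node WHERE node_id = ?",
                     (node_id,)).fetchone()
    return {"node_id": node_id, "title": row[0]}

def get_node(db, node_id):
    """Read-through cache: one cheap fetch against the cache table first."""
    row = db.execute("SELECT data FROM node_cache WHERE node_id = ?",
                     (node_id,)).fetchone()
    if row is not None:
        return pickle.loads(row[0])      # cache hit
    node = expensive_fetch(db, node_id)  # cache miss: do the real work
    db.execute("INSERT OR REPLACE INTO node_cache (node_id, data) VALUES (?, ?)",
               (node_id, pickle.dumps(node)))
    return node
```

Because the cache lives in the same database, invalidation is a single DELETE in the same transaction as the update, which sidesteps the proxy-and-SQL-parsing problem entirely.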

Memcached

jamiemccarthy on 2003-08-29T04:03:26

Got memcached?

Seriously, for caching of simple key-value pairs, it rocks. It's superfast and dead simple.

Re:Memcached

chromatic on 2003-08-29T04:45:58

Jay Bonci's looking into this. Can I store objects as the values?

Re:Memcached

jamiemccarthy on 2003-08-29T13:43:52

The server only stores scalars. But the MemCachedClient.pm API automatically does a Storable::freeze() on any nonscalar value you pass in, and automatically thaw()s it on the flip side.