A typical cache usage pattern looks like this:
# Try to get the value from the cache
my $result = $cache->get('key');

# If it isn't there, compute and set in the cache
if ( !defined($result) ) {
    $result = do_something_to_compute_result();
    $cache->set( 'key', $result, $expiration_time );
}
This pattern is popular because it is easy to wrap around existing code, and easy to understand.
Unfortunately, it suffers from two problems:
Miss Stampedes: When a cache item expires at its specified time, every process that tries to get it will start to recompute it. If the item is popular and recomputation is expensive, you may get many simultaneous recomputations of the same item, which is at best wasteful and at worst can bog down a server.
I originally recognized this problem while working on Mason's busy locking, but am obviously not alone in experiencing it. The term "miss stampede" comes from this memcached list discussion - definitely worth a read.
Recomputation Latency: When a cache item is recomputed, the client (whether that be a browser, command-line, whatever) has to wait for the computation to complete. Since caching keeps average latencies down, there is a tendency to ignore the unfortunate customer that gets stuck with one or more cache misses.
Instead of specifying a single expiration time, specify a range of time during which expiration might occur. Then each cache get makes an independent probabilistic decision as to whether the item has expired. The probability starts out low at the beginning of the range and increases to 1.0 at the end of the range. For popular cache items, this means that most likely only one or a handful of gets will see the item as expired at the same time.
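To make this concrete, here is a rough sketch of the decision itself, not CHI's actual implementation; the function and variable names are just for illustration:

# Decide whether to treat an item as expired, given its nominal expiration
# time and the length of the window during which expiration may occur
sub probably_expired {
    my ( $expires_at, $window ) = @_;

    my $start = $expires_at - $window;
    return 0 if time() <= $start;         # before the window: never expired
    return 1 if time() >= $expires_at;    # past the window: always expired

    # Inside the window, the chance of reporting "expired" rises linearly to 1.0
    my $p = ( time() - $start ) / $window;
    return rand() < $p;
}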
CHI supports this with the expires_variance parameter. It may be passed to individual set commands or as a default for all sets. Personally, I plan to default it to 0.2 or so in almost all my caches.
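For example, as a cache-wide default (the Memory driver here is just for illustration):

my $cache = CHI->new(
    driver           => 'Memory',
    global           => 1,
    expires_variance => 0.2,    # default for all sets on this cache
);

my $value = do_something_to_compute_result();

# With a 10 minute expiration and a variance of 0.2, expiration can occur
# anywhere in the last two minutes of the item's lifetime
$cache->set( 'key', $value, '10 min' );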
Drawbacks: Since this is probabilistic, you get no guarantee of how well stampedes will be avoided (if at all), and you have to try to guess the right variance to use.
When a cache item expires, flag the item for a short time, either by pushing its expiration time forward or by setting an associated value in the cache. Subsequent gets will see the flag and return the old value instead of duplicating the recomputation effort.
CHI supports this with the busy_lock parameter, stolen from Mason. It works by temporarily setting the expiration time forward by the specified amount of time.
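Usage looks roughly like this, with the lock duration passed as a get option (the 30 second figure is arbitrary):

# Ask for a 30 second busy lock if the item turns out to be expired
my $result = $cache->get( 'key', busy_lock => '30 sec' );
if ( !defined($result) ) {
    # We drew the expired get; other processes will keep getting the old
    # value for up to 30 seconds while we recompute
    $result = do_something_to_compute_result();
    $cache->set( 'key', $result, $expiration_time );
}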
Drawbacks: Setting a busy lock involves a separate write. If you use this feature liberally, you'll double the number of write operations you do. Some backends will suffer from a race condition, a small window of time in which many processes may decide to recompute, before the first lock has been successfully set.
When a cache item expires, return the old value immediately, then kick off a recomputation in the background. This spares the client from the cost of the recompute.
This requires a non-traditional usage pattern, since the get and set are effectively happening as part of one operation. In CHI it will look like this:
my $result = $cache->compute(
    'key',
    sub { do_something_to_compute_result() },
    $expiration_time
);
CHI already has a working compute API, but doesn't yet know how to run things in the background. Coming soon.
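In the meantime, here is a rough way to approximate it by hand with fork and get_object (which returns the entry regardless of expiration); child reaping and error handling are omitted, so treat this as a sketch rather than production code:

my $result;
my $obj = $cache->get_object('key');    # returns the entry even if expired
if ( !defined($obj) ) {
    # Nothing cached at all - this client has to wait for the computation
    $result = do_something_to_compute_result();
    $cache->set( 'key', $result, $expiration_time );
}
else {
    # Serve whatever we have; if it has expired, refresh it in a child process
    $result = $obj->value;
    if ( $obj->is_expired && !fork() ) {
        $cache->set( 'key', do_something_to_compute_result(), $expiration_time );
        exit;
    }
}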
Drawbacks: Requires a non-traditional and somewhat ugly code pattern; background processes are harder to track and debug.
Recompute cache items entirely from an external process, either when change events occur or when items approach their expiration time. Items never actually expire as the result of a client request. This is the most efficient and client-friendly solution, if you can manage it.
Drawbacks: Requires extra external processes (more moving parts). Code to recompute caches must be available from the external process, which can result in some unwanted code separation, API contortions, or repetition. It is also difficult to know which items to keep repopulating, and when exactly to recompute them.
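For illustration, a cron-driven refresher might look something like this; the driver, server list, keys, and builder subs are all stand-ins:

#!/usr/bin/perl
use strict;
use warnings;
use CHI;

# Shared cache that the web processes also read from
my $cache = CHI->new(
    driver  => 'Memcached',
    servers => ['127.0.0.1:11211'],
);

# Map of cache keys to the code that knows how to rebuild them
my %builder_for = (
    'front_page'  => \&build_front_page,
    'top_sellers' => \&build_top_sellers,
);

while ( my ( $key, $builder ) = each %builder_for ) {
    $cache->set( $key, $builder->(), '10 min' );
}

# Stand-ins for real application code
sub build_front_page  { return 'front page HTML' }
sub build_top_sellers { return [ 'widget', 'gadget' ] }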
Use a periodic external process to trigger events that will naturally utilize your caches (e.g. write a cron job that hits common pages on your website), but pass a special flag making items more likely to expire. This makes it less likely that expiration will occur during a real client request.
This is not yet supported in CHI, but the idea would be to add some kind of easily-accessible lever to temporarily view all expiration times as reduced. e.g.
# Reduction ends when $lex goes out of scope
my $lex = CHI->reduced_expirations(0.5);
Drawbacks: Requires extra external processes (more moving parts). Triggers and their run frequencies must be carefully chosen.
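To make the trigger side concrete, the cron job could be a small script that requests popular URLs and passes the flag along, say as a request header; the header name and URLs here are made up:

#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new( timeout => 60 );

# Popular pages to warm; the application would notice the header and
# temporarily treat expiration times as reduced
for my $url ( 'http://www.example.com/', 'http://www.example.com/popular' ) {
    $ua->get( $url, 'X-Reduce-Expirations' => '0.5' );
}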
What other techniques have you used, and what success/failures have you had with them?