Sometimes you want to cache the results of a function with side-effects. For example, you might cache the results of HTTP requests or database queries.
If you’re using Clojure, you might reach for core.cache. The caches created by core.cache are supposed to be wrapped in an atom, so you would write something like this:
(require '[clojure.core.cache :as cache])
(defn http-get [url] ...)
;; cache the results for a minute
(def my-cache (atom (cache/ttl-cache-factory {} :ttl 60000)))
(defn fetch-data [url]
  (-> (swap! my-cache cache/through-cache url http-get)
      (cache/lookup url)))
If you try it, it seems to work. There’s a bug, though, and you will likely notice it only under high load. Can you spot it?
The problem is that swap! may re-run the function that is passed to it. This can cause a variation of a cache stampede.

(swap! atom f) works something like this:

1. Read the value of atom.
2. Apply f to the value.
3. Compare-and-set the new value to atom: if the value of atom is the same as it was in step 1, update it to the new value. If it has changed, go to step 1.
In our case, if the cache gets updated while we’re doing an HTTP request in step 2, the request has to be redone to update the cache – even if the other update was for another cache key! If you’re processing a lot of requests in parallel, it may take multiple retries to successfully update the cache.
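To make the retry behavior concrete, here’s a hypothetical sketch (slow-assoc and the counter atoms are made up for illustration) that counts how many times swap!’s function runs when two threads update different keys of the same atom:

(def m (atom {}))
(def calls (atom 0))

;; Simulates a slow, side-effecting computation, like an HTTP request.
(defn slow-assoc [m k]
  (swap! calls inc)   ; count every invocation
  (Thread/sleep 50)   ; widen the window for conflicting updates
  (assoc m k :value))

;; Two parallel swap!s on different keys still contend on the same atom,
;; so one of them usually has to retry and calls ends up greater than 2.
(run! deref [(future (swap! m slow-assoc :a))
             (future (swap! m slow-assoc :b))])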
We experienced this at work recently. A microservice was calling another microservice exactly once per incoming request. When we enabled caching for the requests, implemented as above, the typical rate of requests went down, but we started to see 10x spikes in requests. This is not what you want to see for one of your busiest services.
Luckily there’s a simple solution: wrap the side-effecting call in a delay.
(defn fetch-data [url]
  (let [new-value (delay (http-get url))]
    (-> (swap! my-cache cache/through-cache url (fn [_] @new-value))
        (cache/lookup url))))
The cache update may still take multiple attempts, but the delayed value is computed at most once.
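The at-most-once behavior of delay is easy to verify in a REPL:

(def calls (atom 0))
(def d (delay (swap! calls inc) :value))

@d      ;; => :value – the body runs on the first deref
@d      ;; => :value – subsequent derefs reuse the cached value
@calls  ;; => 1 – the body ran only once, despite two derefs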
This looks a bit messy, so let’s use the new clojure.core.cache.wrapped
namespace that was introduced in core.cache 0.8.0 (August 2019). It takes care
of wrapping the cache in an atom and implements the delaying logic and more:
(require '[clojure.core.cache.wrapped :as cache])
(def my-cache (cache/ttl-cache-factory {} :ttl 60000))
(defn fetch-data [url]
  (cache/lookup-or-miss my-cache url http-get))
This is nice, but there’s still room for improvement.
If multiple threads request the same URL at roughly the same time, they will all do the HTTP request. It would be more efficient if only one of the threads did the request and the others waited for it to finish. You could implement this yourself by doing some locking… but you could also use core.memoize, which does it for you.
(require '[clojure.core.memoize :as memo])
(def fetch-data (memo/ttl http-get :ttl/threshold 60000))
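As a sketch of the difference (slow-get is a made-up stand-in for a real HTTP client), concurrent callers of a memoized function share a single in-flight computation:

(def calls (atom 0))

(defn slow-get [url]
  (swap! calls inc)   ; count the actual "requests"
  (Thread/sleep 50)
  (str "response for " url))

(def cached-get (memo/ttl slow-get :ttl/threshold 60000))

;; Five threads ask for the same URL at the same time...
(run! deref (doall (repeatedly 5 #(future (cached-get "https://example.com")))))

;; ...but slow-get should have run only once; the other threads waited
;; for the shared computation instead of issuing their own requests.
@calls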
I guess the moral of the story is that if you use high-quality higher-level libraries, the authors will have already solved the thorny lower-level problems for you.