We have been working on design patterns and implementation for adding caching to our REST services using the ServiceStack framework. I have to say, the design patterns took quite a while to iron out through experimentation, but in retrospect, implementing them with ServiceStack has been relatively a breeze.
This post is long and technical, and only for those who want to support caching of their REST services.
So, why have we gone there with caching our REST services so soon in our product lifecycle?
Answer: Because we are continuously evolving a bunch of REST services that our business depends on. We realised (the hard way) that if we wanted to minimize the risk of producing slow user experiences, we needed to optimize the data going back and forth over the wire. To do that, we needed our clients to reuse (as much as they can) any responses they have already downloaded that haven't yet changed (something that HTTP Caching can address). AND we also wanted to save on compute time and reduce the load on these services from having to re-calculate the same kinds of responses over and over again, sometimes within a few seconds of each other (something that Service Caching can address).
This is not just about saving stuff you had already downloaded previously (caching), this is about detecting when that stuff may have changed, and when you need to get the update efficiently and reliably (cache management). As Phil Karlton famously quipped, cache invalidation is one of the two hard things in computer science. That's the trickier part, and the place many web developers haven't been or had to go before. In 20+ years of web development, I know I've not had to go too deep here before, and I suspect many others still haven't either. So how easy is it to go there?
You will need some study time.
[BTW: For context, it's worth noting that we have many REST APIs under constant evolution (50+). A few different types of services at the 'back-end' run the core business, and a WebAPI service at the 'front-end' helps display the desktop and mobile UX to our clients. Consequently, the 'front-end' WebAPI does a lot of aggregation of data from the 'back-end' APIs, as it is primarily concerned with shaping all this data for user interaction. Aggregation, incidentally, brings in a whole set of other issues when HTTP caching is introduced.] [Edit: see the comments too]
Along our learning journey into applying both HTTP Caching and Service Caching to our REST services, we made some significant discoveries about a topic that, surprisingly, we had not been all that familiar with before. We needed some practical guidance. Go ask your favourite search engine.
- The first major lesson we learned was just how poorly many web developers (ourselves included) understand HTTP caching (if at all). Have you ever studied RFC 2616, section 13 before?
- The second lesson was that there are not a lot of design patterns out there you can pick up and run with if you need to implement HTTP caching for your REST services. There is lots of info and ideas (much of it misleading, of course), and very little actionable guidance to start with. Not a good start, as the devil sure is in the detail with this one, as usual.
- The third is that HTTP Caching and Service Caching are two entirely different beasts (different goals and policies) that are vaguely related but not the same, and should not be conflated. You kind of need both to really improve the overall end-to-end performance of a REST service, but one does not replace the other. It's a 'both' decision, people.
- And, the fourth lesson we learned was how tough it is to deal with freshness between services that aggregate responses from each other.
Furthermore, at the time we started this journey, our service development framework ServiceStack had built-in hooks for us to build both HTTP Caching and Service Caching (thank goodness), but we still had to do a lot of work to understand how to pull the various pieces together, and implement a bunch of components to make an effective end-to-end caching strategy. Thank goodness for request and response filters!
[BTW: by the time you read this, that will have changed significantly, which is serendipitous for us, and for the whole ServiceStack community. However, part of the reason we wanted to write this post is to give further guidance on how to expand your caching strategy with ServiceStack to accommodate concepts that are not present out of the box in ServiceStack's caching strategy.]
The Patterns
Like most things, when investigating caching, it does depend on what you are trying to achieve and why. And understanding the different goals between these two patterns is key.
On the Service Caching side, there is a fairly well-known pattern: cache most of your GET responses (for longish durations, i.e. minutes), and refresh those cached GET responses on a POST/PUT/DELETE request for the same resource - as outlined (in part) in the ServiceStack caching guidance. Depending on which verbs your specific resource supports, in many cases the API self-manages its 'freshness' using this pattern (a sketch of the pattern follows the note below).
[Note: There is a missing part to how this pattern is often described. For the many GET APIs that return 'search'-type results containing multiple resources, you'll also want to refresh the cached GET search results if there is a chance that a POST of a new resource will add to those search results.]
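To make the pattern concrete, here is a minimal ServiceStack sketch, with illustrative Order DTOs and a naive single cache key (none of this is our production code):

```csharp
using System;
using ServiceStack;

[Route("/orders", "GET")]
public class GetOrders : IReturn<GetOrdersResponse> { }

[Route("/orders", "POST")]
public class CreateOrder : IReturn<CreateOrderResponse>
{
    public string Item { get; set; }
}

public class GetOrdersResponse { public string[] Items { get; set; } }
public class CreateOrderResponse { public string Id { get; set; } }

public class OrdersService : Service
{
    const string OrdersCacheKey = "res:orders"; // illustrative key scheme

    public object Get(GetOrders request)
    {
        // Serve from the cache if we can; otherwise calculate and cache.
        var cached = Cache.Get<GetOrdersResponse>(OrdersCacheKey);
        if (cached != null)
            return cached;

        var response = new GetOrdersResponse { Items = QueryOrdersFromDb() };
        Cache.Set(OrdersCacheKey, response, TimeSpan.FromMinutes(5));
        return response;
    }

    public object Post(CreateOrder request)
    {
        var id = InsertOrderIntoDb(request.Item);

        // The new order would appear in the cached GET (search) results,
        // so refresh by removal: the next GET rebuilds the cache entry.
        Cache.Remove(OrdersCacheKey);

        return new CreateOrderResponse { Id = id };
    }

    string[] QueryOrdersFromDb() => new[] { "widget", "sprocket" };     // stand-in
    string InsertOrderIntoDb(string item) => Guid.NewGuid().ToString(); // stand-in
}
```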
On the HTTP Caching side, it turns out there are only so many general caching requirements that can be met with the current HTTP 1.1 standard. Not that HTTP 1.1 is lacking in any way; it is that the requirements for HTTP caching are pretty well defined and accommodated by the specification - pretty ingenious, actually. Given that, it seems like those design patterns should be easily definable in a simple, actionable specification agnostic of platform or language.
[One day, I see someone taking the bold step of setting up a site for standardised design patterns where they are defined in some generic unit test format that anyone can read and translate into their own language. I digress however.]
If you are not familiar with HTTP caching, you can find more practical guidance in RFC 7234 (which supersedes RFC 2616, section 13). It is a typical computer-sciencey specification with tons of potential ambiguity (unless you have hours to study it), so it is not surprising that a lot of developers don't know exactly how to make it actionable in practice.
There are some great and credible references out there on how (logically) HTTP caching should work, and how Service Caching should be done.
To summarize all this, in HTTP Caching, basically, there are two concepts to grasp.
- The first, and most familiar, is HTTP Expiration. This is where the resource server (your web service) declares that a particular resource has a TTL (time to live, or 'freshness'), in seconds or as an absolute UTC time - watch out there! Clients that make a request read that information from caching response headers, and those headers carry 'advice' or 'directives' that help the client decide how to cache and expire the response. It's a very familiar concept to most web developers, even if implementing it is not that common, because many web servers take care of this kind of thing automatically for static content. That is not the case with REST services, where the content is much more dynamic. In general, you will have to do something explicit in code or configuration to make this happen with your responses. And as soon as you do that, you need to understand the second concept in HTTP caching, or you'll encounter numerous intermittent red lights in the integration testing of your API, for unpredictable time-based reasons.
- The second concept is HTTP Validation. This is where the resource server (your web service) is programmed explicitly to validate some metadata about a response that it previously sent to a client, such as an ETag or a Last-Modified timestamp. The client explicitly asks the server to validate either the ETag or the Last-Modified date of a response that it has in its cache, by sending a special GET request ('If-None-Match' for ETags, or 'If-Modified-Since' for Last-Modified date stamps). The server performs the validation using information in the headers of the request, against information it has about the original response. If the validation succeeds, the server sends a '304 - Not Modified' response (with empty content, and some up-to-date caching headers). If the validation fails, it responds with a 2XX status code along with the new response (plus up-to-date caching headers). A sketch covering both concepts follows below.
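To make both concepts concrete, here is a minimal ServiceStack sketch; the GetCar DTO, the data access, and the ETag calculation are illustrative assumptions, not a definitive implementation:

```csharp
using System;
using System.Net;
using ServiceStack;

[Route("/cars/{Id}", "GET")]
public class GetCar : IReturn<CarDto>
{
    public string Id { get; set; }
}

public class CarDto
{
    public string Id { get; set; }
    public string Model { get; set; }
}

public class CarService : Service
{
    public object Get(GetCar request)
    {
        var car = LoadCar(request.Id);   // stand-in data access
        var eTag = CalculateETag(car);   // stand-in; e.g. an MD5 digest

        // HTTP Validation: if the client's cached ETag still matches,
        // answer '304 - Not Modified' with empty content.
        var clientETag = Request.Headers[HttpHeaders.IfNoneMatch];
        if (clientETag == eTag)
            return new HttpResult { StatusCode = HttpStatusCode.NotModified };

        // Otherwise send the full response with fresh caching headers:
        // HTTP Expiration 'advice' plus the ETag for the next validation.
        return new HttpResult(car)
        {
            Headers =
            {
                [HttpHeaders.ETag] = eTag,
                [HttpHeaders.CacheControl] = "private, max-age=300, must-revalidate",
            }
        };
    }

    CarDto LoadCar(string id) => new CarDto { Id = id, Model = "Roadster" };
    string CalculateETag(CarDto car) => $"\"{car.Id}-{car.Model}\""; // stand-in digest
}
```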
Before we get into the weeds about how you can implement any of this, it is worth noting that HTTP Expiration on its own has very limited utility for REST services without HTTP Validation. Why? Because unlike static images/CSS/HTML, the content of REST service responses may change at any time for anyone, frequently or infrequently, and having fresh versions of those 'representations' is critical to the usability of many REST APIs. The volatility of a resource is much harder to assess with APIs. So waiting for client caches to time out and expire does not produce a very responsive (as in, 'up-to-date') API, and makes for pretty poor user experiences. Expiring the cache too often, again, does not make for a responsive API - it demands too many fetches. So assuming that you can just defer this commitment and fine-tune that TTL sometime in the future will cost you a lot of time, research and resources.
Service Caching versus HTTP Caching
First off, let's get something clear from the get-go. Service Caching DOES NOT EQUAL HTTP Caching. They have different design goals and different implementations, but they may share similar pieces of the overall architecture. In practice, Service Caching is likely to inform/enable HTTP Caching, but unlikely vice versa.
Service Caching is likely to involve a distributed cache (like Redis/Memcached etc. in a scalable REST service) that needs to remember responses the service has already calculated for a caller, for some duration (in secs/mins).
It is a key part of a caching strategy to reduce the workload that your service has to perform in calculating responses, and free up resources to handle more requests. Let your service REST damn it! This will help your service (and data repos) scale better to increased load.
It helps to cache as much as you can for as long as you can, serving responses from memory rather than recalculating them.
But beware! You will still need to authorize the caller on each call, and you still need to be aware of who the caller is, because many REST responses can look different depending on who is calling the same URI. So, let's summarise the main concerns here:
- Reduce compute load and increase availability, by improving response times (for scalable services using distributed caching)
- Maintaining accurate Authorization of the caller
- Dealing with multiple representations of the same resource for different callers (and never mixing them)
HTTP Caching involves using a cache on a client (i.e. the browser cache, or your own cache in your own client) to remember responses (and the caching 'advice') provided by a service for some duration (secs). This advice informs clients when to expire and when to validate the cached response. You are depending on the client to do the right thing, as directed by the service.
It is a key part of your caching strategy to reduce the amount of data that is transmitted over the wire between client and service. The goal is to reduce the network latency of fetching data, to help improve your desktop/mobile UX, and to reduce the load on the service as more people use it concurrently.
Also, as more and more users use your service, it makes it a cinch to stand up intermediary caches geographically closer to your users so they can share these responses.
It helps for the client to only ask for fresh responses when the data that it has cached is actually out of date (i.e. there needs to be a mechanism for it to tell when that time is right).
But beware! You still need to serve the right representation to the right caller, and you need to always serve the most up-to-date representation to that caller. So, let's summarise the main concerns here (a client-side sketch follows the list):
- Improving network latency times (using a client cache of some kind, and using HTTP Validation)
- Serving the right representation to the right caller
- Always serving the most up-to-date representation to that caller
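Here is a sketch of what 'the client doing the right thing' looks like, using plain HttpClient with an illustrative in-memory cache (the URL handling and cache shape are assumptions for brevity):

```csharp
using System.Collections.Generic;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

public class ValidatingClient
{
    readonly HttpClient http = new HttpClient();
    readonly Dictionary<string, (string ETag, string Body)> cache = new();

    public async Task<string> GetAsync(string url)
    {
        var request = new HttpRequestMessage(HttpMethod.Get, url);

        // If we have a cached copy, ask the server to validate our ETag.
        if (cache.TryGetValue(url, out var entry))
            request.Headers.TryAddWithoutValidation("If-None-Match", entry.ETag);

        var response = await http.SendAsync(request);

        // 304: our cached copy is still fresh - reuse it, nothing downloaded.
        if (response.StatusCode == HttpStatusCode.NotModified)
            return entry.Body;

        // 2XX: new representation - cache the body + ETag for next time.
        var body = await response.Content.ReadAsStringAsync();
        var eTag = response.Headers.ETag?.Tag;
        if (eTag != null)
            cache[url] = (eTag, body);
        return body;
    }
}
```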
So, it turns out in practice that you can absolutely have Service Caching without HTTP Caching, and you can have HTTP Caching without Service Caching, and both will improve the performance of your REST services dramatically (by an order of magnitude or two, perhaps). But working together, they far outperform either one on its own.
In an architecture like ours, with many REST services at different layers, we like to think that each REST service has both a Service Cache at the front of it and a Client Cache at the back of it, for talking to other services in our architecture or out on the Internet. So, for us in practice, HTTP Caching is really thought of as client-side caching for the rest of this discussion.
So how did we implement both Service Caching and Client Caching in our architecture using ServiceStack?
Service Caching (with ServiceStack)
We applied the following policy for Service Caching on the front side of each of our ServiceStack REST services:
- For each GET service operation in a resource, we decide (case by case) whether the response should/could be cached or NOT. Very few should not be - images perhaps, and certain authorization functions. It's entirely your call.
- If we are caching a GET response, we decide on the expiration duration, which for us is a T-Shirt size (i.e. VeryShort, Short, Medium, Long, VeryLong) describing how long we should cache the GET response. Many REST resources may change frequently or infrequently depending on exactly what our users are doing, so predicting when they change is near impossible; the default we chose was a duration of 'Short'. Some REST resources change very rarely (if ever at all), like uploaded images or location data, so they get a 'VeryLong' duration instead. The actual values of these T-Shirt sizes are defined in the configuration of our deployed service, so that we can tune the durations over time without recompiling the code. For example, 'Short' for us is currently 5 mins, 'VeryLong' is 12 hours, and 'VeryShort' is 60 seconds. Relatively speaking, these are long durations, because the API is managing 'freshness' itself. (A sketch of this configuration follows the list below.)
- Then we decide if the GET response (representation) will be unique depending on which user is calling. You won't know this upfront until you have implemented the API itself. Some APIs verify the caller, or produce different responses for different users; some produce the same representation of a resource for all users.
- For each POST/PUT/DELETE verb in the same resource, we decide what possible side effects that POST/PUT/DELETE would have on each of the cached GET representations. If a cached GET representation is likely to be impacted, we wipe out all the cached GET representations when the POST/PUT/DELETE request arrives - remembering that some POSTs can create new resources that would be included in the cached results of some of the GET verbs (i.e. search results).
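Here is a sketch of how those T-Shirt sizes might be represented. The enum and the Medium/Long defaults are assumptions (only the Short, VeryShort and VeryLong values above come from our current config), and in production these would be read from the deployed service's settings rather than hard-coded:

```csharp
using System;
using System.Collections.Generic;

public enum CacheDuration { VeryShort, Short, Medium, Long, VeryLong }

public static class CacheDurations
{
    // In production these are read from configuration so they can be
    // tuned without recompiling; the Short/VeryShort/VeryLong defaults
    // below mirror the examples in the text, the rest are assumptions.
    static readonly Dictionary<CacheDuration, TimeSpan> Defaults = new()
    {
        [CacheDuration.VeryShort] = TimeSpan.FromSeconds(60),
        [CacheDuration.Short]     = TimeSpan.FromMinutes(5),
        [CacheDuration.Medium]    = TimeSpan.FromMinutes(30),  // assumption
        [CacheDuration.Long]      = TimeSpan.FromHours(2),     // assumption
        [CacheDuration.VeryLong]  = TimeSpan.FromHours(12),
    };

    public static TimeSpan Resolve(CacheDuration size) => Defaults[size];
}
```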
The way we implement this caching policy is using declarative attributes on each of the service operations of each of the REST services we have.
We have a simple configuration for the caching T-Shirt sizes. And we have a base class (for all our services) with a method that fetches a cached response (from an ICacheClient), or generates a new response by calling a delegate containing all the code to calculate the new response, which is then cached. We also calculate an ETag for every cached response (an MD5 digest), and we save that ETag alongside the cached response. Our ICacheClient in production is Redis, and we calculate a cache key based off the actual request (URI (PathInfo) + QueryString + the calling user's ID).
Also, the code that returns the cached response runs long after the caller has been identified and authorized.
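Here is a sketch of what that base-class helper might look like. The name CachedServiceBase, the exact key format, and the ETag storage scheme are illustrative, not our verbatim code:

```csharp
using System;
using System.Security.Cryptography;
using System.Text;
using ServiceStack;

public abstract class CachedServiceBase : Service
{
    // Fetch a cached response, or run the supplied delegate to calculate,
    // cache, and return a new one, with an MD5 ETag stored alongside it.
    protected object GetCachedResponse<T>(TimeSpan expiresIn, Func<T> calculateResponse)
        where T : class
    {
        // Cache key = request URL (PathInfo + QueryString) + the calling
        // user's ID, so per-user representations of a URI are never mixed.
        var userId = GetSession()?.UserAuthId ?? "anon";
        var cacheKey = $"{Request.RawUrl}|{userId}";
        var eTagKey = cacheKey + ":etag";

        var cached = Cache.Get<T>(cacheKey);
        if (cached != null)
            return cached;

        // Cache miss: calculate the response, then cache it and its ETag.
        var response = calculateResponse();
        var json = ServiceStack.Text.JsonSerializer.SerializeToString(response);
        var eTag = Md5Of(json);
        Cache.Set(cacheKey, response, expiresIn);
        Cache.Set(eTagKey, eTag, expiresIn);
        return response;
    }

    static string Md5Of(string content)
    {
        using var md5 = MD5.Create();
        var hash = md5.ComputeHash(Encoding.UTF8.GetBytes(content));
        return "\"" + BitConverter.ToString(hash).Replace("-", "") + "\"";
    }
}
```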
Here is the code for a typical REST service in our architecture. We like cars:
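It looks something like this sketch, which pulls together the pieces above (CachedServiceBase and CacheDurations come from the earlier sketches; the Car DTOs and data access are illustrative stand-ins rather than our verbatim listing):

```csharp
using System;
using ServiceStack;
using ServiceStack.Caching;

[Route("/cars", "GET")]
public class GetCars : IReturn<GetCarsResponse> { }

[Route("/cars", "POST")]
public class CreateCar : IReturn<CreateCarResponse>
{
    public string Model { get; set; }
}

public class GetCarsResponse { public string[] Models { get; set; } }
public class CreateCarResponse { public string Id { get; set; } }

public class CarsService : CachedServiceBase
{
    public object Get(GetCars request)
    {
        // Policy decisions for this operation: cacheable, 'Short' duration.
        // (Our real services declare this with attributes; it is inlined
        // here for brevity.)
        return GetCachedResponse(CacheDurations.Resolve(CacheDuration.Short),
            () => new GetCarsResponse { Models = QueryCarsFromDb() });
    }

    public object Post(CreateCar request)
    {
        var id = InsertCarIntoDb(request.Model);

        // A new car would appear in cached GET (search) results, so wipe
        // out all cached variants of this route. Our Redis-backed
        // ICacheClient supports pattern removal; other providers may not.
        Cache.RemoveByPattern("/cars*");

        return new CreateCarResponse { Id = id };
    }

    string[] QueryCarsFromDb() => new[] { "Roadster", "Model S" };       // stand-in
    string InsertCarIntoDb(string model) => Guid.NewGuid().ToString();  // stand-in
}
```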