Brittle Tests or Brittle Code?

We spend a lot of time fixing up integration and end-to-end tests when we change the behaviour of our API and UX. That process can be quite frustrating, especially when asynchronous interactions and caching are involved throughout the different layers of the architecture, and it requires a lot of discipline and tenacity to keep moving forward. Sometimes I definitely dream about giving up on fixing tests (especially the time-dependent ones) and moving on with a red light - shame on me.

I am in another one of those chases right now, and there seems to be no end to the red lights that go off in various places, sometimes frustratingly intermittently. You fix a bunch, then another bunch go red somewhere else. I also very much dislike the idea of adding time-based compensations (like waits) to make tests synchronize correctly. That's a smell to me.

So, I was venting the other day: "why are my UI tests so 'brittle' that they fail at the slightest little change somewhere deep in the API?" In some cases, those changes are miles away from the area of the product we are changing. This is so frustrating sometimes.

Yes, of course, I've been tempted at some of the more frustrating moments just to disable or eliminate the newly failing tests, just to get my green lights and confidence back. And then I remind myself why we have them in the first place. And I know I don't want to go to that hell again.

 

So, should tests be brittle?

And if not, what is the practical alternative?

At this point, I'm not sure it's a valid question. Not all tests are created equal. And nowadays I am suspicious of absolute rules that don't pass the "it depends" test.

For example, in our current product we make a firm distinction between unit tests, integration tests and end-to-end tests (e2e tests, strictly speaking, are integration tests that start either at the UI or at the front end of an application - whatever that might be). Since we live in a world of full-stack web apps backed by APIs, it's a convenient distinction for us right now.

Unit tests: By design, unit tests ARE brittle, and absolutely should be! Many people don't know this, so it's worth saying out loud. There is (in practice) a 'finite' (i.e. bounded) number of unit tests you can, and should, write, and they should be highly cohesive with the code you are creating to satisfy those tests. This is especially true if you are designing your code test-first. That's because every line of code exists because you wrote a test to require it. Which usually, but not always, means that there is [at least] one test for every code branch in your CUT (code under test).

You absolutely want a unit test to fail if you change even just one line of that code. So being brittle is an essential quality of a unit test for maintaining a good design and refactoring code. If you can change a line of that code and it does not break a test, you should take immediate pause and perhaps remedy that (if you can).
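As a trivial sketch of what I mean (NUnit-style; the method and the tests are made up for illustration):

public static string Classify(int speed)
{
    // two branches -> (at least) two tests
    return (speed > 100)
        ? "speeding"
        : "legal";
}

[Test]
public void Classify_WhenOver100_ReturnsSpeeding()
{
    Assert.AreEqual("speeding", Classify(101));
}

[Test]
public void Classify_When100OrUnder_ReturnsLegal()
{
    Assert.AreEqual("legal", Classify(100));
}

Change either branch - the boundary, the strings, anything - and at least one of those tests goes red. That's the brittleness doing its job.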

Integration tests: unlike unit tests, you can write [if you try hard enough] an infinite number of integration tests for your system. There are so many permutations [close to infinite for all practical purposes] of context and variability that you could test at this level. The number of these kinds of tests is unbounded, and because of that, it's hard to learn when enough is enough and stop writing more (it is diminishing returns). At the same time, you absolutely should write some integration tests, and in practice that number [that you should write] is usually small for each feature (i.e. per API). I like to teach developers to write an integration test for every common case that the code (end-to-end) has been designed for (and at least one for any bug you subsequently uncover, and one for any issue you think is a potential problem in that code path). You also ought to know what the minimal set of integration tests is going to be before you design the code. If you can write these tests first, and they run fast enough, they can be a powerful guide to let you know when you are done writing the feature.

Unit tests and integration tests are brittle by design for good reason and for good design!

I've noticed that lots of coders/developers disagree on this idea. Some even warn others against writing large numbers of integration tests. But my experience has taught me that "there be many dragons" when assuming your system works as you designed it without many of these tests to verify it. The fact that integration tests fail when you change something in your API/system, especially miles away from the affected code, is solid testament to that. So, it seems that having brittle integration tests is a huge benefit too.

Incidentally, I also teach developers that integration and unit tests tell you your code does what you thought you needed it to do. End-to-end tests tell you what the system actually needs to do.

[Let's not even get into UI (e2e) tests needing to be brittle. That is somewhat unavoidable even with great tooling these days. If you change the UX, why wouldn't you want the existing tests to fail to let you know that? It's so important.]

Anyone not for Brittle Tests?

So, you don't like the sound of brittle tests? Then what is the alternative?

What if you don't want your tests to be brittle? That sounds bad! I know most developers think they should avoid anything that is 'brittle' because they heard it was bad for various reasons, and so having brittle tests around sounds wrong too, but it is just not true. Brittle tests good, brittle code bad!

I am going to assert that if your tests are not brittle, they are for the most part worthless for regression and refactoring. That brittleness is a vital quality of effective tests at any level.

If you are a practitioner and you would rather not maintain brittle tests, then you probably don't have brittle tests. Which means you likely have few effective tests, and in all likelihood [I suspect strongly] few to no tests at all.

Perhaps the tests you do have are just 'smoke tests'. That's good, you should have some of those, but if you claim that you have avoided brittleness and only have smoke tests as your regression mechanism, then you have just avoided the effort and frustration of maintaining rigorous brittle tests to achieve a state of un-brittleness.

Is that such a good thing?

Might be, if your goal is to code up a spike or throwaway project, get done with it as quickly as possible, and never have to maintain it over time. A pet peeve of mine is when that prototype becomes the basis for a real product, which is very common in the project-based solution development that so many coders are involved in. Projects end, people move on. No durability, no robustness, no longevity, but great experience, learned heaps thanks, projects are good for that! Not great for delivering robust products though.

If you avoid brittle tests because you think that brittle is bad (in general), then presumably you don't have brittle tests that are maintained, and then you don't have an early warning system in place to detect regression problems. And if you don't have that, you definitely have areas of your code that don't work the way you expect them to, and you spend most of your time in a debugger (oh boy!).

Perhaps you are cool with that for now, and are happy to wait to hear from your users when your app or service is used in anger in production and breaks for them?

[BTW: They are not going to tell you when that happens. You know this, so I assume you accept the frustration and enjoyment of debugging production code while your customers quietly slip away through your fingers?]

In a way, it could be said that you have designed your code to fail, and fail it will, because you are a good designer, and it will do that when you least expect it of course. Hasn't happened to you yet, right?

I guess most developers who believe this is a good way to build robust software products also see themselves as: smart, conscientious, cost-efficient, under-rated, and under-paid, and in all likelihood they are working alone, not having to worry about others dealing with their creation long term.

The absence of brittle tests leads to brittle code.

So I am going to assert that without brittle tests you are, in most cases, more than likely going to end up with brittle code instead. Especially if you are embarking on a sizable product with a long life.

It is well known now that code without tests is usually poorly designed, usually unstable, and often too brittle to refactor, and 'bit rot' sets in very quickly. That's really the scary stuff that drives developers away from maintaining or investing in their code bases long term. So no wonder creating and maintaining brittle tests for it seems in vain.

Get the brittle tests going before the code gets brittle! 

 

for (int i = 0; i < 10; i++)

I can't crisply explain why it drives me nuts to see a for-loop like this in code:

for (int i = 0; i < 10; i++)
{
    // Do almost anything here, on one line or many - it's all ugliness,
    // and it almost always involves a statement like this:
    if (myArray[i] == somethingElse)
    {
        // do something complicated, and far worse if it affects the array itself
    }
}

Is it that I instinctively have to scan and remember the bounds? Is it that it forces me to do bounds checking over and over in my own mental loop? Is it that it forces me to do mental arithmetic? Or is it that I lose track of what the hell is going on in the actual for-loop body because I have to work too hard to understand it all? If you have to get your finger out and trace down the code on the screen, then it definitely needs refactoring!

(Yes, I've debugged hundreds of these kinds of statements in my career, in multiple different languages, and I hated every one of those experiences; when it was to fix a bug, it was almost always related to fixing the bounds or the indexing syntax!) I instinctively dislike [strongly] for-loops!

 

I'm going to have a go at writing a post about this issue as an experiment, to see if I can bring that tacit reasoning not to use the for-loop to the forefront of my conscious mind.

It dawned on me recently that I've actually not seen or used this ancient construct in C# code in what feels like a decade, I reckon. This is an ancient feeling in me, rising up after being dormant for years.

I recently encountered the dreaded for-loop on several occasions as it crept into our codebase while pairing with our budding developers, who seem to have no qualms with using it whatsoever. They even use 'i' as the looping variable - straight out of the computer science text books (and probably heaps of stackoverflow answers too). "What else should I use?" they ask.

But the really interesting thing is that even though this construct repels me so viscerally, I can't articulate a convincing argument to persuade these young guys not to write code with this horrid construct, even when they see others using it in their code base.

The guys must view me as some kind of crazed pedant for trying so hard to convince them to remove it, since I can't say for sure why it's not so good to use. I think they want to get rid of it just to shut me up, but without seeing anything wrong with its use.

It got me thinking: how do I explain concisely what is so wrong with the dreaded C for-loop construct? How can I communicate that to our budding craftswomen/craftsmen entering this craft?

So here goes....

Readability

Just so you know, before we get into it....

I desperately caved at some point in this story and created our own static methods so that these guys could make their code more readable, as a minimal first step to rid our code of the for-loop, if they still insisted on looping some number of times.

The functions look like this (can you spot the irony?):

using System;

public static class Repeat
{
    // Repeat an action a number of times
    public static void This(Action action, int times)
    {
        if (times > 0)
        {
            for (var counter = 0; counter < times; counter++)
            {
                action();
            }
        }
    }

    // Repeat an action with a counter, from 0 up to (but not including) 'to'
    public static void This(Action<int> action, int to)
    {
        if (to > 0)
        {
            for (var counter = 0; counter < to; counter++)
            {
                action(counter);
            }
        }
    }

    // Repeat an action with a counter, from 'from' up to 'to'
    // (inclusive of 'to' when 'from' is greater than zero)
    public static void This(Action<int> action, int from, int to)
    {
        if (to > 0
            && from >= 0
            && from <= to)
        {
            var maxCount = (from > 0) ? (to + 1) : to;
            for (var counter = from; counter < maxCount; counter++)
            {
                action(counter);
            }
        }
    }
}

Ironic that I can't write this code without using a for-loop!

I don't think I'd ever want to use this class myself (except perhaps the first overload), but at least the code is a little more readable now perhaps? So, readability has improved, but there is still a big problem with the code used inside this function.

Repeat.This((index) =>
{
    // ...whatever you need the array for, by index, x10
    // ...you still use this syntax: myArray[index]
}, 10);

I guess now that I am writing about this, and just put that code usage example in, it shows me the first reason I have for not liking the for-loop.

(1) The common usage of the for-loop construct results in code that is much harder to read and understand.

Why is that? Because (I think) you have this indexer into some array, and knowing what that selected array item actually is, is not all that clear or self-describing (by name) in the code. And what's wrong with that? Well, in 6 weeks' time, am I going to remember what that thing is? Let alone anyone else on the team coming after me, even an hour later, and trying to figure that out!

Enumerables!

The next reason, and possibly the reason I've not used the for-loop ever since LINQ came out, is that I've not had to manipulate arrays of anything since then. Arrays moved on man, to enumerables!

Ever since LINQ came out in C# (circa 2007) I've not had to use arrays ever again. Thank God. I still hated for-loops, but now I could avoid them, and their awkward readability, entirely, and that's where I've been ever since.

So, that leads to the next reason not to use for-loops:

(2) It is the 21st century; with LINQ in C# (or angular/underscore in javascript) you don't need to manipulate arrays directly anymore - use ForEach() instead.

With for-each, you get a named variable for each item in the enumerable.
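For example (a minimal sketch; the cars collection and its Registration property are made up for illustration):

foreach (var car in cars)
{
    if (car.Registration == registrationToFind)
    {
        // 'car' tells you exactly what you are working with
    }
}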

There, that is soooooo much more descriptive and readable than the indexing syntax.  

There is perhaps one case where a for-each construct will not cleanly give you what you want, and using it is clumsy (see this stackoverflow post). That is when you actually care about the index of the item in the enumerable. Viz:

var counter = 0;

myCollection.ForEach(item =>
{
    counter++;

    if (item.SomeInt == counter)
    {
        // ...you have found your item
        // (note: this 'return' only exits the lambda for this item;
        // it does not stop the ForEach from iterating the rest)
        return;
    }
});

Yep, not hideous but clumsy. Perhaps a for-loop might be appropriate here, perhaps not!
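For what it's worth, LINQ's Select overload that surfaces the index can tidy this particular case up (a sketch, reusing the made-up SomeInt property from above):

var found = myCollection
    .Select((item, index) => new { Item = item, Position = index + 1 })
    .FirstOrDefault(x => x.Item.SomeInt == x.Position);

Still not beautiful, but at least the counter housekeeping is gone.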

Reuse and Design

One final thought I had when I got this far in the post, wracking my brain for the obligatory third reason.

There have been times when I've noticed less experienced developers use for-loops to do stuff that is already taken care of better by libraries or APIs. Re-inventing the wheel is always a problem area for the less experienced; using an appropriate library function, instead of doing the work manually in a for-loop, is often far more efficient and far more readable and maintainable.

So, for example, replacing all the characters in a string by using a for-loop is clearly a coding horror, and a lot worse than using library functions like string.Replace(). Or, even more robust, use a regex to identify the pattern rather than the character. How many nested for-loops have you seen doing that kind of rubbish?
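For illustration (a sketch; the input string and the characters being replaced are made up):

// The coding horror:
var chars = text.ToCharArray();
for (var i = 0; i < chars.Length; i++)
{
    if (chars[i] == '-')
    {
        chars[i] = '_';
    }
}
text = new string(chars);

// The library function:
text = text.Replace('-', '_');

// Or pattern-based, with a regex (using System.Text.RegularExpressions):
text = Regex.Replace(text, "-+", "_");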

(3) Use library functions that are optimized for enumerating or comparing items in collections.

OK, well that's all I could think of right now, what do you think?

Do you like for-loops in your code? Or do you hate them? Can you tell us why?

Caching Anyone?

We have been working on design patterns and implementations for adding caching to our REST services using the ServiceStack framework. And I have to say, the design patterns took quite a while to iron out with experimentation, but in retrospect, implementing them with ServiceStack has been a relative breeze.

This post is long and technical, and only for those who want to support caching of their REST services.

So, why have we gone there with caching our REST services so soon in our product lifecycle?  

Answer: Because we are continuously evolving a bunch of REST services that our business depends on. We realised (the hard way) that if we wanted to minimize the risk of producing slow user experiences, we needed to optimize the data going back and forth over the wire. To do that, we needed our clients to reuse (as much as they can) any responses they have already downloaded that haven't yet changed (something that HTTP Caching can address). AND we also wanted to save on compute time and reduce the load on these services from having to re-calculate the same kinds of responses over and over again, sometimes within a few seconds of each other (something that Service Caching can address).

This is not just about saving stuff you had already downloaded previously (caching); this is about detecting when that stuff may have changed, and when you need to get the update efficiently and reliably (cache management). As Phil Karlton said in 'two hard things'... That's the trickier part, and the place many web developers haven't been or had to go before. In 20+ years of web development, I know I've not had to go too deep here before, and I suspect many others still haven't either. So how easy is it to go there?

You will need some study time. 

[BTW: For context, it's worth noting that we have many REST APIs under constant evolution (50+). A few different types of services at the 'back-end' which run the core business, and a WebAPI service at the 'front-end' that helps display the desktop and mobile UX of our clients. Consequently, the 'front-end' WebAPI does a lot of aggregation of data from the 'back-end' APIs, as it is primarily concerned with shaping all this data for user interaction. Aggregation, incidentally, brings in a whole set of other issues when HTTP caching is introduced.] [Edit: see the comments too]

Along our learning journey into applying both HTTP Caching and Service Caching to our REST services, we have made some significant discoveries about a topic that, surprisingly, we had not been all that familiar with before. We needed some practical guidance. Go ask your fav search engine.

  • The first major lesson we learned was just how poorly many web developers (ourselves included) understand HTTP caching (if at all). Have you ever studied RFC2616, section 13 before?
  • The second lesson was that there are not a lot of design patterns out there you can pick up and run with if you need to implement HTTP caching for your REST services. Lots of info and ideas (lots of it misleading, of course), and very little actionable guidance to start with. Not a good start, as the devil sure is in the detail with this one, as usual.
  • The third is that HTTP Caching and Service Caching are two entirely different beasts (different goals and policies) that are vaguely related but not the same, and should not be conflated. You kinda need both to really improve the overall end-to-end performance of a REST service, but one does not replace the other. It's a 'both' decision, people.
  • And the fourth lesson we learned was how tough it is dealing with freshness between services that aggregate responses from each other.

Furthermore, at the time we started this journey, our service development framework ServiceStack had built-in hooks for us to build both HTTP Caching and Service Caching, thank god, but we still had to do a lot of work to understand how to pull the various pieces together and implement a bunch of components to make an effective end-to-end caching strategy. Thank god for request and response filters!

[BTW: by the time you read this, that will have changed significantly (which is serendipitous for us, and the whole ServiceStack community). However, part of the reason we wanted to write this post is to give further guidance on how to expand your caching strategy with ServiceStack to accommodate concepts that are not present out of the box in ServiceStack's caching strategy.]

The Patterns

Like most things, when investigating caching, it does depend on what you are trying to achieve and why. And understanding the different goals between these two patterns is key. 

On the Service Caching side, there is a fairly well-known pattern: cache most of your GET responses (for longish durations, i.e. minutes), and refresh those cached GET responses on any POST/PUT/DELETE request for the same resource - as outlined (in part) in the ServiceStack caching guidance. Depending on what verbs your specific resource supports, in many cases the API self-manages its 'freshness' using this pattern.

[Note: There is a missing part to what is often described in this pattern: for many GET APIs that return 'search' kinds of results containing multiple resources, you'll also want to refresh the cached GET search results if there is a chance that a POST of a new resource will add to the search results.]

On the HTTP Caching side, it turned out there are only so many general caching requirements, and they can all be met with the current HTTP 1.1 standard. Not that HTTP 1.1 is lacking in any way; it is that the requirements for HTTP caching are pretty well defined and accommodated by the specification - pretty ingenious actually. Given that, it seems like those design patterns should be easily definable in a simple, actionable specification, agnostic of platform or language.

[One day, I see someone taking the bold step of setting up a site for standardised design patterns where they are defined in some generic unit test format that anyone can read and translate into their own language. I digress however.]

If you are not familiar with HTTP caching, you can find more practical guidance in RFC7234 (which supersedes RFC2616 section 13). It is a typically computer-science-y specification with tons of potential ambiguity (unless you have hours to study it), and so it is not surprising a lot of developers don't know exactly how to make it actionable in practice.

There are some great and credible references out there on how (logically) HTTP caching should work, and how Service Caching should be done.

To summarize all this: in HTTP Caching there are basically two concepts to grasp.

  • The first, and most familiar, is HTTP Expiration. That is where the resource server (your web service) declares that a particular resource has a TTL (time to live, or 'freshness') in seconds or as an absolute UTC time - watch out there! Clients that make a request read that information from caching response headers, and those headers describe 'advice' or 'directives' that help the client decide how to cache and expire the response. It's a very familiar concept to most web developers, even if implementing it is not that common, because many web servers take care of this kind of stuff automatically for you for static content. But that is not the case with REST services, where the content is much more dynamic. In general, you will have to do something explicit in code or configuration to make this happen with your responses. And as soon as you do that, you need to understand the second concept in HTTP caching, or you will encounter numerous intermittent red lights in your integration testing of your API, for unpredictable time-based reasons.
  • The second concept is HTTP Validation. That is where the resource server (your web service) is programmed explicitly to validate some metadata about a response that it previously sent to a client, such as an ETag or a Last-Modified timestamp. The client explicitly asks the server to validate either an ETag or a Last-Modified of a response that it has in its cache, by sending a special GET request ('If-None-Match' for ETags, or 'If-Modified-Since' for Last-Modified date stamps). The server performs the validation using information in the headers of the request, against information it has about the original response. The server responds either with a '304 - Not Modified' (plus empty content, and some up-to-date caching headers) if the validation succeeds, or with a 2XX status code along with the new response (plus up-to-date caching headers) if the validation fails. (There is a client-side sketch of this exchange just below.)
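To make that concrete, here is a minimal client-side sketch of HTTP Validation in .NET (the URL and ETag are made up; note the .NET quirk, discussed later, where a 304 surfaces as an exception):

// using System.Net;
var request = (HttpWebRequest)WebRequest.Create("https://api.example.com/cars/123");
request.Headers[HttpRequestHeader.IfNoneMatch] = "\"abc123\"";
try
{
    using (var response = (HttpWebResponse)request.GetResponse())
    {
        // 2XX: validation failed, so a fresh response (with new caching headers) came back
    }
}
catch (WebException ex)
{
    var response = ex.Response as HttpWebResponse;
    if (response != null && response.StatusCode == HttpStatusCode.NotModified)
    {
        // 304: our cached copy is still fresh, keep using it
    }
}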

Before we get into the weeds about how you can implement any of this, I think it is worth noting that HTTP Expiration on its own has very limited utility for REST services without HTTP Validation. Why? Because unlike static images/CSS/HTML, the content of REST service responses may change at any time for anyone, frequently or infrequently, and having fresh versions of those 'representations' is critical to the usability of many REST APIs. The volatility is harder to assess with APIs. Waiting for client caches to time out and expire does not produce a very responsive (as in 'up-to-date') API, and makes for pretty poor user experiences. Expiring the cache too often, again, does not make a responsive API - it demands too many fetches. And assuming that you can just defer this commitment and fine-tune that TTL sometime in the future will cost a lot of time, research and resources.

Service Caching versus HTTP Caching

First off, let's get something clear from the get-go: Service Caching DOES NOT EQUAL HTTP Caching. They have different design goals and different implementations, but they may share similar pieces of the overall architecture. In practice, Service Caching is likely to inform/enable HTTP Caching, but unlikely vice versa.

Service Caching is likely to involve a distributed cache (like Redis/Memcached etc. in a scalable REST service) that needs to remember responses it has already calculated for a caller, for some duration (in secs/mins).

It is a key part of a caching strategy to reduce the workload that your service has to perform in calculating responses, and free up resources to handle more requests. Let your service REST damn it! This will help your service (and data repos) scale better to increased load.

It helps to cache as much as you can for as long as you can, serving responses from memory rather than recalculating them.

But beware! You will still need to authorize the caller on each call, and you still need to be aware of who the caller is, because many REST responses can look different depending on who is calling the same URI. So, let's summarise the main concerns here:

  • Reduce compute load and increase availability, by improving response times (for scalable services using distributed caching)
  • Maintaining accurate Authorization of the caller
  • Dealing with multiple representations of the same resource for different callers (and never mixing them)

HTTP Caching involves using a cache on a client (i.e. the browser cache, or your own cache in your own client) to remember responses (and the caching 'advice') provided by a service, for some duration (secs). The advice informs clients when to expire and when to validate the cached response. You are depending on the client to do the right things, as directed by the service.

It is a key part of your caching strategy to reduce the amount of data that is transmitted over the wire between client and service. The goal is to reduce the network latency of fetching data, to help improve your desktop/mobile UX, and to reduce the load on the service as more people use it concurrently.

Also, when you have more and more users using your service, it makes it a cinch to stand up intermediary caches geographically closer to your users so they can share these responses.

It helps for the client to only ask for fresh responses when the data that it has cached is actually out of date (i.e. there needs to be a mechanism for it to tell when that time is right).

But beware! You still need to serve the right representation to the right caller, and you need to always be serving the most up-to-date representation to that caller. So, let's summarise the main concerns here:

  • Improving network latency times (using a client cache of some kind, and using HTTP validation)

So, it turns out in practice that you can absolutely have Service Caching without HTTP Caching, and you can have HTTP Caching without Service Caching, and both will improve the performance of your REST services dramatically (one or two orders of magnitude perhaps). But working together, they far outperform each other separately.

In an architecture where you have many REST services at different layers, like we do, we like to think that each REST service itself has both a Service Cache at the front of it, and a Client Cache at the back of it, for talking to other services out there in our architecture or on the Internet. So for us, in practice, HTTP Caching is really thought of as client-side caching for the rest of this discussion.

So how did we implement both Service Caching and Client Caching in our architecture using ServiceStack?

Service Caching (with ServiceStack)

We applied the following policy for Service Caching on the front side of each of our ServiceStack REST services:

  • For each GET service operation in a resource, we decide (case by case) if the response should/could be cached or NOT. Very few should not be - images perhaps, and certain authorization functions. It's entirely your call.
  • If we are caching this GET response, we decide on the expiration duration. Which for us is a T-Shirt size duration (i.e. VeryShort, Short, Medium, Long, VeryLong) for how long we should cache the GET response. Many REST resources may change infrequently or frequently depending on exactly what our users are doing, and so predicting when they change is near impossible, so the default we chose was a duration of 'Short'. There are some REST resources that change very rarely (if ever at all), like uploaded images or location data, so they get a 'VeryLong' duration instead. The actual values of these T-Shirt size durations we define in the configuration of our deployed service, so that we can tune the durations over time without recompiling the code. For example, 'Short' for us is currently 5 mins, 'VeryLong' is 12 hours, 'VeryShort' is 60 seconds. Relatively speaking, these are long durations, because the API is managing 'freshness' itself. (The T-Shirt sizes are sketched as an enum after this list.)
  • Then we decide if the GET response (representation) will be unique depending on which user is calling. You won't know this upfront until you have implemented the API itself. But some API's verify the caller, or produce different responses for different users. Some produce the same representation of a resource for all users.
  • For each POST/PUT/DELETE verb in the same resource, we decide what possible side effects that POST/PUT/DELETE would have on each of the cached GET representations. If a cached GET representation is likely to be impacted, we will wipe out all the GET representations that are cached upon the request for the POST/PUT/DELETE. Remember that some POSTs can create new resources that would be included in the cached results of some of the GET verbs (i.e. search results).
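The T-Shirt sizes themselves are just an enum (a sketch; the members are the ones named above, and the commented durations are the current Service Caching values quoted above - the real values live in configuration):

public enum CacheExpirationDuration
{
    VeryShort,  // currently 60 secs
    Short,      // currently 5 mins - our default
    Medium,
    Long,
    VeryLong    // currently 12 hours
}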

The way we implement this caching policy is using declarative attributes on each of the service operations of each of the REST services we have.

We have a simple configuration for the caching T-Shirt sizes. And we have a base class (for all our services) that has a method for fetching a cached response (from an ICacheClient), and for generating new responses by calling a delegate which includes all the code to calculate a new response, which is then cached. We also calculate an ETag for every cached response (an MD5 digest), and we save that ETag alongside our cached response. Our ICacheClient in production is Redis, and we calculate a cache key based off the actual request (URI (PathInfo) + QueryString + the calling user's ID).

Also, the code that returns the cached response runs long after the caller has been identified and authorized.

Here is the code for a typical REST service in our architecture. We like cars:

internal partial class Cars : ServiceBase, ICars
{
    public ICarsManager CarsManager { get; set; }

    // Route: /cars/{Id}
    [CacheResponse(CacheExpirationDuration.Short, CacheRepresentation.PerUser)]
    [HttpClientShouldCacheResponse(CacheExpirationDuration.Short)]
    public object Get(GetCar body)
    {
        return ProcessCachedRequest(body, HttpStatusCode.OK, () =>
        {
            var response = this.CarsManager.GetCar(this.Request, body);

            return response;
        });
    }

    // Route: /cars/search
    [CacheResponse(CacheExpirationDuration.Short, CacheRepresentation.AllUsers)]
    [HttpClientShouldCacheResponse(CacheExpirationDuration.Short)]
    public object Get(SearchCars body)
    {
        return ProcessCachedRequest(body, HttpStatusCode.OK, () =>
        {
            var response = this.CarsManager.SearchCars(this.Request, body);

            return response;
        });
    }

    // Route: /cars
    [CacheResetRelatedResponses("/cars/search")]
    public CreateCarResponse Post(CreateCar body)
    {
        return ProcessRequest(body, HttpStatusCode.Created, () =>
        {
            var response = this.CarsManager.CreateCar(this.Request, body);
            this.SetLocationHeader(GetCreateCarResponseId(response));

            return response;
        });
    }

    // Route: /cars/{Id}
    [CacheResetRelatedResponses("/cars/{Id}")]
    [CacheResetRelatedResponses("/cars/search")]
    [CacheResetRelatedResponses("/cars")]
    public UpdateCarResponse Put(UpdateCar body)
    {
        return ProcessRequest(body, HttpStatusCode.Accepted, () =>
        {
            var response = this.CarsManager.UpdateCar(this.Request, body);

            return response;
        });
    }

    // Route: /cars/{Id}
    [CacheResetRelatedResponses("/cars/{Id}")]
    [CacheResetRelatedResponses("/cars/search")]
    [CacheResetRelatedResponses("/cars")]
    public DeleteCarResponse Delete(DeleteCar body)
    {
        return ProcessRequest(body, HttpStatusCode.Accepted, () =>
        {
            var response = this.CarsManager.DeleteCar(this.Request, body);

            return response;
        });
    }


    ... Other service operations
}

 

A few things to observe here:

  1. Our service derives from our base class 'ServiceBase' which includes the methods: 'ProcessRequest' and 'ProcessCachedRequest'. Which we will talk about soon.
  2. The injected ICarsManager is just our way, in our architecture of delegating the actual response creation to another layer that has the actual code that applies various business rules and constraints and fetches data from the data store, etc. Your implementation would be different.
  3. The GET service operations that will have cached responses return `object` rather than their associated `IReturn<Dto>`. That's because the 'ProcessCachedRequest()' method uses the `ToOptimizedResultUsingCache()` method in ServiceStack, which returns a `CompressedResult`, not the DTO result.
  4. The two GET operations are attributed with `[CacheResponse(CacheExpirationDuration.Short, CacheRepresentation.PerUser)]`, which declares the T-Shirt size expiration and the response representation (i.e. whether the response is specific to the calling user, or is the same for all users). As you can see, the 'GET /cars/{Id}' response will return data that is unique to each user who is calling, but the 'GET /cars/search' response will be the same for whomever is calling. This is a hugely significant part of any caching strategy.
  5. The two GET operations are also attributed with `[HttpClientShouldCacheResponse(CacheExpirationDuration.Short)]`, which is our lead-in to HTTP Caching, which we will talk about later. For now, all you need to know is that these attributes will yield HTTP response caching headers that 'advise' the HTTP client on how to cache, and whether or not this response is cacheable. Without this attribute, no HTTP caching headers are generated for this response, and the default on the internet is not to cache a response, especially if the response is HTTPS. The T-Shirt size durations here have vastly different value ranges than the ones used for Service Caching. For example, 'Short' here is 5 secs, 'VeryLong' is an hour, and 'VeryShort' is 1 sec at present. That is because we want the HTTP clients to be validating on a different frequency than the actual caching of responses on the service.
  6. The POST operation is attributed with a `[CacheResetRelatedResponses("/cars/search")]` attribute. The first argument is a GET route (somewhere else in the API) which, if previously cached, will be wiped out when this POST request arrives. In the case of the 'cars' API, we know that if a new car resource is created in this POST operation, then the 'GET /cars/search' operation response should be re-calculated, since it could now include the newly POST'ed resource.
  7. The PUT and DELETE operations are attributed with a few `[CacheResetRelatedResponses]` attributes, because they either update or delete a specific resource; how many of these attributes are present, and what routes they have, depends on how many GET APIs your resource has and whether an update affects their data. In this case, we only had two cached GET operations. What is interesting here is that the route of the `[CacheResetRelatedResponses("/cars/{Id}")]` attribute includes a substitution '{Id}' that must be made at runtime so that only the specified cached result is wiped from the cache. This attribute in fact uses the current PUT/DELETE request to make that substitution based on its DTO, so that only the current 'car' is wiped from the cache, rather than all 'cars'. This means that your REST API must be designed so that the GET/PUT/POST/DELETE verbs have consistent and harmonious routes (as REST recommends anyway).
  8. Out of interest, we have 50+ of these kinds of services in our architecture (at present), each with on average 5-10 different verbs. You may have noticed that this code looks very regular as it is, since we have a pattern toolkit generate it for us from just a simple visual DSL based on a few properties for each verb. This utterly avoids manual human error when you are building lots of these things, and keeps everything absolutely consistent.
[Image: REST Toolkit, showing the metadata required to generate a ServiceStack service interface that includes service caching.]

So what do these `[CacheResponse]` and `[CacheResetRelatedResponses]` attributes do? And what about that `ProcessCachedRequest()` method? And what other pieces do we need here?

Let's talk attributes and the declarative.

The `[CacheResponse]` attribute is simply a RequestFilterAttribute, that you place on a GET service operation:

[AttributeUsage(AttributeTargets.Method, Inherited = false)]
public class CacheResponseAttribute : RequestFilterAttribute
{
    internal const int FilterPriority = RequireRolesAttribute.FilterPriority + 1;
    private CacheExpirationDuration expiresIn;

    protected CacheResponseAttribute()
        : this(CacheExpirationDuration.Short)
    {
    }

    public CacheResponseAttribute(CacheRepresentation representation = CacheRepresentation.PerRequest)
        : this(DefaultDuration, representation)
    {
    }

    public CacheResponseAttribute(CacheExpirationDuration expiresIn,
        CacheRepresentation representation = CacheRepresentation.PerRequest)
        : base(ApplyTo.Get)
    {
        this.expiresIn = expiresIn;
        Representation = representation;
        Priority = FilterPriority;
    }

    public CacheRepresentation Representation { get; private set; }

    public TimeSpan ExpiresIn
    {
        get { return GetExpiresInFromConfiguration(Configuration, this.expiresIn); }
    }

    public override void Execute(IRequest req, IResponse res, object requestDto)
    {
        req.Items.Set(RequestItemKey, new CacheInfo
        {
            ExpiresIn = ExpiresIn,
            Representation = Representation
        });
    }
}

and all it does is calculate the actual duration (in secs) from the configured `CacheExpirationDuration` using values in our configuration file (omitted). Then it saves that duration and the `CacheRepresentation` in a structure that is put into the `IRequest.Items` collection for use later. That's it.
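That lookup might look something like this (a sketch only; `GetSetting` and the key names are illustrative, not necessarily our actual configuration API):

private static TimeSpan GetExpiresInFromConfiguration(IConfigurationSettings configuration,
    CacheExpirationDuration duration)
{
    // e.g. a setting like "ServiceCache.Durations.Short" -> "00:05:00"
    var configured = configuration.GetSetting("ServiceCache.Durations." + duration);
    return configured.HasValue()
        ? TimeSpan.Parse(configured)
        : TimeSpan.FromMinutes(5); // assumed fallback to the 'Short' default
}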

Note: This attribute must run after any Authorization filters you may have in your service (i.e. Priority - see https://github.com/ServiceStack/ServiceStack/wiki/Order-of-Operations for guidance). Forget that, and you could end up serving cached representations for one user to another user, or worse, to anybody anonymous who calls!!! Definitely a mistake to avoid for caching rookies!

The `[CacheResetRelatedResponses]` attribute is simply another RequestFilterAttribute, that you place on POST/PUT/DELETE service operations (you can have many):

[AttributeUsage(AttributeTargets.Method, Inherited = false, AllowMultiple = true)]
public class CacheResetRelatedResponsesAttribute : RequestFilterAttribute
{
    internal const int FilterPriority = RequireRolesAttribute.FilterPriority + 1;

    protected CacheResetRelatedResponsesAttribute()
        : base(ApplyTo.Put | ApplyTo.Delete | ApplyTo.Post)
    {
        Priority = FilterPriority;
    }

    public CacheResetRelatedResponsesAttribute(string routePattern)
        : this()
    {
        Guard.AgainstNullOrEmpty(() => routePattern, routePattern);

        RoutePattern = routePattern;
    }

    public string RoutePattern { get; set; }

    public override void Execute(IRequest req, IResponse res, object requestDto)
    {
        if (req.Items.ContainsKey(RequestItemKey))
        {
            var existingInfo = (CacheResetInfo)req.Items[RequestItemKey];
            if (!existingInfo.RoutePatterns.Contains(RoutePattern))
            {
                existingInfo.RoutePatterns.Add(RoutePattern);
            }
        }
        else
        {
            req.Items.Add(RequestItemKey, new CacheResetInfo
            {
                RoutePatterns = new List<string> { RoutePattern }
            });
        }
    }
}

It simply takes the declared routes and saves them to the `IRequest.Items` collection for the service operation to intercept later in the request pipeline.

OK, so both these filters run on a service operation, and they both pass declarative information into the current request pipeline. Shortly after they run, the service operation itself is executed and the `ServiceBase.ProcessCachedRequest()` method is run, which reads that information and acts upon it.

Here's what essentially happens (for cached GET responses only):

    public ICurrentCaller Caller { get; set; }

    public IServiceInterfaceCacheClient ServiceInterfaceCache { get; set; }

    protected object ProcessCachedRequest(object request, HttpStatusCode code, Func<object> action)
    {
        var cacheInfo = Request.Items.GetValueOrDefault(CacheResponseAttribute.RequestItemKey) as CacheInfo ??
            CacheResponseAttribute.GetDefaultCacheInfo(Configuration);

        var response = ServiceInterfaceCache.GetCachedResponse(Request, Response, Caller,
            cacheInfo.Representation, cacheInfo.ExpiresIn, () => action());

        // Wipe any related cached responses declared via [CacheResetRelatedResponses]
        var cacheResetInfo = Request.Items.GetValueOrDefault(CacheResetRelatedResponsesAttribute.RequestItemKey) as CacheResetInfo;
        if (cacheResetInfo != null)
        {
            cacheResetInfo.RoutePatterns
                .ForEach(pattern =>
                {
                    ServiceInterfaceCache.RemoveAllRelated(Request, pattern);
                });
        }

        SetResponseCode(code);

        return response;
    }

The `ICurrentCaller` implementation obtains the ID of the calling user. In our case we get that from the OAuth 'access_token' that came in the request. Your implementation may be different.

  • The `IServiceInterfaceCache.GetCachedResponse` method essentially uses the `ToOptimizedResultUsingCache()` method already built into ServiceStack, but also calculates and caches an 'ETag' value for the response, as well as caching the response (if none is already cached). It then generates the following response headers: 'Cache-Control: no-cache, max-age=<duration>', 'Expires: <now+duration>', 'ETag: <etag>'.
  • The `IServiceInterfaceCache.RemoveAllRelated()` method essentially goes through all the configured routes, makes any substitutions from data in the current request DTO, and wipes out any cached values matching them.

Note: it is funny that the 'Cache-Control' header includes the keyword 'no-cache'. Many developers first think that this keyword tells the client "DO NOT CACHE THIS RESPONSE", but it actually means "please do not rely on the value in your cache being fresh; please, please, please re-validate it first before using it".

It is also worth noting that we calculate cache keys in this format: "{IRequest.PathInfo}?{IRequest.QueryString}.{UserId}". We add the User ID on the end of the cache key, obviously so we can cache responses for different users if specified by the GET service operation, but also so that we can use a '*' wildcard when it comes time to wipe out all cached entries for a particular request. In essence, we want to wipe out each and every 'PerUser' cached response for the same URI, so basically we wipe all entries matching this cache key: "{IRequest.PathInfo}?{IRequest.QueryString}*". Redis, for example, has terrible performance penalties for doing searches on cache keys, so we have to have a simple cleanup strategy, using wildcards.
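A sketch of that key construction (the helper name is made up; Fmt and HasValue are ServiceStack string extensions):

private static string MakeCacheKey(IRequest request, string userId)
{
    // e.g. "/cars/search?color=red.user123" for a 'PerUser' representation
    var key = "{0}?{1}".Fmt(request.PathInfo, request.QueryString);
    return userId.HasValue()
        ? "{0}.{1}".Fmt(key, userId)
        : key;
}

Wiping every representation of the same URI then becomes a wildcard removal, e.g. a pattern like "/cars/search?color=red*" on a cache client that supports pattern removal.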

The last piece of the Service Caching puzzle that we need to mention is the piece that enables HTTP Validation to work at all. This is a bridge to HTTP Caching, and strictly speaking it is part of the HTTP Caching policy, but we are covering it here for now.

HTTP Validation requires that the client sends an 'If-None-Match' (or 'If-Modified-Since') header in a GET request when it wants to validate that a cached response it has (in its cache) has not changed yet, to avoid the needless download of a fresh version of the cached response. The theory is that this validation check is faster and lighter than a full request.

So, the client would call one of our GET service operations, and include in the GET request the 'If-None-Match: <ETag>' header, with the 'ETag' that it got back in the response headers of the previous GET.

So, without getting into how the client does this yet (read later in Client Caching), we need a way for the service to quickly and efficiently check the ETag of a cached response it has against the ETag in an 'If-None-Match' presented by a client, and if they match, send back a '304 - Not Modified'.

This is straightforward, because not only does each service operation cache the response, it also caches the ETag and Expiration information along with it. This enables us to create a simple, decoupled GlobalRequestFilter that looks out for the 'If-None-Match' header; when one comes in, it looks in the service cache for the request. If it finds a cached response, and the ETag matches, it responds with '304 - Not Modified'.

If this filter runs before any service operation, we have a very efficient, and decoupled mechanism to do HTTP Validation, that is cheap in terms of compute and network latency.

This GlobalFilter is configured in the AppHost.Configure() of all our services:

    public static Action<IRequest, IResponse, object> HandleHttpExpirationByETagFilter()
    {
        return (request, response, dto) =>
        {
            if (request.Verb != HttpMethods.Get)
            {
                return;
            }

            var eTagFromRequest = request.Headers[HttpHeaders.IfNoneMatch];
            if (eTagFromRequest == null)
            {
                return;
            }

            var configuration = request.TryResolve<IConfigurationSettings>();
            var serviceCache = request.TryResolve<IServiceInterfaceCacheClient>();
            var caller = request.TryResolve<ICurrentCaller>();

            if (!eTagFromRequest.HasValue())
            {
                return;
            }

            if (eTagFromRequest.EqualsIgnoreCase(CachingConstants.PurgeETag))
            {
                PurgeCache(request, serviceCache);
                return;
            }

            var eTagFromCache = GetETagFromCache(request, caller, serviceCache);
            if (eTagFromCache == null)
            {
                return;
            }

            if (!IsETagMatch(eTagFromCache, eTagFromRequest))
            {
                return;
            }

            response.AddCachingExpirationHeaders(GetConfiguredHttpCacheDuration(configuration), eTagFromCache);
            response.StatusCode = (int)HttpStatusCode.NotModified;
            response.EndRequestWithNoContent();
        };
    }

It is a simple filter, but it must run before any other request filter, and it must be quick, to make HTTP Validation worthwhile.

  • Notice that when it sends a '304 - Not Modified' it also sets the following cache headers: 'Cache-Control: no-cache, max-age=<duration>', 'Expires: <now+duration>', 'ETag: <etag>'. Just so the client can update its cache again.
  • It must not return any content with a '304' response.
  • Because this filter runs before all other filters, if the ETags don't match (typically because the server response has been updated since the client last cached it), then the service operation will run shortly after this check and send back the freshest response, which the client can then cache and use.
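For completeness, wiring the filter up in AppHost.Configure() might look something like this (a sketch; Insert(0, ...) is just one way to make it run before the other global request filters):

public override void Configure(Container container)
{
    // ... other configuration ...

    // Must run first, and be quick, for HTTP Validation to be worthwhile
    GlobalRequestFilters.Insert(0, HandleHttpExpirationByETagFilter());
}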

Note: There is an interesting caveat here. It turns out that the MS HTTP clients, including `HttpWebRequest` and `HttpClient`, throw an exception when they see a '304 - Not Modified' in the response stream! So MS clients will never have access to the additional caching information that this filter (and others like it) returns, even though it should be there according to the HTTP 1.1 spec. I guess Microsoft had their reasons for throwing instead of returning a '304', but it is somewhat inconvenient for client caches on the MS platform not to have that updated caching information.

OK, that is it for Service Caching. It amounts to: (1) a set of declarative RequestFilter attributes for each service operation you want to cache or reset, (2) a base class to do the actual caching (using ICacheClient) and set the HTTP Expiration headers, and (3) a global RequestFilter to handle HTTP Validation with 'If-None-Match' and ETags.

The salient points to remember are:

  • Service Caching is designed to avoid re-calculating responses that can be re-used over again until they either expire, or are updated through their own API. In fact, because many resource APIs are self-managing, theoretically you can cache them for very long periods of time (i.e. hours).
  • Your Service Caching strategy provides the mechanisms for HTTP Caching, by providing caching 'advice' in the HTTP Expiration headers from cached responses, and by handling HTTP Validation of cached responses in client caches.

Next, we can move on to client-side caching with ServiceStack, and how we addressed that.

Client Caching (with ServiceStack)

In order to complete the HTTP Caching picture, your clients need to have a cache, and they need to comply with the HTTP caching 'advice' provided to them by your service (in the form of HTTP response headers) per response.

In an architecture like ours, just about every REST service communicates with other REST services, and therefore they all have the opportunity to cache responses from those other services. It's beautiful when it's all working end to end.

At the very front end of our architecture is a browser or mobile app that is also a client, and must also have a cache. Browsers already do the right thing, but our mobile app needs a cache of its own.

We make extensive use of the JsonServiceClient and the other typed clients in ServiceStack, so the job for us became how to add client caching to them.

We applied the following policy for Client Caching for each of our clients:

  • When a response is requested by the client, ask for it from the client cache first.
  • If the response is not cached (i.e. it never was, or it expired) then make a call for a new response from the origin service.
  • When any response comes back (assuming a 2XX status code), cache the response, and cache any caching 'advice' that comes with it. Expect headers like 'Cache-Control' or 'Expires', and 'ETag' or 'Last-Modified'. As long as we get either 'Cache-Control' or 'Expires' (and not 'Cache-Control: no-store'), we remember the 'max-age' or 'expires', and the ETag, along with the response.
  • If a response is found in the cache, AND it has an ETag, then make an HTTP validation call to the origin service, setting the 'If-None-Match' header with the ETag in the GET request.
  • If the 'If-None-Match' validation check [throws] a '304 - Not Modified' (exception), we update the expiry of the originally cached response (and renew the caching advice) and then return the cached response.
  • If the validation check returns a 2XX status, with some data, then we cache that new response, and any caching advice included, and return the newly cached response.

The way we implemented this client cache was by extending our own JsonServiceClient and adding a 'ClientCache' property. If the 'ClientCache' property is set, we use the cache; otherwise we don't.

[BTW: We already implement our own JsonServiceClient because all of our services are protected by OAuth, so we decided to extend the JsonServiceClient to manage the retrieval and renewal of OAuth 'access_tokens' for us. Adding client caching was just another convenience.]

ServiceStack's JsonServiceClient already has the hooks that are needed to implement client caching. They are the `ResultsFilter` and the `ResultsFilterResponse` delegates, which are called just before a request is made, and just after a response is returned, respectively. By hooking these delegates, you get the chance to serve cached responses from a cache, and store responses in your cache, as described here: https://github.com/ServiceStack/ServiceStack/wiki/C%23-client#custom-client-caching-strategy

Using another `ICacheClient` instance under the covers, we implemented an `IServiceClientCache` class that does what is described in the policy above, and set it on the JsonServiceClient instance.
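Usage then looks something like this (a sketch; ServiceClientCache is a made-up name for our IServiceClientCache implementation, and MemoryCacheClient is ServiceStack's in-memory ICacheClient):

var client = new JsonServiceClient("https://api.example.com")
{
    ClientCache = new ServiceClientCache(new MemoryCacheClient())
};

// Served from the client cache when fresh; validated with 'If-None-Match' when not
var car = client.Get(new GetCar { Id = "123" });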

 

This is our extended JsonServiceClient:

public class JsonServiceClient : ServiceStack.JsonServiceClient, IJsonServiceClient
{
    public JsonServiceClient(string baseUrl)
        : base(baseUrl)
    {
        ResultsFilter = FetchResponseFromClientCache;
        ResultsFilterResponse = CacheResponseInClientCache;
    }

    public IServiceClientCache ClientCache { get; set; }

    public TResponse Purge<TResponse>(IReturn<TResponse> requestDto)
    {
        return Purge<TResponse>((object)requestDto);
    }

    public TResponse Purge<TResponse>(object requestDto)
    {
        Action<HttpWebRequest> filter = request =>
        {
            request.Headers.Add(HttpHeaders.IfNoneMatch, CachingConstants.PurgeETag);
        };

        try
        {
            RequestFilters.Add(filter);
            return Get<TResponse>(requestDto);
        }
        finally
        {
            RequestFilters.Remove(filter);
        }
    }


    ... other custom JsonServiceClient stuff

    private void CacheResponseInClientCache(WebResponse webResponse, object response, string httpMethod,
        string requestUri, object request)
    {
        if (ClientCache == null)
        {
            return;
        }

        ClientCache.CacheResponse(webResponse, response, request, httpMethod, requestUri);
    }

    private object FetchResponseFromClientCache(Type responseType, string httpMethod, string requestUri,
        object request)
    {
        if (ClientCache == null)
        {
            return null;
        }

        // Clone the calling client's context (filters, cookies, headers) so a
        // validation call carries the same Authorization/CSRF context
        var validationClient = new JsonServiceClient(BaseUri)
        {
            RequestFilter = RequestFilter,
            RequestFilters = RequestFilters,
            CookieContainer = CookieContainer,
            ResultsFilter = null,
            ResultsFilterResponse = null,
            ClientCache = null,
        };
        Headers.ToDictionary().ForEach((name, value) =>
        {
            validationClient.Headers.Add(name, value);
        });

        return ClientCache.GetCachedResponse(responseType, request, httpMethod, requestUri, validationClient);
    }
}

 

A few things to observe here:

  1. In the constructor we register the delegates `ResultsFilter` and `ResultsFilterResponse`.
  2. The `ClientCache` property is injectable and optional, as can be seen in the delegates themselves.
  3. We will discuss the 'Purge' methods later.
  4. The 'FetchResponseFromClientCache' method (called just before the JsonServiceClient makes a request across the wire) actually creates a new instance of a JsonServiceClient (validationClient), clones the headers, any filters and the CookieContainer from the current instance, then passes that validationClient into the 'IServiceClientCache.GetCachedResponse()' method. This is done because, if the response is found in the client cache and it has an ETag, we need to call across the wire to do our HTTP validation check with 'If-None-Match'. We use the validationClient instance to do that, so it had better have all the context (headers, filters, cookies, etc.) that the calling JsonServiceClient had (for example, any Authorization or CSRF headers included in the request).
  5. The 'IServiceClientCache.GetCachedResponse()' method essentially looks in the cache for a cached response, and if none is found, returns null. If one is found and it has an ETag, it makes the validation call, and if that gets '304 - Not Modified' it returns the cached response. Otherwise, it returns null, which signals to the `ResultsFilter` delegate that the client must go across the wire to get a fresh response. (The shape of this interface is sketched just after this list.)
  6. 'CacheResponseInClientCache()' simply stores the response in the cache, using the request's absolute URI as the cache key.
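
For reference, this is roughly the shape of the `IServiceClientCache` abstraction those delegates call into. This declaration is inferred from the calls above, not copied verbatim from our codebase:

public interface IServiceClientCache
{
    // Returns the cached response (revalidating with its ETag across the wire
    // if necessary), or null to signal the client to make a fresh request
    object GetCachedResponse(Type responseType, object request, string httpMethod,
        string requestUri, IJsonServiceClient validationClient);

    // Stores a 2XX GET response, along with any caching advice in its headers
    void CacheResponse(WebResponse webResponse, object response, object request,
        string httpMethod, string requestUri);
}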

This is what `IServiceClientCache.GetCachedResponse()` looks like:

    public object GetCachedResponse(Type responseType, object request, string httpMethod, string requestUri,
        IJsonServiceClient client)
    {
        if (httpMethod.NotEqualsIgnoreCase(HttpMethods.Get))
        {
            return null;
        }

        var cachedResponse = Storage.Get(requestUri, responseType);
        if (cachedResponse == null)
        {
            return null;
        }

        if (IsWithinRevalidateBuffer(cachedResponse.LastValidated))
        {
            return cachedResponse.Response;
        }

        if (!cachedResponse.ETag.HasValue())
        {
            return cachedResponse.Response;
        }

        try
        {
            client.Headers.Add(HttpHeaders.IfNoneMatch, cachedResponse.ETag);
            client.Get<HttpWebResponse>(requestUri);
        }
        catch (WebServiceException ex)
        {
            if (ex.StatusCode == (int)HttpStatusCode.NotModified)
            {
                //TODO: If .NET did not throw exception for 304, we would update the cached expiry with the returned headers
                Storage.Store(requestUri, cachedResponse.Response, cachedResponse.ExpiresIn, cachedResponse.ETag,
                    GetTimeNow());
            }

            return cachedResponse.Response;
        }
        finally
        {
            client.Headers.Remove(HttpHeaders.IfNoneMatch);
        }

        // TODO: Since we cannot return the typed response (of the HttpWebResponse) to the caller, we instead return null, and delete the cached response
        Storage.Delete(requestUri);

        return null;
    }

 

A few observations:

  1. The 'Storage.Get()' method returns a structure with the response and the expiration and validation 'advice' that was stored with the response, as well as a timestamp of the last time it was re-validated.
  2. The `IsWithinRevalidateBuffer()` check exists because we found in practice that, particularly when aggregating data, our client code was making repeated calls for the same resource very close together in time. This buffer (currently about 3 seconds) prevents us making re-validation calls across the wire so close together for the same resource. It is simply an optimization that, in practice, saves a bunch of 'If-None-Match' checks across the wire in a time-frame where we think the response is unlikely to change. (There is a sketch of this check after this list.)
  3. Unfortunately, the MS .NET clients that JsonServiceClient is built upon throw an exception when they get a '304 - Not Modified'. That's a shame. We have to deal with it, though, in a non-ideal way.
  4. Unfortunately, with ServiceStack, if you want to read the HTTP response headers you have to do a client.Get<HttpWebResponse>(), which forces you to de-serialise the response yourself - a minefield we didn't want to cross, so we avoided it. (It would be nice to have a static method on JsonServiceClient to do this for us.) Instead, we decided to return null, triggering the calling client to make the GET request again to fetch the fresh response. Not ideal, and something we could and probably should improve on, to save one other call across the wire.
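
For illustration, here is a minimal sketch of that re-validation buffer check. The 3-second value comes from the text above; the member names are assumptions (GetTimeNow() is the same helper used in the code above):

    private static readonly TimeSpan RevalidateBuffer = TimeSpan.FromSeconds(3);

    private bool IsWithinRevalidateBuffer(DateTime lastValidated)
    {
        // Skip the 'If-None-Match' round-trip if we re-validated this resource
        // only moments ago; the response is unlikely to have changed yet
        return GetTimeNow() < lastValidated.Add(RevalidateBuffer);
    }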

This is what `IServiceClientCache.CacheResponse()` looks like:

    public void CacheResponse(WebResponse webResponse, object response, object request, string httpMethod,
        string requestUri)
    {
        if (httpMethod.NotEqualsIgnoreCase(HttpMethods.Get))
        {
            return;
        }

        var rules = CachingAdvisor.GetCachingAdvice(webResponse);
        if (rules.ExpiresIn == 0)
        {
            // No caching advice (or 'no-store'): don't cache this response
            return;
        }

        Storage.Store(requestUri, response, rules.ExpiresIn, rules.ETag, DateTime.MinValue);
    }

Nice and simple, just cache the response for next time.

The `CachingAdvisor.GetCachingAdvice()` method simply processes the possible HTTP response caching headers (i.e. Cache-Control, Expires, ETag, etc.); we save that advice along with the response, and deal with it when we are asked for the response from the cache later.
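
To make that concrete, here is a rough sketch of the kind of header processing involved, assuming a simple 'CachingRules' shape (a hypothetical name). Real-world parsing has more edge cases (like the 'Expires' fallback) than shown here:

public class CachingRules
{
    public long ExpiresIn { get; set; }   // in seconds; 0 means "don't cache"
    public string ETag { get; set; }
}

public CachingRules GetCachingAdvice(WebResponse webResponse)
{
    var rules = new CachingRules();

    var cacheControl = webResponse.Headers["Cache-Control"];
    if (cacheControl != null && !cacheControl.Contains("no-store"))
    {
        // e.g. "max-age=3600, must-revalidate" -> 3600
        var match = System.Text.RegularExpressions.Regex.Match(cacheControl, @"max-age=(\d+)");
        if (match.Success)
        {
            rules.ExpiresIn = long.Parse(match.Groups[1].Value);
        }
    }

    rules.ETag = webResponse.Headers["ETag"];
    return rules;
}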

And that's about it for client caching.

Making It Simpler

Now as it turns out, while we were finalising the client caching side of this pattern, and after much conferring on the ServiceStack forums (https://forums.servicestack.net/t/jsonserviceclient-http-client-cache/2115/58) about how to integrate it with ServiceStack, Demis (ServiceStack's creator) was also taking client-side caching in ServiceStack to the next level, in the pre-release version of ServiceStack v4.0.55 - which just happened to also contain a bug-fix that we needed to get our client caching working. You could call that serendipitous too! Good on ya Demis!

So getting some of the benefits you have seen above in ServiceStack is a lot simpler now than it was then. Demis has done a bunch of work to make adding HTTP Expiration and HTTP Validation headers to your services a snap, and has provided a server-side filter to handle HTTP Validation. AND, he has created a client-side cache for the JsonServiceClient that complies with those headers.

That should be enough to get many started with caching in their ServiceStack services and clients for now.

I hope this article provides some further guidance on how to go to the next level with caching in your services, especially if you want to support declarative service caching, 'PerUser' representations, and managing expiration of service caches separately from expiration and validation of client-side caches.

16 Jobs

Gosh! It's been too long - and what I mean is that I haven't seen a green light on our CI build server for months. Let's get technical.

Now green lights are very important to our prod-dev process, to our sanity, and to our ability to adapt fast to our customers. And for the past few months (I am too embarrassed to say how many) we have been getting our fix of green lights only in our personal development environments, and even then only in fits and starts. (That's embarrassing enough given my past roles in this game.)

You see, we currently have over 3000 unit tests and over 1500 integration/e2e tests that are automated across our back-end and front-end architecture. Yes, we code test-first, and yes, we invest a lot of time and effort testing our stuff at various layers of the code base. We have unit tests at every layer, and layer upon layer of integration tests, even in the Angular JavaScript and WebAPI, and we cover all of that with integration tests that call our APIs and UI tests that exercise the whole stack end to end. If you are interested, we also stub out all our 3rd-party services (e.g. googlemaps, paystation, runthered, sendgrid, etc.) in testing, and we even have numerous tests that call those live services to make sure they still respond in the way we expect. (Never assume they don't change - they do, and without warning!)

It goes without saying that automated testing of all these parts comes together for the most important reason - which is to give us the 'confidence' to continually refine the architecture, evolve the features and refactor everything we create - continuously. We practice the boy scout rule daily, and maintainability at our age and stage is paramount for a solid future. Go slow to go fast!

So, what about CI? We love it, we live by it, and we had been depending on it up until about 6 months ago, when our online CI service hit a limitation. We use a CI service called AppVeyor, and we love them. They are a fabulous little outfit in Canada, and the support and service are fabulous. I am not kidding. The guy who supports every project (called Feodor) is an absolute legend. Day or night, the support is amazing. Their service just works, and for our architecture and tech choices it is just perfect. Their service even allows us to auto-deploy our Azure cloud solution after it hits green lights, and it deals with complex things like encrypted config, git submodules, private repos and a bunch of other stuff that is just hard to do right in CI. It is not without its limitations, though, and the one that pulled us up months ago is the limit on build and test time.

You see, although our 3000 unit tests build and run in about 30 secs, our 1500-odd integration and e2e UI tests take about 6hrs on the current hosted hardware!! And the number of those kinds of tests is still increasing every day! (BTW: it used to be 2hrs about 6 months ago!)

Now, I don't know about you, but I don't look forward, at the end of a great tech-debt refactoring episode or a new feature dev, to having to wait for hours to get a green light on my latest changes before I push them. In fact, I have such a strong incentive to ensure I have a green light that I will wait with my head in my hands watching progress bars for about 40mins, after which time - it is just time to move on, man! And moving on is exactly where all this discipline comes undone. I won't explain all the hows here, but let's say, once you reluctantly move on there is a tendency to stack red lights upon red lights, and at some point you've lost a lot of confidence in what you have, and you have gained a lot of dread and overwhelm about what is coming next - and that is the real killer in this game.

Initially, our build/test time limit was 40mins on a free plan, and we started exceeding that limit within the first few weeks of development. We then bought a paid plan, and the great guys over at AppVeyor had the good grace to extend that first to 1 hour, and later to 2hrs, where it stands today. "No can do more!" - and fair enough; they have been very generous to us to this point. That just forced us to optimize our testing patterns, identify some testing bottlenecks, and even make some optimizations in our backend APIs. But shortly after gaining back 30mins or so, we shot past the 2hr limit again, and haven't seen green on our CI build since. We have so many long-running tests that the service simply times out and fails the whole process. And that has left us with only one way to see green lights: running the tests on our more powerful local machines, which execute them all in half the time, for whatever reason. (I suspect 16GB RAM, a 3.5GHz CPU and SSD disks have a small part to play in that.) But that is not CI, and even then it is only the tip of the iceberg of why CI is so powerful for any dev team.

So for the last few months, the CI build has become more and more irrelevant to our daily cadence, and slowly but surely test-rot has been setting in. Now, that is not to say we haven't been dealing to it; we absolutely have, when we find it. The problem has been that to find it you need to find a few hours in the day to run all the tests, and then a few hours after that to fix them. We just don't have that most days, so it has become normal practice to run all the long-running tests overnight (or when you leave the office for a meeting or whatever), and fix the broken tests in the morning, leaving the early to late afternoon for new creation/refactoring. And boy, it took some months of soldiering through that before I finally had enough, had lost my self-worth, and decided it was far past time to pay back this technical debt.

So I spoke to Feodor at AppVeyor. Actually, I pleaded with Feodor at AppVeyor to put me right and help us find a solution. The goal: get the CI server to give us a green light within about 30mins. A green light any longer than 30mins becomes a long feedback loop, and that leads to irrelevancy, and loss of sanity again. (You can rightly argue that 30mins is also too long, but for us it's a great starting point to get back in the saddle.)

Now, prior to this little story, I had been researching all kinds of resolutions to our testing problem, for months. I researched testing grids, parallel testing, even new versions or repurposing of ReSharper and other tools we have or know about. But most of these tools are geared to unit testing, not long-running tests like e2e tests. There was Selenium Grid, which looked like it would take care of parallel testing for our UI tests, until you realise you have to find the tin to support it - and that's the whole value proposition of cloud-based CI right there. (And it's not just having the tin in the cloud; it goes way past that, in having an extensible automated platform and community to support it.) Surprisingly, in this day and age, there is not much out there tailored to our needs, and it seems like not too many people either practice this skill, nor, heck, even understand the difference between unit testing and integration testing, let alone how to scale it up!!

So when I was confronted with the only sustainable option of applying more CPU power to the problem, needless to say, I was reluctant to spend, spend, spend on cloud CPU power, having already pissed away hundreds of dollars a month on it for little value.

So here is the deal. Unless you can make your tests run faster (i.e. optimize the code or testing patterns), or you stop writing so many tests (I love that many devs out there on the public forums actually recommend that as a viable solution!), you have one of two choices: you either apply more CPU speed and RAM to the problem, or you apply more CPUs of the same power to it. The reality is (as Feodor soberly administered to me): "If you have 100 tests and each test takes 1 minute of CPU time then it's either 100 minutes on 1 CPU or 1 minute on 100 CPUs". (You can't pay for that kind of wisdom these days - credit Feodor for waking me up.)

So, we had to do the math. (It helps that my business partner Andrew is an accountant.) We figured every hour of every day that we sit and wait for tests costs us, let's say, $100 (we are a startup). Then, when we discover a broken test, we spend about another hour fixing it. This is an hour or two later, and of course assumes nothing has been done during that wait. If any of those broken tests compound (which is what tends to happen if you do something else while you wait, of course), the fixes compound, and eventually those compounding fixes delay discovering new broken tests. Now, if we are pushing multiple times a day (as we should), we are doing a lot of waiting for red lights, fixing in haste, and scrambling around to verify the fix again, for no good reason. Believe me, it drives me crazy. (Of course, what really drives me crazy is that I know all this implicitly - heck, I've been coaching dev teams for years about the costs and pitfalls of doing this! Forgive my hypocrisy.) So, an extra 'job' (that is what we call an extra CPU running in parallel) costs us, say, $50/month (just for this discussion; the reality is actually cheaper). So, for every hour of every day that we are waiting for tests to give us a red light, we can afford to spend that money on a new parallel job, as long as that extra job reduces the overall time to complete the whole set of tests (which needs to tell you the whole truth, not just part of it).

So, you figure, in our case, 1500 long-running tests take 2 parallel jobs 6hrs to complete. We then need 16 jobs in parallel to get that test run down toward the 30min target: 4 jobs get it to 3hrs, 8 jobs to 1.5hrs, and 16 jobs to 45mins. In our case, with some test-group refactoring, we got it down a little further, closer to the target 30mins. 16 jobs, wikid!
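
If you want to sanity-check that arithmetic, it's a one-liner: the total CPU-time is fixed, so (assuming the tests split evenly across jobs) the wall-clock time is just the total divided by the job count:

    var totalCpuHours = 2 * 6.0;   // 2 parallel jobs x 6hrs = 12 CPU-hours of tests
    for (var jobs = 2; jobs <= 16; jobs *= 2)
    {
        Console.WriteLine("{0,2} jobs -> {1,5:0.0} mins", jobs, totalCpuHours / jobs * 60);
    }
    // Prints: 2 jobs -> 360.0 mins, 4 -> 180.0, 8 -> 90.0, 16 -> 45.0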

Before we were forced down this track, I was reluctant to even pay for 2 parallel jobs! Our monthly subscription bill for various cloud services and tools is already painfully high, and I didn't want to exacerbate that any further. But now I couldn't be happier, because by taking the economic point of view we are not only saving money each day, we are getting back our velocity and sanity - and we need those more than anything to maintain any level of sustainability in this process. We were insane not to do this sooner, and the universe is better aligned, because I deeply believe we should be using the best tools we can afford. It actually feels like a relief and a pleasure to be spending good money on this.

Would you pay $800 a month for your CI service? You probably should. It's worth every penny.

We now get green lights galore, and have a desktop 'siren of shame' (and the 'Shamebrero') to let us know pretty quickly when we've just broken the build, and to wallow in our evil foolish shortcuts.

And yesterday, we went live at www.roamride.co.nz!


Our New Team

We are very excited to announce our new Design, Build and Validation team at Mindkin.

Here they are: Rupert and Sasha have cafe-busted in as our User Researchers, Matt has sketched himself back in as our Interaction Designer, Chelsea has posted herself in as our Social Media Mogul, and Alex has refactored himself in as our Droid Builder.

Mindkin 2015

Over the next few months we will be focusing on gelling the team and skilling up on our product research and development processes and practices. 

We have chosen a bunch of super-talented, highly capable people who, we are very excited, will propel us into a new chapter of the company and its products.

Now, we have a new physical workspace to build, a team to grow, and a product to get to market. Let's go.

Hello World...

Hey! Hello world. We finally got our blog up and running.

This blog is primarily about Mindkin (the people), as we grow into the company we hope to be. You'll have to see the About Us page to know what that is all about. And this blog is where we want to talk about that journey, the things and experiences we face, and to share them with others.

We just secured our first round of funding this week, after our first crowdfunding campaign. We actually didn't reach our initial target of 390K, but we went back to those that did invest in that campaign and asked them to stick with us, and they did! Actually, it turns out they pledged more the second time around. I can't confirm the actual total yet, because the money is not in the bank at this point, but we have a new plan to move forward and get to market, and that just means we will be going to the next round of funding sooner than anticipated.

So the first part of that, the part I am most excited about, is that we get to build our product dev team again! We are hiring!

Now, ideally, we would be hiring the best talent there is around, aiming to fulfil the company's long-term vision with brilliantly skilled and capable people, but the reality in startup-land is that you often cannot afford to pay the best talent what they may expect in cash at an early stage. So we don't think we will be filling this team with highly experienced people who expect to make top dollar in the marketplace for their experience and specifically honed specialities.

However, we absolutely can educate and grow the people we do want from scratch (as we already did the first time around). That's part of the benefit of being a long-time lean product development coach and educator, I guess - although it takes a lot more effort, of course.

Another part of building that team that I want to fulfil this time around, in our team of 5-7, is to have a far greater proportion of female designers and engineers. Why? Because empathy and communication skills trump technical wizardry in product development teams, IMHO. Plus, all-male prod-dev teams (just saying) tend to exhibit rather more loutish behaviours that they sure wouldn't be proud to display in front of their female peers, let alone their mums.

Quite frankly, our users are going to be both female and male, and so delighting them might well require an appropriate balance of genders on the team that is delighting them, I reckon.

Given all of the above, we think we are squarely targeting mostly a team of graduates straight out of Uni, and growing them into the stars we want moving and shaking the company in the future. That is not to exclude anyone more experienced joining us - god, I would love that too! But, for some reason, the more experienced people in this marketplace that we have spoken to so far seem to assume that a startup like us can carry paying them the rockstar salaries they have worked up to in their prior corporate lives. Clearly, that's not an expectation we can live up to at this stage. The startup game is about creating something new and reaping the larger benefits further down the track.

Speaking of which, we've decided to start exploring our vision of a better way to reward and appraise performance in the company (to mitigate our poor experiences in the corporates we have worked in before - not models we want to proliferate ourselves). We figure it is a little too early to enact our final plan around this (which could best be summarised as a peer-commission scheme), because we are not yet large enough (in people) to make it viable - perhaps when we reach 10 or more people. But we have decided, in the interim, to issue stock options for deferred gain as the company grows. We have set aside at least 20% of the company stock in the employee options pool, to issue stock annually to all employees. And the sooner people jump on board, the sooner those options will vest and can be exercised.

So we are all go now, and it's all about finding the right people to join us and get this product to market! We're hiring!