Scale Your CI: by the minute!
Building and testing high-quality software is expensive. But why wait for testing?
Hourfleet is our multi-tenanted SaaS car-sharing platform. It is built from about 10 web services and web apps, several worker processes, and the usual collection of shared libraries. At last count, we had 5500+ unit tests, 1800+ integration tests, and 800+ UI tests for a moderately sized, cloud-deployed code base (in 3 GitHub repos) after 3 years.
As expected, building takes a couple of minutes and the unit tests complete in a few seconds, but the integration and Selenium UI tests take more than 10hrs to complete!
Test-first design and regression testing are fundamental to how we sustain the building of high-quality software products, and tests are our early warning system for the regression issues that become common as the complexity of a system increases.
In the world of rapidly changing customer-driven software development, waiting 10hrs to know if your latest changes broke some part of the system somewhere, before you deploy, is a debilitating waste of time (and money) for a software product team. Who needs that kind of grief and waste in their lives? It also violates the 'release early and often' principle, and it is untenable as a feedback cycle. No engineering team is going to wait that long before starting another fix, feature or refactoring.
Our CI provider is AppVeyor. We have been with them for years, and I can't recommend them highly enough. We love them. The service is excellent, the tooling is awesome, but more than that, the guys behind it are fantastic to deal with. The deal with AppVeyor, and traditionally with other CI providers like them, is that you pay monthly, and largely you pay for compute-time (concurrent jobs) and storage, over and above the tooling features they give you. It has been like that for most CI providers because their real ongoing costs as a provider are support, development, and the cloud compute and storage they utilize and manage on your behalf. Fair enough, I reckon. It is totally worth it to utilize online services like this rather than rolling your own infrastructure - get with it, people, if you are not already.
OK, so for us, with our platform: if we were paying for just a single build 'job', it would take that job just over 10hrs to complete our build and testing - every time we push a change! 10hrs of raw compute time to get a green light to deploy. Not much we can do about that (except perhaps stop testing). What is worse is that the feedback loop only gets longer as the code base grows and we add more tests. We don't expect that amount of compute time to decrease, ever. We can, however, reduce the total "elapsed time" it takes to run all the tests by using test parallelization. For test parallelization you need concurrent testing jobs, and as you can see from the pricing (presently USD$50 per concurrent job), that cost is limited only by your budget! A few blog posts back, I was boasting that we were running 16 parallel jobs at a time, which we were very proud of back then. With 16 jobs, we were getting done in just under an hour. That was great then, but now we are lucky to get done in 2-3hrs with 16 jobs!
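To make the parallelization idea concrete, here is a minimal sketch of one common way to spread test categories across concurrent jobs: hand the longest-running category to whichever job is currently the least loaded. This is an illustration only, not how AppVeyor or our own pipeline assigns tests, and the category names and durations below are made up for the example.

```python
import heapq

def assign_to_jobs(durations, job_count):
    """Greedy longest-first assignment of test categories to parallel jobs.

    durations: dict of category name -> expected minutes (hypothetical values).
    Returns (jobs, elapsed): the categories given to each job, and the
    elapsed minutes of the slowest job, which is what the team waits for.
    """
    # Heap of (minutes assigned so far, job index); the least-loaded job is on top.
    heap = [(0.0, j) for j in range(job_count)]
    heapq.heapify(heap)
    jobs = [[] for _ in range(job_count)]
    longest_first = sorted(durations.items(), key=lambda kv: kv[1], reverse=True)
    for name, minutes in longest_first:
        load, j = heapq.heappop(heap)
        jobs[j].append(name)
        heapq.heappush(heap, (load + minutes, j))
    elapsed = max(load for load, _ in heap)
    return jobs, elapsed

# Hypothetical categories, 120 minutes of compute in total.
example = {
    "ui-booking": 40,
    "ui-admin": 30,
    "integration-api": 25,
    "integration-billing": 20,
    "unit": 5,
}

for n in (1, 2, 5):
    _, elapsed = assign_to_jobs(example, n)
    print(f"{n} job(s): {elapsed:.0f} minutes elapsed")
```

Note that adding jobs stops helping once the single slowest category sets the elapsed time, which is one reason why refining the test categories matters as much as buying more jobs.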
To give you an idea: for us, with the 9000+ tests we have now, we need about 80 concurrent jobs to complete an entire test run in under 30mins, and about 150 jobs to get it down to less than 15mins! Oh, and don't do the math on that, please!
We know exactly how many jobs it takes for our code base from months of refining the test categories, learning the profiles of our tests, and maximizing the throughput of tests in each category, as well as optimizing the tooling and how we test. The bottom line, however, is that we can't escape the compute-time it takes to actually run all the tests - we have 10+hrs of it! And run the tests we must, before we can confidently deploy.
OK, so how the hell are we going to keep going and scale our testing?
We either tolerate waiting hours longer as more tests get written, or stop writing tests (I know you are thinking about it), or throw an inordinate amount of money at our CI provider to parallelize the piss out of it. The cost of that spend might be hidden from you at your large enterprise, but for us as a small startup it is not, and that kind of cost is out of the question. We have to 'penny pinch' to stay in business until we scale to enterprise margins! Paying USD$7500 a month to run 150 jobs is all well and good for a fast feedback loop if you can afford it. But what if you can't?
If you really think about it, and take the economic point of view, you are paying all that money whether you run one build a month or 1000 builds a month. It does not reflect your actual usage very accurately, especially if you are cost-sensitive like most small businesses are. What we really want is not to pay by the month, but to pay by the minute used, and to dial that spend up and down, on demand, as much as we can afford at any one time. Given all the heroics and triumphs we hear about elastic cloud computing, surely we can have a bit of that for building and testing our own software?
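If you do want to do the math on the fixed monthly plan, here is a rough sketch. The only figure taken from above is the USD$50 per concurrent job per month; the function names and the builds-per-month values are just for illustration.

```python
PRICE_PER_JOB_USD = 50  # per concurrent job, per month (the price quoted above)

def monthly_plan_cost(concurrent_jobs, price_per_job=PRICE_PER_JOB_USD):
    """Fixed monthly cost of a plan with the given number of concurrent jobs."""
    return concurrent_jobs * price_per_job

def effective_cost_per_build(concurrent_jobs, builds_per_month):
    """What each build effectively costs when the monthly bill is fixed."""
    return monthly_plan_cost(concurrent_jobs) / builds_per_month

for jobs in (16, 80, 150):
    print(f"{jobs} jobs: USD${monthly_plan_cost(jobs)} per month")

# The bill is the same at 1 build or 1000 builds; only the per-build cost moves.
for builds in (1, 100, 1000):
    per_build = effective_cost_per_build(150, builds)
    print(f"{builds} builds/month on 150 jobs: USD${per_build:.2f} per build")
```

The monthly figure never changes with usage, which is exactly the problem for a small team whose build count varies from day to day.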
Our product team builds on average, let's say, about 5-10 times a day, some days more, some days less. We may also deploy that green light anywhere from zero to ten times a day. One thing is for sure: we need the CI service at our beck and call all day, every day, but not necessarily all night and all weekend. A small startup of our size could never afford such a gigantic fixed cost every month, but we still desperately need and deserve a fast feedback loop of about 15-20mins tops! And we certainly need our tests forever more!
Well until recently, you couldn't do much about optimizing the cost of your CI provider since that is likely managed, cost optimized and controlled by your CI provider. It's how they stay in business to be fair. CI services at scale is their game.
So what can we do about it? We are just a teeny tiny software company?
You can, of course, roll your own online or on-premise CI solution (open source even), manage your own infrastructure, and create your own tooling and build pipeline, if you want. Sounds like fun, right? Personally, I have better things to do than become a full-time CI automation administrator, and my customers and team need me to focus elsewhere on creating more value. That ain't gonna work for us.
That was, until AppVeyor released AppVeyor Enterprise (AVE)! With AVE you can finally pay by the minute, and only for what you use! You can scale parallelism as fast and as high as you like, paying only for your actual compute usage, as well as getting all the benefits that AppVeyor has in your build and deploy pipeline! And the best part is that you can also control when your service is available, and what it costs you. Bonus: you get the service at your own custom URL too! Ours is, of course, https://ci.mindkin.co.nz !! You can utilize compute power on Azure, Amazon or Google, and really start to optimize your actual resource costs.
What does this mean for us now? We now host our own CI server on a single VM in Azure, which is cool because we can bring it up and down as we please! We have a dedicated build image optimized for our own build and testing environment. Most importantly, we can run as many parallel jobs as we can afford, since now we are only paying for actual compute used, rather than a month's worth of idle time! On the flip side, we do need to pay for and manage our own CI cloud (management/storage/network etc.), and manage our own build images (i.e. OS and SDK updates). For us, that is not much overhead over and above what we already do with Azure in hosting our platform and managing our tooling environments.
The outcome? For us, at the moment, we are running 100 parallel jobs, with the capacity to run even more whenever we choose. (You'll need to talk to your cloud provider about increasing your quotas!) With AVE we get a build done in about 30mins right now, and with further test category optimization and a couple more parallel jobs we expect to be down to 15mins! And guess what? It costs us right now under USD$2.17/NZD$3.24 per build to do it! (See for yourself with a simple pricing calculation for a D2_v3 or D2_v2 Promo VM for 10hrs of compute.)
Let that sink in for a bit. Every build (10hrs+ of compute) costs us theoretically about USD$2.50/NZD$3.50 (in a data center in Southeast Asia!). Conservatively, if we did 7 builds a day, every day this month, that's about USD$525/NZD$750 for the month, plus the fixed cost of a CI server VM (D2_v3 VM) at USD$145/NZD$240/month for us. (Your costs, with your own choice of CI build cloud machines, will certainly vary.)
For us it's a no-brainer! We get the full power of the mighty scalable AppVeyor build pipeline, under our own control, for about USD$670/NZD$1000/month.
(Actually, we think it will be more like under USD$500/NZD$800/month in practice, given there will be variation in the number of builds per day, and some down time of the CI server most nights.)
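For anyone who wants to plug in their own numbers, the monthly estimate above can be sketched as below. The USD$2.50 per build and USD$145 for the CI server VM are our figures from above; the 30-day month is an assumption implied by the USD$525 figure, and the function name is only for illustration.

```python
def monthly_usage_cost(builds_per_day, days, cost_per_build, server_cost):
    """Pay-per-use estimate: per-build compute plus the fixed CI server VM."""
    return builds_per_day * days * cost_per_build + server_cost

COST_PER_BUILD_USD = 2.50  # conservative per-build figure quoted above
CI_SERVER_USD = 145        # D2_v3 CI server VM per month, as quoted above
DAYS = 30                  # assumption: a 30-day month, building every day

# The conservative case above: 7 builds a day, every day.
print(monthly_usage_cost(7, DAYS, COST_PER_BUILD_USD, CI_SERVER_USD))

# Our typical range of 5-10 builds a day.
for builds in (5, 10):
    total = monthly_usage_cost(builds, DAYS, COST_PER_BUILD_USD, CI_SERVER_USD)
    print(f"{builds} builds/day: USD${total:.0f} per month")
```

Unlike the fixed plan, this number goes down on quiet days, and it goes down further if the CI server VM is shut down overnight (which this sketch does not model).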
Most importantly, we get our testing feedback loop down to 15-20mins, and we can stay in the business of building high quality software.
Job thoroughly done! AppVeyor Enterprise.
To the team at AppVeyor, you rock!