Auto-scaling, a primer

Posted by ozgur in Autoscaling

Servers exist to serve. Either they have actively been requested to work, or they sit idle and wait. Their idle time still costs money, though. A cloud application is said to be over-provisioned if there are too many server instances that sit idle, wasting money. When servers are struggling to keep up with demand that is higher than their combined capacity, the application is under-provisioned. Auto-scaling is the automated process of identifying when an application is either under- or over-provisioned, and trying to rectify the situation by acquiring or releasing machines. Before the cloud, this was a rather lengthy process, taking hours or days to finish. In a cloud environment, however, resources can be easily and quickly provisioned and auto-scaling is therefore one of the main selling points of the cloud.


Why should we care?

Over-provisioned applications waste money. Depending on application size, this can be a little or a lot. One of our customers sees large daily fluctuations in the computational power required: a factor of 8 between the high and low points. Before we helped them, their strategy was to simply size their deployment according to the highest peak (and some extra, just to make sure!) and let the computers idle during low usage. Avoiding waste due to over-provisioning immediately impacts the bottom line: every cent saved on your cloud bill translates directly into increased profits for you.

The costs of under-provisioning are harder to measure, but that just makes them more interesting. If a product promotion goes viral, what is the cost of having new potential customers be greeted by a slow or malfunctioning web site? How many sales are lost if an online store takes too long to navigate? Studies show that slow services have a large impact on reputation and potential earnings. Computer users feel frustrated with slow sites and spend less money at slow retailer sites, and downtime costs are substantial for business-critical services. Google also punishes slow-to-respond web sites in its page ranking algorithms.

How does auto-scaling work?

Essentially, auto-scaling works in two steps. First, it determines whether your current resource availability matches your resource demand. Second, it adjusts the availability accordingly. Say, for example, that your daytime users cause a demand for 5 servers, but you only have 3 available. The auto-scaler should provision two more to meet demand. Likewise, when users leave for the night, the auto-scaler should shrink your deployment to maybe 1 or 2 servers to save money.

To carry out its task, auto-scaling requires monitoring information. That information needs to be up to date and say something useful about the status of your cloud application. A good metric to monitor is one that relates to a single layer in your application and is not capped by the performance of your current deployment. For instance, recording how many users want to access your web site through a load balancer is good. Measuring CPU usage on individual servers, however, is bad: in case of under-provisioning, CPU usage tops out at 100%, so it only indicates that there is a problem, not how many additional resources are in demand.

Simplistic auto-scaling works in a reactive fashion: resource availability is modified in reaction to a threshold value being passed. A typical chain reaction would be to:

  • add X servers,
  • wait until they have booted completely, and
  • investigate if resource demand and availability match, repeating as needed.
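
To make the two steps and the reactive loop concrete, here is a minimal Python sketch. It is an illustration only, not how any particular auto-scaler is implemented: the function names, the request rate, and the per-server capacity are all hypothetical numbers we picked so that the result matches the 5-versus-3 example above.

```python
import math


def servers_needed(requests_per_second: float, capacity_per_server: float) -> int:
    """Translate a load-balancer-level metric into a demand in servers.

    Both arguments are assumed inputs: the request rate would come from
    monitoring, the per-server capacity from load testing.
    """
    if capacity_per_server <= 0:
        raise ValueError("capacity_per_server must be positive")
    # Always keep at least one server running.
    return max(1, math.ceil(requests_per_second / capacity_per_server))


def reactive_scale_up(available: int, demand: int, step: int) -> list[int]:
    """Simulate the reactive loop: add `step` servers, wait, re-check.

    Returns the number of available servers after each round, starting
    with the initial value.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    history = [available]
    while available < demand:
        available += step  # add X servers
        # A real auto-scaler would block here until the servers have booted.
        history.append(available)
    return history


demand = servers_needed(requests_per_second=450, capacity_per_server=100)
print(demand)                           # demand in servers
print(reactive_scale_up(3, demand, 1))  # small steps: several boot cycles
print(reactive_scale_up(3, demand, 3))  # large step: fewer cycles, overshoots
```

With a step of 1, the loop needs two full boot cycles to get from 3 to 5 servers. With a step of 3, it finishes in one cycle but ends up with 6 servers, one more than needed. The choice of X is a trade-off between reacting slowly and over-provisioning.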

This works reasonably well for applications that have low and slow usage fluctuations. To deal with larger and more rapid fluctuations, smarter approaches are needed. And how does the system calculate the correct value of X, anyway? Luckily, that’s why we are here. Stay tuned for future updates, as we dive into the fascinating world of automated cloud management!

Looking ahead

In upcoming posts in this series on auto-scaling, we will go into some depth about both general principles of automated cloud management, and specifics on how the Elastisys Cloud Platform works. We will show how to deploy it to City Cloud and have it manage your cloud applications there, helping your cloud application become more robust, high-performing, and cost-efficient. How are you currently dealing with fluctuations in demand? Let us know in the comments below!


About the authors of this post

This has been a guest post by elastisys, a Swedish startup company that has spun off from the distributed systems research group at Umeå University. Elastisys turns academic research into products and services with the mission of helping companies improve the robustness, performance, and cost-efficiency of their cloud applications. You can find elastisys on various social media and at its home on the web.


Lars is a software architect at elastisys who specializes in research and development of scalable systems. When not directly involved in software development, he gives lectures on advanced topics in distributed systems and cloud application analysis and design. Contact him via email or visit his LinkedIn page.