Meeting Demands with Auto-Scaling

Karthik Viswanathan
Mar 11, 2022

A robust software system must be able to meet its demands under all circumstances. Auto-scaling enables this in a number of sophisticated ways.

"The price of ability does not depend on merit but on supply and demand" - George Bernard Shaw

In this post, we discuss what scaling is and how automatic scaling, or auto-scaling, improves the way a software service is delivered.

The Relationship Between Hardware and Software

Anyone who has studied the basics of computer science will understand that an efficient computer-based solution relies on the efficiency of both hardware and software. A software developer should strive to write code that uses hardware resources such as memory and storage efficiently. For example, using SAX to parse a large XML file is far more memory-efficient than using DOM, because SAX streams through the document as a series of events whereas DOM builds the entire tree in memory. Similarly, sharing immutable objects instead of making defensive copies of mutable ones can reduce memory use. Optimisations such as these should be at the heart of software development.
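To make the SAX versus DOM comparison concrete, here is a minimal sketch using Python's standard library. The sample document, the `order` element name, and the function names are assumptions made for illustration. Both functions return the same answer, but the SAX version only ever holds a counter, while the DOM version builds the whole tree first.

```python
import io
import xml.sax
from xml.dom import minidom

# Hypothetical sample document; a real file would be far larger.
XML_DATA = (
    b"<orders>"
    + b"".join(b'<order id="%d"/>' % i for i in range(1000))
    + b"</orders>"
)


class OrderCounter(xml.sax.ContentHandler):
    """Counts <order> elements as they stream past; nothing else is retained."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        if name == "order":
            self.count += 1


def count_with_sax(data: bytes) -> int:
    # The parser reads the stream and fires events; memory use stays flat.
    handler = OrderCounter()
    xml.sax.parse(io.BytesIO(data), handler)
    return handler.count


def count_with_dom(data: bytes) -> int:
    # The entire tree is built in memory before it can be queried.
    document = minidom.parseString(data)
    return len(document.getElementsByTagName("order"))
```

With a document of a few thousand elements the difference is negligible; it starts to matter when the file is larger than the memory you are willing to spend on it.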

However, the cost of hardware has gone down dramatically over the past decade, with storage and RAM available in abundance even on what is considered a medium-spec computer or laptop. It is tempting, then, to write code without worrying too much about hardware resource utilisation. A good programmer should still assume that resources are scarce, while taking advantage of the resources that are available where doing so genuinely helps. For example, a multi-threaded application should make use of all available processing cores if it is running on dedicated hardware.
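As a rough illustration of sizing work to the available cores, the sketch below asks the operating system how many cores it reports and creates one worker per core. It uses processes rather than threads because, in standard CPython, threads do not normally run CPU-bound code in parallel. The function names and the toy workload are assumptions for the example.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def worker_count() -> int:
    # os.cpu_count() can return None, so fall back to a single worker.
    return os.cpu_count() or 1


def sum_of_squares(limit: int) -> int:
    # Toy CPU-bound task standing in for real work.
    return sum(n * n for n in range(limit))


def run_in_parallel(limits):
    # One worker per reported core; results come back in input order.
    with ProcessPoolExecutor(max_workers=worker_count()) as pool:
        return list(pool.map(sum_of_squares, limits))


if __name__ == "__main__":
    print(run_in_parallel([10, 100, 1000]))
```

On shared hardware you would probably cap the worker count below the number of cores so that other services are not starved.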

What is Scaling?

Scaling is the process of increasing or decreasing hardware resources in order to achieve the desired performance under varying load conditions.

Vertical Scaling

Although writing efficient code improves performance, as explained earlier, a further boost may sometimes be needed, and this can be obtained by increasing hardware resources. For example, increasing the amount of RAM can boost the overall performance of a memory-intensive application, and increasing processing power can boost the performance of a CPU-intensive application. This is referred to as vertical scaling: we increase the resources of the hardware instance that currently hosts the application.

Vertical scaling is usually a manual process, done in response to observed or anticipated increases in load. The relevant hardware resource is scaled up to meet the increased demand and maintain acceptable performance. Vertical scaling also tends to be a long-term commitment: once more storage, RAM, or processing power has been added, it is unlikely to be removed again even if application load falls in the future.

Horizontal Scaling

Scaling can also be done by increasing or decreasing the number of instances that host the application. This works well for stateless applications, such as those exposed via a REST interface, because any instance can serve any request. Instances are added to meet an increase in demand and removed when demand falls.

Horizontal scaling can be more expensive in comparison to vertical scaling as increasing the number of hosts costs a lot more than increasing storage or RAM on the same host. However, horizontal scaling is more flexible as scaling down can be done more easily. Scaling down usually simply involves turning off an instance and performing related tasks such as de-registering it from a load balancer - usually tasks that can be automated. The instance taken away from one service can be re-used for another if required.

Auto-Scaling

Auto-scaling is the automatic scaling of hardware resources in response to certain conditions. It is almost always horizontal, and it is now offered as a standard feature by the major cloud providers such as AWS, Azure, and GCP.

Auto-Scaling can be dynamic, scheduled, or predictive.

Dynamic Auto-Scaling

This kind of auto-scaling works by defining metrics to be monitored, such as number of requests, CPU utilisation, or memory utilisation. Based on these metrics, the auto-scaling service decides whether a host needs to be added to or removed from the group of hosts, and the load balancer that dispatches requests to the application is updated to include or exclude that host. In this way, the capacity of the service is kept in line with the current load.
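The decision described above can be sketched as a simple proportional rule: if the average metric is above its target, grow the group in proportion, and if it is below, shrink it. This is an illustrative assumption, not the exact algorithm of any particular cloud provider, and the function and parameter names are made up for the example.

```python
import math


def desired_instances(current: int, metric: float, target: float,
                      minimum: int, maximum: int) -> int:
    """Return the instance count expected to bring the average metric back to target.

    current -- number of instances running now
    metric  -- observed average, e.g. CPU utilisation in percent
    target  -- the value we want the average to settle at
    """
    if current < 1 or target <= 0:
        raise ValueError("current must be >= 1 and target must be > 0")
    # Total load is roughly current * metric; spread it so each host sits at target.
    proposed = math.ceil(current * metric / target)
    # Never leave the configured bounds of the group.
    return max(minimum, min(maximum, proposed))
```

For instance, four hosts averaging 90% CPU against a 60% target would suggest six hosts, while four hosts averaging 30% would suggest two. Real services also add cooldown periods so that the group does not oscillate on every short-lived spike.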

Scheduled Auto-Scaling

This kind of auto-scaling works by defining set time periods during which your service can scale up or scale down. For example, if you expect increased loads from 9am to 6pm on weekdays, you can schedule your service to scale up to use a higher number of instances during those hours and to scale down outside of those hours.
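A schedule like the one in this example boils down to a lookup from the current time to a capacity. The sketch below assumes a hypothetical capacity of 6 instances during business hours and 2 otherwise; the constants and the function name are illustrative only.

```python
from datetime import datetime

BUSINESS_HOURS_CAPACITY = 6  # hypothetical number of instances from 9am to 6pm
OFF_PEAK_CAPACITY = 2        # hypothetical number of instances at all other times


def scheduled_capacity(moment: datetime) -> int:
    """Return how many instances the schedule asks for at the given local time."""
    is_weekday = moment.weekday() < 5          # Monday is 0, Sunday is 6
    in_business_hours = 9 <= moment.hour < 18  # 9am inclusive, 6pm exclusive
    if is_weekday and in_business_hours:
        return BUSINESS_HOURS_CAPACITY
    return OFF_PEAK_CAPACITY
```

In practice it is common to scale up a little before the busy period starts, since new instances take time to boot and pass health checks.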

Predictive Auto-Scaling

This is a more intelligent form of auto-scaling in which scaling decisions are based on past patterns. Machine learning is used to analyse historical traffic and forecast future load, so that the service can scale up or down in advance of an anticipated increase or decrease.
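The idea can be shown with a deliberately naive stand-in for the machine learning model: forecast the load for a given hour as the average of that hour on previous days, then work out how many instances that load would need. The data layout, the per-instance capacity, and the function names are all assumptions for illustration.

```python
import math
from statistics import mean
from typing import Sequence


def forecast_load(history: Sequence[Sequence[float]], hour: int) -> float:
    """history[d][h] is the observed load (e.g. requests per second) on day d at hour h."""
    return mean(day[hour] for day in history)


def instances_for(load: float, capacity_per_instance: float,
                  minimum: int, maximum: int) -> int:
    """Instances needed to serve the load, kept within the group's bounds."""
    needed = math.ceil(load / capacity_per_instance)
    return max(minimum, min(maximum, needed))


def plan_ahead(history: Sequence[Sequence[float]], hour: int,
               capacity_per_instance: float, minimum: int, maximum: int) -> int:
    # Decide capacity for an upcoming hour before its traffic arrives.
    return instances_for(forecast_load(history, hour),
                         capacity_per_instance, minimum, maximum)
```

A production forecaster would account for trend, seasonality, and one-off events, which a plain average cannot do.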

An Example: AWS

In AWS, a set of EC2 instances can be managed together as what is called an Auto Scaling group. Certain parameters can be configured for the group, such as the minimum and maximum number of instances. If, for example, you have defined a minimum capacity of 2 and a maximum of 6, the number of instances serving your application will vary between 2 and 6.

A scaling policy will need to be defined to instruct the group on the strategies to follow for scaling. If you want dynamic auto-scaling, you can define metrics to monitor. If you want scheduled auto-scaling, you can define the time periods for scaling up and down.
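As a rough sketch of what such policies look like in code, the functions below build the arguments for two calls on the boto3 `autoscaling` client: `put_scaling_policy` for a dynamic (target tracking) policy and `put_scheduled_update_group_action` for a scheduled one. The group name, policy names, CPU target, and schedule are hypothetical, and the group itself is assumed to exist already. boto3 is a third-party library and is imported only when the policies are actually applied.

```python
GROUP_NAME = "web-asg"  # hypothetical Auto Scaling group, assumed to exist


def target_tracking_policy(group_name: str, target_cpu: float) -> dict:
    """Arguments for put_scaling_policy: keep average CPU near target_cpu percent."""
    return {
        "AutoScalingGroupName": group_name,
        "PolicyName": "keep-cpu-near-target",  # hypothetical name
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingConfiguration": {
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ASGAverageCPUUtilization"
            },
            "TargetValue": target_cpu,
        },
    }


def scheduled_action(group_name: str, name: str, cron: str,
                     min_size: int, max_size: int, desired: int) -> dict:
    """Arguments for put_scheduled_update_group_action (cron is in UTC by default)."""
    return {
        "AutoScalingGroupName": group_name,
        "ScheduledActionName": name,
        "Recurrence": cron,
        "MinSize": min_size,
        "MaxSize": max_size,
        "DesiredCapacity": desired,
    }


def apply_policies() -> None:
    # Requires boto3 and valid AWS credentials; not executed on import.
    import boto3

    client = boto3.client("autoscaling")
    client.put_scaling_policy(**target_tracking_policy(GROUP_NAME, 50.0))
    client.put_scheduled_update_group_action(
        **scheduled_action(GROUP_NAME, "weekday-scale-up", "0 9 * * 1-5", 2, 6, 6))
    client.put_scheduled_update_group_action(
        **scheduled_action(GROUP_NAME, "weekday-scale-down", "0 18 * * 1-5", 2, 6, 2))
```

Keeping the argument builders separate from the API calls makes the configuration easy to review and to unit test without touching an AWS account.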

Benefits of Auto-Scaling

There are several benefits of auto-scaling, some of which are:

Improved availability by ensuring there is enough capacity to meet spikes in demand.
Reduced costs as you will be using only the resources that you need at any given time.
Saved time as there is no longer a need to manually monitor metrics and perform continuous capacity planning. That said, there is usually an initial time investment in deciding on the appropriate scaling policy for your application. This may require some trial and error.

When to Use Auto-Scaling

Despite the benefits of auto-scaling, you should only use it when necessary. The major cloud providers generally do not charge for the auto-scaling feature itself, but you pay for every instance it launches and for any detailed monitoring it relies on, and it adds configuration and operational complexity. If your application has stable, predictable demand, then you are usually better off running a fixed number of instances, ideally on reserved pricing, as this will keep your costs down. Auto-scaling is better suited to applications with a high variance in load, such as an e-commerce website that might get spikes in traffic during sales events.
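Whether auto-scaling saves money depends largely on how spiky the load is. The arithmetic can be sketched by comparing instance-hours for a fixed fleet sized for the peak against a fleet that follows demand. The price of 10 cents per instance-hour and the one-week load profiles are made-up numbers for illustration, not real pricing.

```python
PRICE_CENTS_PER_INSTANCE_HOUR = 10  # hypothetical price, in cents to avoid rounding


def fixed_cost_cents(instances: int, hours: int) -> int:
    """Cost of running a constant number of instances for the whole period."""
    return instances * hours * PRICE_CENTS_PER_INSTANCE_HOUR


def scaled_cost_cents(instances_per_hour) -> int:
    """Cost when the instance count follows demand hour by hour."""
    return sum(instances_per_hour) * PRICE_CENTS_PER_INSTANCE_HOUR


# One week (168 hours) with an 8-hour sales event needing 6 instances, 2 otherwise.
spiky_week = [6] * 8 + [2] * 160
# One week where the load needs 6 instances around the clock.
steady_week = [6] * 168

if __name__ == "__main__":
    print(fixed_cost_cents(6, 168))        # fixed fleet sized for the peak
    print(scaled_cost_cents(spiky_week))   # auto-scaled, spiky load
    print(scaled_cost_cents(steady_week))  # auto-scaled, steady load
```

With these numbers the spiky week costs 3,680 cents when scaled against 10,080 cents for a fixed fleet of six, whereas the steady week costs the same either way, which is where a cheaper reserved rate for the fixed fleet would come out ahead.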