29 Apr 2022, by Slade Baylis
When it comes to IT infrastructure there is one near-universal truth - it almost always grows in size and scale, needing more and more power and resources as time moves on. Just as Moore’s law states that the number of transistors on a microchip doubles every two years, it can feel like your systems grow in size at the same rate. If only this were just as certain with business in general!
Along with this growth comes the requirement for capacity planning - which is the process of estimating how much traffic your systems will receive and thus determining the resources that they’ll require. However, spikes and surges in usage or traffic to business-critical applications can require infrastructure to grow and shrink with very short notice.
With these more unexpected or unplanned changes in usage comes the need to increase the power or resources of your systems on a more moment-to-moment basis - and so this is where the scalability of your infrastructure comes into play.
When it comes to hosting and IT infrastructure growth, there are two main aspects that need to be taken into account. The first is the average increase in resource requirements over time, which tends to correlate closely with the growth of your customer base. The second is the shorter-term unpredictable variations in resource requirements that can either happen randomly or more often be generated by marketing campaigns and publicity.
For long-term planning, most businesses will likely be familiar with the process of “capacity management” - even though they might not be familiar with that specific term. Put simply, capacity management is the process of estimating how much traffic, utilisation, or “load” your systems will be under, and then using this information to set how many resources should be allocated to those systems.
For example, even if you’ve only got a simple web hosting account, estimating how much traffic you expect to receive and then choosing a hosting plan with enough CPU and RAM is one example of capacity management. Another example would be taking a look at the change in the size of your website over time as you upload photos and files into it, and then upgrading to a larger plan in order to cope with this increase.
For larger or more complicated systems, the process is still much the same. Businesses analyse data about how their systems are being utilised to make predictions about what resources (aka “capacity”) their systems will require in the future, and then adjust their systems accordingly. Traditionally, IT departments would plan their resource requirements 12, 24, or even 36 months in advance - in fact, that was the only way it could be done! In each of these cases, data is used to form long-term plans about the size and type of infrastructure to utilise.
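To make this concrete, here’s a minimal sketch of the sort of number-crunching behind capacity management - fitting a trend line to historical usage and projecting when the current allocation will run out of headroom. All of the figures below are invented purely for illustration, and the snippet assumes Python 3.10 or newer:

```python
# Toy capacity projection: fit a linear trend to historical peak RAM
# usage and estimate how long until the current allocation is exhausted.
# All figures are invented for illustration.
from statistics import linear_regression

months = [1, 2, 3, 4, 5, 6]                       # past six months
peak_ram_gb = [9.2, 9.8, 10.1, 10.9, 11.4, 12.0]  # hypothetical peak usage
allocated_gb = 16                                 # what the server has today

slope, intercept = linear_regression(months, peak_ram_gb)  # GB per month

# When does the trend line cross the current allocation?
months_until_full = (allocated_gb - intercept) / slope - months[-1]
print(f"Growing ~{slope:.2f} GB/month; "
      f"roughly {months_until_full:.1f} months of headroom left")
```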
However, for any web service or website, not having enough processing power will not only make your application run slower than it should - it could also stop your server in its tracks! This is why, as well as capacity planning, organisations need to consider what options are available for quickly scaling their systems up and down. Now that cloud services are available, you don’t need to be quite so meticulous with planning and can be much more responsive to changes in demand. Being able to scale your infrastructure in response to moment-to-moment changes in demand can protect your systems against downtime due to resource scarcity.
That ability to adjust a system quickly in response to changes in demand is what’s known as the “scalability” of that system.
The choice of hosting platform can limit your options for scalability, as well as affect things like how long it takes your systems to scale, or the overall resiliency of the system as a whole.
For example, if you choose to host applications on your own hardware/physical servers, scaling such a system to changes in demand will entail adding extra CPUs if there is room, or adding additional RAM or hard drives. If you don’t have spare parts on hand, this will require reaching out to suppliers to source parts and then waiting for their delivery before they can even be installed – quite a delay if you need the extra horsepower quickly!
On the other hand, virtualised solutions benefit from not needing to consider the hardware that they run on. Whichever cloud provider you host with will instead manage the infrastructure and make sure there is always capacity available for customers that need it. If you require additional CPU, RAM, or disk space, all you need to do is reach out to your provider and ask them to add it. As another advantage, if your cloud provider offers a High Availability service (such as our VMware Cloud Server services), that virtual server will be protected from downtime through redundancy and automatic failover.
However, using a virtual server isn’t without its limitations. Whilst it’s the simplest and easiest approach, a virtual server still has limits on the resources it can be allocated and the amount of traffic it can receive. This is why alternate solutions for distributing load between multiple servers were developed. Rather than having a single highly powerful centralised server set up to receive all the required traffic, many servers are run in parallel and the traffic is shared between them.
These two approaches to scaling a system are known as Vertical Scaling and Horizontal Scaling.
The simplest way of scaling existing systems is to throw more power at them. Usually, this would be through adding more CPU cores or RAM to the servers that they run on, although it can also mean moving them to faster hardware without increasing the total amount of resources assigned to them. For a long time, this was the standard approach to scaling systems.
This type of approach to scaling is referred to as Vertical Scaling – a common analogy is that of adding additional floors to an existing building to gain more space, thus allowing you to expand without requiring additional land.
An alternative way of scaling systems, however, is to look at scaling horizontally. Instead of taking an existing server and making it more powerful, systems are set up to utilise smaller purpose-built servers configured to run side-by-side with one another. These smaller servers are built to handle a certain amount of traffic, and load balancers are placed in front of them to pass requests to servers that are ready and able to handle them.
This type of approach to scaling is referred to as Horizontal Scaling – just like before, a common analogy would be that of adding more buildings onto adjacent land, rather than adding more floors to an existing building.
Due to its simplicity, increasing the resources of existing systems is often attractive to SMBs and smaller companies that are looking to expand. It also offers more stable infrastructure costs - as the number of servers won’t change without human intervention - making the job of your finance team easier. This predictability is a big selling point for smaller companies, which often have more limited budgets for their IT infrastructure.
In addition to this, having fewer servers overall means that your systems will be easier to manage, maintain, and protect. This is because, as a general rule, the larger your infrastructure is, the more complicated, difficult, and often expensive it becomes to defend against security threats.
One drawback to this approach is that adding additional resources to an existing server requires manual action. This could entail contacting your cloud provider to request those resources, or increasing them yourself if you have the required access - but either way, someone has to notice that extra resources are needed in the first place and then manually add them. Not only that, but often this can require that servers be restarted, meaning that there will be a short period of downtime during that reboot - though this isn’t required with some systems, such as our VMware Cloud Server environments.
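As a rough illustration, the kind of check that might prompt that action could look like the sketch below. The thresholds are arbitrary, and it relies on the third-party psutil library (installed with `pip install psutil`):

```python
# A rough sketch of a utilisation check that could prompt a
# vertical-scaling request. Thresholds here are arbitrary.
import psutil

CPU_THRESHOLD = 80.0  # percent
RAM_THRESHOLD = 85.0  # percent

cpu = psutil.cpu_percent(interval=5)   # average CPU over 5 seconds
ram = psutil.virtual_memory().percent  # current RAM usage

if cpu > CPU_THRESHOLD or ram > RAM_THRESHOLD:
    # In practice this might email your team or raise a ticket with your
    # provider - the point is that a human still has to act on it.
    print(f"Resource warning: CPU {cpu:.0f}%, RAM {ram:.0f}% - "
          "consider requesting more capacity")
```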
These limitations mean that it can be very difficult - and sometimes impossible – to set up automation around vertical scaling. This limits your ability to set up your environment to automatically scale up and down depending on the demands placed on your systems.
Finally, there are always physical limitations on how large a single server can be - this means that every organisation that grows to a certain scale will eventually need to consider how it can scale horizontally, rather than taking a purely vertical approach.
With vertical scaling having a limit on the size it can grow to – and by extension the amount of traffic it can receive – other solutions had to be designed to work around these limitations.
In order to solve this issue, the solution that was developed was to instead split traffic between many smaller servers by using “Load Balancers”. These load balancers receive incoming traffic and delegate it to one of the many smaller servers set up behind them - even detecting which servers are healthy, so that traffic is only passed to one that is ready to handle the request.
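As a simplified sketch of that logic, the example below cycles through a pool of hypothetical backend addresses and skips any server that won’t accept a connection. Real load balancers (such as HAProxy or nginx) do this far more robustly, but the core idea is the same (Python 3.10+):

```python
# A minimal sketch of round-robin load balancing with a naive health
# check. Backend addresses are hypothetical.
from itertools import cycle
import socket

BACKENDS = ["10.0.0.11", "10.0.0.12", "10.0.0.13"]
rotation = cycle(BACKENDS)

def is_healthy(host: str, port: int = 80, timeout: float = 0.5) -> bool:
    """Treat a backend as healthy if it accepts a TCP connection."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def pick_backend() -> str | None:
    """Return the next backend that is ready to handle a request."""
    for _ in range(len(BACKENDS)):
        candidate = next(rotation)
        if is_healthy(candidate):
            return candidate
    return None  # no healthy servers - requests would queue or fail over
```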
With servers pooled behind a load balancer like this, if the system is expected to deal with higher levels of traffic, extra servers can simply be added to the pool. In fact, this can even be automated, allowing your systems to automatically increase or decrease the number of servers in response to real-time data about system utilisation.
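The decision rule behind that kind of automation can be surprisingly simple. The sketch below is purely illustrative - the thresholds, pool limits, and the source of the CPU metric are all placeholders:

```python
# A simplified autoscaling decision rule: scale out when the pool runs
# hot, scale back in when it idles. All numbers are placeholders.
MIN_SERVERS, MAX_SERVERS = 2, 10
SCALE_OUT_ABOVE = 75.0  # average CPU % across the pool
SCALE_IN_BELOW = 25.0

def desired_count(current: int, avg_cpu: float) -> int:
    if avg_cpu > SCALE_OUT_ABOVE and current < MAX_SERVERS:
        return current + 1  # add a server behind the load balancer
    if avg_cpu < SCALE_IN_BELOW and current > MIN_SERVERS:
        return current - 1  # remove one to save cost
    return current

# e.g. a marketing campaign pushes average CPU to 82% across 3 servers:
print(desired_count(current=3, avg_cpu=82.0))  # -> 4
```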
Due to this flexibility and ability to automate, this is often the only approach that larger organisations such as Facebook, Twitter, or Google can take with their systems - purely due to the amount of traffic that they work with. As you can imagine, no single server anywhere in the world would be enough to handle the number of people that use their systems!
Another benefit of this type of infrastructure design is reliability - because the system utilises many different servers running side-by-side, it avoids single points of failure.
However, just like with any system, each approach has its drawbacks. With any horizontally scalable system, the initial costs of implementation tend to be higher due to the increased complexity and the number of systems involved. In addition to the higher upfront expenses, purely due to utilising a larger number of servers, the cost of managing, maintaining, and protecting those servers will also be higher than for a single-server solution.
In general, there exists a tipping point where this type of approach becomes more cost-effective than a single server solution. Where that tipping point resides would depend on the amount of traffic you receive and how resource-intensive you expect each user interaction to be.
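As a purely hypothetical illustration of how that tipping point arises: the price of a single server tends to grow faster than linearly as it gets bigger, while a fleet of smaller servers grows roughly linearly but carries fixed overhead for the load balancer and extra management. With some made-up prices, the crossover is easy to find:

```python
# An entirely hypothetical cost comparison - every number is made up.
def vertical_cost(capacity_units: int) -> float:
    # a single big server: price grows faster than linearly with size
    return 50 * capacity_units ** 1.5

def horizontal_cost(capacity_units: int) -> float:
    # fixed overhead (load balancer, management) plus linear server costs
    return 400 + 80 * capacity_units

for units in range(1, 20):
    if horizontal_cost(units) < vertical_cost(units):
        print(f"With these prices, horizontal wins from ~{units} units")
        break
```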
As is normally the case, which approach is best will depend on your situation - what your unique business requirements are, and how you plan to use your website and market to your customers. These are key factors that will need to be considered.
If you’re expecting traffic in the order of several thousand visits a month and you’re only hosting a fairly static, non-interactive website, then a horizontally scaled system is likely to be overkill. On the other hand, if you’re expecting several hundred thousand visits, or your application will be fairly interactive, then it could be time to look into load-balanced options instead.
We recommend discussing your specific system requirements with your website or application developers (or, if you’re using proprietary software, with its developers). They will be familiar with the resources your application usually requires and will be able to recommend how best to configure the hosting environment for your needs.
Want us to take a look at your infrastructure and find out which approach would be the most appropriate for you? If so, give us a call on 1300 769 972 (Option #1) or email us at sales@micron21.com and we’d be more than happy to help.