After a client asked me to sanity-check their Azure capacity plan ahead of a product launch, I remembered that I had written a half-page note on Azure scalability vs elasticity for an internal handover last year. Pulled it up, realized it was thin, and figured I would expand it into something useful. The two terms get mashed together in every kickoff meeting I sit in on, and that confusion shows up later as either a surprise bill or a slow site.
So here is the longer version, written down properly this time.
The terms, kept short
Scalability is a system's capacity to grow. Elasticity is the behavior of growing and shrinking on its own in response to load. Scalability is the property. Elasticity is the automation around it.
You can have a scalable system that is not elastic. You cannot really have an elastic system that is not scalable.
That is the whole distinction. Everything else is implementation detail.
Vertical vs horizontal, in Azure terms
Vertical scaling means making one instance bigger. Scale up adds CPU, RAM, or I/O. Scale down removes it. In Azure that looks like resizing a VM SKU from a D4s_v5 to a D8s_v5, or moving an App Service Plan from S1 to P2v3.
Horizontal scaling means adding more instances. Scale out adds nodes. Scale in removes them. The Azure-native primitives for this are VM Scale Sets, App Service Plan instance counts, and AKS node pools or pod replicas.
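For reference, here is roughly what each move looks like through the Az module. The resource group and resource names are placeholders, and the SKU size and instance count are just examples.

```powershell
# Scale up: resize a VM from a D4s_v5 to a D8s_v5. Expect a reboot.
$vm = Get-AzVM -ResourceGroupName "rg-web" -Name "vm-app-01"
$vm.HardwareProfile.VmSize = "Standard_D8s_v5"
Update-AzVM -ResourceGroupName "rg-web" -VM $vm

# Scale out: take an App Service Plan from three instances to five. No restart.
Set-AzAppServicePlan -ResourceGroupName "rg-web" -Name "asp-frontend" -NumberofWorkers 5
```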
Two practical notes from running this in production:
Scale up almost always requires a restart. Azure does the work for you on App Service, but the worker process recycles. On a VM you eat a reboot. Plan the maintenance window even if the portal makes the change feel instant.
Scale out does not. New instances come online while existing ones keep serving traffic. This is why every serious Azure workload I touch ends up horizontally scaled, even if vertical scaling would technically work.
Where elasticity actually lives
Elasticity is the autoscale rules sitting on top of those primitives. In Azure it shows up as:
- Autoscale on VM Scale Sets, driven by metrics like CPU percentage, queue length, or a custom metric from Application Insights.
- Autoscale on App Service Plans, same idea, scoped to the plan.
- Horizontal Pod Autoscaler and Cluster Autoscaler on AKS.
- Consumption-tier services like Functions and Logic Apps, where the elasticity is the entire pricing model.
The opinionated take: if a workload is not on a consumption plan and does not have at least one autoscale rule defined, it is not elastic. It is a fixed allocation that someone might resize manually on a Tuesday. That is fine, but call it what it is and stop putting elasticity on the architecture diagram.
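To make "at least one autoscale rule" concrete, here is a minimal sketch against an App Service Plan using the classic Az.Monitor cmdlets (the same Add-AzAutoscaleSetting that shows up in the tooling section below). Newer Az.Monitor releases renamed these to New-AzAutoscaleSetting plus a set of *Object helpers, so treat the cmdlet and parameter names as version-dependent, and the names, region, and thresholds as placeholders.

```powershell
# Placeholder target: the resource ID of the plan we want to make elastic.
$planId = (Get-AzAppServicePlan -ResourceGroupName "rg-web" -Name "asp-frontend").Id

# One scale-out rule: +1 instance when average CPU sits above 70% for 10 minutes.
$scaleOut = New-AzAutoscaleRule -MetricName "CpuPercentage" -MetricResourceId $planId `
    -Operator GreaterThan -MetricStatistic Average -Threshold 70 `
    -TimeGrain ([TimeSpan]::FromMinutes(1)) -TimeWindow ([TimeSpan]::FromMinutes(10)) `
    -ScaleActionDirection Increase -ScaleActionValue 1 `
    -ScaleActionCooldown ([TimeSpan]::FromMinutes(5))

# Floor of three, ceiling of twelve. The ceiling is the budget protection.
$baseline = New-AzAutoscaleProfile -Name "baseline" -DefaultCapacity 3 `
    -MinimumCapacity 3 -MaximumCapacity 12 -Rule $scaleOut

Add-AzAutoscaleSetting -Name "asp-frontend-autoscale" -ResourceGroupName "rg-web" `
    -Location "westeurope" -TargetResourceId $planId -AutoscaleProfile $baseline
```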
A client engagement that made this concrete
One of our long-term accounts runs a retail platform that does roughly 70 percent of its annual revenue in a six-week window. When we took over management of their Azure tenant, the front end was a single App Service Plan on P1v2 with three fixed instances. The previous vendor had labeled it as elastic. It was not. There were no autoscale rules. Someone manually bumped the instance count every November and bumped it back down in January.
That worked, badly, until the year a marketing push landed mid-week and traffic doubled in an afternoon. The instances pegged, response times climbed, and the cart service started timing out against the SQL backend.
The fix was not exotic. We moved the plan to P2v3, defined autoscale rules on CPU and HTTP queue length with a minimum of three instances and a maximum of twelve, and added a scheduled rule that bumps the minimum to six for the Black Friday window. We also moved the SQL database to a Hyperscale tier so the backend was not the new bottleneck. The total engineering time was under two days. The annual Azure spend went down, because the off-peak floor dropped from three fixed P1v2 instances to three instances of a smaller baseline SKU. That is elasticity earning its keep.
If you are inheriting a similar environment, the IT migration work we do almost always starts with this exact audit: list every scalable resource, check if it has autoscale, and decide whether it should.
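A rough sketch of that audit for a single resource group, assuming the classic Az.Monitor cmdlets again and that the autoscale setting object exposes its target as TargetResourceUri; verify the property name against your module version before trusting the output.

```powershell
# Collect everything in the resource group that can scale horizontally,
# then flag anything without an autoscale setting pointing at it.
$rg = "rg-web"   # placeholder resource group

$scalable = @()
$scalable += Get-AzAppServicePlan -ResourceGroupName $rg
$scalable += Get-AzVmss -ResourceGroupName $rg

# TargetResourceUri is the property name on the older Az.Monitor output.
$targets = (Get-AzAutoscaleSetting -ResourceGroupName $rg).TargetResourceUri

foreach ($resource in $scalable) {
    [pscustomobject]@{
        Name         = $resource.Name
        HasAutoscale = ($targets -contains $resource.Id)
    }
}
```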
The decision flow I actually use
I keep this short on purpose. Most planning conversations get stuck because someone wants a framework and the framework is three questions.
1. Is the load predictable or bursty?
Predictable load with a known peak does not need elasticity. Size it for the peak, accept the overhead off-hours, move on. Bursty or unknown load needs autoscale rules and a max ceiling that protects the budget.
2. Is the bottleneck CPU, memory, I/O, or a downstream service?
This decides whether to scale up or scale out. CPU and stateless throughput scale out beautifully. Memory pressure on a stateful workload usually scales up. A downstream database that cannot keep up is not solved by either, and adding more front-end instances will make it worse. I have watched teams scale out an App Service into the ground because the actual ceiling was on the SQL side.
3. How fast does the system need to react?
VM Scale Set autoscale takes minutes to bring an instance up. AKS pods are seconds. Functions scale out effectively instantly once warm, but cold start on the first invocation of a new worker matters under load. Match the response time of your autoscale to the rate at which traffic actually changes.
Caveats, because they always bite
Autoscale rules need a cooldown longer than you think. The default five minutes is fine for most web workloads. Anything shorter and you get flapping, where the system scales out, the new instance pulls CPU below the threshold, the system scales in, and you do it all again. I have seen a misconfigured rule do this for 18 hours before anyone noticed.
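The shape I aim for is a wide gap between the scale-out and scale-in thresholds plus a longer cooldown on the scale-in side. Same caveat as the earlier sketch about the classic cmdlet names; the thresholds here are illustrative, and the scale-out companion rule is the one shown above.

```powershell
# Placeholder target, same plan as in the earlier sketch.
$planId = (Get-AzAppServicePlan -ResourceGroupName "rg-web" -Name "asp-frontend").Id

# Companion to the 70% scale-out rule: scale in only below 30% CPU,
# over a longer window and with a longer cooldown. The 40-point gap between
# the out and in thresholds is what stops the flapping.
$scaleIn = New-AzAutoscaleRule -MetricName "CpuPercentage" -MetricResourceId $planId `
    -Operator LessThan -MetricStatistic Average -Threshold 30 `
    -TimeGrain ([TimeSpan]::FromMinutes(1)) -TimeWindow ([TimeSpan]::FromMinutes(15)) `
    -ScaleActionDirection Decrease -ScaleActionValue 1 `
    -ScaleActionCooldown ([TimeSpan]::FromMinutes(15))
```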
Scale-in is the dangerous direction. Scale-out failures are visible because users see slow responses. Scale-in failures are invisible until the next traffic spike. Test the scale-in path. Actually test it.
Stateful workloads do not horizontally scale without work. If your application keeps session in memory, scaling out splits user sessions across instances and breaks logins. Either externalize state to Redis or sticky-route at the front door. There is no third option that ends well.
Note: autoscale does not protect you from a runaway cost event. Always set a maximum instance count, and set an Azure budget alert against the resource group. We had a client lose access to their admin accounts after a misconfigured rule scaled an AKS cluster to its quota ceiling overnight. The bill was recoverable. The conversation was not fun.
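The ceiling itself lives in the autoscale profile's maximum capacity; the budget alert is a separate resource. Here is a hedged sketch with New-AzConsumptionBudget from the Az.Billing module. The amount, dates, and email are placeholders, and I am writing the parameter set from memory, so check Get-Help New-AzConsumptionBudget before relying on it.

```powershell
# Monthly cost budget scoped to the resource group, alerting at 80 percent of 5000.
# Every value here is a placeholder; the point is that the alert exists at all.
New-AzConsumptionBudget -ResourceGroupName "rg-web" -Name "rg-web-monthly-cap" `
    -Amount 5000 -Category Cost -TimeGrain Monthly `
    -StartDate (Get-Date -Day 1).Date `
    -ContactEmail "ops@example.com" `
    -NotificationKey "80-percent" -NotificationEnabled -NotificationThreshold 80
```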
Tooling for the day-to-day
I script most of the resize and autoscale operations through PowerShell and the Az module. Set-AzVmss, Set-AzAppServicePlan, and Add-AzAutoscaleSetting cover roughly 90 percent of what I do in a normal week. If you are comfortable with PowerShell pipelines, my notes on PowerShell OutBuffer behavior are worth a glance before you start piping large result sets from Get-AzMetric.
For AKS specifically, the Cluster Autoscaler config sits in the node pool definition and the HPA sits in the workload manifest. Keep those in source control. I have walked into more than one environment where the cluster autoscaler had been edited live in the portal and nobody knew which values were authoritative.
If you are running hybrid and need to keep on-prem capacity in the picture, our VPS hosting tier is the cheaper floor we point clients at for the baseline that does not need to live in Azure.
Where this connects to disaster recovery
High availability is not scalability. They get conflated because both involve more than one instance, but the goals are different. HA distributes instances across availability zones or regions so a failure does not take the workload offline. Scalability adds instances so the workload can handle more traffic. You generally want both, configured independently.
One pattern I have shipped at a few clients: VM Scale Sets across two availability zones with autoscale rules that maintain a minimum of two instances per zone. That gives you HA at the floor and elasticity above it. Pair it with a Veeam-based DR plan for the stateful tier and you have covered the three failure modes that actually happen: an instance dying, a zone going down, and a stateful tier that needs restoring. My write-up on Veeam failover plans covers the orchestration side of that.
For workloads where a ransomware event is the primary concern rather than a regional outage, the ransomware protection backup tier sits in front of all of this and is independent of how the front end scales.
The takeaway
Scalability is what your architecture allows. Elasticity is what your automation does with it. Write down which Azure resources have autoscale defined, which do not, and which ones should. Set a max ceiling on everything. Test scale-in, not just scale-out. Stop calling fixed allocations elastic.
If you want a second pair of eyes on an Azure capacity plan before a launch, drop a note at clients.sse.to/contact.php. The Azure docs on autoscale and VM Scale Sets are the canonical reference for the rule syntax and metric sources.


