Overview - Minimum, maximum, and desired capacity

What is it?

Minimum, maximum, and desired capacity are settings on an AWS Auto Scaling group that control how many instances run. Minimum capacity is the smallest number of instances the group keeps running. Maximum capacity is the largest number of instances the group is allowed to run. Desired capacity is the number of instances the group tries to maintain at any given moment, and it always lies between the minimum and the maximum.

Why it matters

These settings bound how far the group can scale as demand changes, which is how you balance cost against performance. Without them, you might pay for too many servers or have too few to handle traffic, causing slow or failed requests. Set well, they keep your application responsive without overspending.

Where it fits

Before learning this, you should understand what cloud servers and Auto Scaling groups are. After this, you can learn about scaling policies and alarms that trigger capacity changes automatically.

Mental Model

Core Idea

Minimum, maximum, and desired capacity set the boundaries and target for how many servers run to keep your app healthy and cost-effective.

Think of it like...

It's like setting the thermostat in your house: minimum is the lowest temperature you allow, maximum is the highest, and desired is the temperature you want to keep.

┌───────────────┐
│ Auto Scaling  │
│ Group         │
├───────────────┤
│ Min Capacity  │◄── Smallest number of servers always on
│ Desired Cap.  │◄── Target number of servers to run
│ Max Capacity  │◄── Largest number of servers allowed
└───────────────┘
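To make the three numbers concrete, here is a minimal sketch that models them as a small Python class. The class name `CapacitySettings` and its methods are hypothetical and exist only for illustration; they are not part of any AWS SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CapacitySettings:
    """Hypothetical model of an Auto Scaling group's three capacity numbers."""
    min_size: int   # smallest number of servers always on
    max_size: int   # largest number of servers allowed
    desired: int    # target number of servers to run

    def scale_out_room(self) -> int:
        # Instances that could still be added before hitting the maximum.
        return self.max_size - self.desired

    def scale_in_room(self) -> int:
        # Instances that could still be removed before hitting the minimum.
        return self.desired - self.min_size


settings = CapacitySettings(min_size=2, max_size=10, desired=4)
print(settings.scale_out_room())  # 6
print(settings.scale_in_room())   # 2
```

With a minimum of 2, a maximum of 10, and a desired capacity of 4, the group has room to add six servers or remove two before it reaches a boundary.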

Build-Up - 7 Steps

1

Foundation: Understanding Auto Scaling Groups

Concept: Learn what an Auto Scaling group is and why it manages server counts.

An Auto Scaling group is a collection of servers that can grow or shrink automatically. It helps keep your app available and saves money by adjusting how many servers run based on demand.

Result

You know that Auto Scaling groups control server numbers to match traffic.

Understanding Auto Scaling groups is key because minimum, maximum, and desired capacity only make sense inside this system.

2

Foundation: Defining Minimum Capacity

3

Intermediate: Setting Maximum Capacity

4

Intermediate: Choosing Desired Capacity

5

Intermediate: How Capacities Work Together

6

Advanced: Impact of Capacity Settings on Scaling Behavior

7

Expert: Advanced Capacity Management and Edge Cases

Under the Hood

An Auto Scaling group monitors the health of its instances and replaces unhealthy ones so that the running count stays at the desired capacity. Minimum and maximum capacity act as hard limits. When a scaling policy, a scheduled action, or a manual change fires, the group sets a new desired capacity within those limits and then launches or terminates instances until the running count matches it.

Why designed this way?

This design balances flexibility and control. Minimum and maximum prevent extreme scaling that could cause downtime or excessive cost. Desired capacity allows smooth adjustments. Alternatives fall short: fixed server counts lack responsiveness, and fully dynamic scaling without limits risks runaway costs.

┌───────────────┐
│ Scaling Event │
└──────┬────────┘
       │
       ▼
┌───────────────┐
│ Auto Scaling  │
│ Controller    │
├───────────────┤
│ Min Capacity  │◄── Enforced lower limit
│ Desired Cap.  │◄── Target adjusted here
│ Max Capacity  │◄── Enforced upper limit
└──────┬────────┘
       │
       ▼
┌───────────────┐
│ Launch/Term.  │
│ Instances     │
└───────────────┘
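The flow in the diagram can be sketched as a clamp: a scaling event proposes a change, and the result is held inside the minimum and maximum. This is a simplified model under that assumption, not AWS's actual implementation, and the function names are hypothetical.

```python
def apply_scaling_adjustment(desired: int, adjustment: int,
                             min_size: int, max_size: int) -> int:
    """Return the new desired capacity, clamped to [min_size, max_size]."""
    proposed = desired + adjustment
    return max(min_size, min(max_size, proposed))


def instances_to_change(running: int, desired: int) -> int:
    """Positive means launch that many instances, negative means terminate."""
    return desired - running


# A scale-out request of +3 is capped at the maximum of 5.
new_desired = apply_scaling_adjustment(desired=4, adjustment=3,
                                       min_size=2, max_size=5)
print(new_desired)                          # 5
print(instances_to_change(4, new_desired))  # 1

# A scale-in request of -5 stops at the minimum of 2.
print(apply_scaling_adjustment(desired=3, adjustment=-5,
                               min_size=2, max_size=5))  # 2
```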

Myth Busters - 4 Common Misconceptions

Quick: Does setting desired capacity higher than maximum launch more servers? Commit to yes or no.

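Common Belief: If I set desired capacity higher than maximum, Auto Scaling will launch that many servers.

Reality: No. Desired capacity cannot exceed the maximum. A manual request above the maximum is rejected, and a scaling policy is capped at the maximum.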

Quick: Can minimum capacity be zero and still keep your app always available? Commit to yes or no.

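Common Belief: Setting minimum capacity to zero means my app is always available because Auto Scaling will launch servers when needed.

Reality: No. With a minimum of zero the group may run no instances at all, and requests fail until something raises desired capacity and new instances finish starting.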

Quick: Does desired capacity automatically adjust itself without scaling policies? Commit to yes or no.

Common Belief: Desired capacity changes automatically based on traffic without any scaling policies or alarms.

Reality: No. Desired capacity changes only when you change it or when a scaling policy or scheduled action changes it.

Quick: Can minimum capacity be higher than maximum capacity? Commit to yes or no.

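Common Belief: You can set minimum capacity higher than maximum capacity to force a fixed number of servers.

Reality: No. A minimum above the maximum is invalid. To pin the group at a fixed size, set minimum, maximum, and desired capacity to the same number.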
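The misconception about desired capacity adjusting itself can be illustrated with a small simulation. The thresholds in `simple_cpu_policy` (add one server above 70% CPU, remove one below 30%) are assumptions chosen for the example, and both functions are hypothetical rather than AWS APIs.

```python
def simple_cpu_policy(cpu_percent: float) -> int:
    """Hypothetical policy: scale out when busy, scale in when idle."""
    if cpu_percent > 70:
        return 1
    if cpu_percent < 30:
        return -1
    return 0


def simulate(cpu_samples, desired, min_size, max_size, policy=None):
    """Return desired capacity after each CPU sample."""
    history = []
    for cpu in cpu_samples:
        if policy is not None:
            # Only a policy moves the target, and only within the limits.
            desired = max(min_size, min(max_size, desired + policy(cpu)))
        history.append(desired)
    return history


cpu_trace = [20, 50, 80, 90, 85, 40, 10]

no_policy = simulate(cpu_trace, desired=2, min_size=2, max_size=4)
with_policy = simulate(cpu_trace, desired=2, min_size=2, max_size=4,
                       policy=simple_cpu_policy)

print(no_policy)    # [2, 2, 2, 2, 2, 2, 2]
print(with_policy)  # [2, 2, 3, 4, 4, 4, 3]
```

Without a policy the target never moves, however high the CPU reading goes. With one, it follows the load but stays between 2 and 4.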

Expert Zone

1

Minimum capacity can be temporarily increased during deployments to ensure availability, then lowered after.

2

Desired capacity can be set independently of scaling policies for manual control during special events.

3

Cooldown periods make the group wait after a scaling activity before starting another one, which limits how quickly desired capacity can change and prevents rapid fluctuations.
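The cooldown effect in the last point can be sketched as follows. This is a deliberately simplified model: it starts the cooldown at the moment a change is applied and ignores every request that arrives during it, whereas real cooldown behavior depends on the policy type. The function name and the request format are hypothetical.

```python
def run_with_cooldown(requests, cooldown_seconds, desired, min_size, max_size):
    """Apply (timestamp_seconds, adjustment) requests, skipping any in cooldown.

    Returns the final desired capacity and the list of applied changes.
    """
    last_activity = None
    applied = []
    for timestamp, adjustment in requests:
        in_cooldown = (last_activity is not None
                       and timestamp - last_activity < cooldown_seconds)
        if in_cooldown:
            continue  # too soon after the previous scaling activity
        new_desired = max(min_size, min(max_size, desired + adjustment))
        if new_desired != desired:
            desired = new_desired
            last_activity = timestamp
            applied.append((timestamp, desired))
    return desired, applied


requests = [(0, 1), (60, 1), (200, 1), (320, -1), (700, -1)]
final, applied = run_with_cooldown(requests, cooldown_seconds=300,
                                   desired=2, min_size=1, max_size=5)
print(final)    # 1
print(applied)  # [(0, 3), (320, 2), (700, 1)]
```

The requests at 60 and 200 seconds fall inside the 300-second window opened at time 0, so only three of the five requests change the target.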

When NOT to use

Avoid relying solely on fixed minimum, maximum, and desired capacities for dynamic workloads. Instead, use scaling policies with metrics and predictive scaling for better responsiveness and cost efficiency.

Production Patterns

In production, teams often set minimum capacity to handle baseline traffic, set maximum capacity to cap costs, and let scaling policies triggered by CPU or request metrics adjust desired capacity. They also use scheduled scaling to prepare for known traffic spikes.
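As a rough sketch of this pattern, the functions below build the request parameters for three calls on the boto3 Auto Scaling client. They only construct dictionaries and never contact AWS. The group name, policy name, action name, schedule, and numbers are all placeholder assumptions. Note that the API spells the settings `MinSize`, `MaxSize`, and `DesiredCapacity`.

```python
def baseline_group_update(group_name, min_size, max_size, desired):
    """Parameters shaped for update_auto_scaling_group."""
    return {
        "AutoScalingGroupName": group_name,
        "MinSize": min_size,         # covers baseline traffic
        "MaxSize": max_size,         # caps cost
        "DesiredCapacity": desired,  # starting target
    }


def cpu_target_tracking_policy(group_name, target_cpu_percent):
    """Parameters shaped for put_scaling_policy (target tracking on CPU)."""
    return {
        "AutoScalingGroupName": group_name,
        "PolicyName": "cpu-target-tracking",  # placeholder name
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingConfiguration": {
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ASGAverageCPUUtilization",
            },
            "TargetValue": float(target_cpu_percent),
        },
    }


def scheduled_scale_up(group_name, cron, min_size, max_size, desired):
    """Parameters shaped for put_scheduled_update_group_action."""
    return {
        "AutoScalingGroupName": group_name,
        "ScheduledActionName": "known-traffic-spike",  # placeholder name
        "Recurrence": cron,  # cron expression
        "MinSize": min_size,
        "MaxSize": max_size,
        "DesiredCapacity": desired,
    }


update = baseline_group_update("web-asg", min_size=2, max_size=10, desired=2)
policy = cpu_target_tracking_policy("web-asg", target_cpu_percent=50)
spike = scheduled_scale_up("web-asg", "0 8 * * 1-5",
                           min_size=4, max_size=10, desired=6)
print(update["MinSize"], update["MaxSize"], update["DesiredCapacity"])
```

With boto3 installed and credentials configured, these dictionaries could be passed along the lines of `client = boto3.client("autoscaling")` followed by `client.update_auto_scaling_group(**update)`, `client.put_scaling_policy(**policy)`, and `client.put_scheduled_update_group_action(**spike)`.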

Connections

Feedback Control Systems

Both use target values and limits to maintain system stability.

Understanding capacity settings as a control system helps grasp how Auto Scaling maintains balance between performance and cost.

Thermostat Temperature Control

Both set minimum, maximum, and desired targets to regulate environment conditions.

Seeing capacity like thermostat settings clarifies how boundaries and goals guide automatic adjustments.

Inventory Management

Both manage minimum stock (capacity), maximum stock, and reorder points (desired capacity) to meet demand without waste.

Knowing inventory principles helps understand how cloud resources are provisioned just in time.

Common Pitfalls

#1 Setting desired capacity outside the min-max range is invalid, and the request is rejected.

Wrong approach: MinimumCapacity=2 MaximumCapacity=5 DesiredCapacity=6

Correct approach: MinimumCapacity=2 MaximumCapacity=5 DesiredCapacity=5

Root cause: Not realizing that desired capacity must stay between the minimum and maximum limits.

#2 Setting minimum capacity to zero for critical apps can cause downtime.

Wrong approach: MinimumCapacity=0 MaximumCapacity=10 DesiredCapacity=0

Correct approach: MinimumCapacity=2 MaximumCapacity=10 DesiredCapacity=2

Root cause: Not realizing that minimum capacity is what guarantees baseline availability.

#3 Assuming desired capacity changes automatically without scaling policies.

Wrong approach: Set desired capacity once and expect it to adjust with traffic without policies.

Correct approach: Configure scaling policies and alarms to adjust desired capacity dynamically.

Root cause: Mistaking desired capacity for an automatic, metric-driven value rather than a target set manually or by policies.
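The first two pitfalls can be caught before deployment with a small check. The sketch below is a hypothetical lint-style helper, not an AWS validation routine. Treating a minimum of zero as a warning is an assumption that suits critical apps and may not suit every workload.

```python
def check_capacity(min_size: int, max_size: int, desired: int) -> list:
    """Return a list of problems found in a capacity configuration."""
    problems = []
    if min_size < 0:
        problems.append("minimum capacity is negative")
    if min_size > max_size:
        problems.append("minimum exceeds maximum")
    if not (min_size <= desired <= max_size):
        problems.append("desired capacity is outside the min-max range")
    if min_size == 0:
        # Valid, but the group may run no instances at all.
        problems.append("warning: minimum of zero allows zero running instances")
    return problems


print(check_capacity(2, 5, 6))    # pitfall #1, wrong approach
print(check_capacity(2, 5, 5))    # pitfall #1, correct approach
print(check_capacity(0, 10, 0))   # pitfall #2, wrong approach
print(check_capacity(2, 10, 2))   # pitfall #2, correct approach
```

The two correct configurations return an empty list. The wrong ones each return a single message naming the problem.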

Key Takeaways

Minimum, maximum, and desired capacity define the lower limit, upper limit, and target number of servers in an Auto Scaling group.

These settings work together to keep your application reliable and cost-effective by controlling how many servers run.

Desired capacity must always lie between the minimum and maximum; AWS rejects a value outside that range.

Without proper capacity settings, your app can become unavailable or incur unnecessary costs.

Advanced use involves dynamically adjusting capacities and integrating with scaling policies for optimal performance.