The system watches incoming requests, checks whether the request count is above an upper threshold or below a lower threshold, then adds or removes instances to handle the load efficiently.
Execution Sample
GCP
1. Monitor request count every minute
2. If requests > 1000, add 1 instance
3. If requests < 500, remove 1 instance
4. Keep instances between 1 and 5
This simple rule adjusts the number of instances based on request volume to keep performance steady.
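The four steps above can be sketched as a single decision function. This is a minimal illustration, not a GCP API: the function name `next_instance_count` and its parameter names are assumptions chosen for this example, while the thresholds (1000, 500) and limits (1 and 5) come from the rule itself.

```python
def next_instance_count(requests, current, high=1000, low=500,
                        min_instances=1, max_instances=5):
    """Apply the scaling rule once and return the new instance count.

    Hypothetical helper for illustration; not part of any GCP library.
    """
    if requests > high:
        # Step 2: add one instance, but never exceed the maximum (step 4).
        return min(current + 1, max_instances)
    if requests < low:
        # Step 3: remove one instance, but never drop below the minimum (step 4).
        return max(current - 1, min_instances)
    # Between the two thresholds (inclusive): leave the count unchanged.
    return current


print(next_instance_count(1200, 2))  # 3
print(next_instance_count(400, 4))   # 3
print(next_instance_count(800, 2))   # 2
```

Changing by one instance per check keeps the behavior easy to trace minute by minute, at the cost of reacting slowly to large spikes.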
Process Table
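| Minute | Request Count | Condition (>1000?) | Condition (<500?) | Action | Instances Before | Instances After |
|---|---|---|---|---|---|---|
| 1 | 800 | No | No | No change | 2 | 2 |
| 2 | 1200 | Yes | No | Scale Up +1 | 2 | 3 |
| 3 | 1500 | Yes | No | Scale Up +1 | 3 | 4 |
| 4 | 400 | No | Yes | Scale Down -1 | 4 | 3 |
| 5 | 300 | No | Yes | Scale Down -1 | 3 | 2 |
| 6 | 2000 | Yes | No | Scale Up +1 | 2 | 3 |
| 7 | 600 | No | No | No change | 3 | 3 |
| 8 | 450 | No | Yes | Scale Down -1 | 3 | 2 |
| 9 | 100 | No | Yes | Scale Down -1 | 2 | 1 |
| 10 | 1100 | Yes | No | Scale Up +1 | 1 | 2 |
| 11 | 400 | No | Yes | Scale Down -1 | 2 | 1 |
| 12 | 3000 | Yes | No | Scale Up +1 | 1 | 2 |
| 13 | 3500 | Yes | No | Scale Up +1 | 2 | 3 |
| 14 | 4000 | Yes | No | Scale Up +1 | 3 | 4 |
| 15 | 4500 | Yes | No | Scale Up +1 | 4 | 5 |
| 16 | 5000 | Yes | No | No change (max reached) | 5 | 5 |
| 17 | 200 | No | Yes | Scale Down -1 | 5 | 4 |
| 18 | 100 | No | Yes | Scale Down -1 | 4 | 3 |
| 19 | 50 | No | Yes | Scale Down -1 | 3 | 2 |
| 20 | 400 | No | Yes | Scale Down -1 | 2 | 1 |
| 21 | 300 | No | Yes | No change (min reached) | 1 | 1 |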
💡 At minute 21, instances are at minimum (1), so no further scale down occurs despite low requests.
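The trace in the Process Table can be reproduced with a short simulation. The sketch below assumes the starting count of 2 and the 21 request counts listed in the table; the names `REQUESTS` and `simulate` are hypothetical and exist only for this example.

```python
# Request counts for minutes 1 to 21, taken from the Process Table.
REQUESTS = [800, 1200, 1500, 400, 300, 2000, 600, 450, 100, 1100, 400,
            3000, 3500, 4000, 4500, 5000, 200, 100, 50, 400, 300]


def simulate(request_counts, start=2, high=1000, low=500,
             min_instances=1, max_instances=5):
    """Return one (minute, requests, action, before, after) tuple per minute."""
    rows = []
    current = start
    for minute, requests in enumerate(request_counts, start=1):
        before = current
        if requests > high:
            current = min(current + 1, max_instances)
            action = "Scale Up +1" if current > before else "No change (max reached)"
        elif requests < low:
            current = max(current - 1, min_instances)
            action = "Scale Down -1" if current < before else "No change (min reached)"
        else:
            action = "No change"
        rows.append((minute, requests, action, before, current))
    return rows


rows = simulate(REQUESTS)
for row in rows:
    print(row)
```

Each printed tuple corresponds to one row of the table, with the two condition columns folded into the action text.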
Status Tracker
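| Variable | Start | After 1 | After 2 | After 3 | After 4 | After 5 | After 6 | After 7 | After 8 | After 9 | After 10 | After 11 | After 12 | After 13 | After 14 | After 15 | After 16 | After 17 | After 18 | After 19 | After 20 | After 21 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Instances | 2 | 2 | 3 | 4 | 3 | 2 | 3 | 3 | 2 | 1 | 2 | 1 | 2 | 3 | 4 | 5 | 5 | 4 | 3 | 2 | 1 | 1 |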
Key Moments - 3 Insights
Why does the number of instances not go below 1 even when requests are very low?
The system enforces a minimum of 1 instance so the service is always available. In the Process Table, instances drop to 1 at minutes 9, 11, and 20, and at minute 21 the scale-down is skipped because the minimum has already been reached.
What happens when the request count is exactly 1000 or 500?
Both conditions are strict comparisons (greater than 1000, less than 500), so a count of exactly 1000 or exactly 500 triggers no scaling action. No row in the Process Table hits these exact values, but the column headers (>1000? and <500?) state the rule.
Why does scaling up stop at 5 instances even if requests keep increasing?
There is a maximum limit of 5 instances to control costs and resource use, as seen at minute 16 where requests are high but instances remain at 5.
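The three insights above are all edge cases of the same rule, and a few assertions make them concrete. This is a sketch under the thresholds and limits given earlier; the helper name `scale` is an assumption for illustration only.

```python
def scale(requests, current, high=1000, low=500,
          min_instances=1, max_instances=5):
    """Hypothetical one-step scaling rule using strict comparisons."""
    if requests > high:
        return min(current + 1, max_instances)
    if requests < low:
        return max(current - 1, min_instances)
    return current


# Exactly on a threshold: the comparisons are strict, so nothing happens.
assert scale(1000, 3) == 3
assert scale(500, 3) == 3

# One request past a threshold is enough to trigger a step.
assert scale(1001, 3) == 4
assert scale(499, 3) == 2

# At the limits, the count is held even though a condition is true.
assert scale(5000, 5) == 5  # maximum reached, as at minute 16
assert scale(300, 1) == 1   # minimum reached, as at minute 21
```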
Visual Quiz - 3 Questions
Test your understanding
Look at the Process Table at minute 4. What action is taken and why?
A. Scale Up by 1 because requests > 1000
B. No change because requests are between 500 and 1000
C. Scale Down by 1 because requests < 500
D. Scale Up by 2 because requests are very high
💡 Hint
Check the 'Request Count' and 'Condition (<500?)' columns at minute 4 in the Process Table.
At which minute does the number of instances first reach the maximum limit?
A. Minute 13
B. Minute 15
C. Minute 16
D. Minute 14
💡 Hint
Look at the 'Instances After' column in the Process Table and find when it first hits 5.
If the minimum number of instances were set to 2 instead of 1, what would happen at minute 21?
A. Instances would stay at 2
B. Instances would scale down to 0
C. Instances would stay at 1
D. Instances would scale up to 3
💡 Hint
Refer to the Instances row in the Status Tracker and the minimum limit rule explained in Key Moments.
Concept Snapshot
Request-based auto scaling watches incoming requests.
If requests go above a high threshold, it adds instances.
If requests fall below a low threshold, it removes instances.
Instances stay within set minimum and maximum limits.
This keeps service responsive and cost-effective.
Full Transcript
Request-based auto scaling is a way to adjust the number of server instances based on how many requests come in. The system checks the request count every minute. If the count is higher than a set limit, it adds an instance to handle the load. If the count is lower than another limit, it removes an instance to save resources. The number of instances never goes below a minimum or above a maximum to keep the service stable and cost-controlled. This process repeats continuously to match capacity with demand.
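The continuous check described above could be structured as a polling loop. The sketch below is an assumption-heavy illustration, not how GCP implements autoscaling: `run_autoscaler`, `get_request_count`, and `set_instance_count` are hypothetical names, and the two callbacks stand in for whatever metric source and instance-management call a real deployment would use.

```python
import time


def run_autoscaler(get_request_count, set_instance_count, start=2,
                   high=1000, low=500, min_instances=1, max_instances=5,
                   interval_seconds=60, max_ticks=None, sleep=time.sleep):
    """Poll the request count once per interval and apply the scaling rule.

    get_request_count and set_instance_count are caller-supplied callbacks
    (hypothetical here). max_ticks bounds the loop so it can be tested.
    """
    current = start
    ticks = 0
    while max_ticks is None or ticks < max_ticks:
        requests = get_request_count()
        if requests > high:
            target = min(current + 1, max_instances)
        elif requests < low:
            target = max(current - 1, min_instances)
        else:
            target = current
        if target != current:
            # Only call out when the count actually changes.
            set_instance_count(target)
            current = target
        ticks += 1
        sleep(interval_seconds)
    return current


# Dry run with canned request counts and no real waiting.
samples = iter([1200, 1500, 400])
applied = []
final = run_autoscaler(lambda: next(samples), applied.append,
                       max_ticks=3, sleep=lambda seconds: None)
print(applied, final)  # [3, 4, 3] 3
```

Passing the metric source, the scaling action, and the sleep function as arguments keeps the rule itself independent of any particular platform and makes the loop easy to exercise without waiting a full minute per step.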