Horizontal Pod Autoscaler in Microservices - Scalability & System Analysis

| Users / Load | Pods | CPU/Memory Usage | Response Time | Autoscaler Behavior |
|---|---|---|---|---|
| 100 users | 1-2 pods | Low (10-30%) | Fast (low latency) | Minimal scaling, stable pod count |
| 10,000 users | 5-10 pods | Moderate (50-70%) | Good (slight increase) | Pods scale up automatically to handle load |
| 1,000,000 users | 1000-2000 pods | High (70-90%) | Acceptable (some latency) | Frequent scaling events, possible cooldown delays |
| 100,000,000 users | 100,000+ pods (cluster limits) | Very High (near max) | Degraded (high latency) | Autoscaler hits cluster or resource limits, scaling bottlenecks |
The first hard bottleneck is cluster capacity: allocatable CPU and memory on the nodes, and the maximum pod count per node or per cluster. When the autoscaler raises the replica count beyond what the nodes can hold, the extra pods stay Pending until more capacity is added.
Even before capacity runs out, scaling lag can overload the existing pods. Metrics collection, the HPA sync period (15 seconds by default), pod startup time, and API server rate limits all delay the arrival of new pods.
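A rough packing estimate shows when pods stop fitting on the available nodes. The Python sketch below takes hypothetical node and pod sizes as inputs and uses the kubelet's default limit of 110 pods per node. It ignores system reservations, DaemonSets, and scheduling constraints such as affinity, so real clusters fit fewer pods than it reports.

```python
import math
from dataclasses import dataclass


@dataclass
class NodeShape:
    """Allocatable resources of one node (hypothetical example values)."""
    cpu_cores: float
    memory_gib: float
    max_pods: int = 110  # kubelet default pod limit per node


@dataclass
class PodRequest:
    """Resource requests declared by one pod replica."""
    cpu_cores: float
    memory_gib: float


def pods_per_node(node: NodeShape, pod: PodRequest) -> int:
    """Pods that fit on one node: the tightest of CPU, memory, and pod-count limits."""
    by_cpu = math.floor(node.cpu_cores / pod.cpu_cores)
    by_memory = math.floor(node.memory_gib / pod.memory_gib)
    return min(by_cpu, by_memory, node.max_pods)


def nodes_needed(replicas: int, node: NodeShape, pod: PodRequest) -> int:
    """Nodes required to schedule `replicas` pods, ignoring system pods and fragmentation."""
    return math.ceil(replicas / pods_per_node(node, pod))


if __name__ == "__main__":
    node = NodeShape(cpu_cores=16, memory_gib=64)
    pod = PodRequest(cpu_cores=0.5, memory_gib=1.0)
    print(pods_per_node(node, pod))       # 32 (CPU is the binding limit)
    print(nodes_needed(1000, node, pod))  # 32
```

If the HPA asks for more replicas than `nodes × pods_per_node` allows, the surplus pods remain Pending. This holds until nodes are added, manually or by a node autoscaler.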
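- Horizontal scaling: Add more nodes to the cluster (manually or with a node autoscaler such as the Cluster Autoscaler) to increase capacity for pods.
- Vertical scaling: Use larger nodes (more CPU and memory) to fit more pods per node, up to the node's pod-count limit.
- Autoscaler tuning: Adjust target thresholds and scale-up/scale-down stabilization (cooldown) periods so scaling is fast but does not flap.
- Pod resource requests/limits: Set realistic requests. The scheduler packs pods by their requests, and HPA CPU utilization is measured against them.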
- Use multiple clusters: Split load across clusters to avoid single cluster limits.
- Implement caching and queueing: Reduce load spikes and smooth traffic to pods.
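In autoscaling/v2, autoscaler cooldowns are configured through the `behavior` field's `stabilizationWindowSeconds`, which defaults to 300 seconds for scale-down. The Kubernetes documentation describes the scale-down rule: the controller looks at the recommendations computed within the window and uses the highest one. The Python sketch below is a simplified model of that rule only. It assumes scale-up is not delayed and ignores scaling policies (`policies`, `selectPolicy`).

```python
from collections import deque


class ScaleDownStabilizer:
    """Simplified model of HPA scale-down stabilization.

    Raw replica recommendations are remembered for `window_seconds`; the value
    applied is the highest recommendation still in the window, so a brief dip
    in load does not immediately remove pods. A higher recommendation takes
    effect at once (scale-up stabilization is not modeled).
    """

    def __init__(self, window_seconds: float = 300.0):
        self.window_seconds = window_seconds
        self._history = deque()  # (timestamp_seconds, recommended_replicas)

    def stabilize(self, now: float, recommendation: int) -> int:
        self._history.append((now, recommendation))
        # Drop recommendations that are older than the window.
        while self._history and now - self._history[0][0] > self.window_seconds:
            self._history.popleft()
        return max(r for _, r in self._history)


stabilizer = ScaleDownStabilizer(window_seconds=300)
for t, raw in [(0, 10), (60, 4), (120, 4), (400, 4)]:
    print(t, stabilizer.stabilize(t, raw))
```

With a 300-second window, a recommendation of 10 at t=0 keeps 10 replicas through t=120 even though later recommendations drop to 4. By t=400 the old value has aged out, and the count falls to 4. A longer window trades slower scale-down for fewer flapping scaling events.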
Assuming each pod handles ~1000 concurrent requests and each user contributes about one concurrent request:
- At 10,000 users: ~10 pods needed.
- At 1,000,000 users: ~1000 pods needed.
- Each pod requires ~0.5 CPU and 1GB RAM, so 1000 pods need roughly 500 CPU cores and 1TB of RAM.
- Cluster bandwidth depends on request size; e.g., 1MB per request at 1000 QPS = ~1GB/s network.
- Control-plane load grows with pod count. Every new pod means create, schedule, and status/endpoint updates through the API server, and the metrics pipeline must report usage for every pod.
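The arithmetic in the list above can be reproduced with a few lines of Python. The per-pod capacity, CPU, and RAM figures are the assumptions stated in the list. The function names are illustrative, and bandwidth uses decimal units (1 GB = 1000 MB).

```python
import math


def estimate(users: int,
             concurrent_per_pod: int = 1000,
             cpu_per_pod: float = 0.5,
             ram_gb_per_pod: float = 1.0) -> dict:
    """Rough sizing, assuming each user contributes ~one concurrent request."""
    pods = math.ceil(users / concurrent_per_pod)
    return {
        "pods": pods,
        "cpu_cores": pods * cpu_per_pod,
        "ram_gb": pods * ram_gb_per_pod,
    }


def bandwidth_gb_per_s(qps: float, request_mb: float) -> float:
    """Network throughput in GB/s for a given request rate and request size."""
    return qps * request_mb / 1000


print(estimate(10_000))               # {'pods': 10, 'cpu_cores': 5.0, 'ram_gb': 10.0}
print(estimate(1_000_000))            # {'pods': 1000, 'cpu_cores': 500.0, 'ram_gb': 1000.0}
print(bandwidth_gb_per_s(1000, 1.0))  # 1.0
```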
When discussing Horizontal Pod Autoscaler scaling, start by explaining how it monitors pod metrics (CPU, memory) and adjusts pod count automatically.
Then, describe what happens as load grows: resource limits, API server rate limits, and scaling delays.
Finally, propose concrete solutions like adding nodes, tuning autoscaler settings, and splitting clusters.
This shows understanding of both the autoscaler mechanism and real-world constraints.
Question: Your service handles 1000 QPS. Traffic grows 10x. What do you do first?
Answer: Since traffic grew 10x, the first step is to let the Horizontal Pod Autoscaler scale out horizontally. Confirm that maxReplicas allows roughly 10x the current replica count and that the cluster has enough capacity for the new pods. Then tune the autoscaler thresholds so it reacts faster. If cluster limits are reached, add more nodes or split the workload.
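A quick sanity check for the 10x scenario is sketched below. It assumes load spreads evenly across pods and per-pod capacity stays the same, so the replica count the HPA asks for grows roughly tenfold and maxReplicas must leave room for it. The numbers (4 current replicas, maxReplicas 10 or 50) are hypothetical.

```python
import math


def replicas_after_growth(current_replicas: int, growth_factor: float) -> int:
    """Replicas needed to keep per-pod load unchanged after traffic grows."""
    return math.ceil(current_replicas * growth_factor)


def check_headroom(current_replicas: int, growth_factor: float, max_replicas: int) -> str:
    """Report whether the HPA's maxReplicas would cap the scale-out."""
    needed = replicas_after_growth(current_replicas, growth_factor)
    if needed > max_replicas:
        return f"raise maxReplicas: need ~{needed}, capped at {max_replicas}"
    return f"ok: need ~{needed}, maxReplicas {max_replicas}"


print(check_headroom(current_replicas=4, growth_factor=10, max_replicas=10))
print(check_headroom(current_replicas=4, growth_factor=10, max_replicas=50))
```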
Practice
Horizontal Pod Autoscaler in a Kubernetes microservices environment?
Solution
Step 1: Understand the role of the Horizontal Pod Autoscaler
It monitors resource usage such as CPU, or custom metrics, and adjusts the pod count automatically.
Step 2: Compare the options with this role
Only "Automatically adjust the number of pods based on CPU or custom metrics" describes automatic scaling based on load, which matches the autoscaler's purpose.
Final Answer:
Automatically adjust the number of pods based on CPU or custom metrics
Quick Check:
Autoscaler adjusts pods automatically [OK]
Common mistakes:
- Confusing autoscaler with manual pod management
- Thinking it balances network traffic
- Assuming it stores data persistently
web-app?
Solution
Step 1: Identify the correct API version and fields for a CPU target
`autoscaling/v1` supports `targetCPUUtilizationPercentage` directly; `autoscaling/v2` requires a `metrics` array instead.
Step 2: Check min/max replicas and target CPU utilization
The following manifest uses autoscaling/v1 with minReplicas 2, maxReplicas 10, and targetCPUUtilizationPercentage 50, which is valid syntax:
```yaml
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 50
```
Final Answer:
YAML with autoscaling/v1 and targetCPUUtilizationPercentage 50%
Quick Check:
autoscaling/v1 + targetCPUUtilizationPercentage [OK]
Common mistakes:
- Using wrong apiVersion for the fields
- Confusing CPU with memory metrics
- Setting minReplicas higher than maxReplicas
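Two of the mistakes listed above, the apiVersion/field mismatch and inverted replica bounds, can be caught mechanically before a manifest is applied. The sketch below is a hypothetical helper, not part of kubectl or any Kubernetes client library. It inspects a manifest already parsed into a Python dict and performs only these checks; full schema validation is done by the API server.

```python
from __future__ import annotations


def check_hpa(manifest: dict) -> list[str]:
    """Return problems matching the apiVersion and replica-bound mistakes (hypothetical helper)."""
    problems = []
    api_version = manifest.get("apiVersion")
    spec = manifest.get("spec", {})

    min_r = spec.get("minReplicas", 1)  # Kubernetes defaults minReplicas to 1
    max_r = spec.get("maxReplicas")
    if max_r is None:
        problems.append("maxReplicas is required")
    elif min_r > max_r:
        problems.append("minReplicas is higher than maxReplicas")

    if api_version == "autoscaling/v1" and "metrics" in spec:
        problems.append("autoscaling/v1 has no metrics field; use targetCPUUtilizationPercentage")
    if api_version == "autoscaling/v2" and "targetCPUUtilizationPercentage" in spec:
        problems.append("autoscaling/v2 sets CPU targets through the metrics list")
    return problems


good = {
    "apiVersion": "autoscaling/v1",
    "kind": "HorizontalPodAutoscaler",
    "spec": {"minReplicas": 2, "maxReplicas": 10, "targetCPUUtilizationPercentage": 50},
}
print(check_hpa(good))  # []
```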
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 2
  maxReplicas: 6
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60
```
If the current CPU usage is 90% and there are 3 pods running, how many pods will the autoscaler try to set?
Solution
Step 1: Understand the scaling formula based on CPU utilization
Desired replicas = current replicas * (current CPU / target CPU) = 3 * (90/60) = 4.5
Step 2: Round up and check min/max limits
4.5 rounds up to 5, which is between minReplicas 2 and maxReplicas 6, so the autoscaler sets 5 pods.
Final Answer:
5 pods -> Option A
Quick Check:
3 * (90/60) = 4.5 -> 5 pods [OK]
Common mistakes:
- Rounding down instead of up
- Ignoring min/max replica limits
- Using target CPU as current CPU
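The calculation above follows the formula in the Kubernetes documentation, desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue). The Python sketch below is a simplified model for a single utilization metric. It also applies the controller's default tolerance of 0.1 (no change when the ratio is within 10% of 1.0) and clamps the result to [minReplicas, maxReplicas]. Treating a missing metric as "keep the current count" is a simplification. The model also ignores stabilization and the controller's handling of unready pods.

```python
import math
from typing import Optional


def desired_replicas(current_replicas: int,
                     current_utilization: Optional[float],
                     target_utilization: float,
                     min_replicas: int,
                     max_replicas: int,
                     tolerance: float = 0.1) -> int:
    """Simplified HPA replica calculation for one utilization metric."""
    if current_utilization is None:
        # No metrics available (e.g. metrics-server missing): no scaling decision.
        proposed = current_replicas
    else:
        ratio = current_utilization / target_utilization
        if abs(ratio - 1.0) <= tolerance:
            # Close enough to the target: leave the replica count alone.
            proposed = current_replicas
        else:
            proposed = math.ceil(current_replicas * ratio)
    # Always respect the configured replica bounds.
    return max(min_replicas, min(max_replicas, proposed))


print(desired_replicas(3, 90, 60, min_replicas=2, max_replicas=6))  # 5
```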
Solution
Step 1: Check the autoscaler's dependency on metrics
The Horizontal Pod Autoscaler reads CPU and memory usage from the resource metrics API, normally served by metrics-server; custom metrics additionally require a custom metrics adapter.
Step 2: Understand the effect of missing metrics
If metrics-server is missing or not reporting data, the autoscaler cannot compute a desired replica count. It therefore never scales up, and pods running at minReplicas stay there.
Final Answer:
The metrics server is not running or not providing metrics -> Option B
Quick Check:
Missing metrics = no scaling beyond minReplicas [OK]
Common mistakes:
- Assuming maxReplicas lower than minReplicas causes this
- Thinking high load always triggers scaling
- Ignoring metrics server setup
Solution
Step 1: Understand HPA multi-metric support
The Horizontal Pod Autoscaler (autoscaling/v2) accepts multiple metrics in a single configuration. It computes a proposed replica count for each metric and uses the highest.
Step 2: Evaluate options for best practice
"Configure HPA with multiple metrics: CPU utilization and custom queue length, setting thresholds for both" uses multiple metrics in one HPA. This is efficient and avoids the conflicts caused by multiple HPAs targeting the same deployment.
Final Answer:
Configure HPA with multiple metrics: CPU utilization and custom queue length, setting thresholds for both -> Option A
Quick Check:
Single HPA with multiple metrics = A [OK]
Common mistakes:
- Using multiple HPAs on same deployment causing conflicts
- Ignoring custom metrics support
- Relying only on CPU metrics
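According to the Kubernetes documentation, when an autoscaling/v2 HPA lists several metrics, the controller computes a proposed replica count for each one and uses the highest. The Python sketch below models that rule while ignoring tolerance and stabilization. The queue-length metric is assumed to be a per-pod average (as with an `AverageValue` target), and all numbers are hypothetical.

```python
from __future__ import annotations

import math


def replicas_for_metric(current_replicas: int, current_value: float, target_value: float) -> int:
    """Proposed replicas for one metric: ceil(current * current_value / target_value)."""
    return math.ceil(current_replicas * current_value / target_value)


def desired_replicas_multi(current_replicas: int,
                           metrics: dict[str, tuple[float, float]],
                           min_replicas: int,
                           max_replicas: int) -> int:
    """Compute a proposal per metric, take the largest, then clamp to the replica bounds."""
    proposals = [replicas_for_metric(current_replicas, cur, tgt) for cur, tgt in metrics.values()]
    return max(min_replicas, min(max_replicas, max(proposals)))


# (current value, target value) per metric
metrics = {"cpu_utilization": (45, 60), "queue_length_per_pod": (50, 20)}
print(desired_replicas_multi(4, metrics, min_replicas=2, max_replicas=12))  # 10
```

Here CPU alone would suggest 3 replicas, but the queue backlog suggests 10, so the HPA scales to 10. This is why a single HPA with both metrics reacts to whichever signal is under the most pressure.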
