Horizontal Pod Autoscaler in Microservices - Architecture Diagram

System Overview - Horizontal Pod Autoscaler

The Horizontal Pod Autoscaler (HPA) automatically adjusts the number of pod replicas in a microservices environment based on observed metrics such as CPU utilization or custom metrics. It scales the application out when demand rises and back in when demand falls, preserving performance without paying for idle capacity.

Architecture Diagram
User
  |
  v
Load Balancer
  |
  v
API Gateway
  |
  v
Service Pods <--> Metrics Server
  |
  v
Horizontal Pod Autoscaler
  |
  v
Kubernetes Control Plane
  |
  v
Container Runtime & Nodes
Components
User (client): Sends requests to the application.
Load Balancer (load_balancer): Distributes incoming traffic evenly across backend instances.
API Gateway (api_gateway): Routes requests to the appropriate service pods.
Service Pods (service): Run application instances that handle requests.
Metrics Server (metrics_collector): Collects CPU and memory usage for pods from each node's kubelet and exposes it through the Metrics API.
Horizontal Pod Autoscaler (autoscaler): Monitors metrics and adjusts the pod replica count accordingly.
Kubernetes Control Plane (orchestrator): Manages cluster state and pod lifecycle.
Container Runtime & Nodes (infrastructure): Host and run containerized pods.
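Request Flow - 7 Hops
User -> Load Balancer
Load Balancer -> API Gateway
API Gateway -> Service Pods
Service Pods -> Metrics Server
Horizontal Pod Autoscaler -> Metrics Server
Horizontal Pod Autoscaler -> Kubernetes Control Plane
Kubernetes Control Plane -> Container Runtime & Nodes
The first three hops carry user requests. The last four form the scaling control loop: pod usage is reported to the Metrics Server, the autoscaler queries it, and the autoscaler asks the control plane to change the replica count, which starts or stops pods on the nodes.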
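The last four hops can be sketched as a single pass of the autoscaler's loop. This is a minimal illustration, not the real controller: `FakeMetricsServer`, `FakeControlPlane`, and `reconcile_once` are hypothetical stand-ins, and the sketch omits the tolerance check and stabilization windows a real HPA applies. A real HPA reads the Metrics API and updates the scale subresource of the target Deployment.

```python
import math
from dataclasses import dataclass, field


@dataclass
class FakeMetricsServer:
    """Hypothetical stand-in for the Metrics Server."""
    # CPU utilization per pod, as a percentage of each pod's CPU request.
    pod_cpu_percent: dict[str, float] = field(default_factory=dict)

    def average_cpu_utilization(self) -> float:
        values = list(self.pod_cpu_percent.values())
        return sum(values) / len(values)


@dataclass
class FakeControlPlane:
    """Hypothetical stand-in for the Kubernetes control plane."""
    replicas: int

    def set_replicas(self, n: int) -> None:
        # A real control plane would create or delete pods on nodes here.
        self.replicas = n


def reconcile_once(metrics: FakeMetricsServer, control_plane: FakeControlPlane,
                   target_percent: float, min_r: int, max_r: int) -> int:
    """One autoscaler iteration: read metrics, decide, apply."""
    current = control_plane.replicas
    utilization = metrics.average_cpu_utilization()       # query metrics
    desired = math.ceil(current * utilization / target_percent)  # decide
    desired = max(min_r, min(max_r, desired))              # respect bounds
    if desired != current:
        control_plane.set_replicas(desired)                # actuate
    return desired
```

Keeping metrics, decision, and actuation behind separate objects mirrors the separation of concerns described in the design principle: a different metrics source could be swapped in without touching the decision logic.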
Failure Scenario
Component Fails: Metrics Server
Impact: The Horizontal Pod Autoscaler cannot get current metrics, so it cannot make scaling decisions and leaves the replica count where it is. As load changes, the system may become under- or over-provisioned.
Mitigation: Use a fallback default replica count or integrate redundant metrics sources, and alert operators to fix the Metrics Server.
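The fallback mitigation could look like the sketch below, assuming it runs in a custom controller or wrapper script: the built-in HPA has no fallback-replica field and simply stops changing the replica count when it cannot get metrics. `MetricsUnavailable`, `choose_replicas`, and its parameters are hypothetical names used only for illustration.

```python
from typing import Callable


class MetricsUnavailable(Exception):
    """Hypothetical error raised when metrics cannot be fetched."""


def choose_replicas(
    compute_desired: Callable[[], int],
    fallback_replicas: int,
    alert: Callable[[str], None],
) -> int:
    """Return the metric-driven replica count, or a fixed fallback.

    compute_desired, fallback_replicas, and alert are illustrative
    parameters, not part of the Kubernetes HPA API.
    """
    try:
        return compute_desired()
    except MetricsUnavailable as exc:
        # Notify operators that the metrics source needs attention.
        alert(f"metrics unavailable, using fallback: {exc}")
        return fallback_replicas
```

Choosing the fallback value is a trade-off: a higher count wastes resources during the outage, while a lower one risks overload if traffic spikes before metrics return.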
Architecture Quiz - 3 Questions
Test your understanding
Which component decides when to add or remove pod replicas?
A. API Gateway
B. Load Balancer
C. Horizontal Pod Autoscaler
D. Metrics Server
Design Principle
This architecture demonstrates dynamic scaling by monitoring real-time metrics and adjusting resources automatically. It separates concerns: metrics collection, decision making, and pod lifecycle management, enabling efficient and responsive scaling.

Practice

1. What is the primary purpose of a Horizontal Pod Autoscaler in a Kubernetes microservices environment?
easy
A. Store persistent data for pods
B. Manually restart pods when they fail
C. Balance network traffic between pods
D. Automatically adjust the number of pods based on CPU or custom metrics

Solution

  1. Step 1: Understand the role of Horizontal Pod Autoscaler

    It is designed to monitor resource usage like CPU or custom metrics and adjust pod count automatically.
  2. Step 2: Compare options with this role

    Only Option D describes automatic scaling based on load, which matches the autoscaler's purpose.
  3. Final Answer:

    Automatically adjust the number of pods based on CPU or custom metrics -> Option D
  4. Quick Check:

    Autoscaler adjusts pods automatically = D [OK]
Hint: Autoscaler changes pod count automatically based on load [OK]
Common Mistakes:
  • Confusing autoscaler with manual pod management
  • Thinking it balances network traffic
  • Assuming it stores data persistently
2. Which of the following is the correct YAML snippet to define a Horizontal Pod Autoscaler targeting CPU utilization at 50% for a deployment named web-app?
easy
A.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
B.
apiVersion: v1
kind: Pod
metadata:
  name: web-app
spec:
  containers:
  - name: web-app
    image: web-app:latest
C.
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 50
D.
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 50

Solution

  1. Step 1: Identify correct API version and fields for CPU target

    autoscaling/v1 supports targetCPUUtilizationPercentage directly; v2 requires metrics array.
  2. Step 2: Check min/max replicas and target CPU utilization

    Option C uses autoscaling/v1 with minReplicas 2, maxReplicas 10, and targetCPUUtilizationPercentage 50, which is valid syntax and matches the requirement. Option A is valid autoscaling/v2 syntax but targets 70% CPU, Option D targets memory instead of CPU, and Option B defines a Pod rather than an autoscaler.
  3. Final Answer:

    YAML with autoscaling/v1 and targetCPUUtilizationPercentage 50% -> Option C
  4. Quick Check:

    autoscaling/v1 + targetCPUUtilizationPercentage = C [OK]
Hint: autoscaling/v1 uses targetCPUUtilizationPercentage field [OK]
Common Mistakes:
  • Using wrong apiVersion for the fields
  • Confusing CPU with memory metrics
  • Setting minReplicas higher than maxReplicas
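Step 1's point, that autoscaling/v2 moves the CPU target into the metrics array, can be shown by converting Option C. This is a sketch assuming the manifest has already been parsed into a Python dict (for example with a YAML library); `v1_to_v2` is a hypothetical helper that handles only the CPU-utilization field that autoscaling/v1 supports.

```python
import copy


def v1_to_v2(hpa_v1: dict) -> dict:
    """Convert an autoscaling/v1 HPA manifest dict to autoscaling/v2 form."""
    hpa = copy.deepcopy(hpa_v1)  # leave the caller's manifest untouched
    hpa["apiVersion"] = "autoscaling/v2"
    spec = hpa["spec"]
    # v1 has a single CPU field; v2 expresses the same target as a metric.
    target = spec.pop("targetCPUUtilizationPercentage", None)
    if target is not None:
        spec["metrics"] = [
            {
                "type": "Resource",
                "resource": {
                    "name": "cpu",
                    "target": {"type": "Utilization", "averageUtilization": target},
                },
            }
        ]
    return hpa
```

Applied to Option C, the result has the same shape as Option A but with minReplicas 2, maxReplicas 10, and averageUtilization 50, so an autoscaling/v2 manifest would also satisfy the question if it targeted 50%.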
3. Given this Horizontal Pod Autoscaler configuration:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 2
  maxReplicas: 6
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60

If the average CPU utilization across the pods is currently 90% and 3 pods are running, how many pods will the autoscaler try to set?
medium
A. 5 pods
B. 3 pods
C. 6 pods
D. 4 pods

Solution

  1. Step 1: Understand scaling formula based on CPU utilization

    Desired replicas = current replicas * (current CPU / target CPU) = 3 * (90/60) = 4.5
  2. Step 2: Round up and check min/max limits

    4.5 rounds up to 5, which is between minReplicas 2 and maxReplicas 6, so 5 pods will be set.
  3. Final Answer:

    5 pods -> Option A
  4. Quick Check:

    3 * (90/60) = 4.5 -> 5 pods [OK]
Hint: Multiply current pods by (current CPU ÷ target CPU) [OK]
Common Mistakes:
  • Rounding down instead of up
  • Ignoring min/max replica limits
  • Using target CPU as current CPU
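The calculation in Steps 1 and 2 follows the HPA algorithm, desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). The controller also leaves the replica count alone when that ratio is within a tolerance of 1.0 (0.1 by default). A sketch that includes both plus min/max clamping; `desired_replicas` is an illustrative name, and stabilization windows and scaling policies are omitted.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_utilization: float,
    target_utilization: float,
    min_replicas: int,
    max_replicas: int,
    tolerance: float = 0.1,
) -> int:
    """Replica count an HPA would request for a single utilization metric."""
    ratio = current_utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to target: keep the current count to avoid flapping.
        desired = current_replicas
    else:
        # Round up so the new average lands at or below the target.
        desired = math.ceil(current_replicas * ratio)
    # Never go outside the configured bounds.
    return max(min_replicas, min(max_replicas, desired))
```

Rounding up is deliberate: spreading the same load over 4 pods gives an average of 67.5%, still above the 60% target, while 5 pods gives 54%.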
4. You configured a Horizontal Pod Autoscaler but notice it never scales pods beyond the minimum replicas even under high load. What is the most likely cause?
medium
A. The maxReplicas is set lower than minReplicas
B. The metrics server is not running or not providing metrics
C. The deployment has too many replicas already
D. The pods are using too little CPU

Solution

  1. Step 1: Check autoscaler dependency on metrics

    The Horizontal Pod Autoscaler reads CPU and memory usage from the Metrics API, which the Metrics Server provides; custom metrics require a separate metrics adapter.
  2. Step 2: Understand effect of missing metrics

    If the Metrics Server is missing or not returning data, the autoscaler cannot measure load and leaves the replica count unchanged, so a deployment running at minReplicas stays there.
  3. Final Answer:

    The metrics server is not running or not providing metrics -> Option B
  4. Quick Check:

    Missing metrics = no scaling beyond minReplicas [OK]
Hint: Autoscaler needs metrics server to scale pods [OK]
Common Mistakes:
  • Assuming maxReplicas lower than minReplicas causes this
  • Thinking high load always triggers scaling
  • Ignoring metrics server setup
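One way to confirm this cause is to inspect the HPA's status conditions in its autoscaling/v2 representation (for example, the JSON from `kubectl get hpa.v2.autoscaling <name> -o json`). When the autoscaler cannot read metrics, the `ScalingActive` condition is typically `False` with a reason such as `FailedGetResourceMetric`. The sketch below assumes the JSON is already parsed into a dict; `metrics_problem` is a hypothetical helper, and the test data is illustrative rather than captured output.

```python
from typing import Optional


def metrics_problem(hpa: dict) -> Optional[str]:
    """Return 'reason: message' when the ScalingActive condition is False.

    `hpa` is an autoscaling/v2 HorizontalPodAutoscaler object parsed from
    JSON; this helper is illustrative and not a kubectl feature.
    """
    conditions = hpa.get("status", {}).get("conditions", [])
    for cond in conditions:
        # ScalingActive=False means the HPA could not compute a replica
        # count, for example because it could not fetch metrics.
        if cond.get("type") == "ScalingActive" and cond.get("status") == "False":
            return f"{cond.get('reason')}: {cond.get('message')}"
    return None
```

The same symptom usually shows up as `<unknown>` in the TARGETS column of `kubectl get hpa`.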
5. You want to design a microservices system that scales pods horizontally based on both CPU usage and custom queue length metrics. Which approach best uses Horizontal Pod Autoscaler to achieve this?
hard
A. Configure HPA with multiple metrics: CPU utilization and custom queue length, setting thresholds for both
B. Use two separate HPAs, one for CPU and one for queue length, targeting the same deployment
C. Scale pods manually based on CPU and queue length metrics collected externally
D. Configure HPA to scale only on CPU and ignore queue length metrics

Solution

  1. Step 1: Understand HPA multi-metric support

    The Horizontal Pod Autoscaler supports multiple metrics in a single configuration: it computes a desired replica count for each metric and scales to the largest.
  2. Step 2: Evaluate options for best practice

    Option A puts both metrics in one HPA, which is efficient and avoids the conflicts that arise when multiple HPAs target the same deployment.
  3. Final Answer:

    Configure HPA with multiple metrics: CPU utilization and custom queue length, setting thresholds for both -> Option A
  4. Quick Check:

    Single HPA with multiple metrics = A [OK]
Hint: Use one HPA with multiple metrics for combined scaling [OK]
Common Mistakes:
  • Using multiple HPAs on same deployment causing conflicts
  • Ignoring custom metrics support
  • Relying only on CPU metrics
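The combination rule can be sketched in a few lines: each metric proposes its own replica count and the largest proposal wins. This assumes hypothetical metric names and targets (`cpu` as utilization percent, `queue_length` as average messages per pod), treats both as per-pod averages, and ignores tolerance for brevity; `desired_for_metric` and `desired_multi` are illustrative names.

```python
import math


def desired_for_metric(current_replicas: int, current: float, target: float) -> int:
    """Proposed replicas for one metric (per-pod average vs. per-pod target)."""
    return math.ceil(current_replicas * current / target)


def desired_multi(current_replicas: int, metrics: dict[str, tuple[float, float]],
                  min_replicas: int, max_replicas: int) -> int:
    """metrics maps a name to (current per-pod value, target per-pod value)."""
    proposals = {
        name: desired_for_metric(current_replicas, cur, tgt)
        for name, (cur, tgt) in metrics.items()
    }
    # The largest proposal wins, so no metric is left short of capacity.
    desired = max(proposals.values())
    return max(min_replicas, min(max_replicas, desired))
```

Taking the maximum is also why two separate HPAs on one deployment conflict: each would overwrite the other's replica count instead of agreeing on the larger one.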