
Horizontal Pod Autoscaler in Microservices - System Design Exercise

Design: Horizontal Pod Autoscaler (HPA) System
Design the autoscaling control loop and its integration with Kubernetes. Out of scope: detailed Kubernetes cluster management, pod scheduling, and application-level scaling logic.
Functional Requirements
FR1: Automatically scale the number of pod replicas in a Kubernetes cluster based on observed metrics.
FR2: Support scaling based on resource metrics (CPU utilization, memory usage) and custom metrics like request rate.
FR3: Ensure minimum and maximum pod replica limits are respected.
FR4: Provide near real-time scaling decisions with latency under 30 seconds.
FR5: Maintain system availability during scaling operations.
FR6: Expose metrics and scaling status for monitoring.
Non-Functional Requirements
NFR1: Handle up to 10,000 pods across multiple namespaces.
NFR2: Scaling decisions must be made every 15 seconds or less.
NFR3: System availability target of 99.9% uptime.
NFR4: Scaling actions should avoid thrashing (rapid scale up/down).
NFR5: Integrate with Kubernetes API and metrics server.
Key Components
Metrics Collector (e.g., Metrics Server or Prometheus Adapter)
Autoscaler Controller (control loop logic)
Kubernetes API Server integration
Scaling Decision Engine
Rate Limiter or Stabilizer to prevent thrashing
Monitoring and Alerting system
Design Patterns
Control Loop Pattern for continuous monitoring and action
Observer Pattern for metrics collection
Circuit Breaker or Rate Limiting to avoid thrashing
Leader Election for high availability of autoscaler
Event-driven architecture for reacting to metric changes
Reference Architecture
                        +-----------------------+
                        |  Metrics Server /     |
                        |  Custom Metrics API   |
                        +-----------+-----------+
                                    |
                                    v
+----------------+      +-----------------------+      +--------------------+
| Kubernetes API |<---->| Horizontal Pod        |<---->| Kubernetes Cluster |
| Server         |      | Autoscaler Controller |      | (Pods, Nodes)      |
+----------------+      +-----------------------+      +--------------------+
                                    ^
                                    |
                        +-----------------------+
                        | Monitoring & Logging  |
                        +-----------------------+
Components
Metrics Server / Custom Metrics API
Kubernetes Metrics Server, Prometheus Adapter
Collects resource usage metrics (CPU, memory) and custom metrics from pods and nodes.
Horizontal Pod Autoscaler Controller
Kubernetes Controller written in Go
Runs control loop to fetch metrics, calculate desired replicas, and update Kubernetes API.
Kubernetes API Server
Kubernetes Core Component
Exposes API to read and update pod replica counts and other cluster state.
Kubernetes Cluster (Pods and Nodes)
Containerized microservices running in pods
Hosts the application workloads that are scaled by the autoscaler.
Monitoring & Logging
Prometheus, Grafana, ELK Stack
Tracks autoscaler performance, scaling events, and system health.
Request Flow
1. Metrics Server collects CPU and custom metrics from pods and nodes.
2. Horizontal Pod Autoscaler Controller queries Metrics Server periodically (every 15 seconds).
3. Controller calculates the desired number of replicas based on target utilization and current metrics.
4. Controller checks minimum and maximum replica constraints.
5. Controller updates the Kubernetes API Server with the new replica count if scaling is needed.
6. The Deployment and ReplicaSet controllers observe the new replica count through the API Server and create or delete pods in the cluster.
7. Monitoring system records scaling events and metrics for visibility.
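The flow above can be sketched as a small simulation. This is only an illustration under stated assumptions: `run_control_loop` and `total_load` are hypothetical names, load is assumed to spread evenly over pods, and metrics and scaling are simulated in memory rather than read from a real Metrics Server or written to the Kubernetes API.

```python
import math


def run_control_loop(total_load, replicas, target_utilization,
                     min_replicas, max_replicas, ticks):
    """Simulate the request flow: observe, compute, clamp, apply, repeat."""
    history = [replicas]
    for _ in range(ticks):
        # Steps 1-2: observe average utilization per pod (simulated:
        # the total load is spread evenly over the running pods).
        utilization = total_load / replicas
        # Step 3: proportional rule, rounded up so the target is not exceeded.
        desired = math.ceil(replicas * utilization / target_utilization)
        # Step 4: respect the configured replica bounds.
        desired = max(min_replicas, min(max_replicas, desired))
        # Steps 5-6: apply the new count (a real controller would update
        # the scale target through the API Server here).
        replicas = desired
        # Step 7: record the outcome for monitoring.
        history.append(replicas)
    return history


history = run_control_loop(total_load=400, replicas=2, target_utilization=50,
                           min_replicas=1, max_replicas=10, ticks=3)
print(history)  # [2, 8, 8, 8]
```

With a steady load the loop settles after one pass, because the proportional rule jumps straight to the replica count that brings average utilization back to the target.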
Database Schema
Not applicable as Kubernetes stores state in etcd. Key entities: HorizontalPodAutoscaler resource with fields: target metrics, minReplicas, maxReplicas, currentReplicas, desiredReplicas, lastScaleTime.
Scaling Discussion
Bottlenecks
Metrics Server overload when collecting metrics from thousands of pods.
Autoscaler Controller becoming a single point of failure.
API Server rate limits when many scaling requests happen simultaneously.
Thrashing due to rapid scale up/down cycles.
Latency in metrics collection causing delayed scaling decisions.
Solutions
Use scalable metrics backends like Prometheus with efficient scraping and aggregation.
Implement leader election among multiple autoscaler controller instances for high availability.
Batch scaling requests and use exponential backoff to avoid API rate limits.
Add stabilization windows and cooldown periods to prevent thrashing.
Optimize metrics collection intervals and use predictive scaling techniques.
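The stabilization window mentioned in the solutions can be sketched as follows. This is a minimal illustration, not the Kubernetes implementation: `ScaleDownStabilizer` is a hypothetical class, and the 300-second window is an assumed value chosen for the example. The idea is to act on the highest recommendation seen within the window, so scale-ups pass through immediately while scale-downs wait until the window contains only lower values.

```python
from collections import deque
from typing import Deque, Tuple


class ScaleDownStabilizer:
    """Remembers recent recommendations and acts on the highest recent one."""

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        # Each entry is (timestamp in seconds, recommended replica count).
        self._history: Deque[Tuple[float, int]] = deque()

    def stabilize(self, now: float, recommended: int) -> int:
        self._history.append((now, recommended))
        # Forget recommendations that have aged out of the window.
        while self._history and self._history[0][0] < now - self.window_seconds:
            self._history.popleft()
        # The maximum lets scale-ups through and delays scale-downs.
        return max(replicas for _, replicas in self._history)


stabilizer = ScaleDownStabilizer(window_seconds=300)
decisions = [
    stabilizer.stabilize(now=0, recommended=8),
    stabilizer.stabilize(now=60, recommended=3),   # brief dip is held back
    stabilizer.stabilize(now=120, recommended=9),  # scale-up passes through
    stabilizer.stabilize(now=500, recommended=3),  # older entries have expired
]
print(decisions)  # [8, 8, 9, 3]
```

A longer window trades slower scale-down (and higher cost) for fewer oscillations, which is the trade-off to raise when discussing thrashing.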
Interview Tips
Time: Spend 10 minutes understanding requirements and clarifying metrics, 20 minutes designing the control loop and components, and 10 minutes discussing scaling challenges and trade-offs. Use the last 5 minutes for questions and summary.
Explain the control loop concept and how metrics drive scaling decisions.
Discuss integration with Kubernetes API and metrics sources.
Highlight how to prevent thrashing with stabilization techniques.
Mention high availability via leader election for the autoscaler controller.
Address scaling bottlenecks and realistic latency targets.

Practice

1. What is the primary purpose of a Horizontal Pod Autoscaler in a Kubernetes microservices environment?
easy
A. Store persistent data for pods
B. Manually restart pods when they fail
C. Balance network traffic between pods
D. Automatically adjust the number of pods based on CPU or custom metrics

Solution

  1. Step 1: Understand the role of Horizontal Pod Autoscaler

    It is designed to monitor resource usage like CPU or custom metrics and adjust pod count automatically.
  2. Step 2: Compare options with this role

    Only option D, "Automatically adjust the number of pods based on CPU or custom metrics", describes automatic scaling based on load, which matches the autoscaler's purpose.
  3. Final Answer:

    Automatically adjust the number of pods based on CPU or custom metrics -> Option D
  4. Quick Check:

    Autoscaler adjusts pods automatically = D [OK]
Hint: Autoscaler changes pod count automatically based on load [OK]
Common Mistakes:
  • Confusing autoscaler with manual pod management
  • Thinking it balances network traffic
  • Assuming it stores data persistently
2. Which of the following is the correct YAML snippet to define a Horizontal Pod Autoscaler targeting CPU utilization at 50% for a deployment named web-app?
easy
A.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
B.
apiVersion: v1
kind: Pod
metadata:
  name: web-app
spec:
  containers:
  - name: web-app
    image: web-app:latest
C.
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 50
D.
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 50

Solution

  1. Step 1: Identify correct API version and fields for CPU target

    autoscaling/v1 supports targetCPUUtilizationPercentage directly; v2 requires metrics array.
  2. Step 2: Check min/max replicas and target CPU utilization

    Option C uses autoscaling/v1 with minReplicas 2, maxReplicas 10, and targetCPUUtilizationPercentage 50, which is valid syntax and matches the 50% target. Option A is valid autoscaling/v2 syntax but targets 70%, option B defines a Pod rather than an autoscaler, and option D targets memory instead of CPU.
  3. Final Answer:

    YAML with autoscaling/v1 and targetCPUUtilizationPercentage 50% -> Option C
  4. Quick Check:

    autoscaling/v1 + targetCPUUtilizationPercentage = C [OK]
Hint: autoscaling/v1 uses targetCPUUtilizationPercentage field [OK]
Common Mistakes:
  • Using wrong apiVersion for the fields
  • Confusing CPU with memory metrics
  • Setting minReplicas higher than maxReplicas
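To make the v1 versus v2 difference in question 2 concrete, the sketch below rewrites a v1-style spec into the v2 metrics-array form. The helper `v1_spec_to_v2` is hypothetical and works on plain dictionaries; it only illustrates how the fields map and is not part of any Kubernetes client library.

```python
from typing import Any, Dict


def v1_spec_to_v2(spec_v1: Dict[str, Any]) -> Dict[str, Any]:
    """Rewrite an autoscaling/v1 spec dict into the autoscaling/v2 metrics form."""
    # Copy every field except the v1-only CPU shortcut.
    spec_v2 = {key: value for key, value in spec_v1.items()
               if key != "targetCPUUtilizationPercentage"}
    cpu_target = spec_v1.get("targetCPUUtilizationPercentage")
    if cpu_target is not None:
        # In v2 the same target is expressed as one entry in the metrics list.
        spec_v2["metrics"] = [{
            "type": "Resource",
            "resource": {
                "name": "cpu",
                "target": {"type": "Utilization",
                           "averageUtilization": cpu_target},
            },
        }]
    return spec_v2


# Spec section of the autoscaling/v1 manifest from the correct answer.
spec_from_answer = {
    "scaleTargetRef": {"apiVersion": "apps/v1", "kind": "Deployment",
                       "name": "web-app"},
    "minReplicas": 2,
    "maxReplicas": 10,
    "targetCPUUtilizationPercentage": 50,
}
converted = v1_spec_to_v2(spec_from_answer)
```

The converted spec carries the same 50% CPU target inside `metrics`, which is the shape the v2 options in the question use.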
3. Given this Horizontal Pod Autoscaler configuration:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 2
  maxReplicas: 6
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60

If the current CPU usage is 90% and there are 3 pods running, how many pods will the autoscaler try to set?
medium
A. 5 pods
B. 3 pods
C. 6 pods
D. 4 pods

Solution

  1. Step 1: Understand scaling formula based on CPU utilization

    Desired replicas = current replicas * (current CPU / target CPU) = 3 * (90/60) = 4.5
  2. Step 2: Round up and check min/max limits

    4.5 rounds up to 5, which is between minReplicas 2 and maxReplicas 6, so 5 pods will be set.
  3. Final Answer:

    5 pods -> Option A
  4. Quick Check:

    3 * (90/60) = 4.5 -> 5 pods [OK]
Hint: Multiply current pods by (current CPU ÷ target CPU) [OK]
Common Mistakes:
  • Rounding down instead of up
  • Ignoring min/max replica limits
  • Using target CPU as current CPU
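The arithmetic in question 3 can be checked with a short function. This is a sketch under assumptions: `desired_replicas` is a hypothetical helper, and the `tolerance` parameter (no change when the ratio is within 10% of the target) is included as an assumed default to show why small deviations do not trigger scaling.

```python
import math


def desired_replicas(current_replicas, current_utilization, target_utilization,
                     min_replicas, max_replicas, tolerance=0.1):
    """Proportional scaling rule with round-up, tolerance, and min/max clamping."""
    ratio = current_utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: keep the current count.
        desired = current_replicas
    else:
        # Round up so the resulting average lands at or below the target.
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


question_3 = desired_replicas(3, 90, 60, 2, 6)        # 3 * 1.5 = 4.5, rounded up
capped = desired_replicas(5, 90, 60, 2, 6)            # 7.5 rounds to 8, capped at 6
within_tolerance = desired_replicas(3, 63, 60, 2, 6)  # ratio 1.05, no change
print(question_3, capped, within_tolerance)  # 5 6 3
```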
4. You configured a Horizontal Pod Autoscaler but notice it never scales pods beyond the minimum replicas even under high load. What is the most likely cause?
medium
A. The maxReplicas is set lower than minReplicas
B. The metrics server is not running or not providing metrics
C. The deployment has too many replicas already
D. The pods are using too little CPU

Solution

  1. Step 1: Check autoscaler dependency on metrics

    Horizontal Pod Autoscaler requires metrics server to get CPU or custom metrics to decide scaling.
  2. Step 2: Understand effect of missing metrics

    If the metrics server is missing or not returning data, the autoscaler cannot compute a desired replica count, so it takes no scaling action and the workload stays at its current size (here, minReplicas).
  3. Final Answer:

    The metrics server is not running or not providing metrics -> Option B
  4. Quick Check:

    Missing metrics = no scaling beyond minReplicas [OK]
Hint: Autoscaler needs metrics server to scale pods [OK]
Common Mistakes:
  • Assuming maxReplicas lower than minReplicas causes this
  • Thinking high load always triggers scaling
  • Ignoring metrics server setup
5. You want to design a microservices system that scales pods horizontally based on both CPU usage and custom queue length metrics. Which approach best uses Horizontal Pod Autoscaler to achieve this?
hard
A. Configure HPA with multiple metrics: CPU utilization and custom queue length, setting thresholds for both
B. Use two separate HPAs, one for CPU and one for queue length, targeting the same deployment
C. Scale pods manually based on CPU and queue length metrics collected externally
D. Configure HPA to scale only on CPU and ignore queue length metrics

Solution

  1. Step 1: Understand HPA multi-metric support

    Horizontal Pod Autoscaler supports multiple metrics in a single configuration to scale pods based on combined criteria.
  2. Step 2: Evaluate options for best practice

    Option A puts both metrics in a single HPA, which is efficient and avoids the conflicts that arise when multiple HPAs target the same deployment.
  3. Final Answer:

    Configure HPA with multiple metrics: CPU utilization and custom queue length, setting thresholds for both -> Option A
  4. Quick Check:

    Single HPA with multiple metrics = A [OK]
Hint: Use one HPA with multiple metrics for combined scaling [OK]
Common Mistakes:
  • Using multiple HPAs on same deployment causing conflicts
  • Ignoring custom metrics support
  • Relying only on CPU metrics
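When one HPA carries several metrics, each metric yields its own replica proposal and the largest proposal is used, so the most demanding metric drives scaling. The sketch below illustrates that rule; `desired_for_metrics` and the metric names are hypothetical, and the values are made up for the example.

```python
import math
from typing import Dict, Tuple


def desired_for_metrics(
    current_replicas: int,
    metrics: Dict[str, Tuple[float, float]],
) -> Tuple[int, Dict[str, int]]:
    """metrics maps a metric name to (current value, target value)."""
    # One proposal per metric, using the same proportional rule for each.
    proposals = {
        name: math.ceil(current_replicas * current / target)
        for name, (current, target) in metrics.items()
    }
    # The largest proposal wins, so no metric is left above its target.
    return max(proposals.values()), proposals


chosen, per_metric = desired_for_metrics(
    4,
    {
        "cpu_utilization": (40.0, 50.0),       # below target: proposes 4
        "queue_length_per_pod": (30.0, 10.0),  # three times target: proposes 12
    },
)
print(chosen, per_metric)  # 12 {'cpu_utilization': 4, 'queue_length_per_pod': 12}
```

Here CPU alone would keep 4 pods, but the queue backlog pushes the proposal to 12 (before any min/max clamping), which is why a CPU-only HPA can miss queue-driven load.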