Horizontal Pod Autoscaler in Microservices - System Design Exercise

Design: Horizontal Pod Autoscaler (HPA) System

Design the autoscaling control loop and its integration with Kubernetes. Out of scope: detailed Kubernetes cluster management, pod scheduling, and application-level scaling logic.

Functional Requirements

FR1: Automatically scale the number of pod replicas in a Kubernetes cluster based on observed metrics.

FR2: Support scaling based on resource metrics such as CPU and memory utilization, and on custom metrics such as request rate.

FR3: Ensure minimum and maximum pod replica limits are respected.

FR4: Provide near real-time scaling decisions with latency under 30 seconds.

FR5: Maintain system availability during scaling operations.

FR6: Expose metrics and scaling status for monitoring.

Non-Functional Requirements

NFR1: Handle up to 10,000 pods across multiple namespaces.

NFR2: Scaling decisions must be made every 15 seconds or less.

NFR3: System availability target of 99.9% uptime.

NFR4: Scaling actions should avoid thrashing (rapid scale up/down).

NFR5: Integrate with Kubernetes API and metrics server.

Key Components

Metrics Collector (e.g., Metrics Server or Prometheus Adapter)

Autoscaler Controller (control loop logic)

Kubernetes API Server integration

Scaling Decision Engine

Rate Limiter or Stabilizer to prevent thrashing

Monitoring and Alerting system

Design Patterns

Control Loop Pattern for continuous monitoring and action

Observer Pattern for metrics collection

Rate Limiting or Stabilization Windows to avoid thrashing

Leader Election for high availability of autoscaler

Event-driven architecture for reacting to metric changes
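Leader election can be sketched with a time-bounded lease: only the instance holding an unexpired lease runs the control loop, and a standby instance takes over once the leader stops renewing. This is a minimal single-process sketch under stated assumptions: the `Lease` class, its `try_acquire` method, and the 15-second duration are hypothetical. In a real cluster the lease lives in shared storage (Kubernetes components use `coordination.k8s.io` Lease objects), and acquisition must be an atomic compare-and-swap rather than a plain field update.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lease:
    """Hypothetical in-memory lease; a real one lives in shared storage."""

    duration_s: float = 15.0  # illustrative lease duration
    holder: Optional[str] = None
    renewed_at: float = 0.0

    def try_acquire(self, candidate: str, now: float) -> bool:
        """Acquire or renew the lease; return True if `candidate` now leads.

        Not atomic: a real implementation needs compare-and-swap on shared state.
        """
        expired = now - self.renewed_at >= self.duration_s
        if self.holder is None or self.holder == candidate or expired:
            # Free, already ours, or the previous holder stopped renewing in time.
            self.holder = candidate
            self.renewed_at = now
            return True
        return False


lease = Lease()
print(lease.try_acquire("hpa-a", now=0.0))   # True: the lease was free
print(lease.try_acquire("hpa-b", now=5.0))   # False: hpa-a still holds it
print(lease.try_acquire("hpa-a", now=10.0))  # True: hpa-a renews
print(lease.try_acquire("hpa-b", now=30.0))  # True: hpa-a's lease expired at t=25
```

Time is passed in explicitly so the takeover logic can be checked without a real clock.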

Reference Architecture

                        +-----------------------+
                        |  Metrics Server /     |
                        |  Custom Metrics API   |
                        +-----------+-----------+
                                    |
                                    v
+----------------+      +-----------------------+      +--------------------+
| Kubernetes API |<---->| Horizontal Pod        |<---->| Kubernetes Cluster |
| Server         |      | Autoscaler Controller |      | (Pods, Nodes)      |
+----------------+      +-----------------------+      +--------------------+
                                    ^
                                    |
                        +-----------+-----------+
                        | Monitoring & Logging  |
                        +-----------------------+

Components

Metrics Server / Custom Metrics API

Kubernetes Metrics Server, Prometheus Adapter

Collects resource usage metrics (CPU, memory) and custom metrics from pods and nodes.

Horizontal Pod Autoscaler Controller

Kubernetes Controller written in Go

Runs the control loop that fetches metrics, calculates the desired replica count, and updates the target workload through the Kubernetes API.

Kubernetes API Server

Kubernetes Core Component

Exposes the API used to read and update replica counts (via the scale subresource of Deployments and other workloads) and other cluster state.

Kubernetes Cluster (Pods and Nodes)

Containerized microservices running in pods

Hosts the application workloads that are scaled by the autoscaler.

Monitoring & Logging

Prometheus, Grafana, ELK Stack

Tracks autoscaler performance, scaling events, and system health.

Request Flow

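1. Metrics Server collects CPU and memory usage from the kubelet on each node; a custom metrics adapter (e.g., Prometheus Adapter) exposes custom metrics.

2. The Horizontal Pod Autoscaler Controller queries these metrics periodically (every 15 seconds by default) through the resource and custom metrics APIs (metrics.k8s.io and custom.metrics.k8s.io), which the API Server serves via its aggregation layer.

3. The controller calculates the desired number of replicas from the current metric values and the target utilization.

4. The controller clamps the result to the minimum and maximum replica limits.

5. If scaling is needed, the controller updates the target workload's replica count through the Kubernetes API Server.

6. The Deployment and ReplicaSet controllers observe the new replica count and create or delete pods; the scheduler places new pods on nodes.

7. The monitoring system records scaling events and metrics for visibility.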
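Steps 3 and 4 can be made concrete. The Kubernetes documentation gives the core HPA formula as desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue); the controller skips scaling when that ratio is within a tolerance of 1.0 (0.1 by default), computes a proposal per metric and keeps the largest, and then applies the replica bounds. The sketch below follows those rules but is simplified: it ignores missing metrics, pod readiness, and scaling policies, and `desired_replicas` with its parameters is a hypothetical helper, not the controller's actual code.

```python
import math


def desired_replicas(current: int, metrics: list[tuple[float, float]],
                     min_replicas: int, max_replicas: int,
                     tolerance: float = 0.1) -> int:
    """Return the replica count for one control-loop iteration (hypothetical helper).

    `metrics` holds (current_value, target_value) pairs, e.g. average CPU
    utilization 90 against a target of 50.
    """
    proposals = []
    for current_value, target_value in metrics:
        ratio = current_value / target_value
        if abs(ratio - 1.0) <= tolerance:
            proposals.append(current)  # within tolerance: keep the current count
        else:
            proposals.append(math.ceil(current * ratio))
    # With several metrics, the largest proposal wins.
    proposal = max(proposals) if proposals else current
    # Step 4: respect the configured bounds.
    return max(min_replicas, min(max_replicas, proposal))


print(desired_replicas(4, [(90.0, 50.0)], 2, 10))  # ceil(4 * 1.8) = 8
print(desired_replicas(4, [(52.0, 50.0)], 2, 10))  # ratio 1.04 is within tolerance: 4
print(desired_replicas(8, [(90.0, 50.0), (300.0, 100.0)], 2, 20))  # max(15, 24) clamped to 20
```

Taking the maximum across metrics means a service scales out when any one signal (CPU or request rate, say) runs hot.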

Database Schema

Not applicable: Kubernetes persists object state in etcd. The key entity is the HorizontalPodAutoscaler resource: its spec holds the target metrics, minReplicas, and maxReplicas, and its status holds currentReplicas, desiredReplicas, and lastScaleTime.
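The fields listed above can be modeled as two plain records, split the way Kubernetes splits them between spec (desired configuration) and status (observed state). This is an illustrative sketch, not the real API types: the class names, the simplified `target_metrics` dict, the `record_decision` method, and the validation rule in `__post_init__` are all assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class HPASpec:
    """Desired configuration set by the user (simplified, hypothetical type)."""

    min_replicas: int
    max_replicas: int
    # Assumed shape: metric name -> target value, e.g. {"cpu": 50.0}.
    target_metrics: dict[str, float] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Assumed validation rule: bounds must be positive and ordered.
        if self.min_replicas < 1 or self.max_replicas < self.min_replicas:
            raise ValueError("require 1 <= min_replicas <= max_replicas")


@dataclass
class HPAStatus:
    """Observed state written back by the controller (simplified, hypothetical type)."""

    current_replicas: int = 0
    desired_replicas: int = 0
    last_scale_time: Optional[datetime] = None

    def record_decision(self, desired: int, now: datetime) -> None:
        """Store a decision; last_scale_time moves only when the count changes."""
        if desired != self.current_replicas:
            self.last_scale_time = now
        self.desired_replicas = desired


spec = HPASpec(min_replicas=2, max_replicas=10, target_metrics={"cpu": 50.0})
status = HPAStatus(current_replicas=3, desired_replicas=3)
status.record_decision(6, datetime(2024, 1, 1, 12, 0))
print(status.desired_replicas, status.last_scale_time)
```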

Scaling Discussion

Bottlenecks

Metrics Server overload when collecting metrics from thousands of pods.

Autoscaler Controller becoming a single point of failure.

API Server rate limits when many scaling requests happen simultaneously.

Thrashing due to rapid scale up/down cycles.

Latency in metrics collection causing delayed scaling decisions.
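For the API Server rate-limit bottleneck, a common client-side mitigation is to retry rejected requests with exponential backoff plus random jitter, so that many controllers do not retry in lockstep. This is a minimal sketch: `RateLimited` and `with_backoff` are hypothetical names, and the exception stands in for whatever error your API client raises when the server answers HTTP 429 (Too Many Requests). The delay parameters are illustrative, not recommended values.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical error for a rate-limited API call (e.g. HTTP 429)."""


def with_backoff(call: Callable[[], T], max_attempts: int = 5,
                 base_delay_s: float = 0.5, max_delay_s: float = 8.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry `call` on RateLimited, doubling the delay cap each time (full jitter)."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 0
    while True:
        try:
            return call()
        except RateLimited:
            attempt += 1
            if attempt >= max_attempts:
                raise  # out of attempts: surface the error to the caller
            cap = min(max_delay_s, base_delay_s * 2 ** (attempt - 1))
            sleep(random.uniform(0, cap))  # random delay in [0, cap] spreads retries
```

Injecting `sleep` keeps the function testable; production callers can leave the default `time.sleep`.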

Solutions

Use scalable metrics backends like Prometheus with efficient scraping and aggregation.

Implement leader election among multiple autoscaler controller instances for high availability.

Batch scaling requests and use exponential backoff to avoid API rate limits.

Add stabilization windows and cooldown periods to prevent thrashing.

Optimize metrics collection intervals and use predictive scaling techniques.
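Stabilization windows, from the list above, can be sketched as follows. The Kubernetes HPA documents this behavior for scale-down: it keeps recent recommendations and acts on the highest one within the window (300 seconds by default), so a brief dip in load does not remove pods that will be needed again moments later. This sketch applies that rule to scale-down only and passes scale-up through immediately; `ScaleDownStabilizer` and its method are hypothetical names, and timestamps are plain seconds for clarity.

```python
from collections import deque


class ScaleDownStabilizer:
    """Hypothetical helper modeled on the documented scale-down rule."""

    def __init__(self, window_s: float = 300.0) -> None:
        self.window_s = window_s
        self.history: deque = deque()  # (timestamp, recommended replicas)

    def stabilize(self, now: float, current: int, recommended: int) -> int:
        self.history.append((now, recommended))
        # Drop recommendations that have aged out of the window.
        while self.history and now - self.history[0][0] > self.window_s:
            self.history.popleft()
        if recommended >= current:
            return recommended  # scale-up (or no change) is applied immediately
        # Scale-down: use the highest recommendation still inside the window.
        return min(current, max(r for _, r in self.history))


stab = ScaleDownStabilizer()
print(stab.stabilize(0, 10, 10))   # 10
print(stab.stabilize(60, 10, 4))   # 10: the earlier 10 is still in the window
print(stab.stabilize(301, 10, 4))  # 4: the 10 from t=0 has aged out
```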

Interview Tips

Time: Spend 10 minutes understanding requirements and clarifying metrics, 20 minutes designing the control loop and components, and 10 minutes discussing scaling challenges and trade-offs. Use the last 5 minutes for questions and a summary.

Explain the control loop concept and how metrics drive scaling decisions.

Discuss integration with Kubernetes API and metrics sources.

Highlight how to prevent thrashing with stabilization techniques.

Mention high availability via leader election for the autoscaler controller.

Address scaling bottlenecks and realistic latency targets.

Practice


1. What is the primary purpose of a Horizontal Pod Autoscaler in a Kubernetes microservices environment?

easy

A. Store persistent data for pods

B. Manually restart pods when they fail

C. Balance network traffic between pods

D. Automatically adjust the number of pods based on CPU or custom metrics

2. Which of the following is the correct YAML snippet to define a Horizontal Pod Autoscaler targeting CPU utilization at 50% for a deployment named web-app?

easy

A.
    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    metadata:
      name: web-app-hpa
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: web-app
      minReplicas: 1
      maxReplicas: 5
      metrics:
        - type: Resource
          resource:
            name: cpu
            target:
              type: Utilization
              averageUtilization: 70

B.
    apiVersion: v1
    kind: Pod
    metadata:
      name: web-app
    spec:
      containers:
        - name: web-app
          image: web-app:latest

C.
    apiVersion: autoscaling/v1
    kind: HorizontalPodAutoscaler
    metadata:
      name: web-app-hpa
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: web-app
      minReplicas: 2
      maxReplicas: 10
      targetCPUUtilizationPercentage: 50

D.
    apiVersion: autoscaling/v2beta2
    kind: HorizontalPodAutoscaler
    metadata:
      name: web-app-hpa
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: web-app
      minReplicas: 1
      maxReplicas: 5
      metrics:
        - type: Resource
          resource:
            name: memory
            target:
              type: Utilization
              averageUtilization: 50

Solution

Step 1: Understand the role of Horizontal Pod Autoscaler

Step 2: Compare options with this role

Final Answer:

Quick Check:

Solution

Step 1: Identify correct API version and fields for CPU target

Step 2: Check min/max replicas and target CPU utilization

Final Answer:

Quick Check:

Solution

Step 1: Understand scaling formula based on CPU utilization

Step 2: Round up and check min/max limits

Final Answer:

Quick Check:

Solution

Step 1: Check autoscaler dependency on metrics

Step 2: Understand effect of missing metrics

Final Answer:

Quick Check:

Solution

Step 1: Understand HPA multi-metric support

Step 2: Evaluate options for best practice

Final Answer:

Quick Check: