Microservices · System Design · ~7 min read

Horizontal Pod Autoscaler in Microservices - System Design Guide

Problem Statement
When traffic to a microservice suddenly spikes, a fixed number of pods cannot handle the load, causing slow responses or failures. Conversely, during low traffic, running many pods wastes resources and increases costs.
Solution
Horizontal Pod Autoscaler automatically adjusts the number of pods in a deployment based on observed metrics like CPU usage or custom metrics. It continuously monitors the load and scales pods out or in to match demand, ensuring efficient resource use and consistent performance.
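The HPA controller's core rule is to scale the replica count proportionally to the ratio of the observed metric to its target: desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric). A minimal sketch of that calculation in Python (function name is illustrative):

```python
import math

def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float) -> int:
    """Simplified HPA scaling rule: scale the replica count in
    proportion to observed-vs-target metric, rounding up."""
    return math.ceil(current_replicas * (current_metric / target_metric))

# 3 pods averaging 80% CPU against a 50% target -> scale out to 5 pods
print(desired_replicas(3, 80, 50))  # 5
# 4 pods averaging 25% CPU against a 50% target -> scale in to 2 pods
print(desired_replicas(4, 25, 50))  # 2
```

The real controller adds tolerances, readiness checks, and min/max clamping on top of this rule, but the proportional calculation above is what drives each scaling decision.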
Architecture
Metrics Server → Horizontal Pod Autoscaler → Kubernetes API → Deployment Pods

This diagram shows the Horizontal Pod Autoscaler receiving metrics from the Metrics Server, deciding scaling actions, and instructing the Kubernetes API to adjust the number of deployment pods accordingly.

Trade-offs
✓ Pros
Automatically matches pod count to workload, improving performance during traffic spikes.
Reduces resource waste by scaling down during low demand.
Integrates seamlessly with Kubernetes and supports custom metrics.
✗ Cons
Scaling decisions depend on metric accuracy and update frequency, which can cause delayed reactions.
Rapid traffic fluctuations can lead to oscillations in pod count if not tuned properly.
Requires proper permissions and configuration in the cluster, adding operational complexity.
Use when your microservices experience variable traffic patterns that regularly push CPU or custom metrics past their target utilization, and you want automated scaling without manual intervention.
Avoid if your workload is very stable with minimal traffic changes or if your cluster resources are fixed and cannot accommodate scaling beyond a set pod count.
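The oscillation risk noted above can be damped with the `behavior` field available in `autoscaling/v2`: a scale-down stabilization window makes the controller wait out brief dips before removing pods, while scale-up can stay aggressive. A sketch of such a configuration (the window and rate values are illustrative and should be tuned to your traffic):

```yaml
# Fragment of an autoscaling/v2 HorizontalPodAutoscaler spec
spec:
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300   # require 5 min of low load before scaling in
      policies:
      - type: Pods
        value: 1                        # remove at most 1 pod per minute
        periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0     # react to spikes immediately
```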
Real World Examples
Uber
Uber uses Horizontal Pod Autoscaler to dynamically scale ride-matching services during peak hours, ensuring low latency without over-provisioning resources.
Spotify
Spotify applies Horizontal Pod Autoscaler to adjust backend service pods based on streaming demand, optimizing cost and performance.
Airbnb
Airbnb leverages Horizontal Pod Autoscaler to handle sudden surges in booking requests by scaling reservation service pods automatically.
Code Example
The before code shows a fixed number of pods (3) regardless of load. The after code adds a Horizontal Pod Autoscaler that adjusts pod count between 2 and 10 based on CPU usage, targeting 50% average utilization. This enables automatic scaling to meet demand.
### Before: Manual fixed pod count

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-service
spec:
  replicas: 3              # fixed pod count, regardless of load
  selector:
    matchLabels:
      app: example-service
  template:
    metadata:
      labels:
        app: example-service
    spec:
      containers:
      - name: example-container
        image: example-image
        resources:
          requests:
            cpu: 100m      # HPA utilization targets are relative to this request
          limits:
            cpu: 200m
```

---

### After: Horizontal Pod Autoscaler enabled

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: example-service-hpa
spec:
  scaleTargetRef:          # the Deployment this HPA manages
    apiVersion: apps/v1
    kind: Deployment
    name: example-service
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization        # percentage of the pods' CPU requests
        averageUtilization: 50
```
Alternatives
Vertical Pod Autoscaler
Adjusts resource limits (CPU/memory) of existing pods instead of changing pod count.
Use when: Your workload requires more resources per pod rather than more pods, especially for stateful applications.
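For comparison, a minimal VerticalPodAutoscaler manifest might look like the sketch below. This assumes the VPA components are installed in the cluster (VPA is not part of core Kubernetes), and the names mirror the Deployment from the earlier example:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: example-service-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-service
  updatePolicy:
    updateMode: "Auto"   # VPA evicts and recreates pods with updated resource requests
```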
Cluster Autoscaler
Scales the number of nodes in the cluster instead of pods within nodes.
Use when: Your cluster needs more nodes (physical or virtual machines) to schedule the additional pods that horizontal scaling demands.
Summary
Horizontal Pod Autoscaler automatically adjusts pod count based on workload metrics to maintain performance and efficiency.
It helps handle variable traffic by scaling out during peaks and scaling in during low demand.
Proper configuration and metric tuning are essential to avoid scaling delays or oscillations.