Microservices · System Design · ~7 min read

Horizontal Pod Autoscaler in Microservices - System Design Guide

Problem Statement
When traffic to a microservice suddenly spikes, a fixed number of pods cannot handle the load, causing slow responses or failures. Conversely, during low traffic, running many pods wastes resources and increases costs.
Solution
Horizontal Pod Autoscaler automatically adjusts the number of pods in a deployment based on observed metrics like CPU usage or custom metrics. It continuously monitors the load and scales pods out or in to match demand, ensuring efficient resource use and consistent performance.
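The HPA controller's core rule is to scale the replica count proportionally to the ratio of the observed metric to its target: desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric). A minimal sketch of that calculation in Python (function name is illustrative):

```python
import math

def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float) -> int:
    """Simplified HPA scaling rule: scale the replica count in
    proportion to observed-vs-target metric, rounding up."""
    return math.ceil(current_replicas * (current_metric / target_metric))

# 3 pods averaging 80% CPU against a 50% target -> scale out to 5 pods
print(desired_replicas(3, 80, 50))  # 5
# 4 pods averaging 25% CPU against a 50% target -> scale in to 2 pods
print(desired_replicas(4, 25, 50))  # 2
```

The real controller adds tolerances, readiness checks, and min/max clamping on top of this rule, but the proportional calculation above is what drives each scaling decision.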
Architecture
Metrics Server → Horizontal Pod Autoscaler → Kubernetes API → Deployment Pods

This diagram shows the Horizontal Pod Autoscaler receiving metrics from the Metrics Server, deciding scaling actions, and instructing the Kubernetes API to adjust the number of deployment pods accordingly.

Trade-offs
✓ Pros
Automatically matches pod count to workload, improving performance during traffic spikes.
Reduces resource waste by scaling down during low demand.
Integrates seamlessly with Kubernetes and supports custom metrics.
✗ Cons
Scaling decisions depend on metric accuracy and update frequency, which can cause delayed reactions.
Rapid traffic fluctuations can lead to oscillations in pod count if not tuned properly.
Requires proper permissions and configuration in the cluster, adding operational complexity.
Use when your microservices experience variable traffic patterns that regularly push CPU or custom metrics past their target utilization, and you want automated scaling without manual intervention.
Avoid if your workload is very stable with minimal traffic changes or if your cluster resources are fixed and cannot accommodate scaling beyond a set pod count.
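The oscillation risk noted above can be damped with the `behavior` field available in `autoscaling/v2`: a scale-down stabilization window makes the controller wait out brief dips before removing pods, while scale-up can stay aggressive. A sketch of such a configuration (the window and rate values are illustrative and should be tuned to your traffic):

```yaml
# Fragment of an autoscaling/v2 HorizontalPodAutoscaler spec
spec:
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300   # require 5 min of low load before scaling in
      policies:
      - type: Pods
        value: 1                        # remove at most 1 pod per minute
        periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0     # react to spikes immediately
```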
Real World Examples
Uber
Uber uses Horizontal Pod Autoscaler to dynamically scale ride-matching services during peak hours, ensuring low latency without over-provisioning resources.
Spotify
Spotify applies Horizontal Pod Autoscaler to adjust backend service pods based on streaming demand, optimizing cost and performance.
Airbnb
Airbnb leverages Horizontal Pod Autoscaler to handle sudden surges in booking requests by scaling reservation service pods automatically.
Code Example
The before code shows a fixed number of pods (3) regardless of load. The after code adds a Horizontal Pod Autoscaler that adjusts pod count between 2 and 10 based on CPU usage, targeting 50% average utilization. This enables automatic scaling to meet demand.
### Before: Manual fixed pod count

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-service
spec:
  replicas: 3              # fixed pod count, regardless of load
  selector:
    matchLabels:
      app: example-service
  template:
    metadata:
      labels:
        app: example-service
    spec:
      containers:
      - name: example-container
        image: example-image
        resources:
          requests:
            cpu: 100m      # HPA utilization targets are relative to this request
          limits:
            cpu: 200m
```

---

### After: Horizontal Pod Autoscaler enabled

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: example-service-hpa
spec:
  scaleTargetRef:          # the Deployment this HPA manages
    apiVersion: apps/v1
    kind: Deployment
    name: example-service
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization        # percentage of the pods' CPU requests
        averageUtilization: 50
```
Alternatives
Vertical Pod Autoscaler
Adjusts resource limits (CPU/memory) of existing pods instead of changing pod count.
Use when: Your workload requires more resources per pod rather than more pods, especially for stateful applications.
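For comparison, a minimal VerticalPodAutoscaler manifest might look like the sketch below. This assumes the VPA components are installed in the cluster (VPA is not part of core Kubernetes), and the names mirror the Deployment from the earlier example:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: example-service-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-service
  updatePolicy:
    updateMode: "Auto"   # VPA evicts and recreates pods with updated resource requests
```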
Cluster Autoscaler
Scales the number of nodes in the cluster instead of pods within nodes.
Use when: Your cluster needs more nodes (physical or virtual machines) to schedule the additional pods that horizontal scaling demands.
Summary
Horizontal Pod Autoscaler automatically adjusts pod count based on workload metrics to maintain performance and efficiency.
It helps handle variable traffic by scaling out during peaks and scaling in during low demand.
Proper configuration and metric tuning are essential to avoid scaling delays or oscillations.