
Horizontal Pod Autoscaler in Microservices - Architecture Diagram

System Overview - Horizontal Pod Autoscaler

The Horizontal Pod Autoscaler (HPA) automatically adjusts the number of pod replicas in a microservices environment based on observed metrics like CPU usage or custom metrics. It ensures the application scales out during high demand and scales in when demand decreases, maintaining performance and resource efficiency.
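The scaling decision can be made concrete with the proportional rule described in the Kubernetes HPA documentation: desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue). Below is a simplified Python sketch of that rule. It assumes a single metric (average CPU utilization as a percentage of the pods' CPU requests), the default tolerance of 0.1, and hypothetical min_replicas/max_replicas bounds. It leaves out details the real controller handles, such as unready pods, pods with missing metrics, and scale-down stabilization.

```python
import math

def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float,
                     min_replicas: int = 1,
                     max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Simplified HPA proportional scaling rule for a single metric."""
    if current_replicas <= 0 or target_metric <= 0:
        raise ValueError("current_replicas and target_metric must be positive")
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: keep the current count to avoid flapping.
        desired = current_replicas
    else:
        # Scale proportionally, rounding up so capacity is never short.
        desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured (hypothetical) bounds.
    return max(min_replicas, min(max_replicas, desired))

# 4 pods averaging 90% CPU against a 60% target: ceil(4 * 1.5) = 6
print(desired_replicas(4, 90.0, 60.0))
# 63% against 60% is within the 10% tolerance, so the count stays at 4
print(desired_replicas(4, 63.0, 60.0))
# 4 pods at 15% against 60%: ceil(4 * 0.25) = 1
print(desired_replicas(4, 15.0, 60.0))
```

The tolerance band is what keeps the autoscaler from resizing the workload on every small fluctuation around the target.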

Architecture Diagram
User
  |
  v
Load Balancer
  |
  v
API Gateway
  |
  v
Service Pods <--> Metrics Server
  |
  v
Horizontal Pod Autoscaler
  |
  v
Kubernetes Control Plane
  |
  v
Container Runtime & Nodes
Components
User (client): Sends requests to the application.
Load Balancer (load_balancer): Distributes incoming traffic evenly across service pods.
API Gateway (api_gateway): Routes requests to appropriate service pods.
Service Pods (service): Run application instances to handle requests.
Metrics Server (metrics_collector): Collects resource usage metrics from pods.
Horizontal Pod Autoscaler (autoscaler): Monitors metrics and adjusts pod replicas accordingly.
Kubernetes Control Plane (orchestrator): Manages cluster state and pod lifecycle.
Container Runtime & Nodes (infrastructure): Hosts and runs containerized pods.
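The Load Balancer's job of spreading traffic evenly across a pod set that the autoscaler keeps resizing can be illustrated with plain round-robin. This is a minimal sketch that assumes round-robin as the balancing policy (real load balancers may use other algorithms). The class, its update_pods method, and the pod names are hypothetical; update_pods stands in for the endpoint list changing when replicas are added or removed.

```python
import itertools
from collections import Counter

class RoundRobinBalancer:
    """Hypothetical round-robin balancer over a changing list of pods."""

    def __init__(self, pods):
        self.update_pods(pods)

    def update_pods(self, pods):
        # Called when replicas are added or removed; restarts the rotation
        # over the new pod list.
        if not pods:
            raise ValueError("at least one pod is required")
        self._cycle = itertools.cycle(list(pods))

    def route(self):
        # Each call returns the next pod in the rotation.
        return next(self._cycle)

lb = RoundRobinBalancer(["pod-a", "pod-b"])
print(Counter(lb.route() for _ in range(6)))  # each pod receives 3 requests

lb.update_pods(["pod-a", "pod-b", "pod-c"])  # scaled out to 3 replicas
print(Counter(lb.route() for _ in range(9)))  # each pod receives 3 requests
```

Restarting the rotation on every update keeps the sketch simple; new replicas start receiving traffic on the very next cycle.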
Request Flow - 7 Hops
User -> Load Balancer
Load Balancer -> API Gateway
API Gateway -> Service Pods
Service Pods -> Metrics Server
Horizontal Pod Autoscaler -> Metrics Server
Horizontal Pod Autoscaler -> Kubernetes Control Plane
Kubernetes Control Plane -> Container Runtime & Nodes
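The last four hops form the autoscaling control loop rather than the request path. The Python sketch below simulates one iteration of that loop to show how the three concerns stay separate. MetricsServer, ControlPlane, and reconcile are hypothetical stand-ins, not Kubernetes APIs; the sketch assumes CPU is the only metric and that "scheduling onto nodes" can be modeled as adding or removing pod names.

```python
import math
from dataclasses import dataclass, field

@dataclass
class MetricsServer:
    """Holds the latest CPU sample (percent of request) for each pod."""
    usage: dict = field(default_factory=dict)

    def report(self, pod: str, cpu_percent: float) -> None:
        # Service Pods -> Metrics Server: store the latest sample per pod.
        self.usage[pod] = cpu_percent

    def average_cpu(self, pods: list) -> float:
        # Average over the pods that have reported a sample so far.
        samples = [self.usage[p] for p in pods if p in self.usage]
        if not samples:
            raise LookupError("no metrics available for target pods")
        return sum(samples) / len(samples)

@dataclass
class ControlPlane:
    """Tracks the pods that exist for one hypothetical deployment."""
    deployment: str
    pods: list = field(default_factory=list)

    def set_replicas(self, n: int) -> None:
        # Control Plane -> Nodes: create or delete pods until there are n.
        while len(self.pods) < n:
            self.pods.append(f"{self.deployment}-{len(self.pods)}")
        del self.pods[n:]

def reconcile(metrics: MetricsServer, plane: ControlPlane,
              target_cpu: float, min_r: int = 1, max_r: int = 10) -> int:
    """One autoscaler iteration: read metrics, decide, hand off the count."""
    current = len(plane.pods)
    avg = metrics.average_cpu(plane.pods)  # HPA -> Metrics Server
    desired = max(min_r, min(max_r, math.ceil(current * avg / target_cpu)))
    plane.set_replicas(desired)  # HPA -> Kubernetes Control Plane
    return desired

metrics = MetricsServer()
plane = ControlPlane("web", ["web-0", "web-1"])
for pod in plane.pods:
    metrics.report(pod, 90.0)
print(reconcile(metrics, plane, target_cpu=60.0))  # 3
print(plane.pods)  # ['web-0', 'web-1', 'web-2']
```

Note that the autoscaler never creates pods itself; it only computes a number and hands it to the control plane, which is the separation of concerns this architecture relies on.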
Failure Scenario
Component fails: Metrics Server
Impact: The Horizontal Pod Autoscaler cannot get current metrics, so it cannot make scaling decisions and the replica count stays at its last value. If load changes in the meantime, the system may become under- or over-provisioned.
Mitigation: Use a fallback default replica count or integrate redundant metrics sources, and alert operators so the Metrics Server can be fixed.
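The mitigation can be sketched as a guard around the scaling decision. In this Python sketch, fetch_avg_cpu, the fallback parameter, and the log-based "alert" are hypothetical illustrations of the mitigation described above, not Kubernetes features. When no fallback is given, the sketch holds the current replica count, which matches how the Kubernetes HPA behaves: it does not change the replica count when it cannot fetch metrics.

```python
import logging
import math
from typing import Callable, Optional

log = logging.getLogger("hpa-sketch")

def safe_desired_replicas(current: int,
                          fetch_avg_cpu: Callable[[], Optional[float]],
                          target_cpu: float,
                          fallback: Optional[int] = None,
                          min_r: int = 1,
                          max_r: int = 10) -> int:
    """Scaling decision that never acts on missing metrics."""
    try:
        avg = fetch_avg_cpu()
    except Exception as exc:  # e.g. the metrics endpoint is unreachable
        log.warning("metrics fetch failed: %s", exc)
        avg = None
    if avg is None:
        # No data: do not scale blindly. Stand-in for paging an operator.
        log.error("metrics unavailable; alerting operators")
        chosen = fallback if fallback is not None else current
    else:
        chosen = math.ceil(current * avg / target_cpu)
    return max(min_r, min(max_r, chosen))

def broken():
    raise ConnectionError("metrics server down")

print(safe_desired_replicas(5, broken, 60.0))              # 5: holds current
print(safe_desired_replicas(5, broken, 60.0, fallback=3))  # 3: uses fallback
print(safe_desired_replicas(5, lambda: 30.0, 60.0))        # 3: ceil(2.5)
```

Holding the current count is the conservative default; a fixed fallback is only safer if it is sized for the load the service actually sees.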
Architecture Quiz - 3 Questions
Test your understanding
Which component decides when to add or remove pod replicas?
A. API Gateway
B. Load Balancer
C. Horizontal Pod Autoscaler
D. Metrics Server
Design Principle
This architecture demonstrates dynamic scaling: it monitors real-time metrics and adjusts resources automatically. It separates three concerns: metrics collection (Metrics Server), decision making (Horizontal Pod Autoscaler), and pod lifecycle management (Kubernetes Control Plane and nodes). Keeping these apart is what makes the scaling both efficient and responsive.