## Problem Statement
When traffic to a microservice suddenly spikes, a fixed number of pods cannot absorb the load, causing slow responses or outright failures. Conversely, during low-traffic periods, running many pods wastes cluster resources and drives up costs.
This diagram shows the Horizontal Pod Autoscaler receiving metrics from the Metrics Server, deciding scaling actions, and instructing the Kubernetes API to adjust the number of deployment pods accordingly.
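The scaling decision described above follows the documented HPA rule: desired replicas = ceil(currentReplicas × currentMetric / targetMetric), with no action taken while the ratio stays within a tolerance band (0.1 by default, via the controller's `--horizontal-pod-autoscaler-tolerance` flag). A minimal Python sketch of that rule:

```python
import math

def desired_replicas(current_replicas, current_utilization,
                     target_utilization, tolerance=0.1):
    """Sketch of the HPA core rule:
    desired = ceil(current * currentUtilization / targetUtilization).
    The tolerance mirrors the controller's default of 0.1 (10%)."""
    ratio = current_utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        # Within tolerance of the target: leave the replica count alone.
        return current_replicas
    return math.ceil(current_replicas * ratio)

print(desired_replicas(3, 100, 50))  # utilization double the target -> 6
print(desired_replicas(3, 52, 50))   # within the 10% band -> stays at 3
```

This is a simplification: the real controller also averages utilization across pods, handles missing metrics, and applies stabilization windows before acting.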
### Before: Manual fixed pod count

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-service
spec:
  replicas: 3
  selector:
    matchLabels:
      app: example-service
  template:
    metadata:
      labels:
        app: example-service
    spec:
      containers:
        - name: example-container
          image: example-image
          resources:
            requests:
              cpu: 100m
            limits:
              cpu: 200m
```

### After: Horizontal Pod Autoscaler enabled

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: example-service-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-service
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50
```
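Whatever replica count the utilization math produces, the controller clamps it to the `minReplicas`/`maxReplicas` bounds declared in the HPA manifest. A short sketch using the bounds from the example above (2 and 10) and its 50% CPU target:

```python
import math

def clamp_replicas(desired, min_replicas=2, max_replicas=10):
    # Bounds taken from the example HPA manifest (minReplicas: 2, maxReplicas: 10).
    return max(min_replicas, min(desired, max_replicas))

# 8 pods running at 90% CPU against the 50% target:
raw = math.ceil(8 * 90 / 50)   # 15, which exceeds maxReplicas
print(clamp_replicas(raw))     # capped at 10
print(clamp_replicas(1))       # floor of 2 keeps capacity during quiet periods
```

The floor of 2 is why this configuration never scales to zero even under negligible load; scale-to-zero requires a different mechanism (e.g. KEDA), which is outside this example's scope.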