
Horizontal Pod Autoscaler in Microservices - Cheat Sheet & Quick Revision

Recall & Review
beginner
What is a Horizontal Pod Autoscaler (HPA) in Kubernetes?
HPA automatically adjusts the number of pods in a deployment based on observed CPU utilization or other select metrics to maintain application performance.
intermediate
Which metrics can Horizontal Pod Autoscaler use to scale pods?
HPA can use CPU utilization, memory usage, or custom metrics like request rate to decide when to scale pods up or down.
intermediate
How does HPA decide when to increase or decrease pod count?
HPA compares the current metric value with the target value. If usage is above the target, it adds pods; if it is below, it removes pods, always staying within the configured minimum and maximum replica counts.
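A minimal sketch of that comparison, for illustration only. It assumes the Kubernetes controller's default tolerance of 0.1, meaning small deviations from the target trigger no change; the function name `scaling_direction` is hypothetical and not part of any Kubernetes API.

```python
def scaling_direction(current_value: float, target_value: float,
                      tolerance: float = 0.1) -> str:
    """Return 'up', 'down', or 'hold' for a single metric reading.

    Simplified model: the ratio current/target drives the decision, and
    ratios within `tolerance` of 1.0 are treated as "close enough".
    """
    if target_value <= 0:
        raise ValueError("target_value must be positive")
    ratio = current_value / target_value
    if abs(ratio - 1.0) <= tolerance:
        return "hold"  # inside the tolerance band: leave the pod count alone
    return "up" if ratio > 1.0 else "down"


print(scaling_direction(90, 60))  # up
print(scaling_direction(30, 60))  # down
print(scaling_direction(63, 60))  # hold
```

The tolerance band is one reason a metric hovering just above or below the target does not cause constant pod churn.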
advanced
What is the difference between Horizontal Pod Autoscaler and Vertical Pod Autoscaler?
Horizontal Pod Autoscaler changes the number of pods, while Vertical Pod Autoscaler changes the resource requests and limits (CPU/memory) of existing pods.
intermediate
Why is it important to set minimum and maximum pod limits in HPA?
Setting min and max pod limits prevents scaling too low (causing performance issues) or too high (wasting resources), ensuring stable and efficient operation.
What does Horizontal Pod Autoscaler primarily adjust in a Kubernetes cluster?
A. Network bandwidth
B. CPU limits of pods
C. Memory limits of pods
D. Number of pods
Which metric is commonly used by HPA to trigger scaling?
A. Disk space usage
B. CPU utilization
C. Number of nodes
D. Pod restart count
What happens if the current CPU usage is below the target in HPA?
A. Pods are removed
B. Pods remain the same
C. Pods are added
D. Nodes are added
Which Kubernetes component typically manages the Horizontal Pod Autoscaler?
A. kube-controller-manager
B. kube-scheduler
C. kube-proxy
D. etcd
Why should you avoid setting the maximum pod count too high in HPA?
A. It slows down pod startup
B. It can cause pod starvation
C. It wastes cluster resources
D. It causes network congestion
Explain how Horizontal Pod Autoscaler works to maintain application performance.
Think about how a thermostat adjusts temperature automatically.
    Describe the key differences between Horizontal Pod Autoscaler and Vertical Pod Autoscaler.
    One changes quantity, the other changes size.

      Practice

      1. What is the primary purpose of a Horizontal Pod Autoscaler in a Kubernetes microservices environment?
      easy
      A. Store persistent data for pods
      B. Manually restart pods when they fail
      C. Balance network traffic between pods
      D. Automatically adjust the number of pods based on CPU or custom metrics

      Solution

      1. Step 1: Understand the role of Horizontal Pod Autoscaler

        It is designed to monitor resource usage like CPU or custom metrics and adjust pod count automatically.
      2. Step 2: Compare options with this role

        Only option D ("Automatically adjust the number of pods based on CPU or custom metrics") describes automatic scaling based on load, which matches the autoscaler's purpose.
      3. Final Answer:

        Automatically adjust the number of pods based on CPU or custom metrics -> Option D
      4. Quick Check:

        Autoscaler adjusts pods automatically = D [OK]
      Hint: Autoscaler changes pod count automatically based on load [OK]
      Common Mistakes:
      • Confusing autoscaler with manual pod management
      • Thinking it balances network traffic
      • Assuming it stores data persistently
      2. Which of the following is the correct YAML snippet to define a Horizontal Pod Autoscaler targeting CPU utilization at 50% for a deployment named web-app?
      easy
      A.
        apiVersion: autoscaling/v2
        kind: HorizontalPodAutoscaler
        metadata:
          name: web-app-hpa
        spec:
          scaleTargetRef:
            apiVersion: apps/v1
            kind: Deployment
            name: web-app
          minReplicas: 1
          maxReplicas: 5
          metrics:
          - type: Resource
            resource:
              name: cpu
              target:
                type: Utilization
                averageUtilization: 70
      B.
        apiVersion: v1
        kind: Pod
        metadata:
          name: web-app
        spec:
          containers:
          - name: web-app
            image: web-app:latest
      C.
        apiVersion: autoscaling/v1
        kind: HorizontalPodAutoscaler
        metadata:
          name: web-app-hpa
        spec:
          scaleTargetRef:
            apiVersion: apps/v1
            kind: Deployment
            name: web-app
          minReplicas: 2
          maxReplicas: 10
          targetCPUUtilizationPercentage: 50
      D.
        apiVersion: autoscaling/v2beta2
        kind: HorizontalPodAutoscaler
        metadata:
          name: web-app-hpa
        spec:
          scaleTargetRef:
            apiVersion: apps/v1
            kind: Deployment
            name: web-app
          minReplicas: 1
          maxReplicas: 5
          metrics:
          - type: Resource
            resource:
              name: memory
              target:
                type: Utilization
                averageUtilization: 50

      Solution

      1. Step 1: Identify correct API version and fields for CPU target

        autoscaling/v1 supports targetCPUUtilizationPercentage directly; v2 requires metrics array.
      2. Step 2: Check min/max replicas and target CPU utilization

        Option C uses autoscaling/v1 with minReplicas 2, maxReplicas 10, and targetCPUUtilizationPercentage 50, which is valid syntax and sets the required 50% CPU target. Option A is valid autoscaling/v2 syntax but targets 70%, option B defines a Pod rather than an autoscaler, and option D targets memory instead of CPU.
      3. Final Answer:

        YAML with autoscaling/v1 and targetCPUUtilizationPercentage 50% -> Option C
      4. Quick Check:

        autoscaling/v1 + targetCPUUtilizationPercentage = C [OK]
      Hint: autoscaling/v1 uses targetCPUUtilizationPercentage field [OK]
      Common Mistakes:
      • Using wrong apiVersion for the fields
      • Confusing CPU with memory metrics
      • Setting minReplicas higher than maxReplicas
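For comparison, the same 50% CPU target can be written in the autoscaling/v2 shape, where the target lives inside the metrics array. The sketch below builds that manifest as a Python dictionary and prints it as JSON, which kubectl accepts as well as YAML. The helper name `cpu_hpa_manifest` is hypothetical; the field names follow the autoscaling/v2 snippets shown in the options above.

```python
import json


def cpu_hpa_manifest(deployment: str, target_percent: int,
                     min_replicas: int, max_replicas: int) -> dict:
    """Build an autoscaling/v2 HPA manifest that targets average CPU utilization."""
    if min_replicas > max_replicas:
        # One of the common mistakes listed above: min must not exceed max.
        raise ValueError("minReplicas must not exceed maxReplicas")
    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": f"{deployment}-hpa"},
        "spec": {
            "scaleTargetRef": {
                "apiVersion": "apps/v1",
                "kind": "Deployment",
                "name": deployment,
            },
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            "metrics": [
                {
                    "type": "Resource",
                    "resource": {
                        "name": "cpu",
                        "target": {
                            "type": "Utilization",
                            "averageUtilization": target_percent,
                        },
                    },
                }
            ],
        },
    }


manifest = cpu_hpa_manifest("web-app", 50, 2, 10)
print(json.dumps(manifest, indent=2))
```

In v1 the target is a single top-level field; in v2 it sits inside a list, which is what leaves room for additional metrics later.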
      3. Given this Horizontal Pod Autoscaler configuration:
      apiVersion: autoscaling/v2
      kind: HorizontalPodAutoscaler
      metadata:
        name: api-hpa
      spec:
        scaleTargetRef:
          apiVersion: apps/v1
          kind: Deployment
          name: api-server
        minReplicas: 2
        maxReplicas: 6
        metrics:
        - type: Resource
          resource:
            name: cpu
            target:
              type: Utilization
              averageUtilization: 60
      

      If the average CPU utilization across the 3 running pods is 90% of their requested CPU, how many pods will the autoscaler try to set?
      medium
      A. 5 pods
      B. 3 pods
      C. 6 pods
      D. 4 pods

      Solution

      1. Step 1: Understand scaling formula based on CPU utilization

        Desired replicas = current replicas * (current CPU / target CPU) = 3 * (90/60) = 4.5
      2. Step 2: Round up and check min/max limits

        4.5 rounds up to 5, which is between minReplicas 2 and maxReplicas 6, so 5 pods will be set.
      3. Final Answer:

        5 pods -> Option A
      4. Quick Check:

        3 * (90/60) = 4.5 -> 5 pods [OK]
      Hint: Multiply current pods by (current CPU ÷ target CPU) [OK]
      Common Mistakes:
      • Rounding down instead of up
      • Ignoring min/max replica limits
      • Using target CPU as current CPU
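The calculation in the solution above can be sketched in a few lines of Python. This is a simplified model that covers only the ratio, the round-up, and the min/max clamp; it ignores the controller's tolerance band and stabilization behavior. The function name `desired_replicas` is hypothetical.

```python
import math


def desired_replicas(current_replicas: int, current_utilization: float,
                     target_utilization: float,
                     min_replicas: int, max_replicas: int) -> int:
    """ceil(current replicas * current / target), clamped to [min, max]."""
    raw = current_replicas * (current_utilization / target_utilization)
    rounded = math.ceil(raw)  # always round up, never down
    return max(min_replicas, min(max_replicas, rounded))


# Values from the question: 3 pods at 90% with a 60% target, limits 2..6.
print(desired_replicas(3, 90, 60, 2, 6))   # 5

# A heavier load would ask for 8 pods, but maxReplicas caps the result at 6.
print(desired_replicas(3, 150, 60, 2, 6))  # 6
```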
      4. You configured a Horizontal Pod Autoscaler but notice it never scales pods beyond the minimum replicas even under high load. What is the most likely cause?
      medium
      A. The maxReplicas is set lower than minReplicas
      B. The metrics server is not running or not providing metrics
      C. The deployment has too many replicas already
      D. The pods are using too little CPU

      Solution

      1. Step 1: Check autoscaler dependency on metrics

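        Horizontal Pod Autoscaler reads CPU and memory usage from the Metrics Server through the resource metrics API; custom metrics need a separate metrics adapter. Without these metrics it cannot make scaling decisions.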
      2. Step 2: Understand effect of missing metrics

        If the Metrics Server is missing or not returning data, the autoscaler cannot measure load and leaves the replica count unchanged, so a deployment sitting at minReplicas stays there.
      3. Final Answer:

        The metrics server is not running or not providing metrics -> Option B
      4. Quick Check:

        Missing metrics = no scaling beyond minReplicas [OK]
      Hint: Autoscaler needs metrics server to scale pods [OK]
      Common Mistakes:
      • Assuming maxReplicas lower than minReplicas causes this
      • Thinking high load always triggers scaling
      • Ignoring metrics server setup
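The effect of missing metrics can be pictured with a small model. This is a sketch of the behavior described in the solution, not the controller's actual code: when no metric reading is available, no new replica count is computed and the current count is kept. The function name `next_replica_count` and the use of `None` for "no data" are assumptions made for illustration.

```python
import math
from typing import Optional


def next_replica_count(current_replicas: int,
                       avg_utilization: Optional[float],
                       target_utilization: float,
                       min_replicas: int, max_replicas: int) -> int:
    """Return the replica count after one evaluation in this simplified model."""
    if avg_utilization is None:
        # No metrics available: no decision can be made, so nothing changes.
        return current_replicas
    wanted = math.ceil(current_replicas * (avg_utilization / target_utilization))
    return max(min_replicas, min(max_replicas, wanted))


# Metrics unavailable: the deployment stays at its current 2 replicas.
print(next_replica_count(2, None, 60, 2, 6))  # 2

# Metrics available at 95% against a 60% target: the count rises.
print(next_replica_count(2, 95, 60, 2, 6))    # 4
```

Note that CPU utilization is measured relative to each container's CPU request, so pods without CPU requests can produce the same "never scales" symptom even when the Metrics Server is healthy.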
      5. You want to design a microservices system that scales pods horizontally based on both CPU usage and custom queue length metrics. Which approach best uses Horizontal Pod Autoscaler to achieve this?
      hard
      A. Configure HPA with multiple metrics: CPU utilization and custom queue length, setting thresholds for both
      B. Use two separate HPAs, one for CPU and one for queue length, targeting the same deployment
      C. Scale pods manually based on CPU and queue length metrics collected externally
      D. Configure HPA to scale only on CPU and ignore queue length metrics

      Solution

      1. Step 1: Understand HPA multi-metric support

        Horizontal Pod Autoscaler (autoscaling/v2) supports multiple metrics in a single configuration. It computes a desired replica count for each metric and uses the largest.
      2. Step 2: Evaluate options for best practice

        Option A uses multiple metrics in one HPA, which is efficient and avoids conflicts from multiple HPAs targeting the same deployment.
      3. Final Answer:

        Configure HPA with multiple metrics: CPU utilization and custom queue length, setting thresholds for both -> Option A
      4. Quick Check:

        Single HPA with multiple metrics = A [OK]
      Hint: Use one HPA with multiple metrics for combined scaling [OK]
      Common Mistakes:
      • Using multiple HPAs on same deployment causing conflicts
      • Ignoring custom metrics support
      • Relying only on CPU metrics
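When one HPA carries several metrics, each metric proposes its own replica count and the largest proposal is used. The sketch below models that rule under simplifying assumptions: both metrics are treated as plain current/target ratios, tolerance and stabilization are ignored, and the metric name `queue_length` with its numbers is a made-up example of a per-pod custom metric.

```python
import math
from typing import Dict, Tuple


def desired_replicas_multi(current_replicas: int,
                           readings: Dict[str, Tuple[float, float]],
                           min_replicas: int, max_replicas: int) -> int:
    """readings maps metric name -> (current value, target value).

    Each metric proposes ceil(current_replicas * current / target);
    the largest proposal wins, then the min/max limits are applied.
    """
    if not readings:
        raise ValueError("at least one metric reading is required")
    proposals = [
        math.ceil(current_replicas * (current / target))
        for current, target in readings.values()
    ]
    return max(min_replicas, min(max_replicas, max(proposals)))


readings = {
    "cpu": (45, 60),           # CPU is below target and alone would propose 3
    "queue_length": (40, 10),  # hypothetical per-pod backlog, proposes 12
}
print(desired_replicas_multi(3, readings, 2, 20))  # 12
```

Taking the maximum means the busiest signal drives scaling: a long queue adds pods even while CPU looks idle, which two competing HPAs on the same deployment could not guarantee.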