
How to Scale Kubernetes Pods Based on Memory Usage

To scale pods based on memory usage in Kubernetes, use a HorizontalPodAutoscaler (HPA) with memory as the target metric. This requires the metrics-server to be installed so memory usage data is available; you then define the memory utilization threshold in the autoscaler spec.
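If your cluster does not already provide metrics-server (many managed clusters ship it preinstalled), it can be installed from its official release manifest. The commands below are a typical sketch and assume cluster-admin access:

```shell
# Install metrics-server from its official release manifest
# (skip this step if your cluster already ships it)
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# Confirm memory metrics are being served before creating the HPA;
# this should list per-pod CPU and memory usage once metrics-server is ready
kubectl top pods
```

If `kubectl top pods` returns an error, the HPA will not receive memory data either.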

📐 Syntax

The HorizontalPodAutoscaler (HPA) resource defines how Kubernetes scales pods automatically. Key parts include:

  • apiVersion and kind: Define the resource type.
  • metadata.name: Name of the HPA.
  • spec.scaleTargetRef: The workload to scale, such as a Deployment or StatefulSet.
  • spec.minReplicas and spec.maxReplicas: Minimum and maximum pod counts.
  • spec.metrics: Defines the metric type and target value, e.g., memory usage.
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: example-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-deployment
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 70
```

💻 Example

This example shows an HPA that scales a deployment named web-app between 1 and 4 replicas, targeting 60% average memory utilization relative to the pods' memory requests.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-memory-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 1
  maxReplicas: 4
  metrics:
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 60
```

Output:

```
horizontalpodautoscaler.autoscaling/web-app-memory-hpa created
```
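To confirm the autoscaler is tracking the deployment, standard kubectl commands can be used (the HPA name below matches the example above):

```shell
# The TARGETS column shows observed vs. target utilization, e.g. 45%/60%
kubectl get hpa web-app-memory-hpa

# Shows current metrics, scaling events, and any conditions blocking scaling
kubectl describe hpa web-app-memory-hpa
```

A TARGETS value of `<unknown>/60%` usually means metrics-server is not running or the pods lack memory requests.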

⚠️ Common Pitfalls

  • Metrics-server not installed: Memory metrics won't be available without it.
  • Incorrect metric type: Using cpu instead of memory when you want memory-based scaling.
  • Resource requests missing: Pods must have memory requests set for utilization metrics to work.
  • Poorly chosen thresholds: A very low averageUtilization causes constant scale-ups (flapping), while a very high one may never trigger scaling.
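The flapping pitfall follows from the HPA's documented scaling rule, desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization), ignoring the tolerance band and stabilization windows. A small Python sketch of that rule shows how a 10% target overreacts:

```python
import math

def desired_replicas(current_replicas: int,
                     current_utilization: float,
                     target_utilization: float) -> int:
    """HPA core rule: desired = ceil(current * currentUtil / targetUtil)."""
    return math.ceil(current_replicas * (current_utilization / target_utilization))

# A 10% target treats modest memory usage as a 4x overload...
print(desired_replicas(1, 40, 10))  # -> 4
# ...while a 70% target leaves the same workload at one replica
print(desired_replicas(1, 40, 70))  # -> 1
```

This is why targets in the 50-80% range behave far more calmly than extreme values.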
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: wrong-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 1
  maxReplicas: 3
  metrics:
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 10  # Too low, causes constant scaling
---
# Corrected version:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: correct-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 1
  maxReplicas: 3
  metrics:
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 70
```
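The missing-requests pitfall is worth illustrating too: utilization is computed as a percentage of each container's memory request, so the target workload must declare one. A minimal, hypothetical Deployment (the name, image, and sizes are illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app
        image: my-app:1.0  # illustrative image
        resources:
          requests:
            memory: "256Mi"  # utilization percentages are measured against this
          limits:
            memory: "512Mi"
```

Without `resources.requests.memory`, a memory-utilization HPA reports `<unknown>` and never scales.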

📊 Quick Reference

Remember these key points when scaling based on memory usage:

  • Install metrics-server to provide memory metrics.
  • Set memory as the resource metric in HPA.
  • Ensure pods have memory requests defined.
  • Choose a reasonable averageUtilization percentage (50-80%).
  • Set sensible minReplicas and maxReplicas limits.

Key Takeaways

  • Use HorizontalPodAutoscaler with memory resource metrics to scale pods based on memory usage.
  • Ensure metrics-server is installed and pods have memory requests set for accurate scaling.
  • Set averageUtilization to a balanced value to avoid frequent scaling changes.
  • Define clear minReplicas and maxReplicas to control scaling boundaries.
  • Verify your HPA configuration with kubectl after applying to confirm it works.