
Resource requests and limits in Kubernetes - Deep Dive

Overview - Resource requests and limits
What is it?
Resource requests and limits in Kubernetes are settings that tell the system how much CPU and memory a container needs and how much it can use at most. Requests guarantee a minimum amount of resources for a container to run smoothly. Limits set the maximum resources a container can consume to avoid affecting other containers. These settings help Kubernetes manage resources efficiently across many containers.
Why it matters
Without resource requests and limits, containers could use too many resources, causing other containers to slow down or crash. This would make applications unreliable and hard to manage. By defining these, Kubernetes ensures fair sharing and stability, preventing resource shortages or waste. This leads to better performance, cost control, and predictable behavior in cloud environments.
Where it fits
Before learning resource requests and limits, you should understand basic Kubernetes concepts like pods, containers, and nodes. After this, you can learn about Kubernetes scheduling, autoscaling, and quality of service classes, which all depend on resource management.
Mental Model
Core Idea
Resource requests reserve the minimum needed resources for a container, while limits cap the maximum resources it can use to keep the system balanced.
Think of it like...
Imagine a shared kitchen where each cook reserves a certain amount of stove space (requests) to prepare their meal without interruption, but they cannot use more than their allotted burners (limits) so others can cook too.
┌─────────────────────────────┐
│       Kubernetes Node       │
│ ┌───────────────┐           │
│ │     Pod A     │           │
│ │ ┌───────────┐ │           │
│ │ │ Container │ │           │
│ │ │ Requests  │ │           │
│ │ │ Limits    │ │           │
│ │ └───────────┘ │           │
│ └───────────────┘           │
│ ┌───────────────┐           │
│ │     Pod B     │           │
│ │ ┌───────────┐ │           │
│ │ │ Container │ │           │
│ │ │ Requests  │ │           │
│ │ │ Limits    │ │           │
│ │ └───────────┘ │           │
│ └───────────────┘           │
│                             │
│    Total Node Resources     │
└─────────────────────────────┘
Build-Up - 8 Steps
1
Foundation: Understanding Kubernetes Resource Basics
Concept: Introduce what CPU and memory resources mean in Kubernetes context.
In Kubernetes, CPU is measured in cores and can be expressed in millicores (1000 millicores, written 1000m, equal 1 CPU core). Memory is measured in bytes, usually expressed in mebibytes (Mi) or gibibytes (Gi). Containers need these resources to run applications. Kubernetes manages these resources on nodes to run many containers efficiently.
Result
Learner understands the units and types of resources Kubernetes manages.
Knowing the units and types of resources is essential before setting any requests or limits.
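To make the units concrete, here is a minimal container spec fragment (the container name and values are illustrative, not a recommendation):

```yaml
# Illustrative fragment of a container spec.
# CPU: "500m" = 500 millicores = half a core; "1" = one full core.
# Memory: "Mi" = mebibytes (2^20 bytes); "Gi" = gibibytes (2^30 bytes).
containers:
  - name: demo-app        # hypothetical container name
    image: nginx
    resources:
      requests:
        cpu: "250m"       # a quarter of a core
        memory: "128Mi"   # 128 mebibytes
```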
2
Foundation: What Are Resource Requests and Limits?
Concept: Explain the difference between requests and limits simply.
A resource request is the amount of CPU or memory a container asks for to start and run reliably. A resource limit is the maximum amount it can use. If a container tries to use more than its limit, Kubernetes may stop or slow it down.
Result
Learner can distinguish between guaranteed resources (requests) and maximum allowed resources (limits).
Understanding this difference helps prevent resource conflicts and ensures fair sharing.
3
Intermediate: How Kubernetes Uses Requests for Scheduling
🤔 Before reading on: do you think Kubernetes schedules pods based on their limits or their requests? Commit to your answer.
Concept: Kubernetes uses resource requests to decide where to place pods on nodes.
When you create a pod, Kubernetes looks at the resource requests to find a node with enough free resources. It ignores limits during scheduling. This means requests affect whether a pod can start, while limits control runtime usage.
Result
Learner understands that requests influence pod placement, not limits.
Knowing that scheduling depends on requests prevents confusion about pod startup failures.
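A quick way to see that scheduling uses requests: a pod whose request exceeds every node's free capacity stays Pending, even if its limit alone would seem harmless. A sketch (names and values are illustrative):

```yaml
# If no node has 64 free cores, this pod stays Pending with a
# FailedScheduling event, regardless of its limit.
apiVersion: v1
kind: Pod
metadata:
  name: big-request-demo   # hypothetical name
spec:
  containers:
    - name: app
      image: nginx
      resources:
        requests:
          cpu: "64"        # the scheduler must find 64 free cores
        limits:
          cpu: "64"
```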
4
Intermediate: What Happens When Limits Are Exceeded?
🤔 Before reading on: do you think exceeding CPU limits kills the container immediately or just slows it down? Commit to your answer.
Concept: Explain Kubernetes behavior when containers exceed their resource limits.
If a container uses more memory than its limit, Kubernetes kills it to protect the node. For CPU, Kubernetes throttles the container, slowing it down but not killing it. This difference is important for application stability.
Result
Learner knows the consequences of exceeding CPU vs memory limits.
Understanding these behaviors helps design applications that handle resource limits gracefully.
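The asymmetry is easiest to see side by side in one spec. A sketch, assuming the application inside tries to use more than both limits (the name and image are placeholders):

```yaml
# - memory: exceeding 200Mi gets the container OOM-killed
#   (status shows OOMKilled) and restarted per the pod's restartPolicy.
# - cpu: exceeding 500m is throttled; the container keeps
#   running, just slower.
apiVersion: v1
kind: Pod
metadata:
  name: limits-demo        # hypothetical name
spec:
  containers:
    - name: app
      image: my-app:latest # placeholder image
      resources:
        limits:
          memory: "200Mi"
          cpu: "500m"
```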
5
Intermediate: Quality of Service Classes Based on Requests and Limits
Concept: Introduce how Kubernetes classifies pods by their resource settings.
Kubernetes assigns pods to QoS classes: Guaranteed, Burstable, or BestEffort. Guaranteed pods have equal requests and limits. Burstable pods have requests less than limits. BestEffort pods have no requests or limits. These classes affect pod eviction priority during resource pressure.
Result
Learner understands how resource settings affect pod priority and eviction.
Knowing QoS classes helps optimize resource allocation and pod reliability.
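The three classes follow mechanically from the resources block. Sketches of each (values illustrative):

```yaml
# Guaranteed: every container has requests == limits for both CPU and memory.
resources:
  requests: { cpu: "500m", memory: "256Mi" }
  limits:   { cpu: "500m", memory: "256Mi" }
---
# Burstable: at least one request is set, but the Guaranteed rule is not met.
resources:
  requests: { cpu: "250m", memory: "128Mi" }
  limits:   { cpu: "1",    memory: "512Mi" }
---
# BestEffort: no requests or limits on any container in the pod
# (omit the resources block entirely).
```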
6
Advanced: Configuring Requests and Limits in Pod Specs
🤔 Before reading on: do you think resource requests and limits are set at the pod or the container level? Commit to your answer.
Concept: Show how to define requests and limits in Kubernetes YAML manifests.
In the pod spec, under each container, you add a resources section with requests and limits. For example:

resources:
  requests:
    cpu: "500m"
    memory: "256Mi"
  limits:
    cpu: "1"
    memory: "512Mi"

This tells Kubernetes the minimum and maximum resources for that container.
Result
Learner can write correct YAML to set resource requests and limits.
Knowing the exact syntax prevents configuration errors that cause scheduling or runtime issues.
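Put together, a complete pod manifest with the resources block in place might look like this (the name, image, and values are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web-demo             # hypothetical name
spec:
  containers:
    - name: web
      image: nginx:1.25
      resources:
        requests:
          cpu: "500m"        # reserved at scheduling time
          memory: "256Mi"
        limits:
          cpu: "1"           # throttled above one core
          memory: "512Mi"    # OOM-killed above 512Mi
```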
7
Advanced: Impact of Requests and Limits on Cluster Autoscaling
Concept: Explain how resource settings influence autoscaling decisions.
Cluster autoscalers look at resource requests to decide when to add or remove nodes. If pods request more resources than available, autoscalers add nodes. Limits do not affect autoscaling directly but control container usage. Proper requests help autoscalers keep the cluster balanced and cost-effective.
Result
Learner understands the role of requests in autoscaling behavior.
Knowing this helps optimize cluster size and cost by setting realistic requests.
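The autoscaler's arithmetic is driven entirely by requests. For example, assuming nodes with 4 allocatable CPU cores, the sketch below needs 10 × 500m = 5 cores of requested capacity, so at least two nodes (all names and values are illustrative):

```yaml
# 10 replicas x 500m CPU request = 5 cores of requested capacity.
# With 4 allocatable cores per node, the cluster autoscaler must
# provide at least 2 nodes; the 2-core limit plays no part in this.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-demo             # hypothetical name
spec:
  replicas: 10
  selector:
    matchLabels: { app: api-demo }
  template:
    metadata:
      labels: { app: api-demo }
    spec:
      containers:
        - name: api
          image: my-api:latest   # placeholder image
          resources:
            requests: { cpu: "500m" }
            limits:   { cpu: "2" }
```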
8
Expert: Surprising Effects of Overcommitting Resources
🤔 Before reading on: do you think setting requests lower than actual usage always improves cluster efficiency? Commit to your answer.
Concept: Discuss risks and behaviors when requests are set too low compared to actual usage.
Overcommitting means setting requests lower than what containers actually use. This can increase cluster utilization but risks pod eviction or throttling under load. Kubernetes may evict Burstable or BestEffort pods first during pressure. Also, CPU throttling can cause performance issues that are hard to debug.
Result
Learner realizes the trade-offs of overcommitting resources.
Understanding overcommit risks helps balance efficiency and reliability in production.
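An overcommitted configuration looks harmless in isolation; the risk only appears in aggregate. A sketch of the pattern (values illustrative):

```yaml
# Requests far below limits: a node can admit many such pods,
# but if they all burst toward their limits at once, CPU gets
# throttled and Burstable pods risk eviction under memory pressure.
resources:
  requests:
    cpu: "100m"      # what the scheduler reserves
    memory: "64Mi"
  limits:
    cpu: "2"         # 20x the request
    memory: "1Gi"    # 16x the request
```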
Under the Hood
Kubernetes uses the kube-scheduler to assign pods to nodes based on resource requests. The kubelet on each node enforces limits using Linux cgroups, which control CPU time and memory usage. When a container exceeds its memory limit, the kernel's OOM killer terminates it. CPU limits cause the cgroup to throttle CPU cycles, slowing the container without killing it.
Why designed this way?
This design separates scheduling decisions (requests) from runtime enforcement (limits) to optimize resource allocation and stability. Using cgroups leverages existing Linux kernel features for resource control, avoiding reinventing complex mechanisms. This approach balances fairness, efficiency, and protection against resource abuse.
┌──────────────────┐     ┌───────────────────┐     ┌────────────────┐
│  kube-scheduler  │────▶│    Node kubelet   │────▶│  Linux cgroups │
│  (uses requests) │     │ (enforces limits) │     │ (enforce CPU & │
└──────────────────┘     └───────────────────┘     │ memory limits) │
                                                   └────────────────┘
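As a rough sketch of that translation, using cgroup v1 names and the standard 100 ms CFS period (the mapping below is an approximation, not the exact kubelet implementation):

```yaml
# Container spec ...         ... approximate cgroup v1 translation
resources:
  requests:
    cpu: "500m"      # cpu.shares ≈ 512 (relative weight; 1024 per core)
    memory: "256Mi"  # informs scheduling and eviction; no hard cgroup cap
  limits:
    cpu: "1"         # cpu.cfs_quota_us = 100000 with cpu.cfs_period_us = 100000
    memory: "512Mi"  # memory.limit_in_bytes = 536870912 (OOM kill above this)
```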
Myth Busters - 4 Common Misconceptions
Quick: Do you think Kubernetes kills containers that exceed CPU limits immediately? Commit yes or no.
Common Belief: Containers that exceed CPU limits are killed immediately, like with memory limits.
Reality: Kubernetes throttles CPU usage when limits are exceeded, slowing the container but not killing it.
Why it matters: Believing containers are killed can lead to unnecessary debugging and misconfiguration.
Quick: Do you think setting only limits without requests affects pod scheduling? Commit yes or no.
Common Belief: Setting resource limits alone controls pod scheduling on nodes.
Reality: Kubernetes schedules pods based only on resource requests, ignoring limits during scheduling. (If you set only limits, Kubernetes defaults the requests to the same values, so the pod is still scheduled against that amount.)
Why it matters: Misunderstanding this causes pods to fail scheduling unexpectedly.
Quick: Do you think BestEffort pods have resource guarantees? Commit yes or no.
Common Belief: Pods without resource requests or limits still get guaranteed resources.
Reality: BestEffort pods have no resource guarantees and are the first to be evicted under pressure.
Why it matters: Assuming guarantees can cause critical workloads to be disrupted.
Quick: Do you think overcommitting resources always improves cluster efficiency? Commit yes or no.
Common Belief: Setting requests lower than actual usage always makes better use of cluster resources.
Reality: Overcommitting can cause throttling, evictions, and unpredictable performance.
Why it matters: Ignoring this leads to unstable applications and harder troubleshooting.
Expert Zone
1
Requests and limits can be set differently per container within the same pod, affecting pod QoS class and scheduling.
2
CPU requests map to cgroup CPU shares (weights), while CPU limits are enforced by CFS quota throttling, which can cause uneven CPU time slices, impacting latency-sensitive apps.
3
Memory limits are hard limits enforced by the kernel, so setting them too low can cause immediate pod crashes.
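Point 1 above in YAML form: one pod, two containers with different settings. Because the sidecar sets nothing while the app container has requests, the pod as a whole is Burstable, not Guaranteed (names and images are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: mixed-qos-demo           # hypothetical name
spec:
  containers:
    - name: app                  # requests == limits for this container...
      image: my-app:latest       # placeholder image
      resources:
        requests: { cpu: "500m", memory: "256Mi" }
        limits:   { cpu: "500m", memory: "256Mi" }
    - name: sidecar              # ...but this one sets nothing, so the
      image: log-shipper:latest  # whole pod's QoS class is Burstable
```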
When NOT to use
Avoid setting resource requests and limits when running very lightweight or short-lived jobs where overhead is unnecessary. Instead, use BestEffort QoS or specialized batch scheduling. Also, in environments with strict resource isolation, consider using dedicated nodes or namespaces with quotas.
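The "namespaces with quotas" alternative can be sketched with a ResourceQuota object, which caps the total requests and limits across a namespace instead of tuning every pod (the name and namespace are illustrative):

```yaml
# Illustrative ResourceQuota: the sum of all requests/limits in the
# namespace may not exceed these totals.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota           # hypothetical name
  namespace: team-a          # hypothetical namespace
spec:
  hard:
    requests.cpu: "10"
    requests.memory: "20Gi"
    limits.cpu: "20"
    limits.memory: "40Gi"
```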
Production Patterns
In production, teams often set requests based on average usage and limits based on peak usage to balance efficiency and stability. Monitoring tools track actual usage to adjust these values over time. QoS classes guide eviction policies during node pressure. Autoscalers rely on requests to scale clusters dynamically.
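That pattern, expressed as numbers: if monitoring showed a service averaging around 300m CPU with peaks near 900m, a common starting point might be (values are purely illustrative and should come from your own metrics):

```yaml
# requests ≈ observed average usage -> honest scheduling and autoscaling
# limits   ≈ above observed peak    -> headroom without runaway consumption
resources:
  requests:
    cpu: "300m"      # average observed usage
    memory: "256Mi"
  limits:
    cpu: "1"         # comfortably above the ~900m peak
    memory: "512Mi"
```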
Connections
Operating System Resource Management
Builds-on
Understanding Linux cgroups and kernel OOM killer helps grasp how Kubernetes enforces resource limits at the system level.
Cloud Cost Optimization
Builds-on
Proper resource requests and limits prevent overprovisioning, directly reducing cloud infrastructure costs.
Project Management Resource Allocation
Analogy in resource planning
Just like allocating team members to tasks with minimum and maximum hours, Kubernetes allocates CPU and memory to containers to ensure smooth operation.
Common Pitfalls
#1 Not setting resource requests explicitly leads to unintended scheduling behavior.
Wrong approach:
resources:
  limits:
    cpu: "1"
    memory: "512Mi"
Correct approach:
resources:
  requests:
    cpu: "500m"
    memory: "256Mi"
  limits:
    cpu: "1"
    memory: "512Mi"
Root cause: Learners often think setting limits alone is enough. In fact, when only limits are set, Kubernetes defaults the requests to the limit values, which can silently over-reserve resources; explicit requests keep scheduling predictable.
#2 Setting requests higher than limits causes pod creation errors.
Wrong approach:
resources:
  requests:
    cpu: "2"
    memory: "1Gi"
  limits:
    cpu: "1"
    memory: "512Mi"
Correct approach:
resources:
  requests:
    cpu: "500m"
    memory: "256Mi"
  limits:
    cpu: "1"
    memory: "512Mi"
Root cause: Requests must never exceed limits; misunderstanding this leads to invalid pod specs.
#3 Assuming CPU limits kill containers like memory limits do.
Wrong approach: Expecting the container to restart immediately when CPU usage spikes above its limit.
Correct approach: Understand that CPU limits throttle CPU usage without killing the container.
Root cause: Confusing CPU throttling with memory OOM killing leads to the wrong troubleshooting steps.
Key Takeaways
Resource requests guarantee the minimum CPU and memory a container needs to run reliably.
Resource limits cap the maximum CPU and memory a container can use to protect other workloads.
Kubernetes schedules pods based on requests, not limits, so requests affect pod placement.
Exceeding memory limits kills containers, but exceeding CPU limits throttles them without killing.
Properly setting requests and limits improves cluster stability, performance, and cost efficiency.