Kubernetes · DevOps · ~15 min

CPU requests and limits in Kubernetes - Deep Dive

Overview - CPU requests and limits
What is it?
CPU requests and limits are settings in Kubernetes that control how much CPU a container can use. A CPU request is the amount of CPU guaranteed to a container, while a CPU limit is the maximum CPU it can use. These settings help Kubernetes schedule containers efficiently and prevent any container from using too much CPU and affecting others.
Why it matters
Without CPU requests and limits, containers could use unpredictable amounts of CPU, causing some applications to slow down or crash. This would make the system unstable and unfair, as some containers might hog resources while others starve. Setting requests and limits ensures fair sharing and reliable performance for all workloads.
Where it fits
Before learning CPU requests and limits, you should understand basic Kubernetes concepts like pods, containers, and resource management. After this, you can learn about Quality of Service (QoS) classes, node autoscaling, and advanced resource tuning for production environments.
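Before the mental model, here is what these two settings look like in practice. This is a minimal sketch of a pod spec; the pod and container names are placeholders:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web-demo            # hypothetical name for illustration
spec:
  containers:
    - name: app
      image: nginx:1.27
      resources:
        requests:
          cpu: "250m"       # guaranteed minimum: a quarter of one CPU
        limits:
          cpu: "500m"       # cap: at most half of one CPU
```

With these settings the scheduler reserves 250m of CPU for the container when placing the pod, and the kernel throttles the container if it tries to use more than 500m.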
Mental Model
Core Idea
CPU requests guarantee a minimum CPU for a container, while CPU limits cap the maximum CPU it can use to keep the system balanced.
Think of it like...
Imagine a shared kitchen where each cook is guaranteed a certain amount of stove time (CPU request) but cannot use the stove longer than a set limit (CPU limit) so everyone gets a fair chance to cook.
┌───────────────────────────────────────────┐
│              Kubernetes Node              │
│ ┌──────────────────┐ ┌──────────────────┐ │
│ │ Container A      │ │ Container B      │ │
│ │ Request: 1 CPU   │ │ Request: 0.5 CPU │ │
│ │ Limit:   2 CPU   │ │ Limit:   1 CPU   │ │
│ └──────────────────┘ └──────────────────┘ │
│ Node CPU capacity: 4 cores                │
└───────────────────────────────────────────┘
Build-Up - 7 Steps
1
Foundation: Understanding CPU in Kubernetes
🤔
Concept: Learn what CPU means in Kubernetes and how it is measured.
In Kubernetes, CPU is measured in CPU units, where 1 CPU equals one physical core or one virtual core (vCPU/hyperthread), depending on the machine. Containers share the CPU of the node they run on. Fractional values are allowed: '1' means one full CPU, '500m' means 500 millicores (half a CPU), and '0.5' is equivalent to '500m'.
Result
You understand how CPU is counted and represented in Kubernetes resource settings.
Knowing how CPU units work helps you set meaningful requests and limits that match your application's needs.
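The unit notations can be seen side by side in a resources block; the values here are purely illustrative:

```yaml
resources:
  requests:
    cpu: "1"       # one full CPU (core/vCPU)
  limits:
    cpu: "1500m"   # 1500 millicores = 1.5 CPUs; "1.5" is equivalent
```

Millicore notation is preferred in practice because it avoids floating-point values in manifests.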
2
Foundation: What are CPU Requests?
🤔
Concept: CPU requests define the guaranteed CPU a container will get from the node.
When you set a CPU request, the scheduler uses it to decide which node can run the pod: a container requesting 1 CPU is only placed on a node whose allocatable CPU, minus the requests of pods already scheduled there, is at least 1. Note that the scheduler counts requests, not actual usage. At runtime, the request becomes a cgroup CPU weight, so under contention the container still receives at least its requested share.
Result
Containers have guaranteed CPU resources, and scheduling decisions are based on these guarantees.
Understanding requests is key to ensuring your container runs reliably without being starved of CPU.
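A sketch of how the scheduler counts requests, assuming a node with 4 allocatable CPUs; the pod names and numbers are hypothetical:

```yaml
# Node allocatable CPU: 4
# Already scheduled: pod-a requests 2, pod-b requests 1.5 -> 3.5 reserved
# The pod below, requesting 1 CPU, does NOT fit (3.5 + 1 > 4), no matter
# how little CPU pod-a and pod-b are actually using right now.
apiVersion: v1
kind: Pod
metadata:
  name: pod-c
spec:
  containers:
    - name: app
      image: busybox:1.36
      command: ["sleep", "infinity"]
      resources:
        requests:
          cpu: "1"
```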
3
Intermediate: What are CPU Limits?
🤔
Concept: CPU limits set the maximum CPU a container can use, preventing it from overusing resources.
If a container tries to use more CPU than its limit, Kubernetes throttles it, slowing down its CPU usage. For example, if a container has a limit of 2 CPUs, it cannot use more than 2 CPUs even if the node has free CPU available.
Result
Containers cannot exceed their CPU limits, protecting other containers from resource hogging.
Limits prevent noisy neighbors and keep the system stable by controlling maximum CPU usage.
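A limit-only fragment, as a sketch. One detail worth knowing: when you set a limit without a request, Kubernetes defaults the request to the limit:

```yaml
resources:
  limits:
    cpu: "2"    # hard cap: throttled above 2 CPUs even if the node is idle
  # requests.cpu defaults to "2" here, because only the limit is set
```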
4
Intermediate: How Requests and Limits Work Together
🤔 Before reading on: Do you think a container can use more CPU than its request but less than its limit? Commit to your answer.
Concept: Requests and limits define a CPU usage range: minimum guaranteed to maximum allowed.
A container can use CPU between its request and limit. The request is the minimum reserved CPU, and the limit is the maximum allowed. If the container is idle, it uses less CPU, but when busy, it can burst up to the limit if available.
Result
You see how containers can flexibly use CPU within set boundaries.
Knowing this range helps you balance resource guarantees with efficient CPU use.
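The request-to-limit range can be annotated directly on a resources fragment; the numbers are illustrative:

```yaml
resources:
  requests:
    cpu: "250m"   # reserved minimum, used for scheduling
  limits:
    cpu: "1"      # burst ceiling
# Idle: the container may use far less than 250m.
# Busy: it can burst up to 1 CPU if the node has spare cycles;
# beyond 1 CPU it is throttled.
```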
5
Intermediate: Impact on Pod Scheduling and QoS
🤔 Before reading on: Does setting CPU requests affect where Kubernetes places your pod? Commit to yes or no.
Concept: CPU requests influence pod placement and Quality of Service (QoS) classification.
The scheduler uses CPU requests to find a node with enough unreserved CPU. Requests also determine the pod's QoS class (memory requests and limits count as well): requests equal to limits for every container gives Guaranteed, requests below limits (or limits unset) gives Burstable, and no requests or limits gives BestEffort. Under node pressure, BestEffort pods are evicted first.
Result
Pods with CPU requests are more stable and less likely to be evicted.
Understanding this helps you design pods that survive node resource pressure.
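The QoS classes can be read off the resources fragments directly. These are sketches of the CPU side only; in real pods, memory requests and limits factor into the class as well:

```yaml
# Guaranteed: requests == limits for every container
resources:
  requests:
    cpu: "500m"
  limits:
    cpu: "500m"
---
# Burstable: requests set, limits higher (or absent)
resources:
  requests:
    cpu: "500m"
  limits:
    cpu: "1"
# No requests or limits at all => BestEffort, evicted first under pressure
```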
6
Advanced: CPU Throttling and Performance Effects
🤔 Before reading on: Do you think exceeding CPU limits causes errors or just slows the container? Commit to your answer.
Concept: Exceeding CPU limits causes throttling, which slows down container CPU usage without errors.
When a container hits its CPU limit, the Linux kernel's CFS bandwidth controller pauses it until the current quota period (100 ms by default) ends. The container runs slower but does not crash. Throttling can cause noticeable latency and throughput degradation if limits are set too low.
Result
You understand why setting CPU limits too low can hurt application performance.
Knowing throttling behavior helps you avoid performance surprises in production.
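A sketch of a pod with a deliberately low limit, annotated with where throttling becomes visible; the pod name is hypothetical:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: throttle-demo
spec:
  containers:
    - name: app
      image: busybox:1.36
      command: ["sh", "-c", "while true; do :; done"]  # busy-loop
      resources:
        limits:
          cpu: "200m"   # the loop wants a full CPU but is capped at 0.2
# Inside the container, /sys/fs/cgroup/cpu.stat (cgroup v2) shows
# nr_throttled and throttled_usec climbing; the process slows but never dies.
```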
7
Expert: Advanced Scheduling and Overcommit Strategies
🤔 Before reading on: Can the CPU limits of the pods on one node add up to more than the node's capacity? Commit to yes or no.
Concept: Kubernetes overcommits CPU through limits: the sum of requests on a node must fit its allocatable CPU, but the sum of limits may exceed its capacity.
Because CPU is compressible, Kubernetes lets pods' limits (and thus their combined potential usage) exceed total CPU capacity, trusting that pods rarely burst simultaneously. This improves utilization but risks throttling and contention when many pods peak at once.
Result
You see how Kubernetes balances resource guarantees with efficient CPU use in real clusters.
Understanding overcommit helps you tune clusters for cost and performance tradeoffs.
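Overcommit in numbers, assuming a hypothetical node with 4 allocatable CPUs:

```yaml
# pod-a: requests 1.5, limit 3
# pod-b: requests 2,   limit 3
# Requests sum to 3.5 <= 4, so both pods schedule.
# Limits sum to 6 > 4: if both burst to their limits at once,
# they contend for the node's 4 CPUs and are throttled, sharing
# CPU roughly in proportion to their requests.
resources:          # pod-a's fragment
  requests:
    cpu: "1500m"
  limits:
    cpu: "3"
```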
Under the Hood
Kubernetes uses the Linux cgroups feature to enforce CPU requests and limits. Requests reserve CPU shares for scheduling and guarantee minimum CPU allocation. Limits set the maximum CPU time a container can consume by throttling its CPU usage via cgroups quota and period settings. The scheduler uses requests to place pods on nodes with enough free CPU capacity. When a container exceeds its CPU limit, the kernel delays its CPU time slices, slowing it down without killing it.
Why designed this way?
This design balances fairness and efficiency. Requests ensure minimum resources so containers run reliably. Limits prevent any container from starving others. Using cgroups leverages existing Linux kernel features for resource control. Overcommit is allowed because CPU is compressible, unlike memory, enabling better utilization. Alternatives like hard CPU caps or no limits would either waste resources or cause instability.
┌─────────────────────────┐
│  Kubernetes Scheduler   │
│  ┌───────────────┐      │
│  │ CPU Requests  │      │
│  └───────┬───────┘      │
│          ▼              │
│  ┌───────────────┐      │
│  │ Node with CPU │      │
│  │ Capacity      │      │
│  └───────┬───────┘      │
│          ▼              │
│  ┌───────────────┐      │
│  │ Linux cgroups │      │
│  │ enforce CPU   │      │
│  │ requests &    │      │
│  │ limits        │      │
│  └───────────────┘      │
└─────────────────────────┘
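The limit-to-cgroup mapping is simple arithmetic: with the default 100 ms CFS period, a limit of N CPUs becomes a quota of N × 100 ms per period. A sketch for a 500m limit:

```yaml
resources:
  limits:
    cpu: "500m"
# Translates (cgroup v1) to roughly:
#   cpu.cfs_period_us = 100000   # 100 ms window
#   cpu.cfs_quota_us  =  50000   # 50 ms of CPU time per window
# cgroup v2 expresses the same as: cpu.max = "50000 100000"
```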
Myth Busters - 4 Common Misconceptions
Quick: Does setting a CPU limit guarantee your container will get that CPU amount? Commit yes or no.
Common Belief: Setting a CPU limit guarantees your container will always get that much CPU.
Reality: CPU limits only cap maximum CPU usage; they do not guarantee CPU. Only CPU requests guarantee a minimum allocation.
Why it matters: Believing limits guarantee CPU can cause under-provisioning and unexpected slowdowns.
Quick: Can a container use more CPU than its request but less than its limit? Commit yes or no.
Common Belief: A container cannot use more CPU than its request.
Reality: A container can use CPU between its request and limit if spare capacity is available on the node.
Why it matters: Misunderstanding this limits efficient CPU use and leads to overly conservative resource settings.
Quick: Can the CPU limits of the pods on a node add up to more than the node's capacity? Commit yes or no.
Common Belief: Kubernetes keeps the sum of CPU limits on a node at or below the node's CPU capacity.
Reality: The scheduler only enforces requests: the sum of requests must fit the node's allocatable CPU, but the sum of limits can exceed capacity. Kubernetes overcommits CPU this way because CPU is compressible; simultaneous bursts are throttled rather than failing.
Why it matters: Confusing requests with limits leads to wrong expectations about pod placement and cluster overcommit.
Quick: Does exceeding CPU limits cause container crashes? Commit yes or no.
Common Belief: If a container exceeds its CPU limit, it will crash or be killed.
Reality: Exceeding the CPU limit causes throttling, which slows the container but does not crash it.
Why it matters: Expecting crashes leads to misdiagnosing performance issues actually caused by throttling.
Expert Zone
1
CPU requests affect pod QoS class, influencing eviction priority under node pressure.
2
CPU limits use cgroups quota and period settings, which can cause bursty throttling behavior depending on kernel timing.
3
Overcommitting CPU requests improves utilization but requires careful monitoring to avoid performance degradation.
When NOT to use
Avoid setting CPU limits for latency-sensitive applications that need consistent CPU performance; instead, rely on requests and node sizing. For batch jobs, consider no limits to allow full CPU usage. Use vertical pod autoscaling or custom metrics for dynamic resource tuning instead of static requests and limits.
Production Patterns
In production, teams set CPU requests based on average usage and limits slightly above peak usage to allow bursts. They monitor throttling metrics to adjust limits. Overcommit is common in large clusters to maximize resource use. QoS classes guide eviction policies during node pressure. Autoscaling policies often depend on CPU requests and limits.
Connections
Quality of Service (QoS) in Kubernetes
CPU requests and limits determine pod QoS classes.
Understanding CPU resource settings helps grasp how Kubernetes prioritizes pods during resource contention.
Linux cgroups
CPU requests and limits are enforced using Linux cgroups features.
Knowing cgroups internals clarifies how Kubernetes controls container CPU usage at the OS level.
Traffic shaping in networking
Both CPU limits and traffic shaping control resource usage to prevent overload.
Recognizing this pattern helps understand resource fairness and throttling across different systems.
Common Pitfalls
#1 Setting CPU limits too low, causing performance issues.
Wrong approach:
resources:
  requests:
    cpu: "500m"
  limits:
    cpu: "600m"
Correct approach:
resources:
  requests:
    cpu: "500m"
  limits:
    cpu: "1000m"
Root cause: Setting the limit barely above the request leaves almost no burst headroom, so the container is throttled as soon as load rises.
#2 Not setting CPU requests, causing pod eviction under pressure.
Wrong approach:
resources: {}   # no requests or limits at all
Correct approach:
resources:
  requests:
    cpu: "500m"
  limits:
    cpu: "1"
Root cause: Pods without any requests fall into the BestEffort QoS class and are evicted first under node pressure. Note that setting only a limit does not trigger this pitfall, because Kubernetes then defaults the request to the limit.
#3 Assuming the CPU limits on a node must not exceed its capacity.
Wrong approach: Sizing workloads so that the sum of CPU limits stays at or below node CPU capacity, leaving the node underutilized.
Correct approach: Ensuring the sum of CPU requests fits the node's allocatable CPU, while letting the sum of limits exceed capacity.
Root cause: Confusing requests (which the scheduler strictly enforces) with limits, and CPU (compressible, safe to overcommit) with memory (not safely overcommittable).
Key Takeaways
CPU requests guarantee minimum CPU for containers and guide Kubernetes scheduling decisions.
CPU limits cap maximum CPU usage to prevent resource hogging and ensure fairness.
Containers can use CPU between their request and limit, allowing flexible resource use.
Exceeding CPU limits causes throttling, which slows containers but does not crash them.
Kubernetes overcommits CPU through limits: the sum of requests on a node must fit its allocatable CPU, but the sum of limits may exceed it, balancing utilization and performance.