Kubernetes · DevOps · ~15 mins

Probe failure and container restart behavior in Kubernetes - Deep Dive

Overview - Probe failure and container restart behavior
What is it?
In Kubernetes, probes are checks that monitor the health and readiness of containers. There are three main types: liveness, readiness, and startup probes. When a probe fails, Kubernetes can restart the container or stop sending traffic to it, depending on the probe type. This helps keep applications running smoothly by detecting and handling problems automatically.
Why it matters
Without probes, Kubernetes would not know if a container is healthy or ready to serve users. This could lead to broken services, slow responses, or crashes going unnoticed. Probes ensure that unhealthy containers are restarted or removed from service, improving reliability and user experience.
Where it fits
Before learning about probe failures and restarts, you should understand basic Kubernetes concepts like pods, containers, and deployments. After this, you can explore advanced topics like custom health checks, pod lifecycle management, and automated recovery strategies.
Mental Model
Core Idea
Probes are automatic health checks that tell Kubernetes when to restart or stop sending traffic to a container based on its health status.
Think of it like...
It's like a smoke detector in your home: if it senses smoke (a problem), it triggers an alarm (restart or stop traffic) to prevent bigger damage.
┌───────────────┐
│   Kubernetes  │
│   Pod with    │
│   Container   │
└──────┬────────┘
       │
       ▼
┌───────────────┐
│   Probes      │
│ ┌───────────┐ │
│ │Liveness   │ │
│ │Readiness  │ │
│ │Startup    │ │
│ └───────────┘ │
└──────┬────────┘
       │
       ▼
┌───────────────┐
│ Actions on    │
│ Probe Failure │
│ - Restart     │
│ - Stop Traffic│
└───────────────┘
Build-Up - 7 Steps
1
Foundation: Understanding Kubernetes Probe Basics
Concept: Introduce what probes are and their types in Kubernetes.
Kubernetes uses probes to check container health. There are three types:
- Liveness probe: checks whether the container is alive. If it fails, Kubernetes restarts the container.
- Readiness probe: checks whether the container is ready to accept traffic. If it fails, Kubernetes stops sending traffic to it.
- Startup probe: checks whether the container has started properly, which helps avoid premature restarts during slow startups.
Result
Learners understand the purpose and types of probes in Kubernetes.
Knowing the different probes and their roles is key to managing container health and traffic flow effectively.
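As a concrete sketch, all three probe types can be declared on a single container. The image name, port, and the /healthz and /ready paths below are illustrative assumptions, not fixed Kubernetes conventions:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: probe-demo
spec:
  containers:
    - name: app
      image: myapp:latest            # placeholder image
      ports:
        - containerPort: 8080
      startupProbe:                  # gates the other probes until the app has started
        httpGet:
          path: /healthz
          port: 8080
        failureThreshold: 30
        periodSeconds: 10
      livenessProbe:                 # failure here -> container restart
        httpGet:
          path: /healthz
          port: 8080
        periodSeconds: 10
      readinessProbe:                # failure here -> removed from Service endpoints
        httpGet:
          path: /ready
          port: 8080
        periodSeconds: 5
```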
2
Foundation: How Kubernetes Handles Probe Failures
Concept: Explain what happens when each probe fails.
When a liveness probe fails beyond its failure threshold, Kubernetes restarts the container to try to fix the problem. When a readiness probe fails, Kubernetes removes the pod from the Service endpoints, so it stops receiving traffic but is not restarted. When a startup probe fails beyond its failure threshold, Kubernetes also restarts the container; because startup probes are typically configured with a much higher threshold, the container gets more time to start than a liveness probe would allow.
Result
Learners see the direct consequences of probe failures on container lifecycle and traffic.
Understanding failure responses helps prevent downtime and ensures smooth user experience.
3
Intermediate: Configuring Probe Failure Thresholds
🤔 Before reading on: do you think increasing failure thresholds makes containers restart faster or slower? Commit to your answer.
Concept: Introduce probe configuration options that control failure sensitivity.
Probes have settings like:
- failureThreshold: how many consecutive failures are allowed before action is taken.
- periodSeconds: how often the probe runs.
- initialDelaySeconds: delay before the first probe runs.
Increasing failureThreshold or initialDelaySeconds makes Kubernetes wait longer before restarting a container or removing it from service.
Result
Learners can tune probe sensitivity to avoid false restarts or traffic removal.
Knowing how to configure thresholds prevents unnecessary restarts and improves stability.
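A sketch of how these settings interact in practice; the endpoint and timing values are placeholders:

```yaml
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 15   # wait 15s after container start before the first probe
  periodSeconds: 10         # probe every 10s
  failureThreshold: 3       # act only after 3 consecutive failures
# Worst-case detection time after a hang:
#   failureThreshold x periodSeconds = 3 x 10s = ~30s before a restart is triggered
```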
4
Intermediate: Differences Between Liveness and Readiness Probes
🤔 Before reading on: do you think a failing readiness probe causes a container restart? Commit to yes or no.
Concept: Clarify the distinct roles and effects of liveness vs readiness probes.
Liveness probes detect if a container is dead or stuck and trigger restarts. Readiness probes detect if a container can serve traffic and control load balancing. A failing readiness probe does NOT restart the container; it just stops traffic to it. This separation allows graceful handling of temporary issues without downtime.
Result
Learners understand how Kubernetes balances availability and recovery.
Distinguishing these probes helps design resilient applications that recover smoothly.
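One common way to express this separation is to point the two probes at different endpoints with different sensitivities; the paths and thresholds here are assumptions:

```yaml
containers:
  - name: web
    image: myapp:latest         # placeholder image
    livenessProbe:              # answers "am I stuck?" -> restart on failure
      httpGet:
        path: /healthz
        port: 8080
      failureThreshold: 3       # tolerate brief hiccups before restarting
    readinessProbe:             # answers "can I take traffic right now?" -> endpoint membership
      httpGet:
        path: /ready
        port: 8080
      failureThreshold: 1       # drop from load balancing quickly; no restart happens
```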
5
Advanced: Startup Probe Role in Slow-Starting Containers
🤔 Before reading on: do you think startup probes replace liveness probes or work alongside them? Commit to your answer.
Concept: Explain how startup probes prevent premature restarts during container startup.
Startup probes run during container startup to check whether the app is initializing properly. While a startup probe is configured, the kubelet suspends liveness and readiness checks until it succeeds. This avoids restarts caused by slow startups that a liveness probe would misinterpret as failure. Once the startup probe passes, the liveness and readiness probes take over.
Result
Learners can handle slow-starting containers without false restarts.
Understanding startup probes prevents common issues with complex or heavy initialization.
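A sketch of the pattern: the startup probe tolerates a long boot, then the liveness probe takes over with tight settings. The port and timings are illustrative:

```yaml
startupProbe:
  httpGet:
    path: /healthz
    port: 8080
  failureThreshold: 30        # 30 x 10s = up to 5 minutes allowed for startup
  periodSeconds: 10
livenessProbe:                # suspended until the startup probe succeeds
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 10
  failureThreshold: 3         # once running, react to hangs within ~30s
```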
6
Expert: Probe Failure Impact on Stateful Applications
🤔 Before reading on: do you think restarting a stateful container always fixes issues? Commit to yes or no.
Concept: Explore how probe failures and restarts affect stateful workloads and data consistency.
Stateful applications like databases may lose data or corrupt state if restarted abruptly. Probe failures triggering restarts can cause data loss or downtime. Experts design probes carefully, sometimes disabling liveness probes or using readiness probes only. They also implement graceful shutdown hooks and persistent storage to protect data.
Result
Learners appreciate the complexity of probe use in stateful systems.
Knowing probe impact on stateful apps guides safer production configurations.
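A minimal sketch of a more cautious stateful configuration, assuming a PostgreSQL container (the exact pg_isready/pg_ctl invocations may need adjustment for a real deployment):

```yaml
spec:
  terminationGracePeriodSeconds: 120    # give the database time to shut down cleanly
  containers:
    - name: db
      image: postgres:16                # example stateful workload
      readinessProbe:                   # gate traffic, but avoid restart-triggering liveness checks
        exec:
          command: ["pg_isready", "-U", "postgres"]
        periodSeconds: 10
        failureThreshold: 3
      lifecycle:
        preStop:
          exec:                         # graceful shutdown hook runs before termination
            command: ["pg_ctl", "stop", "-m", "fast"]
```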
7
Expert: Advanced Probe Failure Patterns and Recovery
🤔 Before reading on: do you think combining multiple probes can improve reliability? Commit to your answer.
Concept: Discuss complex probe strategies and recovery mechanisms in production.
Experts combine probes with custom scripts or HTTP checks for precise health signals. They use different thresholds for liveness and readiness probes. They integrate probes with Kubernetes events and alerts for faster incident response. Some use external monitoring to complement probes. This layered approach improves fault detection and recovery.
Result
Learners see how to build robust health monitoring and recovery in Kubernetes.
Understanding advanced patterns helps build resilient, self-healing systems.
Under the Hood
The kubelet process on each node runs probe checks (exec commands, HTTP GET requests, TCP socket connections, or gRPC health calls) against containers at configured intervals. When a probe fails consecutively beyond its failureThreshold, the kubelet triggers the corresponding action: restarting the container for a liveness probe, or removing the pod from Service endpoints for a readiness probe. While a startup probe is configured, liveness and readiness checks are suspended until it succeeds. This mechanism relies on periodic health checks and the kubelet's control loop to maintain container health.
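The handler types the kubelet can run are sketched below; a probe uses exactly one handler, and the commands and ports here are placeholders:

```yaml
livenessProbe:
  exec:                                   # run a command in the container; exit code 0 = healthy
    command: ["cat", "/tmp/healthy"]
  periodSeconds: 10
# Alternative handlers (use exactly one per probe):
#   httpGet:   { path: /healthz, port: 8080 }   # HTTP 2xx/3xx response = healthy
#   tcpSocket: { port: 3306 }                   # successful TCP connect = healthy
#   grpc:      { port: 9090 }                   # gRPC health-checking protocol (newer Kubernetes versions)
```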
Why designed this way?
Probes were designed to automate container health management without manual intervention. Separating liveness and readiness allows Kubernetes to distinguish between a container that is dead and one that is temporarily unable to serve traffic. Startup probes address the problem of slow-starting containers causing premature restarts. This design balances availability, reliability, and recovery, avoiding unnecessary restarts while ensuring unhealthy containers are fixed.
┌───────────────┐
│   kubelet     │
│  (node agent) │
└──────┬────────┘
       │ runs probes
       ▼
┌───────────────┐
│ Container     │
│ ┌───────────┐ │
│ │ Liveness  │ │
│ │ Readiness │ │
│ │ Startup   │ │
│ └───────────┘ │
└──────┬────────┘
       │
       ▼
┌───────────────┐
│ Probe Results │
│ - Success     │
│ - Failure     │
└──────┬────────┘
       │
       ▼
┌───────────────┐
│ kubelet Acts  │
│ - Restart     │
│ - Remove from │
│   Service     │
└───────────────┘
Myth Busters - 4 Common Misconceptions
Quick: Does a failing readiness probe cause Kubernetes to restart the container? Commit yes or no.
Common Belief: A failing readiness probe causes the container to restart immediately.
Reality: Readiness probe failures only stop traffic to the container; they do not trigger restarts.
Why it matters: Misunderstanding this can lead to incorrect troubleshooting and unnecessary configuration changes.
Quick: Do startup probes replace liveness probes after startup? Commit yes or no.
Common Belief: Startup probes permanently replace liveness probes once the container starts.
Reality: Startup probes only run during startup; after success, liveness probes resume monitoring.
Why it matters: Confusing this can cause gaps in health monitoring after startup.
Quick: Does increasing failureThreshold always improve stability? Commit yes or no.
Common Belief: Increasing failureThreshold always makes the system more stable by avoiding restarts.
Reality: Too high a failureThreshold delays detection of real failures, causing longer downtime.
Why it matters: Balancing thresholds is critical; too high or too low harms availability.
Quick: Can restarting a stateful container always fix probe failures? Commit yes or no.
Common Belief: Restarting a stateful container always resolves health issues.
Reality: Restarting stateful containers can cause data loss or corruption if not handled carefully.
Why it matters: Ignoring this can lead to serious production outages and data integrity problems.
Expert Zone
1
Liveness probes should be designed to detect only unrecoverable failures to avoid unnecessary restarts.
2
Readiness probes can be used to implement rolling updates by controlling traffic flow during deployment.
3
Startup probes help avoid the CrashLoopBackOff problem common in containers with heavy initialization.
When NOT to use
Avoid using liveness probes that restart stateful containers without graceful shutdown logic; instead, rely on readiness probes and external monitoring. For very simple or short-lived containers, probes might be unnecessary and add overhead.
Production Patterns
In production, teams often combine HTTP GET readiness probes with custom command liveness probes. They tune failureThreshold and periodSeconds based on app behavior. They integrate probe events with alerting systems and use readiness probes to manage traffic during deployments and scaling.
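A sketch of that pattern, pairing a forgiving liveness probe with a sensitive readiness probe; the command and endpoint are hypothetical:

```yaml
livenessProbe:
  exec:
    command: ["/bin/sh", "-c", "pgrep -f myapp"]  # hypothetical process check
  periodSeconds: 20
  failureThreshold: 5          # slow to restart: restarts are disruptive
readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  periodSeconds: 5
  failureThreshold: 2          # fast to drop traffic: cheap and reversible
```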
Connections
Load Balancing
Probes control traffic routing by signaling readiness, which directly affects load balancing decisions.
Understanding probe readiness helps grasp how Kubernetes ensures traffic only goes to healthy containers, improving system reliability.
Fault Tolerance in Distributed Systems
Probe failure handling is a form of automated fault detection and recovery, a core principle in fault-tolerant systems.
Knowing how Kubernetes probes work deepens understanding of how distributed systems maintain availability despite failures.
Medical Diagnostics
Probes are like medical tests that check patient health and decide treatment steps based on results.
This connection highlights the importance of timely and accurate health checks to prevent worsening conditions, similar to container health management.
Common Pitfalls
#1 Using a liveness probe that restarts a container too aggressively during normal temporary delays.
Wrong approach:
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 5
  failureThreshold: 1
Correct approach:
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10
  failureThreshold: 3
Root cause: Misunderstanding container startup time and probe sensitivity causes premature restarts.
#2 Configuring a readiness probe to restart the container on failure.
Wrong approach:
readinessProbe:
  exec:
    command: ["/bin/check_ready.sh"]
  failureThreshold: 1
  restartPolicy: Always   # not a valid probe field; readiness probes never restart
Correct approach:
readinessProbe:
  exec:
    command: ["/bin/check_ready.sh"]
  failureThreshold: 3
# No restartPolicy here; readiness probes do not trigger restarts
Root cause: Confusing readiness probe behavior with liveness probe behavior leads to wrong assumptions about restarts.
#3 Not using a startup probe for slow-starting containers, causing crash loops.
Wrong approach:
livenessProbe:
  tcpSocket:
    port: 3306
  initialDelaySeconds: 10
  periodSeconds: 5
  failureThreshold: 3
Correct approach:
startupProbe:
  tcpSocket:
    port: 3306
  failureThreshold: 30
  periodSeconds: 10
livenessProbe:
  tcpSocket:
    port: 3306
  initialDelaySeconds: 0
  periodSeconds: 10
  failureThreshold: 3
Root cause: Without a startup probe, the liveness probe restarts the container before it finishes starting.
Key Takeaways
Kubernetes probes are essential tools that monitor container health and control restarts and traffic flow.
Liveness probes trigger container restarts on failure, while readiness probes control traffic without restarting.
Startup probes prevent premature restarts during slow container startups by temporarily disabling liveness probes.
Proper configuration of probe thresholds and timing is critical to avoid false positives and unnecessary restarts.
In stateful applications, probe failures and restarts must be handled carefully to prevent data loss or corruption.