Health checks in containers in Microservices - System Design Guide

Problem Statement

When a containerized service crashes or becomes unresponsive, the system may keep sending traffic to it, causing errors and a degraded user experience. Without automatic detection, unhealthy containers stay in the load balancer pool until someone intervenes manually, which prolongs downtime.

Solution

Health checks periodically test whether a container is running and responsive, either by sending it a request (for example an HTTP GET) or by running a command inside it. If a container fails these checks, the orchestrator removes it from service and can restart or replace it automatically, so that only healthy containers receive traffic.
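The check-and-react loop described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular orchestrator is implemented: the `Instance` class, the `run_check_round` function, and the `failure_threshold` parameter are hypothetical names, and the probe is passed in as a plain callable so the sketch runs without a network.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Instance:
    """Hypothetical record an orchestrator might keep per container."""
    name: str
    consecutive_failures: int = 0
    in_service: bool = True
    restarts: int = 0


def run_check_round(
    instances: List[Instance],
    probe: Callable[[Instance], bool],
    failure_threshold: int = 3,
) -> None:
    """Probe every instance once and update its state.

    An instance is taken out of service and scheduled for a restart only
    after `failure_threshold` consecutive failed probes, so a single slow
    response does not remove a healthy container.
    """
    for inst in instances:
        if probe(inst):
            # A passing probe resets the counter and (re)admits the instance.
            inst.consecutive_failures = 0
            inst.in_service = True
        else:
            inst.consecutive_failures += 1
            if inst.consecutive_failures >= failure_threshold:
                inst.in_service = False
                inst.restarts += 1
                inst.consecutive_failures = 0
```

Calling `run_check_round` on a timer (every few seconds, say) gives the periodic behavior; the threshold is what keeps one transient failure from triggering a restart.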

Architecture

┌───────────────┐       ┌───────────────┐       ┌───────────────┐
│   Load        │──────▶│ Health Check  │──────▶│ Container     │
│   Balancer    │       │ Controller    │       │ Instance      │
└───────────────┘       └───────────────┘       └───────────────┘
        ▲                      │  ▲                      │
        │                      │  │                      │
        │                      │  └───────────┐          │
        │                      │              │          │
        │                      └──────────────┴──────────┘
        │                             If unhealthy
        │                             remove from pool
        │
        └─────────────────────────────────────────────────────▶

This diagram shows the load balancer sending traffic to containers only after the health check controller verifies their health status. Unhealthy containers are detected and removed automatically.
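The routing rule in the diagram can be sketched as follows. The `HealthAwareBalancer` class and its `report` and `pick` methods are hypothetical names chosen for illustration; the sketch assumes the health check controller calls `report` after each probe and that a backend receives no traffic until its first check passes.

```python
from itertools import count
from typing import Dict, List


class HealthAwareBalancer:
    """Round-robin balancer that only routes to backends reported healthy."""

    def __init__(self, backends: List[str]) -> None:
        # Every backend starts as unverified and gets no traffic
        # until the controller reports a passing check.
        self._health: Dict[str, bool] = {b: False for b in backends}
        self._counter = count()

    def report(self, backend: str, healthy: bool) -> None:
        """Called by the health check controller after each probe."""
        self._health[backend] = healthy

    def pick(self) -> str:
        """Return the next healthy backend, or fail if none is available."""
        healthy = [b for b, ok in self._health.items() if ok]
        if not healthy:
            raise RuntimeError("no healthy backends available")
        return healthy[next(self._counter) % len(healthy)]
```

A backend that is reported unhealthy simply drops out of the rotation on the next `pick` and rejoins once a later check passes.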

Trade-offs

✓ Pros

→ Automatically detects and isolates unhealthy containers to prevent failed requests.

→ Enables self-healing by triggering container restarts or replacements.

→ Improves overall system reliability and availability without manual intervention.

→ Integrates with container orchestrators such as Kubernetes and Docker Swarm.

✗ Cons

→ Adds extra network or CPU overhead due to periodic health check requests.

→ Misconfigured health checks can cause false positives, removing healthy containers.

→ Requires careful tuning of check frequency and timeout values to balance responsiveness and overhead.
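The tuning trade-off can be made concrete with some rough arithmetic. The sketch below uses a deliberately simplified model, stated as an assumption: a failure is acted on after `failure_threshold` consecutive failed probes spaced `period_s` apart, with the last probe taking up to `timeout_s` to time out. The function names and the sample numbers are hypothetical, and real orchestrators may time things differently.

```python
def worst_case_detection_seconds(
    period_s: float, timeout_s: float, failure_threshold: int
) -> float:
    """Approximate upper bound on time from failure to reaction.

    Simplified model: the container fails right after a passing probe,
    then `failure_threshold` probes must fail, and the final one may
    wait for the full timeout before it is counted as a failure.
    """
    return period_s * failure_threshold + timeout_s


def probes_per_second(
    instances: int, period_s: float, probes_per_instance: int = 2
) -> float:
    """Steady-state probe load across the fleet (e.g. liveness + readiness)."""
    return instances * probes_per_instance / period_s


# Probing every 5 s with a 1 s timeout and a threshold of 3 failures:
print(worst_case_detection_seconds(5, 1, 3))  # 16
# 200 containers, two probes each, every 5 s:
print(probes_per_second(200, 5))              # 80.0
```

Halving the period roughly halves detection time but doubles the probe load, which is the balance between responsiveness and overhead mentioned above.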

Use health checks when running containerized microservices in production environments with automated orchestration and scaling, especially when uptime and reliability are critical.

Avoid complex health checks in small-scale or development setups where manual monitoring suffices and overhead is undesirable.

Real World Examples

Netflix

Netflix uses health checks in their containerized microservices to automatically detect and replace unhealthy instances, ensuring uninterrupted streaming service.

Uber

Uber employs health checks in their Kubernetes clusters to maintain high availability of ride-matching services by removing unresponsive containers.

Google

Google Kubernetes Engine (GKE) uses liveness and readiness probes as health checks to manage container lifecycle and traffic routing efficiently.

Code Example

The before example shows a Pod without any health checks, so the orchestrator can only notice a crashed process, not an application that is still running but unresponsive. The after example adds liveness and readiness probes that periodically send HTTP GET requests to specific endpoints. If the liveness probe fails repeatedly, the orchestrator restarts the container; while the readiness probe fails, the orchestrator stops sending traffic to it.

Before (no health check):
apiVersion: v1
kind: Pod
metadata:
  name: myservice
spec:
  containers:
  - name: app
    image: myapp:latest

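After (with health checks):
apiVersion: v1
kind: Pod
metadata:
  name: myservice
spec:
  containers:
  - name: app
    image: myapp:latest
    # Liveness: if this check keeps failing, the kubelet restarts the container.
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
      initialDelaySeconds: 10   # wait 10 s after the container starts before the first probe
      periodSeconds: 5          # then probe every 5 s
    # Readiness: while this check fails, the Pod is removed from Service endpoints.
    readinessProbe:
      httpGet:
        path: /ready
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 5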
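The probes above only work if the application actually serves `/healthz` and `/ready` on port 8080. Below is a minimal sketch of what that might look like using Python's standard library. The `STATE` dictionary, the `probe_status` function, and the `dependencies_ready` flag are hypothetical; a real service would decide readiness from its own startup and dependency checks. Kubernetes treats any HTTP status from 200 to 399 as a passing probe, so the sketch returns 503 to signal "not ready".

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Tuple

# Hypothetical application state; the service would flip this to True
# once its startup work (connections, caches, migrations) has finished.
STATE = {"dependencies_ready": False}


def probe_status(path: str, state: dict) -> Tuple[int, dict]:
    """Map a probe path to an HTTP status code and a small JSON body."""
    if path == "/healthz":
        # Liveness: the process is up and able to answer requests.
        return 200, {"status": "alive"}
    if path == "/ready":
        # Readiness: only accept traffic once dependencies are usable.
        if state["dependencies_ready"]:
            return 200, {"status": "ready"}
        return 503, {"status": "not ready"}
    return 404, {"error": "not found"}


class ProbeHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        code, body = probe_status(self.path, STATE)
        payload = json.dumps(body).encode("utf-8")
        self.send_response(code)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port: int = 8080) -> None:
    """Block forever serving probe requests on the given port."""
    HTTPServer(("0.0.0.0", port), ProbeHandler).serve_forever()
```

Calling `serve()` starts the endpoint. Keeping the liveness check free of dependency lookups is a deliberate choice in this sketch: a failing database should make the container unready, not cause it to be restarted.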

Alternatives

Circuit Breaker

A circuit breaker prevents calls to a failing service by tracking error rates on the caller's side, rather than detecting container health directly.

Use when: you want to protect clients from cascading failures rather than manage the container lifecycle.
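For comparison with health checks, here is a minimal circuit breaker sketch. It is one simple variant, under stated assumptions: it tracks the error rate over the last `window` calls, opens once a full window reaches `max_error_rate`, and allows calls again after `cooldown_s`. The class names, parameters, and defaults are hypothetical and not taken from any particular library.

```python
import time
from collections import deque
from typing import Callable, Deque, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when the breaker rejects a call without attempting it."""


class CircuitBreaker:
    def __init__(
        self,
        window: int = 10,
        max_error_rate: float = 0.5,
        cooldown_s: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._results: Deque[bool] = deque(maxlen=window)  # True = success
        self._max_error_rate = max_error_rate
        self._cooldown_s = cooldown_s
        self._clock = clock
        self._opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T]) -> T:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._cooldown_s:
                raise CircuitOpenError("circuit open; call skipped")
            # Cooldown elapsed: close the circuit and start a fresh window.
            self._opened_at = None
            self._results.clear()
        try:
            result = fn()
        except Exception:
            self._record(False)
            raise
        self._record(True)
        return result

    def _record(self, ok: bool) -> None:
        self._results.append(ok)
        window_full = len(self._results) == self._results.maxlen
        error_rate = self._results.count(False) / len(self._results)
        if window_full and error_rate >= self._max_error_rate:
            self._opened_at = self._clock()
```

Unlike a health check, nothing here inspects the container itself: the breaker only reacts to the outcomes the caller observes, which is why the two patterns are usually complementary.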

External Monitoring

External monitoring uses separate systems to check service health from outside the cluster, instead of internal container probes.

Use when: you need end-to-end visibility that covers the network path and user experience, not just container health.
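An external check can be as small as a script that requests a public URL from outside the cluster and records the status and latency. The sketch below assumes a hypothetical `check_endpoint` helper and `CheckResult` record; the `fetch` and `clock` parameters exist only so the logic can be exercised without a real network call.

```python
import time
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Callable


@dataclass
class CheckResult:
    url: str
    ok: bool
    status: int       # 0 when no HTTP response was received
    latency_s: float


def http_status(url: str, timeout_s: float) -> int:
    """Fetch the URL and return its HTTP status code."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses still carry a status code worth recording.
        return err.code


def check_endpoint(
    url: str,
    timeout_s: float = 2.0,
    fetch: Callable[[str, float], int] = http_status,
    clock: Callable[[], float] = time.monotonic,
) -> CheckResult:
    """Probe one URL from the outside and measure how long it took."""
    start = clock()
    try:
        status = fetch(url, timeout_s)
    except OSError:
        # Connection errors and timeouts are OSError subclasses.
        status = 0
    latency = clock() - start
    return CheckResult(url, 200 <= status < 400, status, latency)
```

Because the request travels the same path a user's would (DNS, ingress, load balancer), a failure here can reveal problems that in-cluster probes never see.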

Summary

Health checks detect and isolate unhealthy containers automatically to maintain service availability.

They use periodic probes to verify container responsiveness and readiness to serve traffic.

Proper configuration of health checks improves reliability but requires balancing frequency and overhead.

Practice

1. What is the main purpose of health checks in containers?

easy

A. To log all container network traffic

B. To increase the container's memory allocation

C. To update the container's software automatically

D. To verify if the container is running and responsive
