Bird
Raised Fist0
Microservicessystem_design~7 mins

Health checks in containers in Microservices - System Design Guide

Choose your learning style10 modes available

Start learning this pattern below

Jump into concepts and practice - no test required

or
Recommended
Test this pattern10 questions across easy, medium, and hard to know if this pattern is strong
Problem Statement
When a containerized service crashes or becomes unresponsive, the system may continue sending traffic to it, causing errors and degraded user experience. Without automatic detection, unhealthy containers remain in the load balancer pool, leading to downtime and manual intervention.
Solution
Health checks periodically test if a container is running and responsive by sending requests or commands. If a container fails these checks, the orchestrator removes it from service and can restart or replace it automatically, ensuring only healthy containers receive traffic.
Architecture
┌───────────────┐       ┌───────────────┐       ┌───────────────┐
│   Load        │──────▶│ Health Check  │──────▶│ Container     │
│   Balancer    │       │ Controller    │       │ Instance      │
└───────────────┘       └───────────────┘       └───────────────┘
        ▲                      │  ▲                      │
        │                      │  │                      │
        │                      │  └───────────┐          │
        │                      │              │          │
        │                      └──────────────┴──────────┘
        │                             If unhealthy
        │                             remove from pool
        │
        └─────────────────────────────────────────────────────▶

This diagram shows the load balancer sending traffic to containers only after the health check controller verifies their health status. Unhealthy containers are detected and removed automatically.

Trade-offs
✓ Pros
Automatically detects and isolates unhealthy containers to prevent failed requests.
Enables self-healing by triggering container restarts or replacements.
Improves overall system reliability and availability without manual intervention.
Integrates seamlessly with container orchestrators like Kubernetes or Docker Swarm.
✗ Cons
Adds extra network or CPU overhead due to periodic health check requests.
Misconfigured health checks can cause false positives, removing healthy containers.
Requires careful tuning of check frequency and timeout values to balance responsiveness and overhead.
Use health checks when running containerized microservices in production environments with automated orchestration and scaling, especially when uptime and reliability are critical.
Avoid complex health checks in small-scale or development setups where manual monitoring suffices and overhead is undesirable.
Real World Examples
Netflix
Netflix uses health checks in their containerized microservices to automatically detect and replace unhealthy instances, ensuring uninterrupted streaming service.
Uber
Uber employs health checks in their Kubernetes clusters to maintain high availability of ride-matching services by removing unresponsive containers.
Google
Google Kubernetes Engine (GKE) uses liveness and readiness probes as health checks to manage container lifecycle and traffic routing efficiently.
Code Example
The before example shows a container without any health checks, so the orchestrator cannot detect if it is unhealthy. The after example adds liveness and readiness probes that periodically send HTTP GET requests to specific endpoints. If these probes fail, the orchestrator restarts the container or stops sending traffic to it.
Microservices
Before (no health check):
apiVersion: v1
kind: Pod
metadata:
  name: myservice
spec:
  containers:
  - name: app
    image: myapp:latest

After (with health checks):
apiVersion: v1
kind: Pod
metadata:
  name: myservice
spec:
  containers:
  - name: app
    image: myapp:latest
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
      initialDelaySeconds: 10
      periodSeconds: 5
    readinessProbe:
      httpGet:
        path: /ready
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 5
OutputSuccess
Alternatives
Circuit Breaker
Circuit breaker prevents calls to failing services by tracking error rates, rather than detecting container health directly.
Use when: Choose circuit breaker when you want to protect clients from cascading failures rather than managing container lifecycle.
External Monitoring
External monitoring uses separate systems to check service health from outside the cluster, instead of internal container probes.
Use when: Choose external monitoring when you need end-to-end visibility including network and user experience beyond container health.
Summary
Health checks detect and isolate unhealthy containers automatically to maintain service availability.
They use periodic probes to verify container responsiveness and readiness to serve traffic.
Proper configuration of health checks improves reliability but requires balancing frequency and overhead.

Practice

(1/5)
1. What is the main purpose of health checks in containers?
easy
A. To log all container network traffic
B. To increase the container's memory allocation
C. To update the container's software automatically
D. To verify if the container is running and responsive

Solution

  1. Step 1: Understand container health checks

    Health checks are used to confirm if a container is alive and working properly.
  2. Step 2: Identify the main goal

    The main goal is to detect if the container is responsive and healthy, so it can be restarted if needed.
  3. Final Answer:

    To verify if the container is running and responsive -> Option D
  4. Quick Check:

    Health checks = verify container health [OK]
Hint: Health checks confirm container responsiveness [OK]
Common Mistakes:
  • Confusing health checks with resource allocation
  • Thinking health checks update software
  • Assuming health checks log network data
2. Which of the following is the correct syntax to define a simple HTTP health check in a Docker container?
easy
A. HEALTHCHECK EXECUTE curl -f http://localhost/
B. HEALTHCHECK RUN curl http://localhost/
C. HEALTHCHECK CMD curl -f http://localhost/ || exit 1
D. HEALTHCHECK CHECK curl http://localhost/

Solution

  1. Step 1: Recall Docker health check syntax

    The correct Dockerfile syntax uses HEALTHCHECK CMD followed by a command that returns 0 on success.
  2. Step 2: Identify the correct command

    HEALTHCHECK CMD curl -f http://localhost/ || exit 1 uses 'curl -f' which fails on HTTP errors and 'exit 1' on failure, matching best practice.
  3. Final Answer:

    HEALTHCHECK CMD curl -f http://localhost/ || exit 1 -> Option C
  4. Quick Check:

    Docker healthcheck syntax = HEALTHCHECK CMD [OK]
Hint: Docker healthchecks use 'HEALTHCHECK CMD' syntax [OK]
Common Mistakes:
  • Using RUN instead of CMD in HEALTHCHECK
  • Using EXECUTE or CHECK which are invalid keywords
  • Not handling failure with exit codes
3. Consider this Kubernetes liveness probe configuration snippet:
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10
What happens if the container's /health endpoint returns HTTP 500 continuously?
medium
A. Kubernetes restarts the container after failing the liveness probe
B. Kubernetes ignores the failure and keeps the container running
C. Kubernetes scales up the number of containers
D. Kubernetes shuts down the entire pod immediately

Solution

  1. Step 1: Understand liveness probe behavior

    Liveness probes check if a container is alive; failure triggers a restart of that container.
  2. Step 2: Analyze the HTTP 500 response effect

    HTTP 500 means the endpoint is unhealthy, so Kubernetes marks the probe as failed and restarts the container.
  3. Final Answer:

    Kubernetes restarts the container after failing the liveness probe -> Option A
  4. Quick Check:

    Liveness probe failure = container restart [OK]
Hint: Liveness failure triggers container restart [OK]
Common Mistakes:
  • Thinking Kubernetes ignores liveness failures
  • Confusing liveness probe with scaling behavior
  • Assuming pod shutdown instead of container restart
4. You have this Dockerfile snippet:
HEALTHCHECK CMD curl -f http://localhost:5000/health || exit 1
But the container never restarts even when the service is down. What is the likely issue?
medium
A. The container restart policy is not set to restart on failure
B. The container does not expose port 5000
C. The health check command is missing the --interval option
D. The HEALTHCHECK CMD syntax is incorrect

Solution

  1. Step 1: Check health check command correctness

    The command syntax is correct and uses curl -f with exit 1 on failure.
  2. Step 2: Consider container restart policy

    If the container restart policy is not set to restart on failure, the container won't restart despite health check failures.
  3. Final Answer:

    The container restart policy is not set to restart on failure -> Option A
  4. Quick Check:

    Restart policy controls container restart on health failure [OK]
Hint: Check restart policy if container doesn't restart [OK]
Common Mistakes:
  • Assuming health check command syntax is wrong
  • Ignoring restart policy settings
  • Thinking missing --interval causes no restart
5. You want to design a microservice container that uses both readiness and liveness probes. Which of the following best describes their combined use?
hard
A. Both probes only log health status without affecting container state
B. Liveness probe restarts unhealthy containers; readiness probe controls traffic routing to only ready containers
C. Both probes restart containers on failure
D. Readiness probe restarts containers; liveness probe controls traffic routing

Solution

  1. Step 1: Understand liveness probe role

    Liveness probes detect if a container is alive; failure triggers container restart.
  2. Step 2: Understand readiness probe role

    Readiness probes check if a container is ready to serve traffic; failure removes it from load balancer routing.
  3. Step 3: Combine their functions

    Liveness restarts unhealthy containers; readiness controls traffic flow to only healthy containers.
  4. Final Answer:

    Liveness probe restarts unhealthy containers; readiness probe controls traffic routing to only ready containers -> Option B
  5. Quick Check:

    Liveness = restart, Readiness = traffic control [OK]
Hint: Liveness restarts; readiness controls traffic [OK]
Common Mistakes:
  • Mixing up readiness and liveness roles
  • Thinking readiness probe restarts containers
  • Assuming probes only log status without action