Liveness and readiness probes in Microservices - System Design Guide

Problem Statement

Without proper health checks, a microservice might appear healthy while it is stuck or unable to serve traffic, causing requests to fail or time out. Also, routing traffic to a service that is still starting or temporarily unable to handle requests leads to poor user experience and wasted resources.

Solution

Liveness probes regularly check if a service is alive and responsive; if not, the system restarts it to recover. Readiness probes check if a service is ready to accept traffic, preventing routing requests to it until it is fully prepared. Together, they ensure only healthy and ready services receive traffic, improving reliability and availability.
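The two reactions described above can be sketched as a small state update. This is only an illustration of the idea, not how any particular orchestrator is implemented; the `Instance` record and `apply_probe_results` function are hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    """Hypothetical record an orchestrator might keep per service instance."""
    name: str
    in_rotation: bool = False  # receives traffic only while True
    restarts: int = 0


def apply_probe_results(inst: Instance, alive: bool, ready: bool) -> None:
    """Update one instance after a single round of probing."""
    if not alive:
        # Failed liveness: restart the instance and stop sending it traffic.
        inst.restarts += 1
        inst.in_rotation = False
        return
    # Readiness only gates traffic; it never triggers a restart.
    inst.in_rotation = ready


if __name__ == "__main__":
    inst = Instance("orders-1")
    apply_probe_results(inst, alive=True, ready=False)  # still starting up
    print(inst.in_rotation, inst.restarts)              # False 0
    apply_probe_results(inst, alive=True, ready=True)   # fully initialized
    print(inst.in_rotation, inst.restarts)              # True 0
    apply_probe_results(inst, alive=False, ready=True)  # stuck
    print(inst.in_rotation, inst.restarts)              # False 1
```

Note that a failed readiness check leaves the restart count untouched: the instance simply waits outside the traffic rotation until it reports ready.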

Architecture

[Diagram: Load Balancer → Readiness Probe, with a Liveness Probe beneath]

This diagram shows the load balancer sending traffic only to service instances that pass the readiness probe. The liveness probe monitors the service health and triggers restarts if needed.
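As a rough sketch of the routing side of this diagram, the snippet below picks backends round-robin from only those instances whose readiness probe currently passes. The `ready_backends` and `route` helpers are hypothetical and assume readiness results are already available as a name-to-boolean mapping.

```python
from itertools import cycle


def ready_backends(readiness: dict) -> list:
    """Return the names of instances whose readiness probe passed."""
    return [name for name, ok in readiness.items() if ok]


def route(request_count: int, readiness: dict) -> list:
    """Assign each request to a ready instance, round-robin."""
    backends = ready_backends(readiness)
    if not backends:
        raise RuntimeError("no ready instances to route to")
    picker = cycle(backends)
    return [next(picker) for _ in range(request_count)]


if __name__ == "__main__":
    # "b" is still warming up, so it receives no traffic.
    print(route(4, {"a": True, "b": False, "c": True}))  # ['a', 'c', 'a', 'c']
```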

Trade-offs

✓ Pros

→ Automatically recovers services stuck in unhealthy states by restarting them.
→ Prevents routing traffic to services that are not ready, improving user experience.
→ Enables faster detection of failures and reduces downtime.
→ Improves overall system reliability and availability.

✗ Cons

→ Requires careful configuration to avoid false positives causing unnecessary restarts.
→ Adds complexity to deployment and monitoring setup.
→ Improper probe design can mask real issues or delay recovery.
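The configuration trade-off in the cons above can be made concrete with a small simulation. The parameter names `period_seconds` and `failure_threshold` mirror the Kubernetes probe fields `periodSeconds` and `failureThreshold`; the functions themselves are hypothetical, and the detection-time formula is a rough estimate that ignores probe timeouts.

```python
def count_restarts(probe_results, failure_threshold):
    """Count restarts when each run of `failure_threshold` consecutive
    failed probes triggers one restart."""
    restarts = 0
    consecutive = 0
    for ok in probe_results:
        consecutive = 0 if ok else consecutive + 1
        if consecutive >= failure_threshold:
            restarts += 1
            consecutive = 0
    return restarts


def approx_detection_seconds(period_seconds, failure_threshold):
    """Rough time between a service getting stuck and its restart."""
    return period_seconds * failure_threshold


if __name__ == "__main__":
    # A brief pause makes two probes in a row fail, then the service recovers.
    brief_pause = [True, True, False, False, True, True]
    print(count_restarts(brief_pause, failure_threshold=1))  # 2 (false positives)
    print(count_restarts(brief_pause, failure_threshold=3))  # 0
    print(approx_detection_seconds(10, 3))                   # 30
```

A low threshold restarts a service that would have recovered on its own, while a high threshold with a long period delays recovery of one that is genuinely stuck.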

Use when deploying microservices in orchestrated environments like Kubernetes with frequent deployments and dynamic scaling, especially when services have startup delays or can get stuck.

Avoid if your service is extremely simple, stateless, and fast to start, or if you have no orchestration platform to act on probe results.

Real World Examples

Google

Kubernetes, which originated at Google, uses liveness and readiness probes to manage container lifecycle, ensuring only ready pods receive traffic and automatically restarting containers that fail their liveness probe.

Netflix

Netflix uses readiness probes to prevent routing user requests to instances still warming up or temporarily overloaded, improving streaming reliability.

Uber

Uber employs liveness probes to detect and restart microservices stuck due to deadlocks or resource exhaustion, maintaining high availability.

Code Example

The before code has no health checks, so the orchestrator cannot detect if the service is stuck or not ready. The after code adds two endpoints: /health/liveness to confirm the service is alive, and /health/readiness to indicate if it is ready to serve traffic. The readiness endpoint returns 503 until the service finishes initialization, preventing traffic routing prematurely.

### Before: No probes, service always assumed healthy
from flask import Flask
app = Flask(__name__)

@app.route('/')
def home():
    return 'Hello World'

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)


### After: Adding liveness and readiness endpoints
import threading
import time

from flask import Flask, jsonify

app = Flask(__name__)

# Flipped to True once initialization finishes
service_ready = False

@app.route('/')
def home():
    if not service_ready:
        return 'Service not ready', 503
    return 'Hello World'

@app.route('/health/liveness')
def liveness():
    # Responding at all shows the process is alive and can handle requests
    return jsonify(status='alive')

@app.route('/health/readiness')
def readiness():
    # Return 503 until initialization completes so no traffic is routed here
    if service_ready:
        return jsonify(status='ready')
    return jsonify(status='not ready'), 503

def set_ready():
    # Simulate initialization work, then mark the service ready
    global service_ready
    time.sleep(5)  # simulate startup delay
    service_ready = True

# daemon=True so this helper thread never blocks process shutdown
threading.Thread(target=set_ready, daemon=True).start()

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)
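The liveness endpoint above succeeds whenever the web server can answer, so it would not notice a background worker that is stuck. One possible refinement, sketched here under the assumption that the service has a worker loop able to record a heartbeat, is to report failure when the last heartbeat is too old. The `Heartbeat` class and `liveness_response` function are hypothetical names, not part of Flask.

```python
import time


class Heartbeat:
    """Tracks when a worker last reported progress (hypothetical helper)."""

    def __init__(self, max_age_seconds, clock=time.monotonic):
        self._max_age = max_age_seconds
        self._clock = clock          # injectable so the logic can be tested
        self._last_beat = clock()

    def beat(self):
        """Called by the worker loop each time it makes progress."""
        self._last_beat = self._clock()

    def is_alive(self):
        """True while the last heartbeat is recent enough."""
        return (self._clock() - self._last_beat) <= self._max_age


def liveness_response(heartbeat):
    """Return a (body, status_code) pair for a liveness endpoint."""
    if heartbeat.is_alive():
        return {"status": "alive"}, 200
    return {"status": "stuck"}, 503
```

In the Flask service, the liveness handler could return `jsonify(body), status` from this pair, and the worker would call `beat()` on every iteration. The maximum age should comfortably exceed the longest legitimate pause, otherwise the probe itself causes unnecessary restarts.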

Alternatives

External Health Monitoring

Uses an external system to check service health rather than internal probes.

Use when: you want centralized monitoring across many services and infrastructure, or when probes cannot be embedded in the service.
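A minimal sketch of such an external checker, using only the Python standard library, might look like the following. The service names and URLs passed in are placeholders, and `check_services` and `http_status` are hypothetical helpers; a real monitoring system would add scheduling, alerting, and history.

```python
import urllib.error
import urllib.request


def http_status(url, timeout=2.0):
    """Fetch a URL and return its HTTP status code."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as HTTPError; report their code.
        return err.code


def check_services(services, fetch_status=http_status):
    """Map each service name to True if its health URL answered 200."""
    report = {}
    for name, url in services.items():
        try:
            report[name] = fetch_status(url) == 200
        except Exception:
            # Connection refused, timeout, DNS failure: treat as unhealthy.
            report[name] = False
    return report
```

Passing `fetch_status` as a parameter keeps the checker independent of how health is fetched, which also makes it easy to exercise without a network.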

Circuit Breaker

Prevents calls to failing services based on error rates rather than probing service health directly.

Use when: you want to protect clients from cascading failures rather than manage the service lifecycle.
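For contrast with probing, here is a simplified circuit breaker sketch. It trips on a count of consecutive failures rather than a true error rate, which is a simplification of the description above; the class and its parameters are hypothetical and not taken from any specific library.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is skipped because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock          # injectable so the logic can be tested
        self._failures = 0
        self._opened_at = None       # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open; call skipped")
            # Timeout elapsed: let one trial call through (half-open).
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                self._opened_at = self._clock()  # open (or re-open)
            raise
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

Unlike a probe, the breaker lives in the client: it learns about failures from real calls and protects the caller, but it does nothing to restart or remove the failing service.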

Summary

Liveness and readiness probes prevent routing traffic to unhealthy or unready services and enable automatic recovery.

They improve system reliability by detecting failures early and reducing downtime.

Proper configuration and understanding of their differences are essential for effective use.

Practice

1. What is the main purpose of a liveness probe in microservices?

easy

A. To check if the service is ready to accept traffic

B. To log user requests for debugging

C. To monitor the network latency between services

D. To check if the service is alive and restart it if it is not

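Solution

Step 1: Understand the role of liveness probes

A liveness probe checks whether the service is alive and responsive; if the check fails, the system restarts the service.

Step 2: Differentiate from readiness probes

A readiness probe checks whether the service is ready to accept traffic (option A), so it controls routing rather than restarts.

Final Answer: D. To check if the service is alive and restart it if it is not

Quick Check: Liveness decides restarts; readiness decides routing.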
