Health check pattern in Microservices - System Design Guide

Problem Statement

When a microservice fails or becomes unresponsive, other services or load balancers may continue sending requests to it, causing errors and degraded user experience. Without a way to verify if a service is healthy, the system cannot automatically detect failures or reroute traffic, leading to downtime and cascading failures.

Solution

The health check pattern solves this by having each service expose a simple endpoint that reports its current status. Other components periodically call this endpoint to verify if the service is alive and functioning. If the health check fails, the service is marked unhealthy and traffic is redirected away until it recovers.
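The polling side described above can be sketched with the Python standard library alone. The function names `probe` and `poll_once`, the 2-second timeout, and the rule "any 2xx response counts as healthy" are illustrative assumptions, not something the pattern prescribes.

```python
import http.client
import urllib.error
import urllib.request
from typing import Callable, Dict, Iterable


def probe(url: str, timeout: float = 2.0) -> bool:
    """Return True only if the URL answers with an HTTP 2xx status in time."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Covers refused connections, timeouts, and 4xx/5xx responses
        # (urlopen raises HTTPError for those).
        return False


def poll_once(urls: Iterable[str],
              check: Callable[[str], bool] = probe) -> Dict[str, bool]:
    """Probe every instance once and return a map of URL -> healthy flag."""
    return {url: check(url) for url in urls}
```

A monitor would typically call `poll_once` on a fixed interval and hand the result to whatever component makes routing decisions. The `check` parameter is injectable so the loop can be exercised without a network.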

Architecture

[Diagram: Client → Load Balancer → Microservice, with a Health Check Monitor polling the microservice's health endpoint]

This diagram shows a client sending requests through a load balancer to a microservice. A health check monitor periodically queries the microservice's health endpoint to determine its status and informs the load balancer to route traffic accordingly.
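To make the routing half of the diagram concrete, here is a minimal sketch of a round-robin balancer that skips instances reported as unhealthy. The class name `RoundRobinBalancer` and its methods are hypothetical; real load balancers expose this behavior through configuration rather than code like this.

```python
from typing import Iterable, List, Set


class RoundRobinBalancer:
    """Rotates through instances, skipping any currently marked unhealthy."""

    def __init__(self, instances: Iterable[str]) -> None:
        self._instances: List[str] = list(instances)
        self._healthy: Set[str] = set(self._instances)  # assume healthy at start
        self._cursor = 0

    def report(self, instance: str, healthy: bool) -> None:
        """Called by the health check monitor after each probe."""
        if healthy:
            self._healthy.add(instance)
        else:
            self._healthy.discard(instance)

    def next_instance(self) -> str:
        """Return the next healthy instance, or raise if none is available."""
        for _ in range(len(self._instances)):
            candidate = self._instances[self._cursor]
            self._cursor = (self._cursor + 1) % len(self._instances)
            if candidate in self._healthy:
                return candidate
        raise RuntimeError("no healthy instances available")
```

The `report` method is the point where the monitor's findings enter the routing decision: an instance marked unhealthy stops receiving traffic, and it rejoins the rotation as soon as it is reported healthy again.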

Trade-offs

✓ Pros

→ Enables automatic detection of unhealthy services to improve system reliability.

→ Allows load balancers and orchestrators to reroute traffic away from failing instances.

→ Simple to implement with minimal overhead on services.

→ Supports graceful recovery and reduces downtime.

✗ Cons

→ Requires additional infrastructure or monitoring components to poll health endpoints.

→ Health checks may not detect all types of failures, such as degraded performance.

→ Improper health check design can cause false positives or negatives, affecting routing.
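One common way to reduce false positives and negatives is to require several consecutive probe results before changing an instance's state, so a single slow or dropped probe does not flip routing. The sketch below assumes thresholds of 3 failures and 2 successes; the class name `HealthState` and these defaults are illustrative choices.

```python
from dataclasses import dataclass


@dataclass
class HealthState:
    """Tracks one instance; flips state only after consecutive results."""

    unhealthy_threshold: int = 3  # consecutive failures before marking unhealthy
    healthy_threshold: int = 2    # consecutive successes before marking healthy
    healthy: bool = True
    _failures: int = 0
    _successes: int = 0

    def record(self, probe_ok: bool) -> bool:
        """Record one probe result and return the current healthy flag."""
        if probe_ok:
            self._successes += 1
            self._failures = 0
            if not self.healthy and self._successes >= self.healthy_threshold:
                self.healthy = True
        else:
            self._failures += 1
            self._successes = 0
            if self.healthy and self._failures >= self.unhealthy_threshold:
                self.healthy = False
        return self.healthy
```

Higher thresholds make routing more stable but slow down failure detection, so the values are a trade-off between the two.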

Use when: you run multiple microservice instances behind load balancers or orchestrators, especially under sustained load (as a rough guide, above 100 requests per second) or when high availability is critical.

Avoid when: the service is very simple or runs as a single instance, so failure detection and rerouting are unnecessary or add complexity without benefit.

Real World Examples

Netflix

Netflix uses health checks to monitor microservices in its streaming platform, enabling automatic failover and traffic rerouting to healthy instances to maintain uninterrupted playback.

Uber

Uber employs health checks in its microservice architecture to detect service failures quickly and prevent cascading outages during high-demand periods.

Amazon

Amazon uses health checks in its AWS Elastic Load Balancer to route traffic only to healthy EC2 instances, ensuring reliable service delivery.

Code Example

The before code lacks any endpoint to report service health, so external systems cannot verify if the service is alive. The after code adds a '/health' endpoint that returns a simple JSON status. Load balancers or monitors can call this endpoint to check if the service is healthy and route traffic accordingly.

### Before: No health check endpoint
from flask import Flask
app = Flask(__name__)

@app.route('/')
def home():
    return 'Hello World'


### After: Adding a health check endpoint
from flask import Flask, jsonify
app = Flask(__name__)

@app.route('/')
def home():
    return 'Hello World'

@app.route('/health')
def health_check():
    # Simple health check returning service status
    status = {'status': 'healthy'}
    return jsonify(status), 200
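The endpoint above always reports healthy as long as the process can answer. A possible extension, sketched here under assumptions, is to run a few dependency checks and return HTTP 503 when any of them fails. The helper name `build_health_report`, the check names, and the response shape are hypothetical and not part of Flask.

```python
from typing import Callable, Dict, Tuple


def build_health_report(
    checks: Dict[str, Callable[[], bool]]
) -> Tuple[dict, int]:
    """Run each named check and return (JSON-ready body, HTTP status code)."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = "up" if check() else "down"
        except Exception:
            # A check that raises is treated the same as a failed check.
            results[name] = "down"
    all_up = all(state == "up" for state in results.values())
    body = {"status": "healthy" if all_up else "unhealthy", "checks": results}
    return body, 200 if all_up else 503
```

Inside the `/health` view from the example above, this could be used as `body, code = build_health_report(checks)` followed by `return jsonify(body), code`. Keep each check fast, since the endpoint is polled frequently.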

Alternatives

Circuit Breaker

A circuit breaker stops requests to a failing service after detecting repeated failures, whereas a health check proactively monitors service status.

Use when: you want to prevent cascading failures by cutting off calls after repeated failures, especially for transient faults.
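For contrast with the polling approach, here is a minimal circuit breaker sketch. It reacts to failures of real calls instead of probing an endpoint. The class name, the default threshold of 3 failures, and the 30-second reset timeout are assumptions chosen for illustration.

```python
import time
from typing import Any, Callable, Optional


class CircuitOpenError(Exception):
    """Raised when a call is skipped because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._failure_threshold = failure_threshold
        self._reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None  # None means the circuit is closed

    def call(self, func: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._reset_timeout:
                raise CircuitOpenError("circuit is open; call skipped")
            # Timeout elapsed: let one trial call through (half-open state).
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._failures >= self._failure_threshold:
                self._opened_at = self._clock()  # open (or re-open) the circuit
            raise
        # Success closes the circuit and resets the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

The two patterns are complementary: health checks keep traffic away from instances known to be down, while a circuit breaker protects callers in the window before a failure has been detected.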

Service Mesh

A service mesh provides built-in health checking and traffic management at the network layer, moving health checks out of application code.

Use when: you need advanced traffic control and observability across many microservices.

Summary

The health check pattern prevents requests from being sent to failed or unresponsive microservices by verifying their status regularly.

It works by exposing a simple endpoint that external systems poll to determine service health and reroute traffic accordingly.

This pattern improves system reliability and availability, especially in large-scale microservice architectures.

Practice

1. What is the main purpose of the health check pattern in microservices?

easy

A. To regularly verify if a service is running and responsive

B. To increase the size of the service database

C. To encrypt communication between services

D. To deploy new versions of the service automatically
