Health check pattern in Microservices - System Design Exercise

Design: Microservices Health Check System

This design covers the implementation of the health check pattern in a microservices environment: health endpoints, aggregation, and monitoring integration. Detailed alerting rules and remediation automation are out of scope.

Functional Requirements

FR1: Each microservice must expose a health check endpoint.

FR2: Health checks should verify service dependencies like databases and external APIs.

FR3: The system should aggregate health status of all microservices.

FR4: Health status must be accessible for monitoring tools and alerting systems.

FR5: Health checks should be lightweight and fast to avoid overhead.

FR6: Support both readiness and liveness probes for container orchestration.
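FR2 and FR6 together imply that liveness and readiness answer different questions. The sketch below is one way to separate them, assuming dependency checks are plain zero-argument callables; the `HealthChecker` class and the `UP`/`DOWN` labels are illustrative names, not part of any framework.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass
class HealthChecker:
    """Illustrative holder for dependency checks (hypothetical name)."""

    # Maps a dependency name (e.g. "database") to a zero-argument check
    # that returns True when the dependency is reachable.
    dependency_checks: Dict[str, Callable[[], bool]] = field(default_factory=dict)

    def liveness(self) -> Tuple[int, dict]:
        # Liveness only answers "is the process running and able to respond?".
        # It ignores dependencies so that a database outage does not make
        # the orchestrator restart an otherwise working process.
        return 200, {"status": "UP"}

    def readiness(self) -> Tuple[int, dict]:
        # Readiness answers "can this instance serve traffic right now?",
        # so it runs every registered dependency check.
        results = {}
        for name, check in self.dependency_checks.items():
            try:
                results[name] = "UP" if check() else "DOWN"
            except Exception:
                # A check that raises is treated as a failed dependency.
                results[name] = "DOWN"
        ready = all(state == "UP" for state in results.values())
        body = {"status": "UP" if ready else "DOWN", "dependencies": results}
        return (200 if ready else 503), body
```

With this split, a failing dependency takes the instance out of rotation (readiness returns 503) without triggering a restart (liveness still returns 200).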

Non-Functional Requirements

NFR1: Handle up to 100 microservices in the system.

NFR2: Health check response time should be under 200ms (p99).

NFR3: System availability target is 99.9% uptime.

NFR4: Health check endpoints must not cause side effects or heavy load.

NFR5: Support secure access to health endpoints to prevent unauthorized use.
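One way to keep a health endpoint inside the 200 ms budget of NFR2 is to give every dependency check its own deadline. The following is a minimal sketch under that assumption; `check_with_deadline`, the 150 ms default, and the `TIMEOUT` label are hypothetical choices, not values from the requirements.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
from typing import Callable

# A small shared pool so that health checks never spawn unbounded threads.
_executor = ThreadPoolExecutor(max_workers=4)


def check_with_deadline(check: Callable[[], bool], timeout_s: float = 0.15) -> str:
    """Run a dependency check, but stop waiting for it after timeout_s seconds."""
    future = _executor.submit(check)
    try:
        return "UP" if future.result(timeout=timeout_s) else "DOWN"
    except FutureTimeout:
        # The caller stops waiting here. The worker thread itself keeps
        # running until the check returns, so checks should also set their
        # own socket or query timeouts.
        return "TIMEOUT"
    except Exception:
        # The check raised, which counts as a failed dependency.
        return "DOWN"
```

Reporting a slow dependency as `TIMEOUT` rather than blocking keeps the endpoint itself fast, at the cost of occasionally marking a slow-but-working dependency as unhealthy.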

Think Before You Design

Key Components

Health check endpoints in each microservice

Health aggregator service or dashboard

Monitoring and alerting tools integration

Service registry or discovery for locating services

Authentication and authorization for health endpoints

Design Patterns

Health check pattern (liveness and readiness probes)

Circuit breaker pattern for dependency checks

Bulkhead pattern to isolate failures

Push vs pull model for health data aggregation
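To illustrate how the circuit breaker pattern can apply to dependency checks, here is a simplified sketch: after a number of consecutive failures the breaker "opens" and reports the dependency as down without calling it, then allows a single trial call once a cooldown has passed. The class name, thresholds, and the injectable `clock` parameter are assumptions made for the example.

```python
import time
from typing import Callable, Optional


class DependencyCircuitBreaker:
    """Wraps one dependency check with a simple circuit breaker (illustrative)."""

    def __init__(
        self,
        check: Callable[[], bool],
        failure_threshold: int = 3,
        cooldown_s: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._check = check
        self._failure_threshold = failure_threshold
        self._cooldown_s = cooldown_s
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def status(self) -> str:
        now = self._clock()
        if self._opened_at is not None and now - self._opened_at < self._cooldown_s:
            # Open state: skip the call so a struggling dependency
            # is not hit again by every health check.
            return "DOWN"
        # Closed state, or cooldown elapsed (half-open): make a real call.
        try:
            ok = bool(self._check())
        except Exception:
            ok = False
        if ok:
            self._failures = 0
            self._opened_at = None
            return "UP"
        self._failures += 1
        if self._failures >= self._failure_threshold:
            # Open (or re-open) the breaker and restart the cooldown.
            self._opened_at = now
        return "DOWN"
```

The injectable clock is only there to make the behavior easy to test; production code would normally keep the default `time.monotonic`.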

Reference Architecture

                   +-----------------------+
                   | Monitoring System     |
                   +-----------+-----------+
                               |
                               | Pull aggregated health data
                               v
                   +-----------------------+
                   | Health Aggregator     |
                   | (Service or Dashboard)|
                   +-----------+-----------+
                               |
               +---------------+---------------+
               |                               |
       +-------v-------+               +-------v-------+
       | Microservice 1|               | Microservice N|
       | +-----------+ |               | +-----------+ |
       | | Health    | |               | | Health    | |
       | | Endpoint  | |               | | Endpoint  | |
       | +-----------+ |               | +-----------+ |
       +---------------+               +---------------+
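Each "Health Endpoint" box in the diagram can be very small. As a rough sketch using only the Python standard library, the handler below wires three paths to check functions; `database_reachable` is a hypothetical placeholder, and the paths and port are assumptions rather than a prescribed layout.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def database_reachable() -> bool:
    # Hypothetical placeholder: a real check might run a cheap query.
    return True


# Each path maps to a function returning {dependency_name: is_healthy}.
CHECKS = {
    "/live": lambda: {},  # liveness: no dependency checks
    "/ready": lambda: {"database": database_reachable()},
    "/health": lambda: {"database": database_reachable()},
}


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        path = self.path.split("?", 1)[0]
        check = CHECKS.get(path)
        if check is None:
            self._send(404, {"error": "not found"})
            return
        deps = check()
        healthy = all(deps.values())
        body = {"status": "UP" if healthy else "DOWN"}
        if deps:
            body["dependencies"] = {k: "UP" if v else "DOWN" for k, v in deps.items()}
        # 503 tells orchestrators and load balancers to stop routing here.
        self._send(200 if healthy else 503, body)

    def _send(self, code: int, body: dict) -> None:
        payload = json.dumps(body).encode("utf-8")
        self.send_response(code)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, format, *args):
        pass  # keep frequent probe requests out of the access log


def make_server(port: int = 8080) -> HTTPServer:
    return HTTPServer(("127.0.0.1", port), HealthHandler)
```

Calling `make_server().serve_forever()` would start the endpoint locally; in a real service the handler would normally live inside the existing web framework rather than a separate server.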

Components

Microservice Health Endpoint

HTTP REST endpoint (e.g., /health, /ready, /live)

Expose health status of the microservice and its dependencies.

Health Aggregator

Custom service or monitoring dashboard

Collect and aggregate health data from all microservices.

Monitoring System

Prometheus, Grafana, or similar

Visualize health status and trigger alerts on failures.

Service Registry

Consul, Eureka, or Kubernetes API

Discover microservice instances to query health endpoints.

Security Layer

API Gateway, OAuth2, mTLS

Protect health endpoints from unauthorized access.
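The security layer can take many forms (gateway rules, OAuth2, mTLS). As a deliberately simplified stand-in, the sketch below checks a shared bearer token in constant time; the function name and the idea of a single static token are assumptions for illustration, not a recommendation over the options listed above.

```python
import hmac
from typing import Optional


def is_authorized(authorization_header: Optional[str], expected_token: str) -> bool:
    """Return True only for an 'Authorization: Bearer <token>' header that matches."""
    if not authorization_header:
        return False
    scheme, _, token = authorization_header.partition(" ")
    if scheme.lower() != "bearer" or not token:
        return False
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(token.encode("utf-8"), expected_token.encode("utf-8"))
```

A health handler could call this before running any checks and answer 401 when it returns False, so unauthenticated callers learn nothing about dependency state.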

Request Flow

1. Each microservice exposes /health, /ready, and /live endpoints.

2. Health endpoint checks internal status and dependencies (database, external APIs).

3. Health Aggregator queries all microservices' health endpoints periodically.

4. Aggregator compiles overall system health and exposes summary endpoint.

5. Monitoring system pulls aggregated health data for visualization and alerting.

6. Alerts are triggered if any microservice reports unhealthy status.

7. Security layer ensures only authorized systems can access health endpoints.
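Steps 3 and 4 of the flow above could be sketched as follows, assuming each service returns JSON with a "status" field. The function names, the 200 ms timeout, and the `DEGRADED` and `UNKNOWN` labels are illustrative assumptions; the fetch function is injectable so the polling logic can be exercised without a network.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def fetch_status(url: str, timeout_s: float = 0.2) -> str:
    """Return the "status" field of a health response, or "DOWN" on any error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return str(json.load(resp).get("status", "UNKNOWN"))
    except Exception:
        # Timeouts, connection errors, non-2xx responses, and malformed
        # bodies are all reported as DOWN.
        return "DOWN"


def poll_all(
    services: Dict[str, str],
    fetch: Callable[[str], str] = fetch_status,
    max_workers: int = 16,
) -> Dict[str, str]:
    """Query every service's health URL concurrently (step 3)."""
    if not services:
        return {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(fetch, url) for name, url in services.items()}
        return {name: future.result() for name, future in futures.items()}


def summarize(statuses: Dict[str, str]) -> dict:
    """Compile per-service statuses into one system-level summary (step 4)."""
    unhealthy = sorted(name for name, status in statuses.items() if status != "UP")
    if not statuses:
        overall = "UNKNOWN"
    elif not unhealthy:
        overall = "UP"
    elif len(unhealthy) == len(statuses):
        overall = "DOWN"
    else:
        overall = "DEGRADED"
    return {"status": overall, "total": len(statuses), "unhealthy": unhealthy}
```

Polling concurrently matters at the 100-service scale: sequential requests with a 200 ms timeout could take up to 20 seconds per round in the worst case.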

Database Schema

No persistent database required specifically for health checks. Health status is ephemeral and queried live. If storing history, a simple schema with tables: Services(service_id, name, last_checked), HealthStatus(service_id, timestamp, status, details) can be used.
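If history is stored, the two tables named above could look like the following SQLite sketch. The column types, constraints, index, and helper function names are assumptions added for the example; only the table and column names come from the description.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS Services (
    service_id   INTEGER PRIMARY KEY,
    name         TEXT NOT NULL UNIQUE,
    last_checked TEXT
);
CREATE TABLE IF NOT EXISTS HealthStatus (
    service_id INTEGER NOT NULL REFERENCES Services(service_id),
    timestamp  TEXT NOT NULL,
    status     TEXT NOT NULL,
    details    TEXT
);
CREATE INDEX IF NOT EXISTS idx_health_service_time
    ON HealthStatus(service_id, timestamp);
"""


def open_history_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def record_status(conn: sqlite3.Connection, name: str, status: str,
                  details: str, timestamp: str) -> None:
    """Append one health observation and update the service's last_checked."""
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT OR IGNORE INTO Services(name) VALUES (?)", (name,))
        (service_id,) = conn.execute(
            "SELECT service_id FROM Services WHERE name = ?", (name,)
        ).fetchone()
        conn.execute(
            "UPDATE Services SET last_checked = ? WHERE service_id = ?",
            (timestamp, service_id),
        )
        conn.execute(
            "INSERT INTO HealthStatus(service_id, timestamp, status, details) "
            "VALUES (?, ?, ?, ?)",
            (service_id, timestamp, status, details),
        )
```

Timestamps are stored as ISO 8601 text here so that lexical ordering matches chronological ordering; a production store would likely also need a retention policy for old rows.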

Scaling Discussion

Bottlenecks

Health Aggregator becomes a bottleneck when querying many services frequently.

Network overhead from frequent health checks impacts service performance.

Security layer adds latency if not optimized.

Monitoring system can be overwhelmed by too many health metrics.

Solutions

Implement caching and rate limiting in Health Aggregator to reduce load.

Use asynchronous or event-driven health reporting (push model) to reduce polling.

Reduce security overhead with lightweight tokens, or with mutual TLS over reused (keep-alive) connections so that handshakes are not repeated on every check.

Aggregate metrics at microservice level before sending to monitoring to reduce volume.
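The caching idea from the first solution can be sketched as a small time-to-live cache in front of whatever function the aggregator uses to query a service. The class name, the 5-second default, and the injectable `clock` are assumptions for illustration.

```python
import time
from typing import Callable, Dict, Tuple


class CachedStatus:
    """Caches each service's last status for ttl_s seconds (illustrative)."""

    def __init__(
        self,
        fetch: Callable[[str], str],
        ttl_s: float = 5.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl_s = ttl_s
        self._clock = clock
        # Maps a health URL to (time fetched, status).
        self._cache: Dict[str, Tuple[float, str]] = {}

    def get(self, url: str) -> str:
        now = self._clock()
        cached = self._cache.get(url)
        if cached is not None and now - cached[0] < self._ttl_s:
            # Fresh enough: answer from memory without touching the service.
            return cached[1]
        status = self._fetch(url)
        self._cache[url] = (now, status)
        return status
```

With this in place, many dashboard or monitoring requests within the same TTL window result in at most one call per service, at the cost of status being up to `ttl_s` seconds stale.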

Interview Tips

Time: Spend 10 minutes understanding requirements and clarifying scope, 20 minutes designing components and data flow, 10 minutes discussing scaling and trade-offs, 5 minutes summarizing.

Explain difference between liveness and readiness probes.

Discuss importance of checking dependencies in health endpoints.

Describe how aggregation helps centralized monitoring.

Highlight security considerations for health endpoints.

Address scaling challenges and mitigation strategies.

Practice

1. What is the main purpose of the health check pattern in microservices?

easy

A. To regularly verify if a service is running and responsive

B. To increase the size of the service database

C. To encrypt communication between services

D. To deploy new versions of the service automatically

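Solution

Step 1: Understand the health check pattern purpose. A health check lets orchestrators and monitoring tools verify that a service is running and responsive.

Step 2: Identify the correct purpose among options. Options B, C, and D describe database size, encryption, and deployment, none of which is the job of a health check.

Final Answer: A. To regularly verify if a service is running and responsive
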