Liveness and readiness probes in Microservices - System Design Exercise

Design: Microservices Health Check System with Liveness and Readiness Probes

This design focuses on the health check mechanism that uses liveness and readiness probes for microservices running in container orchestration environments. Detailed orchestration logic and deployment pipelines are out of scope.

Functional Requirements

FR1: Detect if a microservice instance is alive and responsive (liveness probe).

FR2: Detect if a microservice instance is ready to serve traffic (readiness probe).

FR3: Automatically restart instances that fail liveness probes and stop routing traffic to instances that fail readiness probes.

FR4: Support configurable probe endpoints and intervals.

FR5: Integrate with container orchestration platforms like Kubernetes.

FR6: Minimize false positives to avoid unnecessary restarts and unnecessary removal of healthy instances from traffic routing.

Non-Functional Requirements

NFR1: System must handle at least 1000 microservice instances concurrently.

NFR2: Probe response time should be under 100ms to avoid delays in orchestration decisions.

NFR3: Availability target: 99.9% uptime for the health check system itself.

NFR4: Probes must not add significant load to microservices.

Key Components

Probe endpoints inside microservices (HTTP /healthz, /ready)

Container orchestration health check integration (e.g., Kubernetes probes)

Health check controller or manager

Logging and alerting for probe failures

Configuration management for probe parameters

Design Patterns

Health check pattern

Circuit breaker pattern for readiness

Retry and backoff strategies for transient failures

Sidecar pattern for external health monitoring

Graceful shutdown and startup hooks
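The startup and graceful-shutdown hooks can be sketched as a small lifecycle object. On SIGTERM (which Kubernetes sends before it eventually kills a container), the service first reports not-ready so that anything polling /ready stops sending new requests, while liveness keeps passing so the container is not restarted mid-drain. The `Lifecycle` class, its method names, and the drain delay are hypothetical choices for illustration, not a library API.

```python
import signal
import threading
import time


class Lifecycle:
    """Tracks readiness across startup and shutdown (illustrative sketch)."""

    def __init__(self, drain_seconds=5.0):
        self.ready = threading.Event()      # read by a /ready handler
        self.stopping = threading.Event()   # set once SIGTERM arrives
        self.drain_seconds = drain_seconds  # hypothetical drain window

    def startup_complete(self):
        # Startup hook: advertise readiness only after initialization finishes.
        self.ready.set()

    def handle_sigterm(self, signum, frame):
        # Shutdown hook: fail readiness first so no new traffic arrives.
        # Liveness is untouched, so the orchestrator does not restart us.
        self.ready.clear()
        self.stopping.set()

    def install(self):
        # signal.signal must be called from the main thread.
        signal.signal(signal.SIGTERM, self.handle_sigterm)

    def wait_and_drain(self):
        """Block until SIGTERM, then give in-flight requests time to finish."""
        self.stopping.wait()
        time.sleep(self.drain_seconds)
```

A typical flow would call `install()` at process start, `startup_complete()` once dependencies are connected, and `wait_and_drain()` as the last step before exiting.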

Reference Architecture

                    +--------------------------+
                    |  Container Orchestrator  |
                    |  (e.g., Kubernetes)      |
                    +------------+-------------+
                                 |  periodic HTTP probes:
                                 |  GET /healthz  (liveness)
                                 |  GET /ready    (readiness)
                                 |
                +----------------+----------------+
                |                                 |
        +-------v--------+                +-------v--------+
        | Microservice 1 |                | Microservice 2 |
        | +------------+ |                | +------------+ |
        | | /healthz   | |                | | /healthz   | |
        | | /ready     | |                | | /ready     | |
        | +------------+ |                | +------------+ |
        +----------------+                +----------------+

Legend:
- The orchestrator calls /healthz to check whether the service is alive.
- The orchestrator calls /ready to check whether the service is ready to receive traffic.
- Based on probe results, the orchestrator restarts the instance or adjusts traffic routing.
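A minimal sketch of the two endpoints each microservice exposes, using only Python's standard-library `http.server`. The `/healthz` handler reports only that the process can still serve HTTP, while `/ready` returns 503 until a hypothetical `ready` flag is set (for example, after caches warm up). The flag, the helper names, and the choice of 503 for "not ready" are assumptions for illustration.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical process-wide flag: set once startup work (config load,
# cache warm-up, dependency connections) has finished.
ready = threading.Event()


class ProbeHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/healthz":
            # Liveness: if this handler runs at all, the process is alive.
            # Keep it cheap; do not call downstream dependencies here.
            self._reply(200, b"ok")
        elif self.path == "/ready":
            # Readiness: accept traffic only once startup has completed.
            if ready.is_set():
                self._reply(200, b"ready")
            else:
                self._reply(503, b"not ready")
        else:
            self._reply(404, b"not found")

    def _reply(self, status, body):
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Silence per-request access logs; probes arrive every few seconds.
        pass


def start_probe_server(port=8080):
    """Serve probes on a background thread so they never block app work."""
    server = ThreadingHTTPServer(("0.0.0.0", port), ProbeHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

The service would call `start_probe_server()` early in startup and `ready.set()` once initialization finishes, so liveness passes immediately while readiness holds traffic back.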

Components

Microservice Probe Endpoints

Technology: HTTP endpoints

Responsibility: Expose /healthz for liveness checks and /ready for readiness checks.

Container Orchestrator Health Checks

Technology: Kubernetes liveness and readiness probes

Responsibility: Periodically call the probe endpoints to monitor service health and readiness.

Health Check Controller

Technology: Orchestrator-internal component (in Kubernetes, the kubelet on each node runs the probes and restarts failing containers)

Responsibility: Act on probe results: restart containers that fail liveness probes and update service routing based on readiness.

Configuration Management

Technology: Config files or environment variables

Responsibility: Set probe intervals, timeouts, and failure thresholds.

Logging and Alerting

Technology: Centralized logging system (e.g., ELK stack)

Responsibility: Record probe failures and notify operators.
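The configuration management component can be sketched as a validated settings object loaded from environment variables. The field names and defaults mirror the Kubernetes probe fields (`initialDelaySeconds`, `periodSeconds`, `timeoutSeconds`, `successThreshold`, `failureThreshold`); the `PROBE_*` variable names and the `ProbeConfig` class are hypothetical.

```python
from dataclasses import dataclass, fields
from typing import Mapping


@dataclass(frozen=True)
class ProbeConfig:
    """Probe timing parameters; defaults mirror the Kubernetes probe defaults."""
    initial_delay_seconds: int = 0   # wait before the first probe
    period_seconds: int = 10         # interval between probes
    timeout_seconds: int = 1         # per-probe response deadline
    success_threshold: int = 1       # consecutive passes to count as healthy
    failure_threshold: int = 3       # consecutive failures to count as unhealthy

    def __post_init__(self):
        if self.initial_delay_seconds < 0:
            raise ValueError("initial_delay_seconds must be >= 0")
        for name in ("period_seconds", "timeout_seconds",
                     "success_threshold", "failure_threshold"):
            if getattr(self, name) < 1:
                raise ValueError(f"{name} must be >= 1")

    @classmethod
    def from_env(cls, env: Mapping[str, str], prefix: str = "PROBE_") -> "ProbeConfig":
        """Read hypothetical variables such as PROBE_PERIOD_SECONDS; unset ones keep defaults."""
        overrides = {}
        for f in fields(cls):
            key = prefix + f.name.upper()
            if key in env:
                overrides[f.name] = int(env[key])
        return cls(**overrides)
```

In a service this would be called as `ProbeConfig.from_env(os.environ)`; taking the mapping as a parameter rather than reading `os.environ` directly keeps the loader easy to test.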

Request Flow

1. The container orchestrator sends an HTTP GET request to the /healthz endpoint of a microservice instance.

2. The microservice responds with 200 OK if it is alive; otherwise it returns an error status or does not respond within the probe timeout. (Kubernetes treats any status from 200 to 399 as success.)

3. If the liveness probe fails a configured number of consecutive times, the orchestrator marks the instance as unhealthy and restarts it.

4. The orchestrator sends an HTTP GET request to the /ready endpoint to check whether the instance is ready to serve traffic.

5. The microservice responds with 200 OK if it is ready; otherwise it returns an error status (for example, 503) or does not respond in time.

6. The orchestrator routes traffic only to instances that pass their readiness probes; an instance that fails readiness is removed from routing but not restarted.

7. Configuration parameters control probe frequency, timeout, and failure thresholds.

8. Probe failures generate logs and alerts for monitoring.
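The decision logic in steps 3 to 6 can be sketched as a per-probe counter of consecutive results. It follows the documented Kubernetes meaning of `failureThreshold` and `successThreshold` and the fact that readiness starts out failing until the first successful probe, but `ProbeTracker` and its action strings are hypothetical names, not an orchestrator API.

```python
class ProbeTracker:
    """Tracks consecutive results of one probe on one instance (illustrative)."""

    def __init__(self, kind, failure_threshold=3, success_threshold=1):
        if kind not in ("liveness", "readiness"):
            raise ValueError("kind must be 'liveness' or 'readiness'")
        self.kind = kind
        self.failure_threshold = failure_threshold
        self.success_threshold = success_threshold
        # Liveness starts as passing; readiness starts as failing so that
        # no traffic is routed before the first successful readiness probe.
        self.passing = kind == "liveness"
        self.consecutive_failures = 0
        self.consecutive_successes = 0

    def record(self, ok):
        """Record one probe result and return the action the orchestrator takes."""
        if ok:
            self.consecutive_successes += 1
            self.consecutive_failures = 0
            if not self.passing and self.consecutive_successes >= self.success_threshold:
                self.passing = True
                return "route_traffic"
            return "none"

        self.consecutive_failures += 1
        self.consecutive_successes = 0
        if self.consecutive_failures < self.failure_threshold:
            return "none"  # tolerate transient failures below the threshold
        if self.kind == "liveness":
            # A restart yields a fresh container, so counting starts over.
            self.consecutive_failures = 0
            return "restart"
        if self.passing:
            self.passing = False
            return "stop_routing"
        return "none"  # already out of rotation
```

Feeding a liveness tracker the results pass, fail, fail, pass, fail, fail, fail produces a single `"restart"` on the last result: the success in the middle resets the counter, which is how thresholds absorb transient failures.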

Database Schema

Not applicable: probes are stateless HTTP endpoints inside the microservices, and health state is held by the orchestrator, either in memory or in its internal state store.

Scaling Discussion

Bottlenecks

High number of probe requests causing load on microservices.

Delayed detection due to long probe intervals.

False positives from transient network issues.

Orchestrator overwhelmed by managing many probe results.

Probe endpoints causing resource contention inside microservices.
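The first two bottlenecks pull in opposite directions, and a back-of-the-envelope calculation makes the trade-off concrete. This sketch assumes two HTTP probes (liveness and readiness) per instance and uses the Kubernetes defaults of a 10-second period, 1-second timeout, and failure threshold of 3; the detection formula is an approximation that ignores scheduling jitter.

```python
def probe_requests_per_second(instances, probes_per_instance, period_seconds):
    """Aggregate probe traffic across the fleet, in requests per second."""
    return instances * probes_per_instance / period_seconds


def worst_case_detection_seconds(period_seconds, failure_threshold, timeout_seconds):
    """Approximate upper bound from failure onset to the orchestrator acting.

    The first failing probe can start up to one period after the failure
    begins; the threshold-th failing probe starts (failure_threshold - 1)
    periods later and may take the full timeout to fail.
    """
    return failure_threshold * period_seconds + timeout_seconds


# NFR1 scale: 1000 instances, liveness + readiness, 10 s period.
print(probe_requests_per_second(1000, 2, 10))   # 200.0
print(worst_case_detection_seconds(10, 3, 1))   # 31
```

At NFR1 scale this is about 200 probe requests per second fleet-wide but only 0.2 per second per instance, so the per-request cost inside each service matters more than the total volume. Halving the period to 5 seconds cuts worst-case detection to about 16 seconds while doubling probe traffic to 400 requests per second.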

Solutions

Use lightweight probe endpoints that perform minimal checks to reduce load.

Tune probe intervals and failure thresholds to balance detection speed and stability.

Implement retries and backoff in the orchestrator before marking an instance as failed.

Distribute health check load across orchestrator components or use sidecar proxies.

Isolate probe handling in microservices with dedicated threads or lightweight handlers.
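One way to keep a readiness endpoint lightweight (NFR2, NFR4) is to cache the result of an expensive dependency check and refresh it at most once per TTL, so frequent probes never multiply downstream load. `CachedCheck`, the 5-second TTL, and the injectable `clock` are hypothetical choices for this sketch.

```python
import threading
import time
from typing import Callable


class CachedCheck:
    """Runs an expensive check at most once per ttl_seconds and caches the result."""

    def __init__(self, check: Callable[[], bool], ttl_seconds: float = 5.0,
                 clock: Callable[[], float] = time.monotonic):
        self._check = check
        self._ttl = ttl_seconds
        self._clock = clock            # injectable for tests
        self._lock = threading.Lock()  # probes may arrive on several threads
        self._result = False
        self._checked_at = None

    def __call__(self) -> bool:
        with self._lock:
            now = self._clock()
            if self._checked_at is None or now - self._checked_at >= self._ttl:
                try:
                    self._result = bool(self._check())
                except Exception:
                    # A crashing dependency check counts as "not ready".
                    self._result = False
                self._checked_at = now
            return self._result
```

Because the lock is held while the underlying check runs, a slow dependency can still delay a probe; refreshing the cached value from a background thread would keep the probe path constant-time at the cost of slightly staler results.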

Interview Tips

Time: Spend 10 minutes understanding probe concepts and requirements, 15 minutes designing the probe endpoints and orchestration integration, 10 minutes discussing scaling and failure handling, and 10 minutes for Q&A.

Explain the difference between liveness and readiness probes clearly.

Discuss how probes help maintain system reliability and availability.

Describe how probe failures trigger orchestrator actions like restarts or traffic routing.

Mention configuration flexibility and tuning for different workloads.

Highlight strategies to avoid false positives and minimize probe overhead.

Discuss scaling challenges and solutions for large microservice deployments.

Practice

1. What is the main purpose of a liveness probe in microservices?

easy

A. To check if the service is ready to accept traffic

B. To log user requests for debugging

C. To monitor the network latency between services

D. To check if the service is alive and restart it if it is not


Solution

Step 1: Understand the role of liveness probes

Step 2: Differentiate from readiness probes

Final Answer: D. To check if the service is alive and restart it if it is not

Quick Check:

Solution

Step 1: Identify readiness probe syntax

Step 2: Confirm correct fields and indentation

Final Answer:

Quick Check:

Solution

Step 1: Understand readiness probe failure effect

Step 2: Differentiate from liveness probe effect

Final Answer:

Quick Check:

Solution

Step 1: Identify cause of restarts

Step 2: Adjust probe timing to avoid false failures

Final Answer:

Quick Check:

Solution

Step 1: Prevent unnecessary restarts during initialization

Step 2: Use readiness probe to block traffic until ready

Final Answer:

Quick Check: