
Health check endpoints in HLD - System Design Exercise

Design: Health Check Endpoints
Design health check endpoints for a microservice or web service. Out of scope: full monitoring system, alerting, or remediation automation.
Functional Requirements
FR1: Provide a simple HTTP endpoint to report the health status of the service
FR2: Support basic liveness check to confirm the service is running
FR3: Support readiness check to confirm the service is ready to handle requests
FR4: Include dependency checks such as database connectivity and external API availability
FR5: Return clear status codes and messages for monitoring systems
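FR6: Allow configurable thresholds for health criteria (for example, per-dependency timeouts or the number of consecutive failures tolerated before a check reports unhealthy)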
FR7: Ensure minimal performance impact on the main service
FR8: Support integration with container orchestration systems like Kubernetes
Non-Functional Requirements
NFR1: Must respond within 100ms under normal load
NFR2: Availability target of 99.9% uptime for health endpoints
NFR3: Handle up to 1000 health check requests per second
NFR4: Endpoints must be secure and not expose sensitive information
NFR5: Health checks should not cause side effects or modify data
Think Before You Design
Key Components
HTTP server to expose health endpoints
Dependency check modules (database, cache, external APIs)
Configuration management for thresholds and checks
Logging and metrics for health check calls
Security layer for endpoint access control
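One possible shape for the security layer is a source-address check in front of the health endpoints. The sketch below is only an illustration: the function name `is_allowed` and the CIDR ranges in the usage note are assumptions, and a real deployment might rely on an API gateway or network policy instead.

```python
import ipaddress
from typing import Iterable


def is_allowed(client_ip: str, allowed_cidrs: Iterable[str]) -> bool:
    """Return True if client_ip falls inside any of the allowed CIDR ranges."""
    try:
        address = ipaddress.ip_address(client_ip)
    except ValueError:
        # Malformed addresses are rejected rather than raising to the caller.
        return False
    # Membership is False (not an error) when IPv4 and IPv6 are mixed.
    return any(address in ipaddress.ip_network(cidr) for cidr in allowed_cidrs)
```

For example, `is_allowed("10.1.2.3", ["10.0.0.0/8"])` accepts a probe coming from a private cluster range, while a request from any address outside the listed ranges is refused before a health check runs.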
Design Patterns
Circuit breaker pattern for dependency health
Timeout and retry strategies for dependency checks
Caching health check results to reduce load
Separation of liveness and readiness endpoints
Use of standard health check response formats (e.g., JSON)
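To make the circuit breaker pattern from the list above concrete, here is a minimal sketch applied to a single dependency probe. The class name, the default threshold of 3 failures, and the 10 second cool-down are all assumptions chosen for illustration, not values prescribed by the exercise.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Wraps a dependency probe and skips it for a while after repeated failures."""

    def __init__(self, probe: Callable[[], bool], failure_threshold: int = 3,
                 reset_after_s: float = 10.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._probe = probe
        self._failure_threshold = failure_threshold
        self._reset_after_s = reset_after_s
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def is_healthy(self) -> bool:
        # While the circuit is open, report unhealthy without calling the probe.
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._reset_after_s:
                return False
            # Cool-down elapsed: allow one trial call (half-open state).
            self._opened_at = None
            self._failures = self._failure_threshold - 1
        try:
            ok = self._probe()
        except Exception:
            ok = False
        if ok:
            self._failures = 0
            return True
        self._failures += 1
        if self._failures >= self._failure_threshold:
            # Too many consecutive failures: open the circuit.
            self._opened_at = self._clock()
        return False
```

The point of the design is that a dependency that is already down stops receiving probe traffic during the cool-down, so health checks do not add load to a struggling database or API.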
Reference Architecture
  +-------------------+       +----------------------+
  |                   |       |                      |
  |  Client/Monitor   | <---> |  Health Check API    |
  |                   |       |  (Liveness & Ready)  |
  +-------------------+       +----------+-----------+
                                         |
                                         v
                          +------------------------------+
                          | Dependency Check Modules     |
                          | - Database connectivity      |
                          | - External API availability  |
                          | - Cache health               |
                          +------------------------------+
                                         |
                                         v
                          +------------------------------+
                          | Configuration & Thresholds   |
                          +------------------------------+
Components
Health Check API (HTTP REST endpoint): exposes the /health/liveness and /health/readiness endpoints
Dependency Check Modules (custom code or libraries): check the status of the database, cache, and external services
Configuration Manager (config files or environment variables): manages thresholds and enables or disables checks
Security Layer (API gateway or middleware): controls access to the health endpoints
Logging and Metrics (logging framework and metrics system): record health check calls and results for monitoring
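As a rough illustration of the Configuration Manager, the sketch below reads per-check settings from environment variables. The variable names (`HEALTH_<NAME>_ENABLED`, `HEALTH_<NAME>_TIMEOUT_S`), the `CheckConfig` type, and the 0.5 second default timeout are hypothetical and exist only for this example.

```python
import os
from dataclasses import dataclass
from typing import Callable, Dict, Mapping


@dataclass(frozen=True)
class CheckConfig:
    enabled: bool = True
    timeout_s: float = 0.5


def load_check_config(name: str, env: Mapping[str, str] = os.environ) -> CheckConfig:
    """Read HEALTH_<NAME>_ENABLED and HEALTH_<NAME>_TIMEOUT_S (hypothetical names)."""
    prefix = f"HEALTH_{name.upper()}_"
    enabled = env.get(prefix + "ENABLED", "true").strip().lower() in ("1", "true", "yes")
    timeout_s = float(env.get(prefix + "TIMEOUT_S", "0.5"))
    return CheckConfig(enabled=enabled, timeout_s=timeout_s)


def enabled_checks(checks: Dict[str, Callable[[float], bool]],
                   env: Mapping[str, str] = os.environ) -> Dict[str, CheckConfig]:
    """Return the configuration of every registered check that is switched on."""
    configs = {name: load_check_config(name, env) for name in checks}
    return {name: cfg for name, cfg in configs.items() if cfg.enabled}
```

Passing the environment as a parameter keeps the loader easy to test and makes it straightforward to swap in a config file or a reloadable source later.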
Request Flow
1. Client or monitoring system sends HTTP GET request to /health/liveness or /health/readiness
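2. Health Check API receives the request. A liveness request is answered immediately without touching dependencies, so that a failing dependency does not cause the orchestrator to restart an otherwise working process; a readiness request triggers the dependency checks (steps 3 to 5)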
3. Dependency Check Modules verify connectivity and status of each dependency
4. Configuration Manager provides thresholds and check settings
5. Health Check API aggregates results and determines overall health status
6. API returns HTTP 200 OK with JSON payload if healthy, or HTTP 503 Service Unavailable if unhealthy
7. Logging and Metrics record the request and outcome for analysis
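The request flow above can be sketched with Python's standard library HTTP server. This is a minimal sketch under stated assumptions: the probe names, the `UP`/`DOWN` labels, the JSON shape, and port 8080 are illustrative choices, and the probes are stubs that always succeed. Logging (step 7) is left out for brevity.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Callable, Dict, Tuple

# Hypothetical dependency probes; real ones would ping the database, cache, etc.
READINESS_CHECKS: Dict[str, Callable[[], bool]] = {
    "database": lambda: True,
    "external_api": lambda: True,
}


def evaluate(path: str, checks: Dict[str, Callable[[], bool]]) -> Tuple[int, dict]:
    """Map a request path to an HTTP status code and a JSON-serializable body."""
    if path == "/health/liveness":
        # Liveness only confirms the process can answer; no dependencies are touched.
        return 200, {"status": "UP"}
    if path == "/health/readiness":
        results = {}
        for name, check in checks.items():
            try:
                results[name] = "UP" if check() else "DOWN"
            except Exception:
                # Report a failing probe without leaking exception details.
                results[name] = "DOWN"
        healthy = all(state == "UP" for state in results.values())
        body = {"status": "UP" if healthy else "DOWN", "checks": results}
        return (200 if healthy else 503), body
    return 404, {"status": "UNKNOWN", "error": "not found"}


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        status, body = evaluate(self.path, READINESS_CHECKS)
        payload = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port: int = 8080) -> None:
    """Blocks forever; call this from the service's entry point."""
    ThreadingHTTPServer(("0.0.0.0", port), HealthHandler).serve_forever()
```

Keeping the decision logic in `evaluate` separate from the HTTP handler means the status mapping can be unit tested without opening a socket.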
Database Schema
Not applicable as health check endpoints typically do not require persistent storage.
Scaling Discussion
Bottlenecks
High frequency of health check requests causing CPU or network load
Dependency checks causing latency if they are slow or blocking
Health check endpoints exposing sensitive information if not secured
Configuration changes requiring service restarts if not dynamic
Solutions
Cache health check results for short intervals to reduce repeated checks
Use asynchronous or non-blocking calls for dependency checks with timeouts
Implement access control or IP allowlisting for health endpoints
Support dynamic configuration reload without restarting the service
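The first two solutions (short-lived caching and non-blocking checks with timeouts) can be combined, as in the sketch below. The class name `CachedHealth`, the 0.5 second timeout, and the 2 second TTL are assumptions for illustration. The sketch has no lock, so two requests arriving at the moment the cache expires may both refresh it.

```python
import asyncio
import time
from typing import Awaitable, Callable, Dict, Optional, Tuple

AsyncCheck = Callable[[], Awaitable[bool]]


async def run_check(check: AsyncCheck, timeout_s: float) -> bool:
    """Run one probe; a timeout or an exception counts as unhealthy."""
    try:
        return bool(await asyncio.wait_for(check(), timeout=timeout_s))
    except Exception:
        return False


class CachedHealth:
    """Runs all probes concurrently and reuses the result for ttl_s seconds."""

    def __init__(self, checks: Dict[str, AsyncCheck], timeout_s: float = 0.5,
                 ttl_s: float = 2.0) -> None:
        self._checks = checks
        self._timeout_s = timeout_s
        self._ttl_s = ttl_s
        self._cached: Optional[Tuple[float, Dict[str, bool]]] = None

    async def results(self) -> Dict[str, bool]:
        now = time.monotonic()
        # Serve the cached snapshot while it is still fresh.
        if self._cached is not None and now - self._cached[0] < self._ttl_s:
            return self._cached[1]
        names = list(self._checks)
        # All probes run concurrently, each bounded by the same timeout.
        outcomes = await asyncio.gather(
            *(run_check(self._checks[name], self._timeout_s) for name in names)
        )
        fresh = dict(zip(names, outcomes))
        self._cached = (now, fresh)
        return fresh
```

Because the probes run concurrently, the worst-case latency of a refresh is roughly one timeout rather than the sum of all dependency latencies, and the TTL caps how often dependencies are probed regardless of how many monitors poll the endpoint.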
Interview Tips
Time: Plan for about 45 minutes in total: 10 minutes understanding requirements and clarifying scope, 20 minutes designing components and data flow, 10 minutes discussing scaling and security considerations, and 5 minutes summarizing.
Explain the difference between liveness and readiness checks
Discuss the importance of dependency checks and thresholds
Highlight performance and security considerations
Describe how health checks integrate with orchestration tools
Mention caching and asynchronous checks to improve scalability