
Health check pattern in Microservices - System Design Exercise

Design: Microservices Health Check System
The design focuses on implementing the health check pattern in a microservices environment, including health endpoints, aggregation, and monitoring integration. Detailed alerting rules and remediation automation are out of scope.
Functional Requirements
FR1: Each microservice must expose a health check endpoint.
FR2: Health checks should verify service dependencies like databases and external APIs.
FR3: The system should aggregate health status of all microservices.
FR4: Health status must be accessible to monitoring tools and alerting systems.
FR5: Health checks should be lightweight and fast to avoid overhead.
FR6: Support both readiness and liveness probes for container orchestration.
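The distinction behind FR2 and FR6 can be sketched as two small functions. This is a minimal illustration, not a prescribed implementation: the `HealthReport` type, the "UP"/"DOWN" strings, and the check names are assumptions introduced here.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class HealthReport:
    status: str                                  # "UP" or "DOWN" (assumed convention)
    checks: Dict[str, str] = field(default_factory=dict)


def liveness() -> HealthReport:
    # Liveness only answers "is the process running and able to respond?".
    # It ignores dependencies, so a database outage does not make the
    # orchestrator restart an otherwise healthy process.
    return HealthReport(status="UP")


def readiness(dependency_checks: Dict[str, Callable[[], bool]]) -> HealthReport:
    # Readiness answers "can this instance serve traffic right now?",
    # so it runs every dependency check (FR2).
    results: Dict[str, str] = {}
    for name, check in dependency_checks.items():
        try:
            results[name] = "UP" if check() else "DOWN"
        except Exception:
            # A check that raises is treated as a failed dependency.
            results[name] = "DOWN"
    overall = "UP" if all(v == "UP" for v in results.values()) else "DOWN"
    return HealthReport(status=overall, checks=results)
```

With this split, a failing dependency takes the instance out of rotation through the readiness probe, while the liveness probe keeps passing and no restart is triggered.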
Non-Functional Requirements
NFR1: Handle up to 100 microservices in the system.
NFR2: Health check response time should be under 200ms (p99).
NFR3: System availability target is 99.9% uptime.
NFR4: Health check endpoints must not cause side effects or heavy load.
NFR5: Support secure access to health endpoints to prevent unauthorized use.
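One way to respect NFR2 and NFR4 is to bound every dependency check with a timeout, so a slow dependency cannot stall the health endpoint. The sketch below assumes a thread-pool approach; the helper name `check_with_timeout` and the 150 ms default are illustrative choices, not values taken from the requirements.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout
from typing import Callable

# A small shared pool keeps health checks from spawning unbounded threads.
_executor = ThreadPoolExecutor(max_workers=4)


def check_with_timeout(check: Callable[[], bool], timeout_s: float = 0.15) -> bool:
    """Run a dependency check, reporting failure if it is slow or raises."""
    future = _executor.submit(check)
    try:
        return bool(future.result(timeout=timeout_s))
    except FutureTimeout:
        # The check is still running in the background; cancel() only helps
        # if it has not started yet, so checks should also set their own
        # socket or query timeouts.
        future.cancel()
        return False
    except Exception:
        # Any error inside the check counts as an unhealthy dependency.
        return False
```

The caller gets an answer within roughly `timeout_s` regardless of how long the dependency takes, which keeps the endpoint's own latency predictable.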
Think Before You Design
Key Components
Health check endpoints in each microservice
Health aggregator service or dashboard
Monitoring and alerting tools integration
Service registry or discovery for locating services
Authentication and authorization for health endpoints
Design Patterns
Health check pattern (liveness and readiness probes)
Circuit breaker pattern for dependency checks
Bulkhead pattern to isolate failures
Push vs pull model for health data aggregation
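As an illustration of the circuit breaker pattern applied to dependency checks, the following sketch stops calling a dependency after repeated failures and retries it after a cool-down. The class name, thresholds, and injectable `clock` parameter are assumptions made for this example.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_after_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def call(self, check: Callable[[], bool]) -> bool:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_after_s:
                # Open: fail fast without touching the dependency.
                return False
            # Cool-down elapsed (half-open): fall through and allow one trial.
        try:
            ok = bool(check())
        except Exception:
            ok = False
        if ok:
            # Success closes the breaker and clears the failure count.
            self._failures = 0
            self._opened_at = None
        else:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                # Open (or re-open) the breaker and restart the cool-down.
                self._opened_at = self._clock()
        return ok
```

While the breaker is open, the health endpoint reports the dependency as down immediately instead of waiting on a call that is likely to fail, which keeps the check lightweight.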
Reference Architecture
                  +-------------------------+
                  |    Monitoring System    |
                  +------------+------------+
                               |
                               | Pull aggregated health data
                               v
                  +-------------------------+
                  |    Health Aggregator    |
                  | (Service or Dashboard)  |
                  +------------+------------+
                               |
               +---------------+---------------+
               |                               |
       +-------v-------+               +-------v-------+
       | Microservice 1|               | Microservice N|
       | +-----------+ |               | +-----------+ |
       | | Health    | |               | | Health    | |
       | | Endpoint  | |               | | Endpoint  | |
       | +-----------+ |               | +-----------+ |
       +---------------+               +---------------+
Components
Microservice Health Endpoint
Technology: HTTP REST endpoint (e.g., /health, /ready, /live)
Responsibility: Expose the health status of the microservice and its dependencies.
Health Aggregator
Technology: Custom service or monitoring dashboard
Responsibility: Collect and aggregate health data from all microservices.
Monitoring System
Technology: Prometheus, Grafana, or similar
Responsibility: Visualize health status and trigger alerts on failures.
Service Registry
Technology: Consul, Eureka, or Kubernetes API
Responsibility: Discover microservice instances whose health endpoints should be queried.
Security Layer
Technology: API Gateway, OAuth2, mTLS
Responsibility: Protect health endpoints from unauthorized access.
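A microservice health endpoint could look roughly like the sketch below, which uses only the Python standard library. The JSON shape, the 503 status code for an unhealthy instance, and the `database_reachable` placeholder are assumptions for illustration; a real service would plug in its own checks and web framework.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Callable, Dict


def database_reachable() -> bool:
    # Hypothetical placeholder: a real service would run a cheap query here.
    return True


class HealthHandler(BaseHTTPRequestHandler):
    # Dependency checks consulted by /ready and /health (names are illustrative).
    dependency_checks: Dict[str, Callable[[], bool]] = {"database": database_reachable}

    def do_GET(self) -> None:
        if self.path == "/live":
            # Liveness: the process is up and able to answer.
            self._reply(200, {"status": "UP"})
        elif self.path in ("/ready", "/health"):
            checks = {name: ("UP" if check() else "DOWN")
                      for name, check in self.dependency_checks.items()}
            healthy = all(v == "UP" for v in checks.values())
            self._reply(200 if healthy else 503,
                        {"status": "UP" if healthy else "DOWN", "checks": checks})
        else:
            self._reply(404, {"error": "not found"})

    def _reply(self, code: int, body: dict) -> None:
        payload = json.dumps(body).encode("utf-8")
        self.send_response(code)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, format, *args) -> None:
        pass  # Keep frequent probe traffic out of the request log.


def make_server(port: int = 8080) -> ThreadingHTTPServer:
    # Binds to localhost only; exposure to other hosts is left to the security layer.
    return ThreadingHTTPServer(("127.0.0.1", port), HealthHandler)
```

Calling `make_server().serve_forever()` starts the endpoint. Returning a non-2xx status for an unhealthy instance lets load balancers and orchestrators act on the status code alone, without parsing the body.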
Request Flow
1. Each microservice exposes /health, /ready, and /live endpoints.
2. Each health endpoint checks the service's internal status and its dependencies (database, external APIs).
3. The Health Aggregator periodically queries the health endpoints of all microservices.
4. The aggregator compiles the overall system health and exposes a summary endpoint.
5. The monitoring system pulls the aggregated health data for visualization and alerting.
6. Alerts are triggered if any microservice reports an unhealthy status.
7. The security layer ensures that only authorized systems can access the health endpoints.
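Steps 3 and 4 of the flow above can be sketched as a single aggregation function. The service names, URLs, and summary shape are hypothetical; `http_fetch_status` assumes each service returns a JSON body with a `status` field, which is a convention chosen for this example.

```python
import json
import urllib.request
from typing import Callable, Dict


def http_fetch_status(url: str, timeout_s: float = 0.2) -> str:
    # Assumes the endpoint returns JSON such as {"status": "UP"}.
    # urlopen raises on non-2xx responses, which aggregate() treats as DOWN.
    with urllib.request.urlopen(url, timeout=timeout_s) as resp:
        return json.loads(resp.read()).get("status", "DOWN")


def aggregate(service_urls: Dict[str, str],
              fetch_status: Callable[[str], str] = http_fetch_status) -> dict:
    services: Dict[str, str] = {}
    for name, url in service_urls.items():
        try:
            services[name] = fetch_status(url)
        except Exception:
            # An unreachable or erroring service counts as unhealthy.
            services[name] = "DOWN"
    unhealthy = sorted(name for name, status in services.items() if status != "UP")
    return {
        "status": "UP" if not unhealthy else "DOWN",
        "unhealthy": unhealthy,
        "services": services,
    }
```

In practice the `service_urls` mapping would come from the service registry, and the returned summary is what the aggregator would expose on its own endpoint for the monitoring system to pull.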
Database Schema
No persistent database is required for the health checks themselves: health status is ephemeral and queried live. If history needs to be stored, a simple two-table schema is sufficient: Services(service_id, name, last_checked) and HealthStatus(service_id, timestamp, status, details).
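If history is stored, the two tables named above could be created as in this sketch. SQLite, the column types, and the `record_status` helper are assumptions made for illustration; the source only specifies the table and column names.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS Services (
    service_id   INTEGER PRIMARY KEY,
    name         TEXT NOT NULL UNIQUE,
    last_checked TEXT
);
CREATE TABLE IF NOT EXISTS HealthStatus (
    service_id INTEGER NOT NULL REFERENCES Services(service_id),
    timestamp  TEXT NOT NULL,
    status     TEXT NOT NULL,
    details    TEXT
);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def record_status(conn: sqlite3.Connection, name: str, timestamp: str,
                  status: str, details: str = "") -> None:
    # Register the service on first sight, then append one history row.
    conn.execute("INSERT OR IGNORE INTO Services(name) VALUES (?)", (name,))
    (service_id,) = conn.execute(
        "SELECT service_id FROM Services WHERE name = ?", (name,)).fetchone()
    conn.execute(
        "INSERT INTO HealthStatus(service_id, timestamp, status, details) "
        "VALUES (?, ?, ?, ?)",
        (service_id, timestamp, status, details))
    conn.execute("UPDATE Services SET last_checked = ? WHERE service_id = ?",
                 (timestamp, service_id))
    conn.commit()
```

Storing timestamps as ISO 8601 strings keeps them sortable as text, which is enough for simple "latest status per service" queries.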
Scaling Discussion
Bottlenecks
The Health Aggregator becomes a bottleneck when it polls many services at a high frequency.
Frequent health checks add network and CPU overhead that competes with real traffic on each service.
The security layer adds latency to every check if authentication is repeated from scratch on each request.
The monitoring system can be overwhelmed by the volume of health metrics.
Solutions
Cache recent results and rate-limit polling in the Health Aggregator to reduce load.
Use asynchronous or event-driven health reporting (push model) so that services report changes instead of being polled.
Keep security overhead low with lightweight, locally verifiable tokens, or with mutual TLS over reused connections so that a full handshake is not paid on every check.
Aggregate metrics at the microservice level before sending them to monitoring to reduce volume.
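The caching idea in the first solution can be sketched as a small time-to-live cache in front of the aggregator's fetch function. The class name, the 5-second default, and the injectable `clock` are illustrative assumptions.

```python
import time
from typing import Callable, Dict, Tuple


class CachedStatus:
    """Serve a recent health status from memory instead of re-polling."""

    def __init__(self, fetch: Callable[[str], str], ttl_s: float = 5.0,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch
        self._ttl_s = ttl_s
        self._clock = clock
        self._cache: Dict[str, Tuple[float, str]] = {}

    def get(self, service: str) -> str:
        now = self._clock()
        cached = self._cache.get(service)
        if cached is not None and now - cached[0] < self._ttl_s:
            # Fresh enough: answer without contacting the service.
            return cached[1]
        status = self._fetch(service)
        self._cache[service] = (now, status)
        return status
```

The trade-off is staleness: a service that fails just after being polled is reported healthy for up to `ttl_s` seconds, so the TTL should stay well below the alerting interval.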
Interview Tips
Time: Spend 10 minutes understanding requirements and clarifying scope, 20 minutes designing components and data flow, 10 minutes discussing scaling and trade-offs, and 5 minutes summarizing (45 minutes in total).
Explain the difference between liveness and readiness probes.
Discuss the importance of checking dependencies in health endpoints.
Describe how aggregation supports centralized monitoring.
Highlight security considerations for health endpoints.
Address scaling challenges and mitigation strategies.