
Circuit breaker pattern in HLD - System Design Exercise

Design: Circuit Breaker Pattern Implementation
Design the circuit breaker pattern as a reusable component integrated with service calls. Out of scope: detailed implementation of external services or fallback logic.
Functional Requirements
FR1: Detect failures in calls to external services or components
FR2: Prevent repeated calls to failing services to avoid cascading failures
FR3: Automatically allow trial calls to the failing service after a cooldown period
FR4: Provide fallback responses when the external service is unavailable
FR5: Support monitoring of circuit breaker state and metrics
Non-Functional Requirements
NFR1: Handle up to 10,000 requests per second
NFR2: Fail fast with p99 latency under 100ms for service calls
NFR3: Ensure availability of 99.9% for the main application
NFR4: Minimal added latency when circuit breaker is closed (normal operation)
Think Before You Design
Key Components
Circuit breaker state machine (Closed, Open, Half-Open)
Failure detection and counting mechanism
Timeout and retry scheduler
Fallback handler
Metrics and monitoring system
Design Patterns
State machine pattern for managing circuit states
Timeout and retry pattern
Bulkhead pattern to isolate failures
Fallback pattern for degraded responses
Observer pattern for monitoring state changes
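As a rough illustration of how the state machine and observer patterns listed above might fit together, the sketch below keeps a table of allowed transitions and notifies subscribed listeners on every state change. The names `ObservableStateMachine`, `ALLOWED`, and `subscribe` are hypothetical and not taken from any particular library.

```python
from enum import Enum
from typing import Callable, List


class State(Enum):
    CLOSED = "Closed"
    OPEN = "Open"
    HALF_OPEN = "Half-Open"


# Allowed transitions between the three circuit states.
ALLOWED = {
    State.CLOSED: {State.OPEN},
    State.OPEN: {State.HALF_OPEN},
    State.HALF_OPEN: {State.CLOSED, State.OPEN},
}

Listener = Callable[[State, State], None]


class ObservableStateMachine:
    """Hypothetical helper: validates transitions and notifies observers."""

    def __init__(self) -> None:
        self.state = State.CLOSED
        self._listeners: List[Listener] = []

    def subscribe(self, listener: Listener) -> None:
        # Observers could be metrics exporters, loggers, or alerting hooks.
        self._listeners.append(listener)

    def transition(self, new_state: State) -> None:
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"illegal transition {self.state} -> {new_state}")
        old_state, self.state = self.state, new_state
        for listener in self._listeners:
            listener(old_state, new_state)
```

A monitoring component could subscribe a callback that increments a counter per transition, which keeps metrics code out of the breaker logic itself.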
Reference Architecture
Client Service
   |
   |---> Circuit Breaker Component ---+---> External Service
                                      |
                                      +---> Fallback Handler
                                      |
                                      +---> Metrics & Monitoring
Components
Circuit Breaker Component
  Technology: In-memory state machine or distributed cache (e.g., Redis)
  Role: Track failure counts, manage states (Closed, Open, Half-Open), and decide if calls should proceed
External Service
  Technology: Any third-party or internal service
  Role: Service being called, which may fail or respond slowly
Fallback Handler
  Technology: Custom code or default response generator
  Role: Provide an alternative response when the circuit is open
Metrics & Monitoring
  Technology: Prometheus, Grafana, or similar
  Role: Collect and visualize circuit breaker state, failure rates, and latency
Request Flow
1. Client Service sends request to Circuit Breaker Component
2. Circuit Breaker checks current state:
   - If Closed: forwards request to External Service
   - If Open: immediately returns fallback response without calling External Service
   - If Half-Open: allows a limited number of trial requests to test service health
3. External Service responds or fails
4. Circuit Breaker updates failure/success counters based on response
5. If failures exceed threshold, Circuit Breaker transitions to Open state
6. After cooldown period, Circuit Breaker moves to Half-Open to test service
7. In Half-Open, a successful trial request returns the circuit to Closed; a failed one returns it to Open
8. Metrics & Monitoring collects state changes and performance data
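The request flow above can be sketched as a small wrapper around a service call. This is a minimal, single-threaded illustration under stated assumptions: the class name `CircuitBreaker`, the parameters `failure_threshold` and `cooldown_seconds`, and the injectable `clock` are hypothetical choices made for the example, and consecutive-failure counting is just one possible failure-detection policy.

```python
import time
from enum import Enum


class State(Enum):
    CLOSED = "Closed"
    OPEN = "Open"
    HALF_OPEN = "Half-Open"


class CircuitBreaker:
    """Illustrative sketch only; not thread-safe and not production-ready."""

    def __init__(self, failure_threshold=3, cooldown_seconds=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.cooldown_seconds = cooldown_seconds
        self.clock = clock  # injectable so tests can control time
        self.state = State.CLOSED
        self.failure_count = 0
        self.opened_at = None

    def call(self, operation, fallback):
        if self.state is State.OPEN:
            if self.clock() - self.opened_at >= self.cooldown_seconds:
                # Cooldown elapsed: let this request through as a trial.
                self.state = State.HALF_OPEN
            else:
                # Still open: fail fast without touching the service.
                return fallback()
        try:
            result = operation()
        except Exception:
            self._on_failure()
            return fallback()
        self._on_success()
        return result

    def _on_success(self):
        # Any success (including a Half-Open trial) closes the circuit.
        self.failure_count = 0
        self.state = State.CLOSED

    def _on_failure(self):
        self.failure_count += 1
        # A failed trial reopens immediately; otherwise open at the threshold.
        if (self.state is State.HALF_OPEN
                or self.failure_count >= self.failure_threshold):
            self.state = State.OPEN
            self.opened_at = self.clock()
```

A caller would wrap each outbound request, for example `breaker.call(lambda: fetch_profile(user_id), lambda: DEFAULT_PROFILE)`, where `fetch_profile` and `DEFAULT_PROFILE` stand in for the real service call and fallback response.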
Database Schema
No persistent database is required for the core circuit breaker; it uses an in-memory or distributed cache to store:
- CircuitBreakerState { service_id, state (Closed/Open/Half-Open), failure_count, last_failure_time, last_state_change_time }
- Configuration { failure_threshold, timeout_duration, retry_interval }
Relationships: one-to-one mapping between service_id and CircuitBreakerState
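One way to picture these records is as plain data classes held in an in-memory map keyed by service_id. The field names follow the schema above; the types, the use of seconds as the time unit, and the `StateStore` helper are assumptions made for this sketch.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class State(Enum):
    CLOSED = "Closed"
    OPEN = "Open"
    HALF_OPEN = "Half-Open"


@dataclass(frozen=True)
class Configuration:
    failure_threshold: int
    timeout_duration: float  # seconds (assumed unit)
    retry_interval: float    # seconds (assumed unit)


@dataclass
class CircuitBreakerState:
    service_id: str
    state: State = State.CLOSED
    failure_count: int = 0
    last_failure_time: Optional[float] = None
    last_state_change_time: Optional[float] = None


class StateStore:
    """Hypothetical in-memory store: exactly one record per service_id."""

    def __init__(self) -> None:
        self._records: Dict[str, CircuitBreakerState] = {}

    def get(self, service_id: str) -> CircuitBreakerState:
        # Create the record lazily the first time a service is seen.
        if service_id not in self._records:
            self._records[service_id] = CircuitBreakerState(service_id)
        return self._records[service_id]
```

Swapping the dictionary for a shared cache would keep the same record shape while letting multiple application instances see the same state.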
Scaling Discussion
Bottlenecks
Per-instance (local) circuit breaker state causing inconsistent behavior across replicas in distributed systems
High latency added by synchronous state checks
Memory overhead if many services or endpoints use circuit breakers
Delayed detection of failures due to slow failure count updates
Solutions
Use distributed cache (e.g., Redis) or shared state store for circuit breaker state to synchronize across instances
Implement asynchronous state updates and non-blocking calls to minimize added latency
Apply circuit breaker only to critical or high-risk services to reduce memory usage
Tune failure detection thresholds and use sliding windows for faster failure detection
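To make the sliding-window idea concrete, the sketch below computes a failure rate over only the most recent calls, so old outcomes stop influencing the decision once they age out. The class name `SlidingWindowFailureRate` and the parameters `window_seconds` and `min_calls` are hypothetical; `min_calls` is an assumed guard against tripping on a tiny sample.

```python
import time
from collections import deque


class SlidingWindowFailureRate:
    """Illustrative time-based window; not thread-safe."""

    def __init__(self, window_seconds=10.0, min_calls=5, clock=time.monotonic):
        self.window_seconds = window_seconds
        self.min_calls = min_calls
        self.clock = clock  # injectable so tests can control time
        self._events = deque()  # (timestamp, failed) pairs, oldest first

    def record(self, failed):
        self._events.append((self.clock(), bool(failed)))
        self._evict()

    def _evict(self):
        # Drop outcomes that have aged out of the window.
        cutoff = self.clock() - self.window_seconds
        while self._events and self._events[0][0] <= cutoff:
            self._events.popleft()

    def failure_rate(self):
        self._evict()
        if len(self._events) < self.min_calls:
            return 0.0  # too few samples to judge
        failures = sum(1 for _, failed in self._events if failed)
        return failures / len(self._events)
```

A breaker could open when `failure_rate()` crosses a configured ratio instead of waiting for a fixed count of consecutive failures, at the cost of storing one entry per recent call.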
Interview Tips
Time: Spend 10 minutes understanding requirements and clarifying failure scenarios, 20 minutes designing the architecture and data flow, 10 minutes discussing scaling and trade-offs, 5 minutes summarizing.
Explain the purpose of the circuit breaker pattern to prevent cascading failures
Describe the three states and transitions clearly
Discuss how fallback responses improve user experience during failures
Highlight monitoring importance for operational visibility
Address scaling challenges in distributed environments and solutions