
Circuit breaker pattern in Microservices - System Design Exercise

Design: Circuit Breaker Pattern Implementation in Microservices
Design a circuit breaker component that sits in the communication path between microservices. Out of scope: detailed microservice business logic and deployment infrastructure.
Functional Requirements
FR1: Prevent cascading failures when a dependent microservice is down or slow
FR2: Detect failures and stop requests to the failing service temporarily
FR3: Automatically probe the dependent service again after a cooldown period and resume normal traffic once it has recovered
FR4: Provide fallback responses when the dependent service is unavailable
FR5: Monitor and log circuit breaker state changes for observability
Non-Functional Requirements
NFR1: Handle up to 10,000 requests per second
NFR2: Fail fast with p99 latency under 100ms for circuit breaker checks
NFR3: Ensure 99.9% availability of the overall system
NFR4: Minimal added latency when circuit breaker is closed (normal operation)
Think Before You Design
Key Components
Circuit breaker middleware or client library
Health check and failure detection logic
Fallback handler for degraded responses
Metrics and logging system
Configuration management for thresholds and timeouts
Design Patterns
State machine for circuit breaker states (Closed, Open, Half-Open)
Timeout and retry policies
Bulkhead pattern to isolate failures
Fallback pattern for graceful degradation
Observer pattern for monitoring state changes
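A minimal Python sketch of the state machine and observer patterns listed above. The `State` enum, the `TRANSITIONS` table, and the `BreakerStateMachine` class are illustrative names, not part of any library. The sketch shows that the three states allow only four transitions, and that logging or metrics code can subscribe to state changes instead of being hard-wired into the breaker.

```python
from enum import Enum
from typing import Callable, List


class State(Enum):
    CLOSED = "CLOSED"
    OPEN = "OPEN"
    HALF_OPEN = "HALF_OPEN"


# The only legal moves of the (hypothetical) breaker state machine.
TRANSITIONS = {
    State.CLOSED: {State.OPEN},                   # failure threshold reached
    State.OPEN: {State.HALF_OPEN},                # cooldown elapsed
    State.HALF_OPEN: {State.CLOSED, State.OPEN},  # trial call succeeds / fails
}

Listener = Callable[[State, State], None]


class BreakerStateMachine:
    """Holds the current state and notifies observers on every transition."""

    def __init__(self) -> None:
        self.state = State.CLOSED
        self._listeners: List[Listener] = []

    def subscribe(self, listener: Listener) -> None:
        self._listeners.append(listener)

    def transition(self, new_state: State) -> None:
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state.name} -> {new_state.name}")
        old, self.state = self.state, new_state
        # Observer pattern: monitoring code reacts to changes without the
        # state machine knowing anything about logs or metrics.
        for listener in self._listeners:
            listener(old, new_state)


if __name__ == "__main__":
    events = []
    machine = BreakerStateMachine()
    machine.subscribe(lambda old, new: events.append(f"{old.name}->{new.name}"))
    machine.transition(State.OPEN)
    machine.transition(State.HALF_OPEN)
    machine.transition(State.CLOSED)
    print(events)  # ['CLOSED->OPEN', 'OPEN->HALF_OPEN', 'HALF_OPEN->CLOSED']
```

Rejecting illegal transitions (for example OPEN straight to CLOSED) catches bugs where a breaker skips the HALF_OPEN probe.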
Reference Architecture
Client Service
   |
   |---> Circuit Breaker Middleware ---+---> Dependent Service
                                       |
                                       +---> Fallback Handler

Monitoring System <--- Circuit Breaker Logs & Metrics
Components
Circuit Breaker Middleware
Custom library or framework integration (e.g., Resilience4j, or Netflix Hystrix, which has been in maintenance mode since 2018)
Intercepts calls to dependent services, tracks failures, and controls request flow based on circuit state
Dependent Service
Any microservice (REST/gRPC)
Service that may fail or become slow, triggering circuit breaker
Fallback Handler
Code module or service
Provides alternative responses when circuit breaker is open
Monitoring System
Prometheus + Grafana or ELK stack
Collects metrics and logs from circuit breaker for alerting and analysis
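A sketch of how the middleware can wrap a call to the Dependent Service with a deadline and hand failures to the Fallback Handler, assuming a blocking Python client. The helper `call_with_timeout_and_fallback` and the pool size of 8 are hypothetical, and state tracking is deliberately left out. A per-call deadline matters because a slow dependency ties up callers just as a dead one does (FR1).

```python
import concurrent.futures
from typing import Callable, TypeVar

T = TypeVar("T")

# One bounded pool per dependency (hypothetical size): it caps how many threads
# this dependency can occupy, a simple form of the bulkhead pattern.
_dependency_pool = concurrent.futures.ThreadPoolExecutor(max_workers=8)


def call_with_timeout_and_fallback(
    call: Callable[[], T],
    fallback: Callable[[Exception], T],
    timeout_s: float,
) -> T:
    """Run `call` with a deadline; on any error or timeout return `fallback(exc)`."""
    future = _dependency_pool.submit(call)
    try:
        return future.result(timeout=timeout_s)
    except Exception as exc:
        # Catches errors raised by `call` and concurrent.futures.TimeoutError.
        # On timeout the worker thread keeps running (Python cannot cancel a
        # running thread), so real clients should also set a socket/RPC deadline.
        return fallback(exc)
```

The fallback receives the exception, so it can choose between a cached value, a default, or an error response depending on what went wrong.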
Request Flow
1. Client Service sends request to Dependent Service through Circuit Breaker Middleware.
2. Circuit Breaker Middleware checks current state: if Closed, forwards request.
3. If request fails or times out, middleware increments failure count.
4. If failures exceed threshold, circuit breaker state changes to Open; further requests are blocked.
5. When Open, requests are immediately sent to Fallback Handler for alternative response.
6. After cooldown period, circuit breaker enters Half-Open state and allows limited test requests.
7. If test requests succeed, circuit breaker closes; if they fail, it reopens.
8. All state changes and metrics are logged and sent to Monitoring System.
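The request flow above can be sketched as a small in-process Python class. This is a simplified sketch under stated assumptions, not the API of Resilience4j or Hystrix. It trips after `failure_threshold` consecutive failures (Resilience4j, for example, uses a failure-rate threshold over a sliding window instead), lets one trial request through in HALF_OPEN, and takes an injectable `clock` so the cooldown can be tested without sleeping. Comments map each branch to the numbered steps.

```python
import threading
import time
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")


class CircuitBreakerOpen(Exception):
    """Passed to the fallback when a call is rejected without reaching the service."""


class CircuitBreaker:
    def __init__(
        self,
        failure_threshold: int = 5,
        reset_timeout_s: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self._clock = clock
        self._lock = threading.Lock()
        self.state = "CLOSED"
        self._failures = 0            # consecutive failures while CLOSED
        self._opened_at = 0.0
        self._trial_in_flight = False
        # Step 8 stand-in: (old, new) transitions; a real breaker would emit
        # these to logs/metrics instead of keeping an unbounded list.
        self.events: List[Tuple[str, str]] = []

    def _set_state(self, new: str) -> None:
        self.events.append((self.state, new))
        self.state = new

    def _allow_request(self) -> bool:
        with self._lock:
            if self.state == "OPEN":
                if self._clock() - self._opened_at < self.reset_timeout_s:
                    return False              # still cooling down
                self._set_state("HALF_OPEN")  # step 6: cooldown elapsed
            if self.state == "HALF_OPEN":
                if self._trial_in_flight:
                    return False              # one trial request at a time
                self._trial_in_flight = True
            return True                       # CLOSED (step 2) or the HALF_OPEN trial

    def _on_success(self) -> None:
        with self._lock:
            self._failures = 0
            if self.state == "HALF_OPEN":
                self._trial_in_flight = False
                self._set_state("CLOSED")     # step 7: trial succeeded

    def _on_failure(self) -> None:
        with self._lock:
            if self.state == "OPEN":
                return                        # late failure from before the trip
            self._failures += 1               # step 3
            if self.state == "HALF_OPEN" or self._failures >= self.failure_threshold:
                self._trial_in_flight = False
                self._failures = 0
                self._opened_at = self._clock()
                self._set_state("OPEN")       # step 4 (trip) or step 7 (re-open)

    def call(self, fn: Callable[[], T], fallback: Callable[[Exception], T]) -> T:
        if not self._allow_request():
            # Step 5: fail fast to the fallback without touching the dependency.
            return fallback(CircuitBreakerOpen(self.state))
        try:
            result = fn()
        except Exception as exc:
            self._on_failure()
            return fallback(exc)
        self._on_success()
        return result
```

Routing rejected calls to the same `fallback` as failed calls keeps step 5 identical whether the dependency errored or was never contacted. The lock only guards state updates, so the protected call itself never runs while the lock is held.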
Database Schema
No persistent database required for circuit breaker state; state is kept in-memory per instance or shared via distributed cache (e.g., Redis) if needed for cluster-wide state.
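If cluster-wide state is needed, each breaker reduces to a few fields per dependency. The sketch below shows one possible record serialized as JSON for a shared cache such as Redis; the field names and the `circuit-breaker:<service>` key format are assumptions. `opened_at` uses wall-clock epoch seconds rather than a per-process monotonic clock because the value is compared across hosts, which assumes their clocks are reasonably synchronized.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class BreakerRecord:
    """All the state one breaker needs; small enough for a single cache entry."""
    state: str = "CLOSED"      # CLOSED, OPEN or HALF_OPEN
    failure_count: int = 0     # failures counted in the current window
    opened_at: float = 0.0     # when the breaker last tripped (epoch seconds)

    def to_json(self) -> str:
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, raw: str) -> "BreakerRecord":
        return cls(**json.loads(raw))


def cache_key(service: str) -> str:
    # Hypothetical key naming scheme: one entry per protected dependency.
    return f"circuit-breaker:{service}"


if __name__ == "__main__":
    rec = BreakerRecord(state="OPEN", failure_count=5, opened_at=1700000000.0)
    print(cache_key("payment"), rec.to_json())
    # circuit-breaker:payment {"state": "OPEN", "failure_count": 5, "opened_at": 1700000000.0}
```

Instances updating the same record concurrently need an atomic operation (for example Redis `INCR` on a separate failure counter) rather than a read-modify-write of the whole JSON value, or failures will be lost.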
Scaling Discussion
Bottlenecks
Per-instance circuit breaker state causing inconsistent behavior in multi-instance deployments (one instance may be OPEN while another keeps sending traffic to the failing service)
High request volume causing overhead in failure tracking and state management
Monitoring system overwhelmed by large volume of circuit breaker events
Solutions
Use distributed cache or coordination service (e.g., Redis, ZooKeeper) to share circuit breaker state across instances
Optimize circuit breaker implementation for low latency and asynchronous failure tracking
Aggregate and sample metrics before sending to monitoring to reduce load
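A minimal sketch of the aggregation idea: instead of emitting one log line per rejected request (potentially thousands per second while a breaker is OPEN), each instance counts events locally and ships a summary on a fixed interval. `TransitionAggregator` and the flush interval are hypothetical. Sampling is not shown, and the class is not thread-safe as written (wrap the counters in a lock for concurrent use).

```python
from collections import Counter
from typing import Dict


class TransitionAggregator:
    """Counts breaker events locally and returns one summary per flush."""

    def __init__(self) -> None:
        self._transitions: Counter = Counter()  # (service, old, new) -> count
        self._rejected: Counter = Counter()     # service -> fast-failed calls

    def record_transition(self, service: str, old: str, new: str) -> None:
        self._transitions[(service, old, new)] += 1

    def record_rejection(self, service: str) -> None:
        # Counted instead of logged: one summary line replaces many events.
        self._rejected[service] += 1

    def flush(self) -> Dict[str, dict]:
        """Return and reset the counts (call e.g. every 10 s from a timer)."""
        summary = {
            "transitions": dict(self._transitions),
            "rejected": dict(self._rejected),
        }
        self._transitions.clear()
        self._rejected.clear()
        return summary
```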
Interview Tips
Time: 10 minutes to clarify requirements and constraints, 20 minutes to design architecture and data flow, 10 minutes to discuss scaling and trade-offs, 5 minutes for questions
Explain the problem of cascading failures and how circuit breaker prevents them
Describe the three states of the circuit breaker and transitions
Discuss fallback strategies and their importance for user experience
Mention how to monitor and alert on circuit breaker events
Address scaling challenges and solutions for distributed microservices

Practice

1. What is the primary purpose of the circuit breaker pattern in microservices?
easy
A. To prevent repeated calls to a failing service and improve system stability
B. To increase the speed of database queries
C. To encrypt communication between services
D. To balance load evenly across servers

Solution

  1. Step 1: Understand the problem circuit breaker solves

    The circuit breaker pattern stops calls to a failing service to avoid cascading failures.
  2. Step 2: Identify the main benefit

    This pattern improves system stability by preventing repeated failures and allowing recovery.
  3. Final Answer:

    To prevent repeated calls to a failing service and improve system stability -> Option A
  4. Quick Check:

    Circuit breaker purpose = prevent repeated failing calls
Hint: Circuit breaker stops calls to failing services fast
Common Mistakes:
  • Confusing circuit breaker with load balancing
  • Thinking it speeds up database queries
  • Assuming it encrypts data
2. Which of the following correctly represents the three states of a circuit breaker?
easy
A. START, STOP, PAUSE
B. ACTIVE, INACTIVE, PENDING
C. CLOSED, OPEN, HALF_OPEN
D. ON, OFF, WAIT

Solution

  1. Step 1: Recall circuit breaker states

    The circuit breaker has three states: CLOSED (normal), OPEN (blocking calls), HALF_OPEN (testing recovery).
  2. Step 2: Match states to options

    Only CLOSED, OPEN, HALF_OPEN lists these exact states.
  3. Final Answer:

    CLOSED, OPEN, HALF_OPEN -> Option C
  4. Quick Check:

    States = CLOSED, OPEN, HALF_OPEN
Hint: Remember states as Closed, Open, Half-Open
Common Mistakes:
  • Mixing up state names with unrelated terms
  • Using generic terms like ON/OFF
  • Forgetting the HALF_OPEN state
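3. Consider this Python sketch of a circuit breaker's request handler:
def handle(state, test_call_successful, call_service):
    if state == 'OPEN':
        return state, 'fail fast'          # reject without calling the service
    elif state == 'HALF_OPEN':
        if test_call_successful():         # trial call to the recovering service
            state = 'CLOSED'
        else:
            state = 'OPEN'
        return state, None
    else:                                  # CLOSED: normal operation
        return state, call_service()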
What happens when the circuit breaker is in HALF_OPEN state and the test call fails?
medium
A. The state changes to CLOSED and service calls continue
B. The state remains HALF_OPEN and retries immediately
C. The service call is ignored without state change
D. The state changes back to OPEN and calls are blocked

Solution

  1. Step 1: Analyze HALF_OPEN state logic

    In HALF_OPEN, a test call checks if the service recovered. If it fails, the state changes to OPEN.
  2. Step 2: Understand consequence of failure

    Changing to OPEN blocks further calls to prevent overload.
  3. Final Answer:

    The state changes back to OPEN and calls are blocked -> Option D
  4. Quick Check:

    HALF_OPEN fail -> OPEN state
Hint: Failed test call in HALF_OPEN resets to OPEN
Common Mistakes:
  • Assuming state changes to CLOSED on failure
  • Thinking retries happen immediately in HALF_OPEN
  • Ignoring state changes on test failure
4. A developer implemented a circuit breaker but notices it never transitions from OPEN to HALF_OPEN. What is the most likely cause?
medium
A. The timeout to switch from OPEN to HALF_OPEN is missing or too long
B. The service calls are always successful
C. The circuit breaker is stuck in CLOSED state
D. The test call in HALF_OPEN always succeeds

Solution

  1. Step 1: Understand OPEN to HALF_OPEN transition

    The circuit breaker moves from OPEN to HALF_OPEN after a timeout period to test recovery.
  2. Step 2: Identify cause of no transition

    If the timeout is missing or set too long, the breaker stays OPEN indefinitely.
  3. Final Answer:

    The timeout to switch from OPEN to HALF_OPEN is missing or too long -> Option A
  4. Quick Check:

    Missing timeout blocks OPEN -> HALF_OPEN transition
Hint: Check timeout settings for OPEN to HALF_OPEN switch
Common Mistakes:
  • Assuming success of service calls affects OPEN state
  • Confusing CLOSED and OPEN states
  • Ignoring timeout mechanism
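The fix for question 4 comes down to a single elapsed-time check. This hypothetical helper makes the failure mode explicit: with no timeout configured (`None`) the breaker can never leave OPEN, and with a very long timeout it effectively never does. The check also only runs when something calls it, usually on the next incoming request or from a scheduled timer.

```python
from typing import Optional


def should_try_half_open(opened_at: float, now: float, open_timeout_s: Optional[float]) -> bool:
    """Return True once an OPEN breaker has cooled down long enough to probe."""
    if open_timeout_s is None:
        # The bug from the question: with no timeout configured, nothing
        # ever triggers the OPEN -> HALF_OPEN transition.
        return False
    return now - opened_at >= open_timeout_s


if __name__ == "__main__":
    print(should_try_half_open(100.0, 130.0, 30.0))     # True: 30 s have elapsed
    print(should_try_half_open(100.0, 129.0, 30.0))     # False: still cooling down
    print(should_try_half_open(100.0, 10_000.0, None))  # False: never transitions
```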
5. You design a microservice system with a circuit breaker protecting a payment service. The circuit breaker trips (opens) after 5 failures within 1 minute and stays open for 2 minutes before trying again. What is the main tradeoff of setting the open duration too long?
hard
A. Long open duration improves user experience by retrying quickly
B. Long open duration reduces load on failing service but increases request failures for users
C. Long open duration causes the circuit breaker to never open
D. Long open duration increases the number of successful calls

Solution

  1. Step 1: Understand open duration effect

    A long open duration blocks calls longer, reducing load on the failing service.
  2. Step 2: Identify user impact

    While protecting the service, users see more failed or fallback responses because calls stay blocked longer, possibly well after the service has already recovered.
  3. Final Answer:

    Long open duration reduces load on failing service but increases request failures for users -> Option B
  4. Quick Check:

    Long open = less load, more user failures
Hint: Long open = safer service, worse user experience
Common Mistakes:
  • Thinking long open improves user experience
  • Assuming circuit breaker never opens with long duration
  • Believing long open increases successful calls
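The question 5 configuration can be worked through with a small sketch of a sliding-window trip condition, assuming timestamps in seconds. `SlidingWindowTrip` is hypothetical and omits HALF_OPEN probing to keep the arithmetic visible: five failures at t = 0, 10, 20, 30, 40 s trip the breaker at t = 40, and it blocks calls until t = 40 + 120 = 160 s, even if the payment service recovered at t = 45.

```python
from collections import deque
from typing import Deque


class SlidingWindowTrip:
    """Opens when `max_failures` failures occur within `window_s` seconds."""

    def __init__(self, max_failures: int = 5, window_s: float = 60.0, open_s: float = 120.0) -> None:
        self.max_failures = max_failures
        self.window_s = window_s
        self.open_s = open_s
        self._failures: Deque[float] = deque()
        self.open_until = float("-inf")

    def is_open(self, now: float) -> bool:
        # A full breaker would move to HALF_OPEN once this turns False.
        return now < self.open_until

    def record_failure(self, now: float) -> None:
        self._failures.append(now)
        # Drop failures that have fallen out of the window.
        while self._failures and now - self._failures[0] > self.window_s:
            self._failures.popleft()
        if len(self._failures) >= self.max_failures:
            self.open_until = now + self.open_s  # block calls for open_s seconds
            self._failures.clear()


if __name__ == "__main__":
    breaker = SlidingWindowTrip()
    for t in (0, 10, 20, 30, 40):
        breaker.record_failure(t)
    print(breaker.open_until)  # 160.0
```

This puts numbers on the tradeoff in option B: if all 10,000 requests per second from NFR1 went to this dependency, a 120-second open period would send about 1.2 million requests to the fallback, versus about 300,000 with a 30-second period.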