
Circuit breaker pattern in Microservices - System Design Guide

Problem Statement
When a microservice calls another service that is slow or down, the calling service waits too long or fails repeatedly, causing cascading failures and degraded user experience across the system.
Solution
The circuit breaker monitors calls to a service and stops requests when failures exceed a threshold. It quickly fails requests instead of waiting, then periodically tests if the service has recovered before resuming normal calls.
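The behaviour described above is usually modelled as three states: closed (calls pass through), open (calls fail fast), and half-open (a trial call tests recovery). A minimal sketch of those transitions, assuming hypothetical event labels `"success"`, `"failure"`, and `"timer_elapsed"` and an illustrative `next_state` helper that is not part of any library:

```python
from enum import Enum


class State(Enum):
    CLOSED = "closed"        # requests flow through normally
    OPEN = "open"            # requests fail fast without calling the service
    HALF_OPEN = "half_open"  # a trial request is allowed through


def next_state(state, event, failures=0, threshold=3):
    """Return the breaker state after `event`.

    `failures` is the consecutive-failure count including the current
    failure; `threshold` is the count at which a closed breaker opens.
    """
    if state is State.CLOSED and event == "failure" and failures >= threshold:
        return State.OPEN
    if state is State.OPEN and event == "timer_elapsed":
        return State.HALF_OPEN
    if state is State.HALF_OPEN:
        if event == "success":
            return State.CLOSED
        if event == "failure":
            return State.OPEN
    # Every other combination leaves the state unchanged.
    return state
```

Keeping the transitions in a pure function like this makes them easy to unit test separately from timing and network concerns.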
Architecture
┌───────────────┐       ┌───────────────┐       ┌───────────────┐
│ Client        │──────▶│ Circuit Breaker│──────▶│ Downstream    │
│ (Caller)      │       │ (Monitor &    │       │ Service       │
│               │       │  Control)     │       │               │
└───────────────┘       └───────────────┘       └───────────────┘
       ▲                      │  ▲                     │
       │                      │  │                     │
       │                      │  └───── Failure count ─┘
       │                      │
       └───────── Success / Failure feedback ──────────┘

This diagram shows the client calling a downstream service through a circuit breaker that monitors success and failure to decide whether to allow or block requests.

Trade-offs
✓ Pros
Prevents cascading failures by stopping calls to failing services quickly.
Improves system stability and responsiveness under partial outages.
Allows automatic recovery by periodically testing service health.
✗ Cons
Adds complexity to service communication logic.
Requires tuning thresholds and timeouts to avoid false positives or negatives.
May cause temporary denial of service if the breaker trips incorrectly.
Use when your system has multiple microservices making network calls that can fail or become slow, especially at sustained loads of hundreds of requests per second or more, where a slow dependency quickly ties up the caller's threads and connections.
Avoid if your service calls are always local or guaranteed reliable, or if your traffic is very low (under 100 requests per second), where the impact of a failing dependency is small and a simple timeout is usually enough.
Real World Examples
Netflix
Netflix uses circuit breakers in its microservices to prevent failures in one service from cascading and causing widespread outages during high traffic events.
Amazon
Amazon applies circuit breakers to isolate failing downstream services during peak shopping times to maintain overall system responsiveness.
Uber
Uber uses circuit breakers to handle unreliable third-party APIs and internal services, ensuring degraded but stable user experience.
Code Example
The before code calls the external service directly, risking cascading failures. The after code wraps the call in a CircuitBreaker class that tracks failures and blocks calls when failures exceed a threshold, allowing recovery testing after a timeout.
### Before: No circuit breaker, direct call
class ServiceClient:
    def call_service(self):
        response = external_service_request()
        return response


### After: Circuit breaker applied
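import time


class CircuitBreakerOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    # Note: not thread-safe; guard state changes with a lock if shared across threads.
    def __init__(self, failure_threshold=3, recovery_time=10):
        self.failure_threshold = failure_threshold  # consecutive failures before opening
        self.recovery_time = recovery_time          # seconds to wait before a trial call
        self.failure_count = 0
        self.last_failure_time = None
        self.state = 'CLOSED'  # CLOSED, OPEN, HALF_OPEN

    def call(self, func, *args, **kwargs):
        if self.state == 'OPEN':
            # time.monotonic() is unaffected by system clock adjustments.
            if time.monotonic() - self.last_failure_time > self.recovery_time:
                # Recovery window has passed: let one trial call through.
                self.state = 'HALF_OPEN'
            else:
                # Fail fast without touching the downstream service.
                raise CircuitBreakerOpenError('Circuit breaker is OPEN')

        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failure_count += 1
            self.last_failure_time = time.monotonic()
            # A failed trial call in HALF_OPEN reopens the circuit immediately.
            if self.state == 'HALF_OPEN' or self.failure_count >= self.failure_threshold:
                self.state = 'OPEN'
            raise
        else:
            # Any success resets the count; a successful trial call closes the circuit.
            self.failure_count = 0
            if self.state == 'HALF_OPEN':
                self.state = 'CLOSED'
            return result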


class ServiceClientWithCB:
    def __init__(self):
        self.circuit_breaker = CircuitBreaker()

    def call_service(self):
        return self.circuit_breaker.call(external_service_request)


# external_service_request is a placeholder for the actual call
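
While the breaker is open, the caller still has to return something. One common option is a fallback such as a cached or empty result. The sketch below assumes a hypothetical `call_with_fallback` helper, and uses a stub in place of a real breaker so that it runs on its own:

```python
def call_with_fallback(guarded_call, fallback):
    """Run `guarded_call`; if it raises, return `fallback()` instead.

    `guarded_call` stands for a breaker-wrapped call, for example
    `lambda: breaker.call(external_service_request)`.
    """
    try:
        return guarded_call()
    except Exception:
        # Either the breaker rejected the call or the downstream call failed.
        # Catching everything is a simplification for this sketch.
        return fallback()


def rejecting_call():
    # Hypothetical stand-in for a breaker that is currently OPEN.
    raise RuntimeError("Circuit breaker is OPEN")


# Degraded response while the dependency is unavailable.
degraded = call_with_fallback(rejecting_call, lambda: [])
# Normal response when the call succeeds.
normal = call_with_fallback(lambda: ["item"], lambda: [])
```

In real code it is usually safer to catch only the breaker's rejection error and known transport errors, so that programming mistakes are not silently hidden behind the fallback.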
Alternatives
Retry pattern
Retries failed requests a fixed number of times before failing, without blocking calls.
Use when: failures are transient and quick retries can succeed without risking cascading failures.
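A fixed-count retry can be sketched as follows. The `retry` helper and its parameters (`attempts`, `delay`, `sleep`) are illustrative assumptions, not a library API; `sleep` is injectable only so the pause can be skipped in tests:

```python
import time


def retry(func, attempts=3, delay=0.1, sleep=time.sleep):
    """Call `func` up to `attempts` times, pausing `delay` seconds between tries."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return func()
        except Exception as exc:
            last_error = exc
            # Pause before the next try, but not after the final one.
            if attempt < attempts - 1:
                sleep(delay)
    # All attempts failed: surface the last error to the caller.
    raise last_error
```

Unlike a circuit breaker, this keeps sending requests to the failing service on every call, so it suits brief glitches rather than sustained outages.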
Bulkhead pattern
Isolates failures by partitioning resources so one failure does not affect others, rather than blocking calls.
Use when: you want to limit failure impact by resource isolation instead of blocking requests.
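One simple way to partition resources is to cap how many calls to a given dependency may run at once. The sketch below assumes a hypothetical `Bulkhead` class built on the standard library's `threading.BoundedSemaphore`; the names and the reject-when-full policy are illustrative choices:

```python
import threading


class BulkheadFullError(Exception):
    """Raised when the bulkhead has no free slots."""


class Bulkhead:
    def __init__(self, max_concurrent=2):
        # Each dependency gets its own pool of slots.
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def call(self, func, *args, **kwargs):
        # Reject immediately instead of queueing when all slots are taken.
        if not self._slots.acquire(blocking=False):
            raise BulkheadFullError("no free slots")
        try:
            return func(*args, **kwargs)
        finally:
            # Always free the slot, even if the call raised.
            self._slots.release()
```

Giving each downstream service its own `Bulkhead` instance means a slow service can occupy only its own slots, leaving capacity for calls to healthy services.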
Summary
The circuit breaker pattern prevents cascading failures by stopping calls to a failing service after repeated errors.
It improves system stability by quickly failing requests and testing service recovery before resuming calls.
This pattern is essential in microservices architectures with unreliable network calls at scale.