Overview - Circuit breaker pattern

What is it?

The circuit breaker pattern is a design approach used in software systems to prevent repeated failures when calling a service or resource that is currently unavailable or slow. It works like an electrical circuit breaker by stopping requests to a failing service to avoid wasting resources and to allow the service time to recover. When the service is healthy again, the circuit breaker allows requests to pass through. This helps systems stay responsive and stable.

Why it matters

Without the circuit breaker pattern, a failing service can cause cascading failures in a system, making the whole system slow or unresponsive. Repeatedly trying to call a broken service wastes resources and increases user wait times. The circuit breaker pattern protects the system by quickly detecting failures and stopping calls, improving overall reliability and user experience.

Where it fits

Before learning this, you should understand basic service communication and error handling in distributed systems. After this, you can explore related patterns like retry mechanisms, bulkheads, and fallback strategies to build resilient systems.

Mental Model

Core Idea

The circuit breaker pattern acts like a safety switch that stops calls to a failing service to prevent system overload and allows recovery before resuming calls.

Think of it like...

Imagine a fuse box in your home that cuts off electricity when there is a short circuit to prevent damage. Similarly, the circuit breaker stops requests to a failing service to protect the system.

┌───────────────┐       ┌───────────────┐       ┌───────────────┐
│ Client Calls  │──────▶│Circuit Breaker│──────▶│ Target Service│
└───────────────┘       └───────┬───────┘       └───────────────┘
                                │
                                │
                                ▼
                      ┌─────────────────────┐
                      │  Open State: Block  │
                      │  Calls to Service   │
                      └─────────────────────┘

Build-Up - 7 Steps

1

Foundation: Understanding service failures

Concept: Services can fail or become slow, causing problems for callers.

In distributed systems, services communicate over networks. Sometimes, a service might crash, become slow, or be unreachable. If callers keep trying without control, it wastes time and resources.

Result

Recognizing that uncontrolled retries to failing services cause system-wide slowdowns or crashes.

Understanding that failures are normal and need handling prevents naive designs that worsen problems.

2

Foundation: Basic error handling and retries

3

Intermediate: Circuit breaker states explained

4

Intermediate: Failure thresholds and timeouts

5

Intermediate: Fallback strategies with circuit breakers

6

Advanced: Distributed circuit breakers and coordination

7

Expert: Surprising effects and tuning challenges

Under the Hood

The circuit breaker tracks recent call outcomes in memory or storage. It counts failures and successes within a sliding window. When failures exceed a threshold, it switches to open state, blocking calls immediately. After a timeout, it allows limited test calls (half-open) to check service health. Internally, it uses timers, counters, and state machines to manage transitions and enforce blocking or allowing calls.
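The transitions described above can be sketched as a small state machine. This is a minimal, single-threaded illustration rather than a reference implementation: the names `CircuitBreaker`, `CircuitOpenError`, `failure_threshold`, and `reset_timeout` are assumptions made for this example, and it counts consecutive failures instead of a true sliding window to keep the code short.

```python
import time
from enum import Enum


class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half-open"


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the breaker is open."""


class CircuitBreaker:
    # Hypothetical sketch: not thread-safe, counts consecutive failures.
    def __init__(self, failure_threshold=5, reset_timeout=5.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout  # seconds to stay open before probing
        self.clock = clock                  # injectable so tests need not sleep
        self.state = State.CLOSED
        self.failures = 0
        self.opened_at = 0.0

    def call(self, func, *args, **kwargs):
        if self.state is State.OPEN:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit is open; call rejected")
            self.state = State.HALF_OPEN  # timeout elapsed: let a probe call through
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_success(self):
        # Any success (including a half-open probe) closes the breaker.
        self.failures = 0
        self.state = State.CLOSED

    def _on_failure(self):
        self.failures += 1
        # A failed probe reopens immediately; otherwise trip at the threshold.
        if self.state is State.HALF_OPEN or self.failures >= self.failure_threshold:
            self.state = State.OPEN
            self.opened_at = self.clock()
```

A caller would wrap each remote call, for example `breaker.call(fetch_profile, user_id)`, where `fetch_profile` is a placeholder for any function that may raise. Passing the clock in as a parameter is a design choice that makes the open-to-half-open timeout testable without real waiting.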

Why designed this way?

The pattern mimics electrical circuit breakers to protect systems from cascading failures. Early designs retried blindly, causing overload. The circuit breaker adds control and feedback to avoid wasting resources. It balances availability and safety by allowing test calls after recovery time. Alternatives like simple retries or timeouts were insufficient to prevent system-wide slowdowns.

┌───────────────┐
│   Client      │
└──────┬────────┘
       │ Calls
       ▼
┌───────────────┐
│Circuit Breaker│
│  States:      │
│  Closed      ◀─────┐
│  Open        │     │
│  Half-Open   │     │
└──────┬────────┘     │
       │              │
       ▼              │
┌───────────────┐     │
│ Target Service│─────┘
└───────────────┘

Myth Busters - 4 Common Misconceptions

Quick: Does the circuit breaker stop all calls immediately after one failure? Commit yes or no.

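Common Belief: The circuit breaker trips and blocks calls after a single failure.

Reality: The breaker opens only when failures within the tracked window exceed a configured threshold. A single failure trips it only if the threshold is set that low.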

Quick: Is the circuit breaker a replacement for retries? Commit yes or no.

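Common Belief: Circuit breakers replace the need for retries in failure handling.

Reality: Circuit breakers work alongside retries. Retries handle transient errors, while the breaker keeps those retries from overwhelming a service that is failing.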

Quick: Do all clients share the same circuit breaker state by default? Commit yes or no.

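Common Belief: Circuit breaker state is shared globally across all clients automatically.

Reality: Not automatically. A breaker that tracks outcomes in memory sees only the calls made by its own process, so sharing state across clients requires explicit coordination.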

Quick: Does opening the circuit breaker mean the service is permanently down? Commit yes or no.

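Common Belief: Once open, the circuit breaker stays open until manually reset.

Reality: After a timeout, the breaker moves to the half-open state and allows limited test calls. If they succeed, it closes and normal traffic resumes.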

Expert Zone

1

Circuit breakers interact subtly with retries and timeouts; misalignment can cause retry storms or premature blocking.

2

Adaptive circuit breakers that adjust thresholds based on load and error patterns improve resilience but add complexity.

3

Monitoring circuit breaker metrics and integrating with alerting systems is crucial for proactive failure management.
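The interaction between retries and breakers noted in the first point can be illustrated with a retry helper that treats a breaker rejection differently from an ordinary failure. This is a sketch under assumptions: `retry_through_breaker` and `CircuitOpenError` are hypothetical names, and `guarded_call` stands for any callable that is already wrapped by a breaker.

```python
import time


class CircuitOpenError(Exception):
    """Stand-in for the error a breaker raises when it rejects a call."""


def retry_through_breaker(guarded_call, attempts=3, base_delay=0.1, sleep=time.sleep):
    """Retry transient failures, but give up at once if the breaker is open."""
    for attempt in range(attempts):
        try:
            return guarded_call()
        except CircuitOpenError:
            raise  # retrying a rejected call only adds load and latency
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)  # exponential backoff between tries
```

Note that in this arrangement every retry attempt that fails also counts as a failure inside the breaker, so a breaker threshold lower than the retry count can open the circuit during a single logical request. Aligning the two settings is part of the tuning the list above refers to.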

When NOT to use

Avoid using circuit breakers for very fast, idempotent calls where overhead outweighs benefits. For simple, single-service apps, basic retries may suffice. Use bulkhead patterns or rate limiters when isolating failures or controlling load is more important.

Production Patterns

In production, circuit breakers are combined with retries, fallbacks, and bulkheads. They are implemented as libraries or middleware in service meshes and API gateways. Real systems tune thresholds dynamically and monitor breaker states to maintain system health.

Connections

Bulkhead pattern

Complementary pattern

Both patterns isolate failures but bulkheads isolate resources while circuit breakers isolate calls, together improving system resilience.

Retry pattern

Works alongside

Circuit breakers prevent retries from overwhelming failing services, making retries safer and more effective.

Electrical circuit breakers

Inspired by

Understanding electrical circuit breakers helps grasp the safety and protection goals behind the software pattern.

Common Pitfalls

#1 Setting the failure threshold too low causes frequent unnecessary blocking.

Wrong approach: CircuitBreaker(failureThreshold=1, timeout=5000)

Correct approach: CircuitBreaker(failureThreshold=5, timeout=5000)

Root cause: Not recognizing that a single, possibly transient, failure should not immediately open the breaker.
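One way to make a threshold less sensitive to isolated errors is to judge a failure rate over a recent time window and to require a minimum number of calls before tripping. The sketch below is illustrative only: `SlidingWindowThreshold` and its parameters are hypothetical names, and the defaults are arbitrary rather than recommended values.

```python
import time
from collections import deque


class SlidingWindowThreshold:
    """Decides whether a breaker should trip, based on recent call outcomes."""

    def __init__(self, window_seconds=10.0, min_calls=5, failure_rate=0.5,
                 clock=time.monotonic):
        self.window_seconds = window_seconds
        self.min_calls = min_calls        # do not trip on too little evidence
        self.failure_rate = failure_rate  # fraction of failed calls that trips
        self.clock = clock                # injectable for deterministic tests
        self.outcomes = deque()           # (timestamp, succeeded) pairs

    def record(self, succeeded):
        now = self.clock()
        self.outcomes.append((now, succeeded))
        self._evict(now)

    def should_trip(self):
        self._evict(self.clock())
        total = len(self.outcomes)
        if total < self.min_calls:
            return False
        failed = sum(1 for _, ok in self.outcomes if not ok)
        return failed / total >= self.failure_rate

    def _evict(self, now):
        # Drop outcomes that have aged out of the window.
        while self.outcomes and now - self.outcomes[0][0] > self.window_seconds:
            self.outcomes.popleft()
```

With `min_calls` in place, one failed call on a quiet service does not open the breaker, while a sustained failure rate still does.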

#2 Not resetting the breaker after the timeout keeps it open forever.

Wrong approach: The breaker opens and never moves to the half-open state.

Correct approach: The breaker transitions to half-open after the timeout to test service health.

Root cause: Ignoring the state machine and its recovery mechanism.

#3 Using a circuit breaker without a fallback causes a poor user experience when the breaker is open.

Wrong approach: Return an error directly when the breaker is open.

Correct approach: Return cached data or a default response when the breaker is open.

Root cause: Not planning for graceful degradation.
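The cached-or-default approach from pitfall #3 might look like the following. This is a sketch with assumed names: `call_with_fallback` and `CircuitOpenError` are hypothetical, `guarded_fetch` stands for a breaker-wrapped function, and the cache is a plain dictionary with no expiry.

```python
class CircuitOpenError(Exception):
    """Stand-in for the error a breaker raises when it rejects a call."""


def call_with_fallback(guarded_fetch, cache, key, default=None):
    """Serve live data when possible; degrade to cached or default data."""
    try:
        value = guarded_fetch(key)
    except CircuitOpenError:
        # Breaker is open: return the last known value, else the default.
        return cache.get(key, default)
    cache[key] = value  # refresh the cache on every successful call
    return value
```

Only the breaker's rejection triggers the fallback here; other exceptions still propagate to the caller. Whether ordinary failures should also fall back is a design decision that depends on how stale the cached data is allowed to be.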

Key Takeaways

The circuit breaker pattern protects systems by stopping calls to failing services to prevent overload and cascading failures.

It uses a state machine with closed, open, and half-open states to control traffic based on service health.

Thresholds and timeouts balance sensitivity to failures and allow recovery testing.

Circuit breakers work best combined with retries and fallback strategies for resilient systems.

Tuning and monitoring circuit breakers are critical to avoid unintended blocking or overload.