
Saga pattern for distributed transactions in HLD - System Design Guide

Problem Statement
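When a business process spans multiple microservices, each owning its own database, a failure in one service can leave the system in an inconsistent state, because a traditional database transaction is scoped to a single database and cannot span multiple services. The result is partial updates (for example, a payment is captured but inventory is never reserved), which leave data inconsistent and undermine the reliability of the system.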
Solution
The Saga pattern breaks a distributed transaction into a sequence of smaller local transactions, one per service. Each local transaction commits in its own database and then publishes an event or triggers the next step. If a step fails, compensating transactions undo the steps that already completed, restoring consistency without holding locks across services.
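The "publishes an event" variant, often called choreography, has no central coordinator: each service reacts to the previous service's event. The sketch below is a minimal in-process illustration under stated assumptions: `EventBus` is a hypothetical in-memory stand-in for a message broker, and the service functions, event names, and stock data are illustrative only.

```python
from collections import defaultdict
from typing import Callable


class EventBus:
    """Hypothetical in-memory stand-in for a real message broker."""

    def __init__(self) -> None:
        self.handlers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)
        self.log: list[str] = []  # every published event, in order

    def subscribe(self, event: str, handler: Callable[[dict], None]) -> None:
        self.handlers[event].append(handler)

    def publish(self, event: str, payload: dict) -> None:
        self.log.append(event)
        for handler in self.handlers[event]:
            handler(payload)


bus = EventBus()
stock = {"book": 1}          # inventory service's local data (illustrative)
orders: dict[int, str] = {}  # order service's local data (illustrative)


def order_service_create(order_id: int, item: str) -> None:
    # Local transaction 1: record the order as PENDING, then announce it.
    orders[order_id] = "PENDING"
    bus.publish("OrderCreated", {"order_id": order_id, "item": item})


def inventory_on_order_created(evt: dict) -> None:
    # Local transaction 2: reserve stock, or publish a failure event.
    if stock.get(evt["item"], 0) > 0:
        stock[evt["item"]] -= 1
        bus.publish("StockReserved", evt)
    else:
        bus.publish("StockRejected", evt)


def order_on_stock_reserved(evt: dict) -> None:
    orders[evt["order_id"]] = "CONFIRMED"


def order_on_stock_rejected(evt: dict) -> None:
    # Compensating transaction for step 1: cancel the pending order.
    orders[evt["order_id"]] = "CANCELLED"


bus.subscribe("OrderCreated", inventory_on_order_created)
bus.subscribe("StockReserved", order_on_stock_reserved)
bus.subscribe("StockRejected", order_on_stock_rejected)

order_service_create(1, "book")  # stock available: order confirmed
order_service_create(2, "book")  # out of stock: order compensated
print(orders)
```

A production version would persist each local transaction and its outgoing event atomically (so a crash cannot commit one without the other) and deliver events through a durable broker; this sketch omits both.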
Architecture
┌─────────────┐      ┌─────────────┐      ┌─────────────┐
│ Service A   │─────▶│ Service B   │─────▶│ Service C   │
│ (Tx 1)      │      │ (Tx 2)      │      │ (Tx 3)      │
└─────┬───────┘      └─────┬───────┘      └─────┬───────┘
      │                    │                    │
      │                    │                    │
      │                    │                    │
      │                    │                    │
      │                    │                    │
      ▼                    ▼                    ▼
Compensate A◀─────────Compensate B◀─────────Compensate C
(Undo Tx 1)            (Undo Tx 2)            (Undo Tx 3)

This diagram shows services executing local transactions in order. If a transaction fails, compensating transactions run in reverse order to undo the changes already committed by earlier steps, bringing the system back to a consistent state.
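The reverse-order compensation in the diagram can also be driven by a central coordinator, often called orchestration. Below is a minimal single-process sketch, assuming each step is a pair of hypothetical callables (an action and its compensation). Because a failed local transaction rolls back inside its own database, only the steps that already committed are compensated.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    name: str
    action: Callable[[], None]      # local transaction in one service
    compensate: Callable[[], None]  # semantic undo of that transaction


class SagaFailed(Exception):
    pass


def run_saga(steps: list[Step], trace: list[str]) -> None:
    completed: list[Step] = []
    for step in steps:
        try:
            step.action()
        except Exception as exc:
            trace.append(f"{step.name} failed")
            # Undo only the steps that already committed, newest first.
            for done in reversed(completed):
                done.compensate()
                trace.append(f"compensated {done.name}")
            raise SagaFailed(step.name) from exc
        completed.append(step)
        trace.append(f"{step.name} committed")


# Usage: Tx 3 fails, so Tx 2 and then Tx 1 are compensated.
def service_c_down() -> None:
    raise RuntimeError("Service C unavailable")


trace: list[str] = []
steps = [
    Step("Tx 1", lambda: None, lambda: None),
    Step("Tx 2", lambda: None, lambda: None),
    Step("Tx 3", service_c_down, lambda: None),
]
try:
    run_saga(steps, trace)
except SagaFailed as err:
    print("saga aborted at", err)
print(trace)
```

A real orchestrator would also record its progress durably so it can resume compensation after its own crash, and would retry a compensation that fails rather than giving up.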

Trade-offs
✓ Pros
Enables eventual consistency across distributed services without locking resources.
Improves system availability by avoiding distributed locks and two-phase commits.
Supports failure recovery through compensating transactions.
Fits well with event-driven microservices architectures.
✗ Cons
Requires careful design of compensating transactions which can be complex.
Increases end-to-end latency, because steps run one after another and often communicate asynchronously.
Makes debugging and monitoring more challenging because of distributed state.
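Part of what makes compensations hard to design is that messages can be retried or reordered: a compensation may be delivered twice, or arrive before the action it undoes. A common rule is to make compensations idempotent and keyed by a saga ID. The sketch below assumes a hypothetical `PaymentService` with in-memory state; the names and amounts are illustrative.

```python
class PaymentService:
    """Hypothetical service; balances and saga IDs are illustrative."""

    def __init__(self, balance: int) -> None:
        self.balance = balance
        self.charges: dict[str, int] = {}  # saga_id -> amount charged
        self.refunded: set[str] = set()    # saga_ids already compensated

    def charge(self, saga_id: str, amount: int) -> None:
        # Ignore duplicate charges and charges arriving after their compensation.
        if saga_id in self.charges or saga_id in self.refunded:
            return
        self.balance -= amount
        self.charges[saga_id] = amount

    def refund(self, saga_id: str) -> None:
        # Compensation: a repeated call has no further effect.
        if saga_id in self.refunded:
            return
        # Record the compensation even if no charge happened yet,
        # so a late-arriving charge for this saga is rejected.
        self.refunded.add(saga_id)
        if saga_id in self.charges:
            self.balance += self.charges[saga_id]


svc = PaymentService(balance=100)
svc.charge("saga-42", 30)
svc.refund("saga-42")
svc.refund("saga-42")      # retried compensation: no double refund
svc.refund("saga-99")      # compensation arrives before its charge
svc.charge("saga-99", 50)  # late charge is ignored
print(svc.balance)
```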
Use when a business process spans multiple microservices with independent databases and eventual consistency is acceptable in place of immediate strong consistency. It suits systems with medium to high transaction volumes, where holding locks across services would limit throughput.
Avoid when strict ACID transactions are mandatory or when compensating transactions cannot be implemented reliably. It is also unnecessary for simple systems where the whole operation fits in a single database transaction.
Real World Examples
Amazon
Amazon uses the Saga pattern to manage order processing across inventory, payment, and shipping microservices, ensuring eventual consistency without locking resources.
Uber
Uber applies Saga to coordinate ride booking steps across services like driver assignment, payment, and notifications, handling failures gracefully.
Netflix
Netflix uses Saga to manage distributed transactions in their microservices for user subscriptions and billing, allowing independent service scaling.
Alternatives
Two-Phase Commit (2PC)
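2PC uses a coordinator that first asks every participant to prepare (locking the affected resources and voting to commit or abort), then commits or rolls back all participants atomically. Participants hold their locks until the coordinator's decision arrives, so resources stay blocked for the duration of the transaction.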
Use when: Choose 2PC when strict atomicity is required and the system can tolerate blocking and lower availability.
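For contrast with the saga, here is a minimal single-process sketch of the two phases. The `Participant` class is hypothetical, and the sketch deliberately leaves out the network calls, timeouts, and durable coordinator log that real 2PC needs to survive crashes; it only shows where the locks are taken and released.

```python
class Participant:
    """Hypothetical resource manager taking part in 2PC."""

    def __init__(self, name: str, can_commit: bool = True) -> None:
        self.name = name
        self.can_commit = can_commit
        self.locked = False
        self.state = "idle"

    def prepare(self) -> bool:
        # Phase 1: lock resources and vote; a "yes" keeps the lock held.
        if not self.can_commit:
            return False
        self.locked = True
        self.state = "prepared"
        return True

    def commit(self) -> None:
        self.state, self.locked = "committed", False

    def rollback(self) -> None:
        self.state, self.locked = "aborted", False


def two_phase_commit(participants: list[Participant]) -> bool:
    # Phase 1: collect a vote from every participant.
    votes = [p.prepare() for p in participants]
    # Phase 2: commit everywhere only if all voted yes; otherwise roll back everywhere.
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.rollback()
    return False


good = [Participant("A"), Participant("B")]
bad = [Participant("A"), Participant("B", can_commit=False)]
print(two_phase_commit(good), [p.state for p in good])
print(two_phase_commit(bad), [p.state for p in bad])
```

Between the two phases, participant A in the second run is holding its lock while it waits on B's vote; if the coordinator crashed at that moment, A would stay blocked. That waiting is the availability cost the saga avoids.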
Eventual Consistency with Event Sourcing
Event sourcing stores every state change as an immutable event and rebuilds current state by replaying those events; its focus is auditability rather than compensation.
Use when: Choose event sourcing when audit trails and replayability are priorities over immediate consistency.
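Rebuilding state from events amounts to a fold over the event log. The sketch below assumes a hypothetical account with illustrative `Deposited` and `Withdrawn` events; replaying a prefix of the log reconstructs the state at an earlier point, which is where the auditability and replayability come from.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: str    # illustrative event types: "Deposited" or "Withdrawn"
    amount: int


def apply(balance: int, event: Event) -> int:
    # Pure function: current state + one event -> next state.
    if event.kind == "Deposited":
        return balance + event.amount
    if event.kind == "Withdrawn":
        return balance - event.amount
    raise ValueError(f"unknown event {event.kind}")


def rebuild(events: list[Event]) -> int:
    # Replay the log from an empty state.
    balance = 0
    for event in events:
        balance = apply(balance, event)
    return balance


log = [Event("Deposited", 100), Event("Withdrawn", 30), Event("Deposited", 5)]
print(rebuild(log))      # current state
print(rebuild(log[:2]))  # state as of the second event
```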
Summary
The Saga pattern manages distributed transactions by splitting them into local transactions with compensations.
It avoids locking resources and supports eventual consistency across microservices.
Compensating transactions undo previous steps if failures occur, maintaining system reliability.