Two-phase commit (and why to avoid it) in Microservices - System Design Exercise


Design: Distributed Transaction Management with Two-Phase Commit

The design focuses on coordinating distributed transactions with the two-phase commit (2PC) protocol and on exploring its drawbacks in microservices. Alternative transaction models, such as event sourcing or saga pattern implementations, are out of scope.

Functional Requirements

FR1: Ensure atomicity of transactions across multiple microservices

FR2: Guarantee all-or-nothing commit for distributed operations

FR3: Handle failures during transaction commit or rollback

FR4: Support concurrent transactions without data corruption

Non-Functional Requirements

NFR1: The system must handle up to 1,000 distributed transactions per second

NFR2: Transaction commit latency should be under 500 ms in normal conditions

NFR3: Availability target of 99.9% uptime

NFR4: Microservices are independently deployable and scalable


Key Components

Transaction coordinator service

Microservice participants with prepare and commit endpoints

Persistent logs for transaction state

Timeout and retry mechanisms

Design Patterns

Two-phase commit protocol

Distributed locking

Timeout and failure recovery

Alternatives such as the Saga pattern for eventual consistency

Reference Architecture

                +---------------------+
                | Transaction         |
                | Coordinator Service |
                +----------+----------+
                           |
         Prepare / Commit  |  2PC Protocol
                           |
    +----------------------+----------------------+
    |                      |                      |
+---v---+              +---v---+              +---v---+
|Service|              |Service|              |Service|
|  A    |              |  B    |              |  C    |
+-------+              +-------+              +-------+

Components

Transaction Coordinator Service

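Service whose state lives in a persistent transaction log (e.g., PostgreSQL or a distributed consensus store), so the service instances themselves are stateless and a replacement instance can resume in-flight transactions

Manages the transaction lifecycle: sends prepare requests, collects votes, decides to commit or abort, and instructs participants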

Microservice Participants

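REST/gRPC endpoints for prepare, commit, and abort, each backed by the service's own local database

Execute the prepare phase by validating the operation, locking the affected resources, and durably recording a prepared state before voting; then commit or roll back based on the coordinator's decision. After voting commit, a participant may not abort on its own.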
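A minimal participant sketch, assuming each service wraps an in-memory key-value store. The `Participant` class, its method names, the list standing in for a durable write-ahead log, and the "no negative values" validation rule are all hypothetical; a real service would write the prepared record to its database before replying.

```python
from enum import Enum


class State(Enum):
    PREPARED = "PREPARED"
    COMMITTED = "COMMITTED"
    ABORTED = "ABORTED"


class Participant:
    """Hypothetical 2PC participant wrapping an in-memory key-value store."""

    def __init__(self, data):
        self.data = dict(data)  # committed local state
        self.pending = {}       # txn_id -> staged writes, applied only on commit
        self.state = {}         # txn_id -> State
        self.locked = set()     # keys locked by prepared transactions
        self.log = []           # stand-in for a durable write-ahead log

    def prepare(self, txn_id, writes):
        keys = set(writes)
        # Validate: refuse if another prepared transaction holds one of the keys,
        # or if a write breaks the (hypothetical) "no negative values" rule.
        if keys & self.locked or any(v < 0 for v in writes.values()):
            self.state[txn_id] = State.ABORTED
            return "vote abort"
        self.locked |= keys
        self.pending[txn_id] = writes
        self.state[txn_id] = State.PREPARED
        # Must be durable before voting: from here on the participant
        # cannot abort on its own and must wait for the coordinator.
        self.log.append((txn_id, "PREPARED", dict(writes)))
        return "vote commit"

    def commit(self, txn_id):
        if self.state.get(txn_id) is State.COMMITTED:
            return "ack"  # idempotent, because the coordinator may retry
        writes = self.pending.pop(txn_id)
        self.data.update(writes)
        self.locked -= set(writes)
        self.state[txn_id] = State.COMMITTED
        self.log.append((txn_id, "COMMITTED", None))
        return "ack"

    def abort(self, txn_id):
        writes = self.pending.pop(txn_id, {})
        self.locked -= set(writes)  # release locks; staged writes are discarded
        self.state[txn_id] = State.ABORTED
        self.log.append((txn_id, "ABORTED", None))
        return "ack"
```

Refusing immediately when a key is already locked (instead of waiting for the lock) is one simple way to avoid deadlocks between prepared transactions, at the cost of more aborts under contention.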

Persistent Transaction Log

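Durable storage such as a relational database or a distributed consensus system

Stores transaction states so they survive crashes; in particular, the coordinator writes its commit or abort decision here before sending it to any participant, so a restarted coordinator can finish in-flight transactions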
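A sketch of how a restarted coordinator might use its log, assuming a hypothetical record format of `BEGIN`, `COMMIT`, `ABORT`, and `END` per transaction. It follows the presumed-abort rule: a transaction with no logged decision can safely be aborted, because no participant can have been told to commit it.

```python
def recover(log):
    """Decide what a restarted coordinator should do for each transaction.

    `log` is a hypothetical list of (txn_id, record) tuples in append order,
    where record is "BEGIN", "COMMIT", "ABORT" or "END".
    """
    latest = {}
    for txn_id, record in log:
        latest[txn_id] = record  # later records supersede earlier ones

    actions = {}
    for txn_id, record in latest.items():
        if record == "COMMIT":
            actions[txn_id] = "resend commit"  # decision is final; redeliver it
        elif record == "ABORT":
            actions[txn_id] = "resend abort"
        elif record == "BEGIN":
            actions[txn_id] = "abort"          # no decision logged: presume abort
        # "END": every participant acknowledged, nothing left to do
    return actions
```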

Timeout and Retry Mechanism

Built-in coordinator logic with timers

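Detects participant failures or network issues. Before the commit decision is logged, a timeout triggers an abort; after it is logged, the coordinator keeps retrying delivery of the decision until every participant acknowledges it.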
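A sketch of vote collection with a timeout, using Python's standard `concurrent.futures`. The `prepare_calls` callables are hypothetical stand-ins for prepare RPCs; any participant that raises or has not answered within `timeout_s` is treated as voting abort, which is safe because no decision has been logged yet.

```python
import concurrent.futures as cf


def collect_votes(prepare_calls, timeout_s):
    """Run prepare calls in parallel; a missing or late vote counts as abort.

    prepare_calls: dict of participant name -> zero-argument callable that
    returns "vote commit" or "vote abort" (hypothetical RPC stand-ins).
    """
    pool = cf.ThreadPoolExecutor(max_workers=max(1, len(prepare_calls)))
    futures = {pool.submit(call): name for name, call in prepare_calls.items()}
    votes = {name: "timeout" for name in prepare_calls}
    try:
        for fut in cf.as_completed(futures, timeout=timeout_s):
            try:
                votes[futures[fut]] = fut.result()
            except Exception:
                votes[futures[fut]] = "vote abort"  # a failed prepare is an abort vote
    except cf.TimeoutError:
        pass  # participants that did not answer keep the "timeout" vote
    finally:
        pool.shutdown(wait=False)  # do not block the decision on slow participants
    decision = "commit" if all(v == "vote commit" for v in votes.values()) else "abort"
    return decision, votes
```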

Request Flow

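1. The client sends a distributed transaction request to the Transaction Coordinator.

2. The coordinator sends a 'prepare' request to all participant microservices.

3. Each participant validates the transaction, locks the affected resources, and durably records that it is prepared, then replies 'vote commit'; if it cannot prepare, it replies 'vote abort'.

4. The coordinator collects all votes. If every participant votes commit, it decides 'commit'; if any participant votes abort or fails to answer before the timeout, it decides 'abort'. The decision is written to the persistent log before it is sent to participants.

5. Participants commit or roll back their changes accordingly, release their locks, and acknowledge completion.

6. Once all acknowledgements arrive, the coordinator marks the transaction as complete in the persistent log and responds to the client.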
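The six steps above can be condensed into a minimal coordinator sketch. The participant interface (`prepare`, `commit`, `abort`), the `FakeParticipant` test double, and the plain list used as the durable log are hypothetical stand-ins; timeouts, retries, and real RPCs are omitted.

```python
class FakeParticipant:
    """Hypothetical test double: votes as configured and records the outcome."""

    def __init__(self, vote):
        self.vote = vote
        self.outcome = None

    def prepare(self, txn_id):
        return self.vote

    def commit(self, txn_id):
        self.outcome = "committed"

    def abort(self, txn_id):
        self.outcome = "aborted"


def two_phase_commit(txn_id, participants, log):
    """Run both 2PC phases; `log` is a list standing in for durable storage."""
    log.append((txn_id, "BEGIN"))

    # Phase 1 (prepare): every participant votes; an exception counts as abort.
    votes = {}
    for name, participant in participants.items():
        try:
            votes[name] = participant.prepare(txn_id)
        except Exception:
            votes[name] = "vote abort"

    decision = "COMMIT" if all(v == "vote commit" for v in votes.values()) else "ABORT"
    # Log the decision BEFORE telling anyone, so recovery can finish the job.
    log.append((txn_id, decision))

    # Phase 2 (commit/abort): deliver the decision to every participant.
    for participant in participants.values():
        if decision == "COMMIT":
            participant.commit(txn_id)
        else:
            participant.abort(txn_id)

    log.append((txn_id, "END"))
    return decision
```

For example, `two_phase_commit("t1", {"A": FakeParticipant("vote commit"), "B": FakeParticipant("vote abort")}, [])` returns `"ABORT"`, and participant A is told to abort even though it voted commit.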

Database Schema

Entities:
- Transaction: transaction_id (PK), status (PREPARED, COMMITTED, ABORTED), timestamp
- Participant: participant_id, transaction_id (FK), vote (COMMIT, ABORT), status (PREPARED, COMMITTED, ABORTED); primary key (transaction_id, participant_id), since the same service takes part in many transactions

Relationships:
- One Transaction has many Participants
- Participant rows record each vote and status for recovery and coordination
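One way to express this schema, sketched with Python's built-in `sqlite3` module. The table names `transactions` and `participants` are assumptions (the plural avoids the SQL keyword TRANSACTION), as is the composite key `(transaction_id, participant_id)`, which lets one service take part in many transactions. `open_log` and `in_doubt` are hypothetical helpers; the latter lists transactions a recovering coordinator still has to resolve.

```python
import sqlite3

# Hypothetical table names; "transactions" avoids the SQL keyword TRANSACTION.
SCHEMA = """
CREATE TABLE transactions (
    transaction_id TEXT PRIMARY KEY,
    status    TEXT NOT NULL CHECK (status IN ('PREPARED', 'COMMITTED', 'ABORTED')),
    timestamp TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE participants (
    participant_id TEXT NOT NULL,
    transaction_id TEXT NOT NULL REFERENCES transactions (transaction_id),
    vote   TEXT CHECK (vote IN ('COMMIT', 'ABORT')),  -- NULL until the vote arrives
    status TEXT NOT NULL CHECK (status IN ('PREPARED', 'COMMITTED', 'ABORTED')),
    PRIMARY KEY (transaction_id, participant_id)      -- one service, many transactions
);
"""


def open_log(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)
    return conn


def in_doubt(conn):
    """Transactions with no final outcome recorded yet."""
    rows = conn.execute(
        "SELECT transaction_id FROM transactions "
        "WHERE status = 'PREPARED' ORDER BY transaction_id"
    )
    return [row[0] for row in rows]
```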

Scaling Discussion

Bottlenecks

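The Transaction Coordinator is a single point of failure and becomes a bottleneck under high load.

Participants hold locks from the prepare phase until the coordinator's decision arrives, reducing concurrency and increasing latency.

If the coordinator crashes or is partitioned away after participants have voted commit, those participants are blocked: they can neither commit nor abort on their own and keep holding locks until the decision is delivered.

The coordinator must wait for the slowest participant, so one slow or failed participant delays the entire transaction.

Every participant must implement the protocol and be available at the same time, which couples services and undermines independent deployment and scaling (NFR4).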
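A back-of-the-envelope check of two of these bottlenecks against the NFRs, under assumptions that are ours rather than the exercise's: the coordinator and three participants are each independently available 99.9% of the time, and each transaction holds its locks for the full 500 ms latency budget at the peak rate of 1,000 transactions per second. Because a 2PC transaction needs every component at once, availabilities multiply; Little's law gives the average number of transactions holding locks at any moment.

```latex
A_{\text{txn}} = A_{\text{coord}} \cdot A_{A} \cdot A_{B} \cdot A_{C} = 0.999^{4} \approx 0.9960 \quad (99.60\%)

N_{\text{locked}} = \lambda \cdot W = 1000\ \text{txn/s} \times 0.5\ \text{s} = 500
```

Under these assumptions, end-to-end availability falls below the 99.9% target of NFR3 even though every individual component meets it, and about 500 transactions hold locks concurrently at peak load.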

Solutions

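Replicate the coordinator's transaction log and use leader election so a standby coordinator can take over in-flight transactions; run several coordinators, each responsible for a subset of transactions, to spread load.

Keep transaction scope small and use fine-grained (for example, row-level) locks to minimize lock duration and contention.

Use timeouts and failure detection to abort transactions that stall during the prepare phase; once a commit decision is logged, it can only be retried, not reversed.

Consider the Saga pattern, which replaces one distributed transaction with a sequence of local transactions plus compensating actions, trading atomicity for eventual consistency without blocking.

Partition data so that most transactions touch a single service, reducing cross-service dependencies and improving scalability.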

Interview Tips

Time: Spend 10 minutes explaining the two-phase commit protocol and its steps, 10 minutes discussing drawbacks and failure scenarios, 10 minutes proposing alternatives and scaling strategies, and 15 minutes answering questions and clarifying trade-offs.

Explain how two-phase commit ensures atomicity across distributed services.

Discuss the blocking problem and impact on availability and latency.

Highlight the coordinator as a potential bottleneck and single point of failure.

Mention real-world challenges like network partitions and participant crashes.

Suggest alternatives such as the Saga pattern for better scalability and resilience.

Practice


1. What is the main purpose of the two-phase commit protocol in microservices?

easy

A. To automatically retry failed requests

B. To speed up communication between services

C. To allow services to work independently without coordination

D. To ensure all services agree on a transaction before committing


Solution

Step 1: Understand the role of two-phase commit

Step 2: Identify the main goal in microservices

Final Answer: D. To ensure all services agree on a transaction before committing
