
Two-phase commit (and why to avoid it) in Microservices - Deep Dive

Overview - Two-phase commit (and why to avoid it)

What is it?

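Two-phase commit (2PC) is a protocol that makes multiple systems agree on a change before it becomes permanent. It works in two phases: first, a coordinator asks every participating system whether it is ready to commit the change; second, if all of them vote yes, the coordinator tells them to make the change permanent, and if any of them votes no, it tells them all to abort. This keeps data consistent across different services. However, participants hold locks while they wait for the decision, which slows the system down, and a crash of the coordinator or a participant at the wrong moment can leave the others blocked.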

Why it matters

Without two-phase commit or a similar method, different parts of a system might disagree about data changes, causing errors or lost information. For example, in a shopping app, payment might go through but the order might not be saved, confusing customers. Two-phase commit tries to prevent this by making sure all parts agree before finalizing changes.

Where it fits

Before learning two-phase commit, you should understand basic transactions and distributed systems concepts. After this, you can explore alternatives such as eventual consistency and saga patterns, which are usually a better fit for keeping data consistent across microservices, and distributed consensus algorithms, which reach agreement without blocking when a single node fails.

Mental Model

Core Idea

Two-phase commit is a handshake between systems to agree on a change before making it permanent, ensuring all or nothing happens.

Think of it like...

Imagine a group of friends deciding to buy a gift together. First, everyone says if they can pay their share (prepare phase). If all agree, they all pay and buy the gift (commit phase). If anyone says no, no one pays and the gift is not bought.

┌───────────────┐                    ┌───────────────┐
│               │  1. Prepare?       │               │
│               │───────────────────▶│               │
│               │  2. Vote Yes/No    │ Participant 1 │
│               │◀───────────────────│               │
│               │  3. Commit/Abort   │               │
│               │───────────────────▶│               │
│  Coordinator  │                    └───────────────┘
│               │                    ┌───────────────┐
│               │  1. Prepare?       │               │
│               │───────────────────▶│               │
│               │  2. Vote Yes/No    │ Participant 2 │
│               │◀───────────────────│               │
│               │  3. Commit/Abort   │               │
│               │───────────────────▶│               │
└───────────────┘                    └───────────────┘

Build-Up - 6 Steps

Foundation: Understanding Transactions Basics

Concept: Introduce what a transaction is and why atomicity matters.

A transaction is a set of operations that must all succeed or all fail together. For example, transferring money from one bank account to another involves subtracting from one and adding to another. If one part fails, the whole transaction should fail to avoid errors.

Result

You understand that transactions keep data correct by making changes all at once or not at all.

Understanding atomicity is key because it sets the stage for why coordinating multiple systems is hard but necessary.
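The bank transfer described above can be sketched on a single database, where atomicity is easy to get. This is a minimal illustration using Python's built-in sqlite3 module; the `accounts` table, the `transfer` function, and the account names are assumptions made up for the example. The point is that both updates sit inside one local transaction, so a failure in the middle undoes the first update as well.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst atomically; undo everything on failure."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            row = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if row is None or row[0] < 0:
                # Raising here rolls back the subtraction made above.
                raise ValueError("insufficient funds")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
        return True
    except ValueError:
        return False

def balances(conn):
    """Return {name: balance} for every account."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))

# Hypothetical sample data for the sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
)
conn.commit()
```

Once the two accounts live in different services with different databases, no single `with conn:` block can cover both updates, which is the gap two-phase commit tries to fill.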

Foundation: Basics of Distributed Systems

Intermediate: How Two-Phase Commit Works

Intermediate: Limitations and Risks of Two-Phase Commit

Advanced: Why Two-Phase Commit Is Often Avoided

Expert: Advanced Internals and Failure Handling

Under the Hood

Two-phase commit works by having a coordinator send a prepare request to all participants. Each participant locks the resources involved, writes its vote to a durable log, and votes yes or no. The coordinator collects the votes: if all are yes, it logs the decision and sends commit commands; otherwise, it sends abort commands. Participants then finalize or roll back their changes and release their locks. The durable logs let each side remember its votes and decisions after a crash.
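The message flow described above can be sketched in a few lines. This is an in-memory illustration only, not a production implementation: the `Participant` and `Coordinator` classes, the `can_commit` flag, and the list-based `log` (standing in for a durable on-disk log) are all assumptions made for the example, and there is no network, locking, or crash handling.

```python
from enum import Enum

class Vote(Enum):
    YES = "yes"
    NO = "no"

class Participant:
    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit  # hypothetical switch to simulate a "no" vote
        self.log = []                 # stand-in for a durable log
        self.state = "idle"

    def prepare(self, tx_id):
        # Phase 1: record the vote before answering the coordinator.
        if self.can_commit:
            self.log.append((tx_id, "prepared"))
            self.state = "prepared"
            return Vote.YES
        self.log.append((tx_id, "aborted"))
        self.state = "aborted"
        return Vote.NO

    def commit(self, tx_id):
        # Phase 2: make the change permanent.
        self.log.append((tx_id, "committed"))
        self.state = "committed"

    def abort(self, tx_id):
        # Phase 2: undo any prepared work.
        if self.state != "aborted":
            self.log.append((tx_id, "aborted"))
        self.state = "aborted"

class Coordinator:
    def __init__(self, participants):
        self.participants = participants
        self.log = []

    def run(self, tx_id):
        votes = [p.prepare(tx_id) for p in self.participants]
        decision = "commit" if all(v is Vote.YES for v in votes) else "abort"
        self.log.append((tx_id, decision))  # log the decision before sending it
        for p in self.participants:
            if decision == "commit":
                p.commit(tx_id)
            else:
                p.abort(tx_id)
        return decision
```

A single "no" vote is enough to abort everyone, which is what gives the protocol its all-or-nothing behavior.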

Why designed this way?

It was designed to ensure atomicity across distributed systems before modern distributed consensus algorithms existed. The two phases separate agreement from execution to avoid partial commits. Three-phase commit later tried to fix the blocking problem but added complexity. Today, patterns that tolerate failures more gracefully are usually preferred in microservices.

          ┌─────────────────┐
          │   Coordinator   │
          │ 1. Send Prepare │
          └─────────────────┘
                   │
         ┌─────────┴─────────┐
         ▼                   ▼
┌─────────────────┐ ┌─────────────────┐
│  Participant 1  │ │  Participant 2  │
│ 2. Vote Yes/No  │ │ 2. Vote Yes/No  │
└─────────────────┘ └─────────────────┘
         │                   │
         └─────────┬─────────┘
                   ▼
          ┌─────────────────┐
          │   Coordinator   │
          │ 3. Commit/Abort │
          └─────────────────┘
                   │
         ┌─────────┴─────────┐
         ▼                   ▼
┌─────────────────┐ ┌─────────────────┐
│  Participant 1  │ │  Participant 2  │
│ 4. Commit/Abort │ │ 4. Commit/Abort │
└─────────────────┘ └─────────────────┘

Myth Busters - 4 Common Misconceptions

Quick: Does two-phase commit guarantee no blocking even if a participant crashes? Commit yes or no.

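Common Belief: Two-phase commit always prevents blocking and keeps the system responsive.

Reality: No. Two-phase commit is a blocking protocol. If a participant crashes before voting, the coordinator waits until it times out and aborts; if a participant has already voted yes, it must hold its locks until it learns the decision, however long that takes.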

Quick: Is the coordinator in two-phase commit a fault-tolerant component? Commit yes or no.

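Common Belief: The coordinator is fault-tolerant and cannot cause system failure.

Reality: No. The coordinator is a single point of failure. If it crashes after collecting votes but before delivering the decision, prepared participants stay blocked until it recovers.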

Quick: Does two-phase commit scale well with many participants? Commit yes or no.

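Common Belief: Two-phase commit scales easily to many services without performance loss.

Reality: No. Every transaction needs two rounds of messages with every participant and holds locks until the slowest one responds, so latency and the chance of an abort both grow as participants are added.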

Quick: Can two-phase commit handle network partitions gracefully? Commit yes or no.

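Common Belief: Two-phase commit can handle network splits without data inconsistency.

Reality: Not gracefully. A prepared participant that is cut off from the coordinator has to block with its locks held until the partition heals. If it decides on its own instead, participants can end up with different outcomes.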
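The blocking behavior behind the questions above comes down to what a participant is allowed to do on its own. The sketch below encodes that rule as a small decision function; the function name, the state strings, and the use of `None` to mean "coordinator unreachable" are assumptions chosen for the illustration.

```python
def participant_next_state(local_state, coordinator_decision):
    """Decide what a participant may do next.

    local_state: "idle", "prepared", "committed", or "aborted"
    coordinator_decision: "commit", "abort", or None if the coordinator
    cannot be reached (crash or network partition).
    """
    if local_state in ("committed", "aborted"):
        return local_state  # outcome is already final
    if local_state == "idle":
        # The participant has not voted yes, so nobody can have committed.
        # Aborting unilaterally is safe.
        return "aborted"
    # local_state == "prepared": the participant promised to follow the
    # coordinator's decision and cannot guess it.
    if coordinator_decision is None:
        return "blocked"  # keep locks and keep waiting
    return "committed" if coordinator_decision == "commit" else "aborted"
```

The only state with no safe unilateral move is "prepared", which is exactly where a coordinator crash or partition hurts.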

Expert Zone

The coordinator's log durability is critical; losing it can cause participants to wait forever.

Participants must lock resources during prepare phase, which can reduce concurrency and throughput.

Timeouts and retries are tricky; setting them too short causes aborts, too long causes blocking.
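The timeout trade-off above can be illustrated on the coordinator side, where timing out is safe because no decision has been made yet. This sketch uses Python's standard concurrent.futures module; the `collect_votes` function and the idea of passing each prepare request as a plain callable are assumptions made for the example.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def collect_votes(prepare_calls, timeout_s):
    """Call every participant's prepare in parallel and return the decision.

    prepare_calls: callables returning True (yes) or False (no).
    A vote that is late or raises is counted as "no".
    """
    pool = ThreadPoolExecutor(max_workers=max(1, len(prepare_calls)))
    futures = [pool.submit(call) for call in prepare_calls]
    deadline = time.monotonic() + timeout_s
    votes = []
    for future in futures:
        remaining = max(0.0, deadline - time.monotonic())
        try:
            votes.append(bool(future.result(timeout=remaining)))
        except Exception:
            # Covers both a timed-out wait and an error raised by the call.
            votes.append(False)
    pool.shutdown(wait=False)  # do not wait for stragglers
    return "commit" if votes and all(votes) else "abort"
```

With a short `timeout_s`, a participant that is merely slow causes an abort; with a long one, every other participant sits on its locks while the coordinator waits.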

When NOT to use

Avoid two-phase commit in microservices requiring high availability and scalability. Use saga patterns or event-driven eventual consistency instead, or build on distributed consensus algorithms like Raft or Paxos where services need a replicated, fault-tolerant decision.

Production Patterns

In practice, teams use two-phase commit mainly in legacy systems or tightly coupled databases. Modern microservices prefer sagas with compensating transactions or idempotent event processing to handle distributed updates.
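A saga with compensating transactions, as mentioned above, can be sketched as a list of steps that each carry their own undo action. This is a simplified, single-process illustration; the `run_saga` function and the tuple layout are assumptions for the example, and a real saga would also need durable state and retries.

```python
def run_saga(steps):
    """Run steps in order; on failure, compensate completed steps in reverse.

    steps: list of (name, action, compensation) tuples, where action and
    compensation are zero-argument callables.
    Returns (succeeded, trace), where trace lists what happened.
    """
    completed = []
    trace = []
    for name, action, compensation in steps:
        try:
            action()
        except Exception:
            trace.append(f"{name}:failed")
            # Undo earlier steps, newest first. Each step was already
            # committed locally, so it is reversed rather than rolled back.
            for done_name, done_compensation in reversed(completed):
                done_compensation()
                trace.append(f"{done_name}:compensated")
            return False, trace
        trace.append(f"{name}:done")
        completed.append((name, compensation))
    return True, trace
```

Unlike two-phase commit, no step holds locks while waiting for the others, at the cost that other readers can briefly see intermediate states.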

Connections

Saga Pattern

Alternative approach to distributed transactions

Understanding two-phase commit clarifies why sagas trade strict consistency for better availability and simpler failure handling.

Distributed Consensus (Raft/Paxos)

More advanced protocols for agreement in distributed systems

Knowing two-phase commit helps appreciate how consensus algorithms improve fault tolerance and avoid blocking.

Project Management Decision Making

Both involve coordinating multiple parties to agree before action

Seeing two-phase commit as a coordination protocol helps understand how consensus and commitment work in human teams.

Common Pitfalls

#1 Assuming two-phase commit never blocks and always completes quickly.

Wrong approach: Implementing two-phase commit without handling participant crashes or timeouts, expecting smooth operation.

Correct approach: Add timeout handling, failure detection, and fallback mechanisms to avoid indefinite blocking.

Root cause: Not recognizing that network and service failures are common and must be planned for.

#2 Using two-phase commit for all distributed transactions regardless of scale.

Wrong approach: Applying two-phase commit in large microservices with many participants, causing slowdowns.

Correct approach: Use sagas or eventual consistency for large-scale distributed transactions to improve performance.

Root cause: Not recognizing the coordination overhead and blocking nature of two-phase commit.

#3 Ignoring the coordinator as a single point of failure.

Wrong approach: Deploying two-phase commit without coordinator redundancy or recovery plans.

Correct approach: Implement coordinator failover or use protocols without single points of failure.

Root cause: Underestimating the impact of coordinator failure on system availability.
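One possible shape for the coordinator recovery plan in pitfall #3 is sketched below: the coordinator appends each decision to a file and forces it to disk before telling anyone, then replays the file after a restart. The `DecisionLog` class, the JSON-lines file format, and the `recover` function are assumptions for this illustration. Treating a transaction with no logged decision as aborted is safe here only because the coordinator never sends a commit before logging it.

```python
import json
import os
import tempfile

class DecisionLog:
    """Append-only log of coordinator decisions, one JSON object per line."""

    def __init__(self, path):
        self.path = path

    def record(self, tx_id, decision):
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps({"tx": tx_id, "decision": decision}) + "\n")
            f.flush()
            os.fsync(f.fileno())  # force the entry to disk before proceeding

    def load(self):
        decisions = {}
        if os.path.exists(self.path):
            with open(self.path, encoding="utf-8") as f:
                for line in f:
                    entry = json.loads(line)
                    decisions[entry["tx"]] = entry["decision"]
        return decisions

def recover(log, in_flight_tx_ids):
    """After a restart, work out which decision to (re)send per transaction."""
    decisions = log.load()
    # No logged decision means no commit was ever sent, so abort is safe.
    return {tx: decisions.get(tx, "abort") for tx in in_flight_tx_ids}

# Simulated crash and restart: tx1 was decided before the crash, tx2 was not.
path = os.path.join(tempfile.mkdtemp(), "coordinator.log")
DecisionLog(path).record("tx1", "commit")
after_restart = recover(DecisionLog(path), ["tx1", "tx2"])
```

Recovery of this kind shortens how long participants stay blocked, but they still wait for as long as the coordinator is down.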

Key Takeaways

Two-phase commit is a protocol to ensure all-or-nothing changes across multiple systems by coordinating prepare and commit phases.

It guarantees atomic, all-or-nothing outcomes, but it can block, adds latency, and makes the coordinator a single point of failure in distributed microservices.

Because of these drawbacks, modern microservices often avoid two-phase commit in favor of patterns like sagas or eventual consistency.

Understanding two-phase commit helps you grasp the challenges of distributed transactions and why alternative approaches exist.

Knowing its internals and limitations prepares you to design more resilient and scalable distributed systems.

Practice

1. What is the main purpose of the two-phase commit protocol in microservices? (easy)

A. To automatically retry failed requests

B. To speed up communication between services

C. To allow services to work independently without coordination

D. To ensure all services agree on a transaction before committing
