
Message delivery guarantees in HLD - Deep Dive

Overview - Message delivery guarantees
What is it?
Message delivery guarantees describe how a system ensures that messages sent between components arrive correctly and reliably. They define whether a message is delivered at most once, at least once, or exactly once, and how lost or duplicated messages are handled. These guarantees let components communicate without silently losing or repeating information. They are essential in distributed systems, where messages travel over networks that can fail or delay them.
Why it matters
Without message delivery guarantees, systems could lose important data or process the same message multiple times, causing errors and inconsistent results. Imagine sending a payment request twice or missing a notification; this could lead to financial loss or user frustration. Guarantees make communication trustworthy and predictable, which is critical for applications like banking, messaging apps, and online shopping.
Where it fits
Before learning message delivery guarantees, you should understand basic networking and distributed systems concepts like message passing and failures. After this, you can explore related topics like consensus algorithms, fault tolerance, and event-driven architectures to build robust systems.
Mental Model
Core Idea
Message delivery guarantees define how a system ensures messages are delivered reliably, without loss or unwanted duplication, despite failures.
Think of it like...
It's like sending a letter through the mail with different options: a regular letter that might get lost (at most once), a letter that is re-sent until you confirm receipt, so copies may arrive more than once (at least once), or a certified letter that is tracked so it is delivered exactly one time (exactly once).
┌───────────────┐      ┌───────────────┐      ┌───────────────┐
│  Sender App   │─────▶│  Message Bus  │─────▶│ Receiver App  │
└───────────────┘      └───────────────┘      └───────────────┘
       │                      │                      │
       │                      │                      │
       │<----- Delivery Guarantees Control ------->│

Delivery Guarantees:
- At most once: message sent once, may be lost
- At least once: message retried until received, may duplicate
- Exactly once: message delivered once, no loss or duplicates
Build-Up - 7 Steps
1
Foundation: Understanding basic message passing
Concept: Introduce the idea of sending messages between two systems or components.
In distributed systems, components communicate by sending messages. A message is a piece of data sent from one part to another. For example, a user clicks a button, and the app sends a message to the server to save data. This communication can fail due to network issues or crashes.
Result
You understand that messages are the basic unit of communication and that sending them is not always guaranteed to succeed.
Understanding that messages can fail to arrive is the foundation for why delivery guarantees are needed.
2
Foundation: Recognizing message loss and duplication
Concept: Explain common problems: messages can be lost or delivered multiple times.
When sending messages over a network, some messages might never reach the receiver (loss). Others might be sent again if the sender is unsure if the first was received, causing duplicates. For example, a payment request sent twice could charge a user twice if duplicates are not handled.
Result
You realize that message delivery is not perfect and can cause errors if not managed.
Knowing the types of failures helps understand why different guarantees exist.
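To make loss and duplication concrete, here is a minimal sketch of an unreliable channel. `ScriptedChannel` and its `plan` list are hypothetical names invented for this illustration; the faults are scripted rather than random so the outcome is easy to follow.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ScriptedChannel:
    """Hypothetical in-memory channel whose faults are scripted, not random.

    Each entry in `plan` decides the fate of the next message sent:
    "ok" delivers it once, "drop" loses it, "dup" delivers it twice.
    """
    plan: List[str]
    delivered: List[str] = field(default_factory=list)

    def send(self, message: str) -> None:
        # Once the script runs out, fall back to normal delivery.
        action = self.plan.pop(0) if self.plan else "ok"
        if action == "drop":
            return  # loss: the receiver never sees the message
        self.delivered.append(message)
        if action == "dup":
            self.delivered.append(message)  # duplication: seen twice


channel = ScriptedChannel(plan=["ok", "drop", "dup"])
for msg in ["pay#1", "pay#2", "pay#3"]:
    channel.send(msg)

print(channel.delivered)  # ['pay#1', 'pay#3', 'pay#3']
```

The second payment vanished and the third arrived twice. Neither the sender nor the receiver did anything wrong, which is why the guarantee has to be designed in.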
3
Intermediate: At most once delivery explained
🤔 Before reading on: do you think 'at most once' means messages can be lost or duplicated? Commit to your answer.
Concept: Introduce 'at most once' delivery where messages are sent once without retries.
'At most once' means the sender sends the message once and does not retry. If the message is lost, it is gone forever. This approach avoids duplicates but risks losing messages. It's like sending a regular letter without tracking.
Result
Messages may be lost but never duplicated.
Understanding 'at most once' helps see the tradeoff between simplicity and reliability.
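A fire-and-forget sender can be sketched in a few lines. The names `send_at_most_once` and `lossy_transmit` are assumptions made for this example, and the "network" is a scripted stand-in that loses the second transmission.

```python
from typing import Callable, List


def send_at_most_once(message: str, transmit: Callable[[str], bool]) -> None:
    """Fire and forget: transmit once and ignore whether it arrived."""
    transmit(message)  # no acknowledgment check, no retry


received: List[str] = []
# Hypothetical network behaviour: the second transmission is lost.
outcomes = iter([True, False, True])


def lossy_transmit(message: str) -> bool:
    arrived = next(outcomes)
    if arrived:
        received.append(message)
    return arrived


for msg in ["m1", "m2", "m3"]:
    send_at_most_once(msg, lossy_transmit)

print(received)  # ['m1', 'm3']
```

Because the sender never retries, nothing can arrive twice, but "m2" is gone for good.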
4
Intermediate: At least once delivery explained
🤔 Before reading on: do you think 'at least once' delivery can cause duplicates? Commit to your answer.
Concept: 'At least once' means the sender retries sending until it gets confirmation, ensuring delivery but risking duplicates.
In 'at least once', the sender keeps resending the message until the receiver acknowledges it. This guarantees the message arrives but may cause duplicates if acknowledgments are lost. For example, a server might process the same order twice if duplicates are not handled.
Result
Messages are never lost but may be duplicated.
Knowing 'at least once' delivery shows how systems prioritize reliability over duplication.
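The retry loop described above might look like the following sketch. `Receiver`, `send_at_least_once`, and the `faults` script are hypothetical; a real sender would wait on a timeout instead of reading a scripted list.

```python
from typing import List


class Receiver:
    """Hypothetical receiver that processes everything it is handed."""

    def __init__(self) -> None:
        self.processed: List[str] = []

    def handle(self, message: str) -> None:
        self.processed.append(message)


def send_at_least_once(
    message: str, receiver: Receiver, faults: List[str], max_attempts: int = 5
) -> int:
    """Resend until an acknowledgment comes back; return the attempts used.

    `faults` scripts each attempt: "lost" (message dropped),
    "ack_lost" (message processed, acknowledgment dropped), or "ok".
    """
    for attempt in range(1, max_attempts + 1):
        fault = faults.pop(0) if faults else "ok"
        if fault == "lost":
            continue  # receiver never saw it, so retry
        receiver.handle(message)  # message arrived and was processed
        if fault == "ack_lost":
            continue  # sender cannot tell it arrived, so it retries anyway
        return attempt  # acknowledgment received
    raise TimeoutError(f"no acknowledgment after {max_attempts} attempts")


receiver = Receiver()
attempts = send_at_least_once("order-42", receiver, ["lost", "ack_lost", "ok"])
print(attempts)            # 3
print(receiver.processed)  # ['order-42', 'order-42']
```

The order was never lost, but the lost acknowledgment on the second attempt made the receiver process it twice.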
5
Intermediate: Exactly once delivery explained
🤔 Before reading on: do you think 'exactly once' delivery is easy to implement? Commit to your answer.
Concept: 'Exactly once' delivery ensures each message is processed one time, no more, no less.
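This guarantee aims for the best of both 'at least once' and 'at most once': every message takes effect one time, with no loss and no duplication. Because a network can always drop a message or its acknowledgment, this is usually achieved as at least once delivery combined with mechanisms such as unique message IDs, deduplication, and transactional processing, so that redelivered copies have no additional effect. For example, payment systems rely on this to avoid charging twice or missing charges.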
Result
Messages are delivered once, no loss, no duplicates.
Understanding 'exactly once' delivery reveals the complexity needed for strong reliability.
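One common building block is a receiver that remembers which message IDs it has already applied. The sketch below is an in-memory illustration only; `DedupReceiver` and its fields are hypothetical names, and because the state lives in memory it would not survive a crash.

```python
from typing import Dict, Set


class DedupReceiver:
    """Hypothetical receiver that applies each message ID at most once."""

    def __init__(self) -> None:
        self.seen_ids: Set[str] = set()
        self.balances: Dict[str, int] = {}

    def handle(self, message_id: str, account: str, amount: int) -> bool:
        """Apply a charge unless this ID was already handled.

        Returns True if the charge was applied, False for a duplicate.
        The caller acknowledges in both cases so the sender stops retrying.
        """
        if message_id in self.seen_ids:
            return False
        self.balances[account] = self.balances.get(account, 0) + amount
        self.seen_ids.add(message_id)
        return True


receiver = DedupReceiver()
# The sender retried "msg-1" because its first acknowledgment was lost.
results = [
    receiver.handle("msg-1", "alice", 30),
    receiver.handle("msg-1", "alice", 30),
    receiver.handle("msg-2", "alice", 12),
]
print(results)            # [True, False, True]
print(receiver.balances)  # {'alice': 42}
```

Paired with a sender that retries until acknowledged, the redelivered copy arrives but changes nothing, so each charge takes effect once.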
6
Advanced: Techniques to achieve delivery guarantees
🤔 Before reading on: do you think message IDs alone guarantee exactly once delivery? Commit to your answer.
Concept: Explore methods like acknowledgments, retries, deduplication, and transactions used to implement guarantees.
Systems use acknowledgments to confirm receipt, retries to resend lost messages, and unique IDs to detect duplicates. Exactly once delivery often uses transactions to atomically process messages and record their status. For example, Kafka uses offsets and idempotent producers to help achieve exactly once semantics.
Result
You see how delivery guarantees are implemented with practical techniques.
Knowing these techniques helps understand the engineering challenges behind reliable messaging.
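As an illustration of transactional processing, the sketch below uses Python's built-in `sqlite3` module to record a message ID and apply its effect in a single transaction. The table names, `make_store`, and `handle` are assumptions made for this example, not part of any particular messaging system.

```python
import sqlite3


def make_store() -> sqlite3.Connection:
    """Create an in-memory database with hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE processed (message_id TEXT PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE balances (account TEXT PRIMARY KEY, amount INTEGER NOT NULL)"
    )
    conn.commit()
    return conn


def handle(conn: sqlite3.Connection, message_id: str, account: str, amount: int) -> bool:
    """Record the ID and apply the charge atomically.

    Returns True if applied, False if the ID was already recorded.
    """
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "INSERT INTO processed (message_id) VALUES (?)", (message_id,)
            )
            conn.execute(
                "INSERT OR IGNORE INTO balances (account, amount) VALUES (?, 0)",
                (account,),
            )
            conn.execute(
                "UPDATE balances SET amount = amount + ? WHERE account = ?",
                (amount, account),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # duplicate ID rejected by the primary key; nothing changed


conn = make_store()
print(handle(conn, "msg-1", "alice", 30))  # True
print(handle(conn, "msg-1", "alice", 30))  # False
balance = conn.execute(
    "SELECT amount FROM balances WHERE account = 'alice'"
).fetchone()[0]
print(balance)  # 30
```

Because the ID and the balance change commit together, a crash can leave both recorded or neither, never just one of them.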
7
Expert: Tradeoffs and challenges in real systems
🤔 Before reading on: do you think exactly once delivery always improves system performance? Commit to your answer.
Concept: Discuss the performance, complexity, and scalability tradeoffs of different guarantees in production.
Exactly once delivery requires extra coordination and storage, which can slow down systems and increase costs. At most once is fast but risky. At least once is a middle ground but needs careful duplicate handling. Choosing the right guarantee depends on application needs, failure modes, and cost tolerance.
Result
You understand why systems pick different guarantees based on tradeoffs.
Understanding tradeoffs guides designing systems that balance reliability, performance, and complexity.
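One way to put rough numbers on the retry cost is a back-of-the-envelope model. The sketch below assumes each message and each acknowledgment is lost independently with a fixed probability, which real networks only approximate; the function names are hypothetical.

```python
def expected_attempts(p_message_loss: float, p_ack_loss: float) -> float:
    """Mean number of sends until one acknowledgment gets back.

    Assumes independent losses, so the number of attempts is geometric
    with success probability (1 - p_message_loss) * (1 - p_ack_loss).
    """
    p_success = (1 - p_message_loss) * (1 - p_ack_loss)
    if p_success <= 0:
        raise ValueError("delivery can never succeed with these loss rates")
    return 1 / p_success


def expected_copies_received(p_message_loss: float, p_ack_loss: float) -> float:
    """Mean number of copies the receiver sees per message sent."""
    return expected_attempts(p_message_loss, p_ack_loss) * (1 - p_message_loss)


print(round(expected_attempts(0.1, 0.1), 3))         # 1.235
print(round(expected_copies_received(0.1, 0.1), 3))  # 1.111
```

Under these assumptions, a 10% loss rate in each direction means roughly 23% more transmissions and about one duplicate for every nine messages. That is the extra traffic an at least once sender pays for and the duplicate load a receiver has to absorb.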
Under the Hood
Message delivery guarantees rely on protocols that track message state between sender and receiver. Senders tag messages with unique IDs and wait for acknowledgments. If no acknowledgment arrives, they retry sending. Receivers keep track of processed message IDs to avoid duplicates. Exactly once delivery often uses transactional storage to atomically process messages and record their status, preventing duplicates even if retries occur.
Why designed this way?
These guarantees were designed to handle unreliable networks and system failures common in distributed computing. Early systems either lost messages or duplicated them, causing errors. The design balances complexity and reliability: simple systems accept loss, while critical systems invest in complex protocols to ensure correctness. Alternatives like synchronous calls or shared memory were impractical at scale or across networks.
┌───────────────┐       ┌───────────────┐       ┌───────────────┐
│   Sender      │──────▶│  Network/Bus  │──────▶│   Receiver    │
│  (assigns ID) │       │ (may lose/dup)│       │ (checks ID)   │
└───────────────┘       └───────────────┘       └───────────────┘
       │                      ▲                      │
       │                      │                      │
       │◀──── Acknowledgment ──┘                      │
       │                                             │
       │◀──────────── Deduplication check ──────────┘
Myth Busters - 4 Common Misconceptions
Quick: Does 'at most once' guarantee no message loss? Commit yes or no.
Common Belief: At most once delivery means messages are never lost.
Reality: At most once delivery can lose messages because it does not retry sending.
Why it matters: Believing this causes systems to miss critical data silently, leading to inconsistent states.
Quick: Does 'at least once' guarantee no duplicates? Commit yes or no.
Common Belief: At least once delivery means messages are never duplicated.
Reality: At least once delivery can cause duplicates because it retries until acknowledged.
Why it matters: Ignoring duplicates can cause repeated processing, like charging a customer twice.
Quick: Is exactly once delivery easy and cheap to implement? Commit yes or no.
Common Belief: Exactly once delivery is simple and has no performance cost.
Reality: Exactly once delivery is complex and often slower due to coordination and storage overhead.
Why it matters: Underestimating complexity leads to poor system design and unexpected bottlenecks.
Quick: Does unique message ID alone guarantee exactly once delivery? Commit yes or no.
Common Belief: Using unique IDs automatically ensures exactly once delivery.
Reality: Unique IDs help detect duplicates but need transactional processing to guarantee exactly once semantics.
Why it matters: Relying only on IDs can cause subtle bugs where duplicates are processed if state is not managed atomically.
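The last misconception is easiest to see with a simulated crash. In this sketch, `NonAtomicReceiver` and `CrashBeforeRecord` are hypothetical names; the receiver checks IDs correctly but applies the charge and records the ID as two separate steps.

```python
class CrashBeforeRecord(Exception):
    """Simulated process crash between the side effect and the ID record."""


class NonAtomicReceiver:
    """Hypothetical receiver that updates its state in two separate steps."""

    def __init__(self) -> None:
        self.seen_ids = set()  # assume this survives restarts
        self.charged = 0       # assume this survives restarts too

    def handle(self, message_id: str, amount: int, crash_after_effect: bool = False) -> None:
        if message_id in self.seen_ids:
            return  # duplicate detected, skip it
        self.charged += amount  # step 1: side effect
        if crash_after_effect:
            raise CrashBeforeRecord(message_id)
        self.seen_ids.add(message_id)  # step 2: remember the ID


receiver = NonAtomicReceiver()
try:
    receiver.handle("msg-1", 30, crash_after_effect=True)
except CrashBeforeRecord:
    pass  # no acknowledgment is sent, so the sender retries

receiver.handle("msg-1", 30)  # the retry looks new because step 2 never ran
print(receiver.charged)  # 60
```

The ID check worked as written, yet the customer was charged twice, because the charge and the ID record were not committed as one atomic step.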
Expert Zone
1
Exactly once delivery often requires idempotent operations on the receiver side to handle rare edge cases.
2
Network partitions can cause split-brain scenarios where duplicate messages appear despite guarantees.
3
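Some systems aim for 'effectively once' processing: delivery is at least once, so duplicates may still arrive, but idempotent handling or deduplication ensures they have no additional effect, which simplifies the design.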
When NOT to use
Use at most once delivery for non-critical data where loss is acceptable, like logging. Use at least once when duplicates can be handled or are less harmful than loss. Exactly once should be reserved for critical operations like financial transactions, but consider performance impact and complexity.
Production Patterns
Real systems combine delivery guarantees with idempotent processing and deduplication caches. For example, Kafka uses offsets and transactional producers for exactly once semantics. Amazon SQS standard queues provide at least once delivery, while its FIFO queues add deduplication. Payment systems often use idempotency keys or distributed transactions so that a retried request takes effect only once.
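A deduplication cache usually cannot remember every ID forever, so it is bounded by time and size. The sketch below is one possible shape; `DedupCache` and its parameters are hypothetical, and it assumes `now` never goes backwards (a real caller might pass `time.monotonic()`).

```python
from collections import OrderedDict


class DedupCache:
    """Hypothetical bounded cache of recently seen message IDs."""

    def __init__(self, ttl_seconds: float, max_entries: int) -> None:
        self.ttl = ttl_seconds
        self.max_entries = max_entries
        self._seen: "OrderedDict[str, float]" = OrderedDict()  # id -> first-seen time

    def first_time(self, message_id: str, now: float) -> bool:
        """Return True if the ID is new within the window, recording it."""
        # Drop entries whose window has expired (oldest entries come first).
        while self._seen:
            _, seen_at = next(iter(self._seen.items()))
            if now - seen_at < self.ttl:
                break
            self._seen.popitem(last=False)
        if message_id in self._seen:
            return False
        self._seen[message_id] = now
        if len(self._seen) > self.max_entries:
            self._seen.popitem(last=False)  # evict the oldest ID to cap memory
        return True


cache = DedupCache(ttl_seconds=300, max_entries=1000)
print(cache.first_time("a", now=0))    # True
print(cache.first_time("a", now=10))   # False
print(cache.first_time("a", now=301))  # True
```

The last call shows the tradeoff: a duplicate that arrives after the window has expired is treated as new, so the window has to be longer than the sender's retry period.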
Connections
Distributed Consensus
Builds-on
Understanding message delivery guarantees helps grasp how consensus algorithms like Paxos or Raft ensure agreement despite message loss or duplication.
Database Transactions
Shares principles
Exactly once delivery relies on atomic processing similar to database transactions, ensuring operations happen fully or not at all.
Human Communication
Analogous pattern
Message delivery guarantees mirror how humans confirm messages by repeating or acknowledging to avoid misunderstandings.
Common Pitfalls
#1 Ignoring duplicate messages in at least once delivery systems.
Wrong approach: Process every received message without checking IDs or state.
Correct approach: Implement deduplication by tracking processed message IDs before processing.
Root cause: Misunderstanding that retries cause duplicates and that receivers must handle them.
#2 Assuming at most once delivery is reliable for critical data.
Wrong approach: Send message once without retries or acknowledgments for important transactions.
Correct approach: Use at least once or exactly once guarantees with retries and acknowledgments for critical messages.
Root cause: Underestimating network failures and message loss probability.
#3 Trying to implement exactly once delivery without transactional support.
Wrong approach: Use unique IDs but process messages without atomic state updates.
Correct approach: Combine unique IDs with transactional processing to atomically record message handling.
Root cause: Not realizing that deduplication requires atomic state changes to prevent duplicates.
Key Takeaways
Message delivery guarantees define how systems handle message loss and duplication to ensure reliable communication.
At most once delivery avoids duplicates but risks losing messages; at least once ensures delivery but may duplicate messages.
Exactly once delivery is the strongest guarantee but requires complex mechanisms like transactions and deduplication.
Choosing the right guarantee depends on application needs, balancing reliability, complexity, and performance.
Understanding these guarantees is essential for designing robust distributed systems that behave predictably under failure.