
Message delivery guarantees in HLD - System Design Guide

Problem Statement
When messages are sent between services or components, they can be lost, duplicated, or delivered out of order. This causes failures like missing transactions, repeated actions, or inconsistent data states, which break user trust and system correctness.
Solution
Message delivery guarantees define how a system ensures messages reach their destination reliably. They use acknowledgments, retries, and ordering controls to prevent loss, duplication, or disorder. Different levels of guarantees balance reliability and performance based on system needs.
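As a rough illustration of the ordering controls mentioned above, one common approach is to tag each message with a sequence number and have the receiver hold back anything that arrives early. The sketch below assumes sequence numbers start at 0 and increase by 1. The `InOrderReceiver` class and its method names are hypothetical, chosen only for this example.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class InOrderReceiver:
    """Releases messages strictly in sequence-number order (hypothetical helper)."""
    next_seq: int = 0
    pending: Dict[int, Any] = field(default_factory=dict)

    def receive(self, seq: int, payload: Any) -> List[Any]:
        # Ignore anything already released or already buffered (a duplicate).
        if seq < self.next_seq or seq in self.pending:
            return []
        self.pending[seq] = payload
        ready: List[Any] = []
        # Release the longest contiguous run starting at next_seq.
        while self.next_seq in self.pending:
            ready.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return ready


receiver = InOrderReceiver()
delivered: List[str] = []
# Messages arrive out of order, and message 0 arrives twice.
for seq, payload in [(1, "b"), (0, "a"), (0, "a"), (2, "c")]:
    delivered.extend(receiver.receive(seq, payload))

print(delivered)
```

The buffer grows while a gap is outstanding, which is one concrete source of the extra resource usage that ordering guarantees bring.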
Architecture
┌───────────────┐       ┌───────────────┐       ┌───────────────┐
│   Producer    │──────▶│ Message Queue │──────▶│   Consumer    │
└───────────────┘       └───────────────┘       └───────────────┘
       │                      ▲   │                    │
       │                      │   │                    │
       │                      │   └───── Acknowledgment ─┘
       └───── Send Message ───┘

This diagram shows a producer sending messages to a message queue, which then delivers them to a consumer. The consumer sends acknowledgments back to confirm receipt, enabling delivery guarantees.
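The flow in the diagram can be sketched with a small in-memory queue. This is a toy model under stated assumptions, not how any particular broker works: `AckQueue` and its methods are hypothetical names, and `requeue_unacked` is called by hand where a real broker would typically use a timeout.

```python
from collections import deque
from typing import Deque, Dict, Optional, Tuple


class AckQueue:
    """In-memory stand-in for a message queue with acknowledgments."""

    def __init__(self) -> None:
        self._ready: Deque[Tuple[int, str]] = deque()
        self._in_flight: Dict[int, str] = {}
        self._next_id = 0

    def send(self, body: str) -> int:
        """Producer side: enqueue a message and return its id."""
        msg_id = self._next_id
        self._next_id += 1
        self._ready.append((msg_id, body))
        return msg_id

    def receive(self) -> Optional[Tuple[int, str]]:
        """Consumer side: take a message; it stays in flight until acked."""
        if not self._ready:
            return None
        msg_id, body = self._ready.popleft()
        self._in_flight[msg_id] = body
        return msg_id, body

    def ack(self, msg_id: int) -> None:
        """Consumer confirms receipt, so the queue can forget the message."""
        self._in_flight.pop(msg_id, None)

    def requeue_unacked(self) -> None:
        """Make every unacknowledged message available for delivery again."""
        for msg_id, body in sorted(self._in_flight.items()):
            self._ready.append((msg_id, body))
        self._in_flight.clear()

    def outstanding(self) -> int:
        return len(self._ready) + len(self._in_flight)


queue = AckQueue()
queue.send("order-1")
first = queue.receive()    # the consumer crashes before acknowledging
queue.requeue_unacked()    # so the queue offers the message again
second = queue.receive()
queue.ack(second[0])       # this time the consumer confirms receipt
```

Because the queue forgets a message only after the acknowledgment, a consumer crash leads to redelivery rather than loss.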

Trade-offs
✓ Pros
Prevents message loss, ensuring critical data is not missed.
Avoids duplicate processing by detecting repeated messages.
Maintains message order when required for consistency.
Improves system reliability and user trust.
✗ Cons
Higher delivery guarantees add latency due to acknowledgments and retries.
Complexity increases with mechanisms like deduplication and ordering.
Resource usage grows with message tracking and storage.
Use strong delivery guarantees when data correctness is critical, such as financial transactions or inventory updates. They matter most at high volume: at thousands of messages per second, failures that are rare per message happen routinely.
Avoid strict guarantees in low-scale or latency-sensitive systems where occasional message loss or duplication is acceptable, such as real-time analytics or logs.
Real World Examples
Amazon
Amazon SQS standard queues provide at-least-once delivery, and FIFO queues add deduplication and ordering, so order processing messages are handled reliably without loss.
Netflix
Netflix uses Kafka with exactly-once semantics to guarantee event delivery for billing and user activity tracking, preventing duplicates and data loss.
Uber
Uber employs at-least-once delivery in their dispatch system to ensure ride requests are never lost, retrying message delivery until confirmed.
Alternatives
At-most-once delivery
Messages are sent once without retries or acknowledgments, risking loss but minimizing latency.
Use when: Low latency is critical and occasional message loss is acceptable, such as telemetry data.
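A minimal sketch of at-most-once sending, assuming a transport that may silently drop messages. The `send_at_most_once` function and the `lossy_transmit` channel are hypothetical. The channel drops one specific reading so that the behavior is deterministic.

```python
from typing import Callable, List


def send_at_most_once(messages: List[str], transmit: Callable[[str], bool]) -> int:
    """Try each message a single time and never retry. Returns the attempt count."""
    attempts = 0
    for message in messages:
        attempts += 1
        transmit(message)  # result deliberately ignored: no ack, no retry
    return attempts


received: List[str] = []


def lossy_transmit(message: str) -> bool:
    # Hypothetical channel that drops every reading ending in "2".
    if message.endswith("2"):
        return False
    received.append(message)
    return True


attempts = send_at_most_once(["temp-1", "temp-2", "temp-3"], lossy_transmit)
print(attempts, received)
```

The sender does exactly one unit of work per message, which is where the low latency comes from, and the dropped reading is never recovered.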
At-least-once delivery
Messages are retried until acknowledged, ensuring no loss but allowing duplicates.
Use when: Message loss is unacceptable but duplicates can be handled, like order processing.
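The retry loop behind at-least-once delivery might look like the following sketch. It assumes the `deliver` callable returns `True` only when an acknowledgment reaches the sender. All names are hypothetical, and a production version would normally add a backoff delay between attempts.

```python
from typing import Callable, List


def send_at_least_once(
    message: str, deliver: Callable[[str], bool], max_attempts: int = 5
) -> int:
    """Retry until deliver() reports an ack. Returns the number of attempts used."""
    for attempt in range(1, max_attempts + 1):
        if deliver(message):
            return attempt
    raise RuntimeError(f"no ack after {max_attempts} attempts: {message!r}")


processed: List[str] = []
ack_outcomes = iter([False, True])  # the first ack is lost, the second arrives


def deliver_with_lost_ack(message: str) -> bool:
    processed.append(message)   # the consumer handles the message every time
    return next(ack_outcomes)   # but the sender does not always see the ack


attempts_used = send_at_least_once("order-42", deliver_with_lost_ack)
print(attempts_used, processed)
```

The message is never lost, but the lost acknowledgment causes the consumer to process it twice. This is the duplicate that at-least-once delivery allows.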
Exactly-once delivery
Combines retries with deduplication so that each message takes effect once and only once, even if it is delivered more than once.
Use when: Both loss and duplicates are unacceptable, such as financial transactions.
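One way to approximate exactly-once processing is to pair at-least-once delivery with a consumer that remembers which message IDs it has already applied. The sketch below assumes every message carries a unique ID. `IdempotentLedger` is a hypothetical example class, and the in-memory set stands in for durable storage.

```python
from typing import Dict, Set


class IdempotentLedger:
    """Applies each message ID at most once, so redeliveries have no effect."""

    def __init__(self) -> None:
        self.balance: Dict[str, int] = {}
        self._seen: Set[str] = set()

    def apply(self, message_id: str, account: str, amount: int) -> bool:
        """Return True if the message was applied, False if it was a duplicate."""
        if message_id in self._seen:
            return False
        self.balance[account] = self.balance.get(account, 0) + amount
        self._seen.add(message_id)
        return True


ledger = IdempotentLedger()
results = [
    ledger.apply("tx-1", "alice", 100),
    ledger.apply("tx-1", "alice", 100),  # redelivery of the same message
    ledger.apply("tx-2", "alice", -30),
]
print(results, ledger.balance)
```

In a real system the balance update and the record of the seen ID would need to be committed atomically, for example in the same database transaction. Otherwise a crash between the two steps could reintroduce loss or duplication.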
Summary
Message delivery guarantees prevent loss, duplication, and disorder of messages between components.
The three levels (at-most-once, at-least-once, and exactly-once) trade reliability against performance.
Choosing the right guarantee depends on system criticality, scale, and tolerance for duplicates or loss.