
Message delivery guarantees in HLD - System Design Exercise

Design: Message Delivery Guarantee System
Design the core message delivery system focusing on delivery guarantees and reliability. Out of scope: client UI, advanced analytics, and message content processing.
Functional Requirements
FR1: Support three delivery guarantees: at-most-once, at-least-once, exactly-once
FR2: Allow clients to send and receive messages reliably
FR3: Handle message loss, duplication, and ordering issues
FR4: Provide APIs for sending and receiving messages
FR5: Support message persistence to avoid data loss
FR6: Allow scaling to handle 100,000 messages per second
FR7: Ensure p99 latency for message delivery under 200ms
Non-Functional Requirements
NFR1: System must be highly available with 99.9% uptime
NFR2: Support horizontal scaling for throughput
NFR3: Use standard protocols (e.g., REST, gRPC) for client communication
NFR4: Message storage must be durable and consistent
NFR5: Latency target: p99 < 200ms for message delivery
NFR6: Handle network failures gracefully
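The availability and throughput targets above translate into concrete budgets. The following is a rough back-of-the-envelope sketch, not a sizing recommendation; the 1 KB average message size is an assumption that does not come from the requirements.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # non-leap year


def downtime_budget_hours(availability: float) -> float:
    """Allowed downtime per year, in hours, for a given availability target."""
    return (1.0 - availability) * SECONDS_PER_YEAR / 3600


def daily_storage_gb(messages_per_second: int, avg_message_bytes: int) -> float:
    """Raw bytes written per day in GB (10^9 bytes), before replication."""
    return messages_per_second * avg_message_bytes * 86_400 / 1e9


# NFR1: 99.9% uptime leaves roughly 8.76 hours of downtime per year.
print(round(downtime_budget_hours(0.999), 2))

# FR6: 100,000 messages/s at an ASSUMED 1 KB per message.
print(daily_storage_gb(100_000, 1_000))
```

Under that assumed message size, persistence has to absorb several terabytes of writes per day before replication, which is why storage write latency appears later as a bottleneck.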
Think Before You Design
Key Components
Message broker or queue
Persistent storage for messages
Delivery tracking and acknowledgment system
Client API gateway
Retry and dead-letter queue mechanisms
Monitoring and alerting
Design Patterns
At-most-once, at-least-once, exactly-once delivery patterns
Idempotent message processing
Message deduplication
Two-phase commit or other distributed transaction protocols for exactly-once
Retry with exponential backoff
Event sourcing for message state
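Of the patterns above, retry with exponential backoff is the easiest to show in a few lines. This is a minimal sketch: the function name, the 0.1 s base delay, the 5 s cap, and the use of "full jitter" (a random delay between zero and the current ceiling) are illustrative assumptions rather than part of the exercise.

```python
import random


def backoff_delays(max_attempts, base=0.1, cap=5.0, jitter=True, rng=None):
    """Return the delay (in seconds) to wait before each retry attempt.

    The ceiling doubles on every attempt and is capped at `cap`.
    With jitter enabled, each delay is drawn uniformly from [0, ceiling]
    so that many clients do not retry in lockstep.
    """
    rng = rng or random.Random()
    delays = []
    for attempt in range(max_attempts):
        ceiling = min(cap, base * (2 ** attempt))
        delays.append(rng.uniform(0, ceiling) if jitter else ceiling)
    return delays


# Without jitter the schedule is deterministic: 0.1, 0.2, 0.4, 0.8, 1.6 seconds.
print(backoff_delays(5, jitter=False))
```

The cap keeps a long outage from producing unbounded waits, which matters when the delivery latency target is measured in hundreds of milliseconds.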
Reference Architecture
Client API Gateway
      |
      v
+------------------+       +---------------------+
|  Message Broker  |<----->| Persistent Storage  |
+------------------+       +---------------------+
      |                          |
      v                          v
+------------------+       +---------------------+
| Delivery Tracker |       | Dead-letter Queue   |
+------------------+       +---------------------+
Components
Client API Gateway
REST/gRPC server
Receives client requests to send and receive messages
Message Broker
Kafka/RabbitMQ or custom queue
Handles message queuing and delivery with support for delivery guarantees
Persistent Storage
Distributed database (e.g., Cassandra, PostgreSQL)
Stores messages durably to prevent data loss
Delivery Tracker
In-memory store with persistence (e.g., Redis + DB)
Tracks message delivery status and acknowledgments
Dead-letter Queue
Separate queue system
Stores messages that failed delivery after retries
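How the broker, retries, and the Dead-letter Queue interact can be sketched in memory. The function name, the retry limit of 3, and the shape of the dead-letter record are hypothetical; a real broker such as Kafka or RabbitMQ provides its own mechanisms for this.

```python
MAX_ATTEMPTS = 3  # assumed retry limit, not specified by the exercise


def deliver_all(messages, handler, max_attempts=MAX_ATTEMPTS):
    """Try each message up to max_attempts times.

    Returns (delivered, dead_letters). A message that still fails after the
    last attempt is recorded with the reason for its final failure.
    """
    delivered, dead_letters = [], []
    for msg in messages:
        last_error = None
        for _attempt in range(max_attempts):
            try:
                handler(msg)
                delivered.append(msg)
                break  # success: stop retrying this message
            except Exception as exc:
                last_error = exc
        else:
            # The loop finished without a successful delivery.
            dead_letters.append(
                {"message": msg, "reason": str(last_error), "attempts": max_attempts}
            )
    return delivered, dead_letters
```

In a production setting the retries would be spaced out rather than immediate, and the dead-letter record would be persisted so that operators can inspect and replay it.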
Request Flow
1. Client sends message to API Gateway.
2. API Gateway validates and forwards message to Message Broker.
3. Message Broker stores message in Persistent Storage for durability.
4. Message Broker delivers message to consumer clients.
5. Consumer acknowledges message receipt to Delivery Tracker.
6. Delivery Tracker updates message status to prevent duplicates.
7. If delivery fails repeatedly, message moves to Dead-letter Queue.
8. For exactly-once, the system combines at-least-once delivery with idempotency keys and transactional updates, so a redelivered message is applied only once.
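The idempotency-key idea in step 8 can be illustrated with a consumer that remembers which keys it has already applied. This is a simplified in-memory sketch with hypothetical names; in a real system the state change and the recording of the key would need to be committed in the same transaction.

```python
class IdempotentConsumer:
    """Applies each message at most once, even if it is delivered many times."""

    def __init__(self):
        self.processed_keys = set()  # idempotency keys already applied
        self.balance = 0             # example state changed by messages

    def handle(self, idempotency_key, amount):
        """Apply the message and return True, or return False for a duplicate."""
        if idempotency_key in self.processed_keys:
            return False  # redelivery: acknowledge again, but do not reapply
        # In production these two updates must be atomic (one transaction).
        self.balance += amount
        self.processed_keys.add(idempotency_key)
        return True


consumer = IdempotentConsumer()
# At-least-once delivery: the broker redelivers "msg-1" after a lost ack.
for key, amount in [("msg-1", 50), ("msg-1", 50), ("msg-2", 25)]:
    consumer.handle(key, amount)
print(consumer.balance)  # 75, not 125
```

The broker is still free to deliver duplicates; the consumer is what makes the overall effect look like exactly-once.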
Database Schema
Entities:
- Message: id (PK), content, timestamp, status, delivery_attempts, idempotency_key
- DeliveryStatus: message_id (FK), consumer_id, acknowledged (bool), timestamp
- DeadLetter: message_id (FK), reason, timestamp
Relationships:
- One Message can have multiple DeliveryStatus entries (one per consumer)
- DeadLetter references Message for failed deliveries
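One possible way to express these entities is shown below, using Python's built-in sqlite3 module so the sketch runs without a server. The column types, the UNIQUE constraint on idempotency_key, the composite primary key on DeliveryStatus, and the helper names are assumptions added for illustration; the exercise only names the fields.

```python
import sqlite3

SCHEMA = """
CREATE TABLE message (
    id                TEXT PRIMARY KEY,
    content           TEXT NOT NULL,
    timestamp         TEXT NOT NULL,
    status            TEXT NOT NULL DEFAULT 'pending',
    delivery_attempts INTEGER NOT NULL DEFAULT 0,
    idempotency_key   TEXT UNIQUE            -- assumed: rejects duplicate sends
);
CREATE TABLE delivery_status (
    message_id   TEXT NOT NULL REFERENCES message(id),
    consumer_id  TEXT NOT NULL,
    acknowledged INTEGER NOT NULL DEFAULT 0, -- SQLite has no native bool
    timestamp    TEXT NOT NULL,
    PRIMARY KEY (message_id, consumer_id)    -- assumed: one row per consumer
);
CREATE TABLE dead_letter (
    message_id TEXT NOT NULL REFERENCES message(id),
    reason     TEXT NOT NULL,
    timestamp  TEXT NOT NULL
);
"""


def open_db():
    """Create an in-memory database with the schema and foreign keys enforced."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def insert_message(conn, msg_id, content, idempotency_key):
    """Insert a message; return False if the id or idempotency key already exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO message (id, content, timestamp, idempotency_key) "
                "VALUES (?, ?, datetime('now'), ?)",
                (msg_id, content, idempotency_key),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

Letting the database reject a repeated idempotency key means deduplication on the producer side does not depend on any in-memory state surviving a restart.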
Scaling Discussion
Bottlenecks
Message Broker throughput limits under high load
Persistent Storage write/read latency
Delivery Tracker state size and update frequency
Network bandwidth for message delivery
Handling large numbers of consumer acknowledgments
Solutions
Partition Message Broker topics/queues to distribute load
Use scalable distributed databases with write optimization
Cache delivery status and batch updates to storage
Use compression and efficient protocols for network communication
Shard Delivery Tracker by consumer groups and use asynchronous processing
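Partitioning and sharding both rely on mapping a key to a stable partition number. A minimal sketch follows, assuming SHA-256 as the hash and a hypothetical function name; brokers such as Kafka ship their own partitioners, so this only illustrates the principle.

```python
import hashlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key (e.g. a consumer group or conversation id) to a partition.

    A cryptographic digest is used instead of Python's built-in hash(),
    because hash() of a str is randomized per process and would route the
    same key to different partitions after a restart.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


# All messages with the same key land on the same partition,
# which preserves per-key ordering while spreading load across partitions.
assert partition_for("order-42", 12) == partition_for("order-42", 12)
```

Changing the partition count remaps most keys under this simple modulo scheme, so the count is usually chosen with headroom up front.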
Interview Tips
Time: 10 minutes for requirements and clarifications, 15 minutes for architecture and data flow, 10 minutes for scaling discussion, 10 minutes for Q&A
Clarify delivery guarantee definitions and client expectations
Explain trade-offs between at-most-once, at-least-once, and exactly-once
Describe components and their roles clearly
Discuss how persistence and acknowledgments ensure reliability
Address scaling challenges and solutions thoughtfully
Mention failure handling and monitoring importance