Message delivery guarantees in HLD - System Design Exercise

Design: Message Delivery Guarantee System

Design the core message delivery system focusing on delivery guarantees and reliability. Out of scope: client UI, advanced analytics, and message content processing.

Functional Requirements

FR1: Support three delivery guarantees: at-most-once, at-least-once, exactly-once

FR2: Allow clients to send and receive messages reliably

FR3: Handle message loss, duplication, and ordering issues

FR4: Provide APIs for sending and receiving messages

FR5: Support message persistence to avoid data loss

FR6: Allow scaling to handle 100,000 messages per second

FR7: Keep p99 message delivery latency under 200 ms

Non-Functional Requirements

NFR1: System must be highly available with 99.9% uptime

NFR2: Support horizontal scaling for throughput

NFR3: Use standard protocols (e.g., REST, gRPC) for client communication

NFR4: Message storage must be durable and consistent

NFR5: p99 message delivery latency under 200 ms (the same target as FR7)

NFR6: Handle network failures gracefully
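
As a rough back-of-envelope check, the targets above can be turned into bandwidth, storage, and downtime figures. The 1 KB average message size and the 7-day retention window are assumptions made purely for illustration; neither appears in the requirements, and the storage figure ignores replication.

```python
# Back-of-envelope sizing for the stated targets.
# AVG_MESSAGE_BYTES and RETENTION_DAYS are illustrative assumptions.
MESSAGES_PER_SECOND = 100_000   # FR6
AVAILABILITY = 0.999            # NFR1
AVG_MESSAGE_BYTES = 1_000       # assumed, not from the requirements
RETENTION_DAYS = 7              # assumed, not from the requirements

SECONDS_PER_DAY = 24 * 60 * 60
HOURS_PER_YEAR = 365 * 24

bytes_per_second = MESSAGES_PER_SECOND * AVG_MESSAGE_BYTES
ingress_mb_per_s = bytes_per_second / 1e6
storage_tb_per_day = bytes_per_second * SECONDS_PER_DAY / 1e12
retained_tb = storage_tb_per_day * RETENTION_DAYS
downtime_hours_per_year = (1 - AVAILABILITY) * HOURS_PER_YEAR

print(f"ingress:  {ingress_mb_per_s:.0f} MB/s")
print(f"storage:  {storage_tb_per_day:.2f} TB/day, {retained_tb:.2f} TB retained")
print(f"downtime: {downtime_hours_per_year:.2f} hours/year allowed at 99.9%")
```

Under these assumptions the system ingests about 100 MB/s, writes about 8.64 TB per day, and may be unavailable for roughly 8.76 hours per year.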

Think Before You Design

Key Components

Message broker or queue

Persistent storage for messages

Delivery tracking and acknowledgment system

Client API gateway

Retry and dead-letter queue mechanisms

Monitoring and alerting

Design Patterns

At-most-once, at-least-once, exactly-once delivery patterns

Idempotent message processing

Message deduplication

Two-phase commit (or another distributed transaction protocol) for exactly-once

Retry with exponential backoff

Event sourcing for message state
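
The difference between at-most-once and at-least-once comes down to whether the consumer acknowledges a message before or after processing it. The toy simulation below is a sketch of that idea, not a broker implementation: the `run` function, its in-memory queue, and the single simulated crash between the two steps are all hypothetical.

```python
from collections import deque

def run(messages, ack_first, crash_on):
    """Consume messages with one simulated crash between 'ack' and 'process'.

    ack_first=True  -> at-most-once  (ack, then process)
    ack_first=False -> at-least-once (process, then ack)
    Returns the messages in the order their processing took effect.
    """
    queue = deque(messages)      # unacknowledged messages stay queued
    processed = []
    already_crashed = False
    while queue:
        msg = queue[0]
        # Step 1: either acknowledge or process.
        if ack_first:
            queue.popleft()
        else:
            processed.append(msg)
        # Simulated crash between the two steps (happens once).
        if msg == crash_on and not already_crashed:
            already_crashed = True
            continue             # consumer restarts; queue head is redelivered
        # Step 2: the remaining action.
        if ack_first:
            processed.append(msg)
        else:
            queue.popleft()
    return processed

print(run(["a", "b", "c"], ack_first=True, crash_on="b"))   # "b" is lost
print(run(["a", "b", "c"], ack_first=False, crash_on="b"))  # "b" is duplicated
```

Acknowledging first can lose a message but never repeats one; acknowledging last never loses a message but can repeat one, which is why at-least-once is usually paired with idempotent processing or deduplication.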

Reference Architecture

Client API Gateway
      |
      v
+------------------+       +---------------------+
|  Message Broker  |<----->| Persistent Storage  |
+------------------+       +---------------------+
      |                          |
      v                          v
+------------------+       +---------------------+
| Delivery Tracker |       | Dead-letter Queue   |
+------------------+       +---------------------+

Components

Client API Gateway
Technology: REST/gRPC server
Role: Receives client requests to send and receive messages

Message Broker
Technology: Kafka/RabbitMQ or custom queue
Role: Handles message queuing and delivery with support for delivery guarantees

Persistent Storage
Technology: Distributed database (e.g., Cassandra, PostgreSQL)
Role: Stores messages durably to prevent data loss

Delivery Tracker
Technology: In-memory store with persistence (e.g., Redis + DB)
Role: Tracks message delivery status and acknowledgments

Dead-letter Queue
Technology: Separate queue system
Role: Stores messages that failed delivery after retries
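
One way the retry path into the Dead-letter Queue could look is sketched below: exponential backoff with full jitter, then dead-lettering once the attempt budget is spent. The function name, the parameter defaults, the use of a plain list as the dead-letter queue, and the choice to treat `ConnectionError` as the retryable failure are all assumptions for illustration.

```python
import random
import time

def deliver_with_retry(message, send, dead_letters, max_attempts=4,
                       base_delay=0.1, max_delay=5.0, sleep=time.sleep):
    """Call send(message) until it succeeds or max_attempts is reached.

    Returns True on success. On final failure the message is appended to
    dead_letters together with the failure reason, and False is returned.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            send(message)
            return True
        except ConnectionError as exc:   # assumed to be the transient failure
            if attempt == max_attempts:
                dead_letters.append(
                    {"message": message, "reason": str(exc), "attempts": attempt}
                )
                return False
            # Exponential backoff with full jitter: wait a random time in
            # [0, min(max_delay, base_delay * 2**(attempt - 1))].
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, ceiling))
    return False  # only reached when max_attempts < 1
```

The `sleep` parameter is injectable so the waits can be recorded or skipped in tests; the jitter spreads out retries from many consumers that failed at the same moment.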

Request Flow

1. Client sends message to API Gateway.

2. API Gateway validates and forwards message to Message Broker.

3. Message Broker stores message in Persistent Storage for durability.

4. Message Broker delivers message to consumer clients.

5. Consumer acknowledges message receipt to Delivery Tracker.

6. Delivery Tracker updates message status to prevent duplicates.

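7. If delivery still fails after the configured number of retries, the message moves to the Dead-letter Queue.

8. For exactly-once, the system combines at-least-once delivery with idempotency keys and transactional updates, so a redelivered message is detected and its effect is applied only once.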
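
Step 8 can be sketched as an idempotent consumer: the idempotency key and the message's effect are written in one transaction, so a redelivery fails on the key and changes nothing. This is a minimal sketch using Python's built-in `sqlite3`; the `processed` and `balances` tables and the `handle` function are hypothetical stand-ins for whatever state the consumer actually updates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE processed (idempotency_key TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE balances (account TEXT PRIMARY KEY, amount INTEGER NOT NULL)")
conn.execute("INSERT INTO balances VALUES ('alice', 0)")
conn.commit()

def handle(conn, idempotency_key, account, delta):
    """Apply a message's effect once, no matter how often it is delivered.

    Returns True if the effect was applied, False for a duplicate delivery.
    """
    try:
        with conn:  # one transaction: commit on success, roll back on error
            # Fails with IntegrityError if this key was already recorded.
            conn.execute("INSERT INTO processed VALUES (?)", (idempotency_key,))
            conn.execute(
                "UPDATE balances SET amount = amount + ? WHERE account = ?",
                (delta, account),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # duplicate: nothing was changed

handle(conn, "msg-1", "alice", 50)   # applied
handle(conn, "msg-1", "alice", 50)   # redelivery, skipped
```

Because the key insert and the update commit together, a crash between them leaves neither behind, and the next delivery is processed as if it were the first.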

Database Schema

Entities:
- Message: id (PK), content, timestamp, status, delivery_attempts, idempotency_key
- DeliveryStatus: message_id (FK), consumer_id, acknowledged (bool), timestamp
- DeadLetter: message_id (FK), reason, timestamp

Relationships:
- One Message can have multiple DeliveryStatus entries (one per consumer)
- DeadLetter references Message for failed deliveries
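
The entities above could be expressed as tables roughly as follows. This sketch uses SQLite through Python's `sqlite3` module only so it runs anywhere. The column types, the `UNIQUE` constraint on `idempotency_key`, the composite primary key on `delivery_status`, and the snake_case table names are assumptions, not part of the schema as stated.

```python
import sqlite3

SCHEMA = """
CREATE TABLE message (
    id                TEXT PRIMARY KEY,
    content           BLOB NOT NULL,
    timestamp         TEXT NOT NULL,
    status            TEXT NOT NULL,
    delivery_attempts INTEGER NOT NULL DEFAULT 0,
    idempotency_key   TEXT NOT NULL UNIQUE          -- assumed unique
);
CREATE TABLE delivery_status (
    message_id   TEXT NOT NULL REFERENCES message(id),
    consumer_id  TEXT NOT NULL,
    acknowledged INTEGER NOT NULL DEFAULT 0,        -- bool stored as 0/1
    timestamp    TEXT NOT NULL,
    PRIMARY KEY (message_id, consumer_id)           -- one row per consumer
);
CREATE TABLE dead_letter (
    message_id TEXT NOT NULL REFERENCES message(id),
    reason     TEXT NOT NULL,
    timestamp  TEXT NOT NULL
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript(SCHEMA)
```

The composite key on `delivery_status` encodes the "one per consumer" relationship directly, so a second acknowledgment row for the same message and consumer is rejected by the database.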

Scaling Discussion

Bottlenecks

Message Broker throughput limits under high load

Persistent Storage write/read latency

Delivery Tracker state size and update frequency

Network bandwidth for message delivery

Handling large numbers of consumer acknowledgments

Solutions

Partition Message Broker topics/queues to distribute load

Use scalable distributed databases with write optimization

Cache delivery status and batch updates to storage

Use compression and efficient protocols for network communication

Shard Delivery Tracker by consumer groups and use asynchronous processing
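
Partitioning and sharding both rely on mapping a key to one of N buckets in a stable way. The sketch below shows one simple way to do that with a hash of the routing key. The `partition_for` function and the choice of SHA-256 are assumptions for illustration and do not reproduce the partitioner of any particular broker.

```python
import hashlib

def partition_for(key: str, num_partitions: int) -> int:
    """Map a routing key to a partition index in [0, num_partitions).

    Uses a cryptographic digest rather than Python's built-in hash(),
    which is salted per process and would not be stable across restarts.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions

# Messages sharing a key always land on the same partition, which keeps
# their relative order while spreading different keys across partitions.
for key in ["order-1", "order-2", "order-1"]:
    print(key, "->", partition_for(key, 12))
```

With plain modulo hashing, changing `num_partitions` remaps most keys, so the partition count is usually chosen with headroom up front.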

Interview Tips

Time (45 minutes total): 10 minutes for requirements and clarifications, 15 minutes for architecture and data flow, 10 minutes for scaling discussion, 10 minutes for Q&A

Clarify delivery guarantee definitions and client expectations

Explain trade-offs between at-most-once, at-least-once, and exactly-once

Describe components and their roles clearly

Discuss how persistence and acknowledgments ensure reliability

Address scaling challenges and solutions thoughtfully

Mention why failure handling and monitoring matter