Message delivery guarantees in HLD - Scalability & System Analysis

Scalability Analysis - Message delivery guarantees

Growth Table: Message Delivery Guarantees

Metric	100 users	10K users	1M users	100M users
Message volume per second	~100 msg/s	~10K msg/s	~1M msg/s	~100M msg/s
System components	Single broker, single DB	Multiple brokers, DB replicas	Partitioned brokers, sharded DB	Global distributed brokers, multi-region DB shards
Delivery guarantee complexity	Simple ack, retry	Idempotency, deduplication	Exactly-once semantics, ordering	Cross-region consistency, geo-replication
Latency	Low, <100ms	Moderate, 100-200ms	Higher, 200-500ms	Variable, 500ms+
Failure handling	Basic retry	Dead-letter queues, monitoring	Advanced retry policies, transactional logs	Multi-region failover, disaster recovery

First Bottleneck

The first bottleneck is the message broker's throughput and storage capacity. At low scale, a single broker handles message delivery with simple acknowledgments. As users grow, the broker's ability to process and persist messages reliably becomes limited. The database or storage system backing the broker also becomes a bottleneck due to write throughput and consistency requirements for delivery guarantees.

Scaling Solutions

Horizontal scaling: Add more message brokers and partition topics to distribute load.
Replication: Use broker clusters with replication for fault tolerance and availability.
Caching: Use in-memory caches for quick message state checks to reduce DB load.
Sharding: Partition databases by user or topic to scale storage and throughput.
Idempotency and deduplication: Make consumers idempotent and deduplicate by message ID so that at-least-once delivery with retries produces an exactly-once effect.
Dead-letter queues: Handle undeliverable messages separately to avoid blocking.
Geo-distribution: Deploy brokers and storage in multiple regions for latency and disaster recovery.
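
As a rough illustration of three of the items above (partitioning, idempotent deduplication, and dead-letter queues), here is a minimal in-memory sketch. The names `partition_for`, `IdempotentConsumer`, and `MAX_ATTEMPTS` are hypothetical and not taken from any particular broker; a real system would keep the processed-ID set and the dead-letter queue in durable storage.

```python
import hashlib
from collections import deque

MAX_ATTEMPTS = 3  # assumed retry budget before a message is dead-lettered


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition with a stable hash, so one key always lands on one partition."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


class IdempotentConsumer:
    """Accepts at-least-once deliveries but applies each message ID at most once."""

    def __init__(self, handler):
        self.handler = handler
        self.processed_ids = set()   # stand-in for a durable deduplication store
        self.attempts = {}           # message ID -> failed attempts so far
        self.dead_letters = deque()  # stand-in for a dead-letter queue

    def deliver(self, msg_id: str, payload: str) -> str:
        if msg_id in self.processed_ids:
            return "duplicate"       # redelivery after a lost ack: skip the side effect
        try:
            self.handler(payload)
        except Exception:
            self.attempts[msg_id] = self.attempts.get(msg_id, 0) + 1
            if self.attempts[msg_id] >= MAX_ATTEMPTS:
                # Park the message so it stops blocking the ones behind it.
                self.dead_letters.append((msg_id, payload))
                return "dead-lettered"
            return "retry"
        self.processed_ids.add(msg_id)
        return "processed"
```

Delivering the same `msg_id` twice runs the handler once and returns `"duplicate"` the second time, while a handler that keeps failing returns `"retry"` until the attempt budget is spent and the message is moved to `dead_letters`. The sketch assumes the caller stops redelivering a message once it has been dead-lettered.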

Back-of-Envelope Cost Analysis

Assuming a single broker sustains ~5K-10K msg/sec, a load of 10K msg/sec needs at least 2 brokers, before any extra capacity for replication or headroom.
Persisting every message means the database must sustain ~10K writes/sec at 10K msg/sec.
Storage grows with message retention; 10K msg/sec * 86,400 sec/day = ~864M messages/day, or ~864GB/day at 1KB per message.
Network bandwidth: 10K msg/sec * average message size (e.g., 1KB) = ~10MB/s (~80Mbps).
At 1M msg/sec, partitioning and sharding are mandatory; storage and network scale accordingly.
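
The arithmetic above can be reproduced with a small calculator. This is a sketch under stated assumptions: the function name `estimate` is hypothetical, 1KB is treated as 1,000 bytes, and per-broker capacity is set to the conservative end (5K msg/sec) of the range given above.

```python
import math

SECONDS_PER_DAY = 86_400


def estimate(msgs_per_sec: int, avg_msg_bytes: int = 1_000, broker_capacity: int = 5_000) -> dict:
    """Back-of-envelope sizing for a given message rate (no replication or headroom)."""
    msgs_per_day = msgs_per_sec * SECONDS_PER_DAY
    return {
        "brokers": math.ceil(msgs_per_sec / broker_capacity),
        "db_writes_per_sec": msgs_per_sec,  # one write per persisted message
        "msgs_per_day": msgs_per_day,
        "storage_gb_per_day": msgs_per_day * avg_msg_bytes / 1e9,
        "bandwidth_mbps": msgs_per_sec * avg_msg_bytes * 8 / 1e6,
    }


if __name__ == "__main__":
    for rate in (10_000, 1_000_000):
        print(rate, estimate(rate))
```

For 10K msg/sec this gives 2 brokers, 864,000,000 messages/day, 864 GB/day, and 80 Mbps, matching the figures above. Changing `avg_msg_bytes` or `broker_capacity` shows how sensitive the estimate is to those assumptions.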

Interview Tip

Start by clarifying the delivery guarantees needed: at-most-once, at-least-once, or exactly-once. Discuss how these affect system design and complexity. Then, outline scaling steps from single broker to distributed clusters, focusing on bottlenecks and solutions. Use real numbers to justify design choices and show understanding of trade-offs.
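
To make the difference between the guarantees concrete, the toy simulation below shows how the order of "acknowledge" and "process" decides what happens when a consumer crashes. The `Broker` class and `run_consumer` function are hypothetical, in-memory stand-ins and not a real broker API.

```python
class Crash(Exception):
    """Simulated consumer crash."""


class Broker:
    """Keeps every message until it is acknowledged; unacked messages are redelivered."""

    def __init__(self, messages):
        self.unacked = list(messages)

    def ack(self, msg):
        self.unacked.remove(msg)


def run_consumer(broker, ack_first, crash_once_on=None):
    """Drain the broker, crashing once between the two steps for one chosen message."""
    processed = []
    crashed = False
    while broker.unacked:
        msg = broker.unacked[0]
        try:
            if ack_first:
                broker.ack(msg)        # at-most-once: acknowledge, then process
            else:
                processed.append(msg)  # at-least-once: process, then acknowledge
            if msg == crash_once_on and not crashed:
                crashed = True
                raise Crash()
            if ack_first:
                processed.append(msg)
            else:
                broker.ack(msg)
        except Crash:
            continue  # consumer restarts; broker redelivers whatever is still unacked
    return processed


at_most_once = run_consumer(Broker(["a", "b", "c"]), ack_first=True, crash_once_on="b")
at_least_once = run_consumer(Broker(["a", "b", "c"]), ack_first=False, crash_once_on="b")
print(at_most_once)   # ['a', 'c']: "b" was acked before the crash and is lost
print(at_least_once)  # ['a', 'b', 'b', 'c']: "b" is redelivered and processed twice
```

An exactly-once effect is usually approximated by taking the at-least-once path and making the processing step idempotent, for example by skipping message IDs that have already been applied.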

Self Check

Your database handles 1000 QPS. Traffic grows 10x to 10,000 QPS. What do you do first?

Answer: First, add read replicas and caching to take read load off the primary database. Replicas do not add write capacity, so if the extra traffic is write-heavy, as message persistence usually is, shard the database to distribute writes and scale horizontally.

Key Result

Message delivery systems hit their first bottlenecks in broker and database throughput. Scaling requires partitioning, replication, and careful handling of delivery semantics to maintain guarantees at high volume.