
Message delivery guarantees in HLD - Scalability & System Analysis

Scalability Analysis - Message delivery guarantees
Growth Table: Message Delivery Guarantees
| Metric | 100 users | 10K users | 1M users | 100M users |
|---|---|---|---|---|
| Message volume per second | ~100 msg/s | ~10K msg/s | ~1M msg/s | ~100M msg/s |
| System components | Single broker, single DB | Multiple brokers, DB replicas | Partitioned brokers, sharded DB | Global distributed brokers, multi-region DB shards |
| Delivery guarantee complexity | Simple ack, retry | Idempotency, deduplication | Exactly-once semantics, ordering | Cross-region consistency, geo-replication |
| Latency | Low, <100 ms | Moderate, 100-200 ms | Higher, 200-500 ms | Variable, 500 ms+ |
| Failure handling | Basic retry | Dead-letter queues, monitoring | Advanced retry policies, transactional logs | Multi-region failover, disaster recovery |
First Bottleneck

The first bottleneck is the message broker's throughput and storage capacity. At low scale, a single broker handles message delivery with simple acknowledgments. As users grow, the broker's ability to process and persist messages reliably becomes limited. The database or storage system backing the broker also becomes a bottleneck due to write throughput and consistency requirements for delivery guarantees.

Scaling Solutions
  • Horizontal scaling: Add more message brokers and partition topics to distribute load.
  • Replication: Use broker clusters with replication for fault tolerance and availability.
  • Caching: Use in-memory caches for quick message state checks to reduce DB load.
  • Sharding: Partition databases by user or topic to scale storage and throughput.
  • Idempotency and deduplication: Give each message a unique ID and make consumers idempotent, so that retries under at-least-once delivery have the effect of exactly-once processing.
  • Dead-letter queues: Handle undeliverable messages separately to avoid blocking.
  • Geo-distribution: Deploy brokers and storage in multiple regions for latency and disaster recovery.
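
Three of the techniques above (partitioning by key, deduplication on retry, and dead-letter queues) can be illustrated with a minimal in-memory sketch. Everything here is an assumption made for illustration: the names `partition_for`, `Message`, and `IdempotentConsumer` are hypothetical, and the in-memory set and list stand in for a durable dedup store and a real dead-letter queue.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition with a stable hash (same key, same partition)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


@dataclass
class Message:
    msg_id: str  # unique ID assigned by the producer; used for deduplication
    key: str     # partitioning key, e.g. a user ID (hypothetical choice)
    body: str


@dataclass
class IdempotentConsumer:
    handler: Callable[[Message], None]
    max_attempts: int = 3
    processed_ids: set = field(default_factory=set)   # stand-in for a durable dedup store
    dead_letters: list = field(default_factory=list)  # stand-in for a dead-letter queue

    def deliver(self, msg: Message) -> str:
        # A redelivered message whose ID was already recorded is skipped.
        if msg.msg_id in self.processed_ids:
            return "duplicate"
        for _ in range(self.max_attempts):
            try:
                self.handler(msg)
            except Exception:
                continue  # retry until max_attempts is exhausted
            self.processed_ids.add(msg.msg_id)
            return "processed"
        # Park the message so it does not block the ones behind it.
        self.dead_letters.append(msg)
        return "dead-lettered"
```

In this sketch the handler's side effect and the recording of the message ID are two separate steps. A production system would need to commit them atomically (for example in one database transaction); otherwise a crash between the two steps can still cause a duplicate.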
Back-of-Envelope Cost Analysis
  • At 10K msg/sec, assuming a single broker sustains ~5K-10K msg/sec, at least 2 brokers are needed (more for headroom and failover).
  • Database write throughput must support message persistence; assuming one write per message, that is ~10K writes/sec at 10K msg/sec.
  • Storage grows with message retention; 10K msg/sec * 86,400 sec/day = ~864M messages/day, or ~864GB/day at 1KB per message.
  • Network bandwidth: 10K msg/sec * average message size (e.g., 1KB) = ~10MB/s (~80Mbps).
  • At 1M msg/sec, partitioning and sharding are mandatory; storage and network scale accordingly.
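
The arithmetic above can be reproduced with a small calculator. This is a rough sketch under stated assumptions: the function name `estimate` is hypothetical, 1KB is taken as 1,000 bytes, and the per-broker capacity defaults to the conservative end (5K msg/sec) of the range quoted above.

```python
import math


def estimate(msgs_per_sec, avg_msg_bytes=1_000, broker_capacity=5_000, retention_days=1):
    """Back-of-envelope sizing; all defaults are illustrative assumptions."""
    msgs_per_day = msgs_per_sec * 86_400          # seconds per day
    bytes_per_sec = msgs_per_sec * avg_msg_bytes
    return {
        "brokers": math.ceil(msgs_per_sec / broker_capacity),
        "messages_per_day": msgs_per_day,
        "bandwidth_MB_per_s": bytes_per_sec / 1e6,
        "bandwidth_Mbps": bytes_per_sec * 8 / 1e6,
        "storage_GB": msgs_per_day * retention_days * avg_msg_bytes / 1e9,
    }


result = estimate(10_000)
# brokers=2, messages_per_day=864,000,000, 10 MB/s, 80 Mbps, 864 GB for one day of retention
```

Calling `estimate(1_000_000)` with the same assumptions scales every figure by 100, which is why partitioning and sharding stop being optional at that volume.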
Interview Tip

Start by clarifying the delivery guarantees needed: at-most-once, at-least-once, or exactly-once. Discuss how these affect system design and complexity. Then, outline scaling steps from single broker to distributed clusters, focusing on bottlenecks and solutions. Use real numbers to justify design choices and show understanding of trade-offs.
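
One way to make the difference between at-most-once and at-least-once concrete is the position of the acknowledgment relative to processing. The toy sketch below assumes a broker that redelivers a message until it is acknowledged; `InMemoryQueue`, `consume`, and `run` are hypothetical names, and the simulated crash happens before the side effect is applied.

```python
from collections import deque


class InMemoryQueue:
    """Toy broker: the head message is redelivered until it is acknowledged."""

    def __init__(self, messages):
        self._pending = deque(messages)

    def receive(self):
        return self._pending[0] if self._pending else None

    def ack(self):
        self._pending.popleft()


def consume(queue, process, ack_first):
    msg = queue.receive()
    if msg is None:
        return
    if ack_first:
        queue.ack()    # at-most-once: a crash during process() loses the message
        process(msg)
    else:
        process(msg)   # at-least-once: a crash before ack() causes redelivery
        queue.ack()


def run(ack_first):
    queue = InMemoryQueue(["order-1"])
    calls = {"n": 0}
    applied = []

    def process(msg):
        calls["n"] += 1
        if calls["n"] == 1:
            raise RuntimeError("consumer crashed")  # simulated failure on first attempt
        applied.append(msg)

    for _ in range(3):
        try:
            consume(queue, process, ack_first)
        except RuntimeError:
            pass
    return applied


lost = run(ack_first=True)        # [] : message acknowledged, then lost in the crash
delivered = run(ack_first=False)  # ["order-1"] : redelivered after the crash
```

If the crash instead happened after the side effect but before the ack, the at-least-once path would apply the message twice, which is the case that idempotent consumers and deduplication are meant to absorb.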

Self Check

Your database handles 1000 QPS. Traffic grows 10x to 10,000 QPS. What do you do first?

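Answer: First determine whether the added load is mostly reads or mostly writes. For read-heavy traffic, add read replicas and caching to take load off the primary database. Replicas do not add write capacity, so for write-heavy traffic (such as persisting every message), shard or partition the database to distribute writes and scale horizontally.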

Key Result
Message delivery systems first hit bottlenecks at the message broker and database throughput. Scaling requires partitioning, replication, and careful handling of delivery semantics to maintain guarantees at high volume.