| Users | Messages/Second | Latency Requirement | Infrastructure Changes | Challenges |
|---|---|---|---|---|
| 100 users | ~10-50 | < 1 second | Single server, simple DB | Minimal load, simple queue |
| 10,000 users | ~1,000-5,000 | < 500 ms | Load balancer, message broker, caching | Handling concurrent connections, DB load |
| 1,000,000 users | ~1,000,000+ | < 200 ms | Horizontal scaling, sharded DB, distributed brokers | Network bandwidth, message ordering, fault tolerance |
| 100,000,000 users | ~10,000,000+ | < 100 ms | Global CDN, multi-region clusters, advanced partitioning | Latency consistency, data replication, disaster recovery |
Why Messaging Requires Real-Time Architecture in HLD: Scalability Evidence
The first bottleneck in scaling a messaging system is the real-time delivery component: the message broker and the persistent network connections, which must serve many concurrent users sending and receiving messages in near real time.
As the user count grows, the system struggles to maintain low latency and message ordering. The database can also become a bottleneck if it sits synchronously on the hot path for message storage or delivery confirmation.
- Horizontal Scaling: Add more message broker instances and application servers behind load balancers to distribute user connections.
- Message Brokers: Use brokers built for high throughput and low latency (e.g., Kafka, RabbitMQ, or an MQTT-based broker).
- Caching: Use in-memory caches (e.g., Redis) for quick message state and presence info to reduce DB hits.
- Sharding: Partition user data and message streams by user ID or region to reduce contention and improve parallelism.
- CDN & Edge Computing: For global scale, use edge servers to reduce latency by bringing message routing closer to users.
- Asynchronous Processing: Decouple message storage from delivery using queues to avoid blocking operations.
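The asynchronous-processing point above can be sketched in a few lines: deliver on the fast path, hand persistence to a background worker via a queue so the hot path never blocks on the database. The names (`send_message`, `persist_worker`, the in-memory lists standing in for delivery and storage) are illustrative, not any specific broker's API.

```python
import queue
import threading

persist_queue = queue.Queue()
delivered = []   # stands in for the real-time delivery path
stored = []      # stands in for the database

def persist_worker():
    # Drains the queue and writes messages to storage off the hot path.
    while True:
        msg = persist_queue.get()
        if msg is None:          # sentinel to stop the worker
            break
        stored.append(msg)       # a real system would batch DB writes here
        persist_queue.task_done()

def send_message(msg):
    delivered.append(msg)        # fast path: push to connected recipients
    persist_queue.put(msg)       # slow path: hand storage to the worker

worker = threading.Thread(target=persist_worker, daemon=True)
worker.start()
for i in range(3):
    send_message({"id": i, "body": f"hello {i}"})
persist_queue.join()             # wait until all messages are persisted
persist_queue.put(None)          # shut the worker down
```

The point of the split is that a slow database write delays only the background worker, never the delivery path the user sees.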
Back-of-the-envelope estimates:
- At 1M users each sending 1 message per second: ~1M messages/sec of throughput needed.
- Each message ~1 KB -> 1 GB/s bandwidth needed just for messages.
- Database writes can be optimized by batching or async writes to handle ~100K QPS per instance.
- Network bandwidth and CPU on brokers become expensive; multiple instances needed.
- Storage grows rapidly; archiving old messages is necessary to control costs.
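The estimates above are easy to sanity-check with quick arithmetic; the 1 KB message size and ~100K QPS per database instance are the assumptions stated in the bullets, not measured figures.

```python
# Back-of-the-envelope check for the 1M-user scenario above.
users = 1_000_000
msgs_per_user_per_sec = 1
msg_size_bytes = 1_000            # ~1 KB per message (assumption from above)
db_qps_per_instance = 100_000     # batched/async writes (assumption from above)

throughput = users * msgs_per_user_per_sec          # messages/sec
bandwidth_gbps = throughput * msg_size_bytes / 1e9  # GB/s for payloads alone
db_instances = throughput / db_qps_per_instance     # instances to absorb writes

print(f"throughput: {throughput:,} msg/s")   # 1,000,000 msg/s
print(f"bandwidth:  {bandwidth_gbps} GB/s")  # 1.0 GB/s
print(f"db shards:  {db_instances:.0f}")     # 10
```

Note the bandwidth figure covers payloads only; protocol overhead, fan-out to multiple recipients, and replication multiply it further.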
How to walk through this in an interview:
- Start by explaining the real-time nature of messaging and why low latency is critical.
- Discuss how user growth increases concurrent connections and message throughput.
- Identify the first bottleneck (message delivery and broker capacity).
- Propose scaling solutions step by step: horizontal scaling, caching, sharding, and CDN.
- Include cost and complexity trade-offs to show balanced understanding.
Practice question: Your message broker handles 1,000 messages per second. Traffic grows 10x. What do you do first and why?
Answer: Scale horizontally first: add broker instances and load-balance (partition) the message streams across them, because broker throughput is the immediate bottleneck. This keeps latency low and avoids message loss before you optimize storage or caching.
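A minimal sketch of that first step, assuming messages are keyed by a conversation ID: hash the ID to pick a broker instance, so every message in one conversation lands on the same broker (preserving per-conversation ordering) while total throughput scales with the number of instances. The broker names and hash scheme here are illustrative.

```python
import hashlib

BROKERS = ["broker-0", "broker-1", "broker-2", "broker-3"]  # illustrative instances

def pick_broker(conversation_id: str) -> str:
    # Hash the conversation ID so all messages in one conversation route to
    # the same broker, keeping per-conversation ordering after scale-out.
    digest = hashlib.sha256(conversation_id.encode()).digest()
    index = int.from_bytes(digest[:8], "big") % len(BROKERS)
    return BROKERS[index]

# Routing is deterministic: the same conversation always hits the same broker.
assert pick_broker("chat-42") == pick_broker("chat-42")
```

Plain modulo hashing like this reshuffles most keys when the broker count changes; consistent hashing is the usual refinement once instances are added or removed frequently.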
