| Users | Messages/Second | Latency Requirement | Infrastructure Changes | Challenges |
|---|---|---|---|---|
| 100 users | ~10-50 | < 1 second | Single server, simple DB | Minimal load, simple queue |
| 10,000 users | ~1,000-5,000 | < 500 ms | Load balancer, message broker, caching | Handling concurrent connections, DB load |
| 1,000,000 users | ~1,000,000+ | < 200 ms | Horizontal scaling, sharded DB, distributed brokers | Network bandwidth, message ordering, fault tolerance |
| 100,000,000 users | ~10,000,000+ | < 100 ms | Global CDN, multi-region clusters, advanced partitioning | Latency consistency, data replication, disaster recovery |
Why Messaging Requires Real-Time Architecture in HLD: Scalability Evidence
The first bottleneck in scaling a messaging system is the real-time delivery component: the message broker and the persistent network connections, which must serve many concurrent users sending and receiving messages in near real time.
As the user count grows, the system struggles to maintain low latency and message ordering. The database can also become a bottleneck if it sits synchronously on the hot path for message storage or delivery confirmation.
- Horizontal Scaling: Add more message broker instances and application servers behind load balancers to distribute user connections.
- Message Brokers: Use brokers built for high throughput and low latency (e.g., Kafka, RabbitMQ, or an MQTT-based broker).
- Caching: Use in-memory caches (e.g., Redis) for quick message state and presence info to reduce DB hits.
- Sharding: Partition user data and message streams by user ID or region to reduce contention and improve parallelism.
- CDN & Edge Computing: For global scale, use edge servers to reduce latency by bringing message routing closer to users.
- Asynchronous Processing: Decouple message storage from delivery using queues to avoid blocking operations.
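The asynchronous-processing point above can be sketched in a few lines: deliver on the fast path, hand persistence to a background worker via a queue so the hot path never blocks on the database. The names (`send_message`, `persist_worker`, the in-memory lists standing in for delivery and storage) are illustrative, not any specific broker's API.

```python
import queue
import threading

persist_queue = queue.Queue()
delivered = []   # stands in for the real-time delivery path
stored = []      # stands in for the database

def persist_worker():
    # Drains the queue and writes messages to storage off the hot path.
    while True:
        msg = persist_queue.get()
        if msg is None:          # sentinel to stop the worker
            break
        stored.append(msg)       # a real system would batch DB writes here
        persist_queue.task_done()

def send_message(msg):
    delivered.append(msg)        # fast path: push to connected recipients
    persist_queue.put(msg)       # slow path: hand storage to the worker

worker = threading.Thread(target=persist_worker, daemon=True)
worker.start()
for i in range(3):
    send_message({"id": i, "body": f"hello {i}"})
persist_queue.join()             # wait until all messages are persisted
persist_queue.put(None)          # shut the worker down
```

The point of the split is that a slow database write delays only the background worker, never the delivery path the user sees.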
Back-of-the-envelope estimates:
- At 1M users each sending 1 message per second: ~1M messages/sec of throughput needed.
- Each message ~1 KB -> 1 GB/s bandwidth needed just for messages.
- Database writes can be optimized by batching or async writes to handle ~100K QPS per instance.
- Network bandwidth and CPU on brokers become expensive; multiple instances needed.
- Storage grows rapidly; archiving old messages is necessary to control costs.
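The estimates above are easy to sanity-check with quick arithmetic; the 1 KB message size and ~100K QPS per database instance are the assumptions stated in the bullets, not measured figures.

```python
# Back-of-the-envelope check for the 1M-user scenario above.
users = 1_000_000
msgs_per_user_per_sec = 1
msg_size_bytes = 1_000            # ~1 KB per message (assumption from above)
db_qps_per_instance = 100_000     # batched/async writes (assumption from above)

throughput = users * msgs_per_user_per_sec          # messages/sec
bandwidth_gbps = throughput * msg_size_bytes / 1e9  # GB/s for payloads alone
db_instances = throughput / db_qps_per_instance     # instances to absorb writes

print(f"throughput: {throughput:,} msg/s")   # 1,000,000 msg/s
print(f"bandwidth:  {bandwidth_gbps} GB/s")  # 1.0 GB/s
print(f"db shards:  {db_instances:.0f}")     # 10
```

Note the bandwidth figure covers payloads only; protocol overhead, fan-out to multiple recipients, and replication multiply it further.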
How to walk through this in an interview:
- Start by explaining the real-time nature of messaging and why low latency is critical.
- Discuss how user growth increases concurrent connections and message throughput.
- Identify the first bottleneck (message delivery and broker capacity).
- Propose scaling solutions step by step: horizontal scaling, caching, sharding, and CDN.
- Include cost and complexity trade-offs to show balanced understanding.
Practice question: Your message broker handles 1,000 messages per second. Traffic grows 10x. What do you do first and why?
Answer: Scale horizontally first: add broker instances and load-balance (partition) the message streams across them, because broker throughput is the immediate bottleneck. This keeps latency low and avoids message loss before you optimize storage or caching.
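A minimal sketch of that first step, assuming messages are keyed by a conversation ID: hash the ID to pick a broker instance, so every message in one conversation lands on the same broker (preserving per-conversation ordering) while total throughput scales with the number of instances. The broker names and hash scheme here are illustrative.

```python
import hashlib

BROKERS = ["broker-0", "broker-1", "broker-2", "broker-3"]  # illustrative instances

def pick_broker(conversation_id: str) -> str:
    # Hash the conversation ID so all messages in one conversation route to
    # the same broker, keeping per-conversation ordering after scale-out.
    digest = hashlib.sha256(conversation_id.encode()).digest()
    index = int.from_bytes(digest[:8], "big") % len(BROKERS)
    return BROKERS[index]

# Routing is deterministic: the same conversation always hits the same broker.
assert pick_broker("chat-42") == pick_broker("chat-42")
```

Plain modulo hashing like this reshuffles most keys when the broker count changes; consistent hashing is the usual refinement once instances are added or removed frequently.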
