Real-time features in HLD - Scalability & System Analysis

| Users | Concurrent Connections | Message Rate | Target Latency | Infrastructure Changes |
|---|---|---|---|---|
| 100 | ~100 | Low (a few msgs/sec) | <100 ms | Single server with WebSocket support |
| 10,000 | ~10,000 | Moderate (hundreds of msgs/sec) | <200 ms | Load balancer + multiple app servers + Redis pub/sub |
| 1,000,000 | ~1M | High (thousands of msgs/sec) | <300 ms | Clustered message brokers (Kafka, Redis Cluster), sharded app servers, CDN for static content |
| 100,000,000 | ~100M | Very high (millions of msgs/sec) | <500 ms | Globally distributed clusters, edge computing, advanced partitioning, multi-region data centers |
The first bottleneck is the application server's ability to maintain concurrent connections. Real-time features rely on persistent connections such as WebSockets, each of which holds a socket open and consumes server memory and CPU for its entire lifetime. A common planning figure is around 5,000 concurrent connections per application server; beyond that, keep-alives, broadcasts, and message processing compete for the same resources and latency degrades.
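A minimal sketch of that connection-management concern, using Python's asyncio with the third-party `websockets` package (an assumed choice; any WebSocket library faces the same budget). The `MAX_CONNECTIONS` cap and handler names are illustrative, not from the original text:

```python
import asyncio
import websockets  # third-party: pip install websockets (assumed choice)

MAX_CONNECTIONS = 5_000   # illustrative per-server budget from the text
connections: set = set()  # every live socket is held in memory

async def handler(ws):
    # Newer `websockets` versions pass only the connection; older ones
    # also pass a path argument. Reject connections past the budget so
    # the load balancer can retry against another instance.
    if len(connections) >= MAX_CONNECTIONS:
        await ws.close(code=1013, reason="server at capacity")  # 1013 = Try Again Later
        return
    connections.add(ws)
    try:
        async for message in ws:    # each open socket costs memory + CPU
            await ws.send(message)  # echo; a real system routes via a broker
    finally:
        connections.discard(ws)

async def main():
    async with websockets.serve(handler, "0.0.0.0", 8765):
        await asyncio.Future()      # run until cancelled

if __name__ == "__main__":
    asyncio.run(main())
```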
- Horizontal scaling: Add more app servers behind a load balancer to distribute connections.
- Message brokers: Use systems like Redis Pub/Sub, Kafka, or MQTT brokers to decouple message distribution from connection handling (see the pub/sub sketch after this list).
- Caching: Cache frequent data to reduce backend load.
- Sharding: Partition users or channels across servers to limit connection and message load per server.
- CDN and edge computing: Offload static content and some processing closer to users to reduce latency and bandwidth.
- Connection multiplexing: Use protocols like HTTP/2 or WebTransport to optimize connection usage.
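As referenced above, a sketch of broker-based fan-out using the redis-py client. The `room:{id}` channel scheme and function names are assumptions for illustration, not a prescribed design:

```python
import redis  # third-party: pip install redis (redis-py client, assumed)

r = redis.Redis(host="localhost", port=6379)

def publish_to_room(room_id: str, payload: str) -> int:
    # Any app server can publish; Redis fans the message out to every
    # subscribed server, regardless of which one holds the user's socket.
    return r.publish(f"room:{room_id}", payload)  # hypothetical channel scheme

def listen_to_room(room_id: str):
    # Each app server subscribes once per room it hosts connections for,
    # then relays incoming messages to its local WebSocket clients.
    p = r.pubsub()
    p.subscribe(f"room:{room_id}")
    for msg in p.listen():              # blocking iterator over events
        if msg["type"] == "message":
            yield msg["data"]           # bytes; forward to local sockets
```

Note that Redis Pub/Sub is fire-and-forget: messages published while a subscriber is disconnected are lost. Kafka or Redis Streams add durability and replay at the cost of extra operational complexity.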
- At 10,000 users with 1 message per second: 10,000 messages/sec to handle.
- Each message ~1KB -> 10MB/s bandwidth needed.
- Storage depends on message retention; 1 day of messages at 10,000 msgs/sec = ~864GB.
- Network bandwidth per server is limited to ~1Gbps (~125MB/s); delivery fan-out multiplies the 10MB/s ingest (a message sent to N recipients costs N× egress), so multiple servers are needed well before raw ingest saturates a NIC.
- CPU and memory scale with connection count; at ~5,000 connections per server, 10,000 users need at least two servers (see the arithmetic sketch below).
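The same envelope math as a runnable check. All constants come from the figures above (decimal units, so the numbers match); the 20× fan-out factor is an assumed example value:

```python
# Back-of-envelope capacity check for the 10,000-user tier.
USERS = 10_000
MSGS_PER_USER_PER_SEC = 1
MSG_SIZE_BYTES = 1_000            # ~1KB per message
CONNS_PER_SERVER = 5_000          # per-server connection budget from above
NIC_BYTES_PER_SEC = 125_000_000   # ~1Gbps ≈ 125MB/s
FANOUT = 20                       # assumed average recipients per message

msgs_per_sec = USERS * MSGS_PER_USER_PER_SEC          # 10,000 msgs/sec
ingest_bytes_s = msgs_per_sec * MSG_SIZE_BYTES        # 10 MB/s
egress_bytes_s = ingest_bytes_s * FANOUT              # fan-out multiplies egress
storage_gb_day = ingest_bytes_s * 86_400 / 1e9        # ~864 GB/day

servers_for_conns = -(-USERS // CONNS_PER_SERVER)     # ceil(10,000 / 5,000) = 2
servers_for_egress = -(-egress_bytes_s // NIC_BYTES_PER_SEC)

print(f"{msgs_per_sec:,} msgs/sec, {ingest_bytes_s / 1e6:.0f} MB/s ingest")
print(f"{egress_bytes_s / 1e6:.0f} MB/s egress at {FANOUT}x fan-out")
print(f"~{storage_gb_day:.0f} GB/day retained")
print(f"servers needed: {max(servers_for_conns, servers_for_egress)}")
```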
Start by defining the real-time feature and expected load. Identify the main challenges: connection management, message throughput, and latency. Discuss bottlenecks in servers and network. Propose scaling steps: horizontal scaling, message brokers, caching, and sharding. Always mention trade-offs and monitoring needs.
Your database handles 1000 QPS. Traffic grows 10x. What do you do first?
Answer: Introduce read replicas and a caching layer first, since most of a 10x traffic spike is typically reads; this reduces load on the primary before you resort to vertical scaling or sharding.
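A minimal cache-aside sketch of that first step, again with redis-py; `db.fetch_user`, the key scheme, and the 60-second TTL are illustrative placeholders:

```python
import json
import redis  # third-party: pip install redis (assumed client)

cache = redis.Redis(host="localhost", port=6379)
CACHE_TTL_SECONDS = 60  # illustrative; tune to your staleness tolerance

def get_user(user_id: int, db) -> dict:
    # Cache-aside: serve hot reads from Redis, fall back to the primary
    # (or a read replica) on a miss, then populate the cache.
    key = f"user:{user_id}"        # hypothetical key scheme
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)
    row = db.fetch_user(user_id)   # placeholder for the real DB call
    cache.set(key, json.dumps(row), ex=CACHE_TTL_SECONDS)
    return row
```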
