| Users | Connections | Server Load | Network Usage | Latency |
|---|---|---|---|---|
| 100 users | 100 concurrent connections | Low CPU and memory | Low bandwidth | Low latency, near real-time |
| 10,000 users | 10,000 concurrent connections | High CPU and memory on a few servers | Moderate bandwidth | Some delay due to server load |
| 1,000,000 users | 1,000,000 concurrent connections | Too much for a few servers; memory and CPU become the bottleneck | High bandwidth; risk of network saturation | Increased latency; possible dropped connections |
| 100,000,000 users | 100,000,000 concurrent connections | Beyond a single data center; requires global distribution | Very high bandwidth; CDNs and edge servers needed | Latency depends on geo-distribution; complex load balancing |
## Long Polling and Server-Sent Events in HLD: Scalability & System Analysis
The first bottleneck is the server's ability to maintain many concurrent open connections. Both long polling and Server-Sent Events (SSE) hold one connection open per client, consuming memory and CPU for each. A single server typically hits its resource limits somewhere around 10,000 to 50,000 concurrent connections, depending on tuning. Network bandwidth also becomes a concern, since every open connection periodically carries data.
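To make the "held connection" cost concrete, here is a minimal sketch of the core of a long-poll handler: each client request blocks on the server until new data arrives or the poll times out, which is exactly what ties up per-connection memory and a waiting thread. `LongPollChannel` and its version counter are hypothetical names for illustration, not part of any framework.

```python
import threading

class LongPollChannel:
    """Sketch of long-poll state: each poll() call parks a client
    connection until a newer message exists or the timeout expires."""

    def __init__(self):
        self._cond = threading.Condition()
        self._version = 0       # increments on every published message
        self._latest = None

    def publish(self, message):
        """Store a new message and wake every parked poller."""
        with self._cond:
            self._version += 1
            self._latest = message
            self._cond.notify_all()

    def poll(self, last_seen_version: int, timeout: float = 30.0):
        """Block until a message newer than last_seen_version arrives.
        On timeout, return no data; the client immediately re-polls."""
        with self._cond:
            self._cond.wait_for(lambda: self._version > last_seen_version,
                                timeout=timeout)
            if self._version > last_seen_version:
                return self._version, self._latest
            return last_seen_version, None
```

Every blocked `poll()` here corresponds to one open HTTP request in a real deployment; multiply that by tens of thousands of clients and the memory and scheduling cost of "just waiting" becomes the bottleneck described above.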
- Horizontal scaling: Add more servers behind a load balancer to distribute connections.
- Connection multiplexing: Use protocols like HTTP/2 or WebSockets to reduce overhead per connection.
- Use of reverse proxies: Employ Nginx or Envoy to efficiently manage many open connections.
- Offload static content: Use CDNs to reduce server bandwidth for static assets.
- Sharding users: Distribute users across multiple servers or regions to reduce load per server.
- Switch to WebSockets: For very high scale, WebSockets can be more efficient than long polling or SSE.
- Optimize message frequency: Reduce how often servers send updates to reduce bandwidth and CPU.
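The sharding bullet above can be sketched with a stable hash that maps each user to a fixed connection server, so reconnects land on the same shard. This is a simplified illustration (the function name and shard count are assumptions); production systems often use consistent hashing instead so that adding a server remaps only a fraction of users.

```python
import hashlib

def shard_for_user(user_id: str, num_shards: int) -> int:
    """Deterministically map a user to one of num_shards connection
    servers. A stable hash keeps a user's long-lived connection state
    on the same server across reconnects."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

# Example: spread 1,000,000 users across 100 servers of ~10,000
# connections each; the same user always resolves to the same shard.
shard = shard_for_user("user-42", 100)
```

Note the trade-off: plain modulo hashing is simple, but changing `num_shards` remaps almost every user, forcing mass reconnects.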
Assuming 10,000 users with SSE or long polling:
- Each server handles ~10,000 concurrent connections (high but possible with optimized servers).
- Each connection sends a small message every 5 seconds -> 0.2 messages per second per connection.
- Total messages per second = 10,000 users * 0.2 messages/sec = 2,000 messages/sec.
- Bandwidth per message ~1 KB -> 2,000 KB/s = ~2 MB/s bandwidth per server.
- Memory per connection ~2 KB -> 10,000 connections * 2 KB = ~20 MB RAM just for connections.
- CPU usage depends on message processing; expect moderate CPU load.
Scaling to 1 million users requires ~100 servers, 200,000 messages/sec, and ~200 MB/s bandwidth total.
Start by explaining how long polling and SSE keep connections open and why that matters for scaling. Identify the server resource limits (memory, CPU, network). Discuss horizontal scaling and connection management techniques. Mention alternatives like WebSockets and CDNs. Always quantify with rough numbers to show understanding of scale.
Your server handles 1,000 QPS with long polling connections. Traffic grows 10x to 10,000 QPS. What do you do first and why?