
Real-time features in HLD - Scalability & System Analysis

Scalability Analysis - Real-time features
Growth Table: Real-time Features Scaling
| Users       | Connections      | Message Rate                  | Latency | Infrastructure Changes                                                                        |
|-------------|------------------|-------------------------------|---------|-----------------------------------------------------------------------------------------------|
| 100         | ~100 concurrent  | Low (few msgs/sec)            | <100ms  | Single server with WebSocket support                                                          |
| 10,000      | ~10,000 concurrent | Moderate (hundreds of msgs/sec) | <200ms  | Load balancer + multiple app servers + Redis pub/sub                                          |
| 1,000,000   | ~1M concurrent   | High (thousands of msgs/sec)  | <300ms  | Clustered message brokers (Kafka, Redis Cluster), sharded app servers, CDN for static content |
| 100,000,000 | ~100M concurrent | Very High (millions of msgs/sec) | <500ms  | Global distributed clusters, edge computing, advanced partitioning, multi-region data centers |
First Bottleneck

The first bottleneck is the application server's ability to maintain concurrent connections. Real-time features rely on persistent connections such as WebSockets, each of which consumes server memory and CPU. A single server typically sustains around 5,000 concurrent connections; beyond that, it struggles to keep connections alive and process messages promptly.
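The connection ceiling translates directly into a server count. A minimal sketch (the ~5,000-connections-per-server figure comes from this section; everything else is illustrative):

```python
def servers_needed(concurrent_users: int, conns_per_server: int = 5_000) -> int:
    """App servers required to hold all persistent WebSocket connections,
    assuming ~5,000 connections per server as the practical ceiling."""
    return -(-concurrent_users // conns_per_server)  # ceiling division

for users in (100, 10_000, 1_000_000):
    print(f"{users:>9,} users -> {servers_needed(users):>3} servers")
# -> 1, 2, and 200 servers respectively
```

This is why the growth table jumps from a single server to a load-balanced fleet between 100 and 10,000 users: the second row already needs more than one connection-holding server.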

Scaling Solutions
  • Horizontal scaling: Add more app servers behind a load balancer to distribute connections.
  • Message brokers: Use systems like Redis Pub/Sub, Kafka, or MQTT brokers to handle message distribution efficiently.
  • Caching: Cache frequent data to reduce backend load.
  • Sharding: Partition users or channels across servers to limit connection and message load per server.
  • CDN and edge computing: Offload static content and some processing closer to users to reduce latency and bandwidth.
  • Connection multiplexing: Use protocols like HTTP/2 or WebTransport to optimize connection usage.
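The message-broker pattern from the list above can be sketched in-process. This toy `Broker` class is illustrative only (a real deployment would use Redis Pub/Sub or Kafka), but it shows the key idea: publishing decouples senders from whichever servers hold the recipients' connections.

```python
from collections import defaultdict
from typing import Callable

class Broker:
    """Toy pub/sub broker. App servers subscribe per channel; a publish
    fans the message out to every subscriber, so the sender never needs
    to know which server holds a given user's connection."""

    def __init__(self) -> None:
        self._subs: dict[str, list[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, channel: str, handler: Callable[[str], None]) -> None:
        self._subs[channel].append(handler)

    def publish(self, channel: str, message: str) -> int:
        for handler in self._subs[channel]:
            handler(message)
        return len(self._subs[channel])  # number of subscribers notified

# Two "app servers", each holding some users' connections on one channel
broker = Broker()
server_a, server_b = [], []
broker.subscribe("room:42", server_a.append)
broker.subscribe("room:42", server_b.append)
broker.publish("room:42", "hello")  # both servers receive the message
```

Sharding follows the same shape: partition channels across broker instances so no single broker carries the full message rate.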
Back-of-Envelope Cost Analysis
  • At 10,000 users sending 1 message per second each: 10,000 messages/sec to handle.
  • Each message is ~1KB, so ~10MB/s of bandwidth is needed.
  • Storage depends on message retention; one day of messages at 10,000 msgs/sec is ~864GB.
  • Network bandwidth per server is limited to ~1Gbps (~125MB/s); 10MB/s fits on one NIC, but bandwidth becomes a constraint as message rates grow.
  • CPU and memory scale with connection count; plan for ~5,000 connections per server.
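The arithmetic above can be checked directly (decimal units, i.e. 1KB = 1,000 bytes, which is how the figures in this list round cleanly):

```python
msgs_per_sec = 10_000          # 10,000 users x 1 message/sec each
msg_bytes = 1_000              # ~1KB per message (decimal units)

bandwidth_mb_s = msgs_per_sec * msg_bytes / 1e6            # -> 10.0 MB/s
day_storage_gb = msgs_per_sec * msg_bytes * 86_400 / 1e9   # -> 864.0 GB/day
nic_mb_s = 1e9 / 8 / 1e6                                   # -> 125.0 MB/s per 1Gbps NIC

print(bandwidth_mb_s, day_storage_gb, nic_mb_s)
```

Note that 10MB/s fits comfortably within a single 1Gbps NIC; at this scale the binding constraint is the connection count, not bandwidth.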
Interview Tip

Start by defining the real-time feature and expected load. Identify the main challenges: connection management, message throughput, and latency. Discuss bottlenecks in servers and network. Propose scaling steps: horizontal scaling, message brokers, caching, and sharding. Always mention trade-offs and monitoring needs.

Self Check

Your database handles 1000 QPS. Traffic grows 10x. What do you do first?

Answer: Introduce read replicas and caching layers to reduce load on the primary database before scaling vertically or sharding.
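A sketch of the caching half of that answer (illustrative only; `fetch_from_db` is a hypothetical stand-in for a real primary-database query):

```python
class ReadThroughCache:
    """Toy read-through cache: hot reads are served from memory, so a 10x
    jump in read traffic does not become a 10x jump in primary-DB load."""

    def __init__(self, db_fetch):
        self._db_fetch = db_fetch   # called only on a cache miss
        self._store = {}
        self.hits = self.misses = 0

    def get(self, key):
        if key in self._store:
            self.hits += 1
        else:
            self.misses += 1
            self._store[key] = self._db_fetch(key)  # only misses reach the DB
        return self._store[key]

db_calls = []
def fetch_from_db(key):             # hypothetical primary-DB lookup
    db_calls.append(key)
    return f"row-for-{key}"

cache = ReadThroughCache(fetch_from_db)
cache.get("user:1"); cache.get("user:1"); cache.get("user:1")
# three reads, but the primary database was queried only once
```

In production this role is typically played by Redis or Memcached, with an eviction policy and TTLs; the point for the interview is that the read path is absorbed before the primary is touched.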

Key Result
Real-time features first hit limits on concurrent connections at app servers; horizontal scaling and message brokers are key to scaling beyond 10K users.