Real-time features in HLD - Scalability & System Analysis

| Users | Concurrent Connections | Message Rate | Target Latency | Infrastructure Changes |
|---|---|---|---|---|
| 100 | ~100 | Low (a few msgs/sec) | <100 ms | Single server with WebSocket support |
| 10,000 | ~10,000 | Moderate (hundreds of msgs/sec) | <200 ms | Load balancer + multiple app servers + Redis pub/sub |
| 1,000,000 | ~1M | High (thousands of msgs/sec) | <300 ms | Clustered message brokers (Kafka, Redis Cluster), sharded app servers, CDN for static content |
| 100,000,000 | ~100M | Very high (millions of msgs/sec) | <500 ms | Globally distributed clusters, edge computing, advanced partitioning, multi-region data centers |
The first bottleneck is the application server's ability to maintain concurrent connections. Real-time features rely on persistent connections such as WebSockets, each of which holds a socket open and consumes server memory and CPU for its entire lifetime. A common planning figure is around 5,000 concurrent connections per application server; beyond that, keep-alives, broadcasts, and message processing compete for the same resources and latency degrades.
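A minimal sketch of that connection-management concern, using Python's asyncio with the third-party `websockets` package (an assumed choice; any WebSocket library faces the same budget). The `MAX_CONNECTIONS` cap and handler names are illustrative, not from the original text:

```python
import asyncio
import websockets  # third-party: pip install websockets (assumed choice)

MAX_CONNECTIONS = 5_000   # illustrative per-server budget from the text
connections: set = set()  # every live socket is held in memory

async def handler(ws):
    # Newer `websockets` versions pass only the connection; older ones
    # also pass a path argument. Reject connections past the budget so
    # the load balancer can retry against another instance.
    if len(connections) >= MAX_CONNECTIONS:
        await ws.close(code=1013, reason="server at capacity")  # 1013 = Try Again Later
        return
    connections.add(ws)
    try:
        async for message in ws:    # each open socket costs memory + CPU
            await ws.send(message)  # echo; a real system routes via a broker
    finally:
        connections.discard(ws)

async def main():
    async with websockets.serve(handler, "0.0.0.0", 8765):
        await asyncio.Future()      # run until cancelled

if __name__ == "__main__":
    asyncio.run(main())
```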
- Horizontal scaling: Add more app servers behind a load balancer to distribute connections.
- Message brokers: Use systems like Redis Pub/Sub, Kafka, or MQTT brokers to decouple message distribution from connection handling (see the pub/sub sketch after this list).
- Caching: Cache frequent data to reduce backend load.
- Sharding: Partition users or channels across servers to limit connection and message load per server.
- CDN and edge computing: Offload static content and some processing closer to users to reduce latency and bandwidth.
- Connection multiplexing: Use protocols like HTTP/2 or WebTransport to optimize connection usage.
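As referenced above, a sketch of broker-based fan-out using the redis-py client. The `room:{id}` channel scheme and function names are assumptions for illustration, not a prescribed design:

```python
import redis  # third-party: pip install redis (redis-py client, assumed)

r = redis.Redis(host="localhost", port=6379)

def publish_to_room(room_id: str, payload: str) -> int:
    # Any app server can publish; Redis fans the message out to every
    # subscribed server, regardless of which one holds the user's socket.
    return r.publish(f"room:{room_id}", payload)  # hypothetical channel scheme

def listen_to_room(room_id: str):
    # Each app server subscribes once per room it hosts connections for,
    # then relays incoming messages to its local WebSocket clients.
    p = r.pubsub()
    p.subscribe(f"room:{room_id}")
    for msg in p.listen():              # blocking iterator over events
        if msg["type"] == "message":
            yield msg["data"]           # bytes; forward to local sockets
```

Note that Redis Pub/Sub is fire-and-forget: messages published while a subscriber is disconnected are lost. Kafka or Redis Streams add durability and replay at the cost of extra operational complexity.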
- At 10,000 users with 1 message per second: 10,000 messages/sec to handle.
- Each message ~1KB -> 10MB/s bandwidth needed.
- Storage depends on message retention; 1 day of messages at 10,000 msgs/sec = ~864GB.
- Network bandwidth per server is limited to ~1Gbps (~125MB/s); delivery fan-out multiplies the 10MB/s ingest (a message sent to N recipients costs N× egress), so multiple servers are needed well before raw ingest saturates a NIC.
- CPU and memory scale with connection count; at ~5,000 connections per server, 10,000 users need at least two servers (see the arithmetic sketch below).
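The same envelope math as a runnable check. All constants come from the figures above (decimal units, so the numbers match); the 20× fan-out factor is an assumed example value:

```python
# Back-of-envelope capacity check for the 10,000-user tier.
USERS = 10_000
MSGS_PER_USER_PER_SEC = 1
MSG_SIZE_BYTES = 1_000            # ~1KB per message
CONNS_PER_SERVER = 5_000          # per-server connection budget from above
NIC_BYTES_PER_SEC = 125_000_000   # ~1Gbps ≈ 125MB/s
FANOUT = 20                       # assumed average recipients per message

msgs_per_sec = USERS * MSGS_PER_USER_PER_SEC          # 10,000 msgs/sec
ingest_bytes_s = msgs_per_sec * MSG_SIZE_BYTES        # 10 MB/s
egress_bytes_s = ingest_bytes_s * FANOUT              # fan-out multiplies egress
storage_gb_day = ingest_bytes_s * 86_400 / 1e9        # ~864 GB/day

servers_for_conns = -(-USERS // CONNS_PER_SERVER)     # ceil(10,000 / 5,000) = 2
servers_for_egress = -(-egress_bytes_s // NIC_BYTES_PER_SEC)

print(f"{msgs_per_sec:,} msgs/sec, {ingest_bytes_s / 1e6:.0f} MB/s ingest")
print(f"{egress_bytes_s / 1e6:.0f} MB/s egress at {FANOUT}x fan-out")
print(f"~{storage_gb_day:.0f} GB/day retained")
print(f"servers needed: {max(servers_for_conns, servers_for_egress)}")
```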
Start by defining the real-time feature and expected load. Identify the main challenges: connection management, message throughput, and latency. Discuss bottlenecks in servers and network. Propose scaling steps: horizontal scaling, message brokers, caching, and sharding. Always mention trade-offs and monitoring needs.
Your database handles 1000 QPS. Traffic grows 10x. What do you do first?
Answer: Introduce read replicas and a caching layer first, since most of a 10x traffic spike is typically reads; this reduces load on the primary before you resort to vertical scaling or sharding.
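A minimal cache-aside sketch of that first step, again with redis-py; `db.fetch_user`, the key scheme, and the 60-second TTL are illustrative placeholders:

```python
import json
import redis  # third-party: pip install redis (assumed client)

cache = redis.Redis(host="localhost", port=6379)
CACHE_TTL_SECONDS = 60  # illustrative; tune to your staleness tolerance

def get_user(user_id: int, db) -> dict:
    # Cache-aside: serve hot reads from Redis, fall back to the primary
    # (or a read replica) on a miss, then populate the cache.
    key = f"user:{user_id}"        # hypothetical key scheme
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)
    row = db.fetch_user(user_id)   # placeholder for the real DB call
    cache.set(key, json.dumps(row), ex=CACHE_TTL_SECONDS)
    return row
```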
