| Users/Traffic | Requests per Second | Server Load | Database Load | Network Bandwidth | Storage Needs |
|---|---|---|---|---|---|
| 100 users | ~10-50 RPS | Single server handles easily | Single DB instance sufficient | Low bandwidth usage | Minimal storage |
| 10,000 users | ~1,000-5,000 RPS | Multiple app servers needed | DB nearing capacity, may need read replicas | Moderate bandwidth usage | Growing storage, backups needed |
| 1,000,000 users | ~100,000 RPS | Horizontal scaling essential | DB sharding or distributed DB needed | High bandwidth, CDN recommended | Large storage, tiered storage useful |
| 100,000,000 users | ~10,000,000 RPS | Massive cluster of servers | Multiple shards, distributed DB clusters | Very high bandwidth, global CDN | Petabytes of storage, archival systems |
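The RPS figures in the table follow from a rough traffic assumption of ~0.1-0.5 requests per second per active user (my assumption for illustration; the source table implies roughly this ratio). A minimal back-of-envelope sketch:

```python
# Back-of-envelope traffic estimate behind the table above.
# Assumption (not from the source): each active user averages ~0.1-0.5 RPS.
def estimated_rps(users, rps_per_user_low=0.1, rps_per_user_high=0.5):
    """Return a (low, high) RPS range for a given user count."""
    return users * rps_per_user_low, users * rps_per_user_high

for users in (100, 10_000, 1_000_000, 100_000_000):
    low, high = estimated_rps(users)
    print(f"{users:>11,} users -> ~{low:,.0f}-{high:,.0f} RPS")
```

The larger rows of the table quote only the low end of this range; real per-user rates vary widely by product, so always measure before planning capacity.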
How scalability handles growing traffic in HLD: the evidence
At low to medium traffic, the database is usually the first bottleneck: it handles all data reads and writes and has limited query throughput (typically 5,000-10,000 queries per second for a single instance). As traffic grows, the DB's CPU, memory, and disk I/O are the first resources to saturate.
At higher traffic, application servers can become CPU or memory bottlenecks due to processing many concurrent requests. Network bandwidth can also become a bottleneck when serving large amounts of data or media.
- Database: Use read replicas to spread read load, connection pooling to reduce overhead, and sharding to split data across multiple DB instances.
- Application Servers: Add more servers horizontally behind load balancers to distribute traffic evenly.
- Caching: Use in-memory caches like Redis or Memcached to reduce DB load for frequent queries.
- Content Delivery Network (CDN): Offload static content delivery to edge servers closer to users, reducing bandwidth and latency.
- Storage: Use tiered storage and archival for old data to reduce expensive fast storage usage.
- At 10,000 users generating ~5,000 RPS, a single DB instance near max capacity (5,000-10,000 QPS) may need read replicas.
- Each app server can handle ~1,000-5,000 concurrent connections, so 2-5 app servers are needed at this scale.
- Network bandwidth at 1 Gbps (~125 MB/s) can handle roughly 10,000-20,000 requests per second depending on payload size.
- Storage grows with data; for example, 1 million users might generate terabytes of data requiring distributed storage.
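The rules of thumb above can be combined into a quick capacity check. A sketch under the simplifying assumption that every request hits the DB once (real systems cache and batch, so this overestimates DB load):

```python
import math

# Rule-of-thumb limits taken from the bullets above.
DB_MAX_QPS = 10_000               # upper bound for a single DB instance
SERVER_MAX_RPS = 1_000            # conservative per-app-server capacity
LINK_BYTES_PER_SEC = 125_000_000  # 1 Gbps is roughly 125 MB/s

def plan(rps, avg_payload_bytes):
    """Rough sizing: assumes one DB query per request (an assumption)."""
    return {
        "app_servers": math.ceil(rps / SERVER_MAX_RPS),
        "needs_read_replicas_or_sharding": rps > DB_MAX_QPS,
        "bandwidth_utilization": rps * avg_payload_bytes / LINK_BYTES_PER_SEC,
    }

print(plan(5_000, avg_payload_bytes=10_000))
```

At 5,000 RPS with ~10 KB payloads this yields 5 app servers, a DB still within single-instance range, and ~40% of a 1 Gbps link, matching the 10,000-user row of the table.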
How to discuss this in an interview:
- Start by understanding current traffic and system limits.
- Identify the first bottleneck clearly (usually the DB).
- Discuss how traffic growth affects each component.
- Propose targeted solutions: caching, read replicas, horizontal scaling.
- Mention trade-offs and cost implications.
- Use real numbers to demonstrate understanding.
Question: Your database handles 1,000 QPS. Traffic grows 10x to 10,000 QPS. What do you do first and why?
Answer: The first step is to add read replicas to distribute read queries and reduce load on the primary. Most workloads are read-heavy, so replicas absorb the bulk of the 10x growth without the cost and complexity of immediate sharding or a redesign.
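Once replicas exist, the application must split reads from writes. A minimal routing sketch (connection strings are hypothetical placeholders; note that replica reads can lag the primary slightly, so read-your-own-writes flows should still go to the primary):

```python
import random

# Read/write splitting at the application layer.
# Connection strings below are illustrative placeholders.
PRIMARY = "postgres://primary:5432/app"
REPLICAS = ["postgres://replica-1:5432/app", "postgres://replica-2:5432/app"]

def route(sql):
    """Send writes to the primary, spread reads across replicas."""
    is_read = sql.lstrip().upper().startswith("SELECT")
    return random.choice(REPLICAS) if is_read else PRIMARY
```

In practice this logic lives in an ORM, a driver, or a proxy layer rather than hand-rolled string checks, but the principle is the same: only the primary takes writes, and read traffic fans out.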