
Scalability in HLD: How a System Handles Growing Traffic

Growth Table: What Changes as Traffic Grows
| Users/Traffic | Requests per Second | Server Load | Database Load | Network Bandwidth | Storage Needs |
|---|---|---|---|---|---|
| 100 users | ~10-50 RPS | Single server handles easily | Single DB instance sufficient | Low bandwidth usage | Minimal storage |
| 10,000 users | ~1,000-5,000 RPS | Multiple app servers needed | DB nearing capacity, may need read replicas | Moderate bandwidth usage | Growing storage, backups needed |
| 1,000,000 users | ~100,000 RPS | Horizontal scaling essential | DB sharding or distributed DB needed | High bandwidth, CDN recommended | Large storage, tiered storage useful |
| 100,000,000 users | ~10,000,000 RPS | Massive cluster of servers | Multiple shards, distributed DB clusters | Very high bandwidth, global CDN | Petabytes of storage, archival systems |
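The table's traffic figures can be approximated with a simple back-of-envelope formula: spread each user's daily requests over 86,400 seconds, then multiply by a peak factor. A minimal sketch, where `requests_per_user_per_day` and `peak_factor` are illustrative assumptions (not values from the table):

```python
def estimate_peak_rps(users: int,
                      requests_per_user_per_day: int = 50,
                      peak_factor: int = 3) -> float:
    """Rough peak RPS estimate.

    Daily request volume is averaged over 86,400 seconds, then scaled
    by a peak multiplier to account for traffic not being uniform.
    """
    avg_rps = users * requests_per_user_per_day / 86_400
    return avg_rps * peak_factor
```

Tuning the per-user request rate and peak factor to your workload is what makes these estimates useful in an interview; the structure of the calculation matters more than the exact constants.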
First Bottleneck: What Breaks First and Why

At low to medium traffic, the database is usually the first bottleneck. It handles all data reads and writes and has limited query throughput (typically 5,000-10,000 queries per second for a single instance). As traffic grows, the database's CPU, memory, and disk I/O saturate first.

At higher traffic, application servers can become CPU or memory bottlenecks due to processing many concurrent requests. Network bandwidth can also become a bottleneck when serving large amounts of data or media.

Scaling Solutions
  • Database: Use read replicas to spread read load, connection pooling to reduce overhead, and sharding to split data across multiple DB instances.
  • Application Servers: Add more servers horizontally behind load balancers to distribute traffic evenly.
  • Caching: Use in-memory caches like Redis or Memcached to reduce DB load for frequent queries.
  • Content Delivery Network (CDN): Offload static content delivery to edge servers closer to users, reducing bandwidth and latency.
  • Storage: Use tiered storage and archival for old data to reduce expensive fast storage usage.
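The caching bullet above is the classic cache-aside pattern: check the cache first, fall back to the database on a miss, then populate the cache. A minimal sketch, using a plain dict with expiry timestamps as a stand-in for Redis or Memcached, and a hypothetical `db_fetch` loader representing the database query:

```python
import time

class CacheAside:
    """Cache-aside sketch: cache hit avoids the DB; a miss costs one DB query."""

    def __init__(self, db_fetch, ttl_seconds=60):
        self._db_fetch = db_fetch   # hypothetical loader that queries the DB
        self._ttl = ttl_seconds
        self._store = {}            # key -> (value, expires_at)

    def get(self, key):
        entry = self._store.get(key)
        if entry and entry[1] > time.time():
            return entry[0]         # cache hit: DB is not touched
        value = self._db_fetch(key)  # cache miss: one DB query
        self._store[key] = (value, time.time() + self._ttl)
        return value
```

With a hit rate of, say, 90% on frequent queries, the database sees only a tenth of the read traffic, which is why caching is usually the cheapest first lever to pull.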
Back-of-Envelope Cost Analysis
  • At 10,000 users generating ~5,000 RPS, a single DB instance near max capacity (5,000-10,000 QPS) may need read replicas.
  • Each app server can handle ~1,000-5,000 concurrent connections, so 2-5 app servers are needed at this scale.
  • Network bandwidth at 1 Gbps (~125 MB/s) can handle roughly 10,000-20,000 requests per second depending on payload size.
  • Storage grows with data; for example, 1 million users might generate terabytes of data requiring distributed storage.
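The server and replica counts above reduce to ceiling division of peak load by per-instance capacity. A small sketch of that arithmetic, where the per-server and per-replica capacities are the illustrative figures from the bullets, not measured values:

```python
import math

def servers_needed(peak_rps: float, rps_per_server: float = 2_000) -> int:
    """App servers required, assuming each handles ~2,000 RPS (illustrative)."""
    return math.ceil(peak_rps / rps_per_server)

def replicas_needed(read_qps: float, qps_per_replica: float = 7_500) -> int:
    """Read replicas required, assuming ~7,500 QPS each (mid-range of 5k-10k)."""
    return math.ceil(read_qps / qps_per_replica)
```

For example, 5,000 RPS at ~2,000 RPS per server gives 3 app servers, which sits inside the 2-5 range quoted above. In an interview, stating the division explicitly shows you can turn capacity assumptions into concrete counts.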
Interview Tip: Structuring Scalability Discussion

  • Start by understanding current traffic and system limits.
  • Identify the first bottleneck clearly (usually the database).
  • Discuss how traffic growth affects each component.
  • Propose targeted solutions: caching, read replicas, horizontal scaling.
  • Mention trade-offs and cost implications.
  • Use real numbers to show understanding.

Self Check Question

Question: Your database handles 1,000 QPS. Traffic grows 10x to 10,000 QPS. What do you do first and why?

Answer: The first step is to add read replicas to distribute read queries and reduce load on the primary database. This helps handle increased read traffic without immediate costly sharding or redesign.
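Read replicas only help if the application routes reads to them while writes still go to the primary. A minimal routing sketch under that assumption, using a naive statement check (real systems typically rely on the driver or a proxy for this):

```python
import itertools

class ReplicaRouter:
    """Send writes to the primary; round-robin reads across replicas."""

    def __init__(self, primary, replicas):
        self._primary = primary
        self._replicas = itertools.cycle(replicas)  # simple round-robin

    def for_query(self, sql: str):
        # Naive classification: only SELECTs are safe to serve from a replica.
        if sql.lstrip().upper().startswith("SELECT"):
            return next(self._replicas)
        return self._primary
```

One trade-off worth naming in the interview: replicas lag the primary slightly, so reads that must see a just-committed write should still be routed to the primary.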

Key Result
The database is usually the first bottleneck as traffic grows; scaling it with read replicas, caching, and sharding is key to handling increasing user requests efficiently.