| Users/Traffic | Requests per Second | Server Load | Database Load | Network Bandwidth | Storage Needs |
|---|---|---|---|---|---|
| 100 users | ~10-50 RPS | Single server handles easily | Single DB instance sufficient | Low bandwidth usage | Minimal storage |
| 10,000 users | ~1,000-5,000 RPS | Multiple app servers needed | DB nearing capacity, may need read replicas | Moderate bandwidth usage | Growing storage, backups needed |
| 1,000,000 users | ~100,000 RPS | Horizontal scaling essential | DB sharding or distributed DB needed | High bandwidth, CDN recommended | Large storage, tiered storage useful |
| 100,000,000 users | ~10,000,000 RPS | Massive cluster of servers | Multiple shards, distributed DB clusters | Very high bandwidth, global CDN | Petabytes of storage, archival systems |
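The RPS figures in the table follow from a rough traffic assumption of ~0.1-0.5 requests per second per active user (my assumption for illustration; the source table implies roughly this ratio). A minimal back-of-envelope sketch:

```python
# Back-of-envelope traffic estimate behind the table above.
# Assumption (not from the source): each active user averages ~0.1-0.5 RPS.
def estimated_rps(users, rps_per_user_low=0.1, rps_per_user_high=0.5):
    """Return a (low, high) RPS range for a given user count."""
    return users * rps_per_user_low, users * rps_per_user_high

for users in (100, 10_000, 1_000_000, 100_000_000):
    low, high = estimated_rps(users)
    print(f"{users:>11,} users -> ~{low:,.0f}-{high:,.0f} RPS")
```

The larger rows of the table quote only the low end of this range; real per-user rates vary widely by product, so always measure before planning capacity.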
How scalability handles growing traffic in HLD: the evidence
At low to medium traffic, the database is usually the first bottleneck: it handles all data reads and writes and has limited query throughput (typically 5,000-10,000 queries per second for a single instance). As traffic grows, the DB's CPU, memory, and disk I/O are the first resources to saturate.
At higher traffic, application servers can become CPU or memory bottlenecks due to processing many concurrent requests. Network bandwidth can also become a bottleneck when serving large amounts of data or media.
- Database: Use read replicas to spread read load, connection pooling to reduce overhead, and sharding to split data across multiple DB instances.
- Application Servers: Add more servers horizontally behind load balancers to distribute traffic evenly.
- Caching: Use in-memory caches like Redis or Memcached to reduce DB load for frequent queries.
- Content Delivery Network (CDN): Offload static content delivery to edge servers closer to users, reducing bandwidth and latency.
- Storage: Use tiered storage and archival for old data to reduce expensive fast storage usage.
- At 10,000 users generating ~5,000 RPS, a single DB instance near max capacity (5,000-10,000 QPS) may need read replicas.
- Each app server can handle ~1,000-5,000 concurrent connections, so 2-5 app servers are needed at this scale.
- Network bandwidth at 1 Gbps (~125 MB/s) can handle roughly 10,000-20,000 requests per second depending on payload size.
- Storage grows with data; for example, 1 million users might generate terabytes of data requiring distributed storage.
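The rules of thumb above can be combined into a quick capacity check. A sketch under the simplifying assumption that every request hits the DB once (real systems cache and batch, so this overestimates DB load):

```python
import math

# Rule-of-thumb limits taken from the bullets above.
DB_MAX_QPS = 10_000               # upper bound for a single DB instance
SERVER_MAX_RPS = 1_000            # conservative per-app-server capacity
LINK_BYTES_PER_SEC = 125_000_000  # 1 Gbps is roughly 125 MB/s

def plan(rps, avg_payload_bytes):
    """Rough sizing: assumes one DB query per request (an assumption)."""
    return {
        "app_servers": math.ceil(rps / SERVER_MAX_RPS),
        "needs_read_replicas_or_sharding": rps > DB_MAX_QPS,
        "bandwidth_utilization": rps * avg_payload_bytes / LINK_BYTES_PER_SEC,
    }

print(plan(5_000, avg_payload_bytes=10_000))
```

At 5,000 RPS with ~10 KB payloads this yields 5 app servers, a DB still within single-instance range, and ~40% of a 1 Gbps link, matching the 10,000-user row of the table.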
How to discuss this in an interview:
- Start by understanding current traffic and system limits.
- Identify the first bottleneck clearly (usually the DB).
- Discuss how traffic growth affects each component.
- Propose targeted solutions: caching, read replicas, horizontal scaling.
- Mention trade-offs and cost implications.
- Use real numbers to demonstrate understanding.
Question: Your database handles 1,000 QPS. Traffic grows 10x to 10,000 QPS. What do you do first and why?
Answer: The first step is to add read replicas to distribute read queries and reduce load on the primary. Most workloads are read-heavy, so replicas absorb the bulk of the 10x growth without the cost and complexity of immediate sharding or a redesign.
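Once replicas exist, the application must split reads from writes. A minimal routing sketch (connection strings are hypothetical placeholders; note that replica reads can lag the primary slightly, so read-your-own-writes flows should still go to the primary):

```python
import random

# Read/write splitting at the application layer.
# Connection strings below are illustrative placeholders.
PRIMARY = "postgres://primary:5432/app"
REPLICAS = ["postgres://replica-1:5432/app", "postgres://replica-2:5432/app"]

def route(sql):
    """Send writes to the primary, spread reads across replicas."""
    is_read = sql.lstrip().upper().startswith("SELECT")
    return random.choice(REPLICAS) if is_read else PRIMARY
```

In practice this logic lives in an ORM, a driver, or a proxy layer rather than hand-rolled string checks, but the principle is the same: only the primary takes writes, and read traffic fans out.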