| Scale | Throughput (requests/sec) | Latency (ms) | Availability (%) | What Changes? |
|---|---|---|---|---|
| 100 users | ~50 | ~50 | 99.9 | Single server handles requests easily; low latency; simple setup |
| 10,000 users | ~5,000 | ~100 | 99.95 | Need load balancer; some caching; latency slightly increases |
| 1,000,000 users | ~500,000 | ~200 | 99.99 | Multiple servers; database replicas; CDN; latency affected by network |
| 100,000,000 users | ~50,000,000 | ~300+ | 99.999 | Global distribution; sharding; advanced caching; complex failover |
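The availability column above maps directly to allowed downtime. A minimal sketch of that conversion (the 365.25-day year is an assumption for the arithmetic):

```python
# Convert the availability percentages from the table into allowed
# downtime per year, assuming a 365.25-day year.

MINUTES_PER_YEAR = 365.25 * 24 * 60  # 525,960 minutes

def downtime_minutes_per_year(availability_pct: float) -> float:
    """Minutes of allowed downtime per year at a given availability."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)

for pct in (99.9, 99.95, 99.99, 99.999):
    print(f"{pct}% -> {downtime_minutes_per_year(pct):.1f} min/year")
```

At 99.9% a system may be down roughly 526 minutes (about 8.8 hours) a year; at 99.999% ("five nines") only about 5 minutes, which is why the higher rows of the table demand global failover.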
Throughput, Latency, and Availability in HLD - Scalability & System Analysis
At small scale, a single application server's CPU and memory comfortably handle the load, keeping throughput high and latency low.
As users grow to thousands, the database becomes the first bottleneck because it handles many read/write operations.
At millions of users, network bandwidth and latency limit performance, affecting availability.
- Horizontal scaling: Add more servers behind load balancers to increase throughput and keep latency stable under load.
- Caching: Use in-memory caches (like Redis) to reduce database load and improve latency.
- Database replication: Use read replicas to spread read traffic and improve availability.
- Sharding: Split database by user or data to handle large scale writes and reads.
- Content Delivery Network (CDN): Cache static content closer to users to reduce latency globally.
- Failover and redundancy: Use multiple data centers and automatic failover to improve availability.
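The caching technique above is most often applied as the cache-aside pattern: check the cache first, and fall back to the database only on a miss. A minimal sketch, where a plain dict stands in for Redis and `fetch_user_from_db` is a hypothetical stand-in for a real query:

```python
# Cache-aside sketch. A plain dict stands in for Redis here; in
# production you would swap it for a Redis client and set a TTL
# so entries expire instead of living forever.

cache: dict[str, str] = {}

def fetch_user_from_db(user_id: str) -> str:
    # Hypothetical stand-in for a real database query.
    return f"user-record-{user_id}"

def get_user(user_id: str) -> str:
    if user_id in cache:                      # cache hit: skip the database
        return cache[user_id]
    record = fetch_user_from_db(user_id)      # cache miss: read from the DB...
    cache[user_id] = record                   # ...then populate the cache
    return record
```

Every repeated read for the same user is now served from memory, which is how caching cuts database load and latency at the 10,000-user tier of the table.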
- At 1,000 users: ~500 requests/sec, easily handled by 1 server with a 1 Gbps network.
- At 1 million users: ~500,000 requests/sec, need ~100 servers (assuming 5,000 req/sec/server).
- Database: Single instance handles ~10,000 QPS; need replicas and sharding beyond that.
- Bandwidth: 1 Gbps = 125 MB/s; high throughput requires multiple network interfaces or data centers.
- Latency: Network and disk I/O dominate; caching reduces disk reads and improves response times.
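The server count above comes from a simple ratio, which can be sketched directly (the per-user request rate is implied by the table, and the per-server capacity is the assumption stated above):

```python
# Capacity estimate from the numbers above: total request rate divided
# by per-server capacity, rounded up to whole servers.

import math

REQ_PER_SEC_PER_USER = 0.5    # implied by the table: 100 users ~ 50 req/sec
SERVER_CAPACITY_RPS = 5_000   # assumed per-server limit from the list above

def servers_needed(users: int) -> int:
    total_rps = users * REQ_PER_SEC_PER_USER
    return math.ceil(total_rps / SERVER_CAPACITY_RPS)

print(servers_needed(1_000_000))  # -> 100
```

This kind of two-line estimate is usually all an interviewer expects: state the assumptions, multiply, divide, round up.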
Start by defining throughput, latency, and availability in simple terms.
Explain how each metric affects user experience and system design.
Discuss bottlenecks at different scales and propose targeted solutions.
Use real numbers to show understanding of system limits and scaling techniques.
Your database handles 1,000 QPS. Traffic grows 10x to 10,000 QPS. What do you do first?
Answer: Add read replicas to distribute read load and implement caching to reduce database hits. Consider sharding if writes also increase significantly.
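The read-replica half of that answer usually means splitting traffic at the application layer: writes go to the primary, reads rotate across replicas. A minimal sketch with hypothetical connection objects (a real client would execute the query instead of returning it):

```python
# Read/write splitting sketch: round-robin reads across replicas,
# all writes to the primary. Connection objects are hypothetical
# placeholders; here they are just labels for testing the routing.

import itertools

class ReplicatedDB:
    def __init__(self, primary, replicas):
        self.primary = primary
        self._replica_cycle = itertools.cycle(replicas)  # round-robin iterator

    def execute_read(self, query: str):
        replica = next(self._replica_cycle)  # spread reads across replicas
        return replica, query                # a real client would run the query

    def execute_write(self, query: str):
        return self.primary, query           # writes always hit the primary

db = ReplicatedDB("primary", ["replica-1", "replica-2"])
```

With two replicas, read capacity roughly triples while the primary only absorbs writes; if write volume itself grows past one machine, that is when sharding enters the picture.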