Design a Key-Value Store (HLD): Scalability & System Analysis

| Metric | 100 Users | 10K Users | 1M Users | 100M Users |
|---|---|---|---|---|
| Requests per second (QPS) | ~100 | ~10,000 | ~1,000,000 | ~100,000,000 |
| Data size | ~10 GB | ~1 TB | ~100 TB | ~10 PB |
| Number of servers | 1-2 | 10-20 | 100-200 | 10,000+ |
| Latency | <1 ms | <5 ms | <10 ms | <20 ms |
| Storage type | Local SSD | Distributed SSD | Sharded distributed storage | Multi-region distributed storage |
| Replication | Simple master-slave | Multi-replica for availability | Geo-replication | Global replication with consistency |
At small scale (100 users), the database server CPU and disk I/O are the first bottlenecks because a single server handles all requests and data.
At medium scale (10K to 1M users), the database query throughput and network bandwidth become bottlenecks as requests increase beyond a single server's capacity.
At large scale (100M users), data partitioning and cross-region replication latency become bottlenecks due to massive data size and global distribution.
- Vertical scaling: Upgrade server CPU, RAM, and SSDs for small scale.
- Horizontal scaling: Add more servers behind a load balancer to distribute requests.
- Sharding: Split data by key ranges or hash to distribute storage and load across servers.
- Caching: Use in-memory caches (e.g., Redis) to reduce database load for frequent reads.
- Replication: Use master-slave or multi-master replication for availability and read scaling.
- Consistent hashing: Map keys onto a hash ring so that adding or removing servers moves only a small fraction of keys.
- Geo-distribution: Deploy data centers closer to users to reduce latency at large scale.
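The sharding and consistent-hashing points above can be sketched together. Below is a minimal, illustrative hash ring with virtual nodes; the `HashRing` class and its method names are assumptions for this sketch, not any specific library's API:

```python
import bisect
import hashlib


class HashRing:
    """Minimal consistent-hash ring with virtual nodes (illustrative sketch)."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes  # virtual nodes per server smooth out key distribution
        self.ring = []        # sorted list of (hash, node) points on the ring
        for node in nodes:
            self.add(node)

    def _hash(self, key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add(self, node: str) -> None:
        # Place `vnodes` points for this server on the ring.
        for i in range(self.vnodes):
            bisect.insort(self.ring, (self._hash(f"{node}#{i}"), node))

    def remove(self, node: str) -> None:
        self.ring = [(h, n) for h, n in self.ring if n != node]

    def get(self, key: str) -> str:
        # The first ring point clockwise from the key's hash owns the key.
        idx = bisect.bisect_left(self.ring, (self._hash(key), ""))
        if idx == len(self.ring):
            idx = 0  # wrap around the ring
        return self.ring[idx][1]


ring = HashRing(["server-a", "server-b", "server-c"])
owner = ring.get("user:42")  # always routes this key to the same server
```

Because each server owns many small arcs of the ring, adding a fourth server takes roughly a quarter of the keys from the existing three, rather than reshuffling everything as naive `hash(key) % N` would.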
- At 1M QPS, assuming 1KB per request, bandwidth needed is ~1 GB/s (8 Gbps).
- Storage for 100 TB data requires multiple SSD servers; each SSD ~4 TB, so ~25 servers minimum.
- Each server handles ~5,000 QPS; for 1M QPS, need ~200 servers.
- Network infrastructure must support high throughput and low latency.
- Replication multiplies storage and bandwidth needs: a replication factor of 2 doubles them, a factor of 3 triples them.
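These estimates are easy to sanity-check in code. A quick back-of-envelope script using the section's own assumptions (1 KB per request, ~4 TB per SSD server, ~5,000 QPS per server, replication factor 2):

```python
# Back-of-envelope capacity check using the assumptions stated above.
qps = 1_000_000          # target requests per second
request_size = 1_000     # bytes per request (~1 KB)
data_size_tb = 100       # total data set, in TB
ssd_tb = 4               # usable capacity per SSD server, in TB
qps_per_server = 5_000   # sustainable QPS per server
replication = 2          # copies of each item

bandwidth_gbps = qps * request_size * 8 / 1e9       # bytes/s -> bits/s -> Gbps
storage_servers = data_size_tb * replication // ssd_tb
compute_servers = qps // qps_per_server

print(f"bandwidth: {bandwidth_gbps:.0f} Gbps")      # 8 Gbps
print(f"storage servers: {storage_servers}")        # 50 (25 without replication)
print(f"compute servers: {compute_servers}")        # 200
```

Note that the binding constraint differs: storage needs ~25-50 servers while QPS needs ~200, so at this scale the fleet is sized by request throughput, not by disk.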
Start by clarifying requirements: data size, read/write ratio, latency needs.
Discuss simple design first, then identify bottlenecks as scale grows.
Explain how each scaling solution addresses specific bottlenecks.
Use real numbers to justify design choices and trade-offs.
Your database handles 1000 QPS. Traffic grows 10x to 10,000 QPS. What do you do first?
Answer: Add read replicas and implement caching to reduce load on the primary database before scaling vertically or sharding.
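The "caching first" part of this answer is the cache-aside pattern. A minimal sketch, with a plain dict standing in for Redis and another for the primary database (the `CacheAsideStore` class and its counters are hypothetical, for illustration only):

```python
class CacheAsideStore:
    """Cache-aside reads: check the cache, fall back to the DB, then populate the cache."""

    def __init__(self, db: dict):
        self.db = db        # stand-in for the primary database
        self.cache = {}     # stand-in for an in-memory cache like Redis
        self.db_reads = 0   # counts how many reads actually hit the database

    def get(self, key):
        if key in self.cache:            # cache hit: no database load
            return self.cache[key]
        self.db_reads += 1               # cache miss: read from the database
        value = self.db.get(key)
        if value is not None:
            self.cache[key] = value      # populate cache for subsequent reads
        return value

    def put(self, key, value):
        self.db[key] = value
        self.cache.pop(key, None)        # invalidate; the next read refills the cache


store = CacheAsideStore({"user:1": "alice"})
store.get("user:1")   # miss: reads the database, fills the cache
store.get("user:1")   # hit: served from cache, database untouched
```

With a typical read-heavy workload, repeated reads are absorbed by the cache, which is why caching plus read replicas is usually the first lever before resorting to sharding.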
