| Scale | Users | Data Volume | Traffic | System Behavior |
|---|---|---|---|---|
| Small | 100 users | Low (MBs) | Low (few QPS) | Single server handles all; simple design |
| Medium | 10,000 users | Moderate (GBs) | Moderate (hundreds QPS) | Server CPU/memory stressed; DB load increases |
| Large | 1,000,000 users | High (TBs) | High (thousands QPS) | Single server insufficient; DB bottleneck; latency rises |
| Very Large | 100,000,000 users | Very High (PBs) | Very High (hundreds of thousands QPS) | Need distributed systems; partitioning; multi-region |
Why Distributed Patterns Solve Common Scalability Challenges in HLD
At small scale, a single server and database can handle all requests. As users grow, the database typically becomes the first bottleneck because a single instance can only process a limited number of queries per second (roughly 5,000-10,000 QPS, depending on hardware and query complexity). CPU and memory on the application server also come under pressure as traffic increases. Network bandwidth and storage limits appear at very large scale.
Distributed patterns solve these by spreading load across many machines, avoiding single points of failure and scaling horizontally.
- Horizontal Scaling: Add more servers to share load, improving capacity and fault tolerance.
- Load Balancing: Distribute incoming requests evenly to prevent overload on any one server.
- Database Replication: Use read replicas to handle read-heavy traffic, reducing load on primary DB.
- Sharding: Split data across multiple databases by key ranges or hashes to handle large data volumes.
- Caching: Store frequent data in fast memory (e.g., Redis) to reduce database hits.
- Message Queues: Decouple components and smooth traffic spikes by asynchronous processing.
- CDNs: Cache static content closer to users to reduce bandwidth and latency.
- Multi-region Deployment: Place servers near users to reduce latency and improve availability.
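To make the sharding bullet concrete, here is a minimal hash-based sharding sketch in Python. The shard count and key format are illustrative assumptions, not from any real deployment; production systems often use consistent hashing so that adding shards moves less data.

```python
import hashlib

NUM_SHARDS = 4  # illustrative; real systems pick this from capacity math

def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a key to a shard index using a stable hash (md5 for determinism)."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % num_shards

# The same key always routes to the same shard, while different keys
# spread roughly evenly across all shards.
shard = shard_for("user:12345")
```

Note the trade-off: simple modulo hashing remaps most keys when `num_shards` changes, which is why consistent hashing is preferred when shards are added or removed frequently.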
Example for 1 million users with 1 request per second each:
- Requests per second: 1,000,000 QPS (too high for single DB)
- Database capacity: Single DB ~10,000 QPS -> Need ~100 DB shards or replicas
- Network bandwidth: 1 Gbps = 125 MB/s; 1M QPS with 1 KB payload = ~1 GB/s -> Need multiple network links
- Storage: TBs of data requiring distributed storage solutions
- Servers: Hundreds of app servers behind load balancers
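The numbers above can be reproduced with a short back-of-envelope calculation. All inputs are the assumptions already stated in this example (1 req/s per user, ~10,000 QPS per DB instance, ~1 KB payload, 1 Gbps links); they are estimates, not measurements.

```python
import math

users = 1_000_000
qps_per_user = 1                    # assumption: 1 request/second per user
total_qps = users * qps_per_user    # 1,000,000 QPS overall

db_capacity_qps = 10_000            # rough single-instance ceiling
db_shards_needed = math.ceil(total_qps / db_capacity_qps)

payload_bytes = 1024                # ~1 KB per response
bandwidth_bytes_per_s = total_qps * payload_bytes   # ~1 GB/s
link_capacity_bytes = 125_000_000   # 1 Gbps = 125 MB/s
links_needed = math.ceil(bandwidth_bytes_per_s / link_capacity_bytes)

print(total_qps, db_shards_needed, links_needed)
```

This confirms the ~100 shards figure and shows that a single 1 Gbps link falls far short of the ~1 GB/s of response traffic.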
How to structure your answer:
1. Describe the current system's limits at small scale.
2. Identify the first bottleneck as traffic grows.
3. Explain why that component breaks (e.g., the single-instance DB QPS limit).
4. Propose distributed solutions step by step, explaining how each solves a specific problem.
5. Use real numbers to justify your choices.
6. Close with trade-offs and monitoring needs.
Question: Your database handles 1000 QPS. Traffic grows 10x to 10,000 QPS. What do you do first and why?
Answer: Add read replicas first to offload read queries from the primary database, since most web workloads are read-heavy and replicas raise read capacity without changing the data model. Pair this with caching for hot reads. If write volume also grows, plan for sharding or for scaling the primary vertically.
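A minimal sketch of the read/write split behind that answer. The `execute()` connection interface is hypothetical; real drivers (psycopg2, JDBC, etc.) differ, and the naive SELECT check ignores replication lag, which a real router must account for.

```python
import random

class ReplicatedDB:
    """Route writes to the primary and spread reads across replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = replicas

    def execute(self, sql, *params):
        # Writes must go to the primary; plain reads can go to any
        # replica, multiplying read capacity by the replica count.
        if sql.lstrip().upper().startswith("SELECT"):
            target = random.choice(self.replicas)
        else:
            target = self.primary
        return target.execute(sql, *params)
```

The design choice here is transparency: the application keeps one `execute()` call site, and capacity scales by adding replicas behind it.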
