| Users/Traffic | Vertical Scaling | Horizontal Scaling |
|---|---|---|
| 100 users | Single server with moderate CPU/RAM handles load easily | Single server; no need for multiple servers |
| 10,000 users | Upgrade server CPU, RAM, and storage to handle more load | Add a few servers behind a load balancer to share traffic |
| 1,000,000 users | Server hardware limits reached; expensive and risky upgrades | Many servers distributed; load balancer and data partitioning needed |
| 100,000,000 users | Vertical scaling impractical; single server cannot handle load | Large cluster of servers; complex orchestration and sharding required |
Vertical scaling vs horizontal scaling in HLD - Scaling Approaches Compared
With vertical scaling, the first bottleneck is the physical limit of a single server's CPU, memory, and storage. Each upgrade typically means downtime and disproportionately higher cost. Once the largest available machine is reached, no further upgrade is possible.
With horizontal scaling, the first bottleneck is often the coordination layer: the load balancer, shared state, or keeping the database consistent across nodes. All of these grow more complex as the number of servers increases.
- Vertical Scaling: Upgrade server hardware (CPU, RAM, SSD). Simple, but limited by maximum hardware specs and cost.
- Horizontal Scaling: Add more servers to distribute load. Requires load balancers, data partitioning (sharding), and stateless design.
- Caching: Use a cache (such as Redis) to reduce load on application servers and databases. This helps with either scaling approach.
- Database Scaling: Scale vertically with a bigger database server. Scale horizontally with read replicas (for reads) and sharding (for reads and writes).
- Network: Use a CDN to serve static content from edge locations, which reduces load on origin servers. This is especially common in horizontally scaled systems.
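Data partitioning (sharding), mentioned in the list above, needs a rule that maps each record to a server. The sketch below shows simple hash-mod-N routing. It assumes records are keyed by a string such as a user ID and that shards are numbered 0 to N-1; both are illustrative assumptions, and `shard_for` is a hypothetical helper.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Return the shard index in [0, num_shards) that owns `key` (hypothetical helper)."""
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    # Stable hash: the same key maps to the same shard on every server and after restarts.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an unsigned integer, then reduce modulo the shard count.
    return int.from_bytes(digest[:8], "big") % num_shards


if __name__ == "__main__":
    # Distribute a few example user IDs across 4 shards.
    placement: dict[int, list[str]] = {i: [] for i in range(4)}
    for user_id in ["alice", "bob", "carol", "dave", "erin"]:
        placement[shard_for(user_id, 4)].append(user_id)
    print(placement)

    # Growing from 4 to 5 shards moves most keys under plain modulo hashing.
    keys = [f"user{i}" for i in range(1000)]
    moved = sum(shard_for(k, 4) != shard_for(k, 5) for k in keys)
    print(f"{moved} of {len(keys)} keys change shard when going from 4 to 5 shards")
```

The sketch uses a stable hash (SHA-256) because Python's built-in `hash()` for strings is randomized per process. The trade-off of plain modulo is that changing the shard count remaps most keys. Consistent hashing is the usual remedy when shards are added or removed often.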
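Assuming each server handles 1,000 requests per second (RPS):
- Vertical scaling: Upgrading one server to handle 5,000 RPS costs about 3x as much in hardware, and it is still capped by the maximum available specs.
- Horizontal scaling: 5 servers at 1,000 RPS each provide 5,000 RPS. Cost grows roughly linearly, but operational complexity increases.
- Storage: Vertical scaling requires bigger disks, while horizontal scaling distributes data across servers.
- Bandwidth: A 1 Gbps link carries at most ~125 MB/s (1,000 Mb/s ÷ 8 bits per byte). Each additional server brings its own link, so aggregate bandwidth grows with server count, up to the limits of the shared upstream network.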
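The sizing arithmetic above can be checked with a small helper. This is a sketch. The `headroom` parameter, which reserves spare capacity for traffic spikes, is an assumption that is not part of the original estimate, and the function names are hypothetical.

```python
import math


def servers_needed(target_rps: float, rps_per_server: float, headroom: float = 0.0) -> int:
    """Servers required to serve target_rps (hypothetical helper).

    headroom is an assumed fraction of each server kept idle for spikes,
    e.g. 0.2 means each server is planned to run at 80% of its capacity.
    """
    if rps_per_server <= 0 or not 0.0 <= headroom < 1.0:
        raise ValueError("need rps_per_server > 0 and 0 <= headroom < 1")
    usable_rps = rps_per_server * (1.0 - headroom)
    return math.ceil(target_rps / usable_rps)


def gbps_to_mb_per_s(gbps: float) -> float:
    """Convert gigabits/s to megabytes/s (decimal units, 8 bits per byte)."""
    return gbps * 1000 / 8


if __name__ == "__main__":
    print(servers_needed(5_000, 1_000))                # horizontal example from the text
    print(servers_needed(5_000, 1_000, headroom=0.2))  # same load, 20% spare per server
    print(gbps_to_mb_per_s(1))                         # one 1 Gbps link
    print(4 * gbps_to_mb_per_s(1))                     # aggregate across 4 servers
```

Rounding up with `math.ceil` matters because a fractional server cannot be deployed. Any remainder in load still needs one more machine.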
Start by explaining vertical scaling as upgrading a single machine's resources. Then contrast with horizontal scaling by adding more machines. Discuss pros and cons, bottlenecks, and when to choose each. Use simple examples like upgrading a car engine vs buying more cars to share the load.
Your database handles 1,000 QPS. Traffic grows 10x to 10,000 QPS. What do you do first?
Answer: Assuming the workload is read-heavy, as is common, add read replicas to spread read queries and reduce load on the primary database. Do this before considering vertical upgrades or sharding.
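Adding read replicas usually means the application, or a proxy in front of the database, must send writes to the primary and reads to the replicas. The sketch below shows that routing under two simplifications: connections are identified by plain strings, and any statement starting with `SELECT` is treated as a read. `ReadWriteRouter` is a hypothetical class, not a library API.

```python
import itertools


class ReadWriteRouter:
    """Route writes to the primary and spread reads round-robin over replicas (sketch)."""

    def __init__(self, primary: str, replicas: list[str]):
        self.primary = primary
        # Round-robin iterator over replicas; None means "no replicas, use the primary".
        self._replicas = itertools.cycle(replicas) if replicas else None

    def route(self, sql: str) -> str:
        # Simplification: only statements beginning with SELECT count as reads.
        is_read = sql.lstrip().upper().startswith("SELECT")
        if is_read and self._replicas is not None:
            return next(self._replicas)
        return self.primary


if __name__ == "__main__":
    router = ReadWriteRouter("db-primary", ["db-replica-1", "db-replica-2"])
    for query in [
        "SELECT * FROM users WHERE id = 1",
        "SELECT * FROM orders",
        "UPDATE users SET name = 'a' WHERE id = 1",
        "SELECT count(*) FROM users",
    ]:
        print(f"{router.route(query):<13} <- {query}")
```

Replicas typically apply changes asynchronously, so a read sent to a replica right after a write may return stale data. Reads that must see the latest write can be routed to the primary instead. If the 10x growth is mostly writes, replicas do not help, and vertical upgrades or sharding become the next options.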