| Users | Traffic Characteristics | API Server Load | Database Load | Network | Other Changes |
|---|---|---|---|---|---|
| 100 users | Low requests/sec (~10-50) | Single server handles easily | Single DB instance, low QPS | Minimal bandwidth | Basic logging, no caching needed |
| 10,000 users | Moderate requests/sec (~1,000-5,000) | Multiple API servers behind load balancer | DB handles ~1,000 QPS, may need read replicas | Moderate bandwidth, consider CDN for static | Introduce caching (Redis), rate limiting |
| 1,000,000 users | High requests/sec (~100,000+) | Horizontal scaling of API servers, autoscaling | DB bottleneck likely, sharding or partitioning needed | High bandwidth, CDN critical for static and some API responses | Advanced caching, API gateway, throttling, monitoring |
| 100,000,000 users | Very high requests/sec (millions) | Global distributed API servers, multi-region load balancing | Multiple DB clusters, geo-distributed, complex sharding | Very high bandwidth, multi-CDN strategy | Microservices, event-driven, circuit breakers, extensive monitoring |
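The table mentions rate limiting and throttling from 10,000 users onward. A common building block for this is a token bucket; below is a minimal in-memory sketch (production gateways use distributed limiters, e.g. backed by Redis, so per-process state like this is an assumption for illustration):

```python
import time

class TokenBucket:
    """Token-bucket rate limiter: refills at `rate` tokens/sec up to `capacity`."""
    def __init__(self, rate: float, capacity: int):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate=100, capacity=10)  # ~100 req/s steady, bursts of 10
print(bucket.allow())  # True
```

`capacity` controls burst tolerance while `rate` caps sustained throughput; requests that return `False` would get an HTTP 429 at the gateway.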
## REST API best practices in HLD - Scalability & System Analysis
At small to medium scale, the database is usually the first bottleneck: it struggles with rising query volume and complex joins, while API servers and the network typically absorb load better at first. Without caching, DB load grows roughly linearly with the user count.
- Horizontal scaling: Add more API servers behind load balancers to handle more concurrent requests.
- Caching: Use Redis or Memcached to cache frequent API responses and reduce DB load.
- Database read replicas: Offload read queries to replicas to reduce primary DB load.
- Sharding/Partitioning: Split database by user ID or region to distribute load.
- CDN: Cache static assets and some API responses close to users to reduce latency and bandwidth.
- API Gateway: Manage rate limiting, authentication, and routing efficiently.
- Monitoring and throttling: Detect and control traffic spikes to protect backend.
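The caching bullet above is usually implemented as cache-aside: check the cache first, and only fall through to the database on a miss. A minimal sketch, using an in-memory dict with TTLs as a stand-in for Redis (the `get_user`/`db_fetch` names are illustrative, not a real API):

```python
import time

cache = {}  # key -> (value, expires_at); stand-in for Redis

def get_user(user_id, db_fetch, ttl=60):
    """Cache-aside read: serve from cache if fresh, else hit the DB and cache the result."""
    entry = cache.get(user_id)
    if entry and entry[1] > time.monotonic():
        return entry[0]                      # cache hit: no DB query
    value = db_fetch(user_id)                # cache miss: exactly one DB query
    cache[user_id] = (value, time.monotonic() + ttl)
    return value

calls = []
fetch = lambda uid: calls.append(uid) or {"id": uid}
get_user(42, fetch)
get_user(42, fetch)
print(len(calls))  # 1: the second read was served from cache
```

With Redis the same pattern maps onto `GET` and `SETEX`; the TTL bounds staleness and keeps hot keys from pinning the DB.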
- At 10,000 users, expect ~1,000 QPS. A single DB instance can typically sustain ~5,000 QPS, so one instance still suffices.
- At 1M users, ~100,000 QPS is likely. At ~5,000 QPS per instance, that means ~20 read replicas or sharded clusters.
- Bandwidth: 1 Gbps = 125 MB/s. For 100,000 QPS with 1 KB responses, you need ~100 MB/s, which fits within a single 1 Gbps link.
- API servers: each handles ~5,000 requests/sec, so 100,000 QPS needs ~20 servers.
- Caching reduces DB QPS by 50-90%, lowering infrastructure cost.
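The estimates above can be wrapped into one back-of-envelope function. All the per-instance figures are the rules of thumb from the bullets, not measured numbers:

```python
def estimate(qps, db_qps_per_instance=5000, server_qps=5000,
             resp_kb=1.0, cache_hit=0.0):
    """Back-of-envelope sizing from the rules of thumb above (all figures are assumptions)."""
    db_qps = qps * (1 - cache_hit)                        # caching absorbs hits before the DB
    return {
        "api_servers": -(-qps // server_qps),             # ceiling division
        "db_instances": -(-int(db_qps) // db_qps_per_instance),
        "bandwidth_MBps": qps * resp_kb / 1000,           # resp_kb KB per response
    }

print(estimate(100_000))                 # ~20 servers, ~20 DB instances, ~100 MB/s
print(estimate(100_000, cache_hit=0.9))  # a 90% cache hit rate cuts DB instances to 2
```

The second call shows the 50-90% reduction claim concretely: the same traffic needs a tenth of the database capacity once most reads hit the cache.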
Structure your scalability discussion by first identifying the main components: API servers, database, network. Then estimate load at different user scales. Identify the first bottleneck (usually DB). Propose targeted solutions like caching, read replicas, and sharding. Discuss trade-offs and monitoring. Keep explanations simple and focused.
Question: Your database handles 1000 QPS. Traffic grows 10x. What do you do first?
Answer: Introduce caching to reduce DB queries and add read replicas to distribute read load. This relieves DB pressure before considering sharding or more complex solutions.
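The read-replica half of this answer needs the application (or a proxy) to split traffic: writes go to the primary, reads fan out across replicas. A minimal routing sketch; the `Router` class and string handles are hypothetical stand-ins for real connection objects from a driver or ORM:

```python
import itertools

class Router:
    """Route reads round-robin across replicas and writes to the primary
    (sketch; real DB connections would sit behind these handles)."""
    def __init__(self, primary, replicas):
        self.primary = primary
        self._replicas = itertools.cycle(replicas)

    def execute(self, sql):
        # Naive read detection: treat SELECT statements as reads.
        is_read = sql.lstrip().upper().startswith("SELECT")
        target = next(self._replicas) if is_read else self.primary
        return target, sql

router = Router("primary", ["replica-1", "replica-2"])
print(router.execute("SELECT * FROM users")[0])   # replica-1
print(router.execute("SELECT * FROM users")[0])   # replica-2
print(router.execute("UPDATE users SET ...")[0])  # primary
```

One trade-off to mention in the interview: replicas lag the primary, so reads that must see a just-committed write should still go to the primary.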