| Users/Traffic | What Changes? |
|---|---|
| 100 users | Single API server handles requests; simple database; no caching needed; low latency. |
| 10,000 users | Need load balancer; multiple API servers; database read replicas; introduce caching layer (e.g., Redis); rate limiting. |
| 1,000,000 users | API servers scaled horizontally with auto-scaling; database sharding; CDN for static content; advanced caching; API gateway for routing and security; asynchronous processing for heavy tasks. |
| 100,000,000 users | Global distributed API servers; multi-region database clusters; aggressive caching and edge computing; microservices split; strict rate limiting and quota management; event-driven architecture for scalability. |
## REST API design for systems in HLD - Scalability & System Analysis
At small to medium scale, the database is the first bottleneck. It struggles to keep up with high query rates and the many concurrent connections opened by API servers, which increases latency and risks downtime.
- Database: Use read replicas to distribute read load; implement connection pooling; shard data by user or region.
- API Servers: Scale horizontally behind a load balancer; use stateless design for easy scaling.
- Caching: Add in-memory caches (Redis/Memcached) to reduce database hits for frequent queries.
- CDN: Serve static content and cache API responses at edge locations to reduce latency.
- API Gateway: Manage routing, authentication, rate limiting, and monitoring centrally.
- Asynchronous Processing: Offload heavy or long-running tasks to background workers or message queues.
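The caching bullet above is usually implemented as the cache-aside pattern: check the cache first, and only hit the database on a miss. A minimal Python sketch, using an in-memory dict as a stand-in for Redis; `fetch_user_from_db`, the key format, and the TTL are illustrative assumptions:

```python
import time

# Stand-in for Redis: key -> (value, expiry timestamp)
cache = {}
CACHE_TTL_SECONDS = 60  # illustrative TTL

def fetch_user_from_db(user_id):
    # Placeholder for a real database query (assumption)
    return {"id": user_id, "name": f"user-{user_id}"}

def get_user(user_id):
    """Cache-aside read path: cache hit avoids the database entirely."""
    key = f"user:{user_id}"
    entry = cache.get(key)
    if entry is not None and entry[1] > time.time():
        return entry[0]  # cache hit: no database round-trip
    value = fetch_user_from_db(user_id)  # cache miss: query the DB
    cache[key] = (value, time.time() + CACHE_TTL_SECONDS)
    return value
```

With a real Redis deployment the dict lookup becomes a `GET` and the write a `SET` with an expiry; the TTL bounds how stale a cached entry can get, which is the consistency-vs-load trade-off caching introduces.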
- At 10,000 users, expect ~1,000 QPS (queries per second), assuming 1 request per user every 10 seconds.
- Database: must handle ~1,000 QPS. A single PostgreSQL instance can typically sustain up to ~5,000 QPS for simple queries, so one instance suffices today, but read replicas add headroom for traffic spikes and further growth.
- API servers: each can handle ~2,000 concurrent connections; 3-5 servers are recommended for redundancy and load distribution.
- Bandwidth: 1,000 QPS of ~1 KB JSON payloads is only ~1 MB/s, well within a 1 Gbps (~125 MB/s) link.
- Storage: depends on data retention; at 1 million users, expect tens to hundreds of GB of new data per month.
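The estimates above are simple arithmetic; a quick back-of-envelope check in Python, using the rough figures stated above (request rate, payload size, and link capacity are all assumptions from the list):

```python
# Back-of-envelope capacity check for the 10,000-user scenario
users = 10_000
requests_per_user_per_sec = 1 / 10              # one request every 10 seconds
qps = users * requests_per_user_per_sec         # expected load: 1,000 QPS

payload_kb = 1                                  # typical JSON payload (assumption)
bandwidth_mb_per_sec = qps * payload_kb / 1024  # resulting traffic volume

link_capacity_mb_per_sec = 125                  # 1 Gbps network
print(f"QPS: {qps:.0f}, bandwidth: {bandwidth_mb_per_sec:.2f} MB/s "
      f"of {link_capacity_mb_per_sec} MB/s available")
```

Working the numbers like this in an interview shows that at this scale the network is nowhere near saturation, which is why the database, not bandwidth, is the bottleneck to discuss first.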
Start by clarifying API usage patterns and expected traffic. Identify the main components: API servers, database, caching, and network. Discuss bottlenecks in order: database first, then servers, then network. Propose scaling solutions step-by-step with reasons. Mention trade-offs like consistency vs availability. Keep answers structured and focused.
Your database handles 1000 QPS. Traffic grows 10x to 10,000 QPS. What do you do first?
Answer: Add read replicas to distribute read queries and reduce load on the primary database. Also, implement caching to reduce database hits. This addresses the database bottleneck before scaling API servers.
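The answer above, replicas for reads and the primary for writes, implies a routing layer in front of the database connections. A minimal Python sketch with round-robin replica selection; the string connection names are illustrative stand-ins for real connection objects:

```python
import itertools

class ReadWriteRouter:
    """Route writes to the primary; spread reads round-robin across replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self._replicas = itertools.cycle(replicas)  # endless round-robin

    def connection_for(self, is_write):
        # Writes must go to the primary to keep a single source of truth;
        # reads can tolerate slight replication lag, so any replica works.
        return self.primary if is_write else next(self._replicas)

router = ReadWriteRouter("primary-db", ["replica-1", "replica-2"])
```

Note the trade-off this encodes: replicas lag the primary slightly, so reads may return marginally stale data, which is the consistency-vs-availability point worth raising in the interview.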