| Users | Requests per Second (QPS) | Data Size | Latency Requirement | System Changes |
|---|---|---|---|---|
| 100 users | ~50 QPS | Thousands of keywords | <1 second | Single server, in-memory trie, simple cache |
| 10,000 users | ~5,000 QPS | Millions of keywords and queries | <200 ms | Load balancer, multiple app servers, Redis cache, read replicas |
| 1,000,000 users | ~500,000 QPS | Hundreds of millions keywords, user personalization data | <100 ms | Sharded databases, distributed cache, CDN for static assets, async updates |
| 100,000,000 users | ~50 million QPS | Billions of keywords, global personalization | <50 ms | Global data centers, multi-level caching, advanced sharding, ML model serving clusters |
Design a search autocomplete in HLD - Scalability & System Analysis
The first bottleneck is the database that stores keywords and query statistics. At low scale, a single database can handle autocomplete queries. As users grow, the database query rate and data size increase, causing slow response times and connection limits.
- Read Replicas: Add read-only database replicas to distribute query load.
- Caching: Use in-memory caches (Redis, Memcached) to store popular autocomplete results.
- Horizontal Scaling: Add more application servers behind a load balancer to handle concurrent requests.
- Sharding: Partition the keyword data by prefix or user region to reduce single database load.
- CDN: Cache static autocomplete assets globally to reduce latency.
- Asynchronous Updates: Update autocomplete indexes asynchronously to avoid blocking queries.
At 10,000 users with 5,000 QPS, assuming each autocomplete request is ~1 KB, bandwidth needed is ~5 MB/s. A Redis instance can handle ~100K ops/sec, so a few Redis nodes suffice. Database must handle 5K QPS; multiple read replicas needed. Storage for millions of keywords and user data can be tens of GBs. Network and CPU scale with request volume.
Start by clarifying scale and latency needs. Identify bottlenecks step-by-step: database, cache, network. Propose solutions matching bottlenecks: caching for read-heavy, sharding for data size, horizontal scaling for concurrency. Discuss trade-offs and monitoring strategies.
Your database handles 1000 QPS. Traffic grows 10x to 10,000 QPS. What do you do first?
Answer: Add read replicas to distribute read queries and reduce load on the primary database. Also, implement caching for popular autocomplete results to reduce database hits.
