Search and recommendation in HLD - Scalability & System Analysis

| Scale | Users | Search Queries/Second | Recommendation Requests/Second | Data Size (Items, User Profiles) | System Changes |
|---|---|---|---|---|---|
| Small | 100 | 10 | 5 | 1K items, 100 profiles | Single server, simple DB, no caching |
| Medium | 10K | 1K | 500 | 100K items, 10K profiles | Load balancer, DB replicas, caching layer |
| Large | 1M | 100K | 50K | 10M items, 1M profiles | Distributed search cluster, sharded DB, ML model serving |
| Very Large | 100M | 10M | 5M | 1B+ items, 100M profiles | Multi-region deployment, CDN, advanced sharding, real-time streaming |
At small scale, a single database handles all search queries and recommendation data lookups, and it becomes the first bottleneck: as traffic grows, that one instance cannot keep up with the query load and latency increases.
- Database scaling: Add read replicas to distribute query load and use connection pooling.
- Caching: Use in-memory caches (e.g., Redis) for frequent queries and recommendation results.
- Search scaling: Deploy distributed search engines (e.g., Elasticsearch) to handle large data and queries.
- Sharding: Partition user profiles and item data across multiple databases to reduce single DB load.
- Horizontal scaling: Add more application servers behind load balancers to handle increased traffic.
- CDN: Use content delivery networks to cache static recommendation content closer to users.
- ML model serving: Use dedicated servers or services for recommendation model inference to offload app servers.
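The sharding point above can be sketched with a simple hash-mod routing scheme; the `NUM_SHARDS` value and function names here are illustrative, not a production design:

```python
import hashlib

NUM_SHARDS = 4  # illustrative shard count

def shard_for(user_id: str) -> int:
    """Route a user profile to a shard by hashing its key.

    Hash-mod routing is the simplest scheme, but it reshuffles most
    keys whenever NUM_SHARDS changes; consistent hashing is the usual
    fix at larger scale.
    """
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# Each shard now sees roughly 1/NUM_SHARDS of the total query load.
print(shard_for("user_42"))  # deterministic shard index in 0..3
```

Because the routing is deterministic, every app server agrees on where a given profile lives without any coordination.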
At 1M users with 100K search queries/sec and 50K recommendation requests/sec:
- Database: Must handle ~150K QPS (queries per second) combined. A single PostgreSQL instance sustains on the order of 10K QPS, so roughly 15 read replicas or shards are needed.
- Cache: Redis handles ~100K ops/sec per instance, so at least two Redis nodes (more for redundancy) are required for caching.
- Network bandwidth: Assuming ~1KB per request/response, total traffic is ~150 MB/s (~1.2 Gbps), so a single 1 Gbps link is insufficient; plan for a 10 Gbps link or multiple bonded 1 Gbps links.
- Storage: 10M items and 1M user profiles may require terabytes of storage, preferably on distributed storage systems.
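The estimates above can be checked with back-of-envelope arithmetic; the per-instance ceilings are the rough figures from the text, not measured benchmarks:

```python
# Capacity check for the 1M-user scale.
TOTAL_QPS = 100_000 + 50_000    # search + recommendation requests/sec
PG_QPS_PER_INSTANCE = 10_000    # assumed single-PostgreSQL ceiling
REDIS_OPS_PER_NODE = 100_000    # assumed single-Redis ceiling
BYTES_PER_REQUEST = 1_000       # ~1KB per request/response

# Ceiling division: -(-a // b) rounds up without importing math.
db_instances = -(-TOTAL_QPS // PG_QPS_PER_INSTANCE)
redis_nodes = -(-TOTAL_QPS // REDIS_OPS_PER_NODE)
bandwidth_gbps = TOTAL_QPS * BYTES_PER_REQUEST * 8 / 1e9

print(db_instances, redis_nodes, round(bandwidth_gbps, 1))  # 15 2 1.2
```

This is exactly the justify-with-numbers step interviewers look for: 15 DB instances, 2 Redis nodes, and ~1.2 Gbps of traffic.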
Start by clarifying scale and traffic patterns. Identify the main components: search engine, recommendation engine, database, cache, and network. Discuss bottlenecks at each scale and propose targeted solutions like caching, sharding, and horizontal scaling. Use real numbers to justify your choices and show understanding of trade-offs.
Your database handles 1000 QPS. Traffic grows 10x to 10,000 QPS. What do you do first and why?
Answer: Add read replicas and implement caching to reduce load on the primary database. This distributes query load and improves response times before considering more complex solutions like sharding.
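The answer above can be sketched as a cache-aside read path; the `cache` dict stands in for Redis and `query_replica` for a read-replica lookup (both hypothetical placeholders):

```python
cache: dict[str, str] = {}  # stands in for Redis

def query_replica(key: str) -> str:
    # Hypothetical replica lookup; a real system would pick one of
    # several read replicas behind a connection pool.
    return f"row-for-{key}"

def get(key: str) -> str:
    """Cache-aside: serve hot keys from memory, fall back to a replica."""
    if key in cache:
        return cache[key]        # cache hit: no database load at all
    value = query_replica(key)   # cache miss: one replica query
    cache[key] = value           # populate so repeat reads stay in memory
    return value

get("item:7")         # first read misses and queries a replica
print(get("item:7"))  # repeat read is served from cache: row-for-item:7
```

With a high hit rate, most of the 10,000 QPS never reaches the primary, which is why caching plus replicas is the first move before sharding.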
