
Search and recommendation in HLD - Scalability & System Analysis

Scalability Analysis - Search and recommendation
Growth Table: Search and Recommendation System
| Scale | Users | Search Queries/s | Recommendation Requests/s | Data Size (Items, User Profiles) | System Changes |
|---|---|---|---|---|---|
| Small | 100 | 10 | 5 | 1K items, 100 profiles | Single server, simple DB, no caching |
| Medium | 10K | 1K | 500 | 100K items, 10K profiles | Load balancer, DB replicas, caching layer |
| Large | 1M | 100K | 50K | 10M items, 1M profiles | Distributed search cluster, sharded DB, ML model serving |
| Very Large | 100M | 10M | 5M | 1B+ items, 100M profiles | Multi-region deployment, CDN, advanced sharding, real-time streaming |
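The "sharded DB" entries in the table can be sketched as simple hash partitioning: a stable hash of the key decides which database shard owns a record. The shard count and the `shard_for` helper below are illustrative, not a real deployment.

```python
import hashlib

# Hypothetical hash-based sharding: a user_id maps to one of NUM_SHARDS databases.
NUM_SHARDS = 4

def shard_for(user_id):
    # Stable hash so the same user always lands on the same shard,
    # regardless of which application server computes it.
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS
```

Note that with plain modulo hashing, changing `NUM_SHARDS` remaps almost every key; the "advanced sharding" at very large scale typically means consistent hashing or range-based partitioning to avoid that.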
First Bottleneck

Starting from small scale, the single database is the first bottleneck because it handles every search query and recommendation data lookup. As traffic grows, the query load exceeds what one instance can serve, and latency climbs.

Scaling Solutions
  • Database scaling: Add read replicas to distribute query load and use connection pooling.
  • Caching: Use in-memory caches (e.g., Redis) for frequent queries and recommendation results.
  • Search scaling: Deploy distributed search engines (e.g., Elasticsearch) to handle large data and queries.
  • Sharding: Partition user profiles and item data across multiple databases to reduce single DB load.
  • Horizontal scaling: Add more application servers behind load balancers to handle increased traffic.
  • CDN: Use content delivery networks to cache static recommendation content closer to users.
  • ML model serving: Use dedicated servers or services for recommendation model inference to offload app servers.
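The caching bullet above is the cache-aside pattern: check the cache first, and on a miss, read from the database and populate the cache. A minimal sketch, using a plain dict as a stand-in for Redis; `fetch_from_db` is a hypothetical database call:

```python
# In-memory dict standing in for a Redis instance.
cache = {}

def fetch_from_db(query):
    # Placeholder for a real database lookup.
    return f"results-for-{query}"

def search(query):
    # 1. Try the cache first.
    if query in cache:
        return cache[query]
    # 2. On a miss, hit the database and populate the cache.
    result = fetch_from_db(query)
    cache[query] = result
    return result
```

A production version would also set a TTL on each entry so stale search results and recommendations expire rather than being served forever.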
Back-of-Envelope Cost Analysis

At 1M users with 100K search queries/sec and 50K recommendation requests/sec:

  • Database: Needs to handle ~150K QPS (queries per second). A single PostgreSQL instance handles ~10K QPS, so at least 15 replicas or sharded DBs are needed.
  • Cache: Redis can handle ~100K ops/sec per instance, so multiple Redis nodes are required for caching.
  • Network bandwidth: Assuming ~1KB per query/response, total bandwidth is ~150MB/s (~1.2Gbps), so a single 1Gbps link is insufficient; aggregate multiple 1Gbps links or use a 10Gbps link.
  • Storage: 10M items and 1M user profiles may require terabytes of storage, preferably on distributed storage systems.
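The replica and bandwidth estimates above reduce to a few lines of arithmetic. The 10K QPS per-Postgres-instance ceiling is the same assumption as in the text, not a measured figure:

```python
import math

total_qps = 100_000 + 50_000        # search + recommendation requests per second
qps_per_postgres = 10_000           # assumed per-instance ceiling

# Minimum number of instances (replicas or shards) to absorb the load.
replicas = math.ceil(total_qps / qps_per_postgres)

bytes_per_request = 1_000           # ~1 KB per query/response pair
bandwidth_mb_s = total_qps * bytes_per_request / 1_000_000
bandwidth_gbps = bandwidth_mb_s * 8 / 1_000   # bytes -> bits, MB -> Gb
```

This yields 15 instances and ~150 MB/s (~1.2 Gbps), matching the figures in the list; in practice you would add headroom for traffic spikes and replication overhead.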
Interview Tip

Start by clarifying scale and traffic patterns. Identify the main components: search engine, recommendation engine, database, cache, and network. Discuss bottlenecks at each scale and propose targeted solutions like caching, sharding, and horizontal scaling. Use real numbers to justify your choices and show understanding of trade-offs.

Self Check

Your database handles 1000 QPS. Traffic grows 10x to 10,000 QPS. What do you do first and why?

Answer: Add read replicas and implement caching to reduce load on the primary database. This distributes query load and improves response times before considering more complex solutions like sharding.
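The read-replica half of this answer, writes to the primary, reads spread across replicas, can be sketched as a naive query router. All names are illustrative, and the SELECT-prefix check is a simplification of what a real proxy (e.g., PgBouncer plus application-level routing) would do:

```python
import itertools

# Hypothetical topology: one write primary, reads rotate round-robin.
PRIMARY = "db-primary"
REPLICAS = ["db-replica-1", "db-replica-2"]
_replica_cycle = itertools.cycle(REPLICAS)

def route(statement):
    # Naive classification: SELECTs are reads, everything else is a write.
    if statement.lstrip().upper().startswith("SELECT"):
        return next(_replica_cycle)
    return PRIMARY
```

One trade-off worth naming in an interview: replicas lag the primary slightly, so a read issued immediately after a write may see stale data unless it is pinned to the primary.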

Key Result
The database is the first bottleneck as traffic grows; scaling requires read replicas, caching, distributed search, and sharding to handle millions of users and queries efficiently.