
Design a search autocomplete in HLD - Scalability & System Analysis

Growth Table: Search Autocomplete Scaling
Users | Requests per Second (QPS) | Data Size | Latency Requirement | System Changes
100 users | ~50 QPS | Thousands of keywords | <1 second | Single server, in-memory trie, simple cache
10,000 users | ~5,000 QPS | Millions of keywords and queries | <200 ms | Load balancer, multiple app servers, Redis cache, read replicas
1,000,000 users | ~500,000 QPS | Hundreds of millions of keywords, user personalization data | <100 ms | Sharded databases, distributed cache, CDN for static assets, async updates
100,000,000 users | ~50 million QPS | Billions of keywords, global personalization | <50 ms | Global data centers, multi-level caching, advanced sharding, ML model serving clusters
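The first row of the table mentions a single-server, in-memory trie. A minimal sketch of that starting point (class names here are illustrative, not from any specific library):

```python
# Minimal in-memory trie for autocomplete, as used at the smallest scale
# in the table above. Suggestions are ranked by query frequency.
from collections import defaultdict

class TrieNode:
    def __init__(self):
        self.children = defaultdict(TrieNode)
        self.is_word = False
        self.frequency = 0  # query popularity, used for ranking

class Autocomplete:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word, frequency=1):
        node = self.root
        for ch in word:
            node = node.children[ch]
        node.is_word = True
        node.frequency += frequency

    def suggest(self, prefix, limit=5):
        # Walk down to the node for the prefix; no node means no matches.
        node = self.root
        for ch in prefix:
            if ch not in node.children:
                return []
            node = node.children[ch]
        # Collect all completions under the prefix, rank by frequency.
        results, stack = [], [(node, prefix)]
        while stack:
            cur, word = stack.pop()
            if cur.is_word:
                results.append((cur.frequency, word))
            for ch, child in cur.children.items():
                stack.append((child, word + ch))
        results.sort(reverse=True)
        return [w for _, w in results[:limit]]
```

At larger scales this exhaustive subtree walk becomes too slow, which is why production systems precompute and cache the top-k results per prefix.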
First Bottleneck

The first bottleneck is the database that stores keywords and query statistics. At low scale, a single database can serve autocomplete queries comfortably. As users grow, the query rate and data size both increase, causing slow response times and exhausted connection limits.

Scaling Solutions
  • Read Replicas: Add read-only database replicas to distribute query load.
  • Caching: Use in-memory caches (Redis, Memcached) to store popular autocomplete results.
  • Horizontal Scaling: Add more application servers behind a load balancer to handle concurrent requests.
  • Sharding: Partition the keyword data by prefix or user region to reduce single database load.
  • CDN: Cache static autocomplete assets globally to reduce latency.
  • Asynchronous Updates: Update autocomplete indexes asynchronously to avoid blocking queries.
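The sharding bullet above can be sketched as prefix-based routing: because an autocomplete lookup needs all completions of a prefix on one node, the shard is chosen from the query's first character. The shard count and routing function here are hypothetical:

```python
# Illustrative prefix-based shard routing for the "Sharding" bullet.
# All queries sharing a first character land on the same shard, so a
# trie-style lookup never crosses shard boundaries.
import hashlib

NUM_SHARDS = 4  # hypothetical cluster size

def shard_for_prefix(prefix: str) -> int:
    """Map a query prefix to a shard index via its first character."""
    first = prefix[:1].lower() or "_"  # empty prefix gets a default bucket
    digest = hashlib.md5(first.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS
```

One design note: hashing only the first character keeps routing simple but can create hot shards (e.g. many queries start with "s"); real systems often shard on longer prefixes or rebalance hot ranges.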
Back-of-Envelope Cost Analysis

At 10,000 users with ~5,000 QPS and ~1 KB per autocomplete request, bandwidth is roughly 5,000 × 1 KB ≈ 5 MB/s. A single Redis instance can handle on the order of 100K ops/sec, so one node covers the throughput with headroom; run a few for redundancy. The database must sustain 5K QPS of reads, so multiple read replicas are needed. Storage for millions of keywords plus user data runs to tens of GB. Network and CPU costs scale roughly linearly with request volume.
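The arithmetic above, written out (all inputs are the assumptions stated in the text):

```python
# Back-of-envelope numbers for the 10,000-user tier.
import math

qps = 5_000                  # assumed requests per second
request_kb = 1               # assumed size per autocomplete response
redis_ops_per_node = 100_000  # assumed single-node Redis throughput

bandwidth_mb_s = qps * request_kb / 1_000   # 5,000 KB/s = ~5 MB/s
redis_nodes = math.ceil(qps / redis_ops_per_node)  # 1 node for throughput;
                                                   # extra nodes are for redundancy
```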

Interview Tip

Start by clarifying scale and latency needs. Identify bottlenecks step-by-step: database, cache, network. Propose solutions matching bottlenecks: caching for read-heavy, sharding for data size, horizontal scaling for concurrency. Discuss trade-offs and monitoring strategies.

Self Check Question

Your database handles 1000 QPS. Traffic grows 10x to 10,000 QPS. What do you do first?

Answer: Add read replicas to distribute read queries and reduce load on the primary database. Also, implement caching for popular autocomplete results to reduce database hits.
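The caching half of that answer is the cache-aside pattern: check the cache first, and only fall back to the database on a miss. A minimal sketch, with plain dicts standing in for Redis and the primary database:

```python
# Cache-aside read path for autocomplete results. Class and method names
# are illustrative; dicts stand in for Redis and the database.
import time

class CacheAside:
    def __init__(self, db, ttl_seconds=60):
        self.db = db        # stand-in for the primary database
        self.cache = {}     # stand-in for Redis: prefix -> (value, expiry)
        self.ttl = ttl_seconds

    def get_suggestions(self, prefix):
        entry = self.cache.get(prefix)
        if entry and entry[1] > time.time():
            return entry[0]                  # cache hit: no database load
        value = self.db.get(prefix, [])      # cache miss: query the database
        self.cache[prefix] = (value, time.time() + self.ttl)
        return value
```

The TTL bounds staleness: popular prefixes are served from the cache for up to 60 seconds, which is why the database sees far fewer than 10,000 QPS even after the 10x traffic growth.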

Key Result
The database is the first bottleneck as user requests grow; adding read replicas and caching are the primary scaling steps before moving to sharding and global distribution.