HLDsystem_design~10 mins

Design a search autocomplete in HLD - Scalability & System Analysis

Choose your learning style9 modes available

Learn Why Deep Arch Practice Challenge Design Recall Scale

Scalability Analysis - Design a search autocomplete

Growth Table: Search Autocomplete Scaling

Users	Requests per Second (QPS)	Data Size	Latency Requirement	System Changes
100 users	~50 QPS	Thousands of keywords	<1 second	Single server, in-memory trie, simple cache
10,000 users	~5,000 QPS	Millions of keywords and queries	<200 ms	Load balancer, multiple app servers, Redis cache, read replicas
1,000,000 users	~500,000 QPS	Hundreds of millions keywords, user personalization data	<100 ms	Sharded databases, distributed cache, CDN for static assets, async updates
100,000,000 users	~50 million QPS	Billions of keywords, global personalization	<50 ms	Global data centers, multi-level caching, advanced sharding, ML model serving clusters

First Bottleneck

The first bottleneck is the database that stores keywords and query statistics. At low scale, a single database can handle autocomplete queries. As users grow, the database query rate and data size increase, causing slow response times and connection limits.

Scaling Solutions

Read Replicas: Add read-only database replicas to distribute query load.
Caching: Use in-memory caches (Redis, Memcached) to store popular autocomplete results.
Horizontal Scaling: Add more application servers behind a load balancer to handle concurrent requests.
Sharding: Partition the keyword data by prefix or user region to reduce single database load.
CDN: Cache static autocomplete assets globally to reduce latency.
Asynchronous Updates: Update autocomplete indexes asynchronously to avoid blocking queries.

Back-of-Envelope Cost Analysis

At 10,000 users with 5,000 QPS, assuming each autocomplete request is ~1 KB, bandwidth needed is ~5 MB/s. A Redis instance can handle ~100K ops/sec, so a few Redis nodes suffice. Database must handle 5K QPS; multiple read replicas needed. Storage for millions of keywords and user data can be tens of GBs. Network and CPU scale with request volume.

Interview Tip

Start by clarifying scale and latency needs. Identify bottlenecks step-by-step: database, cache, network. Propose solutions matching bottlenecks: caching for read-heavy, sharding for data size, horizontal scaling for concurrency. Discuss trade-offs and monitoring strategies.

Self Check Question

Your database handles 1000 QPS. Traffic grows 10x to 10,000 QPS. What do you do first?

Answer: Add read replicas to distribute read queries and reduce load on the primary database. Also, implement caching for popular autocomplete results to reduce database hits.

Key Result

The database is the first bottleneck as user requests grow; adding read replicas and caching are the primary scaling steps before moving to sharding and global distribution.