Bird
Raised Fist0
LLDsystem_design~25 mins

Search functionality design in LLD - System Design Exercise

Choose your learning style10 modes available

Start learning this pattern below

Jump into concepts and practice - no test required

or
Recommended
Test this pattern10 questions across easy, medium, and hard to know if this pattern is strong
Design: Search Functionality System
Design covers search query processing, indexing, ranking, and autocomplete. Does not cover user authentication or data ingestion pipelines.
Functional Requirements
FR1: Allow users to search for items by keywords
FR2: Support typo tolerance and partial matches
FR3: Return results ranked by relevance
FR4: Handle 1000 concurrent search requests
FR5: Provide search results within 300ms latency
FR6: Support filtering results by categories
FR7: Allow autocomplete suggestions as user types
Non-Functional Requirements
NFR1: System must be available 99.9% of the time
NFR2: Search index must update within 5 minutes of data changes
NFR3: Support up to 10 million searchable items
NFR4: Use scalable and cost-effective technologies
Think Before You Design
Questions to Ask
❓ Question 1
❓ Question 2
❓ Question 3
❓ Question 4
❓ Question 5
❓ Question 6
Key Components
Search index storage (e.g., inverted index)
Query parser and processor
Ranking and scoring module
Autocomplete suggestion engine
Cache layer for popular queries
Data update and index refresh mechanism
Design Patterns
Inverted index for fast keyword lookup
Trie or prefix tree for autocomplete
TF-IDF or BM25 ranking algorithms
Caching frequently searched queries
Batch and incremental index updates
Reference Architecture
User
  |
  v
Search API Server <--> Cache Layer <--> Search Index Storage
  |
  v
Autocomplete Service
  |
  v
Data Update Service --> Index Updater --> Search Index Storage
Components
Search API Server
Node.js or Python REST API
Receives search requests, parses queries, and returns ranked results
Cache Layer
Redis
Stores results of popular queries to reduce latency
Search Index Storage
Elasticsearch or Apache Lucene
Stores inverted index and supports fast keyword search and ranking
Autocomplete Service
Trie data structure in memory or Elasticsearch completion suggester
Provides real-time autocomplete suggestions as user types
Data Update Service
Batch jobs or streaming pipeline
Processes new or updated items and triggers index refresh
Index Updater
Elasticsearch bulk API or custom indexer
Updates the search index with new data within 5 minutes
Request Flow
1. User sends search query to Search API Server
2. Search API Server checks Cache Layer for cached results
3. If cache miss, Search API Server queries Search Index Storage
4. Search Index Storage returns ranked results based on query
5. Search API Server returns results to user and caches them
6. For autocomplete, user input is sent to Autocomplete Service
7. Autocomplete Service returns suggestions in real-time
8. Data Update Service processes new data and sends to Index Updater
9. Index Updater refreshes Search Index Storage within 5 minutes
Database Schema
Entities: - Item: id (PK), title, description, category, tags, updated_at - SearchIndex: inverted index structure mapping keywords to item ids - AutocompleteTrie: prefix tree nodes storing partial keywords Relationships: - Items are indexed into SearchIndex for keyword lookup - AutocompleteTrie built from item titles and keywords for suggestions
Scaling Discussion
Bottlenecks
Search API Server CPU and memory limits under high concurrency
Cache Layer memory capacity for popular queries
Search Index Storage disk I/O and query throughput
Index update latency with large data volumes
Autocomplete service response time with large prefix sets
Solutions
Scale Search API Server horizontally behind load balancer
Use distributed cache clusters and eviction policies
Shard search index across multiple nodes and use replicas
Implement incremental and parallel index updates
Optimize autocomplete data structures and cache hot prefixes
Interview Tips
Time: Spend 10 minutes clarifying requirements and constraints, 20 minutes designing architecture and data flow, 10 minutes discussing scaling and trade-offs, 5 minutes summarizing.
Clarify search use cases and data update frequency
Explain choice of inverted index and autocomplete structures
Describe caching strategy to reduce latency
Discuss ranking algorithms and relevance tuning
Address scaling challenges and solutions
Highlight trade-offs between freshness and performance

Practice

(1/5)
1. What is the main purpose of building an index in a search functionality system?
easy
A. To compress data for storage
B. To store user passwords securely
C. To display images faster on the screen
D. To quickly find data entries matching search keywords

Solution

  1. Step 1: Understand the role of an index in search

    An index maps keywords to data entries, enabling fast lookup instead of scanning all data.
  2. Step 2: Identify the correct purpose

    Since search needs to find matching data quickly, the index helps achieve this by direct access.
  3. Final Answer:

    To quickly find data entries matching search keywords -> Option D
  4. Quick Check:

    Index = Fast keyword lookup [OK]
Hint: Index means fast lookup, not storage or compression [OK]
Common Mistakes:
  • Confusing index with data compression
  • Thinking index stores passwords
  • Assuming index speeds up image display
2. Which data structure is commonly used to implement a search index for keyword lookup?
easy
A. Hash map
B. Linked list
C. Stack
D. Queue

Solution

  1. Step 1: Recall common data structures for fast lookup

    Hash maps provide average O(1) time for key-based access, ideal for mapping keywords to data.
  2. Step 2: Eliminate other options

    Linked lists, stacks, and queues do not provide efficient direct lookup by key.
  3. Final Answer:

    Hash map -> Option A
  4. Quick Check:

    Hash map = Fast key lookup [OK]
Hint: Hash maps give fast key-based access, perfect for indexes [OK]
Common Mistakes:
  • Choosing linked list which is slow for lookup
  • Confusing stack or queue with key-value storage
  • Ignoring average O(1) lookup time of hash maps
3. Consider a search system where the index maps keywords to document IDs. If the keyword 'apple' maps to [1, 3, 5] and 'banana' maps to [2, 3], what is the result of searching for documents containing both 'apple' and 'banana'?
medium
A. [1, 2, 3, 5]
B. [3]
C. [1, 5]
D. [2, 3, 5]

Solution

  1. Step 1: Identify documents for each keyword

    'apple' maps to documents [1, 3, 5], 'banana' maps to [2, 3].
  2. Step 2: Find intersection of document lists

    Documents containing both keywords are in both lists. Intersection of [1, 3, 5] and [2, 3] is [3].
  3. Final Answer:

    [3] -> Option B
  4. Quick Check:

    Intersection = [3] [OK]
Hint: Search AND means intersection of document lists [OK]
Common Mistakes:
  • Merging lists instead of intersecting
  • Confusing union with intersection
  • Ignoring common documents
4. A search system uses a hash map to store keyword to document ID mappings. The code snippet below has a bug:
index = {}
keywords = ['apple', 'banana', 'apple']
docs = [1, 2, 3]
for i in range(len(keywords)):
    index[keywords[i]] = docs[i]
print(index)
What is the bug in this code?
medium
A. It overwrites previous document IDs for duplicate keywords
B. It uses a list instead of a dictionary
C. It does not initialize the index
D. It uses wrong loop range

Solution

  1. Step 1: Analyze how index is updated

    The loop assigns index[keyword] = doc, so duplicate keywords overwrite previous values.
  2. Step 2: Identify the bug

    For 'apple', first doc 1 is stored, then overwritten by doc 3, losing doc 1.
  3. Final Answer:

    It overwrites previous document IDs for duplicate keywords -> Option A
  4. Quick Check:

    Duplicate keys overwrite values in hash map [OK]
Hint: Duplicate keys overwrite values unless stored as list [OK]
Common Mistakes:
  • Thinking loop range is wrong
  • Assuming index is not initialized
  • Confusing data structure type
5. You are designing a search system for a large online store with millions of products. To support fast search by keywords and handle high user traffic, which combination of design choices is best?
hard
A. Use a simple list of products and filter by keywords on the client side
B. Store all product data in a single SQL table and scan it for each search
C. Use an inverted index stored in a distributed NoSQL database with caching layers
D. Build an index only for the top 10 products and search others sequentially

Solution

  1. Step 1: Consider scalability and speed needs

    Millions of products and high traffic require fast, scalable search with distributed storage and caching.
  2. Step 2: Evaluate options

    Use an inverted index stored in a distributed NoSQL database with caching layers uses inverted index (fast keyword lookup), distributed NoSQL (scalable), and caching (speed). Others are slow or incomplete.
  3. Final Answer:

    Use an inverted index stored in a distributed NoSQL database with caching layers -> Option C
  4. Quick Check:

    Inverted index + distributed storage + cache = scalable fast search [OK]
Hint: Combine inverted index, distributed DB, and cache for scale [OK]
Common Mistakes:
  • Choosing full table scan for large data
  • Filtering on client side for millions of items
  • Indexing only a small subset of data