Search and filter design in LLD - Scalability & System Analysis

Scalability Analysis - Search and filter design
Growth Table: Search and Filter Design
| Metric | 100 Users | 10K Users | 1M Users | 100M Users |
|---|---|---|---|---|
| Search Queries per Second (QPS) | 10 QPS | 1,000 QPS | 50,000 QPS | 5,000,000 QPS |
| Data Size (Indexed Items) | 10K items | 1M items | 100M items | 10B items |
| Index Size | Small (few GB) | Medium (hundreds of GB) | Large (TBs) | Very Large (PBs) |
| Latency Expectation | <100 ms | <200 ms | <300 ms | <500 ms |
| Infrastructure | Single server with local index | Cluster with distributed index | Multi-region clusters with replication | Global distributed system with sharding and CDN |
| Filter Complexity | Simple filters (few fields) | Moderate filters (multi-field) | Complex filters with facets and ranges | Highly dynamic filters with personalization |
First Bottleneck

The first bottleneck is the search index and query processing. As users and data grow, the index size increases, making queries slower. The CPU and memory on the search servers become overwhelmed handling complex filters and high QPS.

Scaling Solutions
  • Horizontal scaling: Add more search nodes to distribute query load and index shards.
  • Index sharding: Split the index into smaller parts by data ranges or categories to reduce query scope.
  • Caching: Cache frequent queries and filter results to reduce repeated computation.
  • Pre-aggregation: For filters, precompute counts or facets to speed up filtering.
  • Load balancing: Use smart routing to send queries to the least busy nodes.
  • Use of CDN: For static filter metadata or autocomplete suggestions, serve from CDN to reduce backend load.
  • Asynchronous processing: For complex filters, consider background jobs to prepare results.
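As a rough illustration of two of the ideas above (index sharding and query caching), here is a minimal single-process sketch. The class name `ShardedIndex`, the shard count, the exact-match filter semantics, and the "clear the cache on every write" policy are all assumptions made for the example, not part of the original design.

```python
from collections import OrderedDict
from zlib import crc32

NUM_SHARDS = 4  # assumed shard count, chosen only for illustration


def shard_for(item_id: str) -> int:
    """Pick a shard with a stable hash so routing is repeatable across processes."""
    return crc32(item_id.encode("utf-8")) % NUM_SHARDS


class ShardedIndex:
    """Hypothetical in-memory stand-in for a sharded search index with an LRU cache."""

    def __init__(self, cache_size: int = 128) -> None:
        # Each shard holds a slice of the catalogue: item_id -> attributes.
        self.shards = [dict() for _ in range(NUM_SHARDS)]
        self.cache: OrderedDict = OrderedDict()
        self.cache_size = cache_size
        self.shard_scans = 0  # counts shard visits, to make the cache's effect visible

    def add(self, item_id: str, attrs: dict) -> None:
        self.shards[shard_for(item_id)][item_id] = attrs
        self.cache.clear()  # crude invalidation: any write drops all cached results

    def search(self, **filters) -> list:
        # Normalise the filters so the same filter set always maps to one cache key.
        key = tuple(sorted(filters.items()))
        if key in self.cache:
            self.cache.move_to_end(key)  # mark as recently used
            return list(self.cache[key])
        hits = []
        for shard in self.shards:  # scatter: each shard filters only its own slice
            self.shard_scans += 1
            hits.extend(
                item_id
                for item_id, attrs in shard.items()
                if all(attrs.get(field) == value for field, value in filters.items())
            )
        result = sorted(hits)  # gather: merge the per-shard hits into one list
        self.cache[key] = result
        if len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)  # evict the least recently used entry
        return list(result)


index = ShardedIndex()
index.add("p1", {"category": "books", "brand": "acme"})
index.add("p2", {"category": "books", "brand": "zenith"})
index.add("p3", {"category": "toys", "brand": "acme"})

print(index.search(category="books"))  # ['p1', 'p2']
print(index.search(category="books"))  # same result, served from the cache
print(index.shard_scans)               # 4: only the first search touched the shards
```

In a real deployment each shard would live on its own node and be queried in parallel; the loop here only mimics that fan-out. Clearing the whole cache on every write is the simplest correct policy, and it is where the freshness trade-off shows up: a time-to-live cache would cut load further at the cost of serving slightly stale results.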
Back-of-Envelope Cost Analysis

At 10K users generating 1,000 QPS, each query touching 1MB of index data means 1GB/s data throughput. This requires multiple servers with fast SSDs and high network bandwidth.

Storing 1M items at ~100 bytes of raw index data per item takes ~100MB, but with inverted indexes and facet structures the total can grow to 100GB+.

At 1M users and 50,000 QPS, network bandwidth and CPU become critical. Each server can handle ~5,000 QPS, so at least 10 servers are needed just for query handling.
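The arithmetic behind these estimates can be checked with a few lines of Python. The input figures come from the paragraphs above; the helper name `servers_needed` and the 70% target utilisation used in the last line are assumptions added to show how headroom changes the answer.

```python
MB = 10**6
GB = 10**9

# 10K users: 1,000 QPS, each query touching 1MB of index data.
throughput_bytes_per_s = 1_000 * 1 * MB
print(throughput_bytes_per_s / GB)  # 1.0 (GB/s)

# 1M items at ~100 bytes of raw index data each.
raw_storage_bytes = 1_000_000 * 100
print(raw_storage_bytes / MB)  # 100.0 (MB)

# Overhead implied by the 100GB+ figure relative to the raw 100MB.
print((100 * GB) // raw_storage_bytes)  # 1000 (times the raw size)


def servers_needed(peak_qps: int, qps_per_server: int, target_utilisation_pct: int = 100) -> int:
    """Ceiling division in integers, so no floating-point rounding surprises."""
    usable_qps_times_100 = qps_per_server * target_utilisation_pct
    return -(-(peak_qps * 100) // usable_qps_times_100)


# 1M users: 50,000 QPS at ~5,000 QPS per server.
print(servers_needed(50_000, 5_000))      # 10
print(servers_needed(50_000, 5_000, 70))  # 15 if servers should run at 70% of capacity
```

The count of 10 is a floor for query handling alone; replicas for availability would add to it.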

Interview Tip

Start by clarifying the scale and query patterns. Discuss data size, query complexity, and latency needs. Identify bottlenecks early (index size, CPU, memory). Propose incremental scaling: caching, sharding, horizontal scaling. Mention trade-offs like consistency and freshness of index.

Self Check

Your search database handles 1,000 QPS. Traffic grows 10x to 10,000 QPS. What do you do first?

Answer: Scale horizontally first: add more search nodes and shard the index to distribute the query load. Then cache frequent queries to cut repeated work.

Key Result
Search and filter systems first break at the search index and query processing due to CPU and memory limits. Scaling horizontally with sharding and caching is key to handle growth from thousands to millions of users.

Practice

1. What is the main purpose of adding filters in a search system?
easy
A. To slow down the search process for accuracy
B. To increase the total number of search results
C. To narrow down search results based on user preferences
D. To remove the search bar from the interface

Solution

  1. Step 1: Understand the role of filters in search

    Filters help users reduce the number of results by selecting specific criteria.
  2. Step 2: Identify the effect of filters on results

    Filters narrow results to match user preferences, making search faster and more relevant.
  3. Final Answer:

    To narrow down search results based on user preferences -> Option C
  4. Quick Check:

    Filters narrow results = C [OK]
Hint: Filters reduce results to match user needs quickly [OK]
Common Mistakes:
  • Thinking filters increase results
  • Assuming filters slow down search intentionally
  • Confusing filters with UI removal
2. Which of the following is the correct way to represent a filter for price less than $100 in a query parameter?
easy
A. price>100
B. price<100
C. price=100
D. price!=100

Solution

  1. Step 1: Understand comparison operators in queries

    The symbol '<' means less than, so 'price<100' filters prices below 100.
  2. Step 2: Eliminate incorrect operators

    '>' means greater than, '=' means equal, '!=' means not equal, so they don't match 'less than 100'.
  3. Final Answer:

    price<100 -> Option B
  4. Quick Check:

    Less than operator = B [OK]
Hint: Use '<' for less than in filters [OK]
Common Mistakes:
  • Using '>' instead of '<' for less than
  • Confusing '=' with less than
  • Using '!=' which means not equal
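To make the operator choice concrete, here is a small sketch that turns a filter string such as `price<100` into a predicate. The function name `parse_filter` and the `field operator number` grammar are assumptions for illustration; real APIs use many different filter syntaxes. In an actual URL the `<` character would normally arrive percent-encoded as `%3C`, which `urllib.parse.unquote` decodes.

```python
import operator
import re
from urllib.parse import unquote

# Supported comparison operators (an assumed, minimal set).
OPS = {
    "<=": operator.le,
    ">=": operator.ge,
    "!=": operator.ne,
    "<": operator.lt,
    ">": operator.gt,
    "=": operator.eq,
}

# Two-character operators come first so "<=" is not read as "<" followed by "=".
FILTER_RE = re.compile(r"^\s*([A-Za-z_]\w*)\s*(<=|>=|!=|<|>|=)\s*(-?\d+(?:\.\d+)?)\s*$")


def parse_filter(expr: str):
    """Return a predicate over dict items for a 'field<op>number' expression."""
    match = FILTER_RE.match(expr)
    if match is None:
        raise ValueError(f"unsupported filter expression: {expr!r}")
    field, op, raw_value = match.groups()
    threshold = float(raw_value)
    compare = OPS[op]
    # Items that lack the field never match.
    return lambda item: field in item and compare(item[field], threshold)


products = [
    {"name": "pen", "price": 3.5},
    {"name": "bag", "price": 100},
    {"name": "desk", "price": 150},
]

under_100 = parse_filter(unquote("price%3C100"))  # decodes to "price<100"
print([p["name"] for p in products if under_100(p)])  # ['pen']
```

The item priced at exactly 100 is excluded, which is the difference between `<` and `<=`.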
3. Consider a search system that indexes products by category and price. If a user searches with filters category='books' and price < 20, which data structure best supports fast filtering?
medium
A. A hash map keyed by category with sorted price lists
B. A single unsorted list of all products
C. A queue of products ordered by insertion time
D. A stack of products sorted by price

Solution

  1. Step 1: Analyze filtering needs

    Filtering by category and price requires quick lookup by category and efficient price range queries.
  2. Step 2: Choose data structure supporting these queries

    A hash map keyed by category allows fast category lookup; sorted price lists enable quick filtering by price.
  3. Final Answer:

    A hash map keyed by category with sorted price lists -> Option A
  4. Quick Check:

    Hash map + sorted list = A [OK]
Hint: Use hash map for categories and sorted lists for range filters [OK]
Common Mistakes:
  • Using unsorted lists causing slow searches
  • Using queue or stack which don't support efficient filtering
  • Ignoring the need for sorting by price
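The structure from Option A can be sketched with a dictionary of sorted lists and the standard `bisect` module. The class and method names (`CategoryPriceIndex`, `cheaper_than`) are made up for this example.

```python
import bisect
from collections import defaultdict


class CategoryPriceIndex:
    """Hash map keyed by category; each value is a list of (price, product_id) kept sorted."""

    def __init__(self) -> None:
        self._by_category = defaultdict(list)

    def add(self, product_id: str, category: str, price: float) -> None:
        # insort keeps the list ordered by price (ties broken by product_id).
        bisect.insort(self._by_category[category], (price, product_id))

    def cheaper_than(self, category: str, max_price: float) -> list:
        # .get avoids creating an empty list for an unknown category.
        entries = self._by_category.get(category, [])
        # (max_price,) sorts before every (max_price, product_id), so bisect_left
        # returns the position of the first entry whose price is >= max_price.
        cut = bisect.bisect_left(entries, (max_price,))
        return [product_id for _, product_id in entries[:cut]]


catalog = CategoryPriceIndex()
catalog.add("b1", "books", 12.5)
catalog.add("b2", "books", 25.0)
catalog.add("b3", "books", 19.99)
catalog.add("b4", "books", 20)
catalog.add("t1", "toys", 5.0)

print(catalog.cheaper_than("books", 20))  # ['b1', 'b3']
```

The category lookup is a constant-time dictionary access and the price cut-off is found by binary search, so a query costs O(log n) plus the size of the result. Inserts into a Python list are O(n) because elements shift, so a write-heavy catalogue would likely want a balanced tree or a database index instead.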
4. A search filter system is returning incorrect results when filtering by date range. Which of the following is the most likely cause?
medium
A. Date values are stored as strings and compared lexicographically
B. The filter uses numeric comparison on date objects
C. The database index is on the wrong column
D. The search query is missing a filter parameter

Solution

  1. Step 1: Understand date comparison issues

    Comparing dates stored as strings can cause wrong order because string comparison is lexicographic.
  2. Step 2: Identify why this causes incorrect results

    In DD/MM/YYYY format, '12/01/2023' (12 January) sorts after '02/12/2023' (2 December) when compared as strings, even though it is the earlier date, so range filters return wrong results.
  3. Final Answer:

    Date values are stored as strings and compared lexicographically -> Option A
  4. Quick Check:

    String date comparison causes errors = A [OK]
Hint: Store dates as date objects, not strings [OK]
Common Mistakes:
  • Assuming numeric comparison works on strings
  • Ignoring index relevance
  • Thinking missing filter param causes wrong filtered results
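The failure is easy to reproduce. The sketch below assumes the stored strings use DD/MM/YYYY; the helper names and the sample range are invented for the demonstration.

```python
from datetime import date, datetime


def parse(text: str) -> date:
    """Parse a DD/MM/YYYY string (the assumed storage format) into a date object."""
    return datetime.strptime(text, "%d/%m/%Y").date()


raw = ["12/01/2023", "02/12/2023"]  # 12 January 2023 and 2 December 2023

print(sorted(raw))             # ['02/12/2023', '12/01/2023'] -> December before January
print(sorted(raw, key=parse))  # ['12/01/2023', '02/12/2023'] -> chronological order


def in_range_as_strings(value: str, start: str, end: str) -> bool:
    return start <= value <= end  # lexicographic, character by character


def in_range_as_dates(value: str, start: str, end: str) -> bool:
    return parse(start) <= parse(value) <= parse(end)


# Is 15 June 2023 inside March 2023?
print(in_range_as_strings("15/06/2023", "01/03/2023", "31/03/2023"))  # True (wrong)
print(in_range_as_dates("15/06/2023", "01/03/2023", "31/03/2023"))    # False (correct)

# ISO 8601 strings (YYYY-MM-DD) happen to sort chronologically as plain text.
iso = [parse(d).isoformat() for d in raw]
print(sorted(iso))  # ['2023-01-12', '2023-12-02']
```

If the dates must stay as text, zero-padded ISO 8601 is the one common format where string order and date order agree; a native date type is still the safer choice.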
5. You are designing a scalable search and filter system for an e-commerce site with millions of products. Which approach best balances fast search and flexible filtering?
hard
A. Load all products into memory and filter using loops
B. Store all products in a single SQL table and scan it for every search
C. Use a simple key-value store without indexes
D. Use a distributed search engine with inverted indexes and faceted filters

Solution

  1. Step 1: Consider scalability and performance needs

    Millions of products require fast, scalable search with flexible filters.
  2. Step 2: Evaluate options for search and filtering

    Distributed search engines with inverted indexes enable fast text search; faceted filters allow flexible attribute filtering efficiently.
  3. Step 3: Eliminate inefficient approaches

    Scanning large SQL tables or in-memory filtering is slow and not scalable; key-value stores lack complex search capabilities.
  4. Final Answer:

    Use a distributed search engine with inverted indexes and faceted filters -> Option D
  5. Quick Check:

    Distributed search + faceted filters = D [OK]
Hint: Use distributed search with faceted filters for scale [OK]
Common Mistakes:
  • Relying on full table scans for large data
  • Ignoring indexing for search speed
  • Using memory-heavy filtering for millions of items
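A toy, single-process version of the idea behind Option D may help: an inverted index for text terms plus per-field facet postings for filters. A distributed search engine would shard and replicate these structures across nodes; the class name `TinySearchIndex`, the whitespace tokeniser, and the exact-match facets below are simplifying assumptions.

```python
from collections import Counter, defaultdict


class TinySearchIndex:
    """Hypothetical in-memory index: term postings for text, value postings for facets."""

    def __init__(self) -> None:
        self.postings = defaultdict(set)                     # term -> doc ids
        self.facets = defaultdict(lambda: defaultdict(set))  # field -> value -> doc ids
        self.docs = {}                                       # doc id -> attributes

    def add(self, doc_id: int, title: str, **attrs) -> None:
        self.docs[doc_id] = attrs
        for term in title.lower().split():  # naive whitespace tokeniser
            self.postings[term].add(doc_id)
        for field, value in attrs.items():
            self.facets[field][value].add(doc_id)

    def search(self, query: str, **filters) -> set:
        terms = query.lower().split()
        if not terms:
            return set()
        # AND semantics: a document must contain every query term.
        # set.intersection returns a new set, so the postings are never mutated.
        ids = set.intersection(*(self.postings.get(t, set()) for t in terms))
        for field, value in filters.items():
            ids &= self.facets.get(field, {}).get(value, set())
        return ids

    def facet_counts(self, ids: set, field: str) -> Counter:
        """Count how many of the matching documents fall under each facet value."""
        return Counter(self.docs[i][field] for i in ids if field in self.docs[i])


idx = TinySearchIndex()
idx.add(1, "Wireless Mouse", category="electronics", brand="acme")
idx.add(2, "Wireless Keyboard", category="electronics", brand="zenith")
idx.add(3, "Mouse Pad", category="accessories", brand="acme")

print(idx.search("wireless"))                # {1, 2}
print(idx.search("wireless", brand="acme"))  # {1}
print(idx.facet_counts(idx.search("mouse"), "category") == Counter(electronics=1, accessories=1))  # True
```

Both text matching and filtering reduce to set intersections over precomputed postings, which is why this design avoids the full scans that Options A and B rely on.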