Jump into concepts and practice - no test required
or
Recommended
Test this pattern10 questions across easy, medium, and hard to know if this pattern is strong
Design: Search and Filter System
Design covers backend search and filter functionality including data storage, indexing, query processing, and API design. Frontend UI and data ingestion pipelines are out of scope.
Functional Requirements
FR1: Allow users to search items by keywords
FR2: Support filtering results by multiple attributes (e.g., category, price range, rating)
FR3: Return results with low latency (p99 < 300ms)
FR4: Handle up to 5,000 concurrent search requests
FR5: Support pagination of results
FR6: Allow sorting results by relevance or attribute values
Non-Functional Requirements
NFR1: System must be highly available (99.9% uptime)
NFR2: Search results should be consistent within 5 seconds of data updates
NFR3: Support up to 1 million items in the catalog
NFR4: API response size should be limited to 1MB
Think Before You Design
Questions to Ask
❓ Question 1
❓ Question 2
❓ Question 3
❓ Question 4
❓ Question 5
Key Components
Search index (e.g., inverted index, attribute indexes)
Query parser and filter processor
API layer for search requests
Data storage for items and attributes
Cache layer for popular queries
Design Patterns
Inverted index for keyword search
Bitmap indexes or B-trees for attribute filtering
Pagination and sorting strategies
Caching frequently requested queries
Eventual consistency for index updates
Reference Architecture
Client
|
v
Search API Layer
|
v
Query Processor --> Cache Layer
|
v
Search Index (Inverted + Attribute Indexes)
|
v
Data Storage (Relational DB or NoSQL)
|
v
Data Update Pipeline (out of scope)
Components
Search API Layer
RESTful API with JSON
Receives search and filter requests, validates input, and returns results
Query Processor
Custom service or search engine query parser
Parses user queries, applies filters, sorts and paginates results
Cache Layer
Redis or Memcached
Caches popular query results to reduce latency and load
Search Index
Elasticsearch or Apache Lucene
Stores inverted index for keywords and attribute indexes for filters
Data Storage
PostgreSQL or MongoDB
Stores raw item data and attributes for indexing and retrieval
Request Flow
1. Client sends search request with keywords and filters to Search API Layer
2. API Layer validates request and forwards to Query Processor
3. Query Processor checks Cache Layer for existing results
4. If cache miss, Query Processor queries Search Index with keywords and filters
5. Search Index returns matching item IDs sorted by relevance or attribute
6. Query Processor fetches item details from Data Storage if needed
7. Results are paginated and returned to API Layer
8. API Layer sends results back to client
9. Data updates trigger asynchronous reindexing to keep Search Index fresh
Database Schema
Entities:
- Item: id (PK), title, description, category, price, rating, other attributes
- Indexes: Inverted index on title and description keywords
- Attribute indexes on category, price range, rating
Relationships:
- One-to-many from Item to attributes
- Indexes maintain mappings from keywords/attributes to item IDs
Scaling Discussion
Bottlenecks
Search Index query latency under high concurrent load
Cache misses causing heavy load on Search Index and Data Storage
Index update delays causing stale search results
Large result sets causing high memory and network usage
Solutions
Shard Search Index horizontally by item ID or attribute ranges
Increase cache size and use query result caching with TTL
Use incremental and asynchronous index updates with near real-time indexing
Implement efficient pagination and limit maximum page size to reduce payload
Interview Tips
Time: Spend 10 minutes clarifying requirements and constraints, 20 minutes designing components and data flow, 10 minutes discussing scaling and trade-offs, 5 minutes summarizing.
Clarify search and filter requirements including attributes and freshness
Explain choice of search index and attribute filtering techniques
Describe caching strategy to reduce latency
Discuss data flow from client request to search results
Highlight scaling challenges and solutions for high concurrency
Mention trade-offs between consistency and availability
Practice
(1/5)
1. What is the main purpose of adding filters in a search system?
easy
A. To slow down the search process for accuracy
B. To increase the total number of search results
C. To narrow down search results based on user preferences
D. To remove the search bar from the interface
Solution
Step 1: Understand the role of filters in search
Filters help users reduce the number of results by selecting specific criteria.
Step 2: Identify the effect of filters on results
Filters narrow results to match user preferences, making search faster and more relevant.
Final Answer:
To narrow down search results based on user preferences -> Option C
Quick Check:
Filters narrow results = C [OK]
Hint: Filters reduce results to match user needs quickly [OK]
Common Mistakes:
Thinking filters increase results
Assuming filters slow down search intentionally
Confusing filters with UI removal
2. Which of the following is the correct way to represent a filter for price less than $100 in a query parameter?
easy
A. price>100
B. price<100
C. price=100
D. price!=100
Solution
Step 1: Understand comparison operators in queries
The symbol '<' means less than, so 'price<100' filters prices below 100.
Step 2: Eliminate incorrect operators
'>' means greater than, '=' means equal, '!=' means not equal, so they don't match 'less than 100'.
Final Answer:
price<100 -> Option B
Quick Check:
Less than operator = A [OK]
Hint: Use '<' for less than in filters [OK]
Common Mistakes:
Using '>' instead of '<' for less than
Confusing '=' with less than
Using '!=' which means not equal
3. Consider a search system that indexes products by category and price. If a user searches with filters category='books' and price < 20, which data structure best supports fast filtering?
medium
A. A hash map keyed by category with sorted price lists
B. A single unsorted list of all products
C. A queue of products ordered by insertion time
D. A stack of products sorted by price
Solution
Step 1: Analyze filtering needs
Filtering by category and price requires quick lookup by category and efficient price range queries.
Step 2: Choose data structure supporting these queries
A hash map keyed by category allows fast category lookup; sorted price lists enable quick filtering by price.
Final Answer:
A hash map keyed by category with sorted price lists -> Option A
Quick Check:
Hash map + sorted list = B [OK]
Hint: Use hash map for categories and sorted lists for range filters [OK]
Common Mistakes:
Using unsorted lists causing slow searches
Using queue or stack which don't support efficient filtering
Ignoring the need for sorting by price
4. A search filter system is returning incorrect results when filtering by date range. Which of the following is the most likely cause?
medium
A. Date values are stored as strings and compared lexicographically
B. The filter uses numeric comparison on date objects
C. The database index is on the wrong column
D. The search query is missing a filter parameter
Solution
Step 1: Understand date comparison issues
Comparing dates stored as strings can cause wrong order because string comparison is lexicographic.
Step 2: Identify why this causes incorrect results
Dates like '12/01/2023' and '02/12/2023' compared as strings may not sort correctly, causing wrong filter results.
Final Answer:
Date values are stored as strings and compared lexicographically -> Option A
Quick Check:
String date comparison causes errors = A [OK]
Hint: Store dates as date objects, not strings [OK]
5. You are designing a scalable search and filter system for an e-commerce site with millions of products. Which approach best balances fast search and flexible filtering?
hard
A. Load all products into memory and filter using loops
B. Store all products in a single SQL table and scan it for every search
C. Use a simple key-value store without indexes
D. Use a distributed search engine with inverted indexes and faceted filters
Solution
Step 1: Consider scalability and performance needs
Millions of products require fast, scalable search with flexible filters.
Step 2: Evaluate options for search and filtering
Distributed search engines with inverted indexes enable fast text search; faceted filters allow flexible attribute filtering efficiently.
Step 3: Eliminate inefficient approaches
Scanning large SQL tables or in-memory filtering is slow and not scalable; key-value stores lack complex search capabilities.
Final Answer:
Use a distributed search engine with inverted indexes and faceted filters -> Option D
Quick Check:
Distributed search + faceted filters = D [OK]
Hint: Use distributed search with faceted filters for scale [OK]
Common Mistakes:
Relying on full table scans for large data
Ignoring indexing for search speed
Using memory-heavy filtering for millions of items