
Search and metadata in HLD - System Design Exercise

Design: Search and Metadata System
This design covers search functionality and metadata management for content items. User authentication, the content creation UI, and analytics are out of scope.
Functional Requirements
FR1: Allow users to search content using keywords and filters
FR2: Store and manage metadata for each content item (e.g., title, author, date, tags)
FR3: Support fast search response times (p99 < 300ms)
FR4: Enable filtering search results by metadata fields
FR5: Handle up to 100,000 concurrent search requests
FR6: Support adding, updating, and deleting content and metadata
FR7: Provide relevance ranking for search results
Non-Functional Requirements
NFR1: System must be highly available (99.9% uptime)
NFR2: Search index should be updated within 5 seconds of content changes
NFR3: Support horizontal scaling for both search and metadata storage
NFR4: Latency for search queries should be under 300ms at p99
NFR5: Metadata storage must ensure consistency for updates
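To make the targets above concrete, the sketch below converts the 99.9% availability target into a downtime budget and estimates the request rate implied by 100,000 concurrent searches using Little's law. It assumes each request spends about 300 ms in the system, which the requirements do not state; with faster requests the same concurrency corresponds to a higher request rate. The function names are illustrative.

```python
HOURS_PER_YEAR = 365 * 24


def downtime_budget_hours(availability: float, period_hours: float) -> float:
    """Hours of allowed downtime in a period for a given availability target."""
    return (1.0 - availability) * period_hours


def implied_throughput_rps(concurrent_requests: int, latency_seconds: float) -> float:
    """Little's law: concurrency = rate * latency, so rate = concurrency / latency."""
    return concurrent_requests / latency_seconds


# NFR1: 99.9% uptime over one year.
yearly_downtime_hours = downtime_budget_hours(0.999, HOURS_PER_YEAR)

# FR5 with the assumed 300 ms time in system per request.
search_rps = implied_throughput_rps(100_000, 0.300)

print(f"Downtime budget: {yearly_downtime_hours:.2f} hours/year")
print(f"Implied throughput: {search_rps:,.0f} requests/second")
```

Under these assumptions the budget is roughly 8.76 hours of downtime per year and about 333,000 requests per second, which is why the search path is cached and horizontally scaled later in the design.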
Think Before You Design
Key Components
Search index engine (e.g., Elasticsearch, OpenSearch)
Metadata database (relational or NoSQL)
API gateway or search service
Indexing pipeline for content and metadata updates
Cache layer for frequent queries
Load balancer for scaling search requests
Design Patterns
Inverted index for full-text search
Event-driven indexing for near real-time updates
Cache-aside pattern for metadata caching
Sharding and replication for scaling search index
Pagination and filtering in search results
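The inverted index pattern listed above can be illustrated with a minimal in-memory sketch. It maps each term to the set of document IDs containing it and answers multi-term queries with AND semantics. The class and method names are illustrative, and the tokenizer is deliberately naive; engines such as Elasticsearch add analyzers, relevance scoring, and on-disk segment structures on top of this idea.

```python
import re
from collections import defaultdict
from typing import Dict, List, Optional, Set


class InvertedIndex:
    """Toy inverted index: term -> set of document IDs containing the term."""

    def __init__(self) -> None:
        self._postings: Dict[str, Set[int]] = defaultdict(set)

    @staticmethod
    def _tokenize(text: str) -> List[str]:
        # Lowercase and split on anything that is not a letter or digit.
        return re.findall(r"[a-z0-9]+", text.lower())

    def add(self, doc_id: int, text: str) -> None:
        for term in self._tokenize(text):
            self._postings[term].add(doc_id)

    def search(self, query: str) -> Set[int]:
        """Return IDs of documents that contain every term in the query."""
        result: Optional[Set[int]] = None
        for term in self._tokenize(query):
            postings = self._postings.get(term, set())
            result = set(postings) if result is None else result & postings
        return result if result is not None else set()


index = InvertedIndex()
index.add(1, "Designing search systems")
index.add(2, "Metadata systems at scale")
print(index.search("systems"))         # both documents match
print(index.search("search systems"))  # only document 1 matches
```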
Reference Architecture
Client
  |
  v
API Gateway / Search Service
  |
  +--> Cache (Redis)
  |
  +--> Search Engine (Elasticsearch Cluster)
  |
  +--> Metadata DB (PostgreSQL Cluster)

Indexing Pipeline:
Metadata DB --> Change Events --> Indexer --> Search Engine
Components
API Gateway / Search Service
Node.js or Java Spring Boot
Handles client search requests, applies filters, queries cache and search engine, returns results
Search Engine
Elasticsearch or OpenSearch
Stores inverted index for full-text search and metadata fields, performs fast search queries with ranking
Metadata Database
PostgreSQL
Stores authoritative metadata records with strong consistency
Cache
Redis
Caches frequent search queries and metadata to reduce latency and load
Indexing Pipeline
Kafka + Worker Services
Processes metadata changes from DB events and updates search engine index within 5 seconds
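As a hedged illustration of how the search service might translate keywords and filters into a request for the search engine, the sketch below builds an Elasticsearch-style query body: a `bool` query with a scored `multi_match` clause and non-scoring `term` filters, paginated with `from` and `size`. The field names (`title`, `tags`) and the function name are assumptions for the example, and the filtered fields are assumed to be mapped as exact-match (keyword) fields.

```python
from typing import Any, Dict, Optional


def build_search_body(
    keywords: str,
    filters: Optional[Dict[str, str]] = None,
    page: int = 1,
    page_size: int = 10,
) -> Dict[str, Any]:
    """Build a query body with full-text matching, metadata filters, and paging."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")

    # Filters narrow the result set without affecting the relevance score.
    filter_clauses = [
        {"term": {field: value}} for field, value in sorted((filters or {}).items())
    ]
    return {
        "query": {
            "bool": {
                # Hypothetical searchable fields; adjust to the real index mapping.
                "must": [{"multi_match": {"query": keywords, "fields": ["title", "tags"]}}],
                "filter": filter_clauses,
            }
        },
        "from": (page - 1) * page_size,
        "size": page_size,
    }


body = build_search_body("kafka indexing", {"author": "ann"}, page=3, page_size=20)
print(body["from"], body["size"])
```

Keeping metadata conditions in the `filter` context lets the engine cache them and skip scoring, which helps with the latency target.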
Request Flow
1. Client sends search request with keywords and filters to API Gateway
2. API Gateway checks Redis cache for cached results
3. If cache miss, API Gateway queries Elasticsearch with search and filter parameters
4. Elasticsearch returns ranked search results
5. API Gateway returns results to client and caches them in Redis
6. Metadata updates are written to PostgreSQL with strong consistency
7. After the write commits, the DB emits a change event to Kafka
8. Indexer service consumes the event and updates the Elasticsearch index accordingly
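The read path in steps 1 to 5 follows the cache-aside pattern. The sketch below shows one way it could look, using an in-memory stand-in for Redis and a caller-supplied function in place of the search engine. `TTLCache`, `cache_key`, and `search` are hypothetical names, and the short time-to-live is an assumption added so that cached results cannot stay stale much longer than the 5-second index freshness target.

```python
import json
import time
from typing import Callable, Dict, List, Optional, Tuple

QueryEngine = Callable[[str, Dict[str, str]], List[str]]


class TTLCache:
    """In-memory stand-in for Redis with a fixed expiry per entry."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, List[str]]] = {}

    def get(self, key: str) -> Optional[List[str]]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: treat as a miss
            return None
        return value

    def set(self, key: str, value: List[str]) -> None:
        self._entries[key] = (self._clock() + self._ttl, value)


def cache_key(keywords: str, filters: Dict[str, str]) -> str:
    # Normalize keywords and sort filter keys so equivalent requests share an entry.
    return json.dumps({"q": keywords.strip().lower(), "f": filters}, sort_keys=True)


def search(keywords: str, filters: Dict[str, str], cache: TTLCache, query_engine: QueryEngine) -> List[str]:
    key = cache_key(keywords, filters)
    cached = cache.get(key)
    if cached is not None:
        return cached                           # step 2: cache hit
    results = query_engine(keywords, filters)   # steps 3 and 4: query the engine
    cache.set(key, results)                     # step 5: populate the cache
    return results
```

A caller would pass a function that queries the real search cluster as `query_engine`; the injectable `clock` exists only to make expiry easy to test.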
Database Schema
Entities:
- ContentItem(id PK, title, author, created_at, updated_at, ...)
- Metadata(id PK, content_item_id FK, key, value)
Relationships:
- ContentItem 1:N Metadata
Metadata stores key-value pairs for flexible metadata fields. ContentItem stores core attributes.
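The entities above can be tried out with a small runnable sketch. The design names PostgreSQL as the metadata store; SQLite (via Python's standard library) is substituted here only so the example runs standalone. The column types, the index on `(key, value)`, and the helper names are assumptions, and the elided `...` attributes are left out.

```python
import sqlite3
from typing import List, Tuple

SCHEMA = """
CREATE TABLE content_item (
    id         INTEGER PRIMARY KEY,
    title      TEXT NOT NULL,
    author     TEXT NOT NULL,
    created_at TEXT NOT NULL,
    updated_at TEXT NOT NULL
);
CREATE TABLE metadata (
    id              INTEGER PRIMARY KEY,
    content_item_id INTEGER NOT NULL REFERENCES content_item(id) ON DELETE CASCADE,
    "key"           TEXT NOT NULL,
    value           TEXT NOT NULL
);
-- Assumed index to support lookups by metadata key and value.
CREATE INDEX idx_metadata_key_value ON metadata("key", value);
"""


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this to enforce the FK
    conn.executescript(SCHEMA)
    return conn


def find_by_metadata(conn: sqlite3.Connection, key: str, value: str) -> List[Tuple[int, str]]:
    """Return (id, title) of content items carrying the given key-value pair."""
    rows = conn.execute(
        """
        SELECT c.id, c.title
        FROM content_item AS c
        JOIN metadata AS m ON m.content_item_id = c.id
        WHERE m."key" = ? AND m.value = ?
        ORDER BY c.id
        """,
        (key, value),
    )
    return rows.fetchall()
```

The key-value table keeps metadata flexible at the cost of a join per filter, which is one reason the design serves filtered searches from the search engine rather than from this table.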
Scaling Discussion
Bottlenecks
Search engine cluster CPU and memory limits under high query load
Metadata database write throughput for frequent updates
Cache size and eviction policy for popular queries
Indexer pipeline lag causing stale search results
Solutions
Scale search engine horizontally by adding shards and replicas
Add read replicas to scale metadata reads, and partition (shard) the metadata DB to spread write load
Implement cache eviction policies and increase Redis cluster size
Optimize indexing pipeline with parallel consumers and batch updates
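The last solution, parallel consumers with batch updates, can be sketched as follows. Events are routed to a partition by a stable hash of the content ID so that one consumer sees all changes for an item in order, and each consumer coalesces a batch down to the newest event per item before writing. The `ChangeEvent` shape, its per-item `version` counter, and the dictionary standing in for the search index are all assumptions for illustration.

```python
import zlib
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class ChangeEvent:
    content_id: str
    version: int               # assumed to increase with every change to an item
    document: Optional[dict]   # None means the item was deleted


def partition_for(content_id: str, num_partitions: int) -> int:
    """Stable hash so every event for one item lands on the same consumer."""
    return zlib.crc32(content_id.encode("utf-8")) % num_partitions


def coalesce(events: Iterable[ChangeEvent]) -> List[ChangeEvent]:
    """Keep only the newest event per content item within a batch."""
    latest: Dict[str, ChangeEvent] = {}
    for event in events:
        current = latest.get(event.content_id)
        if current is None or event.version >= current.version:
            latest[event.content_id] = event
    return list(latest.values())


def apply_batch(index: Dict[str, dict], events: Iterable[ChangeEvent]) -> int:
    """Apply a coalesced batch to the index; returns the number of writes issued."""
    batch = coalesce(events)
    for event in batch:
        if event.document is None:
            index.pop(event.content_id, None)
        else:
            index[event.content_id] = event.document
    return len(batch)
```

Coalescing reduces write volume when an item changes several times in quick succession, and comparing versions keeps a late, older event from overwriting newer data within the batch.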
Interview Tips
Time: 10 minutes for requirements and clarifications, 15 minutes for architecture and components, 10 minutes for data flow and database design, 10 minutes for scaling and trade-offs discussion
Clarify search and metadata requirements and constraints
Explain choice of search engine and metadata storage
Describe how indexing pipeline keeps search index fresh
Discuss caching strategy to reduce latency
Address scaling challenges and solutions
Mention trade-offs between consistency and availability