Search and metadata in HLD - System Design Exercise

Design: Search and Metadata System

This design covers search functionality and metadata management for content items. User authentication, the content-creation UI, and analytics are out of scope.

Functional Requirements

FR1: Allow users to search content using keywords and filters

FR2: Store and manage metadata for each content item (e.g., title, author, date, tags)

FR3: Support fast search response times (p99 < 300ms)

FR4: Enable filtering search results by metadata fields

FR5: Handle up to 100,000 concurrent search requests

FR6: Support adding, updating, and deleting content and metadata

FR7: Provide relevance ranking for search results

Non-Functional Requirements

NFR1: System must be highly available (99.9% uptime)

NFR2: Search index should be updated within 5 seconds of content changes

NFR3: Support horizontal scaling for both search and metadata storage

NFR4: Latency for search queries should be under 300ms at p99

NFR5: Metadata storage must provide strong consistency for updates
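NFR1 and NFR4 translate into numbers that are easy to check. The sketch below computes two things: the downtime budget implied by an availability target, and a p99 latency from measured samples using the nearest-rank method. The helper names downtime_budget_minutes, p99_nearest_rank and meets_latency_slo are illustrative and do not come from the original design.

```python
def downtime_budget_minutes(availability: float, period_days: float) -> float:
    """Allowed downtime, in minutes, over period_days for an availability target."""
    return (1.0 - availability) * period_days * 24 * 60


def p99_nearest_rank(latencies_ms: list[float]) -> float:
    """p99 by the nearest-rank method: the ceil(0.99 * n)-th smallest sample."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_ms)
    n = len(ordered)
    rank = (99 * n + 99) // 100  # integer ceil(99n / 100), avoids float rounding
    return ordered[rank - 1]


def meets_latency_slo(latencies_ms: list[float], target_ms: float = 300.0) -> bool:
    """NFR4: the p99 of observed search latencies must stay under 300 ms."""
    return p99_nearest_rank(latencies_ms) < target_ms


if __name__ == "__main__":
    print(f"{downtime_budget_minutes(0.999, 30):.1f} min per 30-day month")
    print(f"{downtime_budget_minutes(0.999, 365) / 60:.2f} h per year")
```

At 99.9% the budget is 43.2 minutes per 30-day month, or 8.76 hours per year. Nearest-rank is only one percentile definition. Monitoring systems often estimate p99 from latency histograms instead.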

Key Components

Search index engine (e.g., Elasticsearch, OpenSearch)

Metadata database (relational or NoSQL)

API gateway or search service

Indexing pipeline for content and metadata updates

Cache layer for frequent queries

Load balancer for scaling search requests

Design Patterns

Inverted index for full-text search

Event-driven indexing for near real-time updates

Cache-aside pattern for caching frequent query results and metadata

Sharding and replication for scaling search index

Pagination and filtering in search results
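A toy in-memory index can illustrate the inverted index, filtering and pagination patterns together. This is a minimal sketch and does not reflect how Elasticsearch works internally. The following are all assumptions made for illustration:

- the class name InvertedIndex and its upsert, delete and search methods
- the regex tokenizer
- AND semantics across query terms
- exact-match metadata filters

Results are sorted by document ID rather than by relevance score.

```python
from __future__ import annotations

import re
from collections import defaultdict

TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text: str) -> list[str]:
    """Lowercase and split on non-alphanumerics (a deliberately naive analyzer)."""
    return TOKEN_RE.findall(text.lower())


class InvertedIndex:
    def __init__(self) -> None:
        self.postings: defaultdict[str, set[str]] = defaultdict(set)  # term -> doc IDs
        self.text: dict[str, str] = {}   # doc ID -> indexed text
        self.meta: dict[str, dict] = {}  # doc ID -> filterable metadata

    def upsert(self, doc_id: str, text: str, metadata: dict) -> None:
        self.delete(doc_id)  # re-indexing replaces the previous version
        self.text[doc_id] = text
        self.meta[doc_id] = dict(metadata)
        for term in set(tokenize(text)):
            self.postings[term].add(doc_id)

    def delete(self, doc_id: str) -> None:
        old_text = self.text.pop(doc_id, None)
        if old_text is None:
            return
        self.meta.pop(doc_id, None)
        for term in set(tokenize(old_text)):
            self.postings[term].discard(doc_id)
            if not self.postings[term]:
                del self.postings[term]  # drop empty posting lists

    def search(self, query: str, filters: dict | None = None,
               page: int = 1, page_size: int = 10) -> list[str]:
        terms = tokenize(query)
        if not terms:
            return []
        # AND semantics: intersect posting lists, starting with the shortest.
        lists = sorted((self.postings.get(t, set()) for t in terms), key=len)
        hits = set(lists[0]).intersection(*lists[1:])
        # Metadata filters: every requested field must match exactly.
        filters = filters or {}
        matched = [d for d in hits
                   if all(self.meta[d].get(k) == v for k, v in filters.items())]
        matched.sort()  # deterministic order; a real engine sorts by relevance score
        start = (page - 1) * page_size
        return matched[start:start + page_size]
```

Offset pagination with page and page_size becomes expensive for deep pages. Elasticsearch caps from + size at index.max_result_window (10,000 by default) and offers search_after for deep paging.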

Reference Architecture

Client
  |
  v
API Gateway / Search Service
  |
  +--> Cache (Redis)
  |
  +--> Search Engine (Elasticsearch Cluster)
  |
  +--> Metadata DB (PostgreSQL Cluster)

Indexing Pipeline:
Metadata DB --> Change Events --> Indexer --> Search Engine
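The indexing pipeline must tolerate redelivered and out-of-order events, because Kafka consumers typically get at-least-once delivery. Below is a minimal sketch of an idempotent, batching indexer. It makes three assumptions:

- a hypothetical ChangeEvent shape carrying a per-item version, for example a row version column maintained in PostgreSQL
- a Python dict standing in for the search engine
- a finite list standing in for the Kafka topic

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Literal


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical change-event shape; a real CDC payload carries more fields."""
    content_id: str
    op: Literal["upsert", "delete"]
    version: int             # increases with every committed change to the item
    doc: dict | None = None  # full document for upserts, None for deletes


class Indexer:
    """Applies change events to a search index idempotently, in batches."""

    def __init__(self, batch_size: int = 500) -> None:
        self.batch_size = batch_size
        self.index: dict[str, dict] = {}    # stand-in for the search engine
        self.versions: dict[str, int] = {}  # last applied version per item

    def apply_batch(self, events: list[ChangeEvent]) -> int:
        """Apply one batch; returns how many events were applied (not skipped)."""
        applied = 0
        for ev in events:
            if ev.version <= self.versions.get(ev.content_id, -1):
                continue  # duplicate or stale redelivery: ignore
            if ev.op == "upsert":
                self.index[ev.content_id] = dict(ev.doc or {})
            else:
                self.index.pop(ev.content_id, None)
            # Keep the version even after a delete (a tombstone), so a late
            # upsert cannot resurrect the item.
            self.versions[ev.content_id] = ev.version
            applied += 1
        return applied

    def run(self, stream: list[ChangeEvent]) -> None:
        """Consume a finite stream in fixed-size batches (a real consumer loops forever)."""
        for start in range(0, len(stream), self.batch_size):
            self.apply_batch(stream[start:start + self.batch_size])
```

Elasticsearch's external versioning (version_type=external) gives a comparable guard on the engine side: it rejects writes whose version is not higher than the stored one.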

Components

API Gateway / Search Service

Node.js or Java Spring Boot

Handles client search requests: applies filters, checks the cache, queries the search engine on a cache miss, and returns results

Search Engine

Elasticsearch or OpenSearch

Stores an inverted index over full-text content and metadata fields; executes search queries and ranks results by relevance
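Relevance ranking (FR7) comes from the engine's scoring function, and Elasticsearch uses BM25 by default. The sketch below implements the classic Okapi BM25 formula with the common defaults k1 = 1.2 and b = 0.75. Lucene's implementation differs in minor details, so the scores are illustrative rather than identical to what Elasticsearch returns. The function names bm25_scores and rank are hypothetical.

```python
import math
from collections import Counter


def bm25_scores(query_terms: list[str], docs: dict[str, list[str]],
                k1: float = 1.2, b: float = 0.75) -> dict[str, float]:
    """Okapi BM25 score of each tokenized document for the given query terms."""
    if not docs:
        return {}
    n_docs = len(docs)
    avgdl = (sum(len(tokens) for tokens in docs.values()) / n_docs) or 1.0
    df = Counter()  # document frequency: how many docs contain each term
    for tokens in docs.values():
        df.update(set(tokens))
    scores: dict[str, float] = {}
    for doc_id, tokens in docs.items():
        tf = Counter(tokens)
        length_norm = 1 - b + b * len(tokens) / avgdl  # penalizes long documents
        score = 0.0
        for term in set(query_terms):  # repeated query terms count once here
            if tf[term] == 0:
                continue
            # Smoothed IDF (the form Lucene uses): rare terms weigh more.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation controlled by k1.
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores[doc_id] = score
    return scores


def rank(query_terms: list[str], docs: dict[str, list[str]],
         k1: float = 1.2, b: float = 0.75) -> list[str]:
    """Matching doc IDs, best first; ties broken by ID for determinism."""
    scores = bm25_scores(query_terms, docs, k1, b)
    return sorted((d for d, s in scores.items() if s > 0),
                  key=lambda d: (-scores[d], d))
```

The two knobs trade off differently. k1 controls how quickly repeated occurrences of a term stop adding score. b controls how strongly long documents are penalized.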

Metadata Database

PostgreSQL

Stores authoritative metadata records with strong consistency

Cache

Redis

Caches frequent search queries and metadata to reduce latency and load

Indexing Pipeline

Kafka + Worker Services

Consumes metadata change events from the DB and updates the search index within 5 seconds

Request Flow

1. Client sends search request with keywords and filters to API Gateway

2. API Gateway checks Redis cache for cached results

3. If cache miss, API Gateway queries Elasticsearch with search and filter parameters

4. Elasticsearch returns ranked search results

5. API Gateway returns results to client and caches them in Redis

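6. Metadata writes (create, update, delete) are committed to PostgreSQL, the source of truth, with strong consistency

7. Each committed change is published as an event to Kafka (e.g., via change data capture or a transactional outbox)

8. Indexer service consumes the event and updates the Elasticsearch index accordingly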
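Steps 2 through 5 are the cache-aside pattern applied to search results. This is a minimal sketch under the following assumptions:

- TTLCache is an in-memory stand-in for Redis. With redis-py the equivalent calls would be get and set(key, value, ex=ttl), with results serialized, e.g. as JSON.
- search_backend is a callable standing in for the Elasticsearch query.
- The 5-second TTL is an arbitrary default.

All class and function names are hypothetical.

```python
from __future__ import annotations

import json
import time
from typing import Any, Callable


class TTLCache:
    """In-memory stand-in for Redis GET and SET with an expiry."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._data: dict[str, tuple[float, Any]] = {}  # key -> (expires_at, value)

    def get(self, key: str) -> Any | None:
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._data[key]  # lazily evict expired entries
            return None
        return value

    def set(self, key: str, value: Any, ttl_s: float) -> None:
        self._data[key] = (self._clock() + ttl_s, value)


def cache_key(query: str, filters: dict) -> str:
    """Normalize the request so equivalent searches share one cache entry."""
    normalized = " ".join(query.lower().split())
    return "search:" + json.dumps({"q": normalized, "f": filters}, sort_keys=True)


class SearchService:
    def __init__(self, cache: TTLCache,
                 search_backend: Callable[[str, dict], list],
                 ttl_s: float = 5.0) -> None:
        self.cache = cache
        self.search_backend = search_backend  # stand-in for the Elasticsearch query
        self.ttl_s = ttl_s

    def search(self, query: str, filters: dict | None = None) -> list:
        filters = filters or {}
        key = cache_key(query, filters)
        cached = self.cache.get(key)                   # step 2: check the cache
        if cached is not None:
            return cached
        results = self.search_backend(query, filters)  # steps 3-4: query the engine
        self.cache.set(key, results, self.ttl_s)       # step 5: populate the cache
        return results
```

Cached results are served until the TTL expires, so the TTL adds to end-to-end staleness on top of the 5-second index freshness target (NFR2). It should therefore be kept short or paired with invalidation on updates.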

Database Schema

Entities:
- ContentItem(id PK, title, author, created_at, updated_at, ...)
- Metadata(id PK, content_item_id FK, key, value)

Relationships:
- ContentItem 1:N Metadata

Metadata stores key-value pairs for flexible metadata fields. ContentItem stores core attributes.
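The ContentItem/Metadata schema can be written as DDL. This sketch uses Python's built-in sqlite3 as a stand-in for PostgreSQL so it runs without a server. Two details are assumptions and not part of the original schema: the ON DELETE CASCADE rule, which supports deletes under FR6, and the (key, value) index for metadata filters. The helper titles_with_metadata is hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE content_item (
    id          INTEGER PRIMARY KEY,
    title       TEXT NOT NULL,
    author      TEXT NOT NULL,
    created_at  TEXT NOT NULL,
    updated_at  TEXT NOT NULL
);
CREATE TABLE metadata (
    id               INTEGER PRIMARY KEY,
    content_item_id  INTEGER NOT NULL
                     REFERENCES content_item (id) ON DELETE CASCADE,
    key              TEXT NOT NULL,
    value            TEXT NOT NULL
);
-- Assumed index so a filter on one metadata field avoids a full scan.
CREATE INDEX metadata_key_value ON metadata (key, value);
"""


def connect() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)
    return conn


def titles_with_metadata(conn: sqlite3.Connection, key: str, value: str) -> list[str]:
    """Titles of content items that carry the given metadata key/value pair."""
    rows = conn.execute(
        """
        SELECT DISTINCT c.id, c.title
        FROM content_item AS c
        JOIN metadata AS m ON m.content_item_id = c.id
        WHERE m.key = ? AND m.value = ?
        ORDER BY c.id
        """,
        (key, value),
    ).fetchall()
    return [title for _, title in rows]
```

The key-value layout keeps metadata flexible, but every filter on a metadata field becomes a join, and multi-field filters need one join per field. That cost is one reason this design serves filters from Elasticsearch rather than from PostgreSQL.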

Scaling Discussion

Bottlenecks

Search engine cluster CPU and memory limits under high query load

Metadata database write throughput for frequent updates

Cache size and eviction policy for popular queries

Indexer pipeline lag causing stale search results

Solutions

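Scale the search engine horizontally: add nodes and replicas for query throughput. Size primary shards for the expected data volume up front, because an index's primary shard count is fixed at creation and changing it requires a split or reindex.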

Add read replicas to offload reads, and partition (shard) the metadata DB to increase write throughput

Set an explicit eviction policy for popular queries (e.g., Redis allkeys-lru or allkeys-lfu) and scale out the Redis cluster

Optimize indexing pipeline with parallel consumers and batch updates
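Adding shards is harder than it sounds because documents are assigned to shards by hashing. Conceptually, Elasticsearch routes with hash(_routing) % number_of_primary_shards, where _routing defaults to the document ID. The sketch below uses BLAKE2b as a stand-in hash (Elasticsearch itself uses Murmur3) to measure how many documents would move if the shard count changed. The helper names and document IDs are hypothetical.

```python
import hashlib


def stable_hash(doc_id: str) -> int:
    """64-bit hash that is stable across processes (unlike built-in hash() on str)."""
    digest = hashlib.blake2b(doc_id.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def shard_for(doc_id: str, num_shards: int) -> int:
    """Modulo routing: the same ID always lands on the same shard."""
    return stable_hash(doc_id) % num_shards


def fraction_moved(doc_ids: list[str], old_shards: int, new_shards: int) -> float:
    """Share of documents whose shard changes when the shard count changes."""
    moved = sum(1 for d in doc_ids
                if shard_for(d, old_shards) != shard_for(d, new_shards))
    return moved / len(doc_ids)


if __name__ == "__main__":
    ids = [f"content-{i}" for i in range(10_000)]  # hypothetical document IDs
    print(f"5 -> 6 shards:  {fraction_moved(ids, 5, 6):.0%} moved")
    print(f"5 -> 10 shards: {fraction_moved(ids, 5, 10):.0%} moved")
```

With plain modulo routing, going from 5 to 6 shards relocates roughly 5/6 of documents, while doubling to 10 relocates about half. This is why the shard count is planned up front rather than changed casually.

The same key-hashing idea applies to the indexing pipeline. Keying Kafka messages by content ID sends all events for an item to one partition. Consumers can then scale out in parallel while per-item ordering is preserved.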

Interview Tips

Time: 10 minutes for requirements and clarifications, 15 minutes for architecture and components, 10 minutes for data flow and database design, 10 minutes for scaling and trade-offs discussion

Clarify search and metadata requirements and constraints

Explain choice of search engine and metadata storage

Describe how indexing pipeline keeps search index fresh

Discuss caching strategy to reduce latency

Address scaling challenges and solutions

Mention trade-offs between consistency and availability