
Product catalog design in HLD - System Design Exercise

Design: Product Catalog System
This design covers product metadata storage, search, and retrieval APIs. Image storage/CDN, user authentication, and order management are out of scope.
Functional Requirements
FR1: Store product information including name, description, price, and images
FR2: Support product categories and subcategories
FR3: Allow searching products by name, category, and attributes
FR4: Support filtering products by price range, brand, and other attributes
FR5: Handle up to 1 million products
FR6: Provide API for product retrieval with response time under 200ms
FR7: Support updates to product details with eventual consistency
FR8: Allow bulk import and export of product data
Non-Functional Requirements
NFR1: System should handle 1000 concurrent read requests
NFR2: API availability target of 99.9%
NFR3: Search queries should return results within 200ms p99 latency
NFR4: Data consistency can be eventual for product updates
NFR5: Images stored separately from product metadata
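A quick back-of-envelope check ties these numbers together. The 2 KB-per-product figure below is an assumption, not part of the stated requirements:

```python
# Back-of-envelope sizing from the stated requirements.
# Assumption: each of the 1000 concurrent readers issues a new request
# as soon as the previous one completes (worst case).

concurrent_readers = 1000      # NFR1
p99_latency_s = 0.200          # NFR3: 200 ms
products = 1_000_000           # FR5

# Little's law: throughput = concurrency / latency
peak_rps = concurrent_readers / p99_latency_s
print(f"peak read throughput ~ {peak_rps:.0f} requests/sec")

# Rough metadata footprint, assuming ~2 KB per product row (assumption)
bytes_per_product = 2 * 1024
total_gb = products * bytes_per_product / 1024**3
print(f"metadata footprint ~ {total_gb:.1f} GB")
```

At roughly 5000 reads/sec and about 2 GB of metadata, the whole catalog fits comfortably in a Redis-sized cache, which motivates the cache layer below.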
Key Components
API Gateway for client requests
Product Metadata Database (relational or NoSQL)
Search Engine (e.g., Elasticsearch) for fast querying
Cache layer (e.g., Redis) for frequently accessed products
Bulk import/export service
Background job processor for syncing updates to search index
Design Patterns
CQRS (Command Query Responsibility Segregation) for separating reads and writes
Eventual consistency for search index updates
Pagination and sorting for product lists
Denormalization for faster reads
API versioning for backward compatibility
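The pagination-and-sorting pattern can be sketched as a simple offset/limit slice over a sorted list; the `GET /products?sort=price&page=1&page_size=2` endpoint shape implied here is an assumption, not part of the original design:

```python
# Minimal sketch of offset-based pagination with sorting over an
# in-memory catalog (a stand-in for a DB or search-engine query).

def paginate(products, sort_key, page, page_size):
    """Return one page of products sorted by the given attribute."""
    ordered = sorted(products, key=lambda p: p[sort_key])
    start = (page - 1) * page_size
    return ordered[start:start + page_size]

catalog = [
    {"id": 1, "name": "mug", "price": 9.0},
    {"id": 2, "name": "lamp", "price": 25.0},
    {"id": 3, "name": "desk", "price": 120.0},
    {"id": 4, "name": "pen", "price": 2.5},
]
print(paginate(catalog, "price", page=1, page_size=2))  # two cheapest items
```

At 1 million products, offset pagination degrades for deep pages; cursor (keyset) pagination is the usual follow-up trade-off to mention.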
Reference Architecture
Client
  |
  v
API Gateway
  |
  +------------+---------------------+
  |            |                     |
  v            v                     v
Cache Layer  Search Engine   Product Metadata DB
               ^                     ^
               |                     |
   Background Job Processor   Bulk Import/Export
   (sync DB -> Search)        Service (batched writes)
Components
API Gateway
RESTful API server (e.g., Node.js/Express, Spring Boot)
Handles client requests, routes to appropriate services, enforces rate limiting
Product Metadata Database
Relational DB (PostgreSQL) or NoSQL (MongoDB)
Stores product details, categories, and attributes
Search Engine
Elasticsearch
Provides fast search and filtering capabilities
Cache Layer
Redis
Caches frequently accessed product data to reduce DB load
Bulk Import/Export Service
Batch processing system (e.g., Python scripts, AWS Lambda)
Handles large-scale product data uploads and downloads
Background Job Processor
Message queue + worker (e.g., RabbitMQ + Celery)
Syncs product updates from DB to search engine asynchronously
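The async sync from DB to search engine can be sketched with a queue and a worker. Here `queue.Queue` stands in for RabbitMQ and a dict for the Elasticsearch index; both substitutions are assumptions made to keep the sketch self-contained:

```python
# Sketch of the background sync: product updates are enqueued on write
# and a worker applies them to the search index asynchronously.
import queue
import threading

updates = queue.Queue()
search_index = {}  # product_id -> indexed document (Elasticsearch stand-in)

def sync_worker():
    while True:
        event = updates.get()
        if event is None:                 # shutdown sentinel
            break
        search_index[event["id"]] = event  # upsert into the "index"
        updates.task_done()

worker = threading.Thread(target=sync_worker)
worker.start()

# Write path: the API persists to the DB, then enqueues an index update.
updates.put({"id": 42, "name": "espresso mug", "price": 12.0})
updates.put(None)
worker.join()
print(search_index[42]["name"])  # espresso mug
```

The gap between enqueue and apply is exactly the eventual-consistency window called out in the requirements (FR7/NFR4).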
Request Flow
1. Client sends product search request to API Gateway
2. API Gateway checks Cache Layer for results
3. If cache miss, API Gateway queries Search Engine
4. Search Engine returns matching product IDs
5. API Gateway fetches product details from Product Metadata DB or Cache
6. API Gateway returns product data to client
7. For product updates, API Gateway writes to Product Metadata DB
8. Background Job Processor reads DB changes and updates Search Engine asynchronously
9. Bulk import service writes product data to Product Metadata DB in batches
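Steps 2-6 form a cache-aside read path, sketched below with plain dicts standing in for Redis and the metadata DB (assumptions made so the sketch is self-contained; TTLs and the search-engine hop are omitted for brevity):

```python
# Cache-aside read path: check the cache first, fall back to the DB on
# a miss, and populate the cache for subsequent reads.

db = {1: {"id": 1, "name": "mug", "price": 9.0}}
cache = {}

def get_product(product_id):
    if product_id in cache:           # step 2: cache hit
        return cache[product_id]
    product = db.get(product_id)      # step 5: fall back to the DB
    if product is not None:
        cache[product_id] = product   # populate cache for next time
    return product

get_product(1)          # miss: reads the DB, fills the cache
print(get_product(1))   # hit: served from the cache
```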
Database Schema
Entities:
- Product (id PK, name, description, price, brand, category_id FK, attributes JSONB, created_at, updated_at)
- Category (id PK, name, parent_category_id FK nullable)
- ProductImage (id PK, product_id FK, image_url, alt_text)

Relationships:
- One Category can have many Products (1:N)
- One Product can have many ProductImages (1:N)
- Categories can have a hierarchical parent-child relationship (self-referencing FK)
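The schema can be exercised end to end with `sqlite3` to keep the sketch self-contained. SQLite has no JSONB, so `attributes` is stored as TEXT holding JSON here; that substitution is an assumption of this sketch, and PostgreSQL would use JSONB as specified:

```python
# The entities above rendered as DDL and exercised in-memory.
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE category (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    parent_category_id INTEGER REFERENCES category(id)  -- nullable: roots
);
CREATE TABLE product (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    description TEXT,
    price REAL NOT NULL,
    brand TEXT,
    category_id INTEGER REFERENCES category(id),
    attributes TEXT,                       -- JSONB in PostgreSQL
    created_at TEXT DEFAULT CURRENT_TIMESTAMP,
    updated_at TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE product_image (
    id INTEGER PRIMARY KEY,
    product_id INTEGER NOT NULL REFERENCES product(id),
    image_url TEXT NOT NULL,
    alt_text TEXT
);
""")
conn.execute("INSERT INTO category (id, name) VALUES (1, 'kitchen')")
conn.execute(
    "INSERT INTO product (id, name, price, category_id, attributes) "
    "VALUES (1, 'mug', 9.0, 1, ?)",
    (json.dumps({"color": "blue"}),),
)
row = conn.execute("SELECT name, attributes FROM product WHERE id = 1").fetchone()
print(row[0], json.loads(row[1])["color"])  # mug blue
```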
Scaling Discussion
Bottlenecks
Database read load increases with product queries
Search Engine indexing delays with frequent updates
Cache invalidation complexity with product updates
Bulk import causing DB performance degradation
Solutions
Use read replicas for the database to distribute read traffic
Implement incremental and batched indexing for search engine
Use cache expiration and event-driven cache invalidation
Throttle bulk import jobs and run during off-peak hours
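The bulk-import mitigation above reduces to writing in fixed-size batches with a pause between them; the batch size and delay below are assumptions to be tuned against real load:

```python
# Sketch of a throttled bulk import: split rows into batches and pause
# between writes so the import never saturates the DB.
import time

def bulk_import(rows, write_batch, batch_size=500, delay_s=0.0):
    """Hand each fixed-size batch of rows to write_batch, with a pause."""
    for start in range(0, len(rows), batch_size):
        write_batch(rows[start:start + batch_size])
        time.sleep(delay_s)   # throttle between batches

stored = []
bulk_import(list(range(1200)), stored.append, batch_size=500)
print([len(b) for b in stored])  # [500, 500, 200]
```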
Interview Tips
Time: 10 minutes for requirements and clarifications, 15 minutes for architecture and data flow, 10 minutes for scaling and trade-offs, 10 minutes for Q&A
Clarify functional and non-functional requirements upfront
Explain choice of database and search engine clearly
Describe how caching improves performance
Discuss eventual consistency trade-offs for search index
Highlight scalability strategies and bottleneck mitigation
Mention API design considerations and error handling