
Product catalog design in HLD - System Design Exercise

Design: Product Catalog System
This design covers product metadata storage, search, and retrieval APIs. Image storage/CDN, user authentication, and order management are out of scope.
Functional Requirements
FR1: Store product information including name, description, price, and images
FR2: Support product categories and subcategories
FR3: Allow searching products by name, category, and attributes
FR4: Support filtering products by price range, brand, and other attributes
FR5: Handle up to 1 million products
FR6: Provide API for product retrieval with response time under 200ms
FR7: Support updates to product details with eventual consistency
FR8: Allow bulk import and export of product data
Non-Functional Requirements
NFR1: System should handle 1000 concurrent read requests
NFR2: API availability target of 99.9%
NFR3: Search queries should return results within 200ms p99 latency
NFR4: Data consistency can be eventual for product updates
NFR5: Images stored separately from product metadata
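A quick back-of-envelope check ties these numbers together. The 2 KB-per-product figure below is an assumption, not part of the stated requirements:

```python
# Back-of-envelope sizing from the stated requirements.
# Assumption: each of the 1000 concurrent readers issues a new request
# as soon as the previous one completes (worst case).

concurrent_readers = 1000      # NFR1
p99_latency_s = 0.200          # NFR3: 200 ms
products = 1_000_000           # FR5

# Little's law: throughput = concurrency / latency
peak_rps = concurrent_readers / p99_latency_s
print(f"peak read throughput ~ {peak_rps:.0f} requests/sec")

# Rough metadata footprint, assuming ~2 KB per product row (assumption)
bytes_per_product = 2 * 1024
total_gb = products * bytes_per_product / 1024**3
print(f"metadata footprint ~ {total_gb:.1f} GB")
```

At roughly 5000 reads/sec and about 2 GB of metadata, the whole catalog fits comfortably in a Redis-sized cache, which motivates the cache layer below.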
Key Components
API Gateway for client requests
Product Metadata Database (relational or NoSQL)
Search Engine (e.g., Elasticsearch) for fast querying
Cache layer (e.g., Redis) for frequently accessed products
Bulk import/export service
Background job processor for syncing updates to search index
Design Patterns
CQRS (Command Query Responsibility Segregation) for separating reads and writes
Eventual consistency for search index updates
Pagination and sorting for product lists
Denormalization for faster reads
API versioning for backward compatibility
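The pagination-and-sorting pattern can be sketched as a simple offset/limit slice over a sorted list; the `GET /products?sort=price&page=1&page_size=2` endpoint shape implied here is an assumption, not part of the original design:

```python
# Minimal sketch of offset-based pagination with sorting over an
# in-memory catalog (a stand-in for a DB or search-engine query).

def paginate(products, sort_key, page, page_size):
    """Return one page of products sorted by the given attribute."""
    ordered = sorted(products, key=lambda p: p[sort_key])
    start = (page - 1) * page_size
    return ordered[start:start + page_size]

catalog = [
    {"id": 1, "name": "mug", "price": 9.0},
    {"id": 2, "name": "lamp", "price": 25.0},
    {"id": 3, "name": "desk", "price": 120.0},
    {"id": 4, "name": "pen", "price": 2.5},
]
print(paginate(catalog, "price", page=1, page_size=2))  # two cheapest items
```

At 1 million products, offset pagination degrades for deep pages; cursor (keyset) pagination is the usual follow-up trade-off to mention.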
Reference Architecture
Client
  |
  v
API Gateway
  |
  +------------+---------------------+
  |            |                     |
  v            v                     v
Cache Layer  Search Engine   Product Metadata DB
               ^                     ^
               |                     |
   Background Job Processor   Bulk Import/Export
   (sync DB -> Search)        Service (batched writes)
Components
API Gateway
RESTful API server (e.g., Node.js/Express, Spring Boot)
Handles client requests, routes to appropriate services, enforces rate limiting
Product Metadata Database
Relational DB (PostgreSQL) or NoSQL (MongoDB)
Stores product details, categories, and attributes
Search Engine
Elasticsearch
Provides fast search and filtering capabilities
Cache Layer
Redis
Caches frequently accessed product data to reduce DB load
Bulk Import/Export Service
Batch processing system (e.g., Python scripts, AWS Lambda)
Handles large-scale product data uploads and downloads
Background Job Processor
Message queue + worker (e.g., RabbitMQ + Celery)
Syncs product updates from DB to search engine asynchronously
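The async sync from DB to search engine can be sketched with a queue and a worker. Here `queue.Queue` stands in for RabbitMQ and a dict for the Elasticsearch index; both substitutions are assumptions made to keep the sketch self-contained:

```python
# Sketch of the background sync: product updates are enqueued on write
# and a worker applies them to the search index asynchronously.
import queue
import threading

updates = queue.Queue()
search_index = {}  # product_id -> indexed document (Elasticsearch stand-in)

def sync_worker():
    while True:
        event = updates.get()
        if event is None:                 # shutdown sentinel
            break
        search_index[event["id"]] = event  # upsert into the "index"
        updates.task_done()

worker = threading.Thread(target=sync_worker)
worker.start()

# Write path: the API persists to the DB, then enqueues an index update.
updates.put({"id": 42, "name": "espresso mug", "price": 12.0})
updates.put(None)
worker.join()
print(search_index[42]["name"])  # espresso mug
```

The gap between enqueue and apply is exactly the eventual-consistency window called out in the requirements (FR7/NFR4).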
Request Flow
1. Client sends product search request to API Gateway
2. API Gateway checks Cache Layer for results
3. If cache miss, API Gateway queries Search Engine
4. Search Engine returns matching product IDs
5. API Gateway fetches product details from Product Metadata DB or Cache
6. API Gateway returns product data to client
7. For product updates, API Gateway writes to Product Metadata DB
8. Background Job Processor reads DB changes and updates Search Engine asynchronously
9. Bulk import service writes product data to Product Metadata DB in batches
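Steps 2-6 form a cache-aside read path, sketched below with plain dicts standing in for Redis and the metadata DB (assumptions made so the sketch is self-contained; TTLs and the search-engine hop are omitted for brevity):

```python
# Cache-aside read path: check the cache first, fall back to the DB on
# a miss, and populate the cache for subsequent reads.

db = {1: {"id": 1, "name": "mug", "price": 9.0}}
cache = {}

def get_product(product_id):
    if product_id in cache:           # step 2: cache hit
        return cache[product_id]
    product = db.get(product_id)      # step 5: fall back to the DB
    if product is not None:
        cache[product_id] = product   # populate cache for next time
    return product

get_product(1)          # miss: reads the DB, fills the cache
print(get_product(1))   # hit: served from the cache
```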
Database Schema
Entities:
- Product (id PK, name, description, price, brand, category_id FK, attributes JSONB, created_at, updated_at)
- Category (id PK, name, parent_category_id FK nullable)
- ProductImage (id PK, product_id FK, image_url, alt_text)

Relationships:
- One Category can have many Products (1:N)
- One Product can have many ProductImages (1:N)
- Categories can have a hierarchical parent-child relationship (self-referencing FK)
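The schema can be exercised end to end with `sqlite3` to keep the sketch self-contained. SQLite has no JSONB, so `attributes` is stored as TEXT holding JSON here; that substitution is an assumption of this sketch, and PostgreSQL would use JSONB as specified:

```python
# The entities above rendered as DDL and exercised in-memory.
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE category (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    parent_category_id INTEGER REFERENCES category(id)  -- nullable: roots
);
CREATE TABLE product (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    description TEXT,
    price REAL NOT NULL,
    brand TEXT,
    category_id INTEGER REFERENCES category(id),
    attributes TEXT,                       -- JSONB in PostgreSQL
    created_at TEXT DEFAULT CURRENT_TIMESTAMP,
    updated_at TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE product_image (
    id INTEGER PRIMARY KEY,
    product_id INTEGER NOT NULL REFERENCES product(id),
    image_url TEXT NOT NULL,
    alt_text TEXT
);
""")
conn.execute("INSERT INTO category (id, name) VALUES (1, 'kitchen')")
conn.execute(
    "INSERT INTO product (id, name, price, category_id, attributes) "
    "VALUES (1, 'mug', 9.0, 1, ?)",
    (json.dumps({"color": "blue"}),),
)
row = conn.execute("SELECT name, attributes FROM product WHERE id = 1").fetchone()
print(row[0], json.loads(row[1])["color"])  # mug blue
```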
Scaling Discussion
Bottlenecks
Database read load increases with product queries
Search Engine indexing delays with frequent updates
Cache invalidation complexity with product updates
Bulk import causing DB performance degradation
Solutions
Use read replicas for the database to distribute read traffic
Implement incremental and batched indexing for search engine
Use cache expiration and event-driven cache invalidation
Throttle bulk import jobs and run during off-peak hours
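The bulk-import mitigation above reduces to writing in fixed-size batches with a pause between them; the batch size and delay below are assumptions to be tuned against real load:

```python
# Sketch of a throttled bulk import: split rows into batches and pause
# between writes so the import never saturates the DB.
import time

def bulk_import(rows, write_batch, batch_size=500, delay_s=0.0):
    """Hand each fixed-size batch of rows to write_batch, with a pause."""
    for start in range(0, len(rows), batch_size):
        write_batch(rows[start:start + batch_size])
        time.sleep(delay_s)   # throttle between batches

stored = []
bulk_import(list(range(1200)), stored.append, batch_size=500)
print([len(b) for b in stored])  # [500, 500, 200]
```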
Interview Tips
Time: 10 minutes for requirements and clarifications, 15 minutes for architecture and data flow, 10 minutes for scaling and trade-offs, 10 minutes for Q&A
Clarify functional and non-functional requirements upfront
Explain choice of database and search engine clearly
Describe how caching improves performance
Discuss eventual consistency trade-offs for search index
Highlight scalability strategies and bottleneck mitigation
Mention API design considerations and error handling