Design a notification system in HLD - System Design Exercise

Design: Notification System
This design covers notification generation, delivery, user preferences, and history storage. It does not cover content creation or user authentication systems.
Functional Requirements
FR1: Send notifications to users via multiple channels: email, SMS, and push notifications
FR2: Support both real-time and scheduled notifications
FR3: Allow users to subscribe or unsubscribe from different notification types
FR4: Handle up to 100,000 notifications per minute
FR5: Ensure delivery with retry mechanisms for failed notifications
FR6: Provide an API for other services to trigger notifications
FR7: Store notification history for 30 days for audit and user review
Non-Functional Requirements
NFR1: 95% of real-time notification requests should complete in under 500 ms
NFR2: System availability should be at least 99.9%
NFR3: Scalable to handle peak loads during special events
NFR4: Data privacy compliance for user contact information
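A quick back-of-envelope calculation helps turn FR4 and FR7 into sizing numbers. The sketch below is only an estimate: the 1 KB average record size is an assumption not stated in the requirements, and the history figure is an upper bound that assumes the peak rate is sustained around the clock.

```python
# Back-of-envelope sizing derived from FR4 (throughput) and FR7 (retention).
PEAK_PER_MINUTE = 100_000   # FR4
RETENTION_DAYS = 30         # FR7
AVG_RECORD_BYTES = 1_000    # assumption: roughly 1 KB per history record

peak_per_second = PEAK_PER_MINUTE / 60

# Upper bound: assumes the peak rate holds 24 hours a day for the whole window.
max_records = PEAK_PER_MINUTE * 60 * 24 * RETENTION_DAYS
max_storage_tb = max_records * AVG_RECORD_BYTES / 1e12

print(f"peak throughput: {peak_per_second:,.0f} notifications/s")  # 1,667
print(f"history upper bound: {max_records:,} records")             # 4,320,000,000
print(f"storage upper bound: {max_storage_tb:.2f} TB")             # 4.32
```

Real traffic is usually bursty, so the actual history volume is likely far below this bound; the numbers mainly show that the history store needs to be partitioned or sharded rather than kept on a single node.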
Think Before You Design
Key Components
API Gateway for receiving notification requests
Notification Service to process and route notifications
Message Queue for decoupling and buffering
Channel-specific Delivery Services (Email, SMS, Push)
User Preferences Store
Notification History Database
Retry and Failure Handling Mechanism
Monitoring and Logging
Design Patterns
Publish-Subscribe for event-driven notification dispatch
Circuit Breaker for handling channel failures
Bulkhead pattern to isolate channel failures
Retry with exponential backoff
Caching user preferences for fast access
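As an illustration of the Circuit Breaker pattern listed above, here is a minimal sketch. The class name, thresholds, and the injectable `clock` parameter are all hypothetical choices made for this example; a production system would more likely rely on an existing resilience library.

```python
import time


class CircuitBreaker:
    """Minimal circuit breaker: opens after repeated failures, retries after a cooldown."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def allow_request(self):
        if self.opened_at is None:
            return True
        # Half-open: let a trial call through once the cooldown has elapsed.
        return self.clock() - self.opened_at >= self.reset_timeout

    def record_success(self):
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.failure_threshold:
            # Open (or re-open) the circuit and restart the cooldown.
            self.opened_at = self.clock()

    def call(self, func, *args, **kwargs):
        if not self.allow_request():
            raise RuntimeError("circuit open: channel temporarily disabled")
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.record_failure()
            raise
        self.record_success()
        return result


# Bulkhead-style isolation: one breaker per channel, so a failing SMS
# provider does not block email or push delivery.
breakers = {channel: CircuitBreaker() for channel in ("email", "sms", "push")}
```

Keeping a separate breaker per channel is one simple way to combine the Circuit Breaker and Bulkhead ideas: each channel fails and recovers independently.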
Reference Architecture
          +-----------------+
          | Client / Other  |
          |   Services API  |
          +--------+--------+
                   |
                   v
          +--------+--------+
          |   API Gateway   |
          +--------+--------+
                   |
                   v
          +--------+--------+          +-------------------+
          | Notification    |          | User Preferences  |
          | Service         |<-------->| Store (DB/Cache)  |
          +--------+--------+          +-------------------+
                   |
                   v
          +--------+--------+
          | Message Queue   |
          +--------+--------+
           /       |        \
          v        v         v
+---------+  +-----+-----+ +--+---------+
| Email   |  | SMS       | | Push       |
| Service |  | Service   | | Service    |
+---------+  +-----------+ +------------+
                   |
                   v
          +--------+--------+
          | Retry & Failure |
          | Handling        |
          +-----------------+
                   |
                   v
          +-----------------+
          | Notification    |
          | History DB      |
          +-----------------+
Components
API Gateway (REST API / GraphQL): Receives notification requests from clients and other services
Notification Service (Microservice, Node.js / Python): Processes requests, applies user preferences, and routes notifications
Message Queue (Kafka / RabbitMQ): Buffers notifications for asynchronous processing and decouples components
Email Service (SMTP / Third-party API such as SendGrid or SES): Delivers email notifications
SMS Service (Third-party SMS API such as Twilio or Nexmo): Delivers SMS notifications
Push Service (Firebase Cloud Messaging / APNs): Delivers push notifications to mobile/web clients
User Preferences Store (Relational DB + Cache, PostgreSQL + Redis): Stores and caches user subscription preferences
Retry & Failure Handling (Background worker with exponential backoff): Retries failed notifications and logs failures
Notification History DB (NoSQL DB, MongoDB / DynamoDB): Stores sent notification records for 30 days
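The User Preferences Store pairs a database with a cache. One common way to combine them is a cache-aside lookup, sketched below. This is an illustration only: an in-process dictionary stands in for Redis, and `load_from_db` is a hypothetical callable standing in for a PostgreSQL query.

```python
import time


class PreferenceCache:
    """Cache-aside lookup: try the cache, fall back to the database, then populate the cache."""

    def __init__(self, load_from_db, ttl_seconds=300.0, clock=time.monotonic):
        self.load_from_db = load_from_db  # hypothetical callable: user_id -> dict
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._entries = {}                # user_id -> (expires_at, preferences)

    def get(self, user_id):
        entry = self._entries.get(user_id)
        if entry is not None and entry[0] > self.clock():
            return entry[1]                 # cache hit
        prefs = self.load_from_db(user_id)  # cache miss or expired entry
        self._entries[user_id] = (self.clock() + self.ttl_seconds, prefs)
        return prefs

    def invalidate(self, user_id):
        """Call after a preference update so the next read sees fresh data."""
        self._entries.pop(user_id, None)
```

The TTL bounds how long a stale preference can survive if an invalidation is missed, which matters here because sending to a user who has just unsubscribed is a compliance concern, not just a performance one.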
Request Flow
1. Client or service sends notification request to API Gateway.
2. API Gateway forwards request to Notification Service.
3. Notification Service fetches user preferences from cache or DB.
4. Notification Service filters and personalizes notifications based on preferences.
5. Notification Service publishes notification messages to Message Queue.
6. Channel-specific services (Email, SMS, Push) consume messages from the queue.
7. Each channel service attempts delivery and reports success or failure.
8. Retry & Failure Handling component retries failed deliveries with backoff.
9. All sent notifications are logged into Notification History DB.
10. Users can query notification history or update preferences via API.
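Steps 3 to 9 of the flow above can be sketched in a few lines. This is a simplified, single-process illustration: `collections.deque` objects stand in for the message queue, a plain list stands in for the Notification History DB, and the names `NotificationRequest`, `route`, and `consume` are hypothetical.

```python
from collections import deque
from dataclasses import dataclass

CHANNELS = ("email", "sms", "push")


@dataclass
class NotificationRequest:
    user_id: str
    content: str
    channels: tuple = CHANNELS


def route(request, preferences, queues):
    """Steps 3-5: keep only opted-in channels and enqueue one message per channel."""
    published = []
    for channel in request.channels:
        if preferences.get(f"{channel}_opt_in", False):
            queues[channel].append({"user_id": request.user_id, "content": request.content})
            published.append(channel)
    return published


def consume(channel, queues, deliver, history):
    """Steps 6-9: drain one channel queue, attempt delivery, and record the outcome."""
    while queues[channel]:
        message = queues[channel].popleft()
        status = "sent" if deliver(message) else "failed"
        history.append({**message, "channel": channel, "status": status})


queues = {channel: deque() for channel in CHANNELS}
history = []
prefs = {"email_opt_in": True, "sms_opt_in": False, "push_opt_in": True}

published = route(NotificationRequest("u1", "Your order shipped"), prefs, queues)
for channel in CHANNELS:
    # The lambda is a stand-in for a real provider call that always succeeds.
    consume(channel, queues, deliver=lambda message: True, history=history)

print(published)  # ['email', 'push']
```

In a real deployment each channel consumer would run as its own service, and messages with a "failed" status would be handed to the retry component rather than only being recorded.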
Database Schema
Entities:
- UserPreferences(user_id PK, email_opt_in bool, sms_opt_in bool, push_opt_in bool, updated_at timestamp)
- NotificationHistory(notification_id PK, user_id FK, channel enum, content text, status enum, sent_at timestamp)
Relationships:
- UserPreferences linked to users (1:1)
- NotificationHistory linked to users (N:1)
- Channels represented as enum values in NotificationHistory
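To make the schema concrete, the sketch below creates both entities in an in-memory SQLite database and shows one way to enforce the 30-day retention from FR7. Several details are assumptions for illustration: SQLite stands in for both PostgreSQL and the NoSQL history store, the status values `sent` and `failed` are not specified in the original schema, and no foreign key is declared because the users table belongs to a system outside this design's scope.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

SCHEMA = """
CREATE TABLE user_preferences (
    user_id      TEXT PRIMARY KEY,
    email_opt_in INTEGER NOT NULL,
    sms_opt_in   INTEGER NOT NULL,
    push_opt_in  INTEGER NOT NULL,
    updated_at   TEXT NOT NULL
);
CREATE TABLE notification_history (
    notification_id TEXT PRIMARY KEY,
    user_id         TEXT NOT NULL,
    channel         TEXT NOT NULL CHECK (channel IN ('email', 'sms', 'push')),
    content         TEXT NOT NULL,
    status          TEXT NOT NULL CHECK (status IN ('sent', 'failed')),  -- assumed values
    sent_at         TEXT NOT NULL  -- ISO 8601 timestamp in UTC
);
CREATE INDEX idx_history_user_sent ON notification_history (user_id, sent_at);
"""


def purge_expired(conn, now, retention_days=30):
    """Delete history rows older than the retention window; returns rows removed."""
    cutoff = (now - timedelta(days=retention_days)).isoformat()
    cursor = conn.execute("DELETE FROM notification_history WHERE sent_at < ?", (cutoff,))
    conn.commit()
    return cursor.rowcount


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
now = datetime(2024, 1, 31, tzinfo=timezone.utc)
rows = [
    ("n1", "u1", "email", "Welcome", "sent", (now - timedelta(days=45)).isoformat()),
    ("n2", "u1", "push", "Order shipped", "sent", (now - timedelta(days=2)).isoformat()),
]
conn.executemany("INSERT INTO notification_history VALUES (?, ?, ?, ?, ?, ?)", rows)
removed = purge_expired(conn, now)
print(removed)  # 1 (only the 45-day-old row is outside the window)
```

With MongoDB or DynamoDB, the periodic purge could likely be replaced by the store's built-in expiry feature (TTL indexes and Time to Live, respectively), which avoids running a separate cleanup job.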
Scaling Discussion
Bottlenecks
Message Queue saturation during peak notification bursts
CPU and memory limits on Notification Service instances when processing high volume
Channel service rate limits from third-party providers
Database read/write load for user preferences and history
Retry mechanism causing message pile-up if many failures occur
Solutions
Partition and scale message queue clusters horizontally
Deploy multiple instances of Notification Service with load balancing
Implement rate limiting and backpressure for channel services
Use caching aggressively for user preferences; shard history DB
Implement circuit breakers and dead-letter queues for failing messages
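The retry pile-up bottleneck and the dead-letter queue solution can be illustrated together. The sketch below is one possible shape, not a definitive implementation: the function names, the default limits, and the jitter formula are assumptions, and a plain list stands in for a real dead-letter queue.

```python
import random
import time


def backoff_delay(attempt, base=1.0, cap=60.0, jitter=random.random):
    """Delay before retry number `attempt` (1-based): exponential growth, capped, with jitter."""
    ceiling = min(cap, base * 2 ** (attempt - 1))
    return ceiling * (0.5 + 0.5 * jitter())  # between 50% and 100% of the ceiling


def deliver_with_retry(message, send, dead_letters, max_attempts=4, sleep=time.sleep):
    """Try `send` up to max_attempts times; park the message in dead_letters if all fail."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            send(message)
            return True
        except Exception as error:
            last_error = error
            if attempt < max_attempts:
                sleep(backoff_delay(attempt))
    # Retries exhausted: move the message aside so it stops consuming retry capacity.
    dead_letters.append({"message": message, "error": str(last_error)})
    return False
```

Capping the number of attempts and moving exhausted messages to a dead-letter queue keeps a provider outage from growing the retry backlog without bound, and the jitter spreads retries out so that many failed messages do not all hit the provider again at the same instant.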
Interview Tips
Time: Spend 10 minutes clarifying requirements and constraints, 20 minutes designing components and data flow, 10 minutes discussing scaling and trade-offs, 5 minutes summarizing.
Clarify notification channels and delivery guarantees
Explain asynchronous processing with message queues
Discuss user preferences management and caching
Describe retry and failure handling strategies
Highlight scalability and fault tolerance approaches
Mention data privacy and compliance considerations