
Notification system in LLD - System Design Exercise

Design: Notification System
Design covers backend notification processing, delivery, and subscription management. Out of scope: UI design for user preferences and third-party integrations beyond email, SMS, and push.
Functional Requirements
FR1: Send notifications to users via multiple channels: email, SMS, and push notifications
FR2: Support scheduling notifications for future delivery
FR3: Allow users to subscribe or unsubscribe from different notification types
FR4: Ensure delivery guarantees with retries on failure
FR5: Provide an API for other services to trigger notifications
Non-Functional Requirements
NFR1: Sustain at least 10,000 notifications per second
NFR2: Keep p99 notification delivery latency under 500 ms
NFR3: Maintain 99.9% system availability
NFR4: Handle spikes of up to 50,000 notifications per second during peak times
NFR5: Deliver notifications in order per user
NFR6: Comply with data privacy requirements for user contact information
NFR7: Scale horizontally
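The throughput and availability targets above translate into concrete numbers that are useful when sizing queues and workers. The following is a rough back-of-the-envelope sketch; the per-worker throughput (ASSUMED_WORKER_RPS) is a hypothetical figure chosen for illustration, not a number from the requirements.

```python
# Back-of-the-envelope numbers derived from the stated targets.
SUSTAINED_RPS = 10_000   # sustained throughput target
PEAK_RPS = 50_000        # peak spike target
AVAILABILITY = 0.999     # availability target

SECONDS_PER_DAY = 24 * 60 * 60
MINUTES_PER_YEAR = 365 * 24 * 60

# Upper bound: assumes the sustained rate holds for the whole day.
notifications_per_day = SUSTAINED_RPS * SECONDS_PER_DAY        # 864,000,000

# How much headroom the queue and workers need to absorb a spike.
peak_to_sustained = PEAK_RPS / SUSTAINED_RPS                   # 5.0

# Downtime budget implied by 99.9% availability (about 8.76 hours per year).
allowed_downtime_minutes = (1 - AVAILABILITY) * MINUTES_PER_YEAR

# Assumption (not from the requirements): one worker sustains 500 sends/s.
ASSUMED_WORKER_RPS = 500
workers_at_peak = -(-PEAK_RPS // ASSUMED_WORKER_RPS)           # ceiling division

print(f"{notifications_per_day:,} notifications/day (upper bound)")
print(f"peak is {peak_to_sustained:.0f}x the sustained rate")
print(f"downtime budget: {allowed_downtime_minutes:.1f} minutes/year")
print(f"workers needed at peak: {workers_at_peak}")
```

The 5x gap between sustained and peak load is the main argument for buffering requests in a queue rather than provisioning every downstream service for the peak.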
Think Before You Design
Key Components
API Gateway for receiving notification requests
User Subscription Service to manage preferences
Notification Queue to buffer and order notifications
Worker Services to process and send notifications
Channel-specific Delivery Services (Email, SMS, Push)
Database for storing user preferences and notification logs
Cache for quick access to subscription data
Retry and Dead Letter Queue for failed notifications
Monitoring and Logging system
Design Patterns
Publish-Subscribe for decoupling notification producers and consumers
Queue-based Load Leveling to handle spikes
Circuit Breaker for external channel failures
Idempotency to avoid duplicate notifications
Event Sourcing for audit and replay
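Of the patterns listed, the circuit breaker is the one most often asked about in detail. The sketch below is one minimal way to write it, not a reference implementation: the class name, the thresholds, and the injectable clock are all assumptions made for illustration.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when a call is skipped because the circuit is open."""


class CircuitBreaker:
    """Opens after consecutive failures, then allows a trial call after a cooldown."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # assumed default
        self.reset_timeout = reset_timeout          # assumed default, in seconds
        self.clock = clock                          # injectable for testing
        self.failures = 0
        self.opened_at = None                       # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open: skipping call")
            # Cooldown elapsed: half-open, so one trial call is let through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()       # open, or re-open after a failed trial
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A worker would typically keep one breaker per external channel (for example, one for the SMS gateway), so that an outage at one provider does not slow down delivery on the others.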
Reference Architecture
          +-------------------+
          |   API Gateway     |
          +---------+---------+
                    |
                    v
          +-------------------+          +---------------------+
          | User Subscription |<-------->|    Database         |
          |    Service        |          | (User prefs, logs)  |
          +---------+---------+          +---------------------+
                    |
                    v
          +-------------------+
          | Notification      |<-------------------+
          | Queue             |                    |
          +---------+---------+                    |
                    |                              |
          +---------v---------+                    |
          | Worker Services   |                    |
          +----+----+----+----+                    |
               |    |    |                         |
       +-------+    |    +--------+                |
       |            |             |                |
+------+--+    +----+----+   +----+-----+          |
| Email   |    | SMS     |   | Push     |          |
| Service |    | Service |   | Service  |          |
+---------+    +---------+   +----------+          |
       |            |             |                |
       +------------+-------------+----------------+
                    |
          +---------v---------+
          | Retry & DLQ       |
          +-------------------+
Components
API Gateway: RESTful HTTP server. Receives notification requests from clients and other services.
User Subscription Service: Microservice with a relational DB. Manages user preferences and subscription status.
Notification Queue: Distributed message queue (e.g., Kafka, RabbitMQ). Buffers notifications and maintains order per user.
Worker Services: Stateless microservices. Consume notifications from the queue, apply business logic, and dispatch to channels.
Email Service: SMTP or email API (e.g., SendGrid). Sends email notifications.
SMS Service: SMS gateway API (e.g., Twilio). Sends SMS notifications.
Push Service: Push notification service (e.g., Firebase Cloud Messaging). Sends push notifications to mobile/web clients.
Retry & Dead Letter Queue: Message queue with retry logic. Retries failed notifications and stores permanently failed messages.
Database: Relational DB (e.g., PostgreSQL). Stores user subscriptions, notification logs, and metadata.
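Because workers consume from a queue and failed messages are retried, the same notification can reach a worker more than once. This is where the idempotency pattern listed earlier applies. The sketch below assumes each notification carries a unique key; the class name and the in-memory set are illustrative stand-ins, and a real deployment would likely use a shared store with expiry instead.

```python
class IdempotentSender:
    """Wraps a send function so each idempotency key is delivered at most once."""

    def __init__(self, send):
        self._send = send
        # Assumption: an in-memory set is enough for a sketch. Multiple workers
        # would need a shared store so they all see the same keys.
        self._delivered = set()

    def handle(self, idempotency_key, payload):
        """Return True if the payload was sent, False if it was a duplicate."""
        if idempotency_key in self._delivered:
            return False
        self._send(payload)
        # Record the key only after a successful send so failures can be retried.
        self._delivered.add(idempotency_key)
        return True


outbox = []
sender = IdempotentSender(outbox.append)
sender.handle("notif-1", "Your order shipped")
sender.handle("notif-1", "Your order shipped")  # duplicate delivery from the queue
print(outbox)  # ['Your order shipped']
```

Recording the key after the send means a crash between the two steps can still produce a duplicate; closing that gap requires the channel provider to accept the key as well, which is outside this sketch.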
Request Flow
1. Client or service sends notification request to API Gateway.
2. API Gateway forwards request to User Subscription Service to verify user preferences.
3. If user is subscribed, notification is placed into Notification Queue.
4. Worker Services consume notifications from the queue, preserving per-user order.
5. Workers send notifications to appropriate channel services (Email, SMS, Push).
6. Channel services attempt delivery and report success or failure.
7. On failure, the notification goes to the Retry & Dead Letter Queue, which retries it and stores it as a dead letter if it fails permanently.
8. Successful deliveries and failures are logged in the Database.
9. Users can update subscription preferences via User Subscription Service API.
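The flow above can be sketched with in-memory stand-ins for the queue, the log, and the dead letter queue. Everything here is illustrative: the function names, the retry budget of three attempts, and the backoff schedule are assumptions, not values taken from the design.

```python
from collections import deque

MAX_ATTEMPTS = 3        # assumed retry budget
BASE_DELAY_SECONDS = 1  # assumed backoff base


def backoff_delay(attempt, base=BASE_DELAY_SECONDS, cap=60):
    """Exponential backoff: 1s, 2s, 4s, ... capped at `cap` seconds."""
    return min(cap, base * 2 ** (attempt - 1))


def submit(queue, subscriptions, user_id, channel, content):
    """Steps 2-3: enqueue only if the user is subscribed on this channel."""
    if not subscriptions.get((user_id, channel), False):
        return False
    queue.append({"user_id": user_id, "channel": channel,
                  "content": content, "attempts": 0})
    return True


def drain(queue, send, max_attempts=MAX_ATTEMPTS):
    """Steps 4-8: consume, send, retry on failure, dead-letter when exhausted."""
    log = []           # stands in for the notification log table
    dead_letters = []  # stands in for the dead letter queue
    while queue:
        msg = queue.popleft()
        try:
            send(msg)
            log.append((msg["user_id"], msg["channel"], "delivered"))
        except Exception:
            msg["attempts"] += 1
            if msg["attempts"] >= max_attempts:
                dead_letters.append(msg)
                log.append((msg["user_id"], msg["channel"], "failed"))
            else:
                # A real worker would wait backoff_delay(msg["attempts"])
                # seconds before the next attempt; the sketch retries at once.
                queue.append(msg)
    return log, dead_letters
```

Re-enqueueing a failed message behind newer ones can reorder a user's notifications, so a design that needs strict per-user ordering would have to hold back that user's later messages until the retry resolves.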
Database Schema
Entities:
- User (user_id PK, name, contact_info)
- Subscription (subscription_id PK, user_id FK, channel ENUM, is_subscribed BOOLEAN)
- Notification (notification_id PK, user_id FK, content TEXT, channel ENUM, status ENUM, created_at TIMESTAMP, delivered_at TIMESTAMP)
- RetryLog (retry_id PK, notification_id FK, attempt_count INT, last_attempt TIMESTAMP, status ENUM)
Relationships:
- User 1:N Subscription
- User 1:N Notification
- Notification 1:1 RetryLog (optional)
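To make the schema concrete, here is one possible rendering as DDL, run against an in-memory SQLite database so it can be tried without a server. Several details are assumptions not stated in the schema: the table is named users rather than user (user is a reserved word in PostgreSQL), the ENUM columns are approximated with CHECK constraints, the status values are invented for illustration, and the UNIQUE constraints are one way to express "one row per user and channel" and the optional 1:1 RetryLog.

```python
import sqlite3

SCHEMA = """
CREATE TABLE users (
    user_id      INTEGER PRIMARY KEY,
    name         TEXT NOT NULL,
    contact_info TEXT NOT NULL
);
CREATE TABLE subscription (
    subscription_id INTEGER PRIMARY KEY,
    user_id         INTEGER NOT NULL REFERENCES users(user_id),
    channel         TEXT NOT NULL CHECK (channel IN ('email', 'sms', 'push')),
    is_subscribed   INTEGER NOT NULL DEFAULT 1,
    UNIQUE (user_id, channel)            -- assumption: one row per user and channel
);
CREATE TABLE notification (
    notification_id INTEGER PRIMARY KEY,
    user_id         INTEGER NOT NULL REFERENCES users(user_id),
    content         TEXT NOT NULL,
    channel         TEXT NOT NULL CHECK (channel IN ('email', 'sms', 'push')),
    -- assumption: these status values are illustrative
    status          TEXT NOT NULL CHECK (status IN ('pending', 'delivered', 'failed')),
    created_at      TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    delivered_at    TEXT
);
CREATE TABLE retry_log (
    retry_id        INTEGER PRIMARY KEY,
    -- UNIQUE models the optional 1:1 link to notification
    notification_id INTEGER NOT NULL UNIQUE REFERENCES notification(notification_id),
    attempt_count   INTEGER NOT NULL DEFAULT 0,
    last_attempt    TEXT,
    status          TEXT NOT NULL
);
"""


def create_schema(conn):
    """Create all tables; SQLite needs foreign keys switched on per connection."""
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)


conn = sqlite3.connect(":memory:")
create_schema(conn)
```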
Scaling Discussion
Bottlenecks
Notification Queue can become a bottleneck under high load
Worker Services may be overwhelmed processing many notifications
External channel services (Email, SMS, Push) may have rate limits
Database can become a bottleneck for subscription and logging queries
Solutions
Partition Notification Queue by user ID to distribute load and maintain order
Scale Worker Services horizontally with auto-scaling based on queue length
Implement rate limiting and circuit breakers for external channel APIs
Use caching (e.g., Redis) for user subscription data to reduce DB load
Archive old notification logs to reduce database size and improve performance
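Partitioning by user ID works because every message for a given user lands on the same partition, and a partition is consumed in order. A minimal sketch of the mapping follows; the function name and the choice of SHA-256 are assumptions for illustration. Brokers such as Kafka apply their own key-based partitioning when messages carry a key, so application code rarely needs to do this by hand.

```python
import hashlib


def partition_for(user_id: str, num_partitions: int) -> int:
    """Map a user ID to a stable partition index in [0, num_partitions)."""
    # hashlib is used instead of the built-in hash(), whose output for strings
    # varies between Python processes and so would not be stable across workers.
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


# The same user always maps to the same partition, which preserves their order.
first = partition_for("user-42", 8)
second = partition_for("user-42", 8)
print(first == second)  # True
```

Changing the partition count remaps most users, so per-user ordering is only guaranteed while the number of partitions stays fixed.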
Interview Tips
Time: For a 45-minute interview, spend 10 minutes clarifying requirements and constraints, 20 minutes designing architecture and data flow, 10 minutes discussing scaling and trade-offs, and 5 minutes summarizing.
Clarify notification channels and delivery guarantees early
Emphasize decoupling producers and consumers with queues
Discuss user subscription management and privacy
Explain retry mechanisms and failure handling
Highlight scalability strategies and bottleneck mitigation
Mention monitoring and alerting importance

Practice

1. Which component in a notification system is responsible for generating events that trigger notifications?
easy
A. Delivery Channel
B. Notification Service
C. User Preferences Store
D. Event Producer

Solution

  1. Step 1: Understand the role of event producers

    Event producers create or detect events that require notifying users, such as a new message or alert.
  2. Step 2: Differentiate from other components

    Notification service processes events, delivery channels send notifications, and user preferences store user settings.
  3. Final Answer:

    Event Producer -> Option D
  4. Quick Check:

    Event source = Event Producer [OK]
Hint: Event creators are called producers in notification systems [OK]
Common Mistakes:
  • Confusing notification service with event producer
  • Thinking delivery channel generates events
  • Assuming user preferences create events
2. Which of the following is the correct sequence of components for sending a notification after an event occurs?
easy
A. Delivery Channel -> Notification Service -> Event Producer
B. Event Producer -> Notification Service -> Delivery Channel
C. Notification Service -> Event Producer -> Delivery Channel
D. User Preferences Store -> Event Producer -> Delivery Channel

Solution

  1. Step 1: Identify the logical flow of notification

    First, an event is generated by the event producer, then processed by the notification service, and finally sent via the delivery channel.
  2. Step 2: Eliminate incorrect sequences

    Delivery channel cannot start the process; user preferences store is not part of the sending sequence.
  3. Final Answer:

    Event Producer -> Notification Service -> Delivery Channel -> Option B
  4. Quick Check:

    Event -> Process -> Send = B [OK]
Hint: Notifications flow from event to service to delivery [OK]
Common Mistakes:
  • Reversing the order of components
  • Including user preferences in the sending chain
  • Confusing delivery channel as event source
3. Consider a notification system where users can choose email or SMS as delivery channels. If a user prefers both, what is the expected behavior when an event triggers a notification?
medium
A. Send notification via both email and SMS
B. Send notification via email only
C. Send notification via SMS only
D. Do not send any notification

Solution

  1. Step 1: Understand user preference handling

    If a user selects multiple delivery channels, the system should send notifications through all preferred channels to ensure delivery.
  2. Step 2: Confirm expected multi-channel delivery

    Sending via both email and SMS respects user choice and increases notification reach.
  3. Final Answer:

    Send notification via both email and SMS -> Option A
  4. Quick Check:

    Multiple preferences = multiple channels [OK]
Hint: Send notifications on all user-selected channels [OK]
Common Mistakes:
  • Sending notification on only one channel
  • Ignoring user preferences
  • Not sending notification at all
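The multi-channel behavior in this question can be sketched as a simple fan-out over the user's enabled channels. The function names, the shape of the preferences dictionary, and the sender callables are all hypothetical.

```python
def channels_for(preferences, user_id, available):
    """Return the channels this user has enabled, in the order of `available`."""
    prefs = preferences.get(user_id, {})
    return [channel for channel in available if prefs.get(channel, False)]


def fan_out(preferences, senders, user_id, content):
    """Send `content` on every channel the user prefers; return the channels used."""
    sent = []
    for channel in channels_for(preferences, user_id, list(senders)):
        senders[channel](user_id, content)
        sent.append(channel)
    return sent


deliveries = []
senders = {
    "email": lambda user, text: deliveries.append(("email", user, text)),
    "sms": lambda user, text: deliveries.append(("sms", user, text)),
}
preferences = {"u1": {"email": True, "sms": True}, "u2": {"email": True}}

print(fan_out(preferences, senders, "u1", "Build finished"))  # ['email', 'sms']
print(fan_out(preferences, senders, "u2", "Build finished"))  # ['email']
```

A user with no stored preferences receives nothing in this sketch; whether the default should be opt-in or opt-out is a product decision the question does not specify.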
4. A notification system uses a queue to handle event processing but notifications are delayed significantly. Which is the most likely cause?
medium
A. Queue is overloaded with too many events
B. User preferences are not stored
C. Delivery channel is sending notifications instantly
D. Event producer is generating too few events

Solution

  1. Step 1: Analyze queue role in notification system

    Queue buffers events to handle load. If overloaded, it causes delays in processing notifications.
  2. Step 2: Evaluate other options

    Missing user preferences or instant delivery does not cause delay; too few events would reduce load, not increase delay.
  3. Final Answer:

    Queue is overloaded with too many events -> Option A
  4. Quick Check:

    Queue overload = delay [OK]
Hint: Delays often mean queue overload, not missing data [OK]
Common Mistakes:
  • Blaming delivery channel for delays
  • Assuming missing preferences cause delay
  • Thinking fewer events cause delays
5. You need to design a notification system that supports millions of users with personalized preferences and multiple delivery channels. Which design choice best ensures scalability and user customization?
hard
A. Use a centralized notification service with a single queue and fixed delivery channels
B. Store all user preferences in a local file on the notification server
C. Implement distributed notification services with sharded queues and dynamic delivery channel selection per user
D. Send notifications synchronously from event producers directly to users

Solution

  1. Step 1: Consider scalability requirements

    Millions of users require distributed services and sharded queues to handle load without bottlenecks.
  2. Step 2: Address user customization needs

    Dynamic delivery channel selection per user allows personalized notifications respecting preferences.
  3. Step 3: Evaluate other options

    Centralized service and single queue create bottlenecks; synchronous sending blocks processing; local files do not scale or support dynamic preferences.
  4. Final Answer:

    Implement distributed notification services with sharded queues and dynamic delivery channel selection per user -> Option C
  5. Quick Check:

    Distributed + dynamic preferences = scalable & customizable [OK]
Hint: Distribute services and shard queues for scale and flexibility [OK]
Common Mistakes:
  • Choosing centralized design causing bottlenecks
  • Using synchronous sending blocking system
  • Storing preferences in non-scalable local files