
Notification system in LLD - Scalability & System Analysis

Scalability Analysis - Notification system
Growth Table: Notification System Scaling
Users | Notifications/Day | System Changes
100 | ~1,000 | Single server handles all; simple DB writes; no queue needed
10,000 | ~100,000 | Introduce message queue; DB indexing; basic caching; multiple app instances
1,000,000 | ~10,000,000 | Horizontal scaling of app servers; distributed queue; read replicas; CDN for media
100,000,000 | ~1,000,000,000 | Sharded databases; multi-region deployment; advanced caching layers; push notification services
First Bottleneck

At small to medium scale, the database is the first bottleneck. As the user base grows toward millions, writing and reading notification data drives up load, and the DB struggles to keep up with the volume of writes and queries per second.

Scaling Solutions
  • Database: Use read replicas to spread read load; implement write queues to smooth writes; shard data by user ID.
  • Application Servers: Horizontally scale by adding more servers behind load balancers.
  • Message Queue: Use distributed queues (e.g., Kafka) to handle high notification throughput.
  • Caching: Cache frequent notification metadata in Redis or similar to reduce DB hits.
  • CDN: Use CDN to serve notification media (images, videos) efficiently.
  • Push Services: Integrate with platform push notification services (APNs, FCM) for mobile delivery.
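The caching bullet above can be illustrated with a minimal cache-aside sketch. It assumes a plain dictionary standing in for Redis and a hypothetical `fake_db` lookup; the class name `CacheAside` and its fields are illustrative, not part of any library. Expiry and invalidation are left out to keep it short.

```python
from typing import Callable, Dict


class CacheAside:
    """Check the cache first; on a miss, load from the database and store the result."""

    def __init__(self, load_from_db: Callable[[str], dict]) -> None:
        self._load_from_db = load_from_db
        self._cache: Dict[str, dict] = {}  # stand-in for Redis (assumption)
        self.db_reads = 0                  # counts how often the DB is hit

    def get(self, key: str) -> dict:
        if key in self._cache:
            return self._cache[key]        # cache hit: no DB access
        self.db_reads += 1                 # cache miss: go to the DB once
        value = self._load_from_db(key)
        self._cache[key] = value
        return value


# Hypothetical "database" of notification metadata.
fake_db = {"n1": {"title": "Welcome", "type": "email"}}
cache = CacheAside(lambda key: fake_db[key])

first = cache.get("n1")   # miss, reads from fake_db
second = cache.get("n1")  # hit, served from the cache
```

After the two calls, `cache.db_reads` is 1: the second read never reaches the database, which is the effect the caching bullet is after.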
Back-of-Envelope Cost Analysis
  • At 1M users with 10 notifications/day each: ~10M notifications/day ≈ 116 notifications/sec on average.
  • Database: Needs to handle ~200 QPS (writes + reads), requiring replicas and indexing.
  • Message Queue: Must support 100-200 messages/sec throughput.
  • Bandwidth: Assuming 10KB per notification payload, ~100 GB/day (~1.16 MB/sec average; the ~13 MB/sec peak figure corresponds to a peak-to-average ratio of roughly 11x).
  • Storage: Storing notifications for 30 days -> 300M notifications ≈ 3TB assuming 10KB each.
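The estimates above can be reproduced with a few lines of arithmetic. This is a sketch under the same assumptions as the list (1M users, 10 notifications per user per day, 10 KB payloads, 30-day retention), using decimal units (1 KB = 1,000 bytes); the variable names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60

users = 1_000_000
notifications_per_user_per_day = 10
payload_bytes = 10_000        # 10 KB per notification, decimal units
retention_days = 30

per_day = users * notifications_per_user_per_day
per_second = per_day / SECONDS_PER_DAY

bandwidth_gb_per_day = per_day * payload_bytes / 1e9
bandwidth_mb_per_second = per_day * payload_bytes / SECONDS_PER_DAY / 1e6

stored_notifications = per_day * retention_days
storage_tb = stored_notifications * payload_bytes / 1e12

print(f"{per_day:,} notifications/day, about {per_second:.0f}/sec on average")
print(f"{bandwidth_gb_per_day:.0f} GB/day, about {bandwidth_mb_per_second:.2f} MB/sec on average")
print(f"{stored_notifications:,} stored notifications, about {storage_tb:.0f} TB")
```

These are averages only; peak traffic depends on how bursty the workload is and has to be estimated separately.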
Interview Tip

Start by clarifying notification types and delivery methods. Discuss user scale and traffic patterns. Identify bottlenecks like DB writes or push service limits. Propose incremental scaling steps: queues, caching, sharding. Always justify why each solution fits the bottleneck.

Self Check

Your database handles 1000 QPS. Traffic grows 10x to 10,000 QPS. What do you do first?

Answer: Introduce read replicas and write queues to distribute load and smooth writes before scaling app servers.

Key Result
The database is the first bottleneck as notification volume grows; scaling requires queues, caching, read replicas, and sharding to handle high write/read loads efficiently.

Practice

1. Which component in a notification system is responsible for generating events that trigger notifications?
easy
A. Delivery Channel
B. Notification Service
C. User Preferences Store
D. Event Producer

Solution

  1. Step 1: Understand the role of event producers

    Event producers create or detect events that require notifying users, such as a new message or alert.
  2. Step 2: Differentiate from other components

    Notification service processes events, delivery channels send notifications, and user preferences store user settings.
  3. Final Answer:

    Event Producer -> Option D
  4. Quick Check:

    Event source = Event Producer [OK]
Hint: Event creators are called producers in notification systems [OK]
Common Mistakes:
  • Confusing notification service with event producer
  • Thinking delivery channel generates events
  • Assuming user preferences create events
2. Which of the following is the correct sequence of components for sending a notification after an event occurs?
easy
A. Delivery Channel -> Notification Service -> Event Producer
B. Event Producer -> Notification Service -> Delivery Channel
C. Notification Service -> Event Producer -> Delivery Channel
D. User Preferences Store -> Event Producer -> Delivery Channel

Solution

  1. Step 1: Identify the logical flow of notification

    First, an event is generated by the event producer, then processed by the notification service, and finally sent via the delivery channel.
  2. Step 2: Eliminate incorrect sequences

    Delivery channel cannot start the process; user preferences store is not part of the sending sequence.
  3. Final Answer:

    Event Producer -> Notification Service -> Delivery Channel -> Option B
  4. Quick Check:

    Event -> Process -> Send = B [OK]
Hint: Notifications flow from event to service to delivery [OK]
Common Mistakes:
  • Reversing the order of components
  • Including user preferences in the sending chain
  • Confusing delivery channel as event source
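The flow in question 2 can be sketched as three small classes wired in order. The class and method names below (`EventProducer.emit`, `NotificationService.handle`, `DeliveryChannel.send`) are hypothetical and chosen only to mirror the component names; a real system would typically place a queue between the producer and the service.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Event:
    user_id: str
    message: str


class DeliveryChannel:
    """Last hop: records what would be sent (stand-in for email/SMS/push)."""

    def __init__(self) -> None:
        self.sent: List[str] = []

    def send(self, user_id: str, text: str) -> None:
        self.sent.append(f"{user_id}: {text}")


class NotificationService:
    """Middle hop: turns an event into a notification and hands it to a channel."""

    def __init__(self, channel: DeliveryChannel) -> None:
        self.channel = channel

    def handle(self, event: Event) -> None:
        self.channel.send(event.user_id, event.message)


class EventProducer:
    """First hop: creates events and passes them to the notification service."""

    def __init__(self, service: NotificationService) -> None:
        self.service = service

    def emit(self, user_id: str, message: str) -> None:
        self.service.handle(Event(user_id, message))


channel = DeliveryChannel()
producer = EventProducer(NotificationService(channel))
producer.emit("user-1", "You have a new message")
```

The delivery channel never initiates anything here; it only receives work from the service, which matches the ordering in Option B.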
3. Consider a notification system where users can choose email or SMS as delivery channels. If a user prefers both, what is the expected behavior when an event triggers a notification?
medium
A. Send notification via both email and SMS
B. Send notification via email only
C. Send notification via SMS only
D. Do not send any notification

Solution

  1. Step 1: Understand user preference handling

    If a user selects multiple delivery channels, the system should send notifications through all preferred channels to ensure delivery.
  2. Step 2: Confirm expected multi-channel delivery

    Sending via both email and SMS respects user choice and increases notification reach.
  3. Final Answer:

    Send notification via both email and SMS -> Option A
  4. Quick Check:

    Multiple preferences = multiple channels [OK]
Hint: Send notifications on all user-selected channels [OK]
Common Mistakes:
  • Sending notification on only one channel
  • Ignoring user preferences
  • Not sending notification at all
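As a rough illustration of question 3, the sketch below fans a notification out to every channel a user has selected. The `PREFERENCES` dictionary is a hypothetical in-memory stand-in for the user preferences store, and the returned strings stand in for actual sends.

```python
from typing import Dict, List, Set

# Hypothetical in-memory preferences store: user ID -> selected channels.
PREFERENCES: Dict[str, Set[str]] = {
    "alice": {"email", "sms"},
    "bob": {"email"},
}


def channels_for(user_id: str) -> List[str]:
    """Return the user's selected channels in a stable order (empty if unknown)."""
    return sorted(PREFERENCES.get(user_id, set()))


def notify(user_id: str, message: str) -> List[str]:
    """Produce one delivery per selected channel."""
    return [f"{channel} -> {user_id}: {message}" for channel in channels_for(user_id)]


alice_deliveries = notify("alice", "Your order shipped")
bob_deliveries = notify("bob", "Your order shipped")
```

Here `alice_deliveries` holds two entries (email and SMS) while `bob_deliveries` holds one, and a user with no stored preferences gets an empty list under this sketch's assumption.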
4. A notification system uses a queue to handle event processing but notifications are delayed significantly. Which is the most likely cause?
medium
A. Queue is overloaded with too many events
B. User preferences are not stored
C. Delivery channel is sending notifications instantly
D. Event producer is generating too few events

Solution

  1. Step 1: Analyze queue role in notification system

    Queue buffers events to handle load. If overloaded, it causes delays in processing notifications.
  2. Step 2: Evaluate other options

    Missing user preferences or instant delivery does not cause delay; too few events would reduce load, not increase delay.
  3. Final Answer:

    Queue is overloaded with too many events -> Option A
  4. Quick Check:

    Queue overload = delay [OK]
Hint: Delays often mean queue overload, not missing data [OK]
Common Mistakes:
  • Blaming delivery channel for delays
  • Assuming missing preferences cause delay
  • Thinking fewer events cause delays
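Why an overloaded queue causes delay can be shown with simple rate arithmetic: when events arrive faster than workers process them, the backlog grows, and a new event waits behind everything already queued. The function names and the example rates below are assumptions for illustration, with constant arrival and processing rates.

```python
def backlog_after(seconds: float, arrival_rate: float, service_rate: float,
                  initial: float = 0) -> float:
    """Queue depth after `seconds` at constant rates; depth never drops below zero."""
    growth = (arrival_rate - service_rate) * seconds
    return max(0, initial + growth)


def wait_for_new_event(backlog: float, service_rate: float) -> float:
    """Seconds a newly enqueued event waits while the backlog ahead of it drains."""
    return backlog / service_rate


# Hypothetical numbers: 150 events/sec arriving, workers handle 100 events/sec.
backlog = backlog_after(seconds=60, arrival_rate=150, service_rate=100)
delay = wait_for_new_event(backlog, service_rate=100)
```

With these numbers the backlog reaches 3,000 events after one minute and a new event waits about 30 seconds; if arrivals drop below the processing rate, the backlog drains back to zero.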
5. You need to design a notification system that supports millions of users with personalized preferences and multiple delivery channels. Which design choice best ensures scalability and user customization?
hard
A. Use a centralized notification service with a single queue and fixed delivery channels
B. Store all user preferences in a local file on the notification server
C. Implement distributed notification services with sharded queues and dynamic delivery channel selection per user
D. Send notifications synchronously from event producers directly to users

Solution

  1. Step 1: Consider scalability requirements

    Millions of users require distributed services and sharded queues to handle load without bottlenecks.
  2. Step 2: Address user customization needs

    Dynamic delivery channel selection per user allows personalized notifications respecting preferences.
  3. Step 3: Evaluate other options

    Centralized service and single queue create bottlenecks; synchronous sending blocks processing; local files do not scale or support dynamic preferences.
  4. Final Answer:

    Implement distributed notification services with sharded queues and dynamic delivery channel selection per user -> Option C
  5. Quick Check:

    Distributed + dynamic preferences = scalable & customizable [OK]
Hint: Distribute services and shard queues for scale and flexibility [OK]
Common Mistakes:
  • Choosing a centralized design that causes bottlenecks
  • Using synchronous sending, which blocks the system
  • Storing preferences in non-scalable local files
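Option C can be sketched in miniature: events are routed to one of several queues by a stable hash of the user ID, and each shard's worker looks up the user's channels at send time. Everything here is an assumption for illustration, including `NUM_SHARDS`, the in-memory `PREFERENCES` store, and the use of `collections.deque` in place of a distributed queue such as Kafka.

```python
import zlib
from collections import deque
from typing import Deque, Dict, List, Set, Tuple

NUM_SHARDS = 4  # hypothetical shard count

# Hypothetical preferences store: user ID -> selected channels.
PREFERENCES: Dict[str, Set[str]] = {
    "alice": {"email", "sms"},
    "bob": {"push"},
}

# One in-memory queue per shard (stand-in for distributed queue partitions).
queues: List[Deque[Tuple[str, str]]] = [deque() for _ in range(NUM_SHARDS)]


def shard_for(user_id: str) -> int:
    """Stable shard index; crc32 gives the same value across processes and runs."""
    return zlib.crc32(user_id.encode("utf-8")) % NUM_SHARDS


def enqueue(user_id: str, message: str) -> None:
    queues[shard_for(user_id)].append((user_id, message))


def drain(shard: int) -> List[str]:
    """Worker for one shard: pick delivery channels per user at send time."""
    deliveries: List[str] = []
    queue = queues[shard]
    while queue:
        user_id, message = queue.popleft()
        for channel in sorted(PREFERENCES.get(user_id, set())):
            deliveries.append(f"{channel}:{user_id}:{message}")
    return deliveries


enqueue("alice", "hi")
enqueue("bob", "yo")
results = [line for shard in range(NUM_SHARDS) for line in drain(shard)]
```

Because each shard has its own queue and worker, shards can be processed independently. Simple modulo hashing remaps most users when the shard count changes, so consistent hashing is a common alternative when shards are added or removed often.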