
Notification system in LLD - Scalability & System Analysis

Scalability Analysis - Notification system
Growth Table: Notification System Scaling
Users       | Notifications/Day | System Changes
100         | ~1,000            | Single server handles all; simple DB writes; no queue needed
10,000      | ~100,000          | Introduce message queue; DB indexing; basic caching; multiple app instances
1,000,000   | ~10,000,000       | Horizontal scaling of app servers; distributed queue; read replicas; CDN for media
100,000,000 | ~1,000,000,000    | Sharded databases; multi-region deployment; advanced caching layers; push notification services
First Bottleneck

The database is the first bottleneck as the system grows past a single server. Writing and reading notification data for a growing user base drives up the number of writes and queries per second, and a single database instance struggles to keep up before the application servers do.

Scaling Solutions
  • Database: Use read replicas to spread read load; implement write queues to smooth writes; shard data by user ID.
  • Application Servers: Horizontally scale by adding more servers behind load balancers.
  • Message Queue: Use distributed queues (e.g., Kafka) to handle high notification throughput.
  • Caching: Cache frequently accessed notification metadata in Redis or a similar store to reduce DB hits.
  • CDN: Use CDN to serve notification media (images, videos) efficiently.
  • Push Services: Integrate with platform push notification services (APNs, FCM) for mobile delivery.
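The "shard data by user ID" step above can be sketched as a small routing function. This is a minimal illustration, not a prescribed design: the function name `shard_for_user`, the choice of SHA-256, and the shard count of 8 are all assumptions made for the example.

```python
import hashlib
from collections import Counter


def shard_for_user(user_id: str, num_shards: int) -> int:
    """Map a user ID to a shard index in [0, num_shards).

    A stable hash is used instead of Python's built-in hash(), which is
    randomized per process for strings and would route the same user to
    different shards after a restart.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an unsigned integer, then reduce
    # it to a shard index.
    return int.from_bytes(digest[:8], "big") % num_shards


# Hypothetical usage: spread 10,000 made-up user IDs across 8 shards.
NUM_SHARDS = 8
distribution = Counter(
    shard_for_user(f"user-{i}", NUM_SHARDS) for i in range(10_000)
)
```

Plain modulo routing keeps all of a user's notifications on one shard, so a per-user inbox query touches a single database. Its drawback is that changing `num_shards` remaps most users; consistent hashing is a common alternative when shards are added often.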
Back-of-Envelope Cost Analysis
  • At 1M users with 10 notifications each per day: ~10M notifications/day ≈ 116 notifications/sec on average (10M / 86,400 sec).
  • Database: Needs to handle ~200 QPS (writes + reads), requiring replicas and indexing.
  • Message Queue: Must support 100-200 messages/sec throughput.
  • Bandwidth: Assuming 10 KB per notification payload, ~100 GB/day (~1.16 MB/sec average; the ~13 MB/sec peak figure implies a peak-to-average ratio of roughly 11x).
  • Storage: Storing notifications for 30 days -> 300M notifications ≈ 3TB assuming 10KB each.
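The arithmetic in the estimates above can be reproduced with a short script. This is a sketch under the same inputs as the list (1M users, 10 notifications per user per day, 10 KB payloads, 30-day retention); the function name `estimate` and the use of decimal units (1 KB = 1,000 bytes) are assumptions for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def estimate(users, per_user_per_day, payload_bytes, retention_days):
    """Return average throughput, bandwidth, and storage estimates."""
    per_day = users * per_user_per_day
    bytes_per_day = per_day * payload_bytes
    return {
        "notifications_per_day": per_day,
        "notifications_per_sec": per_day / SECONDS_PER_DAY,
        "bandwidth_gb_per_day": bytes_per_day / 1e9,
        "bandwidth_mb_per_sec": bytes_per_day / SECONDS_PER_DAY / 1e6,
        "storage_tb": bytes_per_day * retention_days / 1e12,
    }


est = estimate(
    users=1_000_000,
    per_user_per_day=10,
    payload_bytes=10_000,  # 10 KB, decimal units
    retention_days=30,
)

for name, value in est.items():
    print(f"{name}: {value:,.2f}")
```

These are averages. Peak traffic has to be estimated separately, by multiplying by whatever peak-to-average ratio the traffic pattern suggests.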
Interview Tip

Start by clarifying notification types and delivery methods. Discuss user scale and traffic patterns. Identify bottlenecks like DB writes or push service limits. Propose incremental scaling steps: queues, caching, sharding. Always justify why each solution fits the bottleneck.

Self Check

Your database handles 1000 QPS. Traffic grows 10x to 10,000 QPS. What do you do first?

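Answer: Relieve the database first, since it is the bottleneck. Introduce read replicas to distribute read load and write queues to smooth write bursts. Scale app servers only after that, because adding app servers in front of a saturated database just sends it more load.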
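One way to picture the "write queue" part of this answer is an in-memory buffer that a worker drains in fixed-size batches, so the database sees a bounded write rate even when requests arrive in bursts. The class `WriteBuffer`, its method names, and the batch size of 100 are hypothetical; a production system would more likely use a durable queue such as Kafka, as mentioned earlier.

```python
from collections import deque


class WriteBuffer:
    """Buffer incoming writes and flush them to storage in bounded batches."""

    def __init__(self, flush_fn, batch_size=100):
        self._pending = deque()
        self._flush_fn = flush_fn      # called with a list of records
        self._batch_size = batch_size  # max records written per drain

    def enqueue(self, record):
        """Accept a write immediately; nothing touches the DB yet."""
        self._pending.append(record)

    def drain(self):
        """Write at most one batch and return how many records it held."""
        batch = []
        while self._pending and len(batch) < self._batch_size:
            batch.append(self._pending.popleft())
        if batch:
            self._flush_fn(batch)
        return len(batch)


# Hypothetical usage: a burst of 250 writes reaches the "database"
# (here just a list of batches) as batches of 100, 100, and 50.
written_batches = []
buffer = WriteBuffer(flush_fn=written_batches.append, batch_size=100)
for i in range(250):
    buffer.enqueue({"notification_id": i})

drained = [buffer.drain() for _ in range(4)]
```

If a worker calls `drain()` once per fixed interval, the database write rate is capped at `batch_size` records per interval, and the burst is absorbed as queue depth instead of database load.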

Key Result
The database is the first bottleneck as notification volume grows; scaling requires queues, caching, read replicas, and sharding to handle high write/read loads efficiently.