
Design a notification system in HLD - Scalability & System Analysis

Scalability Analysis - Design a notification system
Growth Table: Notification System Scaling
Users | Notifications/Day | Key Changes
100 | ~1,000 | Simple queue, single server, direct DB writes
10,000 | ~100,000 | Message queue introduced, caching user preferences, DB indexing
1,000,000 | ~10,000,000 | Multiple app servers, distributed queue, read replicas, push notification services
100,000,000 | ~1,000,000,000 | Sharded DB, global CDN for media, microservices, event-driven architecture
First Bottleneck

At around 10,000 users, the database becomes the first bottleneck. Writing and reading notification data for that many users drives up latency and runs into the database's connection limits. The single server and simple queue used at the smallest scale cannot handle this volume efficiently.

Scaling Solutions
  • Horizontal Scaling: Add more application servers behind a load balancer to handle more notification requests.
  • Message Queues: Use distributed queues (e.g., Kafka, RabbitMQ) to decouple notification generation from delivery.
  • Caching: Cache user notification preferences and recent notifications to reduce DB load.
  • Database Read Replicas: Use replicas to distribute read traffic and reduce load on the primary DB.
  • Sharding: Partition the database by user ID or region to scale writes and storage.
  • Push Notification Services: Use external services (e.g., Firebase Cloud Messaging, APNs) for mobile push notifications to offload delivery.
  • CDN: Use a CDN for static media in notifications to reduce bandwidth and latency.
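Two of the ideas above, decoupling generation from delivery with a queue and caching user preferences, can be sketched in a few lines. This is a minimal single-process illustration, not a production design: Python's standard `queue.Queue` stands in for a distributed queue such as Kafka or RabbitMQ, and `PREFERENCES_DB`, `get_preferences`, `delivery_worker`, and `run_demo` are hypothetical names invented for this example.

```python
import queue
import threading

# Hypothetical in-memory stand-ins for the preferences table and its cache.
PREFERENCES_DB = {"alice": {"push": True}, "bob": {"push": False}}
preferences_cache = {}
db_reads = 0      # counts how often the "database" is actually hit
delivered = []    # stand-in for calls to an external push service


def get_preferences(user_id):
    """Cache-aside lookup: check the cache first, read the DB only on a miss."""
    global db_reads
    if user_id not in preferences_cache:
        db_reads += 1
        preferences_cache[user_id] = PREFERENCES_DB.get(user_id, {"push": False})
    return preferences_cache[user_id]


def delivery_worker(events):
    """Consume events from the queue until a None sentinel arrives."""
    while True:
        event = events.get()
        try:
            if event is None:
                return
            user_id, message = event
            if get_preferences(user_id)["push"]:
                delivered.append((user_id, message))
        finally:
            events.task_done()


def run_demo():
    """Enqueue a few events; the producer never waits on delivery itself."""
    events = queue.Queue()
    worker = threading.Thread(target=delivery_worker, args=(events,))
    worker.start()
    for i in range(3):
        events.put(("alice", f"message {i}"))
    events.put(("bob", "muted"))  # bob has push disabled, so nothing is sent
    events.put(None)              # sentinel: tell the worker to stop
    events.join()
    worker.join()
    return delivered, db_reads
```

Calling `run_demo()` once processes four events but reads the preferences store only twice, once per distinct user, which is the effect the caching bullet is after. In a real deployment the queue and cache would be separate services shared by many workers.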
Back-of-Envelope Cost Analysis
  • At 1M users with 10 notifications each per day: ~10M notifications/day ≈ 116 notifications/sec on average.
  • Database: Needs to handle ~116 writes/sec plus reads; a single DB that can handle ~5,000 QPS has ample headroom at the average rate, so one instance is sufficient. Peak bursts and read traffic are what push it toward its limits.
  • Message Queue: Must support ~116 enqueue/dequeue operations per second, well within Kafka or RabbitMQ capabilities.
  • Bandwidth: Assuming 1 KB per notification, 116 KB/s ≈ 0.93 Mbps, easily handled by a 1 Gbps network.
  • Storage: 10M notifications/day x 1 KB = ~10 GB/day; plan for archiving and tiered storage.
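The arithmetic above can be reproduced with a short script, which also makes it easy to plug in other tiers from the growth table. This is a sketch under the same assumptions as the list (average load only, 1 KB per notification, taken here as 1,000 bytes); the function name `estimate_load` and its parameter names are made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def estimate_load(users, notifications_per_user_per_day, bytes_per_notification=1_000):
    """Return average daily volume, per-second rate, bandwidth, and daily storage."""
    per_day = users * notifications_per_user_per_day
    per_sec = per_day / SECONDS_PER_DAY
    bytes_per_sec = per_sec * bytes_per_notification
    return {
        "notifications_per_day": per_day,
        "notifications_per_sec": per_sec,
        # 8 bits per byte, 1,000,000 bits per megabit
        "bandwidth_mbps": bytes_per_sec * 8 / 1_000_000,
        # decimal gigabytes, matching the 1 KB = 1,000 bytes assumption
        "storage_gb_per_day": per_day * bytes_per_notification / 1_000_000_000,
    }


if __name__ == "__main__":
    estimate = estimate_load(users=1_000_000, notifications_per_user_per_day=10)
    for name, value in estimate.items():
        print(f"{name}: {value:,.2f}")
```

These are averages; real traffic is bursty, so capacity is usually planned against a peak rate several times higher than the figure this returns.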
Interview Tip

Start by clarifying notification types and user scale. Discuss data flow from event to delivery. Identify bottlenecks at each scale. Propose incremental scaling solutions: caching, queues, DB replicas, sharding. Mention trade-offs and real-world constraints like latency and cost.

Self Check

Your database handles 1,000 QPS. Traffic grows 10x to 10,000 QPS. What do you do first?

Answer: Add read replicas to distribute read traffic and reduce load on the primary database. Also introduce caching for frequent reads, and consider message queues to decouple processing and smooth out write bursts.
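To see how far read replicas alone go, the load can be split into reads and writes. The sketch below assumes a 90% read mix, which is an illustrative figure and not something stated above; it also assumes every read can be served by a replica and ignores replication overhead and caching. The function name `replicas_needed` is hypothetical.

```python
import math


def replicas_needed(total_qps, read_percent, node_capacity_qps):
    """Split load into writes and reads, then size the read replica pool.

    Returns (write_qps, read_qps, replica_count). Writes still go to the
    single primary; reads are assumed to be spread evenly over replicas.
    """
    read_qps = total_qps * read_percent // 100
    write_qps = total_qps - read_qps
    replica_count = math.ceil(read_qps / node_capacity_qps)
    return write_qps, read_qps, replica_count


if __name__ == "__main__":
    # Assumed 90% reads; each node handles 1,000 QPS as in the question.
    writes, reads, replicas = replicas_needed(10_000, 90, 1_000)
    print(f"primary writes: {writes} QPS, reads: {reads} QPS, replicas: {replicas}")
```

Under these assumptions the reads need 9 replicas, while the primary is left with 1,000 write QPS, which is already its full capacity. That is why the answer pairs replicas with caching and queues: replicas relieve reads only, and the write path still needs help.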

Key Result
The database is the first bottleneck as notification volume grows; scaling calls for read replicas, caching, and distributed queues first, then sharding and microservices.