| Users | Notifications/Day | Key Changes |
|---|---|---|
| 100 | ~1,000 | Simple queue, single server, direct DB writes |
| 10,000 | ~100,000 | Message queue introduced, caching user preferences, DB indexing |
| 1,000,000 | ~10,000,000 | Multiple app servers, distributed queue, read replicas, push notification services |
| 100,000,000 | ~1,000,000,000 | Sharded DB, global CDN for media, microservices, event-driven architecture |
Designing a Notification System (HLD): Scalability & System Analysis
At around 10,000 users, the database becomes the first bottleneck: writing and reading notification data for many concurrent users drives up latency and exhausts connection limits, and a single server with a simple in-memory queue can no longer keep up with the volume.
- Horizontal Scaling: Add more application servers behind a load balancer to handle more notification requests.
- Message Queues: Use distributed queues (e.g., Kafka, RabbitMQ) to decouple notification generation from delivery.
- Caching: Cache user notification preferences and recent notifications to reduce DB load.
- Database Read Replicas: Use replicas to distribute read traffic and reduce load on the primary DB.
- Sharding: Partition the database by user ID or region to scale writes and storage.
- Push Notification Services: Use external services (e.g., Firebase, APNs) for mobile push notifications to offload delivery.
- CDN: Use CDN for static media in notifications to reduce bandwidth and latency.
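The queue-based decoupling described above can be sketched in-process. This is a minimal illustration, with Python's standard-library `queue.Queue` and a thread standing in for Kafka/RabbitMQ and a separate delivery worker service; the payload shape is an assumption:

```python
import queue
import threading

# In production this queue would be Kafka or RabbitMQ, and the worker a
# separate delivery service calling APNs/FCM; here it just records events.
notification_queue = queue.Queue()
delivered = []

def enqueue_notification(user_id, message):
    """Producer side: the app server enqueues and returns immediately."""
    notification_queue.put({"user_id": user_id, "message": message})

def delivery_worker():
    """Consumer side: drains the queue and hands events to a delivery channel."""
    while True:
        event = notification_queue.get()
        if event is None:  # sentinel to stop the worker
            break
        delivered.append(event)  # stand-in for the actual push/email send
        notification_queue.task_done()

worker = threading.Thread(target=delivery_worker)
worker.start()
enqueue_notification(42, "Your order shipped")
notification_queue.put(None)
worker.join()
print(delivered)
```

The key property is that `enqueue_notification` returns as soon as the event is durable in the queue, so notification generation is no longer coupled to delivery latency.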
- At 1M users sending 10 notifications/day: ~10M notifications/day ≈ 115 notifications/sec.
- Database: Needs to handle ~115 writes/sec plus read traffic; a single well-tuned instance can handle ~5,000 QPS, so writes alone fit comfortably on one instance. Read fan-out (users polling their feeds) can multiply total QPS well beyond the write rate, which is why replicas and caching matter.
- Message Queue: Must support ~115 enqueue/dequeue operations per second, well within Kafka or RabbitMQ capabilities.
- Bandwidth: Assuming 1 KB per notification, ~115 KB/s ≈ 0.9 Mbps, easily handled by a 1 Gbps network.
- Storage: 10M notifications/day × 1 KB ≈ 10 GB/day; plan for archiving and tiered storage.
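The estimates above are simple arithmetic and can be verified with a back-of-envelope script; all inputs (1M users, 10 notifications/day, 1 KB per notification) are the assumptions stated in the text:

```python
# Back-of-envelope capacity check for the 1M-user scenario.
USERS = 1_000_000
NOTIFS_PER_USER_PER_DAY = 10
BYTES_PER_NOTIF = 1024          # ~1 KB per notification (assumption)
SECONDS_PER_DAY = 86_400

notifs_per_day = USERS * NOTIFS_PER_USER_PER_DAY              # 10 million
notifs_per_sec = notifs_per_day / SECONDS_PER_DAY             # ~115/sec
bandwidth_mbps = notifs_per_sec * BYTES_PER_NOTIF * 8 / 1e6   # ~0.95 Mbps
storage_gb_per_day = notifs_per_day * BYTES_PER_NOTIF / 1024**3  # ~9.5 GB

print(f"{notifs_per_sec:.0f}/sec, {bandwidth_mbps:.2f} Mbps, "
      f"{storage_gb_per_day:.1f} GB/day")
```

Note the day-level average hides peaks: real notification traffic is bursty, so provisioning should target peak rates (often 3-10x the average), not these steady-state numbers.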
Start by clarifying notification types and user scale, then walk the data flow from event generation to delivery. Identify the bottleneck at each order of magnitude and propose incremental scaling fixes: caching, queues, read replicas, then sharding. Close with trade-offs and real-world constraints such as latency, delivery guarantees, and cost.
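The sharding step mentioned above is typically a deterministic hash of the partition key (here, user ID). A minimal sketch, where the shard count of 4 is an arbitrary assumption for illustration:

```python
import hashlib

NUM_SHARDS = 4  # assumption for illustration; real systems pick a larger count

def shard_for(user_id: int) -> int:
    """Map a user ID to a shard deterministically via a hash."""
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# Because the mapping is deterministic, all of a user's notifications land on
# the same shard, so per-user feed reads never cross shard boundaries.
print({uid: shard_for(uid) for uid in range(5)})
```

A modulo scheme like this forces a large data reshuffle when `NUM_SHARDS` changes; consistent hashing or a directory-based lookup is the usual answer to that, at the cost of extra machinery.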
Your database handles 1000 QPS. Traffic grows 10x to 10,000 QPS. What do you do first?
Answer: Add read replicas to distribute read traffic and reduce load on the primary database. Also, introduce caching for frequent reads and consider message queues to decouple processing.
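The caching half of the answer is the cache-aside pattern: check the cache first, fall back to a (replica) database read on a miss, then populate the cache. A minimal sketch, where the dict cache and `db_read` function stand in for Redis and a real replica query:

```python
cache = {}
db_calls = 0  # counts how often we actually hit the database

def db_read(user_id):
    """Stand-in for a read against a replica; payload shape is illustrative."""
    global db_calls
    db_calls += 1
    return {"user_id": user_id, "prefs": {"push": True}}

def get_preferences(user_id):
    if user_id in cache:        # cache hit: no database load at all
        return cache[user_id]
    value = db_read(user_id)    # cache miss: one replica read
    cache[user_id] = value      # populate so the next read is a hit
    return value

get_preferences(7)
get_preferences(7)
print(db_calls)  # repeated reads for the same user cost one DB call
```

In production this needs a TTL or explicit invalidation on preference updates; a stale-forever cache is the classic failure mode of cache-aside.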
