Notification system design in HLD - Scalability & System Analysis

| Scale | Users | Notifications/Day | Key Changes |
|---|---|---|---|
| Small | 100 | 1,000 | Single server handles API and DB; simple queue; direct push |
| Medium | 10,000 | 100,000 | Introduce message queue; add DB read replicas; cache user preferences |
| Large | 1,000,000 | 10,000,000 | Multiple app servers; sharded DB; distributed queue; platform push services |
| Very Large | 100,000,000 | 1,000,000,000 | Global load balancers; multi-region DB shards; CDN for static content; advanced throttling |
At small to medium scale, the database is the first bottleneck: it struggles with the high write volume from notification records and user-preference updates. As traffic grows, the message queue and application servers become bottlenecks as well, which shows up as processing and delivery delays. Scaling strategies by component:
- Database: Use read replicas for reads, write sharding by user ID, and caching for user settings.
- Application Servers: Horizontally scale with load balancers to handle more concurrent connections.
- Message Queue: Use distributed queues like Kafka or RabbitMQ to handle high throughput and ensure reliable delivery.
- Push Delivery: Integrate with platform push services (APNs, FCM) and use CDN for static notification content.
- Throttling & Batching: Batch notifications and throttle to avoid overwhelming users and systems.
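A minimal sketch of the batching-plus-throttling idea from the last bullet: a token bucket limits how many batches go out per second so downstream push services are not flooded, and notifications are grouped into batches before sending. The class name, batch size, and rate are illustrative assumptions, and `sent_batches` stands in for a real push-service call.

```python
import time
from collections import deque

class NotificationBatcher:
    """Collects notifications and flushes them in batches, rate-limited
    by a token bucket so downstream push services are not overwhelmed."""

    def __init__(self, batch_size=100, rate_per_sec=50, capacity=200):
        self.batch_size = batch_size
        self.rate = rate_per_sec        # tokens refilled per second
        self.capacity = capacity        # maximum burst of batches
        self.tokens = capacity
        self.last_refill = time.monotonic()
        self.pending = deque()
        self.sent_batches = []          # stand-in for the real push call

    def _refill(self):
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.rate)
        self.last_refill = now

    def enqueue(self, notification):
        self.pending.append(notification)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        self._refill()
        # Each batch sent costs one token; stop when the bucket is empty.
        while self.pending and self.tokens >= 1:
            batch = [self.pending.popleft()
                     for _ in range(min(self.batch_size, len(self.pending)))]
            self.tokens -= 1
            self.sent_batches.append(batch)
```

In a real system the flush would publish to a distributed queue (e.g. a Kafka topic) rather than append to a list, and leftover tokens would gate retries as well.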
Back-of-envelope estimates:
- Requests per second: 1M users x 10 notifications/user/day = 10M/day, or ~115 QPS on average (10,000,000 / 86,400 s).
- Storage: assuming ~1KB per notification, 10M notifications/day is ~10GB/day of storage.
- Bandwidth: push payloads are small (~1KB each), so 10M notifications mean roughly 10GB of outbound traffic per day.
- Server capacity: one app server handles roughly 3,000 concurrent connections; scale horizontally as users grow.
- Database QPS: a single PostgreSQL instance handles on the order of 5,000 QPS; use sharding and replicas beyond that.
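The estimates above can be wrapped in a small calculator so different scale tiers are easy to compare. The `peak_factor` parameter is an assumption (peak traffic at roughly 3x the daily average); the per-node limits mirror the bullets above.

```python
import math

def estimate_capacity(users, notifs_per_user_per_day, payload_bytes=1024,
                      qps_per_db_node=5000, peak_factor=3):
    """Back-of-envelope sizing for a notification system.

    peak_factor is an assumed peak-to-average traffic ratio; the
    per-node QPS limit mirrors the rough PostgreSQL figure above."""
    daily = users * notifs_per_user_per_day
    avg_qps = daily / 86_400                           # seconds per day
    peak_qps = avg_qps * peak_factor
    storage_gb_per_day = daily * payload_bytes / 1024**3
    db_nodes = math.ceil(peak_qps / qps_per_db_node)   # shards/replicas needed
    return {
        "daily_notifications": daily,
        "avg_qps": avg_qps,
        "peak_qps": peak_qps,
        "storage_gb_per_day": storage_gb_per_day,
        "db_nodes": db_nodes,
    }

# The "Large" tier from the table: 1M users, 10 notifications each per day.
large = estimate_capacity(1_000_000, 10)
```

For the 1M-user tier this reproduces the numbers above: ~115 QPS average and just under 10GB/day of storage, comfortably within a single database node even at assumed peak.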
In an interview, start by clarifying notification types (push, email, SMS, in-app) and delivery guarantees. Discuss user scale and traffic patterns, then identify bottlenecks step by step: database, queue, delivery. Propose incremental scaling solutions with clear reasoning, and mention trade-offs such as latency vs. cost and their impact on user experience.
Question: Your database handles 1,000 QPS and traffic grows 10x. What do you do first?
Answer: First add read replicas to offload read queries and cache frequently read data, since reads typically dominate and replicas are the fastest win. For the write path, shard by user ID or batch writes to reduce per-node load.
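The answer combines two patterns that can be sketched together: cache-aside reads (hit the cache first, fall back to a replica, then populate the cache) and writes routed to a shard chosen by hashing the user ID. The class and its dict-backed cache/shards are hypothetical stand-ins for a real cache and database nodes.

```python
import hashlib

class PreferenceStore:
    """Cache-aside reads for user preferences, with writes routed to a
    shard picked by hashing the user ID. Dicts stand in for the real
    cache (e.g. Redis) and the sharded database nodes."""

    def __init__(self, num_shards=4):
        self.cache = {}
        self.shards = [dict() for _ in range(num_shards)]
        self.cache_hits = 0
        self.db_reads = 0

    def _shard(self, user_id):
        # Stable hash so the same user always maps to the same shard.
        h = int(hashlib.md5(str(user_id).encode()).hexdigest(), 16)
        return self.shards[h % len(self.shards)]

    def get_prefs(self, user_id):
        if user_id in self.cache:       # cache hit: no database load
            self.cache_hits += 1
            return self.cache[user_id]
        self.db_reads += 1              # miss: read from the shard (replica)
        prefs = self._shard(user_id).get(user_id, {})
        self.cache[user_id] = prefs     # populate cache for next time
        return prefs

    def set_prefs(self, user_id, prefs):
        self._shard(user_id)[user_id] = prefs
        self.cache.pop(user_id, None)   # invalidate stale cached entry
```

Note the invalidate-on-write choice: dropping the cached entry on update avoids serving stale preferences at the cost of one extra database read on the next access.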
