| Users | Push Requests/Second | Message Volume | Infrastructure Changes |
|---|---|---|---|
| 100 users | ~10 req/s | Low volume, simple queue | Single server, basic push service |
| 10,000 users | ~1,000 req/s | Moderate volume, queue grows | Load balancer, multiple push workers, caching |
| 1,000,000 users | ~100,000 req/s | High volume, large queues | Distributed push services, sharded queues, CDN for payloads |
| 100,000,000 users | ~10,000,000 req/s | Very high volume, massive queues | Multi-region clusters, advanced sharding, edge caching, auto-scaling |
## Push notification integration in HLD - Scalability & System Analysis
The first bottleneck is message-queue and push-service throughput. As the user count grows, the system struggles to enqueue and deliver notifications fast enough: a single server with a simple queue cannot absorb high concurrent push rates, so deliveries are delayed or dropped. Common mitigations:
- Horizontal scaling: Add more push worker servers behind a load balancer to distribute load.
- Message queue sharding: Split queues by user segments or notification types to reduce contention.
- Caching: Cache notification payloads or user tokens to reduce repeated database lookups.
- Use CDN: For large payloads like images, use CDN to offload delivery from push servers.
- Auto-scaling: Automatically add/remove push workers based on traffic spikes.
- Multi-region deployment: Deploy push services closer to users to reduce latency and network load.
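Of the steps above, queue sharding is the easiest to get subtly wrong. A minimal sketch (all names here are illustrative, not from any real queue library): route each notification to one of N queue shards using a stable hash of the user ID, so workers on different shards never contend for the same queue and a given user's notifications stay ordered within one shard.

```python
import hashlib

NUM_SHARDS = 8

def shard_for_user(user_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Stable shard assignment: the same user always maps to the same queue."""
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    return int(digest, 16) % num_shards

# Plain lists stand in for real queue shards (e.g. Kafka partitions or
# separate Redis lists) in this sketch.
queues = [[] for _ in range(NUM_SHARDS)]

def enqueue(user_id: str, payload: dict) -> None:
    queues[shard_for_user(user_id)].append(payload)

enqueue("user-42", {"title": "Hi"})
```

Hashing (rather than, say, sharding by geographic region) spreads hot users evenly, at the cost of making "fan out to everyone in region X" queries touch every shard.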
As a worst-case estimate, if each of 1 million users receives one notification per second, the system must handle ~1 million push requests per second (ten times the steady-state ~100,000 req/s shown in the table above). Each request is small (~1 KB), so bandwidth is roughly 1 GB/s, and storage for delivery logs and retry state can grow to terabytes per day. Infrastructure cost comes from the worker fleet, message queues, and CDN egress; batching notifications and filtering out inactive device tokens reduce both traffic and cost.
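The back-of-envelope numbers above can be checked directly. The per-push log size below (~100 bytes) is an assumption for illustration, not a figure from the text:

```python
# Capacity estimate for the worst-case scenario: 1 notification per user
# per second across 1 million users, ~1 KB per push request.
users = 1_000_000
notifications_per_user_per_sec = 1
payload_bytes = 1_000                 # ~1 KB per push request
log_bytes_per_push = 100              # assumed size of one delivery-log line
seconds_per_day = 86_400

requests_per_sec = users * notifications_per_user_per_sec   # 1,000,000 req/s
bandwidth_bytes_per_sec = requests_per_sec * payload_bytes  # ~1 GB/s
log_bytes_per_day = requests_per_sec * log_bytes_per_push * seconds_per_day
# ~8.6 TB/day of logs, consistent with "terabytes daily"
```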
Start by explaining the push flow simply: app server sends notification to queue, workers deliver to devices. Discuss bottlenecks like queue throughput and network limits. Then propose scaling steps: horizontal scaling, sharding, caching, CDN. Always justify why each step solves the bottleneck.
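The basic flow described above (app server enqueues, workers dequeue and deliver) can be sketched with an in-process queue and worker threads. Everything here is a stand-in: a production system would use a durable broker and a push provider's API instead of a local list.

```python
import queue
import threading

notifications = queue.Queue()
delivered = []  # stand-in for acknowledgements from a real push provider

def worker():
    while True:
        item = notifications.get()
        if item is None:              # sentinel value: shut this worker down
            break
        delivered.append(item)        # stand-in for the actual device push
        notifications.task_done()

workers = [threading.Thread(target=worker) for _ in range(4)]
for w in workers:
    w.start()

# App server side: enqueue notifications for delivery.
for i in range(10):
    notifications.put({"device": i, "body": "hello"})

notifications.join()                  # block until every item is processed
for _ in workers:
    notifications.put(None)           # one shutdown sentinel per worker
for w in workers:
    w.join()
```

The value of this framing in an interview is that each scaling step maps onto one element of the sketch: more worker threads is horizontal scaling, multiple queues is sharding, and the queue depth is the metric that tells you when to scale.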
Your database handles 1000 QPS. Traffic grows 10x to 10,000 QPS. What do you do first?
Answer: Add read replicas and a cache in front of the primary database before scaling application servers. Sudden traffic growth is usually read-heavy, so offloading reads is the cheapest fix and avoids touching the riskier write path first.
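The caching half of that answer is typically the cache-aside pattern: check the cache first and only query the database on a miss. A minimal sketch with hypothetical names (a real deployment would use Redis or memcached with TTL-based expiry rather than an unbounded dict):

```python
cache = {}
db_reads = {"count": 0}   # counter so we can see how many queries hit the DB

def fetch_user_from_db(user_id):
    """Stand-in for a real database query."""
    db_reads["count"] += 1
    return {"id": user_id, "name": f"user-{user_id}"}

def get_user(user_id):
    if user_id not in cache:              # miss: read through to the database
        cache[user_id] = fetch_user_from_db(user_id)
    return cache[user_id]

get_user(1)
get_user(1)
get_user(1)
# db_reads["count"] is 1: only the first call reached the database.
```

With a high cache hit rate, most of the 10x read traffic never reaches the primary, which is why this is the first lever to pull.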
