# One-to-One Messaging in HLD: Scalability & System Analysis

| Users | Messages/Day | Server Load | Database Load | Network | Notes |
|---|---|---|---|---|---|
| 100 | 1,000 | Single app server handles all | Single DB instance handles writes/reads | Low bandwidth, no CDN needed | Simple setup, no caching needed |
| 10,000 | 100,000 | Multiple app servers behind load balancer | DB starts to see high write/read load | Moderate bandwidth, consider caching | Introduce Redis cache for recent messages |
| 1,000,000 | 10,000,000 | Hundreds of app servers, autoscaling | DB bottleneck: read replicas, sharding needed | High bandwidth, CDN for media files | Use message queues for delivery, partition users |
| 100,000,000 | 1,000,000,000 | Thousands of app servers, geo-distributed | Multi-region DB clusters, advanced sharding | Very high bandwidth, global CDN | Strong consistency challenges, eventual consistency for some data |
As the user base grows, the database is typically the first bottleneck because it must handle every message write and read. Its CPU and disk I/O saturate first, which slows down both message delivery and retrieval.
- Horizontal scaling: Add more app servers behind a load balancer to handle concurrent connections.
- Database read replicas: Offload read queries to replicas to reduce load on primary DB.
- Sharding: Split user data across multiple database instances by user ID to distribute load.
- Caching: Use Redis or Memcached to cache recent messages and user presence info.
- Message queues: Use queues like Kafka or RabbitMQ to decouple message ingestion and delivery.
- CDN: Serve media files (images, videos) through a CDN to reduce bandwidth on origin servers.
- Geo-distribution: Deploy servers and databases closer to users to reduce latency.
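The sharding idea above can be sketched in a few lines: hash each user ID to one of N database shards so a user's conversation history always lives on a single instance. The shard count and the `shard_for_user` name are illustrative assumptions, not part of any particular system.

```python
# Minimal sketch of user-ID sharding (assumed shard count and function name).
import hashlib

NUM_SHARDS = 4  # assumption: four database shards

def shard_for_user(user_id: str) -> int:
    """Map a user ID to a shard deterministically via a stable hash."""
    digest = hashlib.md5(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# The same user always maps to the same shard, so a one-to-one conversation
# can be read from a single instance without cross-shard queries.
print(shard_for_user("user_42") == shard_for_user("user_42"))
```

Note that a plain modulo scheme forces mass data movement when `NUM_SHARDS` changes; production systems usually layer consistent hashing or a shard-directory service on top for exactly that reason.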
Assuming 1 million users sending 10 messages/day each:
- Messages per second (QPS): ~115 (10M messages / 86400 seconds)
- Database writes: 115 QPS (each message is a write)
- Database reads: 230 QPS (assuming 2 reads per message for delivery and retrieval)
- Storage: 10M messages/day * 1KB/message = ~10GB/day
- Network bandwidth: 10M messages/day * 1KB = ~116 KB/s on average (peak traffic will be several times higher)
- A single well-tuned DB instance can typically handle a few thousand QPS, so one instance covers this load, but with little headroom for growth or traffic spikes.
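The estimates above are easy to verify as a back-of-envelope script; the 1 KB message size is the same assumption used in the text.

```python
# Back-of-envelope check of the numbers above: 1M users x 10 messages/day.
USERS = 1_000_000
MSGS_PER_USER_PER_DAY = 10
MSG_SIZE_BYTES = 1024  # assumption: ~1 KB per message
SECONDS_PER_DAY = 86_400

messages_per_day = USERS * MSGS_PER_USER_PER_DAY                     # 10,000,000
write_qps = messages_per_day / SECONDS_PER_DAY                       # ~115.7
read_qps = 2 * write_qps                                             # ~231.5 (delivery + retrieval)
storage_per_day_gb = messages_per_day * MSG_SIZE_BYTES / 1024**3     # ~9.5 GB/day
avg_bandwidth_kbs = messages_per_day * MSG_SIZE_BYTES / SECONDS_PER_DAY / 1024

print(round(write_qps), round(read_qps), round(storage_per_day_gb, 1))
```

Running this confirms the write/read QPS and shows why the daily storage rounds to roughly 10 GB.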
Start by explaining the user scale and expected message volume. Identify the database as the first bottleneck. Discuss horizontal scaling of app servers, caching, and database read replicas. Then explain sharding and geo-distribution for large scale. Always justify why each solution fits the bottleneck.
Your database handles 1,000 QPS. Traffic grows 10x to 10,000 QPS. What do you do first?
Answer: Add read replicas to offload read queries and reduce load on the primary database. Also consider caching frequently accessed data to reduce DB hits.
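The replica answer boils down to read/write splitting at the application layer. A minimal sketch, assuming string stand-ins for real database connections (the `RoutingPool` class and its names are hypothetical):

```python
# Sketch of read/write splitting: writes go to the primary, reads are
# round-robined across replicas. Connections are illustrative strings.
import itertools

class RoutingPool:
    def __init__(self, primary, replicas):
        self.primary = primary
        self._replicas = itertools.cycle(replicas)

    def connection_for(self, query: str):
        # Route statements that mutate data to the primary; everything else
        # (SELECTs) to the next replica in rotation.
        is_write = query.lstrip().upper().startswith(("INSERT", "UPDATE", "DELETE"))
        return self.primary if is_write else next(self._replicas)

pool = RoutingPool("primary-db", ["replica-1", "replica-2"])
print(pool.connection_for("INSERT INTO messages VALUES (...)"))  # primary-db
print(pool.connection_for("SELECT * FROM messages"))             # replica-1
```

One caveat worth mentioning in an interview: replicas lag the primary, so reads that must see a user's own just-sent message may still need to hit the primary (read-your-writes consistency).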
