| Users | Commands (Writes) | Queries (Reads) | Data Storage | Latency & Consistency | Complexity |
|---|---|---|---|---|---|
| 100 users | Low volume, single write DB | Low volume, single read DB | Single DB instance | Low latency, simple sync | Simple CQRS setup |
| 10,000 users | Moderate writes, DB scaling needed | High reads, read DB replicas | Primary + read replicas | Eventual consistency visible | Separate read/write DBs |
| 1 million users | High writes, sharded command DB | Very high reads, distributed read DB | Sharding + replication | Increased eventual consistency delay | Event sourcing may be added |
| 100 million users | Massive writes, multi-region shards | Extreme reads, global caches/CDNs | Geo-distributed DB clusters | Consistency trade-offs, async sync | Complex event sourcing + CQRS |
CQRS (Command Query Responsibility Segregation) in HLD (High-Level Design) - Scalability & System Analysis
At small scale, the database handling writes (the command side) is the first bottleneck, because every state change must be processed and stored reliably on a single primary. As users grow, the write DB hits its CPU and disk I/O limits first.
The read side scales more easily by adding replicas, so write DB capacity is the main limit initially.
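The split can be sketched as two handlers bound to different stores: commands always hit the single primary, while queries hit a replica. This is a minimal illustration with in-memory dicts standing in for real databases; all class and method names are hypothetical.

```python
# Minimal CQRS sketch: commands go to the primary (write) store,
# queries go to a read replica. Dicts stand in for real databases.
class CommandHandler:
    def __init__(self, write_db):
        self.write_db = write_db  # single primary: the early bottleneck

    def create_order(self, order_id, amount):
        # Every state change must go through the primary.
        self.write_db[order_id] = {"amount": amount}

class QueryHandler:
    def __init__(self, read_db):
        self.read_db = read_db  # replica: cheap to add more of these

    def get_order(self, order_id):
        return self.read_db.get(order_id)

primary, replica = {}, {}
commands = CommandHandler(primary)
queries = QueryHandler(replica)
commands.create_order("o1", 99)
replica.update(primary)  # replication, normally asynchronous
```

Because queries never touch the primary, read load can grow independently of write load; the cost is the replication step, which in a real system is asynchronous and introduces the eventual-consistency lag noted in the table above.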
- Vertical scaling: Increase CPU, RAM, and disk speed on command DB server to handle more writes.
- Horizontal scaling: Shard the command database by user or entity to distribute writes across servers.
- Read replicas: Add multiple read-only replicas to serve queries and reduce load on primary DB.
- Caching: Use in-memory caches (e.g., Redis) on query side to speed up frequent reads.
- Event sourcing: Store changes as events to enable replay and rebuild read models asynchronously.
- Asynchronous sync: Decouple command and query sides with message queues to improve write throughput.
- Geo-distribution: Deploy DB clusters in multiple regions to reduce latency for global users.
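The asynchronous-sync step above can be sketched with a message queue between the two sides: the command path only appends an event, and a background worker projects events into a denormalized read model. This is a single-process sketch using Python's standard library; all names are illustrative.

```python
import queue
import threading

events = queue.Queue()   # stands in for a message broker
read_model = {}          # denormalized view served to queries

def handle_command(user_id, delta):
    # Write path only records the event; no read-model work inline,
    # which is what keeps write throughput high.
    events.put({"user_id": user_id, "delta": delta})

def projector():
    # Query-side worker: consumes events, updates the read model.
    while True:
        e = events.get()
        if e is None:  # shutdown sentinel
            break
        read_model[e["user_id"]] = read_model.get(e["user_id"], 0) + e["delta"]
        events.task_done()

worker = threading.Thread(target=projector, daemon=True)
worker.start()
handle_command("u1", 10)
handle_command("u1", 5)
events.join()     # wait for the read model to catch up
events.put(None)  # stop the worker
```

The window between `handle_command` returning and the projector applying the event is exactly the eventual-consistency delay that grows with scale in the table above.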
Assuming 1 million users with 10% active concurrently, each issuing roughly 1 request per second:
- Concurrent users: 100,000
- Writes per second (commands): ~5,000 QPS (assuming 5% write rate)
- Reads per second (queries): ~95,000 QPS (assuming 95% read rate)
- Storage: Command DB stores events/transactions, estimated 1TB/month
- Read DB stores denormalized views, estimated 2TB for fast queries
- Network bandwidth: Reads dominate; at ~95,000 reads/s with ~1 KB responses, roughly 0.8-1 Gbps is needed for query traffic
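The estimates above can be checked with a short back-of-envelope script. The one extra assumption is spelled out as a constant: each concurrent user issues about 1 request per second, and an average read response is ~1 KB.

```python
# Back-of-envelope check of the capacity numbers above.
TOTAL_USERS = 1_000_000
ACTIVE_RATIO = 0.10           # 10% concurrently active
WRITE_RATIO = 0.05            # 5% of requests are commands
REQS_PER_USER_PER_SEC = 1     # assumption: ~1 request/user/second
READ_RESPONSE_BYTES = 1_000   # assumption: ~1 KB per read response

concurrent = int(TOTAL_USERS * ACTIVE_RATIO)      # 100,000 users
total_qps = concurrent * REQS_PER_USER_PER_SEC    # 100,000 QPS
write_qps = int(total_qps * WRITE_RATIO)          # 5,000 command QPS
read_qps = total_qps - write_qps                  # 95,000 query QPS

read_gbps = read_qps * READ_RESPONSE_BYTES * 8 / 1e9  # bits per second
print(concurrent, write_qps, read_qps, round(read_gbps, 2))
# -> 100000 5000 95000 0.76
```

The ~0.76 Gbps figure is why the text rounds to "~1 Gbps": leaving headroom for protocol overhead and traffic spikes.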
Start by explaining the separation of commands and queries and why it helps scalability.
Discuss bottlenecks on the write side first, then read side.
Outline scaling steps: vertical scaling, read replicas, sharding, caching, and asynchronous event processing.
Use real numbers to show understanding of limits and solutions.
Your database handles 1000 QPS writes. Traffic grows 10x to 10,000 QPS writes. What do you do first?
Answer: Shard the command database to distribute the write load across multiple servers, because vertical scaling alone is unlikely to absorb a 10x increase efficiently.
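A hypothetical shard router for that answer: a stable hash of the user ID picks one of N write shards, so a given user's writes always land on the same shard and load spreads roughly evenly. The function and shard count are illustrative.

```python
import hashlib

NUM_SHARDS = 8  # assumed shard count for illustration

def shard_for(user_id: str) -> int:
    # Stable hash of the shard key -> shard index. Using a real hash
    # (not Python's randomized hash()) keeps routing consistent
    # across processes and restarts.
    digest = hashlib.md5(user_id.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# Same key always routes to the same shard.
assert shard_for("user-42") == shard_for("user-42")

# Rough balance check across many keys.
counts = [0] * NUM_SHARDS
for i in range(10_000):
    counts[shard_for(f"user-{i}")] += 1
```

Note that plain modulo routing makes adding shards later painful, since most keys remap; production systems often use consistent hashing or a slot-to-shard mapping so resharding moves only a fraction of the keys.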
