| Users/Traffic | Write Model | Read Model | Data Sync | Infrastructure |
|---|---|---|---|---|
| 100 users | Single write service, single DB | Single read DB, simple queries | Simple event propagation, low latency | 1 app server, 1 DB server |
| 10K users | Write service scales horizontally, DB with connection pooling | Read replicas added, caching introduced | Event queue for async updates | Multiple app servers, read replicas |
| 1M users | Write DB sharded by user or domain, write service scaled | Read DBs sharded, heavy caching (Redis) and search indexes (Elasticsearch) | Robust event streaming (Kafka), eventual consistency | Clustered microservices, message brokers |
| 100M users | Multi-region write DB shards, global write services | Global read replicas, CDN for read-heavy data | Highly available event streaming, conflict resolution | Geo-distributed clusters, advanced monitoring |
CQRS pattern in Microservices - Scalability & System Analysis
At small scale, the write database is the first bottleneck because all commands must be processed and stored reliably. As the user base grows, the write DB hits limits on connections and transaction throughput.
The read side scales more easily with replicas and caching, so write DB capacity and consistency become the main challenge.
- Horizontal scaling: Add more write service instances behind a load balancer.
- Database sharding: Split write DB by user or domain to reduce contention.
- Read replicas: Use multiple read-only DB replicas to handle query load.
- Caching: Use in-memory caches (Redis) or search engines (Elasticsearch) for fast reads.
- Event streaming: Use message brokers (Kafka) for reliable async data sync between write and read models.
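- Conflict resolution: Accept eventual consistency between the write and read models, and define how concurrent or out-of-order updates are reconciled (for example, version checks or last-write-wins).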
- Geo-distribution: Deploy services and DB shards in multiple regions for latency and availability.
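The event-streaming and eventual-consistency items above can be illustrated with a minimal in-process sketch. It is not a production design: a `queue.Queue` stands in for a broker such as Kafka, plain dictionaries stand in for the write and read databases, and the names `OrderPlaced`, `WriteModel`, `ReadModel`, and `drain` are hypothetical, chosen only for this example.

```python
import queue
from dataclasses import dataclass


@dataclass(frozen=True)
class OrderPlaced:
    """Event emitted by the write side (hypothetical example event)."""
    order_id: str
    user_id: str
    total: float


class WriteModel:
    """Handles commands, stores authoritative state, publishes events."""

    def __init__(self, events: queue.Queue):
        self._orders = {}      # stand-in for the write DB
        self._events = events  # stand-in for a broker topic

    def place_order(self, order_id: str, user_id: str, total: float) -> None:
        if order_id in self._orders:
            raise ValueError(f"duplicate order {order_id}")
        self._orders[order_id] = (user_id, total)
        self._events.put(OrderPlaced(order_id, user_id, total))


class ReadModel:
    """Denormalized view optimized for one query: total spend per user."""

    def __init__(self):
        self.spend_by_user = {}

    def apply(self, event: OrderPlaced) -> None:
        current = self.spend_by_user.get(event.user_id, 0.0)
        self.spend_by_user[event.user_id] = current + event.total


def drain(events: queue.Queue, read_model: ReadModel) -> int:
    """Apply all pending events to the read model; return how many ran."""
    applied = 0
    while True:
        try:
            event = events.get_nowait()
        except queue.Empty:
            return applied
        read_model.apply(event)
        applied += 1


events = queue.Queue()
write_model = WriteModel(events)
read_model = ReadModel()

write_model.place_order("o1", "alice", 30.0)
write_model.place_order("o2", "alice", 12.5)

# Until the projector runs, the read side is stale (eventual consistency).
stale = dict(read_model.spend_by_user)
applied = drain(events, read_model)
print(stale, applied, read_model.spend_by_user)
```

The write side never queries the read model, and the read model is rebuilt purely from events, which is what lets the two sides scale and be stored independently. The gap between `place_order` returning and `drain` running is the window in which readers see stale data.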
Assuming 1M users with 10,000 requests per second (RPS) total:
- Write QPS: ~1000 (assuming 10% writes)
- Read QPS: ~9000 (read-heavy)
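- Write DB: Needs to handle ~1000 transactions/sec, which requires sharding or substantial vertical scaling
- Read DB replicas: Each can handle ~5000 QPS, so 2 replicas cover ~9000 QPS only at about 90% utilization; a third replica adds headroom for spikes and failover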
- Cache: Redis can handle 100K ops/sec, enough for read caching
- Network bandwidth: 1 Gbps (~125 MB/s) sufficient for event streaming and API traffic
- Storage: Write DB stores all commands; at an estimated 1 KB per write, storage grows by ~1 MB/s (roughly 86 GB per day)
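The arithmetic behind these estimates can be reproduced in a few lines. The inputs come from the figures above; the 70% utilization target is a hypothetical headroom assumption added for illustration, and 1 KB is treated as 1,000 bytes.

```python
import math

total_rps = 10_000
write_percent = 10          # 10% of requests are writes
replica_qps = 5_000         # per-replica capacity assumed in the text
bytes_per_write = 1_000     # 1 KB per write, decimal units
target_utilization = 0.7    # hypothetical headroom target, not from the text

write_qps = total_rps * write_percent // 100
read_qps = total_rps - write_qps

# Bare minimum replicas versus replicas needed to stay under the target load.
min_replicas = math.ceil(read_qps / replica_qps)
replicas_with_headroom = math.ceil(read_qps / (replica_qps * target_utilization))

storage_mb_per_s = write_qps * bytes_per_write / 1e6
storage_gb_per_day = storage_mb_per_s * 86_400 / 1e3

print(write_qps, read_qps)
print(min_replicas, replicas_with_headroom)
print(storage_mb_per_s, storage_gb_per_day)
```

Under these assumptions two replicas are the floor, not a comfortable target: each would run at about 90% of its capacity, and losing one would leave the remaining replica overloaded.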
Start by explaining the separation of write and read models in CQRS and why it helps scaling.
Discuss bottlenecks on the write side first, then how read replicas and caching improve read scalability.
Mention event streaming for syncing data and eventual consistency trade-offs.
Finally, talk about sharding and geo-distribution for very large scale.
Your database handles 1000 QPS. Traffic grows 10x. What do you do first?
Answer: Shard the database, partitioning data by user or domain so that each shard handles a fraction of the load, because a single DB cannot sustain 10,000 QPS reliably.
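One common way to distribute load across shards is to hash the user ID and take it modulo the shard count. The sketch below assumes string user IDs and a fixed number of shards; `shard_for` and the `user-N` ID format are hypothetical names used only for illustration.

```python
import hashlib
from collections import Counter


def shard_for(user_id: str, num_shards: int) -> int:
    """Map a user ID to a shard index in [0, num_shards)."""
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    # A stable hash is used because Python's built-in hash() for strings
    # is randomized per process, so it would route inconsistently.
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


NUM_SHARDS = 10
counts = Counter(shard_for(f"user-{i}", NUM_SHARDS) for i in range(10_000))

# Each shard should receive roughly one tenth of the users.
for shard in sorted(counts):
    print(shard, counts[shard])
```

Modulo hashing is simple, but changing the shard count remaps most keys, so adding shards later means moving a lot of data. Consistent hashing or a lookup table from key range to shard are alternatives when the shard count is expected to change.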