| Users / Data Size | 100 Users | 10K Users | 1M Users | 100M Users |
|---|---|---|---|---|
| Data Volume | Small (MBs to GBs) | Medium (GBs to TBs) | Large (TBs to PBs) | Very Large (PBs+) |
| Replication Type | Simple master-slave replication | Asynchronous replication with read replicas | Multi-region replication with conflict resolution | Geo-distributed multi-master replication with sharding |
| Latency Tolerance | Low latency acceptable | Some replication lag tolerated | Eventual consistency common | Strong consistency challenging, eventual consistency preferred |
| Replication Lag | Minimal, near real-time | Seconds to minutes | Minutes to hours | Hours, depending on region and network |
| Failure Handling | Manual failover possible | Automated failover with monitoring | Conflict detection and resolution needed | Complex conflict resolution and partition tolerance |
Data replication strategies in HLD - Scalability & System Analysis
At small scale a single primary database server suffices, but it becomes the first bottleneck as write load and replication overhead grow.
As the user base grows, replication lag increases, causing stale reads and inconsistent views of the data.
Network bandwidth limits cross-region replication speed at large scale.
Conflict resolution complexity grows with multi-master setups, impacting performance and consistency.
- Horizontal scaling: Add read replicas to distribute read traffic and reduce load on the master.
- Asynchronous replication: Use async replication to reduce write latency but accept eventual consistency.
- Multi-region replication: Deploy replicas in multiple regions to reduce latency for global users.
- Sharding: Partition data by key ranges or user segments to distribute load and storage.
- Conflict resolution: Implement application-level or database-level conflict handling for multi-master replication.
- Compression and batching: Compress data and batch replication updates to optimize network usage.
- Monitoring and automated failover: Use monitoring tools to detect failures and switch replicas automatically.
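The read-replica strategy above can be sketched as a small router that sends writes to the master and round-robins reads across the replicas. This is a minimal illustration: the endpoint strings stand in for real driver connections.

```python
import itertools

class ReplicationRouter:
    """Route writes to the master and spread reads across replicas."""

    def __init__(self, master, replicas):
        self.master = master
        # cycle() gives simple round-robin load distribution over replicas.
        self._replicas = itertools.cycle(replicas)

    def route(self, is_write):
        # All writes must hit the master so replication has a single source.
        if is_write:
            return self.master
        # Reads rotate across replicas to spread the load.
        return next(self._replicas)

router = ReplicationRouter("master:5432", ["replica1:5432", "replica2:5432"])
assert router.route(is_write=True) == "master:5432"
print(router.route(is_write=False))  # replica1:5432, then replica2:5432, ...
```

Note that reads routed this way may observe replication lag; latency-sensitive read-after-write paths are often pinned to the master instead.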
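The sharding strategy can be sketched with simple hash partitioning. This assumes a fixed shard count; production systems often use consistent hashing or a range map so shards can be rebalanced without remapping every key.

```python
import hashlib

def shard_for(user_id: str, num_shards: int) -> int:
    """Map a user ID to a shard deterministically.

    md5 is used only for its uniform spread, not for security.
    Python's built-in hash() is avoided because it is randomized
    per process and would route the same user differently on restart.
    """
    digest = hashlib.md5(user_id.encode()).hexdigest()
    return int(digest, 16) % num_shards

# The same user always lands on the same shard.
assert shard_for("user-42", 8) == shard_for("user-42", 8)
```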
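For the conflict-resolution strategy, one common (and lossy) policy is last-write-wins. This sketch assumes loosely synchronized clocks and uses a node ID only as a deterministic tie-breaker:

```python
from dataclasses import dataclass

@dataclass
class Versioned:
    value: str
    timestamp: float  # wall-clock write time from the writing node
    node_id: str      # deterministic tie-breaker for equal timestamps

def resolve_lww(a: Versioned, b: Versioned) -> Versioned:
    """Last-write-wins: keep the newer write, breaking ties by node id.

    The losing write is silently discarded, which is only acceptable
    when that loss is tolerable; otherwise application-level merges
    or CRDTs are needed.
    """
    return max(a, b, key=lambda v: (v.timestamp, v.node_id))

ours = Versioned("blue", timestamp=100.0, node_id="us-east")
theirs = Versioned("green", timestamp=100.5, node_id="eu-west")
assert resolve_lww(ours, theirs).value == "green"  # newer write wins
```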
- Requests per second (RPS):
  - 100 users: ~10-100 RPS
  - 10K users: ~1K-10K RPS
  - 1M users: ~100K-1M RPS
  - 100M users: ~10M+ RPS
- Storage needed:
  - 100 users: MBs to GBs
  - 10K users: GBs to TBs
  - 1M users: TBs to PBs
  - 100M users: PBs and beyond
- Network bandwidth:
  - Replication traffic grows with data volume and update frequency.
  - Cross-region replication requires high-bandwidth, low-latency links.
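The estimates above follow from simple per-user assumptions. The arithmetic can be sketched as follows; the per-user request rate, write ratio, and write size here are illustrative assumptions, not measurements:

```python
def estimate(users, req_per_user_per_sec=0.5, bytes_per_write=1_000,
             write_ratio=0.25):
    """Back-of-envelope load estimate; every parameter is an assumption."""
    rps = users * req_per_user_per_sec
    write_rps = rps * write_ratio
    # Every write is shipped to each replica, so replication bandwidth
    # scales with write volume, not with total traffic.
    replication_bytes_per_sec = write_rps * bytes_per_write
    return rps, replication_bytes_per_sec

rps, bw = estimate(1_000_000)
print(f"{rps:,.0f} RPS, ~{bw / 1e6:.0f} MB/s replication traffic per replica")
# → 500,000 RPS, ~125 MB/s replication traffic per replica
```

Multiplying the per-replica figure by the replica count (and by cross-region hops) shows why bandwidth becomes the limiting factor at the larger tiers.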
Start by clarifying the scale and consistency requirements.
Discuss trade-offs between synchronous and asynchronous replication.
Explain how replication affects latency, availability, and consistency.
Describe how you would detect and handle replication lag and conflicts.
Outline scaling steps as user base grows, focusing on bottlenecks and solutions.
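One common way to detect replication lag is a heartbeat row: the primary periodically writes a timestamp, and each replica compares its replicated copy against the current time. A minimal sketch, with dictionaries standing in for the primary and replica tables:

```python
class HeartbeatLagMonitor:
    """Measure replication lag with a heartbeat timestamp.

    The primary periodically writes the current time into a heartbeat
    row; the replica reads the replicated copy and subtracts it from
    'now'. The dicts below stand in for real database tables.
    """

    def __init__(self):
        self.primary = {}
        self.replica = {}

    def primary_heartbeat(self, now):
        self.primary["heartbeat"] = now

    def replicate(self):
        # In a real system this copy is performed by the replication stream.
        self.replica.update(self.primary)

    def lag_seconds(self, now):
        return now - self.replica.get("heartbeat", 0.0)

mon = HeartbeatLagMonitor()
mon.primary_heartbeat(now=100.0)
mon.replicate()
assert mon.lag_seconds(now=103.5) == 3.5  # replica is 3.5 s behind
```

In an interview it is worth noting that alerting on a lag threshold, and failing reads over to the master when lag exceeds it, is the usual handling step.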
Your database handles 1000 QPS. Traffic grows 10x to 10,000 QPS. What do you do first?
Answer: First add read replicas and connection pooling to distribute read load and relieve the master bottleneck. Then consider asynchronous replication to keep write latency low, and monitor replication lag closely.
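The connection-pooling part of the answer can be sketched as a bounded pool that reuses connections instead of opening one per request; the `factory` callable below is a placeholder for a real driver's connect call.

```python
import queue

class ConnectionPool:
    """Bounded pool that hands out reusable connections.

    Reusing connections avoids per-request TCP/auth handshakes,
    which matters once traffic grows 10x.
    """

    def __init__(self, factory, size):
        self._pool = queue.Queue(maxsize=size)
        for _ in range(size):
            self._pool.put(factory())

    def acquire(self, timeout=None):
        # Blocks when all connections are in use, applying backpressure
        # instead of overwhelming the database with new connections.
        return self._pool.get(timeout=timeout)

    def release(self, conn):
        self._pool.put(conn)

pool = ConnectionPool(factory=lambda: object(), size=2)
c1 = pool.acquire()
c2 = pool.acquire()
pool.release(c1)
assert pool.acquire() is c1  # connections are reused, not re-created
```

The pool size also caps concurrent load on the master, which complements the read-replica step in the answer.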