| Scale | Users | Data Centers | Network Latency | Consistency | Availability | Partition Tolerance |
|---|---|---|---|---|---|---|
| Small | 100 users | 1 | Low | Strong consistency easy | High availability easy | Minimal partitions |
| Medium | 10K users | 2-3 | Moderate | Trade-offs start | Maintained with retries | Partitions possible |
| Large | 1M users | Multiple geo-distributed | High | Often eventual consistency | High availability prioritized | Partitions common |
| Very Large | 100M users | Global multi-region | Very high | Mostly eventual consistency | Availability critical | Partition tolerance mandatory |
The CAP theorem in HLD - Scalability & System Analysis
As the system grows across data centers, network partitions become inevitable because the links between data centers are unreliable. The first bottleneck is therefore the trade-off between consistency and availability: while a partition lasts, a system cannot guarantee both strong consistency and full availability, so it must give up one of them.
- Choose your priority: Decide if your system needs strong consistency (CP) or high availability (AP) during partitions.
- Use replication: Replicate data across nodes to improve availability and fault tolerance.
- Implement eventual consistency: Accept temporary inconsistencies to keep the system available.
- Partition tolerance: Design the system to handle network failures gracefully.
- Use consensus algorithms: For CP systems, use protocols like Paxos or Raft to maintain consistency.
- Client-side conflict resolution: For AP systems, reconcile the divergent writes once the partition heals, for example with last-write-wins or an application-level merge.
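
The CP and AP choices above can be illustrated with a minimal sketch. It assumes a toy three-replica key-value store; the `Replica` class and the `write` and `heal` functions are hypothetical names invented for this example. The majority check is only a stand-in for a real consensus protocol such as Paxos or Raft, and last-write-wins is just one possible conflict-resolution rule (it silently discards the older concurrent write).

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One copy of a toy key-value store (hypothetical, for illustration)."""
    name: str
    data: dict = field(default_factory=dict)  # key -> (timestamp, value)


def write(side, cluster_size, key, value, timestamp, mode):
    """Apply a write to the replicas on one side of a partition.

    mode="CP": refuse unless this side holds a majority of the cluster.
    mode="AP": always accept, even if the sides diverge.
    Returns True if the write was accepted.
    """
    majority = cluster_size // 2 + 1
    if mode == "CP" and len(side) < majority:
        return False  # stay consistent by giving up availability
    for replica in side:
        replica.data[key] = (timestamp, value)
    return True


def heal(cluster):
    """After the partition heals, reconcile every key with last-write-wins."""
    keys = {key for replica in cluster for key in replica.data}
    for key in keys:
        # Tuples compare by timestamp first, so max() picks the newest write.
        newest = max(r.data[key] for r in cluster if key in r.data)
        for replica in cluster:
            replica.data[key] = newest


a, b, c = Replica("a"), Replica("b"), Replica("c")
cluster = [a, b, c]
minority_side, majority_side = [a], [b, c]  # simulated network partition

# CP: only the side with a majority may accept the write.
cp_minority = write(minority_side, 3, "balance", 100, timestamp=1, mode="CP")
cp_majority = write(majority_side, 3, "balance", 90, timestamp=2, mode="CP")

# AP: both sides accept, so the replicas temporarily disagree.
write(minority_side, 3, "likes", 10, timestamp=3, mode="AP")
write(majority_side, 3, "likes", 12, timestamp=4, mode="AP")

heal(cluster)
print(cp_minority, cp_majority, a.data)
```

In this sketch the CP write on the minority side is rejected, which is the loss of availability, while the AP writes both succeed and are only reconciled after `heal` runs, which is the temporary loss of consistency.
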
Assuming 1M users, each sending 1 request per second:
- Requests per second: ~1,000,000 QPS
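- Single database node handles: ~5,000 QPS → 1,000,000 / 5,000 = ~200 nodes, which in practice means sharding the data
- Network bandwidth: 1 Gbps = 125 MB/s, which at 1,000,000 QPS leaves only ~125 bytes per request, so links between data centers need far more bandwidth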
- Storage: Depends on data size; replication increases storage needs by number of replicas
- Latency: Higher with geo-distribution, affects consistency choices
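
The estimates above can be reproduced with a few lines of arithmetic. This is a rough sketch using the figures from the list; the variable names are made up for this example, and the 1 TB raw data size with 3 replicas is a purely hypothetical input, since the actual data size is not given.

```python
import math

# Figures taken from the estimate above.
users = 1_000_000
requests_per_user_per_sec = 1
node_capacity_qps = 5_000
link_gbps = 1

total_qps = users * requests_per_user_per_sec
nodes_needed = math.ceil(total_qps / node_capacity_qps)

# 1 Gbps is 10^9 bits per second; divide by 8 to get bytes per second.
link_bytes_per_sec = link_gbps * 1_000_000_000 // 8
bytes_per_request_budget = link_bytes_per_sec / total_qps


def storage_with_replication(raw_bytes, replicas):
    """Total storage when every byte is kept on `replicas` copies."""
    return raw_bytes * replicas


# Hypothetical input: 1 TB of raw data kept on 3 replicas.
total_storage = storage_with_replication(10**12, 3)

print(f"total QPS:           {total_qps:,}")
print(f"nodes needed:        {nodes_needed}")
print(f"link capacity:       {link_bytes_per_sec:,} bytes/s")
print(f"budget per request:  {bytes_per_request_budget:.0f} bytes")
print(f"storage, 3 replicas: {total_storage:,} bytes")
```
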
When discussing CAP theorem in an interview, start by explaining the three guarantees clearly. Then, describe real-world scenarios where you would prioritize consistency over availability or vice versa. Use examples like banking (CP) vs social media feeds (AP). Finally, mention how network partitions force trade-offs and how you would design your system accordingly.
Your database handles 1,000 QPS. Traffic grows 10x to 10,000 QPS. What do you do first?
Answer: First add read replicas and caching to take read load off the primary database. If the primary is still overloaded, shard the data to distribute the load. Also check whether the consistency requirements allow eventual consistency, since reads served from replicas or caches can lag behind the primary.
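
To see why replicas and caching come first, here is a rough sketch of the load that still reaches the primary. The `primary_load` function is a hypothetical helper, and the 9,000 read / 1,000 write split, the 80% cache hit ratio, and the two replicas are assumptions chosen for illustration; none of them come from the question. It also assumes that cache misses are spread evenly across the primary and its replicas and that all writes go to the primary.

```python
def primary_load(read_qps, write_qps, cache_hit_ratio, read_replicas):
    """Estimate the QPS that still hits the primary database.

    Assumes cache misses are spread evenly over the primary and its
    read replicas, and that every write goes to the primary.
    """
    cache_misses = read_qps * (1 - cache_hit_ratio)
    reads_on_primary = cache_misses / (read_replicas + 1)
    return write_qps + reads_on_primary


# Assumed traffic mix at 10,000 QPS: 9,000 reads and 1,000 writes.
before = primary_load(9_000, 1_000, cache_hit_ratio=0.0, read_replicas=0)
after = primary_load(9_000, 1_000, cache_hit_ratio=0.8, read_replicas=2)

print(f"primary load without cache or replicas: {before:.0f} QPS")
print(f"primary load with cache and 2 replicas: {after:.0f} QPS")
```

Under these assumptions the write traffic is a floor that caching and read replicas cannot lower, which is why sharding is the follow-up step when writes themselves outgrow one node.
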