Database replication (master-slave) in HLD - Scalability & System Analysis

Scalability Analysis - Database replication (master-slave)
Growth Table: Database Replication (Master-Slave)
Users / traffic   | Writes/sec | Reads/sec  | Replication lag           | Slaves | Notes
100 users         | 10         | 50         | Negligible                | 1      | Single master, one slave for read scaling
10,000 users      | 1,000      | 5,000      | Small (under 1 second)    | 3-5    | More slaves added to handle read traffic
1,000,000 users   | 50,000     | 200,000    | Noticeable (seconds)      | 10-20  | Replication lag increases; master CPU high
100,000,000 users | 1,000,000+ | 5,000,000+ | High (seconds to minutes) | 50+    | Master is the bottleneck; replication lag critical
First Bottleneck

The master database server is the first bottleneck because every write goes through it. As traffic grows, the master's CPU and disk I/O saturate. Each slave must replay every write the master commits, so as write volume rises the slaves fall further behind (replication lag) and reads served from them return stale data. Network bandwidth between the master and its slaves can also cap how fast changes propagate.
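The lag described above can be illustrated with a toy model of one slave replaying the master's change stream. This is a minimal sketch, assuming a constant master write rate and a constant slave apply rate (both hypothetical inputs, not measurements). Whenever the slave replays changes more slowly than the master commits them, the backlog grows every second.

```python
def replication_backlog(write_rate: float, apply_rate: float, duration_s: int):
    """Toy model of one slave replaying the master's change stream.

    write_rate: writes/sec committed on the master (assumed constant)
    apply_rate: writes/sec the slave can replay (assumed constant)
    Returns (unapplied writes, seconds of replay work the slave is behind).
    """
    backlog = 0.0
    for _ in range(duration_s):
        backlog += write_rate                # changes shipped from the master this second
        backlog -= min(backlog, apply_rate)  # changes the slave manages to replay
    return backlog, backlog / apply_rate


print(replication_backlog(1_000, 1_200, 60))  # slave keeps up: (0.0, 0.0)
print(replication_backlog(1_200, 1_000, 60))  # slave falls behind: (12000.0, 12.0)
```

In this model, a slave that applies writes only about 17% slower than the master commits them is already 12 seconds behind after one minute, and the gap keeps widening for as long as the imbalance lasts.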

Scaling Solutions
  • Read Scaling: Add more slave replicas to distribute read queries.
  • Write Scaling: Use sharding to split data across multiple masters.
  • Connection Pooling: Reduce overhead by reusing database connections.
  • Caching: Use in-memory caches (e.g., Redis) to reduce read load on slaves.
  • Asynchronous Replication: Let the master commit writes without waiting for slaves to confirm them; this improves master throughput at the cost of some replication lag.
  • Monitoring: Track replication lag and master load to trigger scaling actions.
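To make the read-scaling, write-scaling, and monitoring ideas concrete, here is a minimal routing sketch. It assumes hash-based sharding by key, round-robin selection among each shard's replicas, and a lag map kept up to date by some monitoring system. The class names, host names, and the 1-second lag threshold are all hypothetical and are not taken from any particular database or driver.

```python
import hashlib
import itertools
from dataclasses import dataclass


@dataclass
class Shard:
    """One master and its read replicas (hypothetical host names)."""
    master: str
    replicas: list[str]


class ReplicationRouter:
    """Sketch: writes go to the shard's master, reads to a fresh-enough replica."""

    def __init__(self, shards: list[Shard], max_lag_s: float = 1.0):
        self.shards = shards
        self.max_lag_s = max_lag_s        # assumed staleness budget for reads
        self.lag: dict[str, float] = {}   # replica -> last observed lag (s), fed by monitoring
        self._rr = [itertools.cycle(s.replicas) for s in shards]  # round-robin per shard

    def shard_index(self, key: str) -> int:
        # Stable hash so the same key maps to the same shard in every process.
        digest = hashlib.sha256(key.encode()).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def route_write(self, key: str) -> str:
        # All writes for a key go to the single master that owns its shard.
        return self.shards[self.shard_index(key)].master

    def route_read(self, key: str) -> str:
        i = self.shard_index(key)
        shard = self.shards[i]
        # Try each replica at most once, skipping any that lag beyond the budget.
        for _ in range(len(shard.replicas)):
            replica = next(self._rr[i])
            if self.lag.get(replica, 0.0) <= self.max_lag_s:
                return replica
        return shard.master  # no fresh replica: fall back to the master


shards = [
    Shard("master-0", ["replica-0a", "replica-0b"]),
    Shard("master-1", ["replica-1a", "replica-1b"]),
]
router = ReplicationRouter(shards)
print(router.route_write("user:42"), router.route_read("user:42"))
```

Modulo hashing keeps the sketch short, but changing the number of shards remaps most keys; consistent hashing is a common alternative when shards are added over time. Falling back to the master avoids serving badly stale reads, at the cost of extra load on the component that is already the bottleneck.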
Back-of-Envelope Cost Analysis

Assuming 1 master and 5 slaves at 10,000 users:

  • Writes: ~1,000 QPS on master (CPU and disk intensive)
  • Reads: ~5,000 QPS distributed across slaves (~1,000 QPS each)
  • Replication bandwidth: ~10 MB/s between master and slaves (depends on the size of each replicated change)
  • Storage: The master and every slave each hold a full copy of the dataset, so total storage is roughly (1 + number of slaves) × dataset size
  • Network: Master needs high bandwidth to push changes to all slaves
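The arithmetic behind these figures can be written out as a small calculation. This is a sketch under stated assumptions: reads are spread evenly across slaves, the master streams every change directly to each slave (no cascading replicas), and each replicated change averages about 2 KB. The 2 KB figure is an assumption not given above; it happens to reproduce the ~10 MB/s estimate, and real change-record sizes depend on row size and the replication log format.

```python
def replication_estimate(writes_qps: float, reads_qps: float,
                         num_slaves: int, bytes_per_change: float):
    """Back-of-envelope numbers for one master fanning changes out to num_slaves replicas."""
    reads_per_slave = reads_qps / num_slaves                    # assumes reads are spread evenly
    per_slave_mb_s = writes_qps * bytes_per_change / 1_000_000  # each slave receives every change
    total_mb_s = per_slave_mb_s * num_slaves                    # master egress grows with slave count
    return reads_per_slave, per_slave_mb_s, total_mb_s


# 1 master, 5 slaves at 10,000 users; ~2 KB per change is an assumption
print(replication_estimate(1_000, 5_000, 5, 2_000))  # (1000.0, 2.0, 10.0)
```

The last term shows why the master's network link matters: under these assumptions, every slave added costs the master another full copy of the change stream.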
Interview Tip

Start by explaining the master-slave model and its purpose (writes go to the master, reads go to the slaves). Discuss the bottlenecks, focusing on the master (which takes every write) and on replication lag. Then propose scaling strategies such as adding slaves, sharding, and caching. Always mention trade-offs such as eventual consistency and lag. Use numbers to justify your points.

Self Check Question

Your database master handles 1,000 QPS writes. Traffic grows 10x to 10,000 QPS writes. What do you do first and why?

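Answer: At 10x the write load, the master is likely overwhelmed. First, consider sharding the database to split writes across multiple masters; this lowers the load on each master and lets write capacity grow with the number of shards. Adding more slaves won't relieve a write bottleneck, because every slave still has to apply every write.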
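A rough shard count for this scenario can be estimated with a one-line capacity formula. This sketch assumes, purely for illustration, that one master tops out around its current 1,000 writes/s, that writes spread evenly across shards (a well-chosen shard key), and that each master should run at no more than 70% of capacity; all three are hypothetical planning inputs, not measured limits.

```python
import math


def shards_needed(target_writes_qps: float, per_master_writes_qps: float,
                  headroom: float = 0.7) -> int:
    """Number of write masters (shards) needed, keeping each below `headroom` of its capacity."""
    usable_per_master = per_master_writes_qps * headroom  # assumed safe load per master
    return math.ceil(target_writes_qps / usable_per_master)


print(shards_needed(10_000, 1_000))                # with 30% headroom per master: 15
print(shards_needed(10_000, 1_000, headroom=1.0))  # no headroom: 10
```

Leaving headroom costs extra shards up front but leaves room for traffic spikes and uneven key distribution.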

Key Result
The master database becomes the first bottleneck as write traffic grows; scaling requires sharding for writes and adding slaves for reads.