HLDsystem_design~10 mins

Database replication (master-slave) in HLD - Scalability & System Analysis

Choose your learning style9 modes available

Learn Why Deep Arch Practice Challenge Design Recall Scale

Scalability Analysis - Database replication (master-slave)

Growth Table: Database Replication (Master-Slave)

Users / Traffic	Writes per second	Reads per second	Replication Lag	Number of Slaves	Notes
100 users	10	50	Negligible	1	Single master, one slave for read scaling
10,000 users	1,000	5,000	Small, under 1 second	3-5	More slaves added to handle read traffic
1,000,000 users	50,000	200,000	Noticeable, seconds delay	10-20	Replication lag increases, master CPU high
100,000,000 users	1,000,000+	5,000,000+	High, seconds to minutes	50+	Master bottleneck, replication lag critical

First Bottleneck

The master database server is the first bottleneck because it handles all write operations. As traffic grows, the master CPU and disk I/O become overwhelmed. Replication lag to slaves increases, causing stale reads. Network bandwidth between master and slaves can also limit replication speed.

Scaling Solutions

Read Scaling: Add more slave replicas to distribute read queries.
Write Scaling: Use sharding to split data across multiple masters.
Connection Pooling: Reduce overhead by reusing database connections.
Caching: Use in-memory caches (e.g., Redis) to reduce read load on slaves.
Asynchronous Replication: Accept some lag to improve master throughput.
Monitoring: Track replication lag and master load to trigger scaling actions.

Back-of-Envelope Cost Analysis

Assuming 1 master and 5 slaves at 10,000 users:

Writes: ~1,000 QPS on master (CPU and disk intensive)
Reads: ~5,000 QPS distributed across slaves (~1,000 QPS each)
Replication bandwidth: ~10 MB/s between master and slaves (depends on data size)
Storage: Master and slaves store full dataset; storage grows with data size
Network: Master needs high bandwidth to push changes to all slaves

Interview Tip

Start by explaining the master-slave model and its purpose (write master, read slaves). Discuss bottlenecks focusing on the master for writes and replication lag. Then propose scaling strategies like adding slaves, sharding, and caching. Always mention trade-offs such as eventual consistency and lag. Use numbers to justify your points.

Self Check Question

Your database master handles 1,000 QPS writes. Traffic grows 10x to 10,000 QPS writes. What do you do first and why?

Answer: The master is overwhelmed. First, consider sharding the database to split write load across multiple masters. This reduces load per master and improves write scalability. Adding more slaves won't help write bottleneck.

Key Result

Master database becomes the first bottleneck as write traffic grows; scaling requires sharding for writes and adding slaves for reads.