Database replication (master-slave) in HLD - Deep Dive

Overview - Database replication (master-slave)
What is it?
Database replication (master-slave) is a way to copy data from one main database called the master to one or more copies called slaves. The master handles all the writes and updates, while slaves keep copies of the data to help with reading. This setup helps spread the work and keeps data safe if one database fails. It is like having a main notebook and several copies to share with friends.
Why it matters
Without replication, a single database can become slow under heavy traffic, and if it crashes the whole application stops. Replication shares the read load across several servers and keeps extra copies of the data, which reduces the risk of data loss. Websites and apps stay fast and available even with many users or a server failure.
Where it fits
Before learning this, you should understand basic databases and how data is stored and retrieved. After this, you can learn about more advanced replication types like multi-master or distributed databases, and how to handle conflicts and scaling in big systems.
Mental Model
Core Idea
Master-slave replication copies data from one main database to others to share reading work and improve reliability.
Think of it like...
It is like a teacher (master) writing notes on a blackboard, and students (slaves) copying those notes into their notebooks to study and share with others.
┌───────────┐       Replication       ┌───────────┐
│  Master   │ ──────────────────────▶ │  Slave 1  │
│ (Writes)  │                         │ (Reads)   │
└───────────┘                         └───────────┘
      │
      │                             ┌───────────┐
      └────────────────────────────▶ │  Slave 2  │
                                    │ (Reads)   │
                                    └───────────┘
Build-Up - 7 Steps
1
Foundation: What is Master-Slave Replication
Concept: Introduce the basic idea of master-slave replication and its roles.
Master-slave replication means one database (master) handles all changes like adding or updating data. Other databases (slaves) copy this data and only answer read requests. This helps balance the work and keeps copies safe.
Result
You understand the roles of master and slaves and why replication exists.
Knowing the separate roles of master and slaves helps you see how replication improves performance and safety.
2
Foundation: How Data Flows from Master to Slaves
Concept: Explain the process of copying data changes from master to slaves.
When the master changes data, it records these changes in a log. Slaves read this log and apply the same changes to their copies. This keeps all copies up to date with the master.
Result
You see how data is copied step-by-step to keep slaves synchronized.
Understanding the log-based copying explains how slaves stay consistent without direct writes.
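The log-based flow described in this step can be sketched in a few lines of Python. This is a minimal in-memory illustration, not how any particular database implements it: the `Master` and `Slave` classes, the `pull` method, and the `position` counter are all hypothetical names chosen for the example.

```python
class Master:
    """Hypothetical in-memory master: applies writes and appends them to a change log."""

    def __init__(self):
        self.data = {}
        self.log = []  # ordered list of (key, value) changes

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))


class Slave:
    """Hypothetical slave: replays the master's log from its last applied position."""

    def __init__(self):
        self.data = {}
        self.position = 0  # index of the next log entry to apply

    def pull(self, master):
        # Apply only the entries this slave has not seen yet, in log order.
        for key, value in master.log[self.position:]:
            self.data[key] = value
        self.position = len(master.log)


master = Master()
slave = Slave()
master.write("user:1", "Ada")
master.write("user:1", "Ada Lovelace")
slave.pull(master)  # the slave now holds the same data as the master
```

Because the slave remembers its position, calling `pull` again after new writes applies only the new entries.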
3
Intermediate: Handling Read and Write Requests
Before reading on: do you think slaves can handle write requests or only reads? Commit to your answer.
Concept: Clarify which database handles which type of request and why.
The master handles all write requests because it is the single source of truth: if slaves accepted writes on their own, the copies would diverge and conflict. Slaves handle read requests to reduce the load on the master. This separation improves speed and avoids conflicts.
Result
You know how traffic is split between master and slaves for efficiency.
Knowing this split prevents confusion about why slaves don’t accept writes and helps design better systems.
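One way to picture this split is a small router that inspects each statement and picks a target. The sketch below is only an illustration: `QueryRouter`, the server names, and the rule "only plain SELECT statements go to a slave" are assumptions made for the example, and real routers handle transactions and other cases this one ignores.

```python
class QueryRouter:
    """Hypothetical router: sends SELECTs to a slave and everything else to the master."""

    def __init__(self, master, slave):
        self.master = master
        self.slave = slave

    def route(self, sql):
        words = sql.split()
        verb = words[0].upper() if words else ""
        # Assumption: only plain SELECT statements are safe to serve from a slave.
        return self.slave if verb == "SELECT" else self.master


router = QueryRouter(master="master-db", slave="slave-db")
read_target = router.route("SELECT name FROM users WHERE id = 1")
write_target = router.route("UPDATE users SET name = 'Ada' WHERE id = 1")
```

Here `read_target` is the slave and `write_target` is the master; anything the router does not recognize as a read falls back to the master, which is the safe default.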
4
Intermediate: Dealing with Replication Delay
Before reading on: do you think slaves always have the exact same data as the master instantly? Commit to your answer.
Concept: Introduce the concept of delay between master updates and slave copies.
Because slaves copy changes after the master writes them, there is a delay called replication lag. It is usually small, but it can grow under heavy write load or a slow network. During this time, slaves might return older data, so reads that must see the latest value need to account for it.
Result
You understand why data on slaves might be a little behind the master.
Recognizing replication lag helps avoid mistakes when reading data that must be up-to-date.
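The effect of lag is easy to reproduce with a toy model. In the sketch below, which uses hypothetical names throughout (`pending`, `replicate`, `must_be_fresh`), a write reaches the master immediately but reaches the slave only when `replicate()` runs; a read that must be current is sent to the master instead.

```python
master_data = {"balance": 100}
slave_data = dict(master_data)  # the slave starts in sync with the master
pending = []                    # changes written on the master, not yet applied on the slave


def write(key, value):
    master_data[key] = value
    pending.append((key, value))


def replicate():
    # Apply queued changes to the slave in the order they were written.
    while pending:
        key, value = pending.pop(0)
        slave_data[key] = value


def read(key, must_be_fresh=False):
    # Reads that cannot tolerate lag go to the master; others go to the slave.
    source = master_data if must_be_fresh else slave_data
    return source[key]


write("balance", 150)
stale = read("balance")                      # served by the slave before replication
fresh = read("balance", must_be_fresh=True)  # served by the master
replicate()
caught_up = read("balance")                  # served by the slave after replication
```

Before `replicate()` runs, the slave still reports the old balance of 100 while the master reports 150; afterwards both agree.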
5
Intermediate: Scaling Reads with Multiple Slaves
Concept: Explain how adding more slaves helps handle more read requests.
By adding more slaves, many users can read data at the same time without slowing down the master. Each slave copies the master’s data and serves read requests independently.
Result
You see how replication helps scale systems to support many users.
Knowing this scaling method shows how replication supports growth without changing the master.
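A simple way to spread reads over several slaves is round-robin selection. The sketch below assumes three hypothetical slaves and uses only the Python standard library; production load balancers usually also weigh health and lag, which this example leaves out.

```python
from collections import Counter
from itertools import cycle

slaves = ["slave-1", "slave-2", "slave-3"]  # hypothetical slave names
next_slave = cycle(slaves)                  # endlessly repeats the list in order

# Route nine read requests and count how many each slave served.
served = Counter(next(next_slave) for _ in range(9))
```

Each slave ends up serving three of the nine reads, and the master serves none of them.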
6
Advanced: Failover and Data Safety in Replication
Before reading on: do you think slaves can replace the master automatically if it fails? Commit to your answer.
Concept: Discuss how systems handle master failures and keep data safe.
If the master fails, one slave can be promoted to master to keep the system running. This requires careful coordination to avoid data loss or conflicts. Backup and monitoring are also important.
Result
You understand how replication supports high availability and disaster recovery.
Knowing failover processes helps design systems that stay online and protect data.
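Part of the coordination during failover is choosing which slave to promote. A common heuristic is to pick the slave that has applied the most of the master's log, since it loses the least data. The sketch below illustrates that choice only; `choose_new_master`, the slave names, and the log positions are made up for the example.

```python
def choose_new_master(slave_positions):
    """Return the name of the slave that has applied the most log entries."""
    if not slave_positions:
        raise ValueError("no slaves available to promote")
    return max(slave_positions, key=slave_positions.get)


# Hypothetical state at the moment the master fails.
master_last_position = 1055
positions = {"slave-1": 1040, "slave-2": 1052, "slave-3": 998}

new_master = choose_new_master(positions)
# Entries the failed master had written that no surviving copy has applied.
lost_entries = master_last_position - positions[new_master]
```

Even the most advanced slave here is three entries behind the failed master, which is why promotion under asynchronous replication can lose recent writes.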
7
Expert: Challenges and Tradeoffs in Master-Slave Replication
Before reading on: do you think master-slave replication solves all scaling and consistency problems perfectly? Commit to your answer.
Concept: Explore limitations like consistency issues, write bottlenecks, and complexity.
Master-slave replication can cause stale reads due to lag, and the master can become a bottleneck for writes. Complex systems may need multi-master or distributed replication to solve these problems. Understanding these tradeoffs is key for real-world design.
Result
You grasp the limits of master-slave replication and when to consider other methods.
Understanding these tradeoffs prevents overusing master-slave replication where it does not fit.
Under the Hood
The master writes every change to a change log (called the binary log in MySQL; PostgreSQL uses its write-ahead log). Each slave connects to the master over the network, reads this log continuously, and keeps track of how far it has read. It then replays the changes in the same order the master committed them. Because every slave applies the same changes in the same order, the copies converge without conflicts.
Why designed this way?
This design separates write and read workloads to improve performance and reliability. Because slaves apply the log asynchronously, the master does not wait for them before confirming a write, which keeps write latency low. Synchronous replication, where every write waits for the slaves, is slower and scales less well, so asynchronous log-based replication became the common default.
┌───────────┐       ┌───────────────┐       ┌───────────┐
│  Master   │──────▶│ Binary Log    │──────▶│  Slave 1  │
│ (Writes)  │       │ (Change Log)  │       │ (Reads)   │
└───────────┘       └───────────────┘       └───────────┘
                         │
                         │                  ┌───────────┐
                         └─────────────────▶│  Slave 2  │
                                            │ (Reads)   │
                                            └───────────┘
Myth Busters - 4 Common Misconceptions
Quick: Do slaves handle write requests in master-slave replication? Commit to yes or no.
Common Belief: Slaves can accept write requests just like the master.
Reality: Only the master handles writes; slaves only serve reads to avoid conflicts.
Why it matters: A write sent to a slave either fails (if the slave is configured as read-only) or creates data that diverges from the master.
Quick: Do slaves always have the exact same data as the master instantly? Commit to yes or no.
Common Belief: Slaves always have perfectly up-to-date data matching the master.
Reality: Slaves have a delay called replication lag and may show older data temporarily.
Why it matters: Assuming slaves are always current can cause wrong decisions based on stale data.
Quick: Does master-slave replication solve all scaling problems perfectly? Commit to yes or no.
Common Belief: Master-slave replication can handle unlimited scaling and consistency perfectly.
Reality: It improves read scaling, but the master can become a write bottleneck and consistency issues remain.
Why it matters: Overreliance on master-slave replication can limit system growth and cause unexpected bugs.
Quick: Can slaves automatically become master without any setup? Commit to yes or no.
Common Belief: Slaves automatically take over as master if the master fails without extra work.
Reality: Failover must be configured, either as a manual promotion procedure or with automated tooling; slaves do not take over on their own by default.
Why it matters: Assuming automatic failover leads to downtime and data loss during failures.
Expert Zone
1
Replication lag varies with network speed and workload, so monitoring is essential to avoid stale reads.
2
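Some systems use semi-synchronous replication, where the master waits for at least one slave to acknowledge a change before confirming the write. This reduces the risk of losing data on failover at the cost of slightly slower writes.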
3
Promoting a slave to master requires careful handling of pending transactions to avoid data loss.
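The semi-synchronous idea, confirming a write only after some slave has acknowledged it, can be sketched as follows. This is a simplified, sequential illustration: `semi_sync_commit`, `required_acks`, and the two slave functions are hypothetical, and real implementations send to slaves in parallel and typically use a timeout.

```python
def semi_sync_commit(change, slaves, required_acks=1):
    """Return True once `required_acks` slaves have acknowledged `change`."""
    acks = 0
    for send_to_slave in slaves:
        if send_to_slave(change):  # True means the slave stored the change
            acks += 1
        if acks >= required_acks:
            return True            # safe to confirm the write to the client
    return False                   # not enough acknowledgments


received = []  # changes stored by the reachable slave


def reachable_slave(change):
    received.append(change)
    return True


def unreachable_slave(change):
    return False


confirmed = semi_sync_commit("UPDATE #1", [unreachable_slave, reachable_slave])
unconfirmed = semi_sync_commit("UPDATE #2", [unreachable_slave])
```

The first write is confirmed because one slave stored it; the second is not, because no slave acknowledged it.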
When NOT to use
Master-slave replication is not suitable when writes must scale beyond a single server or when reads served from slaves must be strongly consistent. Alternatives such as multi-master replication or systems built on distributed consensus (e.g., Raft, Paxos) should be used instead.
Production Patterns
In production, master-slave replication is often combined with load balancers directing reads to slaves and writes to master. Monitoring tools track replication lag and automate failover with orchestration systems like Kubernetes or custom scripts.
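A lag monitor often feeds the load balancer's list of eligible slaves. The sketch below shows one possible rule under stated assumptions: lag is reported in seconds, `None` means replication is not running on that slave, and the 5-second threshold is arbitrary. The function name and the sample values are hypothetical.

```python
def healthy_slaves(lag_seconds, max_lag=5.0):
    """Return the slaves whose reported lag is known and within `max_lag` seconds."""
    return sorted(
        name
        for name, lag in lag_seconds.items()
        if lag is not None and lag <= max_lag
    )


# Hypothetical monitoring snapshot: None means replication has stopped.
snapshot = {"slave-1": 0.4, "slave-2": 12.0, "slave-3": None}
read_pool = healthy_slaves(snapshot)
```

Only `slave-1` stays in the read pool; the lagging and stopped slaves are excluded until they recover.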
Connections
Event Sourcing
Builds-on
Both use logs of changes to reconstruct state, helping understand how replication logs keep data consistent.
Content Delivery Networks (CDNs)
Similar pattern
CDNs replicate content from origin servers to edge caches, like slaves copying data from master, to improve read speed and availability.
Human Memory Systems
Analogy in biology
Just as the brain stores a master memory and creates copies for quick access, master-slave replication balances accuracy and speed.
Common Pitfalls
#1 Trying to write data directly to a slave database.
Wrong approach: INSERT INTO example_table VALUES ('data'); -- executed on a slave connection
Correct approach: INSERT INTO example_table VALUES ('data'); -- executed on the master connection
Root cause: Not realizing that slaves are meant to be read-only. The table is the same on both servers; only the server that receives the statement differs.
#2 Ignoring replication lag and reading stale data from slaves for critical decisions.
Wrong approach: SELECT balance FROM account WHERE user_id=123; -- executed on a slave immediately after a deposit
Correct approach: SELECT balance FROM account WHERE user_id=123; -- executed on the master for the latest data
Root cause: Not accounting for the delay between master updates and slave synchronization.
#3 Assuming slaves automatically become master on failure without setup.
Wrong approach: No failover configuration; relying on slaves to take over automatically.
Correct approach: Configure automated failover tools or manual promotion procedures.
Root cause: Lack of understanding of failover mechanisms and their configuration.
Key Takeaways
Master-slave replication separates write and read workloads to improve database performance and reliability.
The master database handles all writes, while slaves copy data and serve read requests, reducing load on the master.
Replication lag means slaves may show slightly outdated data, so critical reads should consider this delay.
Failover requires careful setup to promote a slave to master and avoid data loss during failures.
Master-slave replication has limits in write scaling and consistency, so other replication methods may be needed for complex systems.