Overview - Database replication (master-slave)

What is it?

Database replication (master-slave) is a way to copy data from one main database called the master to one or more copies called slaves. The master handles all the writes and updates, while slaves keep copies of the data to help with reading. This setup helps spread the work and keeps data safe if one database fails. It is like having a main notebook and several copies to share with friends.

Why it matters

Without replication, a single database can become slow under heavy traffic. If it crashes, the application goes down and recent data may be lost. Replication spreads the read load across several servers and keeps extra copies of the data, which reduces the risk of data loss. This helps websites and apps stay fast and available even with many users or when a server fails.

Where it fits

Before learning this, you should understand basic databases and how data is stored and retrieved. After this, you can learn about more advanced replication types like multi-master or distributed databases, and how to handle conflicts and scaling in big systems.

Mental Model

Core Idea

Master-slave replication copies data from one main database to others to share reading work and improve reliability.

Think of it like...

It is like a teacher (master) writing notes on a blackboard, and students (slaves) copying those notes into their notebooks to study and share with others.

┌───────────┐       Replication       ┌───────────┐
│  Master   │ ──────────────────────▶ │  Slave 1  │
│ (Writes)  │                         │ (Reads)   │
└───────────┘                         └───────────┘
      │
      │             Replication       ┌───────────┐
      └─────────────────────────────▶ │  Slave 2  │
                                      │ (Reads)   │
                                      └───────────┘

Build-Up - 7 Steps

1

Foundation: What is Master-Slave Replication

Concept: Introduce the basic idea of master-slave replication and its roles.

Master-slave replication means one database (master) handles all changes like adding or updating data. Other databases (slaves) copy this data and only answer read requests. This helps balance the work and keeps copies safe.

Result

You understand the roles of master and slaves and why replication exists.

Knowing the separate roles of master and slaves helps you see how replication improves performance and safety.
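The split of roles can be sketched with two tiny in-memory classes. This is a minimal illustration, not how a real database works: `Master`, `Slave` and `copy_from` are hypothetical names, and `copy_from` stands in for replication by taking a full snapshot of the master's data. The master accepts writes. The slave serves reads but rejects writes from clients.

```python
class Master:
    """Accepts writes and serves reads (hypothetical in-memory model)."""

    def __init__(self):
        self.data = {}

    def write(self, key, value):
        self.data[key] = value

    def read(self, key):
        return self.data.get(key)


class Slave:
    """Read-only copy: rejects client writes, receives data from the master."""

    def __init__(self):
        self.data = {}

    def write(self, key, value):
        raise PermissionError("slave is read-only; send writes to the master")

    def read(self, key):
        return self.data.get(key)

    def copy_from(self, master):
        # Stand-in for replication: take a full snapshot of the master's data.
        self.data = dict(master.data)


master = Master()
slaves = [Slave(), Slave()]

master.write("user:1", "Alice")
for s in slaves:
    s.copy_from(master)

print(slaves[0].read("user:1"))  # Alice
try:
    slaves[1].write("user:2", "Bob")
except PermissionError as e:
    print(e)
```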

2

Foundation: How Data Flows from Master to Slaves

3

Intermediate: Handling Read and Write Requests

4

Intermediate: Dealing with Replication Delay

5

Intermediate: Scaling Reads with Multiple Slaves

6

Advanced: Failover and Data Safety in Replication

7

Expert: Challenges and Tradeoffs in Master-Slave Replication

Under the Hood

The master database writes every change to a special change log (in MySQL, the binary log). Each slave connects to the master, reads this log continuously over the network, and replays the changes in exactly the same order, so its copy converges to the master's state. Because all writes go through a single master and one ordered log, slaves never have to resolve conflicting updates.
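The log-and-replay loop can be sketched in a few lines. This is a simplified model, assuming each log entry just records "set key to value" with an increasing position. `LogEntry`, `pull_and_apply` and `applied_position` are hypothetical names, and a real binary log carries much more information. The slave remembers how far it has replayed, fetches only newer entries, and applies them in log order.

```python
from dataclasses import dataclass, field


@dataclass
class LogEntry:
    position: int  # increasing position in the change log
    key: str
    value: str


@dataclass
class Master:
    data: dict = field(default_factory=dict)
    log: list = field(default_factory=list)

    def write(self, key, value):
        # Apply the change locally, then append it to the change log.
        self.data[key] = value
        self.log.append(LogEntry(len(self.log) + 1, key, value))


@dataclass
class Slave:
    data: dict = field(default_factory=dict)
    applied_position: int = 0  # last log position this slave has replayed

    def pull_and_apply(self, master):
        # Read every entry after our position and replay it in log order.
        for entry in master.log[self.applied_position:]:
            self.data[entry.key] = entry.value
            self.applied_position = entry.position


master = Master()
slave = Slave()
master.write("x", "1")
master.write("x", "2")  # a later change to the same key must win
master.write("y", "3")
slave.pull_and_apply(master)
print(slave.data, slave.applied_position)  # {'x': '2', 'y': '3'} 3
```

Replaying in order is what makes the result correct. If the two writes to `x` were applied out of order, the slave would end up with the stale value `'1'`.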

Why designed this way?

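This design separates write and read workloads to improve performance and reliability. Using a log lets slaves apply changes asynchronously: the master confirms a write without waiting for any slave, so replication adds little load or latency to writes. Synchronous replication, where every write waits for the slaves, avoids replication lag but makes writes slower and less scalable. That is why asynchronous log-based replication became the common default.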

┌───────────┐       ┌───────────────┐       ┌───────────┐
│  Master   │──────▶│ Binary Log    │──────▶│  Slave 1  │
│ (Writes)  │       │ (Change Log)  │       │ (Reads)   │
└───────────┘       └───────────────┘       └───────────┘
                            │
                            │               ┌───────────┐
                            └──────────────▶│  Slave 2  │
                                            │ (Reads)   │
                                            └───────────┘

Myth Busters - 4 Common Misconceptions

Quick: Do slaves handle write requests in master-slave replication? Commit to yes or no.

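Common Belief: Slaves can accept write requests just like the master.

Reality: Slaves are normally configured as read-only. All writes must go to the master. A write applied directly to a slave is not replicated anywhere else and makes that slave's data drift from the master's.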

Quick: Do slaves always have the exact same data as the master instantly? Commit to yes or no.

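Common Belief: Slaves always have perfectly up-to-date data matching the master.

Reality: Replication is usually asynchronous, so slaves trail the master by some delay (replication lag). During that window, a read from a slave can return stale data.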

Quick: Does master-slave replication solve all scaling problems perfectly? Commit to yes or no.

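Common Belief: Master-slave replication can handle unlimited scaling and consistency perfectly.

Reality: Adding slaves scales reads, but every write still goes through the single master. Replication lag also means reads from slaves are only eventually consistent.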

Quick: Can slaves automatically become master without any setup? Commit to yes or no.

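Common Belief: Slaves automatically take over as master if the master fails without extra work.

Reality: Promoting a slave to master requires failover tooling or a manual procedure. Without either, the system simply has no node accepting writes once the master fails.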

Expert Zone

1

Replication lag varies with network speed and workload, so monitoring is essential to avoid stale reads.
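A simple way to reason about lag is to compare log positions. This sketch assumes lag is measured as the number of log entries a slave has not yet replayed. Real systems more often report it in seconds; MySQL, for example, shows `Seconds_Behind_Master` in its replica status output. `replication_lag` and `slaves_safe_for_reads` are hypothetical helpers. The second one keeps only slaves within a chosen lag budget as read targets.

```python
def replication_lag(master_position: int, slave_position: int) -> int:
    """Number of log entries the slave still has to replay."""
    return max(0, master_position - slave_position)


def slaves_safe_for_reads(master_position, slave_positions, max_lag):
    """Names of slaves whose lag is within max_lag entries."""
    return [
        name
        for name, pos in slave_positions.items()
        if replication_lag(master_position, pos) <= max_lag
    ]


positions = {"slave1": 1000, "slave2": 940, "slave3": 998}
for name, pos in positions.items():
    print(name, replication_lag(1000, pos))
print(slaves_safe_for_reads(1000, positions, max_lag=5))  # ['slave1', 'slave3']
```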

2

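Some systems use semi-synchronous replication. The master waits until at least one slave confirms it has received a transaction before telling the client the commit succeeded. This narrows the window for data loss on failover, at the cost of slightly slower writes.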
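The acknowledgement rule can be sketched as follows. This is a simplified, single-threaded model with hypothetical names (`Slave.receive`, `semi_sync_commit`, `min_acks`). A slave acknowledges as soon as the change is stored in its relay log, before applying it. The commit is confirmed only if enough slaves acknowledged. Some implementations fall back to asynchronous mode after a timeout instead of failing; this sketch simply raises an error.

```python
class Slave:
    def __init__(self, name, reachable=True):
        self.name = name
        self.reachable = reachable
        self.relay_log = []  # changes received but not necessarily applied yet

    def receive(self, entry):
        # Acknowledge receipt only; applying the change can happen later.
        if not self.reachable:
            return False
        self.relay_log.append(entry)
        return True


def semi_sync_commit(entry, slaves, min_acks=1):
    """Confirm the commit only once at least min_acks slaves hold the change."""
    acks = sum(1 for s in slaves if s.receive(entry))
    if acks < min_acks:
        raise TimeoutError(f"only {acks} ack(s); need {min_acks}")
    return acks


slaves = [Slave("s1"), Slave("s2", reachable=False)]
print(semi_sync_commit({"key": "x", "value": "1"}, slaves))  # 1
try:
    semi_sync_commit({"key": "y", "value": "2"}, [Slave("s3", reachable=False)])
except TimeoutError as e:
    print("commit not confirmed:", e)
```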

3

Promoting a slave to master requires careful handling of pending transactions to avoid data loss.
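One common rule when promoting is to pick the slave that has replayed the most of the old master's log, because it loses the least. This sketch assumes each slave reports its last applied log position. It also assumes the new master keeps its own log of replayed changes, so the other slaves can re-point to it and catch up. `plan_promotion` is a hypothetical helper. It reports how far each remaining slave is behind. If the failed master's log can still be read, it also reports how many entries never reached any slave.

```python
def plan_promotion(slave_positions, old_master_position=None):
    """Pick the most up-to-date slave and report what may be lost.

    slave_positions: {slave_name: last applied log position}
    old_master_position: last position in the failed master's log,
    or None if that log cannot be read.
    """
    new_master = max(slave_positions, key=slave_positions.get)
    best = slave_positions[new_master]
    # Remaining slaves re-point to the new master and catch up from it.
    behind = {n: best - p for n, p in slave_positions.items() if n != new_master}
    lost = None if old_master_position is None else old_master_position - best
    return new_master, behind, lost


print(plan_promotion({"s1": 120, "s2": 118, "s3": 95}, old_master_position=121))
# ('s1', {'s2': 2, 's3': 25}, 1)
```

In this example, one committed transaction existed only on the failed master. That is exactly the kind of pending transaction that needs careful handling before the new master starts accepting writes.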

When NOT to use

Master-slave replication is not suitable when writes must scale beyond a single server, or when every read must see the latest data (strong consistency). In those cases, consider multi-master replication for write scaling, or systems built on distributed consensus protocols (e.g., Raft, Paxos) for strong consistency.

Production Patterns

In production, master-slave replication is often combined with a load balancer or proxy that directs reads to slaves and writes to the master. Monitoring tools track replication lag. Failover is automated with orchestration systems like Kubernetes or with custom scripts.
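Read/write splitting, as a load balancer or proxy might do it, can be sketched with a tiny router. This version is an assumption-laden simplification. It classifies a statement by its first SQL keyword, sends `SELECT`/`SHOW` round-robin to the slaves, and sends everything else to the master. `ReadWriteRouter` and the server names are hypothetical. A real router must also treat statements such as `SELECT ... FOR UPDATE` and reads inside write transactions as master-only.

```python
import itertools


class ReadWriteRouter:
    """Reads go round-robin to slaves; everything else goes to the master."""

    READ_VERBS = {"SELECT", "SHOW"}

    def __init__(self, master, slaves):
        self.master = master
        # itertools.cycle yields the slaves in turn, forever (round-robin).
        self._slaves = itertools.cycle(slaves) if slaves else None

    def route(self, sql):
        words = sql.split()
        verb = words[0].upper() if words else ""
        if verb in self.READ_VERBS and self._slaves is not None:
            return next(self._slaves)
        # Writes, unrecognized statements, and reads when no slave exists.
        return self.master


router = ReadWriteRouter("master-db", ["slave-1", "slave-2"])
print(router.route("SELECT * FROM users"))          # slave-1
print(router.route("select name FROM users"))       # slave-2
print(router.route("UPDATE users SET name = 'x'"))  # master-db
```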

Connections

Event Sourcing

Builds-on

Both use logs of changes to reconstruct state, helping understand how replication logs keep data consistent.

Content Delivery Networks (CDNs)

Similar pattern

CDNs replicate content from origin servers to edge caches, like slaves copying data from master, to improve read speed and availability.

Human Memory Systems

Analogy in biology

Just as the brain stores a master memory and creates copies for quick access, master-slave replication balances accuracy and speed.

Common Pitfalls

#1 Trying to write data directly to a slave database.

Wrong approach: INSERT INTO items VALUES ('data'); -- sent over a connection to a slave (items: example table present on every server)

Correct approach: INSERT INTO items VALUES ('data'); -- sent over a connection to the master

Root cause: Misunderstanding that slaves are read-only copies. The same tables exist on every server, but only the master may change them.

#2 Ignoring replication lag and reading stale data from slaves for critical decisions.

Wrong approach: SELECT balance FROM accounts WHERE user_id = 123; -- sent to a slave immediately after a deposit

Correct approach: SELECT balance FROM accounts WHERE user_id = 123; -- sent to the master for the latest data

Root cause: Not accounting for the delay between a write on the master and its arrival on the slaves.
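A common remedy is "read your own writes" routing: for a short window after a user writes, send that user's reads to the master. This sketch assumes a fixed window (`stick_seconds`, a hypothetical parameter) that is longer than the usual replication lag. `SessionRouter` is a hypothetical name. The clock is injectable so the behavior can be demonstrated without waiting.

```python
import time


class SessionRouter:
    """Send a user's reads to the master for a short window after they write."""

    def __init__(self, stick_seconds=2.0, clock=time.monotonic):
        self.stick_seconds = stick_seconds
        self.clock = clock
        self._last_write = {}  # user_id -> time of that user's last write

    def record_write(self, user_id):
        self._last_write[user_id] = self.clock()

    def target_for_read(self, user_id):
        last = self._last_write.get(user_id)
        if last is not None and self.clock() - last < self.stick_seconds:
            return "master"  # the slave may not have this user's write yet
        return "slave"


now = [100.0]  # fake clock for the demo
router = SessionRouter(stick_seconds=2.0, clock=lambda: now[0])
router.record_write(123)            # e.g. right after a deposit
print(router.target_for_read(123))  # master
print(router.target_for_read(456))  # slave
now[0] += 5
print(router.target_for_read(123))  # slave
```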

#3 Assuming slaves automatically become master on failure without setup.

Wrong approach: No failover configuration; relying on slaves to take over automatically.

Correct approach: Configure automated failover tools or manual promotion procedures.

Root cause: Lack of understanding of failover mechanisms and their configuration.
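Automated failover starts with deciding that the master is really down. A minimal sketch, assuming periodic health checks whose results are fed in one at a time: declare failure only after several consecutive misses. A single network blip then does not trigger a promotion that could leave two servers both acting as master. `FailoverMonitor` and `failure_threshold` are hypothetical names. Real failover tools also do more, such as preventing the old master from accepting writes.

```python
class FailoverMonitor:
    """Declare the master down after N consecutive failed health checks."""

    def __init__(self, failure_threshold=3):
        self.failure_threshold = failure_threshold
        self.consecutive_failures = 0

    def record_check(self, healthy):
        """Record one health check; return True when failover should start."""
        if healthy:
            self.consecutive_failures = 0
            return False
        self.consecutive_failures += 1
        return self.consecutive_failures >= self.failure_threshold


monitor = FailoverMonitor(failure_threshold=3)
checks = [True, False, False, True, False, False, False]
print([monitor.record_check(h) for h in checks])
# [False, False, False, False, False, False, True]
```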

Key Takeaways

Master-slave replication separates write and read workloads to improve database performance and reliability.

The master database handles all writes, while slaves copy data and serve read requests, reducing load on the master.

Replication lag means slaves may show slightly outdated data, so critical reads should consider this delay.

Failover requires careful setup to promote a slave to master and avoid data loss during failures.

Master-slave replication has limits in write scaling and consistency, so other replication methods may be needed for complex systems.