
Synchronization process in Redis - Deep Dive

Overview - Synchronization process
What is it?
Synchronization process in Redis is how data is copied and kept consistent between a master server and its replicas. It ensures that replicas have the same data as the master, even if they connect later or lose connection temporarily. This process involves sending the full dataset or just the changes since the last sync. It helps Redis maintain fast and reliable data replication.
Why it matters
Without synchronization, replicas would have outdated or missing data, causing errors or inconsistent results in applications relying on Redis. Synchronization allows Redis to scale reads and provide high availability by keeping multiple copies of data up to date. This means users get fast responses and systems stay reliable even if some servers fail.
Where it fits
Before learning synchronization, you should understand Redis basics like keys, values, and commands. After this, you can learn about Redis replication, failover, and clustering to build resilient and scalable Redis setups.
Mental Model
Core Idea
Synchronization in Redis is the process of copying data from the master to replicas to keep them identical and up to date.
Think of it like...
Imagine a teacher (master) giving notes to students (replicas). When a new student joins late, the teacher gives them all the notes so far (full sync). After that, the teacher only shares new notes (incremental sync) so everyone stays on the same page.
┌─────────────┐       Full Sync       ┌─────────────┐
│   Master    │──────────────────────▶│  Replica 1  │
└─────────────┘                       └─────────────┘
       │                                   ▲
       │ Incremental Sync                  │
       └───────────────────────────────────┘

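If Replica 1 disconnects briefly and reconnects while the master still holds the commands it missed:

┌─────────────┐    Partial Resync     ┌─────────────┐
│   Master    │──────────────────────▶│  Replica 1  │
└─────────────┘                       └─────────────┘
(If the master no longer holds the missed commands, it falls back to a full sync.)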
Build-Up - 7 Steps
1. Foundation: What is Redis replication
Concept: Replication means copying data from one Redis server (master) to others (replicas).
Redis replication allows one server to be the source of truth (master) and others to keep copies (replicas). Replicas can serve read requests, reducing load on the master. This copying happens continuously to keep data consistent.
Result
You get multiple Redis servers with the same data, improving read speed and availability.
Understanding replication is key because synchronization is the process that makes replication possible.
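To make the master/replica roles concrete, here is a minimal in-memory sketch. It is not Redis code. The class names `Master` and `Replica` are hypothetical. The sketch assumes the replica behaves like a default Redis replica (`replica-read-only yes`): it serves reads and rejects client writes, and it changes only when the master forwards a write.

```python
from __future__ import annotations

from typing import Optional


class Replica:
    """Toy replica: serves reads, refuses writes from clients."""

    def __init__(self) -> None:
        self.data: dict[str, str] = {}

    def apply(self, key: str, value: str) -> None:
        # Only the master's replication link calls this.
        self.data[key] = value

    def get(self, key: str) -> Optional[str]:
        return self.data.get(key)

    def set(self, key: str, value: str) -> None:
        # Mirrors the idea of a read-only replica; the message is illustrative.
        raise PermissionError("replica is read-only")


class Master:
    """Toy master: the single source of truth that accepts writes."""

    def __init__(self) -> None:
        self.data: dict[str, str] = {}
        self.replicas: list[Replica] = []

    def set(self, key: str, value: str) -> None:
        self.data[key] = value
        # Forward the write so already-attached replicas hold the same value.
        # (Replicas attached later need a sync first; see the next steps.)
        for replica in self.replicas:
            replica.apply(key, value)
```

With two replicas attached, a `master.set("user:1", "alice")` becomes readable from either replica, which is how reads can be spread away from the master.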
2. Foundation: Basics of synchronization in Redis
Concept: Synchronization is how Redis copies data from master to replicas, either fully or incrementally.
When a replica connects, it requests data from the master with the PSYNC command. A replica that has never synced receives a full copy of all data (full sync). After that, the master sends only new write commands (incremental sync) to keep the replica updated.
Result
Replicas start with the same data as the master and stay updated with new changes.
Knowing the two types of sync helps you understand how Redis keeps replicas current without wasting resources.
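The sketch below contrasts the two kinds of sync and counts roughly what each one ships. It assumes a toy command format of `("SET", key, value)` / `("DEL", key)`, and the names `ToyMaster`, `apply_command`, and `items_sent` are hypothetical. Attaching a replica copies the whole dataset once. Every later write costs one command per replica.

```python
from __future__ import annotations

import copy


def apply_command(data: dict[str, str], cmd: tuple[str, ...]) -> None:
    """Apply one toy write command: ("SET", key, value) or ("DEL", key)."""
    if cmd[0] == "SET":
        data[cmd[1]] = cmd[2]
    elif cmd[0] == "DEL":
        data.pop(cmd[1], None)
    else:
        raise ValueError(f"unsupported command: {cmd[0]}")


class ToyMaster:
    def __init__(self) -> None:
        self.data: dict[str, str] = {}
        self.replicas: list[dict[str, str]] = []
        self.items_sent = 0  # rough transfer cost: keys or commands shipped

    def attach(self, replica: dict[str, str]) -> None:
        # Full sync: the replica drops its contents and receives everything once.
        replica.clear()
        replica.update(copy.deepcopy(self.data))
        self.items_sent += len(self.data)
        self.replicas.append(replica)

    def execute(self, cmd: tuple[str, ...]) -> None:
        apply_command(self.data, cmd)
        # Incremental sync: each attached replica receives only this command.
        for replica in self.replicas:
            apply_command(replica, cmd)
            self.items_sent += 1
```

If a master with 1,000 keys attaches a replica, the full sync ships 1,000 items. Each subsequent write adds just one item per replica.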
3. Intermediate: Full synchronization process explained
Before reading on: do you think full sync sends only changed data or the entire dataset? Commit to your answer.
Concept: Full sync sends the entire dataset from master to replica to start fresh.
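When a replica connects for the first time, or reconnects after being away too long for a partial resync, the master produces a point-in-time snapshot of its data (an RDB file) and sends it to the replica. Write commands that arrive while the snapshot is being produced and transferred are buffered for that replica. The replica discards its old data and loads the snapshot, then receives and applies the buffered commands. With diskless replication (`repl-diskless-sync`), the master streams the RDB straight to the replica's socket instead of writing it to disk first.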
Result
Replica has a complete copy of the master's data, ready to receive updates.
Understanding full sync explains how replicas recover from disconnection or start fresh without missing data.
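The steps of a full sync can be modeled in a few lines. This is a simplified sketch under stated assumptions. The function `full_sync` is hypothetical. A `deepcopy` stands in for the RDB snapshot, and a plain list stands in for the per-replica buffer of writes that arrive mid-transfer.

```python
from __future__ import annotations

import copy


def full_sync(
    master_data: dict[str, str],
    writes_during_transfer: list[tuple[str, str]],
) -> dict[str, str]:
    """Return the replica's dataset after a toy full sync.

    master_data: the master's dataset when the replica asks to sync.
    writes_during_transfer: (key, value) writes the master executes
    while the snapshot is being produced and shipped.
    """
    snapshot = copy.deepcopy(master_data)  # point-in-time copy (the "RDB")
    buffered: list[tuple[str, str]] = []   # per-replica buffer of new writes
    for key, value in writes_during_transfer:
        master_data[key] = value           # the master keeps serving writes
        buffered.append((key, value))      # ...and remembers them for this replica
    replica_data = snapshot                # the replica loads the snapshot
    for key, value in buffered:            # then replays what it missed
        replica_data[key] = value
    return replica_data
```

The snapshot alone would be stale, because it misses the writes made during the transfer. Replaying the buffer afterwards is what makes the replica match the master.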
4. Intermediate: Incremental synchronization details
Before reading on: do you think incremental sync sends all data again or only recent changes? Commit to your answer.
Concept: Incremental sync sends only new commands executed on the master after the last sync.
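After full sync, the master forwards every write command it executes to its replicas as a continuous stream, and each replica applies the commands in the same order. Both sides track a replication offset (how many bytes of this stream they have produced or processed), so a replica can later tell the master exactly where it left off. The master also appends the stream to its replication backlog, which is used for partial resynchronization.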
Result
Replicas stay current with minimal data transfer, improving performance.
Knowing incremental sync shows how Redis optimizes replication to avoid unnecessary data copying.
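The following sketch shows incremental sync with offsets. It assumes a toy text encoding (`SET key value\n`, keys without spaces) in place of Redis's real RESP protocol, and the class names are hypothetical. The master's offset plays the role that `master_repl_offset` plays in `INFO replication`: it grows by the byte length of each propagated command, and a caught-up replica has the same offset.

```python
from __future__ import annotations


def encode(key: str, value: str) -> bytes:
    # Toy wire format for illustration only; real Redis uses RESP.
    return f"SET {key} {value}\n".encode()


class StreamingReplica:
    def __init__(self) -> None:
        self.data: dict[str, str] = {}
        self.offset = 0  # bytes of the master's stream applied so far

    def receive(self, payload: bytes) -> None:
        _, key, value = payload.decode().rstrip("\n").split(" ", 2)
        self.data[key] = value
        self.offset += len(payload)


class StreamingMaster:
    def __init__(self) -> None:
        self.data: dict[str, str] = {}
        self.offset = 0  # total bytes of replication stream produced
        self.replicas: list[StreamingReplica] = []

    def set(self, key: str, value: str) -> None:
        self.data[key] = value
        payload = encode(key, value)
        self.offset += len(payload)
        # Propagate only this command, in order, to every connected replica.
        for replica in self.replicas:
            replica.receive(payload)
```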
5. Intermediate: Partial resynchronization to save bandwidth
Concept: Partial resynchronization lets replicas reconnect without full sync if they lost connection briefly.
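If a replica disconnects briefly, then on reconnecting it sends PSYNC with the master's replication ID and the offset it had reached. If the ID matches and the master's backlog still holds every byte from that offset onward, the master replies +CONTINUE and sends only the missing part of the stream instead of performing a full sync. This saves time and bandwidth.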
Result
Faster recovery and less network load when replicas reconnect quickly.
Understanding partial resync helps you appreciate Redis's efficiency in real-world unstable networks.
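The master's decision can be sketched as a small function. This is a simplification, not Redis's implementation. It models only the `+CONTINUE` / `+FULLRESYNC` outcomes and ignores details such as the secondary replication ID kept after a failover. `MasterState` and `handle_psync` are hypothetical names. Offsets here count stream bytes: the replica has applied bytes `[0, offset)`, and the backlog holds bytes `[backlog_start, master_offset)`.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class MasterState:
    replid: str         # ID of the master's current replication history
    backlog_start: int  # offset of the oldest byte still in the backlog
    master_offset: int  # offset just past the newest byte produced


def handle_psync(state: MasterState, replid: str, offset: int) -> str:
    """Toy handling of 'PSYNC <replid> <offset>'."""
    full = f"+FULLRESYNC {state.replid} {state.master_offset}"
    if replid == "?" or replid != state.replid:
        return full  # first sync, or a different replication history
    if state.backlog_start <= offset <= state.master_offset:
        return "+CONTINUE"  # every missing byte is still in the backlog
    return full  # the gap was already overwritten
```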
6. Advanced: Replication backlog and its role
Before reading on: do you think the master stores all past commands forever or only a limited recent set? Commit to your answer.
Concept: The replication backlog is a fixed-size buffer storing recent commands for partial resync.
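The master keeps a fixed-size circular buffer (`repl-backlog-size`, 1mb by default) holding the most recent bytes of the replication stream. A reconnecting replica can request the commands it missed from this backlog. If the backlog is too small for the write rate, or the replica stays disconnected long enough that the bytes it needs have been overwritten, a full sync is needed.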
Result
Efficient replication with fallback to full sync when needed.
Knowing about the backlog explains why partial resync sometimes fails and full sync is necessary.
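Here is a minimal sketch of a fixed-size circular backlog, assuming byte-level offsets like those in the previous sketches. `ReplicationBacklog` and `read_from` are hypothetical names. Once more bytes have been written than the buffer holds, the oldest ones are overwritten. A request for data older than `start_offset` returns `None`, which corresponds to the fallback to full sync.

```python
from __future__ import annotations

from typing import Optional


class ReplicationBacklog:
    """Toy fixed-size circular buffer of the replication stream."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.buf = bytearray(size)
        self.master_offset = 0  # total bytes ever appended to the stream

    @property
    def start_offset(self) -> int:
        # Oldest offset still held; everything before it was overwritten.
        return max(0, self.master_offset - self.size)

    def append(self, data: bytes) -> None:
        for byte in data:
            self.buf[self.master_offset % self.size] = byte
            self.master_offset += 1

    def read_from(self, offset: int) -> Optional[bytes]:
        """Return bytes [offset, master_offset), or None if no longer held."""
        if offset < self.start_offset or offset > self.master_offset:
            return None  # gap too old (or invalid): caller needs a full sync
        return bytes(self.buf[i % self.size] for i in range(offset, self.master_offset))
```

For example, an 8-byte backlog that has received `abcdef` and then `ghij` can still serve offset 2 onward (`cdefghij`), but it returns `None` for offset 1.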
7. Expert: Surprises in synchronization internals
Before reading on: do you think Redis blocks all writes during full sync? Commit to your answer.
Concept: During full sync, Redis forks a child process to create the snapshot, allowing writes to continue.
Redis uses the fork() system call to create a child process that writes the snapshot (RDB file), so the master keeps serving commands while the data for replicas is prepared. The fork call itself pauses the master briefly, and that pause grows with the dataset's memory footprint, but writing the snapshot does not block the master.
Result
Minimal impact on performance during full sync, enabling high availability.
Understanding this internal forking reveals how Redis balances data consistency with performance during synchronization.
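The point-in-time property of fork can be demonstrated outside Redis. This POSIX-only sketch uses Python's `os.fork`. The helper `fork_snapshot` is hypothetical, and JSON over a pipe stands in for the RDB file. The child serializes the dataset as it was at fork time, while the parent applies new writes that the child never sees.

```python
import json
import os


def fork_snapshot(data: dict, writes_after_fork: dict) -> dict:
    """Fork a child that serializes `data` while the parent keeps writing.

    POSIX only. Returns the child's snapshot as read back by the parent.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: sees the parent's memory as it was at fork time
        # (copy-on-write), regardless of what the parent does next.
        try:
            os.close(read_fd)
            with os.fdopen(write_fd, "w") as out:
                json.dump(data, out)
        finally:
            os._exit(0)  # never return into the parent's code path
    # Parent: keeps "serving writes" without waiting for the child.
    os.close(write_fd)
    data.update(writes_after_fork)
    with os.fdopen(read_fd) as inp:
        snapshot = json.load(inp)
    os.waitpid(pid, 0)
    return snapshot
```

Calling `fork_snapshot({"a": 1}, {"a": 2, "b": 3})` gives a snapshot of `{"a": 1}`, while the parent's dict ends up as `{"a": 2, "b": 3}`.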
Under the Hood
Redis synchronization combines snapshotting with command streaming. When a replica needs a full sync, the master forks a child process to create a point-in-time snapshot (RDB file) without blocking writes, and sends that snapshot to the replica. Meanwhile, the master buffers the write commands that arrive during the snapshot for that replica. After the replica loads the snapshot, the master sends it these buffered commands and then streams every new write command (incremental sync). The master also keeps the recent stream in a replication backlog. If a replica disconnects briefly, it can request the missing commands from the backlog (partial resync). If the backlog no longer contains the needed commands, a full sync is triggered again.
Why designed this way?
This design balances data consistency, performance, and network efficiency. Forking avoids blocking the master during snapshot creation, which is critical for fast response times. The replication backlog enables efficient incremental updates and partial resync, reducing bandwidth and latency. Alternatives like blocking writes during sync or sending full data every time were rejected because they hurt performance and scalability.
┌──────────────┐          fork          ┌─────────────┐
│    Master    │───────────────────────▶│ Child (RDB) │
│ (serves cmds)│                        └─────────────┘
│              │
│ buffers cmds │
└──────┬───────┘
       │
       │ Full Sync (RDB file)
       ▼
┌──────────────┐
│   Replica    │
│  loads RDB   │
└──────┬───────┘
       │ Incremental Sync (buffered + new cmds)
       ▼
┌──────────────┐
│   Replica    │
│ applies cmds │
└──────────────┘
Myth Busters - 4 Common Misconceptions
Quick: Does Redis send the entire dataset every time a replica reconnects? Commit yes or no.
Common Belief: Redis always sends the full dataset to replicas on every reconnect.
Reality: Redis sends the full dataset only on the first sync or when partial resync is not possible; otherwise, it sends only the missing changes.
Why it matters: Believing this leads to overestimating network usage and misunderstanding Redis's efficiency.
Quick: Does Redis block all writes during full synchronization? Commit yes or no.
Common Belief: Redis stops all writes while creating the snapshot for full sync.
Reality: Redis forks a child process to create the snapshot, allowing the master to continue serving writes.
Why it matters: Thinking writes are blocked causes unnecessary fear about Redis performance during sync.
Quick: Can a replica always do partial resync after disconnect? Commit yes or no.
Common Belief: Partial resync always works if a replica reconnects quickly.
Reality: Partial resync works only if the master's replication backlog still contains the missing commands; otherwise, a full sync is needed.
Why it matters: Assuming partial resync always works can lead to unexpected full syncs and performance hits.
Quick: Is synchronization only about copying data once? Commit yes or no.
Common Belief: Synchronization is a one-time data copy when replicas connect.
Reality: Synchronization is ongoing, with incremental updates sent continuously after the initial full sync.
Why it matters: Misunderstanding this leads to wrong assumptions about data freshness and replication behavior.
Expert Zone
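1. The size of the replication backlog (`repl-backlog-size`) relative to the master's write rate determines how long a replica can stay disconnected and still resynchronize partially.
2. During full sync, the forked child shares memory pages with the master through copy-on-write: a page is copied only when the master modifies it, so heavy write traffic during a snapshot increases memory usage.
3. Replication is asynchronous: the master does not wait for replicas before acknowledging writes, so network latency and bandwidth determine how far behind a replica's data may be.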
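Point 1 can be turned into a rough sizing calculation. The rule of thumb used here (backlog ≈ stream rate × longest disconnect to tolerate × safety factor) and the default factor of 2 are assumptions, and the function names are hypothetical. The stream rate can be estimated from two samples of `master_repl_offset` taken a known number of seconds apart, because that offset grows by the bytes of replication stream produced.

```python
import math


def stream_rate(offset_start: int, offset_end: int, seconds: float) -> float:
    """Replication stream bytes/second between two master_repl_offset samples."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return (offset_end - offset_start) / seconds


def suggested_backlog_bytes(rate: float, max_disconnect_s: float, headroom: float = 2.0) -> int:
    """Backlog covering `max_disconnect_s` seconds of writes, times a safety factor."""
    if rate < 0 or max_disconnect_s < 0 or headroom < 1:
        raise ValueError("invalid sizing inputs")
    return math.ceil(rate * max_disconnect_s * headroom)
```

Suppose the offset grows from 10,000,000 to 40,000,000 in 60 s. That is 500,000 B/s, and tolerating a 60 s outage with 2× headroom suggests a backlog of about 60,000,000 bytes (roughly 57 MiB).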
When NOT to use
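Synchronization via replication is single-master: only the master accepts writes, and replicas copy them. It is not suitable for setups where several nodes must accept writes to the same data and resolve conflicts. Redis Cluster spreads writes across multiple masters by sharding keys, but each key still has exactly one master. For conflict resolution or strongly consistent writes, consider systems built on CRDTs or on consensus protocols such as Raft or Paxos.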
Production Patterns
In production, Redis replication with synchronization is used to scale read workloads by directing reads to replicas. It also supports high availability setups with automatic failover tools like Redis Sentinel, which rely on synchronization to keep replicas ready to promote.
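Read scaling usually means routing commands by type. The sketch below is a hypothetical client-side router (`ReadWriteRouter`, with placeholder addresses). It is not a redis-py or Sentinel API. It sends a small, illustrative set of write commands to the master and rotates reads round-robin across replicas. Because replication is asynchronous, reads routed this way may return slightly stale data.

```python
from __future__ import annotations

import itertools


class ReadWriteRouter:
    """Hypothetical router: writes to the master, reads round-robin to replicas."""

    # Illustrative subset only; a real router needs the full list of write commands.
    WRITE_COMMANDS = {"SET", "DEL", "INCR", "EXPIRE", "LPUSH", "HSET"}

    def __init__(self, master: str, replicas: list[str]) -> None:
        self.master = master
        self._replicas = itertools.cycle(replicas) if replicas else None

    def route(self, command: str) -> str:
        # Writes (and all traffic when there are no replicas) go to the master.
        if command.upper() in self.WRITE_COMMANDS or self._replicas is None:
            return self.master
        return next(self._replicas)
```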
Connections
Distributed Consensus Algorithms
Both ensure data consistency across multiple nodes but use different methods.
Understanding synchronization in Redis helps grasp how simpler replication differs from complex consensus protocols like Raft that handle conflicts and leader election.
Version Control Systems (e.g., Git)
Both use snapshots and incremental changes to keep copies in sync.
Seeing synchronization as snapshots plus incremental updates connects Redis replication to how Git manages repository states efficiently.
Human Memory and Learning
Synchronization mimics how humans recall full context first, then update with new information.
This cross-domain link shows how systems and brains optimize information transfer by combining full recall with incremental updates.
Common Pitfalls
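#1 Replicas keep falling back to full sync after brief disconnections.
Wrong approach: Leaving the master's replication backlog too small for its write rate, so the commands a reconnecting replica needs have already been overwritten.
# Replica config
replicaof master 6379
# Master config: repl-backlog-size left at the 1mb default despite heavy writes
Correct approach: Increase the replication backlog size on the master, for example:
# Master config (64mb is an illustrative value; size it to your write rate)
repl-backlog-size 64mb
# The same setting can also be changed at runtime: CONFIG SET repl-backlog-size 64mb
Root cause: Not realizing that partial resync works only while the master's backlog still holds every command the replica missed.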
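To check whether this pitfall applies, look at the master's resync counters. The sketch below parses text shaped like `redis-cli INFO stats` output and reads the `sync_full`, `sync_partial_ok`, and `sync_partial_err` fields. The helper names and the sample numbers in the usage note are hypothetical. A rising share of refused partial resyncs suggests the backlog is too small.

```python
from __future__ import annotations


def parse_info(text: str) -> dict[str, str]:
    """Parse 'field:value' lines from INFO output, skipping '# Section' headers."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def partial_resync_denial_rate(info_text: str) -> float:
    """Share of partial resync requests the master had to refuse."""
    fields = parse_info(info_text)
    ok = int(fields.get("sync_partial_ok", "0"))
    err = int(fields.get("sync_partial_err", "0"))
    total = ok + err
    return err / total if total else 0.0
```

With made-up counters `sync_partial_ok:3` and `sync_partial_err:9`, the denial rate is 0.75. That would be a strong hint to raise `repl-backlog-size`.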
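#2 Assuming full sync blocks writes.
Wrong approach: Throttling or pausing write traffic manually during full sync, in the belief that Redis blocks writes while creating the snapshot.
Correct approach: Let Redis's fork-based snapshotting run alongside writes. If full syncs still hurt latency, shorten the fork pause and speed up the snapshot (for example, keep the dataset's memory footprint in check) rather than limiting writes.
Root cause: Not knowing that Redis uses the OS-level fork to write the snapshot in a child process instead of blocking writes.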
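#3 Using an unrelated setting to control how replicas resynchronize.
Wrong approach: Setting replica-serve-stale-data no in the belief that it disables partial resync or forces a full sync on every reconnect. This setting only makes the replica return an error for most data commands while its link to the master is down or a sync is in progress.
Correct approach: Rely on partial resync, which Redis uses automatically whenever the backlog allows, and tune repl-backlog-size for efficiency. Use replica-serve-stale-data only to decide whether a replica should answer with possibly stale data while disconnected.
Root cause: Misunderstanding what controls partial resync and which configuration directives affect it.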
Key Takeaways
Synchronization in Redis keeps replicas updated by sending a full data snapshot initially and incremental changes afterward.
Redis uses a forked child process to create snapshots without blocking writes, ensuring high performance during sync.
Partial resynchronization optimizes reconnection by sending only missing commands if the master's backlog still has them.
Replication backlog size and network conditions affect synchronization efficiency and reliability.
Understanding synchronization is essential for building scalable, highly available Redis systems with consistent data.