
Oplog and replication mechanism in MongoDB - Deep Dive

Overview - Oplog and replication mechanism
What is it?
In MongoDB, the oplog is a special log that records all changes made to the data. Replication is the process where these changes are copied from one database server (primary) to others (secondaries) to keep them in sync. This ensures data is safe and available even if one server fails. The oplog is the key tool that makes replication possible by tracking every write operation.
Why it matters
Without the oplog and replication, if a database server crashes, all data could be lost or outdated. Replication keeps multiple copies of data updated automatically, so users can keep working without interruption. It also helps distribute read requests to improve performance. This system makes MongoDB reliable and scalable for real-world applications.
Where it fits
Before learning about oplog and replication, you should understand basic MongoDB concepts like collections, documents, and CRUD operations. After this, you can explore advanced topics like sharding, failover, and backup strategies. This topic is a foundation for building resilient and distributed MongoDB systems.
Mental Model
Core Idea
The oplog is a journal of all data changes that MongoDB uses to copy updates from the primary server to secondary servers, keeping them synchronized automatically.
Think of it like...
Imagine a shared notebook where one person writes down every change they make to a recipe. Others copy these notes to their own notebooks to keep their recipes exactly the same. The notebook is like the oplog, and copying notes is like replication.
Primary Server
┌─────────────────────┐
│  Application writes │
│  changes to data    │
│  (insert, update)   │
└─────────┬───────────┘
          │
          ▼
┌─────────────────────┐
│    Oplog records    │
│   every change      │
└─────────┬───────────┘
          │
          ▼
Secondary Servers
┌─────────────────────┐   ┌─────────────────────┐
│  Read oplog entries │   │  Read oplog entries │
│  and apply changes  │   │  and apply changes  │
└─────────────────────┘   └─────────────────────┘
Build-Up - 7 Steps
1
Foundation: What is the oplog in MongoDB?
🤔
Concept: The oplog is a special collection that stores a record of all changes made to the data on the primary server.
MongoDB keeps a special collection called the oplog (operations log) in a database named local. Every time data changes (like adding or updating documents), MongoDB writes a record of that change into the oplog. This log is a simple, ordered list of operations.
Result
You get a continuous, ordered list of all data changes that happened on the primary server.
Understanding the oplog as a change journal is key because it is the source of truth for replication.
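You can inspect the oplog yourself. A minimal sketch, run in mongosh against a replica-set member (the oplog only exists on replica-set members; output shapes vary by version):

```javascript
// Switch to the local database, where MongoDB keeps the oplog.
use local

// Show the three most recent oplog entries (newest first).
db.oplog.rs.find().sort({ $natural: -1 }).limit(3).pretty()

// Summarize the oplog's configured size and the time window it covers.
db.getReplicationInfo()
```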
2
Foundation: Basics of MongoDB replication
🤔
Concept: Replication copies data from one primary server to one or more secondary servers to keep data safe and available.
MongoDB uses a primary-secondary model. The primary server accepts all writes. Secondary servers copy data from the primary by reading the oplog and applying the same changes. This keeps all servers synchronized.
Result
Multiple servers have the same data, so if one fails, others can take over.
Knowing replication basics helps you see why the oplog is essential: it is the mechanism that secondaries use to stay updated.
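Setting up the primary-secondary model is a one-time configuration step. A sketch in mongosh, assuming three mongod instances started with the same `--replSet rs0` name (hostnames below are placeholders):

```javascript
// Initialize a three-member replica set; run once, on one member.
rs.initiate({
  _id: "rs0",
  members: [
    { _id: 0, host: "db1.example.net:27017" },
    { _id: 1, host: "db2.example.net:27017" },
    { _id: 2, host: "db3.example.net:27017" }
  ]
})

// Shows which member is PRIMARY and which are SECONDARY.
rs.status()
```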
3
Intermediate: How secondaries use the oplog
🤔 Before reading on: do you think secondaries copy full data snapshots or just changes from the oplog? Commit to your answer.
Concept: Secondaries do not copy the entire database repeatedly; they read and apply only the changes recorded in the oplog.
When a secondary joins the replica set (or has fallen too far behind the oplog), it first performs an initial sync: copying a full snapshot of the data from another member. After that, it continuously reads new entries from the oplog and applies those changes in order. This continuous process is called oplog tailing.
Result
Secondaries stay up-to-date efficiently by applying only incremental changes.
Understanding oplog tailing explains how replication is efficient and can keep up with live changes without copying everything repeatedly.
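The apply loop can be modeled in a few lines of plain JavaScript. This is a toy simulation, not MongoDB's implementation: a "secondary" tracks the last timestamp it applied and replays only newer entries, in order.

```javascript
// Toy model of oplog tailing: apply only entries newer than the last
// timestamp already applied, in timestamp order.
function applyNewEntries(store, lastApplied, oplog) {
  for (const entry of oplog) {
    if (entry.ts <= lastApplied) continue; // already applied, skip
    if (entry.op === "i") {
      store.set(entry.o._id, entry.o);                       // insert
    } else if (entry.op === "u") {
      Object.assign(store.get(entry.o2._id), entry.o.$set);  // update changed fields
    } else if (entry.op === "d") {
      store.delete(entry.o._id);                             // delete
    }
    lastApplied = entry.ts;
  }
  return lastApplied;
}

const store = new Map();
const oplog = [
  { ts: 1, op: "i", o: { _id: 1, name: "ada" } },
  { ts: 2, op: "u", o2: { _id: 1 }, o: { $set: { role: "admin" } } },
];

let last = applyNewEntries(store, 0, oplog);
// Re-running with the same oplog applies nothing new: idempotent catch-up.
last = applyNewEntries(store, last, oplog);
console.log(store.get(1)); // { _id: 1, name: 'ada', role: 'admin' }
```

Note how replaying the same entries is harmless: the timestamp check is what lets a secondary reconnect and resume exactly where it left off.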
4
Intermediate: Oplog size and rollover behavior
🤔 Before reading on: do you think the oplog grows forever or has a fixed size? Commit to your answer.
Concept: The oplog has a fixed size and works like a circular buffer, overwriting old entries when full.
MongoDB sets the oplog size based on configuration and disk space. When the oplog reaches its size limit, it starts overwriting the oldest entries. This means secondaries must keep up with the primary to avoid missing oplog entries.
Result
The oplog never grows without limit, but lagging secondaries risk falling behind and needing a full resync.
Knowing the oplog size limit helps understand replication lag risks and the importance of monitoring secondaries.
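The circular-buffer behavior is easy to see in a toy model (plain JavaScript, not MongoDB's storage engine): a fixed-capacity log that discards its oldest entry when full.

```javascript
// Toy capped "oplog": fixed size, oldest entry discarded when full,
// mimicking MongoDB's capped oplog collection.
class CappedLog {
  constructor(maxEntries) {
    this.max = maxEntries;
    this.entries = [];
  }
  append(entry) {
    this.entries.push(entry);
    if (this.entries.length > this.max) this.entries.shift(); // overwrite oldest
  }
  oldestTs() {
    return this.entries[0]?.ts;
  }
}

const log = new CappedLog(3);
for (let ts = 1; ts <= 5; ts++) log.append({ ts, op: "i" });

console.log(log.oldestTs()); // 3 -- entries 1 and 2 were overwritten
// A secondary whose last-applied timestamp is below 3 can no longer
// catch up from this log and would need a full resync.
```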
5
Intermediate: Replication states and failover
🤔 Before reading on: do you think secondaries can become primary automatically? Commit to your answer.
Concept: MongoDB replica sets have election processes where secondaries can become primary if the current primary fails.
Replica set members monitor each other’s health with heartbeats. If the primary goes down, the remaining members hold an election to choose a new primary, favoring the secondary with the most recent oplog entries so the new primary has the latest replicated data. This keeps the database available without manual intervention.
Result
Automatic failover maintains database availability and consistency.
Understanding failover shows how replication and the oplog support high availability in production.
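Election behavior can be influenced through member priorities. A sketch in mongosh (priority values are illustrative; run against the current primary):

```javascript
// Members with higher priority are preferred in elections.
cfg = rs.conf()
cfg.members[0].priority = 2   // prefer this member as primary
cfg.members[2].priority = 0   // never eligible to become primary
rs.reconfig(cfg)

// After a failover, rs.status() shows the newly elected primary.
rs.status()
```

A priority-0 member is a common pattern for nodes meant only for reporting or backups, since it can replicate and serve reads but will never win an election.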
6
Advanced: Handling replication lag and rollback
🤔 Before reading on: do you think replication lag can cause data inconsistencies? Commit to your answer.
Concept: Replication lag happens when secondaries fall behind the primary, and rollback can occur if a primary changes after failover.
If a secondary is too slow, it may miss oplog entries that get overwritten. It then must resync fully. Also, if a primary fails and a secondary becomes primary, some writes on the old primary may be rolled back to keep data consistent. Rollback uses the oplog to undo changes.
Result
Replication lag and rollback are normal but must be managed to avoid data loss or downtime.
Knowing these behaviors helps design systems that monitor lag and handle failover safely.
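The lag-versus-window relationship is simple arithmetic, shown here as a toy check (plain JavaScript; function names and thresholds are illustrative, not a MongoDB API):

```javascript
// Replication lag: the gap between the primary's latest oplog time and
// the time of the last entry a secondary has applied.
function lagSeconds(primaryOptimeMs, secondaryOptimeMs) {
  return (primaryOptimeMs - secondaryOptimeMs) / 1000;
}

// If lag exceeds the oplog's time window, entries the secondary still
// needs may be overwritten before it reads them, forcing a full resync.
function atRiskOfResync(lagSec, oplogWindowSec) {
  return lagSec > oplogWindowSec;
}

const primary = Date.parse("2024-01-01T00:10:00Z");
const secondary = Date.parse("2024-01-01T00:09:30Z");

console.log(lagSeconds(primary, secondary)); // 30
console.log(atRiskOfResync(30, 3600));       // false: 30s lag, 1h window
```

In a real deployment the same numbers come from `rs.printSecondaryReplicationInfo()` (per-member lag) and `rs.printReplicationInfo()` (oplog window).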
7
Expert: Oplog internals and optimization
🤔 Before reading on: do you think oplog entries store full documents or just changes? Commit to your answer.
Concept: Oplog entries store operations, not full documents, and MongoDB optimizes oplog usage for performance and storage.
Each oplog entry records the operation type (insert, update, delete), the affected document’s ID, and the changed fields. Updates store only changed fields, not entire documents. MongoDB also compresses oplog data and uses an efficient capped collection to minimize disk usage and maximize replication speed.
Result
Oplog is compact and fast, enabling real-time replication with minimal overhead.
Understanding oplog internals reveals why MongoDB replication is both reliable and efficient at scale.
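A simplified picture of an update entry's shape, written as a plain JavaScript object. This is illustrative only: real entries carry additional fields (term, wall-clock time, version), and since MongoDB 5.0 updates use a `$v: 2` delta format rather than the `$set` body shown here.

```javascript
// Simplified, pre-5.0-style shape of an update oplog entry.
const entry = {
  ts: { t: 1700000000, i: 1 },        // Timestamp: seconds + ordinal, for strict ordering
  op: "u",                            // operation type: i(nsert)/u(pdate)/d(elete)/c(ommand)/n(o-op)
  ns: "shop.orders",                  // namespace: database.collection
  o2: { _id: 42 },                    // which document was updated
  o: { $set: { status: "shipped" } }  // only the changed fields, never the full document
};

console.log(Object.keys(entry.o.$set)); // [ 'status' ]
```

The key point survives across formats: an update entry identifies the document and describes the change, so oplog size scales with write activity, not with document size.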
Under the Hood
The oplog is a capped collection that stores a rolling log of all write operations on the primary. Each entry includes a timestamp, operation type, namespace, and operation details. Secondary servers continuously read new oplog entries in order and apply them to their data sets. This tailing process uses a tailable cursor that waits for new entries. The capped nature ensures fixed size and fast writes. During failover, election protocols ensure the new primary has the latest oplog entries to maintain consistency.
Why designed this way?
MongoDB designed the oplog as a capped collection to balance performance and storage. Using a rolling log avoids unbounded growth and allows efficient sequential reads by secondaries. The operation-based log (not full snapshots) reduces network and disk load. The primary-secondary model with oplog tailing simplifies replication logic and supports automatic failover. Alternatives like full snapshot copying or statement-based replication were less efficient or less reliable for distributed systems.
┌───────────────┐       ┌───────────────┐       ┌───────────────┐
│   Primary     │       │   Oplog       │       │   Secondary   │
│   (writes)    │──────▶│ (capped log)  │──────▶│ (reads oplog) │
└───────────────┘       └───────────────┘       └───────────────┘
        │                      ▲                       │
        │                      │                       │
        │                      │                       │
        │               Oplog tailing cursor           │
        │                                              │
        ▼                                              ▼
  Application                                    Data updated
Myth Busters - 4 Common Misconceptions
Quick: Does the oplog store full copies of documents or just the changes? Commit to your answer.
Common Belief: The oplog stores full copies of every document after each change.
Reality: The oplog stores only the operations performed, such as which fields changed, not full document copies.
Why it matters: Believing the oplog stores full documents leads to overestimating storage needs and misunderstanding replication efficiency.
Quick: Can a secondary server serve writes directly? Commit to your answer.
Common Belief: Secondary servers can accept write operations just like the primary.
Reality: Only the primary accepts writes; secondaries replicate changes from the oplog and serve reads if configured.
Why it matters: Misunderstanding this can cause data conflicts and application errors when trying to write to secondaries.
Quick: Does the oplog grow indefinitely without limit? Commit to your answer.
Common Belief: The oplog keeps growing forever, storing all history.
Reality: The oplog has a fixed size and overwrites old entries in a circular fashion.
Why it matters: Assuming infinite growth can cause neglect of monitoring replication lag and risk data loss if secondaries fall behind.
Quick: Does replication guarantee zero lag between primary and secondaries? Commit to your answer.
Common Belief: Replication is instant and secondaries are always perfectly up-to-date.
Reality: Replication has some delay (lag), and secondaries can be behind the primary temporarily.
Why it matters: Ignoring replication lag can lead to stale reads and unexpected behavior in applications relying on fresh data.
Expert Zone
1
Oplog entries are stored as BSON documents with a precise timestamp (ts) that ensures strict ordering across the replica set.
2
During failover, the member elected as the new primary is the electable secondary with the most recent oplog entries; writes on the old primary that never replicated are rolled back, which is why elections weigh oplog recency and can be briefly delayed.
3
The oplog size and write volume must be balanced carefully; too small oplog or high write rates can cause secondaries to fall behind and require full resync.
When NOT to use
Oplog-based replication is not suitable for multi-master or active-active setups where multiple nodes accept writes simultaneously. For such cases, other databases or conflict-free replicated data types (CRDTs) are better. Also, if extremely low latency synchronous replication is required, MongoDB’s asynchronous oplog replication may not suffice.
Production Patterns
In production, teams monitor oplog size and replication lag closely using monitoring tools. They configure read preferences to distribute read load to secondaries. Backup strategies often rely on secondary nodes to avoid impacting primary performance. Failover and election settings are tuned to balance availability and consistency. Large clusters use sharding combined with replication for scalability.
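Two of these patterns have direct mongosh counterparts. A sketch (collection and filter are illustrative):

```javascript
// Route reads to secondaries when slightly stale data is acceptable.
db.orders.find({ status: "shipped" }).readPref("secondaryPreferred")

// Monitoring helpers commonly checked in production:
rs.printReplicationInfo()            // oplog size and time window
rs.printSecondaryReplicationInfo()   // per-secondary lag behind the primary
```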
Connections
Event Sourcing
Both use a log of changes to reconstruct current state.
Understanding oplog as an event log helps grasp how systems can rebuild data state from a sequence of changes, a pattern common in software design.
Distributed Consensus Algorithms (e.g., Raft, Paxos)
Replication and failover rely on consensus to elect primaries and maintain consistency.
Knowing consensus algorithms clarifies how MongoDB ensures only one primary exists and how it handles failover safely.
Version Control Systems (e.g., Git)
Both track changes incrementally and handle merging or rollback of changes.
Seeing oplog replication like version control helps understand rollback scenarios and conflict resolution in distributed databases.
Common Pitfalls
#1: Trying to write directly to a secondary server.
Wrong approach: db.collection.insertOne({name: 'test'}) // run on a secondary server
Correct approach: Connect to the primary (or use the replica-set connection string, which routes writes there automatically): db.collection.insertOne({name: 'test'})
Root cause: Misunderstanding that only the primary accepts writes; secondaries are read-only replicas.
#2: Setting the oplog size too small for the write volume.
Wrong approach: Starting MongoDB with an oplog size of 100MB on a high-write workload cluster.
Correct approach: Configure oplog size based on expected write volume, e.g., 5GB for heavy write workloads.
Root cause: Underestimating oplog size causes entries to be overwritten before secondaries catch up, forcing a full resync.
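The oplog can also be resized on a running member without a restart. A sketch in mongosh (the size value, in megabytes, is illustrative; run against the member whose oplog you are resizing):

```javascript
// Resize this member's oplog to 16000 MB; takes effect immediately.
db.adminCommand({ replSetResizeOplog: 1, size: 16000 })
```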
#3: Ignoring replication lag in application design.
Wrong approach: Reading from secondaries immediately after writes without considering lag.
Correct approach: Use read preferences carefully or read from the primary when fresh data is required.
Root cause: Not accounting for asynchronous replication delay leads to stale reads and inconsistent application behavior.
Key Takeaways
The oplog is a special, fixed-size log that records every data change on the primary server in MongoDB.
Replication uses the oplog to copy changes efficiently from the primary to secondary servers, keeping data synchronized.
Secondaries apply oplog entries incrementally, which is faster and uses less storage than copying full data repeatedly.
Replication supports automatic failover by electing a new primary, ensuring high availability without manual intervention.
Understanding oplog internals and replication behavior helps design reliable, scalable MongoDB systems and avoid common pitfalls.