Overview - MirrorMaker 2 concept

What is it?

MirrorMaker 2 is a tool that ships with Apache Kafka and copies data between Kafka clusters. It keeps topics synchronized across locations or environments by continuously reading messages from one cluster and writing them to another. This is useful for backup, disaster recovery, and data migration.

Why it matters

Without MirrorMaker 2, keeping data in several Kafka clusters in step would be manual and error-prone, or would require complex custom tooling. MirrorMaker 2 automates that replication, so the same data stays available in different places. This lets businesses avoid data loss, reduce downtime, and support global applications that need data close to users.

Where it fits

Before learning MirrorMaker 2, you should understand basic Kafka concepts like topics, producers, consumers, and clusters. After MirrorMaker 2, you can explore advanced Kafka features like multi-cluster management, Kafka Connect, and disaster recovery strategies.

Mental Model

Core Idea

MirrorMaker 2 acts like a smart courier that continuously copies messages from one Kafka cluster to another to keep them in sync.

Think of it like...

Imagine two offices in different cities that need to have the same files. MirrorMaker 2 is like a courier service that picks up new files from one office and delivers them to the other, making sure both offices always have the latest copies.

┌───────────────┐       ┌───────────────┐       ┌───────────────┐
│ Source Kafka  │──────▶│ MirrorMaker 2 │──────▶│ Target Kafka  │
│   Cluster     │       │ (Courier Tool)│       │   Cluster     │
└───────────────┘       └───────────────┘       └───────────────┘
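To make the courier picture concrete, here is a minimal Python sketch under simplifying assumptions: each toy `Cluster` is just a dictionary of in-memory topic logs, and `courier_pass` is a hypothetical helper (not part of any Kafka API) that copies whatever the source has gained since the last recorded position. Real MirrorMaker 2 also renames topics and translates consumer offsets, which this sketch leaves out.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy stand-in for a Kafka cluster: topic name -> append-only list of records."""
    name: str
    topics: dict[str, list[str]] = field(default_factory=dict)

    def append(self, topic: str, record: str) -> int:
        log = self.topics.setdefault(topic, [])
        log.append(record)
        return len(log) - 1  # offset assigned to the new record


def courier_pass(source: Cluster, target: Cluster, topic: str, position: int) -> int:
    """Copy records at offsets >= position from source to target; return the next position."""
    log = source.topics.get(topic, [])
    for record in log[position:]:
        target.append(topic, record)
    return len(log)


primary, backup = Cluster("primary"), Cluster("backup")
primary.append("orders", "order-1")
primary.append("orders", "order-2")
pos = courier_pass(primary, backup, "orders", 0)    # copies order-1, order-2
primary.append("orders", "order-3")
pos = courier_pass(primary, backup, "orders", pos)  # copies only order-3
print(backup.topics)  # {'orders': ['order-1', 'order-2', 'order-3']}
```

Tracking the position is what lets each pass pick up only new messages; without it, every pass would copy the whole log again.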

Build-Up - 7 Steps

1

Foundation: Basic Kafka Cluster Concepts

Concept: Understanding what a Kafka cluster is and how it stores and streams data.

A Kafka cluster is a group of servers that work together to store and manage streams of messages. Producers send messages to topics in the cluster, and consumers read from these topics. Each cluster is independent and manages its own data.

Result

You know that Kafka clusters hold streams of messages and that producers and consumers interact with these clusters.

Understanding clusters is essential because MirrorMaker 2 moves data between these independent units.

2

Foundation: What is Data Replication in Kafka?

3

Intermediate: MirrorMaker 2 Architecture Overview

4

Intermediate: How MirrorMaker 2 Handles Topic and Offset Translation

5

Intermediate: Configuring MirrorMaker 2 for Multi-Cluster Replication

6

Advanced: Handling Failover and Disaster Recovery with MirrorMaker 2

7

Expert: Internal Offset Sync and Conflict Resolution

Under the Hood

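MirrorMaker 2 runs on Kafka Connect as a set of source connectors; it has no sink connector. MirrorSourceConnector consumes records from the source cluster's topics, and Connect's producer writes them to the target cluster under renamed topics (by default the source cluster alias becomes a prefix, so orders from a cluster aliased primary becomes primary.orders). While copying, it also syncs topic configuration and periodically records offset syncs, pairs of matching source and target offsets, in an internal topic. MirrorCheckpointConnector uses those offset syncs to translate consumer group offsets and writes them as checkpoints to the target cluster, and MirrorHeartbeatConnector emits heartbeats that let you confirm replication is flowing. Because all of this runs as Connect tasks spread across workers, replication is distributed and fault-tolerant, and it can be monitored and controlled like any other Connect workload.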
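The offset translation step can be illustrated with a small Python sketch. It assumes a hypothetical list of offset syncs for a single partition, each an (upstream_offset, downstream_offset) pair, and a hypothetical `translate_offset` helper. The rule it applies is a conservative simplification of what MirrorCheckpointConnector does, not its actual code, and the real algorithm has changed between Kafka versions.

```python
from bisect import bisect_right
from typing import Optional

# One offset sync: (upstream/source offset, downstream/target offset) for the same record.
# Target offsets usually differ from source offsets, for example when replication
# starts after the source topic already held records.
OffsetSync = tuple[int, int]


def translate_offset(syncs: list[OffsetSync], committed_upstream: int) -> Optional[int]:
    """Translate a consumer group's committed source offset into a safe target offset.

    Uses the newest sync at or below the committed offset. An exact match maps
    directly; otherwise resume one past that sync's target offset, which may
    re-read a few records but never skips records the group has not seen.
    """
    ordered = sorted(syncs)
    upstreams = [upstream for upstream, _ in ordered]
    i = bisect_right(upstreams, committed_upstream) - 1
    if i < 0:
        return None  # no sync old enough yet: cannot translate
    upstream, downstream = ordered[i]
    return downstream if upstream == committed_upstream else downstream + 1


syncs = [(100, 0), (150, 50), (220, 120)]
print(translate_offset(syncs, 150))  # 50
print(translate_offset(syncs, 200))  # 51 (may re-read records the group already saw)
print(translate_offset(syncs, 90))   # None
```

Rounding toward re-reading rather than skipping is the safe direction for failover: duplicates are recoverable, silently skipped records are not.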

Why designed this way?

MirrorMaker 2 was built on Kafka Connect to reuse its scalable, pluggable architecture. The original MirrorMaker was essentially a consumer and a producer wired together: it copied records but did not rename topics, sync topic configurations, or translate consumer offsets, which made failover fragile. Kafka Connect gives MirrorMaker 2 distributed workers and task rebalancing, and on top of that it adds offset translation and topic renaming. This design balances performance, reliability, and ease of configuration, addressing the limitations of the original MirrorMaker.

┌───────────────┐       ┌──────────────────────┐       ┌───────────────┐
│ Source Kafka  │──────▶│ Kafka Connect with   │──────▶│ Target Kafka  │
│   Cluster     │       │ MirrorMaker 2        │       │   Cluster     │
│ (Topics, Data)│       │ (Source, Checkpoint, │       │ (Topics, Data)│
└───────────────┘       │  and Heartbeat       │       └───────────────┘
                        │  Connectors, Offset  │
                        │  Syncs, Metadata)    │
                        └──────────────────────┘

Myth Busters - 4 Common Misconceptions

Quick: Does MirrorMaker 2 automatically failover clients to the target cluster? Commit yes or no.

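Common Belief: MirrorMaker 2 automatically switches consumers and producers to the backup cluster if the source fails.

Reality: No. MirrorMaker 2 replicates data and consumer offsets, but switching clients to the target cluster is handled by your own orchestration or client configuration.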

Quick: Does MirrorMaker 2 guarantee zero message duplication? Commit yes or no.

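Common Belief: MirrorMaker 2 guarantees no duplicate messages in the target cluster.

Reality: No. By default MirrorMaker 2 delivers at-least-once, so after a failure or restart it can copy records that were already written, leaving duplicates in the target. Consumers should be designed to tolerate them.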
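Why at-least-once replication produces duplicates can be shown with a toy Python simulation; `replicate_with_crash` and `dedupe` are hypothetical helpers, not MirrorMaker 2 code. The copier writes records but only periodically records how far it got, so after a crash it resumes from the last recorded position and writes some records again. The sketch assumes each record is its own unique ID, which is what lets `dedupe` drop the repeats on the consumer side.

```python
def replicate_with_crash(source: list[str], commit_every: int, crash_after: int) -> list[str]:
    """Copy source into a new target list, committing progress every `commit_every` records.

    After writing `crash_after` records the copier "crashes" and restarts from its
    last committed position, so records written since that commit are copied twice.
    """
    target: list[str] = []
    committed = 0  # source position last committed durably
    for pos, record in enumerate(source):
        if pos == crash_after:
            # Crash and restart: progress since `committed` was never recorded.
            target.extend(source[committed:])
            return target
        target.append(record)
        if (pos + 1) % commit_every == 0:
            committed = pos + 1
    return target


def dedupe(records: list[str]) -> list[str]:
    """Keep the first occurrence of each record ID, preserving order."""
    seen: set[str] = set()
    unique: list[str] = []
    for record in records:
        if record not in seen:
            seen.add(record)
            unique.append(record)
    return unique


copied = replicate_with_crash(["a", "b", "c", "d", "e"], commit_every=2, crash_after=3)
print(copied)          # ['a', 'b', 'c', 'c', 'd', 'e']
print(dedupe(copied))  # ['a', 'b', 'c', 'd', 'e']
```

Committing more often shrinks the window that gets replayed but never closes it, which is why duplicate tolerance belongs in the consumers.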

Quick: Can MirrorMaker 2 replicate data between different Kafka versions without issues? Commit yes or no.

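Common Belief: MirrorMaker 2 works seamlessly between any Kafka cluster versions.

Reality: Not guaranteed. Kafka clients and brokers are compatible across many versions, but the features MirrorMaker 2 uses depend on what both clusters support, so test the exact version pair before relying on it.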

Quick: Is MirrorMaker 2 just a simple message copier? Commit yes or no.

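Common Belief: MirrorMaker 2 just copies messages from one cluster to another without extra processing.

Reality: No. Besides copying records, it renames remote topics, syncs topic configurations, translates consumer group offsets through checkpoints, and emits heartbeats.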

Expert Zone

1

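MirrorMaker 2's internal offset sync topic is what makes consumer offset translation possible; if offset syncs or checkpoints fall behind, translated offsets lag too, and consumers that fail over re-read more messages than necessary.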

2

Topic renaming is powerful but can cause consumer confusion if not coordinated with client applications.

3

Running MirrorMaker 2 in distributed mode improves fault tolerance but requires careful resource planning to avoid bottlenecks.
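A minimal Python sketch of the default naming scheme, assuming DefaultReplicationPolicy's default "." separator: a topic copied from the cluster aliased primary is named primary.&lt;topic&gt; on the target, which is why consumers there must subscribe to the new name. The alias-peeling check in `should_replicate` is a simplified, hypothetical version of how MirrorMaker 2 avoids sending a topic back to a cluster it came from; the real policy parses names differently.

```python
SEPARATOR = "."  # default separator used by DefaultReplicationPolicy


def remote_topic(source_alias: str, topic: str) -> str:
    """Name a replicated topic on the target, e.g. ('primary', 'orders') -> 'primary.orders'."""
    return f"{source_alias}{SEPARATOR}{topic}"


def origin_chain(topic: str, known_aliases: set[str]) -> list[str]:
    """Peel known cluster-alias prefixes off a topic name, outermost first."""
    chain: list[str] = []
    head, sep, rest = topic.partition(SEPARATOR)
    while sep and head in known_aliases:
        chain.append(head)
        head, sep, rest = rest.partition(SEPARATOR)
    return chain


def should_replicate(topic: str, target_alias: str, known_aliases: set[str]) -> bool:
    """Refuse to copy a topic back to a cluster it already passed through (avoids loops)."""
    return target_alias not in origin_chain(topic, known_aliases)


aliases = {"primary", "backup"}
print(remote_topic("primary", "orders"))  # primary.orders
# Checking topics on backup for a backup -> primary flow:
print(should_replicate("orders", "primary", aliases))          # True: local to backup
print(should_replicate("primary.orders", "primary", aliases))  # False: came from primary
```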

When NOT to use

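MirrorMaker 2 replicates asynchronously, so it is not suitable when every write must reach another site before it is acknowledged (synchronous replication with no data loss on failover). Such requirements are usually met with a single Kafka cluster stretched across sites, with replicas in each site and acks=all, rather than with cross-cluster replication; Cluster Linking, sometimes suggested here, is a Confluent feature rather than part of Apache Kafka and is also asynchronous. For simple one-time migrations, manual export/import may be simpler.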

Production Patterns

In production, MirrorMaker 2 is often used for disaster recovery setups, geo-replication to bring data closer to users, and migration between Kafka versions or cloud providers. It is integrated with monitoring tools to track replication lag and failures, and combined with orchestration systems to automate failover.
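Replication lag is the number of records written to a source partition that have not yet reached the target, and because MirrorMaker 2 replicates asynchronously it is also roughly how much data a failover would leave behind. A minimal Python sketch of the arithmetic, assuming hypothetical inputs: the source log-end offset per partition and the next source offset the replicator will copy. In practice these numbers would come from your monitoring stack or MirrorMaker 2's own metrics.

```python
def replication_lag(source_end_offsets: dict[int, int],
                    replicated_positions: dict[int, int]) -> dict[int, int]:
    """Records per partition that exist on the source but have not been copied yet."""
    return {
        partition: max(end - replicated_positions.get(partition, 0), 0)
        for partition, end in source_end_offsets.items()
    }


def lagging_partitions(lag: dict[int, int], threshold: int) -> list[int]:
    """Partitions whose lag exceeds an alerting threshold, in ascending order."""
    return sorted(p for p, records in lag.items() if records > threshold)


end_offsets = {0: 1200, 1: 980, 2: 1500}
positions = {0: 1200, 1: 900, 2: 1000}
lag = replication_lag(end_offsets, positions)
print(lag)                           # {0: 0, 1: 80, 2: 500}
print(lagging_partitions(lag, 100))  # [2]
```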

Connections

Kafka Connect

MirrorMaker 2 is built on top of the Kafka Connect framework.

Understanding Kafka Connect's connector model clarifies how MirrorMaker 2 achieves scalable and pluggable replication.

Disaster Recovery Planning

MirrorMaker 2 supports disaster recovery by replicating data across clusters.

Knowing disaster recovery principles helps design effective replication and failover strategies using MirrorMaker 2.

Supply Chain Logistics

Both involve moving goods or data reliably between locations with tracking and error handling.

Recognizing replication as a logistics problem highlights the importance of tracking, ordering, and conflict resolution.

Common Pitfalls

#1 Assuming MirrorMaker 2 handles client failover automatically.

Wrong approach: Start MirrorMaker 2 and expect consumers to switch clusters without configuration.

Correct approach: Use external orchestration or client configuration to handle failover alongside MirrorMaker 2 replication.

Root cause: Misunderstanding MirrorMaker 2's role as a data replicator only, not a failover manager.
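Because failover lives outside MirrorMaker 2, the switching decision belongs to your own tooling. The Python sketch below shows the shape of that decision under stated assumptions: `plan_consumer` and `ConsumerPlan` are hypothetical, the hostnames are placeholders, the health flag and translated offset are assumed to come from elsewhere (for example, from MirrorMaker 2 checkpoints via the Java `RemoteClusterUtils.translateOffsets` helper), and the remote topic name assumes DefaultReplicationPolicy's alias-prefix format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ConsumerPlan:
    bootstrap_servers: str
    topic: str
    start_offset: Optional[int]  # None: rely on the group's committed offset / auto.offset.reset


def plan_consumer(primary_healthy: bool, topic: str, translated_offset: Optional[int],
                  primary_servers: str, backup_servers: str,
                  primary_alias: str = "primary") -> ConsumerPlan:
    """Decide where a consumer should read from (hypothetical orchestration logic)."""
    if primary_healthy:
        # Normal operation: read the original topic on the primary cluster.
        return ConsumerPlan(primary_servers, topic, None)
    # Failover: read the replicated copy on the backup cluster, starting at the
    # offset translated from the group's last checkpoint (None if there is none yet).
    return ConsumerPlan(backup_servers, f"{primary_alias}.{topic}", translated_offset)


normal = plan_consumer(True, "orders", None, "primary-kafka:9092", "backup-kafka:9092")
failover = plan_consumer(False, "orders", 51, "primary-kafka:9092", "backup-kafka:9092")
print(failover)  # ConsumerPlan(bootstrap_servers='backup-kafka:9092', topic='primary.orders', start_offset=51)
```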

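#2 Turning off topic renaming when source and target clusters share topic names.

Wrong approach: "topics": ".*" with "replication.policy.class": "org.apache.kafka.connect.mirror.IdentityReplicationPolicy", so replicated records land in same-named topics that already exist on the target.

Correct approach: "topics": ".*" with the default DefaultReplicationPolicy, which prefixes each remote topic with the source cluster alias (orders from primary becomes primary.orders). MirrorMaker 2 has no "topic.rename.format" setting.

Root cause: Overlooking topic name conflicts mixes replicated records into unrelated local topics and, in bidirectional setups, can send records back and forth between clusters.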
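For dedicated mode (started with bin/connect-mirror-maker.sh), the configuration is a properties file. The Python sketch below renders one that follows the correct approach; `render_mm2_properties` is a hypothetical helper and the aliases and hostnames are placeholders, while clusters, &lt;alias&gt;.bootstrap.servers, &lt;source&gt;-&gt;&lt;target&gt;.enabled, &lt;source&gt;-&gt;&lt;target&gt;.topics, and replication.policy.class are MirrorMaker 2 property names.

```python
DEFAULT_POLICY = "org.apache.kafka.connect.mirror.DefaultReplicationPolicy"


def render_mm2_properties(clusters: dict[str, str], flows: list[tuple[str, str]],
                          topics: str = ".*") -> str:
    """Render an mm2.properties file.

    clusters maps alias -> bootstrap servers; flows lists (source_alias, target_alias) pairs.
    """
    lines = [f"clusters = {', '.join(clusters)}"]
    for alias, servers in clusters.items():
        lines.append(f"{alias}.bootstrap.servers = {servers}")
    for source, target in flows:
        if source not in clusters or target not in clusters:
            raise ValueError(f"unknown cluster alias in flow {source}->{target}")
        lines.append(f"{source}->{target}.enabled = true")
        lines.append(f"{source}->{target}.topics = {topics}")  # a regex, not a glob
    # Already the default; stated for clarity. Remote topics become
    # "<source alias>.<topic>", so they do not collide with local same-named topics.
    lines.append(f"replication.policy.class = {DEFAULT_POLICY}")
    return "\n".join(lines) + "\n"


print(render_mm2_properties(
    {"primary": "primary-kafka:9092", "backup": "backup-kafka:9092"},
    [("primary", "backup"), ("backup", "primary")],
))
```

With both flows enabled, orders on primary appears as primary.orders on backup and orders on backup appears as backup.orders on primary, so neither overwrites the other's local topic.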

#3 Ignoring Kafka version compatibility between clusters.

Wrong approach: Replicating between Kafka 2.2 and Kafka 3.4 clusters without testing.

Correct approach: Test replication between the exact versions you run before relying on it, and upgrade clusters where a feature MirrorMaker 2 needs is missing.

Root cause: Assuming any two Kafka versions behave identically; differences in supported features can surface as replication failures.

Key Takeaways

MirrorMaker 2 is a Kafka Connect-based tool that replicates data between Kafka clusters reliably and flexibly.

It manages not just message copying but also offset syncing and topic renaming to maintain data consistency.

MirrorMaker 2 supports disaster recovery and geo-replication but does not handle automatic client failover.

Proper configuration and monitoring are essential to avoid common pitfalls like topic conflicts and version mismatches.

Understanding MirrorMaker 2's internals and limits helps design robust multi-cluster Kafka architectures.