Geo-replication strategies in Kafka - Commands & Configuration

Introduction
Geo-replication copies data between Kafka clusters in different locations, keeping it available and consistent across sites even if one site fails or is slow. It is useful in the following situations:
When you want to keep Kafka data available across multiple data centers for disaster recovery.
When users in different regions need fast access to local Kafka data copies.
When you want to balance load by replicating topics to clusters closer to users.
When you need to migrate Kafka data from one cluster to another without downtime.
When you want to ensure data durability by having copies in separate geographic locations.
Config File - replicator.properties
replicator.properties
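clusters=clusterA,clusterB
clusterA.bootstrap.servers=clusterA.kafka.local:9092
clusterB.bootstrap.servers=clusterB.kafka.local:9092
clusterA->clusterB.enabled=true
clusterA->clusterB.topics=important-topic,logs
replication.policy.class=org.apache.kafka.connect.mirror.IdentityReplicationPolicy
replication.factor=3
emit.heartbeats.enabled=true
emit.heartbeats.interval.seconds=30
sync.topic.acls.enabled=true

This configuration file sets up MirrorMaker 2 for geo-replication between two Kafka clusters.

clusters: Declares the aliases of the clusters involved, here clusterA and clusterB.

clusterA.bootstrap.servers and clusterB.bootstrap.servers: Broker addresses for each cluster, keyed by its alias.

clusterA->clusterB.enabled: Turns on the replication flow. The alias before the arrow is the source cluster and the alias after it is the target; MirrorMaker 2 derives source.cluster.alias and target.cluster.alias from this flow.

clusterA->clusterB.topics: Lists the topics to replicate in that flow.

replication.policy.class: Controls how mirrored topics are named. IdentityReplicationPolicy (Kafka 3.0 and later) keeps the source topic names; the default, DefaultReplicationPolicy, prefixes them with the source alias, for example clusterA.important-topic.

replication.factor: Number of replicas for mirrored topics in the target cluster.

emit.heartbeats.enabled and emit.heartbeats.interval.seconds: Enable heartbeat messages and set how often they are emitted, so replication health can be monitored.

sync.topic.acls.enabled: Syncs topic access control lists from the source cluster to the target cluster.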
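A mistyped property name is easy to miss, so it can help to check the file for the keys you expect before starting MirrorMaker 2. The following is a minimal sketch, not part of Kafka: missing_keys is a hypothetical helper, the sample file is a throwaway created only for the demonstration, and the check assumes the key=value form without spaces around the equals sign.

```shell
# missing_keys FILE KEY...
# Prints every KEY that has no "KEY=value" line in FILE (hypothetical helper).
missing_keys() {
  mk_file=$1
  shift
  for mk_key in "$@"; do
    # index(...) == 1 is true only when the line starts with "KEY="
    awk -v k="$mk_key" 'index($0, k "=") == 1 { found = 1 } END { exit !found }' "$mk_file" \
      || printf '%s\n' "$mk_key"
  done
}

# Throwaway sample file that deliberately omits replication.policy.class
sample_file=$(mktemp)
cat > "$sample_file" <<'EOF'
replication.factor=3
emit.heartbeats.enabled=true
sync.topic.acls.enabled=true
EOF

missing=$(missing_keys "$sample_file" replication.factor replication.policy.class sync.topic.acls.enabled)
echo "missing: ${missing:-none}"   # prints "missing: replication.policy.class"
rm -f "$sample_file"
```

In practice you would point missing_keys at replicator.properties and pass the property names your deployment relies on.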

Commands
Starts the MirrorMaker 2 tool using the configuration file to begin replicating topics from the source to the target Kafka cluster.
Terminal
connect-mirror-maker replicator.properties
Expected Output
INFO Starting MirrorMaker 2
INFO Connecting to source cluster clusterA.kafka.local:9092
INFO Connecting to target cluster clusterB.kafka.local:9092
INFO Replication started for topics: important-topic, logs
INFO Heartbeats enabled every 30 seconds
Lists all topics in the target cluster to verify that the topics have been replicated successfully.
Terminal
kafka-topics --bootstrap-server clusterB.kafka.local:9092 --list
Expected Output
important-topic
logs
--bootstrap-server - Specifies the Kafka cluster to connect to.
--list - Lists all topics in the cluster.
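The verification step above can be scripted by comparing the listing against the topics you expect. This is a rough sketch under stated assumptions: missing_topics is a hypothetical helper, the listing is hard-coded sample text rather than live output, and audit is a made-up topic name used only to show a miss.

```shell
# missing_topics TOPIC...
# Reads a topic listing (one name per line) on stdin and prints every
# expected TOPIC that is absent from it (hypothetical helper).
missing_topics() {
  mt_listing=$(cat)
  for mt_topic in "$@"; do
    # -x matches whole lines only, -F treats the topic name as a literal string
    printf '%s\n' "$mt_listing" | grep -qxF -- "$mt_topic" || printf '%s\n' "$mt_topic"
  done
}

# Sample listing standing in for the output of kafka-topics --list
sample_listing='important-topic
logs'

not_replicated=$(printf '%s\n' "$sample_listing" | missing_topics important-topic logs audit)
echo "not yet replicated: ${not_replicated:-none}"   # prints "not yet replicated: audit"
```

Against a real cluster, the same function could read from the command itself: kafka-topics --bootstrap-server clusterB.kafka.local:9092 --list | missing_topics important-topic logs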
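Describes a consumer group on the target cluster to check replication lag and health. The group name mirror_maker_2 is illustrative: MirrorMaker 2 tracks its own read position through Kafka Connect offset storage rather than through a consumer group, so substitute a group that exists in your deployment.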
Terminal
kafka-consumer-groups --bootstrap-server clusterB.kafka.local:9092 --describe --group mirror_maker_2
Expected Output
GROUP           TOPIC            PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG  CONSUMER-ID  HOST  CLIENT-ID
mirror_maker_2  important-topic  0          1500            1500            0    consumer-1   /0    mm2-client
mirror_maker_2  logs             0          3000            3000            0    consumer-1   /0    mm2-client
--describe - Shows detailed information about the consumer group.
--group - Specifies the consumer group to describe.
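To turn the describe output into a single number you can alert on, the LAG column can be summed across partitions. The sketch below assumes the column layout shown above; total_lag is a hypothetical helper, and the sample rows are invented (one row is given a lag of 100 so the sum is non-zero).

```shell
# total_lag
# Reads kafka-consumer-groups --describe output on stdin and prints the
# sum of the LAG column (hypothetical helper).
total_lag() {
  awk '
    # Until the header row is seen, look for the column named LAG
    lag_col == 0 { for (i = 1; i <= NF; i++) if ($i == "LAG") lag_col = i; next }
    # Data rows: add numeric lag values and skip placeholders such as "-"
    $lag_col ~ /^[0-9]+$/ { sum += $lag_col }
    END { print sum + 0 }
  '
}

# Invented sample output; the second row lags by 100 records
sample_output='GROUP TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID
mirror_maker_2 important-topic 0 1500 1500 0 consumer-1 /0 mm2-client
mirror_maker_2 logs 0 2900 3000 100 consumer-1 /0 mm2-client'

lag=$(printf '%s\n' "$sample_output" | total_lag)
echo "total lag: $lag"   # prints "total lag: 100"
```

Locating the column by its header name rather than by position keeps the sketch working if the tool adds or reorders columns.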
Key Concept

If you remember nothing else about geo-replication, remember: MirrorMaker 2 copies topics between Kafka clusters to keep data consistent and available across locations.

Common Mistakes
Not specifying the correct source and target cluster aliases in the configuration.
MirrorMaker 2 won't know which clusters to replicate between, so replication fails or runs in the wrong direction.
Check that the source and target aliases are spelled identically everywhere they appear in the configuration and that each alias points at the intended Kafka cluster.
Forgetting to include the topics list or using an incorrect topic name.
No topics or wrong topics will be replicated, so data won't appear in the target cluster.
Specify the exact topic names you want to replicate in the topics property.
Not monitoring the consumer group lag for MirrorMaker 2.
Replication might be delayed or stuck without your knowledge, risking data loss or inconsistency.
Regularly check the MirrorMaker 2 consumer group lag using the kafka-consumer-groups command.
Summary
Create a MirrorMaker 2 configuration file to define source and target Kafka clusters and topics to replicate.
Start MirrorMaker 2 with the configuration to begin geo-replication.
Verify replicated topics exist on the target cluster and monitor replication health using consumer group status.