
Why Multi-Datacenter Deployment Ensures Availability in Kafka

Introduction
When a system runs in multiple data centers, it keeps working even if one data center fails. This setup helps avoid downtime and keeps your services available to users. Use this pattern:

- When you want your messaging system to keep working even if one data center loses power or network.
- When you have users in different regions and want faster access by placing data centers closer to them.
- When you want to protect your data from disasters like fires or floods in one location.
- When you need to balance load across multiple locations to avoid overloading a single data center.
Commands
This command creates a Kafka topic named 'example-topic' with 3 partitions and a replication factor of 3. Assuming the three brokers are located in different data centers (ideally with rack awareness configured via broker.rack so Kafka spreads replicas across locations), every partition's data is copied to multiple places.
Terminal
kafka-topics.sh --create --topic example-topic --partitions 3 --replication-factor 3 --bootstrap-server kafka1.example.com:9092,kafka2.example.com:9092,kafka3.example.com:9092
Expected Output
Created topic example-topic.
--partitions - Sets how many parts the topic is split into for parallel processing.
--replication-factor - Sets how many copies of each partition exist across brokers for fault tolerance.
--bootstrap-server - Specifies the Kafka brokers to connect to for topic creation.
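The way Kafka spreads a partition's replicas over the broker list can be pictured as a round-robin rotation. The sketch below is a simplified model of that assignment (real Kafka adds a randomized start and rack awareness); broker IDs 1-3 stand in for brokers in three data centers:

```python
# Simplified sketch of round-robin replica assignment, as a mental model
# for how a replication factor of 3 spreads copies across 3 brokers.
# Real Kafka adds a randomized starting offset and rack-aware placement.

def assign_replicas(brokers, num_partitions, replication_factor):
    """Partition p's replicas start at broker index p and wrap around."""
    n = len(brokers)
    return {
        p: [brokers[(p + r) % n] for r in range(replication_factor)]
        for p in range(num_partitions)
    }

print(assign_replicas([1, 2, 3], num_partitions=3, replication_factor=3))
# → {0: [1, 2, 3], 1: [2, 3, 1], 2: [3, 1, 2]}
```

Each broker ends up holding one replica of every partition, so no single broker (or data center) is a single point of failure.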
This command shows details about the topic, including partition leaders and replicas, so you can verify data is replicated across data centers.
Terminal
kafka-topics.sh --describe --topic example-topic --bootstrap-server kafka1.example.com:9092
Expected Output
Topic: example-topic  PartitionCount: 3  ReplicationFactor: 3  Configs:
    Topic: example-topic  Partition: 0  Leader: 1  Replicas: 1,2,3  Isr: 1,2,3
    Topic: example-topic  Partition: 1  Leader: 2  Replicas: 2,3,1  Isr: 2,3,1
    Topic: example-topic  Partition: 2  Leader: 3  Replicas: 3,1,2  Isr: 3,1,2
--describe - Shows detailed information about the topic.
--topic - Specifies which topic to describe.
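The replica layout in the output above is what makes failover possible: if any one broker goes down, every partition still has surviving in-sync replicas, and Kafka elects a new leader from them. A quick sanity check of that layout:

```python
# Check the replica layout from the --describe output: removing any one
# broker still leaves every partition with at least one surviving replica,
# so Kafka can elect a new leader and the topic stays available.

replicas = {0: [1, 2, 3], 1: [2, 3, 1], 2: [3, 1, 2]}

def survivors(replicas, failed_broker):
    """Replicas remaining for each partition after one broker fails."""
    return {p: [b for b in brokers if b != failed_broker]
            for p, brokers in replicas.items()}

after_failure = survivors(replicas, failed_broker=1)
print(after_failure)  # every partition keeps 2 of its 3 replicas
assert all(len(b) >= 1 for b in after_failure.values())
```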
This command starts a producer to send messages to the topic, demonstrating that data can be written to the multi-datacenter setup.
Terminal
kafka-console-producer.sh --topic example-topic --bootstrap-server kafka1.example.com:9092
Expected Output
No immediate output; the producer waits for input (showing a '>' prompt). Type a message such as 'test-message' and press Enter to send it.
--topic - Specifies the topic to send messages to.
--bootstrap-server - Specifies the Kafka broker to connect to.
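For multi-datacenter durability, producers are usually configured with acks=all, so the leader only acknowledges a write once the in-sync replicas have it. A sketch of an illustrative config file (the filename producer.properties is an assumption, not something from the commands above):

```properties
# producer.properties (illustrative): durability settings for a
# multi-datacenter topic. acks=all makes the leader wait until the
# in-sync replicas have each record before acknowledging it, and
# listing brokers from several data centers avoids a single point
# of failure when the client first connects.
bootstrap.servers=kafka1.example.com:9092,kafka2.example.com:9092,kafka3.example.com:9092
acks=all
retries=5
```

The console producer can pick this up with its --producer.config option, e.g. `kafka-console-producer.sh --topic example-topic --producer.config producer.properties --bootstrap-server kafka1.example.com:9092`.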
This command reads messages from the topic starting from the beginning, showing that data is available from another data center's broker.
Terminal
kafka-console-consumer.sh --topic example-topic --bootstrap-server kafka2.example.com:9092 --from-beginning --max-messages 1
Expected Output
test-message
--from-beginning - Reads messages from the start of the topic.
--max-messages - Stops after reading the specified number of messages.
Key Concept

If you remember nothing else from this pattern, remember: replicating data across multiple data centers keeps your system running even if one location fails.

Common Mistakes
Setting replication-factor lower than the number of data centers.
This leaves at least one data center without a copy, so a failure elsewhere can make data unavailable or lose it entirely.
Set the replication factor to at least the number of data centers (but no higher than the number of available brokers) so every location holds a copy.
Using only one bootstrap server from a single data center.
If that server or data center goes down, clients cannot connect to Kafka.
List bootstrap servers from multiple data centers to allow clients to connect even if one data center is down.
Not verifying topic replication status after creation.
You might assume data is replicated when it is not, putting availability at risk.
Always run 'kafka-topics.sh --describe' to confirm replication and leader assignments.
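As a back-of-the-envelope check for the mistakes above: with a replication factor RF and min.insync.replicas set to M, writes using acks=all can survive up to RF - M broker (or data center) failures. A small sketch of that arithmetic:

```python
# How many broker/data-center failures a topic's writes can tolerate:
# with replication factor RF and min.insync.replicas = M, acks=all
# writes keep succeeding as long as at least M replicas are in sync,
# i.e. up to RF - M failures.

def tolerated_failures(replication_factor, min_insync_replicas):
    if min_insync_replicas > replication_factor:
        raise ValueError("min.insync.replicas cannot exceed the replication factor")
    return replication_factor - min_insync_replicas

print(tolerated_failures(3, 2))  # → 1: one data center can fail with no write outage
```

This is why a replication factor of 3 with min.insync.replicas=2 is a common multi-datacenter baseline: it tolerates the loss of one location without losing acknowledged data or write availability.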
Summary
Create Kafka topics with replication across multiple data centers to keep data safe and available.
Check topic details to confirm data is properly replicated and leaders are assigned.
Use producers and consumers connected to different data centers to verify availability and data access.