
Disaster recovery planning in Kafka - Commands & Configuration

Introduction
Disaster recovery planning helps you prepare for unexpected failures in your Kafka system. It ensures your data and services can be restored quickly after problems like hardware failure or data corruption.
Use disaster recovery planning when you want to:

  • Protect your Kafka data from accidental deletion or corruption.
  • Recover Kafka topics and messages after a server crash.
  • Replicate Kafka data to another data center for backup.
  • Automate restoring Kafka configurations and topics after a failure.
  • Minimize downtime and data loss during Kafka outages.
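For the cross-data-center scenario, Kafka ships with MirrorMaker 2 (started via connect-mirror-maker.sh). A minimal mm2.properties sketch, assuming two clusters with the hypothetical hostnames primary-kafka and backup-kafka:

```properties
# Cluster aliases (arbitrary labels) and their bootstrap addresses
clusters = primary, backup
primary.bootstrap.servers = primary-kafka:9092
backup.bootstrap.servers = backup-kafka:9092

# Mirror all topics from the primary data center to the backup
primary->backup.enabled = true
primary->backup.topics = .*

# Keep the same fault-tolerance guarantees on mirrored topics
replication.factor = 3
```

Run it with `connect-mirror-maker.sh mm2.properties`; mirrored topics appear on the backup cluster prefixed with the source alias (e.g. `primary.disaster-recovery-test`).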
Config File - server.properties
broker.id=1
log.dirs=/var/lib/kafka-logs
zookeeper.connect=localhost:2181
# Enable topic auto-creation
auto.create.topics.enable=true
# Enable log retention for disaster recovery
log.retention.hours=168
# Enable replication for fault tolerance
default.replication.factor=3
min.insync.replicas=2
# Enable log cleaner for compacted topics
log.cleaner.enable=true
# Configure listeners
listeners=PLAINTEXT://:9092
# Configure inter-broker communication
inter.broker.listener.name=PLAINTEXT

This configuration file sets up a Kafka broker with key settings for disaster recovery:

  • log.retention.hours=168: Keeps logs for 7 days to allow recovery from recent data loss.
  • default.replication.factor=3: Ensures each topic partition is copied to 3 brokers for fault tolerance.
  • min.insync.replicas=2: Requires at least 2 replicas to acknowledge writes, preventing data loss.
  • log.cleaner.enable=true: Enables log compaction for topics where only the latest value per key is kept, useful for recovery.
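Broker-side replication only prevents data loss if producers also request strong acknowledgements. A minimal producer.properties sketch (these are standard Kafka producer settings; values are illustrative):

```properties
bootstrap.servers=localhost:9092
# Wait for all in-sync replicas (at least min.insync.replicas=2) to acknowledge each write
acks=all
# Retry transient failures without introducing duplicate messages
enable.idempotence=true
retries=2147483647
```

With `acks=all` and `min.insync.replicas=2`, a write succeeds only after it exists on at least two brokers, so a single broker failure cannot lose acknowledged data.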
Commands
This command creates a Kafka topic named 'disaster-recovery-test' with 3 partitions and replication factor 3 to ensure data is copied across brokers for recovery.
Terminal
kafka-topics.sh --create --topic disaster-recovery-test --bootstrap-server localhost:9092 --partitions 3 --replication-factor 3
Expected Output
Created topic disaster-recovery-test.
--partitions - Number of partitions for parallelism and fault tolerance
--replication-factor - Number of copies of each partition for data safety
This command starts a producer to send messages to the 'disaster-recovery-test' topic to simulate data input.
Terminal
kafka-console-producer.sh --topic disaster-recovery-test --bootstrap-server localhost:9092
Expected Output
A > prompt appears; each line you type and submit with Enter is sent as a message.
--topic - Specifies the topic to send messages to
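For a repeatable disaster recovery drill, the interactive producer can also be fed non-interactively by piping messages into it. A sketch, assuming a broker on localhost:9092 and kafka-console-producer.sh on the PATH:

```shell
# Send five test messages in one shot instead of typing them at the > prompt
# (assumes a running broker at localhost:9092; adjust for your install).
printf 'message1\nmessage2\nmessage3\nmessage4\nmessage5\n' |
  kafka-console-producer.sh --topic disaster-recovery-test --bootstrap-server localhost:9092
```

This makes the "produce, then verify with a consumer" check easy to script as part of a recovery runbook.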
This command reads the first 5 messages from the 'disaster-recovery-test' topic to verify data is stored and replicated correctly.
Terminal
kafka-console-consumer.sh --topic disaster-recovery-test --bootstrap-server localhost:9092 --from-beginning --max-messages 5
Expected Output
message1
message2
message3
message4
message5
Processed a total of 5 messages
--from-beginning - Reads messages from the start of the topic
--max-messages - Limits the number of messages to read
This command starts a partition reassignment to move partitions to different brokers as part of recovery or maintenance.
Terminal
kafka-reassign-partitions.sh --bootstrap-server localhost:9092 --reassignment-json-file reassignment.json --execute
Expected Output
Successfully started partition reassignments for disaster-recovery-test-0,disaster-recovery-test-1,disaster-recovery-test-2
--execute - Executes the reassignment described in the JSON file
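The reassignment.json file this command reads follows a fixed JSON format: a version number plus a list of partitions with their target replica lists. A sketch for the three partitions created earlier (broker IDs 1-3 are assumptions; use your own broker IDs):

```shell
# Write a reassignment plan placing each partition on brokers 1, 2, and 3,
# rotating which broker leads each partition.
cat > reassignment.json <<'EOF'
{
  "version": 1,
  "partitions": [
    {"topic": "disaster-recovery-test", "partition": 0, "replicas": [1, 2, 3]},
    {"topic": "disaster-recovery-test", "partition": 1, "replicas": [2, 3, 1]},
    {"topic": "disaster-recovery-test", "partition": 2, "replicas": [3, 1, 2]}
  ]
}
EOF
```

The first broker in each replicas list is the preferred leader, so rotating the order spreads leadership across the cluster.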
This command verifies the status of the partition reassignment to ensure it completed successfully.
Terminal
kafka-reassign-partitions.sh --bootstrap-server localhost:9092 --reassignment-json-file reassignment.json --verify
Expected Output
Reassignment of partition disaster-recovery-test-0 is complete.
Reassignment of partition disaster-recovery-test-1 is complete.
Reassignment of partition disaster-recovery-test-2 is complete.
--verify - Checks the progress and completion of reassignment
Key Concept

If you remember nothing else from disaster recovery planning in Kafka, remember: replication and log retention settings are your first line of defense against data loss.

Common Mistakes
Setting replication factor to 1 for topics
This means there is only one copy of data, so if that broker fails, data is lost.
Always set replication factor to at least 2 or 3 for important topics to ensure copies exist.
Not verifying partition reassignment status
You might think reassignment finished but it could be stuck, causing uneven load or data loss risk.
Always run the verify command after reassignment to confirm success.
Not enabling log retention or setting it too low
Kafka deletes old data too soon, making recovery of recent messages impossible.
Set log retention to a reasonable period (e.g., 7 days) to allow a recovery window.
Summary
Create topics with replication factor 3 to keep multiple copies of data.
Use producers and consumers to test data flow and verify replication.
Use partition reassignment commands to move data safely during recovery.
Verify reassignment completion to ensure cluster health.
Configure log retention and replication in server.properties for disaster readiness.