In Kafka, the replication factor is a key configuration. What does it control?
Think about data safety and availability in Kafka.
The replication factor sets how many copies (replicas) of each partition exist, each on a different broker. If a broker fails, another replica can take over as leader, so the data stays available.
Given the following Kafka topic creation command, what will be the replication factor of the topic?
kafka-topics.sh --create --topic test-topic --partitions 2 --replication-factor 3 --bootstrap-server localhost:9092
The replication factor is set explicitly in the command.
The --replication-factor option sets the number of copies per partition. Here it is 3, so each of the 2 partitions has 3 replicas, for 6 replicas in total.
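The replica count for the command above can be sketched in a few lines. This is a simplified round-robin model for illustration only: the function name `assign_replicas` and the broker ids 0, 1, 2 are assumptions, and Kafka's real assignment logic is more involved.

```python
def assign_replicas(num_partitions, replication_factor, broker_ids):
    """Simplified round-robin placement (NOT Kafka's actual algorithm).

    Returns a dict mapping partition number to the list of broker ids
    that hold a replica of that partition.
    """
    if replication_factor > len(broker_ids):
        raise ValueError("replication factor exceeds number of brokers")
    assignment = {}
    for partition in range(num_partitions):
        # Start at a different broker for each partition and take the
        # next `replication_factor` brokers, wrapping around the list.
        assignment[partition] = [
            broker_ids[(partition + offset) % len(broker_ids)]
            for offset in range(replication_factor)
        ]
    return assignment


# Hypothetical 3-broker cluster matching --partitions 2 --replication-factor 3
layout = assign_replicas(num_partitions=2, replication_factor=3, broker_ids=[0, 1, 2])
for partition, replicas in layout.items():
    print(f"partition {partition}: replicas on brokers {replicas}")

total_replicas = sum(len(replicas) for replicas in layout.values())
print(f"total replicas: {total_replicas}")
```

Each partition ends up with 3 replicas on 3 distinct brokers, which is why the total is partitions times replication factor.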
Consider a Kafka cluster with 2 brokers. What happens if you try to create a topic with replication factor 3?
Replication factor cannot exceed the number of brokers.
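Kafka places each replica of a partition on a different broker, so the replication factor must be less than or equal to the number of available brokers. With 2 brokers and a replication factor of 3, topic creation fails with an InvalidReplicationFactorException.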
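The rule can be expressed as a small pre-flight check. This is a sketch of the constraint only, not Kafka's own validation code; the function name `check_replication_factor` and the error text are made up for illustration.

```python
def check_replication_factor(replication_factor, available_brokers):
    """Return True if a topic with this replication factor could be created.

    Models the rule that every replica of a partition needs its own broker.
    Raises ValueError otherwise (a stand-in for the broker-side error).
    """
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    if replication_factor > available_brokers:
        raise ValueError(
            f"replication factor {replication_factor} is larger than "
            f"the {available_brokers} available brokers"
        )
    return True


# The scenario from the question: 2 brokers, replication factor 3.
try:
    check_replication_factor(3, available_brokers=2)
    creation_failed = False
except ValueError as err:
    creation_failed = True
    print(f"topic creation would fail: {err}")

# A replication factor of 2 fits on the same 2-broker cluster.
print(check_replication_factor(2, available_brokers=2))
```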
What is the main benefit of setting a higher replication factor for a Kafka topic?
Think about what happens if a broker fails.
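A higher replication factor means more copies of the data exist on different brokers, so if a broker fails, the data remains available from another replica. With replication factor N, a partition can survive up to N - 1 broker failures without losing committed data.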
You have a Kafka topic with replication factor 5. To ensure strong durability, you want to set the minimum in-sync replicas (min.insync.replicas) to the highest number that still allows the leader to acknowledge writes if one broker fails. What should this value be?
min.insync.replicas must be less than or equal to replication factor minus one to tolerate one failure.
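With replication factor 5, setting min.insync.replicas to 4 means a write sent with acks=all succeeds only while at least 4 replicas, counting the leader, are in sync. If one broker fails, 4 in-sync replicas remain, so writes are still accepted. Setting the value to 5 would cause writes to be rejected as soon as any single replica fell out of sync.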
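The arithmetic behind this answer can be checked with a short sketch. It assumes producers use acks=all and models only the in-sync replica count; the helper names `max_min_insync_replicas` and `write_accepted` are hypothetical, not part of any Kafka API.

```python
def max_min_insync_replicas(replication_factor, tolerated_failures):
    """Largest min.insync.replicas that still accepts writes after
    `tolerated_failures` replicas have dropped out of sync."""
    if not 0 <= tolerated_failures < replication_factor:
        raise ValueError("tolerated_failures must be in [0, replication_factor)")
    return replication_factor - tolerated_failures


def write_accepted(in_sync_replicas, min_insync_replicas):
    """With acks=all, a write succeeds only if the number of in-sync
    replicas (leader included) is at least min.insync.replicas."""
    return in_sync_replicas >= min_insync_replicas


rf = 5
min_isr = max_min_insync_replicas(rf, tolerated_failures=1)
print(f"min.insync.replicas = {min_isr}")

# One broker down: 4 replicas in sync, writes still accepted.
print(write_accepted(rf - 1, min_isr))
# Two brokers down: 3 replicas in sync, writes rejected.
print(write_accepted(rf - 2, min_isr))
```

The trade-off is visible in the last two lines: a higher min.insync.replicas strengthens durability but lowers the number of failures the topic can absorb before producers see write errors.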