KafkaDebug / FixIntermediate · 4 min read

How to Handle Duplicate Messages in Kafka Effectively

Duplicate messages in Kafka happen because producers or consumers may retry sending or processing messages. To handle duplicates, use idempotent producers and exactly-once semantics (EOS) with Kafka transactions, or implement consumer-side deduplication by tracking processed message IDs.
🔍

Why This Happens

Duplicates occur because Kafka producers or consumers may retry sending or processing messages when they don't get an acknowledgment or commit confirmation. Network issues or crashes can cause the same message to be sent or processed multiple times.

For example, a producer without idempotence enabled can send the same message twice if it retries after a failure.

java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
// Idempotence NOT enabled: a retry after a lost acknowledgment resends the batch
KafkaProducer<String, String> producer = new KafkaProducer<>(props);

ProducerRecord<String, String> record = new ProducerRecord<>("my-topic", "key1", "value1");
producer.send(record);
producer.send(record); // simulates a retry: the same record is written twice
producer.close();
Output
Message 'value1' is written twice to 'my-topic', creating a duplicate.
🔧

The Fix

Enable the idempotent producer so the client's internal retries cannot write the same record twice: set enable.idempotence=true in the producer config (this requires acks=all, which the client enforces automatically). For exactly-once processing across a consume-process-produce pipeline, use Kafka transactions to commit the produced records and the consumer offsets atomically. Note that idempotence only deduplicates the producer's own retries; it does not protect against your application calling send() twice for the same message.

Alternatively, implement consumer-side deduplication by storing processed message IDs in a database or cache and skipping duplicates.

java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("enable.idempotence", "true"); // broker deduplicates the producer's internal retries
props.put("acks", "all"); // required by idempotence; the client enforces this automatically

KafkaProducer<String, String> producer = new KafkaProducer<>(props);

ProducerRecord<String, String> record = new ProducerRecord<>("my-topic", "key1", "value1");
producer.send(record); // if the acknowledgment is lost and the client retries, the broker discards the resend
producer.close();
Output
Message 'value1' is written once to 'my-topic' even if the send is retried internally.
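The consumer-side deduplication alternative can be sketched with a small helper that remembers which message IDs it has already processed. This is a minimal in-memory sketch (the class name and IDs are illustrative); in production the seen-ID store would typically be a database or cache with a TTL so it survives restarts and does not grow without bound.

```java
import java.util.HashSet;
import java.util.Set;

// Minimal in-memory deduplicator: remembers every ID it has seen.
class SeenIdDeduplicator {
    private final Set<String> seen = new HashSet<>();

    /** Returns true the first time an ID is offered, false on any repeat. */
    boolean isNew(String messageId) {
        return seen.add(messageId); // Set.add returns false if already present
    }
}

public class DedupDemo {
    public static void main(String[] args) {
        SeenIdDeduplicator dedup = new SeenIdDeduplicator();
        // Simulated poll result containing a redelivered message
        String[] incomingIds = {"msg-1", "msg-2", "msg-1"};
        for (String id : incomingIds) {
            if (dedup.isNew(id)) {
                System.out.println("processing " + id);
            } else {
                System.out.println("skipping duplicate " + id);
            }
        }
    }
}
```

In a real poll loop you would call isNew(...) with the record's unique ID before handling it, and ideally persist the ID in the same transaction as the processing result so a crash cannot separate the two.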
🛡️

Prevention

To prevent duplicates in Kafka:

  • Always set enable.idempotence=true for producers.
  • Use Kafka transactions for exactly-once processing when consuming and producing.
  • Implement consumer-side deduplication by tracking message keys or IDs.
  • Design messages with unique IDs to identify duplicates easily.
  • Monitor and handle retries carefully in your application logic.
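The transactional point above can be sketched as a read-process-write loop in which the produced records and the consumer offsets commit (or abort) together. This is a minimal sketch, not a production implementation: the topic names "input-topic" and "output-topic", the group "my-group", and the transactional.id "my-app-tx-1" are all assumed placeholders, and the code needs a running broker to execute.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.TopicPartition;

Properties producerProps = new Properties();
producerProps.put("bootstrap.servers", "localhost:9092");
producerProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
producerProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
producerProps.put("transactional.id", "my-app-tx-1"); // stable, unique per producer instance

Properties consumerProps = new Properties();
consumerProps.put("bootstrap.servers", "localhost:9092");
consumerProps.put("group.id", "my-group");
consumerProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
consumerProps.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
consumerProps.put("enable.auto.commit", "false");
consumerProps.put("isolation.level", "read_committed"); // skip records from aborted transactions

KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps);
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps);
consumer.subscribe(Collections.singletonList("input-topic"));
producer.initTransactions();

while (true) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
    if (records.isEmpty()) continue;
    producer.beginTransaction();
    try {
        Map<TopicPartition, OffsetAndMetadata> offsets = new HashMap<>();
        for (ConsumerRecord<String, String> record : records) {
            producer.send(new ProducerRecord<>("output-topic", record.key(), record.value()));
            offsets.put(new TopicPartition(record.topic(), record.partition()),
                        new OffsetAndMetadata(record.offset() + 1));
        }
        // Output records and consumer offsets are committed atomically.
        producer.sendOffsetsToTransaction(offsets, consumer.groupMetadata());
        producer.commitTransaction();
    } catch (Exception e) {
        producer.abortTransaction(); // offsets are not committed, so records are reprocessed
    }
}
```

Because offsets travel inside the transaction, a crash mid-batch aborts both the output records and the offset commit, so reprocessing produces no committed duplicates downstream.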
⚠️

Related Errors

Other common issues related to duplicates include:

  • Offset commit failures: If consumer offsets are not committed properly, messages may be reprocessed.
  • At-least-once delivery: Default Kafka guarantees may cause duplicates if your app does not handle them.
  • Producer retries without idempotence: Can cause duplicate messages.

Fixes usually involve enabling idempotence, using transactions, and careful offset management.
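The "careful offset management" point can be sketched by disabling auto-commit and committing only after processing succeeds. This is a minimal sketch that needs a running broker; "my-group", "my-topic", and the process(...) handler are assumed placeholders.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "my-group");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("enable.auto.commit", "false"); // commit manually, only after processing succeeds

KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Collections.singletonList("my-topic"));

while (true) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
    for (ConsumerRecord<String, String> record : records) {
        process(record); // hypothetical handler; make it idempotent in case of a crash before commit
    }
    // Committing after processing gives at-least-once delivery: a crash between
    // processing and commit causes reprocessing, never data loss.
    consumer.commitSync();
}
```

This ordering trades duplicates for durability, which is why the handler (or a deduplication layer in front of it) should be idempotent.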

Key Takeaways

Enable idempotent producers with 'enable.idempotence=true' to avoid duplicate sends.
Use Kafka transactions for exactly-once processing to atomically commit messages and offsets.
Implement consumer-side deduplication by tracking processed message IDs or keys.
Design messages with unique IDs to easily detect duplicates.
Monitor retries and offset commits carefully to prevent reprocessing duplicates.