How to Handle Duplicate Messages in Kafka Effectively
To handle duplicate messages in Kafka, enable idempotent producers and exactly-once semantics (EOS) with Kafka transactions, or implement consumer-side deduplication by tracking processed message IDs.

Why This Happens
Duplicates occur because Kafka producers or consumers may retry sending or processing messages when they don't get an acknowledgment or commit confirmation. Network issues or crashes can cause the same message to be sent or processed multiple times.
For example, a producer without idempotence enabled can send the same message twice if it retries after a failure.
```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
// Idempotence NOT enabled
KafkaProducer<String, String> producer = new KafkaProducer<>(props);
ProducerRecord<String, String> record = new ProducerRecord<>("my-topic", "key1", "value1");
producer.send(record);
producer.send(record); // duplicate send due to retry or bug
producer.close();
```
The Fix
Enable the idempotent producer to avoid duplicate writes when the producer retries internally: set enable.idempotence=true in the producer config. For exactly-once processing, use Kafka transactions, which let you publish output records and commit the corresponding consumer offsets atomically.
Alternatively, implement consumer-side deduplication by storing processed message IDs in a database or cache and skipping duplicates.
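The deduplication logic itself can be sketched independently of Kafka. The `Deduplicator` class below is illustrative (not a Kafka API): it keeps a bounded in-memory set of seen message IDs, standing in for the database or cache you would use in production.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative consumer-side deduplicator: remembers the last N message IDs.
// In production the seen-ID store would typically be a database or cache,
// keyed by a unique ID carried in each message.
class Deduplicator {
    private final Map<String, Boolean> seen;

    Deduplicator(int capacity) {
        // LinkedHashMap with removeEldestEntry gives a simple bounded store
        // that evicts the oldest IDs once capacity is exceeded.
        this.seen = new LinkedHashMap<String, Boolean>(16, 0.75f, false) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                return size() > capacity;
            }
        };
    }

    /** Returns true the first time an ID is seen, false for duplicates. */
    boolean markIfNew(String messageId) {
        return seen.putIfAbsent(messageId, Boolean.TRUE) == null;
    }
}
```

In a poll loop you would call `markIfNew(id)` before processing each record and skip it when the method returns false. Note that a bounded in-memory store only catches duplicates that arrive within its window; a persistent store is needed for stronger guarantees.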
```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("enable.idempotence", "true"); // broker deduplicates producer retries
KafkaProducer<String, String> producer = new KafkaProducer<>(props);
ProducerRecord<String, String> record = new ProducerRecord<>("my-topic", "key1", "value1");
producer.send(record); // if this send is retried internally, the broker keeps only one copy
producer.close();
```

Note that idempotence only deduplicates the producer's own internal retries of a batch. Calling send() twice in application code, as in the broken example above, still writes two messages; application-level duplicates must be prevented in your own logic or handled by consumer-side deduplication.
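The transactional path mentioned above can be sketched as follows. This is a minimal outline, not a complete application: the transactional.id value is a placeholder you choose, and error handling is reduced to a single catch.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("transactional.id", "my-app-tx-1"); // enables transactions (implies idempotence)

KafkaProducer<String, String> producer = new KafkaProducer<>(props);
producer.initTransactions();
try {
    producer.beginTransaction();
    producer.send(new ProducerRecord<>("my-topic", "key1", "value1"));
    // In a consume-transform-produce loop you would also call
    // producer.sendOffsetsToTransaction(offsets, consumer.groupMetadata())
    // so the input offsets commit atomically with the output records.
    producer.commitTransaction();
} catch (Exception e) {
    producer.abortTransaction(); // nothing from this transaction becomes visible
}
producer.close();
```

Downstream consumers must set isolation.level=read_committed to avoid reading records from aborted transactions.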
Prevention
To prevent duplicates in Kafka:
- Always enable enable.idempotence=true for producers.
- Use Kafka transactions for exactly-once processing when consuming and producing.
- Implement consumer-side deduplication by tracking message keys or IDs.
- Design messages with unique IDs to identify duplicates easily.
- Monitor and handle retries carefully in your application logic.
Related Errors
Other common issues related to duplicates include:
- Offset commit failures: If consumer offsets are not committed properly, messages may be reprocessed.
- At-least-once delivery: Default Kafka guarantees may cause duplicates if your app does not handle them.
- Producer retries without idempotence: Can cause duplicate messages.
Fixes usually involve enabling idempotence, using transactions, and careful offset management.
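For the offset-management point, a common at-least-once pattern is to disable auto-commit and commit offsets only after processing succeeds. The sketch below assumes placeholder group and topic names, and `process(record)` stands in for your own handler.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "my-group");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("enable.auto.commit", "false"); // commit manually, after processing

KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(List.of("my-topic"));
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
    for (ConsumerRecord<String, String> record : records) {
        process(record); // hypothetical handler; if it throws, offsets stay uncommitted
    }
    consumer.commitSync(); // a crash before this line means reprocessing, not data loss
}
```

This trades duplicates for durability: a crash between processing and commitSync() causes those records to be redelivered, which is exactly why the deduplication techniques above matter.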