Consider a Kafka consumer configured with enable.auto.commit=true and auto.commit.interval.ms=1000. The consumer reads messages continuously. What happens if the consumer crashes after processing some messages but before the next auto-commit?
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "test-group");
props.put("enable.auto.commit", "true");
props.put("auto.commit.interval.ms", "1000");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Arrays.asList("my-topic"));

while (true) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
    for (ConsumerRecord<String, String> record : records) {
        System.out.printf("offset = %d, key = %s, value = %s\n",
                record.offset(), record.key(), record.value());
        // Process message
    }
    // No manual commit here; offsets are committed in the background every 1000 ms
}
Think about how auto-commit interval affects when offsets are saved.
With auto-commit enabled, the consumer commits offsets in the background at the configured interval (every 1000 ms here), based on the positions returned by poll(). If the consumer crashes between commits, the offsets of messages processed since the last commit are never saved, so those messages are redelivered and reprocessed after restart. Auto-commit therefore gives at-least-once delivery for this loop, not exactly-once.
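To make the failure window concrete, here is a minimal plain-Java sketch with no Kafka dependency (the class and method names are hypothetical). It relies on one fact about Kafka semantics: a committed offset is the *next* offset to read, so after a restart the consumer resumes there and re-reads everything processed since the last commit.

```java
// Hypothetical helper: how many records are redelivered after a crash, given
// the last committed offset (the next offset to read) and the last offset the
// consumer had actually processed before dying.
public class ReprocessWindow {
    static long reprocessedCount(long committedNextOffset, long lastProcessedOffset) {
        // Restart resumes at committedNextOffset; everything up to and
        // including lastProcessedOffset is read (and processed) again.
        return Math.max(0, lastProcessedOffset - committedNextOffset + 1);
    }

    public static void main(String[] args) {
        // Auto-commit last saved offset 100; crash after processing offset 149.
        System.out.println(reprocessedCount(100, 149)); // 50 records reprocessed
    }
}
```

Shortening auto.commit.interval.ms shrinks this window but never closes it, which is why the later questions turn to manual commit.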
A Kafka consumer is configured with enable.auto.commit=false. The code processes messages but never calls commitSync() or commitAsync(). What is the effect on message processing?
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "test-group");
props.put("enable.auto.commit", "false");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Arrays.asList("my-topic"));

while (true) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
    for (ConsumerRecord<String, String> record : records) {
        System.out.printf("offset = %d, key = %s, value = %s\n",
                record.offset(), record.key(), record.value());
        // Process message
    }
    // No commit called here
}
Consider what happens if offsets are never saved.
When auto-commit is disabled and the application never commits, no offsets are ever stored for the consumer group. Within a single run the consumer still advances normally, but after a restart it finds no committed offset and falls back to auto.offset.reset: with earliest it re-reads the partition from the beginning every time (duplicates), and with latest (the default) it jumps to the end and may skip messages entirely.
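A toy in-memory simulation of this behavior (no real Kafka; every name here is made up for illustration), assuming the consumer resumes from the start of the partition as with auto.offset.reset=earliest:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model: a "partition" of three records and a consumer that processes
// them but never commits, so its committed offset stays at 0 forever.
public class NeverCommitSim {
    static final List<String> partition = List.of("m0", "m1", "m2");
    static long committedOffset = 0; // never advanced

    // One consumer lifetime: read from the committed offset to the end.
    static List<String> runConsumerOnce() {
        List<String> processed = new ArrayList<>();
        for (long off = committedOffset; off < partition.size(); off++) {
            processed.add(partition.get((int) off));
            // A fix would advance committedOffset here or after the loop;
            // this consumer never does.
        }
        return processed;
    }

    public static void main(String[] args) {
        System.out.println(runConsumerOnce()); // [m0, m1, m2]
        System.out.println(runConsumerOnce()); // [m0, m1, m2] again: full replay
    }
}
```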
Which of the following is the best reason to use manual commit instead of auto-commit in Kafka consumers?
Think about reliability and message processing guarantees.
Manual commit lets the consumer commit offsets only after messages have been fully processed, so a message is never marked as consumed before processing succeeds; this prevents message loss and puts the size of the duplicate window under the application's control. Auto-commit commits offsets periodically regardless of whether processing succeeded.
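A common shape for manual commit in the Java client is sketched below (the topic and group names are placeholders): commit asynchronously on the hot path for throughput, and commit synchronously on shutdown so the final offsets are not lost. This is a sketch of the pattern, not a complete application.

```java
// Sketch: manual commit with commitAsync on the hot path and a final
// commitSync on shutdown.
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "test-group");
props.put("enable.auto.commit", "false");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Arrays.asList("my-topic"));
try {
    while (true) {
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
        for (ConsumerRecord<String, String> record : records) {
            // Process message
        }
        consumer.commitAsync(); // non-blocking; failures are retried best-effort
    }
} finally {
    try {
        consumer.commitSync(); // blocking, but guarantees the last offsets are committed
    } finally {
        consumer.close();
    }
}
```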
Given a Kafka consumer with enable.auto.commit=false, what happens if commitSync() is called before any poll() call?
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "test-group");
props.put("enable.auto.commit", "false");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);

// Calling commitSync before poll
consumer.commitSync();
Consider what commitSync needs before it can commit offsets.
The no-argument commitSync() commits the consumer's current positions, i.e. the offsets implied by the records returned from previous poll() calls. Before the first poll() the consumer has no assigned partitions and no positions, so in the Java client the call commits an empty offset map and is effectively a no-op: nothing is thrown and no offsets are stored. (The overload that takes an explicit offsets map can commit before poll(), since the caller supplies the offsets.)
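A tiny plain-Java model of this behavior (hypothetical names, no Kafka dependency): the no-argument commit snapshots the per-partition position map, and only poll() populates that map.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the no-arg commit: it copies "positions", the
// partition -> next-offset map that only poll() fills in.
public class CommitModel {
    final Map<Integer, Long> positions = new HashMap<>();
    final Map<Integer, Long> committed = new HashMap<>();

    void poll(int partition, long nextOffset) { // stand-in for a real poll()
        positions.put(partition, nextOffset);
    }

    void commitSyncNoArgs() {
        committed.putAll(positions); // empty before the first poll -> nothing happens
    }

    public static void main(String[] args) {
        CommitModel c = new CommitModel();
        c.commitSyncNoArgs();
        System.out.println(c.committed.isEmpty()); // true: nothing was committed
        c.poll(0, 42L);
        c.commitSyncNoArgs();
        System.out.println(c.committed); // {0=42}
    }
}
```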
You want to ensure that no Kafka message is marked as consumed before it has been successfully processed, using manual commit. (Note that commit-after-processing yields at-least-once delivery, not exactly-once.) Which approach below correctly ensures that offsets are committed only after successful processing?
Think about committing offsets only after all messages are processed successfully.
Committing once, after the entire batch has been processed, guarantees that no offset is committed for a message that has not yet been handled. Calling the no-argument commitSync() inside the per-record loop is risky: it commits the consumer's position for everything returned by the last poll(), including records later in the batch that have not been processed yet. Even the commit-after-batch pattern only gives at-least-once semantics, since a crash after processing but before the commit replays the batch; true exactly-once processing requires idempotent handlers or Kafka transactions.
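When committing explicit offsets after a batch, the value to commit is the last processed offset plus one (the next offset to read), a classic off-by-one trap. A minimal plain-Java sketch of that arithmetic (no Kafka dependency; the helper name is hypothetical, and in the real API the map would be a Map<TopicPartition, OffsetAndMetadata>):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: build the per-partition commit map from the highest
// offset processed in each partition. Kafka expects the *next* offset to
// read, so we add 1 to the last processed offset.
public class BatchCommitOffsets {
    static Map<Integer, Long> offsetsToCommit(Map<Integer, Long> lastProcessedByPartition) {
        Map<Integer, Long> toCommit = new HashMap<>();
        lastProcessedByPartition.forEach((partition, lastOffset) ->
                toCommit.put(partition, lastOffset + 1));
        return toCommit;
    }

    public static void main(String[] args) {
        // Batch processed through offset 149 on partition 0 and 87 on partition 1.
        System.out.println(offsetsToCommit(Map.of(0, 149L, 1, 87L)));
        // Commits 150 for partition 0 and 88 for partition 1.
    }
}
```

Committing the last processed offset itself (without the +1) would cause the final record of every batch to be reprocessed after a restart.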