Earliest vs Latest Offset in Kafka: Key Differences and Usage
The earliest offset points to the oldest available message in a partition, while the latest offset points to the position just after the newest message. Consumers starting at earliest read from the beginning of the log; those starting at latest read only new messages that arrive after subscription.
Quick Comparison
This table summarizes the main differences between the earliest and latest offsets in Kafka.
| Aspect | Earliest Offset | Latest Offset |
|---|---|---|
| Definition | Points to the oldest available message in a partition | Points to the position after the newest message in a partition |
| Consumer Start Position | Reads from the beginning of the log | Reads only new messages arriving after subscription |
| Use Case | Replay all retained messages or bootstrap from the start of the log | Process only new incoming data |
| Offset Value | Smallest offset number available | Offset number one greater than the last message |
| Behavior on Empty Topic | Waits for first message to appear | Waits for new messages to arrive |
Key Differences
The earliest offset in Kafka refers to the smallest offset number currently retained in a partition. When a consumer sets its offset to earliest, it starts reading messages from the very beginning of the available log. This is useful when you want to process all existing data, such as during initial data loading or replaying events.
On the other hand, the latest offset points to the position just after the newest message in the partition. Consumers starting at latest will ignore all past messages and only receive messages produced after they start consuming. This is ideal for real-time processing where only new data matters.
Both values serve as reset points, applied via the auto.offset.reset consumer setting, when a consumer group has no committed offset or its committed offset is out of range. Choosing between them depends on whether you want to process historical data (earliest) or only new incoming data (latest).
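Since the reset strategy is just one configuration entry, the shared setup can be factored into a small helper. This is a minimal sketch in plain Java; the helper name, broker address, and group id are illustrative, and the third valid value, none, makes the consumer raise an error instead of resetting when no committed offset exists.

```java
import java.util.Properties;
import java.util.Set;

public class OffsetResetConfig {

    // Valid values accepted by the Java consumer for auto.offset.reset
    private static final Set<String> VALID = Set.of("earliest", "latest", "none");

    // Hypothetical helper: builds the common consumer config with the chosen reset strategy
    public static Properties consumerProps(String resetStrategy) {
        if (!VALID.contains(resetStrategy)) {
            throw new IllegalArgumentException("Unknown auto.offset.reset: " + resetStrategy);
        }
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "example-group");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        // Only consulted when the group has no committed offset, or it is out of range
        props.put("auto.offset.reset", resetStrategy);
        return props;
    }

    public static void main(String[] args) {
        System.out.println(consumerProps("earliest").getProperty("auto.offset.reset"));
        System.out.println(consumerProps("latest").getProperty("auto.offset.reset"));
    }
}
```

Keeping the strategy as a parameter makes it easy to run the same consumer code in replay mode (earliest) or live mode (latest) without duplicating configuration.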
Earliest Offset Code Example
This example shows how to configure a Kafka consumer in Java to start reading from the earliest offset.
```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "example-group");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
// Start from the earliest retained offset when no committed offset exists
props.put("auto.offset.reset", "earliest");

KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Collections.singletonList("my-topic"));
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
    for (ConsumerRecord<String, String> record : records) {
        System.out.printf("Offset = %d, Key = %s, Value = %s\n",
                record.offset(), record.key(), record.value());
    }
}
```
Latest Offset Equivalent
This example shows how to configure a Kafka consumer in Java to start reading from the latest offset.
```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "example-group");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
// Skip historical messages; consume only records produced from now on
props.put("auto.offset.reset", "latest");

KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Collections.singletonList("my-topic"));
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
    for (ConsumerRecord<String, String> record : records) {
        System.out.printf("Offset = %d, Key = %s, Value = %s\n",
                record.offset(), record.key(), record.value());
    }
}
```
When to Use Which
Choose earliest offset when you want to process all existing messages in a topic, such as during initial data ingestion, debugging, or replaying events.
Choose latest offset when you only want to consume new messages arriving after your consumer starts, ideal for real-time streaming and live data processing.
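The same choice is available from the command line with the console consumer that ships with Kafka. These invocations are a sketch and assume a broker on localhost:9092 and a topic named my-topic; the script is named kafka-console-consumer (without the .sh suffix) on some distributions.

```shell
# Earliest-style behavior: replay all retained messages from the start of the log
kafka-console-consumer.sh --bootstrap-server localhost:9092 \
  --topic my-topic --from-beginning

# Latest-style behavior (the default): print only messages produced from now on
kafka-console-consumer.sh --bootstrap-server localhost:9092 \
  --topic my-topic
```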
Using the correct offset reset strategy ensures your application processes data as intended without missing or duplicating messages.