
Earliest vs Latest Offset in Kafka: Key Differences and Usage

In Kafka, the earliest offset points to the oldest available message in a partition, while the latest offset points to the position after the newest message. Consumers using earliest start reading from the beginning of the log, and those using latest start reading only new messages arriving after subscription.

Quick Comparison

This table summarizes the main differences between the earliest and latest offsets in Kafka.

| Aspect | Earliest Offset | Latest Offset |
|---|---|---|
| Definition | Points to the oldest available message in a partition | Points to the position after the newest message in a partition |
| Consumer Start Position | Reads from the beginning of the log | Reads only new messages arriving after subscription |
| Use Case | Replay all messages or start fresh from the beginning | Process only new incoming data |
| Offset Value | Smallest offset number currently retained | One greater than the offset of the last message |
| Behavior on Empty Topic | Waits for the first message to appear | Waits for new messages to arrive |

Key Differences

The earliest offset in Kafka refers to the smallest offset number currently retained in a partition. When a consumer sets its offset to earliest, it starts reading messages from the very beginning of the available log. This is useful when you want to process all existing data, such as during initial data loading or replaying events.

On the other hand, the latest offset points to the position just after the newest message in the partition. Consumers starting at latest will ignore all past messages and only receive messages produced after they start consuming. This is ideal for real-time processing where only new data matters.

Both offsets are used as special reset points when a consumer group has no committed offset or when offsets are out of range. Choosing between them depends on whether you want to process historical data (earliest) or only new incoming data (latest).
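Beyond the auto.offset.reset property, the consumer API also lets you jump to the earliest or latest position explicitly, regardless of any committed offsets. A minimal sketch using seekToBeginning and seekToEnd (the topic name and partition number are placeholders, and a broker must be running at localhost:9092):

```java
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class SeekExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "example-group");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // Assign the partition manually so seeking is not tied to group rebalancing.
            TopicPartition partition = new TopicPartition("my-topic", 0);
            consumer.assign(Collections.singletonList(partition));

            // Jump to the earliest retained offset, ignoring any committed offset.
            consumer.seekToBeginning(Collections.singletonList(partition));

            // Or jump past the newest message instead:
            // consumer.seekToEnd(Collections.singletonList(partition));

            // position() resolves the seek and reports the offset the next poll will use.
            System.out.println("Starting at offset: " + consumer.position(partition));
        }
    }
}
```

Unlike auto.offset.reset, which only takes effect when no valid committed offset exists, an explicit seek always wins, which makes it useful for one-off replays.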


Earliest Offset Code Example

This example shows how to configure a Kafka consumer in Java to start reading from the earliest offset.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "example-group");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
// Start from the oldest retained message when no committed offset exists.
props.put("auto.offset.reset", "earliest");

KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Collections.singletonList("my-topic"));

while (true) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
    for (ConsumerRecord<String, String> record : records) {
        System.out.printf("Offset = %d, Key = %s, Value = %s%n", record.offset(), record.key(), record.value());
    }
}
```
Output
Offset = 0, Key = key1, Value = message1
Offset = 1, Key = key2, Value = message2
... (all messages from the start)

Latest Offset Equivalent

This example shows how to configure a Kafka consumer in Java to start reading from the latest offset.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "example-group");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
// Skip past existing messages and read only new ones when no committed offset exists.
props.put("auto.offset.reset", "latest");

KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Collections.singletonList("my-topic"));

while (true) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
    for (ConsumerRecord<String, String> record : records) {
        System.out.printf("Offset = %d, Key = %s, Value = %s%n", record.offset(), record.key(), record.value());
    }
}
```
Output
(No output until new messages arrive after consumer starts)

When to Use Which

Choose earliest offset when you want to process all existing messages in a topic, such as during initial data ingestion, debugging, or replaying events.

Choose latest offset when you only want to consume new messages arriving after your consumer starts, ideal for real-time streaming and live data processing.

Using the correct offset reset strategy ensures your application processes data as intended without missing or duplicating messages.
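The same choice can be made in a consumer configuration file rather than in code. A sketch of such a file (the file name and broker address are illustrative):

```properties
# consumer.properties (illustrative)
bootstrap.servers=localhost:9092
group.id=example-group
# earliest: replay the full retained log; latest: only new messages
auto.offset.reset=earliest
```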

Key Takeaways

The earliest offset starts reading from the oldest available message in Kafka partitions.
The latest offset starts reading only new messages produced after the consumer begins.
Use earliest to replay or process all data; use latest for real-time, new data processing.
Set the consumer property 'auto.offset.reset' to 'earliest' or 'latest' to control this behavior; it only applies when the group has no committed offset or the committed offset is out of range.
Choosing the right offset prevents data loss or duplicate processing in Kafka consumers.