
Earliest vs Latest Offset in Kafka: Key Differences and Usage

In Kafka, the earliest offset points to the oldest available message in a partition, while the latest offset points to the position after the newest message. Consumers using earliest start reading from the beginning of the log, and those using latest start reading only new messages arriving after subscription.

Quick Comparison

This table summarizes the main differences between the earliest and latest offsets in Kafka.

| Aspect | Earliest Offset | Latest Offset |
|---|---|---|
| Definition | Points to the oldest available message in a partition | Points to the position after the newest message in a partition |
| Consumer Start Position | Reads from the beginning of the log | Reads only new messages arriving after subscription |
| Use Case | Replay all messages or start fresh from the beginning | Process only new incoming data |
| Offset Value | Smallest offset number currently retained | One greater than the offset of the last message |
| Behavior on Empty Topic | Waits for the first message to appear | Waits for new messages to arrive |

Key Differences

The earliest offset in Kafka refers to the smallest offset number currently retained in a partition. When a consumer sets its offset to earliest, it starts reading messages from the very beginning of the available log. This is useful when you want to process all existing data, such as during initial data loading or replaying events.

On the other hand, the latest offset points to the position just after the newest message in the partition. Consumers starting at latest will ignore all past messages and only receive messages produced after they start consuming. This is ideal for real-time processing where only new data matters.

Both offsets are used as special reset points when a consumer group has no committed offset or when offsets are out of range. Choosing between them depends on whether you want to process historical data (earliest) or only new incoming data (latest).
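Beyond the auto.offset.reset property, the consumer API also lets you jump to the earliest or latest position explicitly, regardless of any committed offsets. A minimal sketch using seekToBeginning and seekToEnd (the topic name and partition number are placeholders, and a broker must be running at localhost:9092):

```java
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class SeekExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "example-group");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // Assign the partition manually so seeking is not tied to group rebalancing.
            TopicPartition partition = new TopicPartition("my-topic", 0);
            consumer.assign(Collections.singletonList(partition));

            // Jump to the earliest retained offset, ignoring any committed offset.
            consumer.seekToBeginning(Collections.singletonList(partition));

            // Or jump past the newest message instead:
            // consumer.seekToEnd(Collections.singletonList(partition));

            // position() resolves the seek and reports the offset the next poll will use.
            System.out.println("Starting at offset: " + consumer.position(partition));
        }
    }
}
```

Unlike auto.offset.reset, which only takes effect when no valid committed offset exists, an explicit seek always wins, which makes it useful for one-off replays.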


Earliest Offset Code Example

This example shows how to configure a Kafka consumer in Java to start reading from the earliest offset.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "example-group");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
// Start from the oldest retained message when no committed offset exists.
props.put("auto.offset.reset", "earliest");

KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Collections.singletonList("my-topic"));

while (true) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
    for (ConsumerRecord<String, String> record : records) {
        System.out.printf("Offset = %d, Key = %s, Value = %s%n", record.offset(), record.key(), record.value());
    }
}
```
Output
Offset = 0, Key = key1, Value = message1
Offset = 1, Key = key2, Value = message2
... (all messages from the start)

Latest Offset Equivalent

This example shows how to configure a Kafka consumer in Java to start reading from the latest offset.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "example-group");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
// Skip past existing messages and read only new ones when no committed offset exists.
props.put("auto.offset.reset", "latest");

KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Collections.singletonList("my-topic"));

while (true) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
    for (ConsumerRecord<String, String> record : records) {
        System.out.printf("Offset = %d, Key = %s, Value = %s%n", record.offset(), record.key(), record.value());
    }
}
```
Output
(No output until new messages arrive after consumer starts)

When to Use Which

Choose earliest offset when you want to process all existing messages in a topic, such as during initial data ingestion, debugging, or replaying events.

Choose latest offset when you only want to consume new messages arriving after your consumer starts, ideal for real-time streaming and live data processing.

Using the correct offset reset strategy ensures your application processes data as intended without missing or duplicating messages.
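The same choice can be made in a consumer configuration file rather than in code. A sketch of such a file (the file name and broker address are illustrative):

```properties
# consumer.properties (illustrative)
bootstrap.servers=localhost:9092
group.id=example-group
# earliest: replay the full retained log; latest: only new messages
auto.offset.reset=earliest
```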

Key Takeaways

The earliest offset starts reading from the oldest available message in Kafka partitions.
The latest offset starts reading only new messages produced after the consumer begins.
Use earliest to replay or process all data; use latest for real-time, new data processing.
Set the consumer property 'auto.offset.reset' to 'earliest' or 'latest' to control this behavior; it only applies when the group has no committed offset or the committed offset is out of range.
Choosing the right offset prevents data loss or duplicate processing in Kafka consumers.