
Consumer API basics in Kafka - Deep Dive

Overview - Consumer API basics
What is it?
The Consumer API in Kafka is a way for applications to read messages from Kafka topics. It allows programs to subscribe to one or more topics and receive data streams in real time. Consumers manage their position in the stream, called offsets, to keep track of which messages they have processed. This API is essential for building systems that react to data as it arrives.
Why it matters
The Consumer API solves the problem of reading continuous data streams efficiently and reliably. Without it, applications would have no structured way to get data out of Kafka topics: systems would either struggle to keep up with fast data flows or risk losing messages, leading to outdated or incomplete information.
Where it fits
Before learning the Consumer API, you should understand Kafka basics like topics, partitions, and producers. After mastering the Consumer API, you can explore advanced topics like consumer groups, offset management, and stream processing frameworks that build on this foundation.
Mental Model
Core Idea
The Consumer API lets applications read and track their place in a continuous stream of messages from Kafka topics.
Think of it like...
Imagine a newspaper subscriber who receives daily papers (messages) and keeps a bookmark (offset) to know which page they last read, so they never miss or reread articles.
┌───────────────┐       ┌───────────────┐       ┌───────────────┐
│ Kafka Topic 1 │──────▶│ Consumer App  │──────▶│ Processed Data│
│ Partition 0   │       │ (reads stream)│       │ (business use)│
└───────────────┘       └───────────────┘       └───────────────┘
       ▲                      │
       │                      ▼
   Offsets tracked       Commits offsets
   per message batch    to remember position
Build-Up - 7 Steps
1
Foundation: What is a Kafka Consumer
Concept: Introduces the basic role of a Kafka consumer in reading messages.
A Kafka consumer is a program that connects to Kafka and reads messages from one or more topics. It listens for new messages and processes them as they arrive. Each message has an offset, which is a number that marks its position in the topic partition.
Result
You understand that a consumer reads messages and that each message has a unique position called an offset.
Understanding that consumers read messages sequentially and track offsets is the foundation for reliable data processing.
2
Foundation: Subscribing to Topics
Concept: Shows how consumers subscribe to topics to receive messages.
Consumers must subscribe to one or more topics to start receiving messages. This subscription tells Kafka which streams the consumer wants to read. The Consumer API provides methods to subscribe by topic name or pattern.
Result
The consumer begins receiving messages from the subscribed topics.
Knowing how to subscribe is key to controlling what data your application processes.
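In the Java client, creating a consumer and subscribing takes only a few lines. A minimal sketch using the kafka-clients API; the broker address, group id, and topic names below are placeholders for your own setup:

```java
import java.util.List;
import java.util.Properties;
import java.util.regex.Pattern;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");   // placeholder broker address
props.put("group.id", "example-group");             // placeholder group id
props.put("key.deserializer", StringDeserializer.class.getName());
props.put("value.deserializer", StringDeserializer.class.getName());

KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);

// Subscribe by explicit topic names...
consumer.subscribe(List.of("orders", "payments"));

// ...or by pattern (note: this call replaces the previous subscription).
consumer.subscribe(Pattern.compile("metrics-.*"));
```

Subscribing only registers interest; no messages arrive until the consumer starts polling, which a later step covers.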
3
Intermediate: Understanding Offsets and Their Management
🤔 Before reading on: do you think Kafka automatically remembers which messages your consumer has read, or must the consumer manage this?
Concept: Explains how consumers track their reading position using offsets and the importance of managing them.
Each message in a Kafka partition has an offset number. Consumers keep track of the last offset they processed to avoid reading the same message twice or missing messages. The Consumer API allows committing offsets automatically or manually to Kafka or external storage.
Result
Consumers can resume reading from the correct position after restarts or failures.
Knowing offset management prevents data loss or duplication, which is critical for accurate processing.
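The bookkeeping can be illustrated without a broker. This self-contained simulation (plain Java, no Kafka client) models a partition as a list and an offset as an index into it, showing why committing the last processed position lets a consumer resume without rereading or skipping:

```java
import java.util.ArrayList;
import java.util.List;

public class OffsetSimulation {
    // A partition is an append-only sequence; an offset is an index into it.
    static List<String> partition = List.of("m0", "m1", "m2", "m3", "m4");

    // Read from the committed offset onward; return the new committed offset.
    static long consumeFrom(long committed, List<String> processed) {
        for (long offset = committed; offset < partition.size(); offset++) {
            processed.add(partition.get((int) offset));
            committed = offset + 1; // commit points at the NEXT message to read
        }
        return committed;
    }

    public static void main(String[] args) {
        List<String> processed = new ArrayList<>();
        long committed = consumeFrom(0, processed);    // first run reads m0..m4
        committed = consumeFrom(committed, processed); // "restart": nothing re-read
        System.out.println(processed); // [m0, m1, m2, m3, m4]
        System.out.println(committed); // 5
    }
}
```

The real client commits this number to Kafka (or external storage) instead of keeping it in memory, but the resume logic is the same.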
4
Intermediate: Consumer Groups and Load Balancing
🤔 Before reading on: do you think multiple consumers can read the same partition simultaneously, or is each partition read by only one consumer in a group?
Concept: Introduces consumer groups that allow multiple consumers to share the work of reading partitions.
Consumers can join a group identified by a group ID. Kafka divides partitions among consumers in the same group so each partition is read by only one consumer. This balances load and allows scaling. The Consumer API manages group membership and partition assignment automatically.
Result
Multiple consumers can work together to process data faster without overlap.
Understanding consumer groups is essential for building scalable and fault-tolerant data processing.
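To see why each partition has exactly one owner, here is a toy round-robin assignment in plain Java. It illustrates the idea only; it is not Kafka's actual assignor code:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GroupAssignment {
    // Round-robin-style assignment: every partition lands on exactly one consumer.
    static Map<String, List<Integer>> assign(List<String> consumers, int partitions) {
        Map<String, List<Integer>> out = new LinkedHashMap<>();
        for (String c : consumers) out.put(c, new ArrayList<>());
        for (int p = 0; p < partitions; p++) {
            out.get(consumers.get(p % consumers.size())).add(p);
        }
        return out;
    }

    public static void main(String[] args) {
        // 6 partitions shared by 2 consumers: no partition appears twice.
        System.out.println(assign(List.of("c1", "c2"), 6));
        // {c1=[0, 2, 4], c2=[1, 3, 5]}
    }
}
```

The real client selects among pluggable strategies (range, round-robin, sticky) via the partition.assignment.strategy setting, but the invariant is the same: one owner per partition within a group.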
5
Intermediate: Polling for Messages
Concept: Shows how consumers fetch messages using the poll method.
Consumers use a poll() method to request messages from Kafka. This method waits for new data and returns a batch of messages. Polling must be done regularly to keep the consumer alive and maintain group membership.
Result
The consumer receives batches of messages to process.
Knowing how polling works helps avoid common issues like consumer timeouts or missed messages.
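The poll loop in the Java client looks roughly like this. A sketch assuming `consumer` is an already-subscribed KafkaConsumer<String, String>; the 100 ms timeout is an arbitrary choice:

```java
import java.time.Duration;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;

while (true) {
    // Wait up to 100 ms for data. Calling poll() regularly also proves the
    // consumer is making progress; exceeding max.poll.interval.ms between
    // polls causes the group to rebalance it away.
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
    for (ConsumerRecord<String, String> record : records) {
        System.out.printf("partition=%d offset=%d value=%s%n",
                record.partition(), record.offset(), record.value());
    }
}
```

This is why long per-message processing is dangerous inside the loop: if a batch takes longer than max.poll.interval.ms to process, the consumer is considered failed.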
6
Advanced: Manual Offset Control for Precise Processing
🤔 Before reading on: do you think automatic offset commits always guarantee no message loss or duplication?
Concept: Explains how manual offset commits give control to ensure messages are processed before marking them done.
Automatic offset commits can mark messages as read before processing completes, risking data loss if the consumer crashes. Manual commits let the application commit offsets only after successful processing, giving at-least-once delivery; exactly-once semantics additionally require idempotent processing or Kafka transactions.
Result
More reliable message processing with control over when offsets are saved.
Understanding manual commits is key to building robust systems that handle failures gracefully.
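A common at-least-once pattern with the Java client: disable auto-commit, process the whole batch, then commit. `process(record)` is a hypothetical placeholder for your business logic:

```java
import java.time.Duration;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;

// In the consumer configuration:
props.put("enable.auto.commit", "false");

while (true) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
    for (ConsumerRecord<String, String> record : records) {
        process(record); // hypothetical: your business logic; may throw on failure
    }
    // Commit only after the whole batch succeeded. If the process crashes
    // before this line, the batch is redelivered on restart (at-least-once,
    // so duplicates are possible and processing should be idempotent).
    consumer.commitSync();
}
```

Committing per batch rather than per message keeps commit overhead low at the cost of a slightly larger redelivery window after a crash.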
7
Expert: Rebalancing and Its Impact on Consumers
🤔 Before reading on: do you think consumer rebalancing happens instantly and without any message processing interruption?
Concept: Describes how Kafka redistributes partitions among consumers when group membership changes and the challenges it creates.
When consumers join or leave a group, Kafka triggers a rebalance to reassign partitions. During this time, consumers stop polling and may lose their current processing state. Handling rebalances properly requires saving offsets and managing state to avoid duplicate or missed messages.
Result
Consumers can handle group changes without data loss or downtime.
Knowing the rebalance process helps prevent subtle bugs and downtime in production systems.
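The Java client exposes the rebalance lifecycle through ConsumerRebalanceListener. A sketch of the usual pattern, commit on revocation and restore state on assignment; the topic name is a placeholder:

```java
import java.util.Collection;
import java.util.List;
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.common.TopicPartition;

consumer.subscribe(List.of("orders"), new ConsumerRebalanceListener() {
    @Override
    public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
        // We are about to lose these partitions: commit processed offsets now
        // so the next owner does not reprocess our work.
        consumer.commitSync();
    }

    @Override
    public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
        // New partitions arrived; reload any per-partition state here.
    }
});
```

Both callbacks run inside poll() on the consumer's own thread, which is why the listener may safely call commitSync().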
Under the Hood
The Consumer API works by maintaining TCP connections to Kafka brokers and sending fetch requests for its assigned partitions. Brokers respond with batches of messages starting from the requested offset. The consumer tracks offsets locally and can commit them back to Kafka's internal __consumer_offsets topic. Group coordination is handled by a designated group coordinator broker, which manages membership and partition assignment through Kafka's group membership (rebalance) protocol.
Why designed this way?
Kafka's Consumer API was designed to handle high-throughput, distributed data streams with fault tolerance. Using offset tracking allows consumers to resume exactly where they left off. The group coordinator protocol enables scalable load balancing among consumers. This design avoids centralized bottlenecks and supports flexible consumption patterns.
┌───────────────┐         ┌───────────────┐         ┌──────────────────┐
│ Kafka Broker  │◀────────│ Consumer API  │────────▶│ Application      │
│ (stores data) │  fetch  │ (fetches data)│ process │ (business logic) │
└───────────────┘         └───────────────┘         └──────────────────┘
       ▲                         │
       │                         ▼
 __consumer_offsets         Commit offsets
 topic stores offsets       to broker
       │
       ▼
┌───────────────┐
│ Group         │
│ Coordinator   │
│ (manages the  │
│ consumer      │
│ group)        │
└───────────────┘
Myth Busters - 4 Common Misconceptions
Quick: Does Kafka guarantee that each message is delivered exactly once to consumers by default? Commit to yes or no.
Common Belief: Kafka ensures that each message is delivered exactly once to consumers automatically.
Reality: Kafka guarantees at-least-once delivery by default; consumers may receive duplicates unless they implement idempotent processing or manual offset control.
Why it matters: Assuming exactly-once delivery can lead to data corruption or duplicate processing in applications.
Quick: Can multiple consumers in the same group read the same partition at the same time? Commit to yes or no.
Common Belief: Multiple consumers in the same group can read the same partition simultaneously to speed up processing.
Reality: Each partition is assigned to only one consumer in a group at a time to avoid duplicate processing.
Why it matters: Misunderstanding this can cause confusion about how load balancing works and lead to incorrect scaling strategies.
Quick: Does committing offsets automatically mean messages are fully processed? Commit to yes or no.
Common Belief: Automatic offset commits mean messages are processed and safe to forget.
Reality: Automatic commits may happen before processing finishes, risking message loss if the consumer crashes.
Why it matters: Relying on automatic commits without manual control can cause data loss in failure scenarios.
Quick: Does consumer rebalancing happen instantly without affecting message processing? Commit to yes or no.
Common Belief: Rebalancing is a quick background task that does not interrupt consumers.
Reality: Rebalancing pauses consumers and can cause temporary unavailability or duplicate processing if not handled properly.
Why it matters: Ignoring rebalance effects can cause downtime or inconsistent data processing in production.
Expert Zone
1
Offset commits can be asynchronous or synchronous; choosing between them affects latency and reliability tradeoffs.
2
The choice of partition assignment strategy (range, round-robin, sticky) impacts load balancing and message ordering guarantees.
3
Handling rebalance callbacks properly is critical to avoid losing uncommitted offsets or processing duplicates.
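The async/sync tradeoff from point 1 is often combined in practice: fast commitAsync() during normal operation, with a final commitSync() on shutdown. A sketch, where `running` is an assumed shutdown flag and `process(record)` is a hypothetical placeholder:

```java
import java.time.Duration;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;

try {
    while (running) {
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
        for (ConsumerRecord<String, String> record : records) {
            process(record); // hypothetical business logic
        }
        consumer.commitAsync(); // low latency; a failed commit is not retried,
                                // but a later successful commit supersedes it
    }
} finally {
    try {
        consumer.commitSync();  // blocking final commit before leaving the group
    } finally {
        consumer.close();
    }
}
```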
When NOT to use
The Consumer API is not suitable when you need complex event processing or transformations; in such cases, Kafka Streams or ksqlDB are better alternatives. Also, for very low-latency or exactly-once semantics, specialized frameworks or external transaction managers may be required.
Production Patterns
In production, consumers often run in groups across multiple servers for scalability. They use manual offset commits after processing batches to ensure reliability. Rebalance listeners handle state cleanup and offset commits to avoid duplicates. Monitoring consumer lag and health is standard practice to detect processing delays.
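One of those monitoring signals, consumer lag, is simple arithmetic: the partition's newest offset (log end offset) minus the group's committed offset. A self-contained illustration of the calculation; in practice the numbers come from tools such as kafka-consumer-groups.sh or the AdminClient:

```java
public class ConsumerLag {
    // Lag per partition: messages written but not yet committed by the group.
    static long lag(long logEndOffset, long committedOffset) {
        return logEndOffset - committedOffset;
    }

    public static void main(String[] args) {
        // Producers have written offsets 0..9999 (log end = 10000);
        // the group last committed offset 9200.
        System.out.println(lag(10_000, 9_200)); // 800 messages behind
    }
}
```

A steadily growing lag means consumers cannot keep up and is usually the first alert threshold teams configure.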
Connections
Publish-Subscribe Messaging Pattern
The Consumer API implements the subscribe side of the pub-sub pattern.
Understanding pub-sub helps grasp why consumers subscribe to topics and how messages flow from producers to multiple consumers.
Checkpointing in Stream Processing
Offset commits in Kafka consumers are a form of checkpointing to save progress.
Knowing checkpointing concepts from stream processing clarifies why and when consumers commit offsets to avoid reprocessing.
Bookmarking in Reading Apps
Tracking offsets is like bookmarking your place in a book or article.
This cross-domain idea helps understand why consumers must remember their position to continue reading without missing or repeating content.
Common Pitfalls
#1 Relying on automatic offset commits without ensuring message processing is complete.
Wrong approach:
consumerConfig.put("enable.auto.commit", "true");
// process messages
// no manual commit
Correct approach:
consumerConfig.put("enable.auto.commit", "false");
// process messages
consumer.commitSync();
Root cause: Misunderstanding that automatic commits happen independently of processing completion.
#2 Not handling consumer rebalances, causing lost offsets or duplicate processing.
Wrong approach:
// No rebalance listener
consumer.subscribe(topics);
Correct approach:
consumer.subscribe(topics, new ConsumerRebalanceListener() {
    public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
        consumer.commitSync();
    }
    public void onPartitionsAssigned(Collection<TopicPartition> partitions) {}
});
Root cause: Ignoring the rebalance lifecycle and its impact on offset management.
#3 Multiple consumers in the same group subscribing to the same partition, expecting parallel reads.
Wrong approach: Two consumers with the same group ID manually assigned to the same partition.
Correct approach: Let Kafka assign partitions automatically, or ensure each partition is assigned to only one consumer.
Root cause: Misunderstanding how Kafka enforces partition ownership within consumer groups.
Key Takeaways
Kafka's Consumer API allows applications to read messages from topics while tracking their position using offsets.
Consumers subscribe to topics and use polling to fetch messages in batches, maintaining group membership through regular polls.
Offset management is crucial to avoid message loss or duplication; manual commits provide precise control over processing state.
Consumer groups enable load balancing by assigning partitions exclusively to one consumer in the group at a time.
Handling rebalances properly is essential to maintain processing continuity and data consistency in production.