Consumer API basics in Kafka - Time & Space Complexity
When using Kafka's Consumer API, it's important to know how the time to process messages changes as the number of messages grows.
We want to understand how the consumer's work scales when reading many messages.
Analyze the time complexity of the following Kafka consumer code snippet.
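import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

// Broker address, consumer group, and String deserializers for keys and values
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "test-group");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Collections.singletonList("my-topic"));

while (true) {
    // Wait up to 100 ms for the next batch of records
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
    for (ConsumerRecord<String, String> record : records) {
        System.out.println(record.value());
    }
}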
This code configures a consumer, subscribes it to a topic, and polls in an endless loop, printing the value of every record that each poll returns.
Look at what repeats as the consumer runs.
- Primary operation: Polling messages and iterating over each message in the batch.
- How many times: The poll loop runs indefinitely, and within each iteration the body of the for-loop runs once per message in the returned batch.
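As more messages become available, each poll can return a larger batch, so the loop does more work per poll. The batch size is capped by the consumer setting max.poll.records (500 by default), so the 1000-message row below assumes that cap has been raised.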
| Input Size (messages per poll) | Approx. Operations (prints) |
|---|---|
| 10 | 10 |
| 100 | 100 |
| 1000 | 1000 |
Pattern observation: The work grows directly with the number of messages received each time.
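Time Complexity: O(n), where n is the number of messages returned by a poll
This means the time to process a batch grows linearly with the number of messages in it, assuming each message takes roughly constant time to handle (here, one print).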
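The linear growth can be checked without a broker. The following is a minimal sketch, not the Kafka API: it replaces the topic with a plain in-memory list, and the names `PollLoopSimulation`, `fakePoll`, and `drain` are hypothetical. It counts how many polls and how many per-message operations are needed to drain a backlog, given a cap on batch size that plays the role of the consumer's batch limit.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for a consumer loop: no Kafka dependency, batches are plain lists.
class PollLoopSimulation {

    // Hypothetical helper: returns up to maxBatch messages starting at offset,
    // much as a real poll returns a bounded batch.
    static List<String> fakePoll(List<String> backlog, int offset, int maxBatch) {
        int end = Math.min(offset + maxBatch, backlog.size());
        return backlog.subList(offset, end);
    }

    // Drains a backlog of n messages and returns {polls, perMessageOperations}.
    static long[] drain(int n, int maxBatch) {
        if (maxBatch <= 0) {
            throw new IllegalArgumentException("maxBatch must be positive");
        }
        List<String> backlog = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            backlog.add("message-" + i);
        }
        long polls = 0;
        long operations = 0;
        int offset = 0;
        while (offset < backlog.size()) {
            List<String> batch = fakePoll(backlog, offset, maxBatch);
            polls++;
            for (String value : batch) {
                operations++; // stands in for printing one record value
            }
            offset += batch.size();
        }
        return new long[] {polls, operations};
    }

    public static void main(String[] args) {
        for (int n : new int[] {10, 100, 1000}) {
            long[] result = drain(n, 500);
            System.out.println(n + " messages: " + result[0] + " polls, " + result[1] + " operations");
        }
    }
}
```

With a cap of 500, this reports 1 poll for 10 and for 100 messages and 2 polls for 1000, while the operation count always equals the message count. Changing the cap changes how the work is split across polls, not how much per-message work there is in total.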
[X] Wrong: "Polling once will always take the same time regardless of messages."
[OK] Correct: Each poll call returns a batch of messages, and the loop then processes every message in that batch, so a larger batch means more work in that iteration.
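How large a batch one poll can return is bounded by consumer configuration. As an illustration that is not part of the original example, the sketch below rebuilds the same properties and adds `max.poll.records`, a standard consumer setting whose default is 500. The class and method names are hypothetical, and only `java.util.Properties` is used, so it runs without a broker.

```java
import java.util.Properties;

// Hypothetical helper class; the names are illustrative, not part of the Kafka API.
class ConsumerConfigSketch {

    // Builds the consumer properties from the example, plus a cap on batch size.
    static Properties build(int maxPollRecords) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "test-group");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        // Upper bound on the number of records a single poll() call returns.
        props.put("max.poll.records", String.valueOf(maxPollRecords));
        return props;
    }

    public static void main(String[] args) {
        Properties props = build(1000);
        System.out.println(props.getProperty("max.poll.records"));
    }
}
```

Passing these properties to the KafkaConsumer constructor bounds n for any single poll, so the cost of one loop iteration has a ceiling even though total work over the stream still grows linearly with the number of messages.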
Understanding how message processing time grows helps you design efficient consumers and shows you can reason about real-world streaming data.
"What if we processed messages in parallel inside the poll loop? How would that affect the time complexity?"
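One way to explore this question is a small sketch that hands each message in a batch to a fixed pool of worker threads. It is an assumption-laden illustration, not a recommended consumer design: the batch is a plain list, `ParallelBatchSketch`, `processInParallel`, and `idealRounds` are hypothetical names, and concerns such as per-partition ordering and offset commits are left out. Because KafkaConsumer is not safe for multi-threaded access, a real version would keep the poll call on one thread and pass only the records to the workers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: the batch is a plain list, not a real ConsumerRecords.
class ParallelBatchSketch {

    // Handles every message in the batch on `workers` threads and returns
    // the total number of per-message operations that ran.
    static long processInParallel(List<String> batch, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        AtomicLong operations = new AtomicLong();
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (String value : batch) {
                pending.add(pool.submit(() -> {
                    operations.incrementAndGet(); // stands in for handling one message
                }));
            }
            for (Future<?> task : pending) {
                task.get(); // wait for the whole batch before the next poll
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for batch", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("message handler failed", e);
        } finally {
            pool.shutdown();
        }
        return operations.get();
    }

    // Best-case number of sequential rounds when n messages are spread
    // evenly over the workers: ceil(n / workers).
    static long idealRounds(long n, int workers) {
        return (n + workers - 1) / workers;
    }
}
```

The total number of operations is still n, so the work remains O(n). With p workers the elapsed time per batch is at best about n / p, which is a constant-factor speedup: for a fixed p it is still O(n).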