Why Kafka Exists: A Performance Analysis
We can understand why Kafka was created by looking at how it handles data flow as volume grows over time.
What question are we trying to answer? How does Kafka handle a growing number of messages efficiently?
Analyze the time complexity of this simple Kafka producer and consumer interaction.
```java
// Producer sends a record to a topic (topic, key, value are defined elsewhere)
producer.send(new ProducerRecord<>(topic, key, value));

// Consumer polls a batch of messages from the topic
ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
for (ConsumerRecord<String, String> record : records) {
    process(record.value()); // one processing step per message
}
```
This code shows how messages are sent and received in Kafka, representing the core data flow.
Look at what repeats as data grows.
- Primary operation: The consumer loops over all messages received in each poll.
- How many times: Once per message in the batch, which depends on how many messages the producer sent.
As more messages are sent, the consumer has more to process each time it polls.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | Processes 10 messages |
| 100 | Processes 100 messages |
| 1000 | Processes 1000 messages |
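The growth in the table can be checked with a small simulation. This is a sketch, not Kafka code: the class `LinearScan` and its `operationsFor` method are illustrative names standing in for the consumer's per-message loop, counting one operation per message in a batch of size n.

```java
public class LinearScan {
    // Count one operation per message in a batch of size n,
    // mirroring the consumer's for-loop over records.
    static int operationsFor(int n) {
        int ops = 0;
        for (int i = 0; i < n; i++) {
            ops++; // stand-in for one process(record.value()) call
        }
        return ops;
    }

    public static void main(String[] args) {
        for (int n : new int[]{10, 100, 1000}) {
            System.out.println("n=" + n + " -> " + operationsFor(n) + " operations");
        }
    }
}
```

Running this reproduces the table: 10 messages take 10 operations, 1000 messages take 1000.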
Pattern observation: The work grows directly with the number of messages; more messages mean more processing time.
Time Complexity: O(n)
This means the time to process messages grows linearly with the number of messages.
[X] Wrong: "Kafka processes all messages instantly regardless of how many there are."
[OK] Correct: Each message must be handled one by one, so more messages take more time.
Understanding how Kafka handles growing data helps you explain real-world systems that manage streams of information efficiently.
"What if the consumer processed messages in parallel? How would the time complexity change?"
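One way to explore that question is a sketch with a fixed thread pool (the class and method names here are hypothetical, not Kafka APIs). Splitting a batch of n messages across p workers leaves the total work at O(n), since every message is still processed once, but each worker handles only about n/p messages, so wall-clock time can shrink toward O(n/p).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelProcessingSketch {
    // Process n messages using p worker threads; returns the total processed.
    static int processInParallel(int n, int p) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(p);
        List<Callable<Integer>> tasks = new ArrayList<>();
        int chunk = (n + p - 1) / p; // ceil(n / p) messages per worker
        for (int start = 0; start < n; start += chunk) {
            final int from = start;
            final int to = Math.min(start + chunk, n);
            tasks.add(() -> {
                int count = 0;
                for (int i = from; i < to; i++) {
                    count++; // stand-in for process(record.value())
                }
                return count;
            });
        }
        int total = 0;
        for (Future<Integer> f : pool.invokeAll(tasks)) {
            total += f.get();
        }
        pool.shutdown();
        return total;
    }

    public static void main(String[] args) throws Exception {
        // 1000 messages split across 4 workers: total work is unchanged
        System.out.println(processInParallel(1000, 4)); // prints 1000
    }
}
```

Note that in real Kafka deployments this idea appears as consumer groups: partitions of a topic are divided among consumers, which parallelizes processing across machines rather than threads.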