Stream topology in Kafka - Time & Space Complexity
When working with Kafka stream topology, it's important to understand how the processing time grows as the data flows through the stream steps.
We want to know how the number of operations changes as the input data size increases in the stream.
Analyze the time complexity of the following Kafka Streams topology setup.
StreamsBuilder builder = new StreamsBuilder();
KStream<String, String> source = builder.stream("input-topic");
KStream<String, String> filtered = source.filter((key, value) -> value.contains("important"));
KStream<String, String> mapped = filtered.mapValues(value -> value.toUpperCase());
mapped.to("output-topic");
KafkaStreams streams = new KafkaStreams(builder.build(), props);
streams.start();
This code creates a stream that reads messages, filters them by a condition, transforms the values, and writes to an output topic.
Look at the operations that happen repeatedly for each message in the stream.
- Primary operation: Processing each message through filter and map steps.
- How many times: Once per message arriving in the input topic.
Each message is processed one by one through the steps, so the total work grows as more messages arrive.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 10 filter and map operations |
| 100 | About 100 filter and map operations |
| 1000 | About 1000 filter and map operations |
Pattern observation: The work grows linearly with the number of messages.
Time Complexity: O(n)
This means the processing time increases directly in proportion to the number of messages.
[X] Wrong: "The filter and map steps run only once regardless of input size."
[OK] Correct: Each message is processed individually, so these steps happen for every message, making the total work grow with input size.
Understanding how stream processing scales with data size helps you explain system behavior clearly and shows you grasp real-world data flow challenges.
"What if we added a nested loop inside the mapValues step that processes each message multiple times? How would the time complexity change?"