Windowed operations in Kafka - Time & Space Complexity
When working with windowed operations in Kafka Streams, it's important to understand how processing time scales as data grows. The question here is: how does the time to process messages change as more messages flow through a windowed aggregation?
Analyze the time complexity of the following Kafka Streams windowed aggregation.
```java
KStream<String, Long> stream = builder.stream("input-topic");

KTable<Windowed<String>, Long> windowedCounts = stream
    .groupByKey()
    .windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofMinutes(5)))
    .count();
```
This code groups messages by key, applies a 5-minute time window, and counts messages per window.
Look at what repeats as data flows through the stream.
- Primary operation: For each incoming message, the system updates the count for its key in the current time window.
- How many times: Once per message, continuously as messages arrive.
As more messages arrive, the system updates counts for each message's window.
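To make the "one update per message" behavior concrete, here is a minimal plain-Java simulation of a tumbling-window count (hypothetical helper names, not Kafka APIs): each record is assigned to exactly one window bucket and triggers exactly one constant-time map update.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch (not Kafka code): simulates a tumbling-window count.
// Each record touches exactly one bucket, so total updates equal record count.
public class WindowedCountSketch {
    static final long WINDOW_MS = 5 * 60 * 1000; // 5-minute tumbling window

    // Bucket key: "key@windowStart"; value: running count for that window.
    static Map<String, Long> counts = new HashMap<>();
    static long updates = 0; // operation counter, to observe O(n) growth

    // Align a timestamp to the start of its window.
    static long windowStart(long timestampMs) {
        return timestampMs - (timestampMs % WINDOW_MS);
    }

    static void process(String key, long timestampMs) {
        String bucket = key + "@" + windowStart(timestampMs);
        counts.merge(bucket, 1L, Long::sum); // one constant-time update
        updates++;
    }

    public static void main(String[] args) {
        long[] timestamps = {0, 60_000, 310_000, 320_000};
        for (long ts : timestamps) {
            process("sensor-1", ts);
        }
        // Four records produce four updates across two 5-minute windows.
        System.out.println("updates=" + updates + " windows=" + counts.size());
    }
}
```

Note that no past record is ever revisited: the cost per incoming message stays constant regardless of how many messages came before.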
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | 10 updates to window counts |
| 100 | 100 updates to window counts |
| 1000 | 1000 updates to window counts |
Pattern observation: The number of operations grows directly with the number of messages processed.
Time Complexity: O(n)
Processing time grows linearly with the number of messages in the stream, because each message triggers a constant amount of work. Space behaves differently: the state store keeps one count per key per retained window, so memory depends on the number of distinct keys and active windows rather than on the total message count.
[X] Wrong: "Windowed operations process all past messages every time a new message arrives."
[OK] Correct: Each message updates only the count for its own window; past messages are never reprocessed, so total work grows linearly (O(n)), not quadratically.
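To see why the misconception matters, a small back-of-the-envelope comparison (plain Java with hypothetical helper names, not Kafka APIs) counts the operations each model would perform: the incremental model does one update per record, while reprocessing all past records on every arrival sums to 1 + 2 + ... + n.

```java
// Hypothetical cost comparison: incremental windowed aggregation
// versus the "reprocess everything on each arrival" misconception.
public class IncrementalVsReprocess {
    // Incremental model: one constant-time update per record -> n total.
    static long incrementalOps(long n) {
        return n;
    }

    // Misconception: record i reprocesses all i earlier records,
    // giving 1 + 2 + ... + n = n(n+1)/2 operations (quadratic growth).
    static long reprocessOps(long n) {
        return n * (n + 1) / 2;
    }

    public static void main(String[] args) {
        for (long n : new long[]{10, 100, 1000}) {
            System.out.println(n + " records: incremental=" + incrementalOps(n)
                    + ", reprocess-all=" + reprocessOps(n));
        }
    }
}
```

At 1000 records the gap is already 1000 updates versus 500500, which is why incremental aggregation is what makes real-time windowed counting practical.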
Understanding how windowed operations scale helps you explain real-time data processing clearly and confidently.
What if we changed the window size to 1 hour instead of 5 minutes? How would the time complexity change?