Kappa architecture (streaming only) in Hadoop - Time & Space Complexity
We want to understand how the time needed to process data grows when using the Kappa architecture for streaming data: how does the system keep up as more data keeps coming in?
Analyze the time complexity of the following streaming processing code snippet.
```
// Pseudocode for streaming data processing in Kappa architecture
stream = readFromKafka(topic)
processedStream = stream.map(record => transform(record))
processedStream.foreachBatch(batch => {
  batch.writeToStorage()
})
```
This code reads data continuously from a stream, transforms each record, and writes batches to storage.
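As a concrete illustration, here is a minimal runnable Python sketch of the same pipeline. The names `read_from_stream`, `transform`, and `write_to_storage` are stand-ins for the pseudocode's `readFromKafka`, `transform`, and `writeToStorage`; a real deployment would use a Kafka client and a durable sink instead of in-memory lists.

```python
from typing import Iterable, List

def transform(record: str) -> str:
    # Stand-in for the per-record transformation in the pseudocode.
    return record.upper()

def read_from_stream() -> Iterable[str]:
    # Stand-in for readFromKafka(topic): here, a finite in-memory source.
    yield from ["a", "b", "c", "d", "e"]

def write_to_storage(batch: List[str], sink: List[str]) -> None:
    # Stand-in for batch.writeToStorage(): appends to an in-memory sink.
    sink.extend(batch)

def process(batch_size: int = 2) -> List[str]:
    sink: List[str] = []
    batch: List[str] = []
    for record in read_from_stream():      # stream = readFromKafka(topic)
        batch.append(transform(record))    # stream.map(record => transform(record))
        if len(batch) == batch_size:
            write_to_storage(batch, sink)  # foreachBatch -> writeToStorage
            batch = []
    if batch:                              # flush any partial final batch
        write_to_storage(batch, sink)
    return sink

print(process())  # → ['A', 'B', 'C', 'D', 'E']
```

Each record passes through `transform` exactly once, which is the repeated operation the analysis below counts.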
Look at what repeats as data flows in.
- Primary operation: Processing each record in the stream (map transformation).
- How many times: Once per incoming record, continuously as data arrives.
As more records come in, the system processes each one individually.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | 10 transformations |
| 100 | 100 transformations |
| 1000 | 1000 transformations |
Pattern observation: The number of operations grows directly with the number of records.
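You can verify the table's pattern with a tiny counter. This is a hypothetical instrumented loop, not part of the pipeline; it just counts one operation per record, mirroring the "one transform per record" rule above.

```python
def count_operations(n: int) -> int:
    # One "operation" per incoming record, as in the table above.
    ops = 0
    for _ in range(n):
        ops += 1  # the single map transformation applied to this record
    return ops

for n in (10, 100, 1000):
    print(n, count_operations(n))
```

For every input size, the operation count equals the record count, which is exactly the linear growth the table shows.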
Time Complexity: O(n), where n is the number of incoming records.
This means the time to process data grows linearly with the number of records. Space, by contrast, stays bounded: at any moment the pipeline holds only the current batch, not everything it has ever seen, since streaming works on data in motion.
[X] Wrong: "Streaming processing time stays the same no matter how much data arrives."
[OK] Correct: Each new record needs processing, so more data means more work and more time.
Understanding how streaming systems scale with data helps you explain real-time data processing clearly and confidently.
"What if the transform step became more complex and took longer per record? How would the time complexity change?"
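One way to reason about that question: if each record's transform takes k units of work instead of one, the total becomes n × k operations, i.e. O(n·k), or still O(n) when k is a fixed constant. The sketch below is a hypothetical illustration of that counting; `transform_complex` and `process_all` are made-up names.

```python
def transform_complex(record: int, k: int) -> int:
    # Hypothetical heavier transform: performs k units of work per record.
    total = record
    for _ in range(k):
        total += 1
    return total

def process_all(records, k: int) -> int:
    # Returns the total operation count: k operations for each record.
    ops = 0
    for r in records:
        transform_complex(r, k)
        ops += k
    return ops

# 100 records, each costing k = 5 units: 100 * 5 total operations.
print(process_all(range(100), k=5))  # → 500
```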