Lambda architecture (batch + streaming) in Hadoop - Time & Space Complexity

Time Complexity, Lambda architecture (batch + streaming): O(n)

Understanding Time Complexity

We want to understand how the time needed to process data grows in a Lambda architecture using Hadoop.

Specifically, we look at how the batch and streaming layers each contribute to the total work as data size increases.

Scenario Under Consideration

Analyze the time complexity of the following Hadoop code snippet for batch and streaming layers.

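import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

// Batch layer: process the full dataset with a Hadoop MapReduce job.
// BatchMapper and BatchReducer are user-defined classes; input/output paths are omitted here.
Job job = Job.getInstance(new Configuration(), "batch-layer");
job.setMapperClass(BatchMapper.class);
job.setReducerClass(BatchReducer.class);
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
job.waitForCompletion(true); // blocks until this batch run finishes

// Speed layer: process streaming data in small batches.
// StreamingJob is an illustrative placeholder here, not part of the core MapReduce API.
StreamingJob streamingJob = new StreamingJob();
streamingJob.setMapperClass(StreamMapper.class);
streamingJob.setReducerClass(StreamReducer.class);
streamingJob.run();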

This code runs two parts: batch jobs on big data sets and streaming jobs on small, fast data chunks.

Identify Repeating Operations

Look at the loops and repeated processing steps.

  • Primary operation: The batch layer's MapReduce job reads every record in the dataset once per batch run.
  • How many times: The batch job re-runs periodically over the full dataset; the streaming job runs continuously, each time over one small chunk of new data.
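
The repeated operation in the batch layer can be made visible with a small plain-Java stand-in. This is only a sketch under stated assumptions: it uses no Hadoop classes, and the names MapCallCounter, map, and runBatch are hypothetical. It counts how often the map step runs during one full pass, which is once per input record.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java stand-in for one MapReduce pass; no Hadoop classes are used.
final class MapCallCounter {
    // Number of times the map step has run; grows by one per input record.
    static long mapCalls = 0;

    // "Map" step: emit the first whitespace-separated token of a record as its key.
    static String map(String record) {
        mapCalls++;
        return record.split("\\s+")[0];
    }

    // One full batch pass: map every record, then "reduce" by counting per key.
    static Map<String, Integer> runBatch(List<String> records) {
        Map<String, Integer> counts = new HashMap<>();
        for (String record : records) {
            counts.merge(map(record), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> records = List.of("a 1", "b 2", "a 3");
        Map<String, Integer> counts = runBatch(records);
        System.out.println(counts + " after " + mapCalls + " map calls");
    }
}
```

With three input records the map step runs three times; with n records it runs n times, which is where the linear batch cost comes from.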

How Execution Grows With Input

Batch layer time grows with the total data size; streaming layer time grows with the incoming data rate.

Input Size (n)   Approx. Batch Operations   Approx. Streaming Operations
10 GB            10 units                   1 unit per small chunk
100 GB           100 units                  1 unit per small chunk
1000 GB          1000 units                 1 unit per small chunk

Batch work grows linearly with total data size; streaming work depends on how fast data arrives, not total size.
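
The growth pattern can be reproduced with a tiny cost model. This is a rough sketch, not a measurement: it assumes one work unit per GB scanned by a batch run and one unit per fixed-size streaming chunk, and the class and method names (LambdaCostModel, batchUnits, streamUnitsPerChunk, streamUnits) are hypothetical.

```java
// Toy cost model for the two layers; all unit costs are assumptions.
final class LambdaCostModel {
    // Assumption: a full batch run costs one work unit per GB in the dataset.
    static long batchUnits(long totalGb) {
        return totalGb;
    }

    // Assumption: every chunk has the same small size, so it costs one unit
    // no matter how much data has accumulated in total.
    static long streamUnitsPerChunk(long totalGb) {
        return 1;
    }

    // Streaming work over a time window depends on how many chunks arrive.
    static long streamUnits(long chunksPerSecond, long seconds) {
        return chunksPerSecond * seconds;
    }

    public static void main(String[] args) {
        for (long gb : new long[] {10, 100, 1000}) {
            System.out.println(gb + " GB: batch=" + batchUnits(gb)
                    + " units, streaming=" + streamUnitsPerChunk(gb) + " unit per chunk");
        }
    }
}
```

Note that streamUnits takes an arrival rate and a duration but never the total dataset size, mirroring the claim that streaming work follows the data rate.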

Final Time Complexity

Time Complexity: O(n)

This means each batch run takes time proportional to n, the total size of the data processed. The streaming layer's cost per chunk depends on the chunk size, not on n.

Common Mistake

[X] Wrong: "Streaming layer time grows the same as batch because it processes all data."

[OK] Correct: Streaming processes small, recent data chunks continuously, not the entire dataset, so its time depends on data arrival rate, not total size.
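
To see why the streaming layer does not rescan history, consider a minimal sketch of an incrementally maintained view. It assumes the speed layer keeps a running aggregate and folds each new chunk into it; the class SpeedLayerView and its methods are hypothetical and not part of any Hadoop API.

```java
// Sketch of a speed-layer view that is updated incrementally per chunk.
final class SpeedLayerView {
    private long runningSum = 0;   // real-time aggregate kept up to date
    private long recordsSeen = 0;  // size of the history absorbed so far

    // Fold one chunk into the view; returns how many records this call touched.
    long onChunk(long[] chunk) {
        long touched = 0;
        for (long value : chunk) {
            runningSum += value;
            touched++;
        }
        recordsSeen += touched;
        return touched;
    }

    long runningSum() {
        return runningSum;
    }

    long recordsSeen() {
        return recordsSeen;
    }

    public static void main(String[] args) {
        SpeedLayerView view = new SpeedLayerView();
        view.onChunk(new long[1000]);                    // existing history of 1000 records
        long touched = view.onChunk(new long[] {1, 2, 3}); // new chunk of 3 records
        System.out.println("history=" + view.recordsSeen() + ", touched by last chunk=" + touched);
    }
}
```

The second call touches only the three new records even though the view already reflects 1000 earlier ones, so the per-chunk cost tracks the chunk size rather than the accumulated total.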

Interview Connect

Understanding how batch and streaming parts scale helps you explain real-world data processing systems clearly and confidently.

Self-Check

"What if the streaming layer started processing larger batches instead of small chunks? How would the time complexity change?"