
MapReduce job execution flow in Hadoop - Time & Space Complexity

Time Complexity: O(n)
Understanding Time Complexity

We want to understand how the time to run a MapReduce job changes as the input data grows.

How does the job's execution time grow when we add more data to process?

Scenario Under Consideration

Analyze the time complexity of the following MapReduce job execution flow.

// Simplified MapReduce job flow (word count)
Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "word count"); // the Job(conf, name) constructor is deprecated
job.setMapperClass(WordCountMapper.class);
job.setReducerClass(WordCountReducer.class);
job.setOutputKeyClass(Text.class);             // key/value types the reducer emits
job.setOutputValueClass(IntWritable.class);
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
FileInputFormat.addInputPath(job, new Path(inputPath));
FileOutputFormat.setOutputPath(job, new Path(outputPath));
job.waitForCompletion(true);                   // blocks until the job finishes

This code sets up and runs a MapReduce job that counts words in input files.
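The per-record logic that the WordCountMapper and WordCountReducer classes referenced above would implement can be sketched in plain Java, with no Hadoop dependencies. This is an illustrative sketch, not Hadoop's actual API: the shuffle/sort phase is simulated with an in-memory sorted map, and the class and method names are hypothetical.

```java
import java.util.*;

// Plain-Java sketch of word count's map and reduce logic.
// The "shuffle" between the two phases is simulated with a TreeMap.
public class WordCountSketch {

    // Map phase: called once per input record (line), emitting (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(Map.entry(word, 1));
            }
        }
        return pairs;
    }

    // Reduce phase: called once per unique key, summing that key's values.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) sum += v;
        return sum;
    }

    // Drives both phases over an in-memory "input split".
    static Map<String, Integer> wordCount(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>(); // simulated shuffle/sort
        for (String line : lines) {                           // one map() call per record
            for (Map.Entry<String, Integer> pair : map(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            counts.put(e.getKey(), reduce(e.getValue()));     // one call per unique key
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(wordCount(List.of("to be or not to be")));
    }
}
```

The structure mirrors the real job: the map loop is the part that scales with input size, while the reduce loop scales with the number of distinct keys.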

Identify Repeating Operations
  • Primary operation: The Mapper processes each input record once.
  • How many times: Once per input record, so as many times as there are records.
  • Secondary operation: The Reducer processes each unique key along with that key's values.
  • How many times: Once per unique key emitted by the mappers, which depends on the data's key distribution.
How Execution Grows With Input

As the input data grows, the Mapper runs more times: once per record.

Input Size (n)    Approx. Operations
10                About 10 map operations
100               About 100 map operations
1000              About 1000 map operations

Pattern observation: The number of map operations grows directly with input size.
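The pattern in the table above can be checked with a small counting simulation. This is a hypothetical illustration, not Hadoop code: it simply counts one map invocation per input record and confirms that doubling the input doubles the operation count.

```java
// Hypothetical simulation: count map invocations as the number of
// input records n grows. The count equals n, i.e. linear growth.
public class MapOpCounter {

    // Returns the number of map invocations needed for n input records.
    static long mapOpsFor(int n) {
        long mapCalls = 0;
        for (int record = 0; record < n; record++) {
            mapCalls++; // exactly one map() call per record
        }
        return mapCalls;
    }

    public static void main(String[] args) {
        for (int n : new int[] {10, 100, 1000}) {
            System.out.println(n + " records -> " + mapOpsFor(n) + " map operations");
        }
        // Doubling the input doubles the work: the O(n) signature.
        System.out.println("ratio: " + (mapOpsFor(2000) / mapOpsFor(1000)));
    }
}
```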

Final Time Complexity

Time Complexity: O(n)

This means the job's running time grows roughly in direct proportion to the amount of input data.

Common Mistake

[X] Wrong: "The Reducer runs once regardless of input size."

[OK] Correct: The Reducer runs once per unique key, so if input grows and produces more unique keys, the Reducer work also grows.
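The correction above can be demonstrated concretely. In this sketch (hypothetical names, plain Java), six records are emitted but only four distinct words appear, so the simulated shuffle would trigger four reducer calls, not one.

```java
import java.util.*;

// Sketch showing the Reducer runs once per unique key, not once overall.
// The grouping stands in for Hadoop's shuffle phase.
public class ReducerCallCount {

    // Returns how many reduce calls the shuffle would produce:
    // one per distinct key among the emitted pairs.
    static int reduceInvocations(List<String> emittedKeys) {
        Set<String> uniqueKeys = new HashSet<>(emittedKeys);
        return uniqueKeys.size();
    }

    public static void main(String[] args) {
        // 6 emitted records but only 4 distinct words -> 4 reducer calls.
        List<String> keys = List.of("to", "be", "or", "not", "to", "be");
        System.out.println("reduce calls: " + reduceInvocations(keys));
    }
}
```

As the input grows and new words appear, the number of distinct keys, and therefore reducer invocations, grows with it.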

Interview Connect

Understanding how MapReduce scales with data size helps you explain how big data jobs handle growth efficiently.

Self-Check

"What if we added a Combiner step before the Reducer? How would the time complexity change?"