0
0
Hadoopdata~5 mins

Word count as MapReduce example in Hadoop - Time & Space Complexity

Choose your learning style9 modes available
Time Complexity: Word count as MapReduce example
O(n)
Understanding Time Complexity

We want to understand how the time needed to count words grows as the input text gets bigger.

How does the number of words affect the work done by MapReduce?

Scenario Under Consideration

Analyze the time complexity of the following code snippet.

public class WordCount {
  public static class Mapper extends org.apache.hadoop.mapreduce.Mapper<Object, Text, Text, IntWritable> {
    public void map(Object key, Text value, Context context) {
      String[] words = value.toString().split("\\s+");
      for (String word : words) {
        context.write(new Text(word), new IntWritable(1));
      }
    }
  }

  public static class Reducer extends org.apache.hadoop.mapreduce.Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterable<IntWritable> values, Context context) {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      context.write(key, new IntWritable(sum));
    }
  }
}

This code counts how many times each word appears in the input text using MapReduce.

Identify Repeating Operations

Identify the loops, recursion, array traversals that repeat.

  • Primary operation: Loop over each word in the input text in the Mapper, and loop over counts for each word in the Reducer.
  • How many times: The Mapper loops once per word in the input; the Reducer loops once per unique word's counts.
How Execution Grows With Input

As the number of words grows, the Mapper processes each word once, so work grows directly with input size.

Input Size (n words)Approx. Operations
10About 10 map operations + reduce on unique words
100About 100 map operations + reduce on unique words
1000About 1000 map operations + reduce on unique words

Pattern observation: The total work grows roughly in direct proportion to the number of words.

Final Time Complexity

Time Complexity: O(n)

This means the time to count words grows linearly with the number of words in the input.

Common Mistake

[X] Wrong: "The reduce step takes as much time as the map step because it loops over all words again."

[OK] Correct: The reduce step only loops over counts for each unique word, which is usually much smaller than total words.

Interview Connect

Understanding how MapReduce scales with input size helps you explain distributed processing clearly and confidently.

Self-Check

"What if the input text had many repeated words? How would that affect the time complexity of the reduce step?"