0
0
Hadoopdata~5 mins

Migration from Hadoop to cloud-native - Time & Space Complexity

Choose your learning style9 modes available
Time Complexity: Migration from Hadoop to cloud-native
O(n)
Understanding Time Complexity

When moving data processing from Hadoop to cloud-native systems, it is important to understand how the time to complete tasks changes.

We want to know how the work grows as data size grows during this migration.

Scenario Under Consideration

Analyze the time complexity of the following Hadoop MapReduce job.

// Hadoop MapReduce example
public class WordCount {
  public static class Mapper extends Mapper {
    public void map(LongWritable key, Text value, Context context) {
      for (String word : value.toString().split(" ")) {
        context.write(new Text(word), new IntWritable(1));
      }
    }
  }

  public static class Reducer extends Reducer {
    public void reduce(Text key, Iterable values, Context context) {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      context.write(key, new IntWritable(sum));
    }
  }
}

This code counts how many times each word appears in a large text dataset using Hadoop MapReduce.

Identify Repeating Operations

Look at the loops that repeat work:

  • Primary operation: The Mapper loops over every word in each input line.
  • How many times: Once for every word in the input data.
  • Secondary operation: The Reducer loops over all counts for each word to sum them.
  • How many times: Once for each occurrence of each word.
How Execution Grows With Input

As the input data size grows, the number of words grows roughly in the same way.

Input Size (n words)Approx. Operations
10About 10 word processing steps
100About 100 word processing steps
1000About 1000 word processing steps

Pattern observation: The work grows directly with the number of words; doubling words doubles work.

Final Time Complexity

Time Complexity: O(n)

This means the time to process grows in a straight line with the amount of data.

Common Mistake

[X] Wrong: "Migrating to cloud-native will always make processing faster regardless of data size."

[OK] Correct: The time still depends on data size; cloud tools may be faster per unit but the overall growth with data remains linear.

Interview Connect

Understanding how data size affects processing time helps you explain migration benefits clearly and shows you grasp core data engineering concepts.

Self-Check

"What if we changed the Reducer to combine counts before sending data? How would the time complexity change?"