MapReduce job tuning parameters in Hadoop - Time & Space Complexity
When tuning MapReduce jobs, the key question is: how does the job's running time grow as the input size or the tuning parameters change?
Analyze the time complexity of this MapReduce job tuning snippet.
```java
// Configure and run a Hadoop MapReduce job (mapper/reducer setup omitted)
Job job = Job.getInstance(new Configuration(), "tuning example");
// Set the number of reducers
job.setNumReduceTasks(numReducers);
// Cap the input split size (in bytes); smaller splits mean more map tasks
FileInputFormat.setMaxInputSplitSize(job, splitSize);
// Submit the job and block until it finishes
job.waitForCompletion(true);
```
This code sets how many reducers run and caps how large each input split can be, then submits the job.
Look at what repeats during the job:
- Primary operation: Map tasks process input splits in parallel.
- How many times: Number of map tasks depends on input size divided by split size.
- Reduce tasks: Number of reducers controls how many reduce operations run after maps finish.
As input size grows, the number of map tasks grows roughly proportional to input size divided by split size.
| Input Size (GB) | Approx. Map Tasks |
|---|---|
| 10 | ≈ 10 GB ÷ split size |
| 100 | ≈ 100 GB ÷ split size |
| 1000 | ≈ 1000 GB ÷ split size |
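The table's arithmetic can be sketched directly. This is a toy estimate, not a Hadoop API; the 128 MB split size is an assumed (though common) default:

```java
// Toy model of map-task count: tasks ≈ ceil(inputSize / splitSize).
// Sizes in GB are illustrative assumptions, not measured Hadoop values.
public class MapTaskEstimate {
    static long mapTasks(double inputGB, double splitGB) {
        return (long) Math.ceil(inputGB / splitGB);
    }

    public static void main(String[] args) {
        double splitGB = 0.128; // assume a 128 MB split size
        for (double inputGB : new double[] {10, 100, 1000}) {
            System.out.println(inputGB + " GB -> " + mapTasks(inputGB, splitGB) + " map tasks");
        }
    }
}
```

Each tenfold increase in input size yields roughly a tenfold increase in map tasks, which is the linear pattern the table shows.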
Pattern observation: More input means more map tasks, so for a fixed split size, execution time grows roughly linearly with input size.
Time Complexity: O(n)
This means the job's running time grows roughly in direct proportion to the input size.
[X] Wrong: "Increasing reducers always makes the job faster."
[OK] Correct: Too many reducers can cause overhead and slow down the job instead of speeding it up.
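This trade-off can be illustrated with a toy cost model (all constants here are invented for illustration, not measured Hadoop values): the reduce phase takes work divided across reducers, plus a fixed per-reducer startup overhead.

```java
// Toy model: reducePhaseTime(r) = totalWork / r + overhead * r.
// With a per-reducer overhead, time falls as r grows, then rises again.
public class ReducerTradeoff {
    static double reducePhaseTime(double totalWork, double overheadPerReducer, int reducers) {
        return totalWork / reducers + overheadPerReducer * reducers;
    }

    public static void main(String[] args) {
        double work = 1000.0, overhead = 2.0; // invented constants
        for (int r : new int[] {5, 10, 25, 100, 500}) {
            System.out.printf("reducers=%-4d time=%.1f%n", r, reducePhaseTime(work, overhead, r));
        }
        // In this model the minimum lies near r = sqrt(work / overhead) ≈ 22;
        // beyond that, adding reducers makes the job slower, not faster.
    }
}
```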
Understanding how tuning parameters affect job time helps you explain performance trade-offs clearly and confidently.
"What if we double the input split size? How would the time complexity change?"
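As a sketch of that question (sizes assumed for illustration): doubling the split size roughly halves the number of map tasks, but the total bytes processed stay the same, so the complexity remains O(n).

```java
// Doubling splitSize halves the map-task count; the total data scanned
// is unchanged, so running time is still O(n) in input size.
public class SplitSizeEffect {
    static long mapTasks(long inputBytes, long splitBytes) {
        return (inputBytes + splitBytes - 1) / splitBytes; // ceiling division
    }

    public static void main(String[] args) {
        long input = 100L * 1024 * 1024 * 1024; // 100 GB (assumed)
        long split = 128L * 1024 * 1024;        // 128 MB (assumed)
        System.out.println(mapTasks(input, split));     // 800 tasks
        System.out.println(mapTasks(input, 2 * split)); // 400 tasks
    }
}
```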