Memory and container sizing in Hadoop - Time & Space Complexity
We want to understand how memory and container sizing affect how long Hadoop jobs take to run. In particular: as input grows, how does the total work scale, and how does the per-container memory setting change that picture?
Analyze the time complexity of the following Hadoop container allocation snippet.
```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.util.Records;

// Request containers with specific memory sizes
Configuration conf = new Configuration();
// Cluster-side cap on container memory (normally set on the ResourceManager;
// shown here for illustration)
conf.setInt("yarn.scheduler.maximum-allocation-mb", 4096);
// Per-task container sizes for this MapReduce job
conf.setInt("mapreduce.map.memory.mb", 2048);
conf.setInt("mapreduce.reduce.memory.mb", 4096);

// Build the launch context for a container that will run a map or reduce task
// (Records.newRecord is the usual YARN idiom; newInstance requires six arguments)
ContainerLaunchContext ctx = Records.newRecord(ContainerLaunchContext.class);
```
This code sets memory sizes for containers that run map and reduce tasks in Hadoop.
Look at what repeats during job execution.
- Primary operation: Running map and reduce tasks inside containers.
- How many times: Once per map or reduce task, for every task in the job.
When the input data grows, Hadoop creates more input splits, so more tasks run and more containers launch.
| Input Splits (n) | Containers Launched |
|---|---|
| 10 | 10 |
| 100 | 100 |
| 1,000 | 1,000 |
Pattern observation: Operations grow linearly with input size because each input split needs its own container.
Time Complexity: O(n)
This means the total work grows in direct proportion to the number of input splits or tasks.
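The linear relationship can be sketched in plain Java. This is a simplified model (the method and constant names are hypothetical, not Hadoop APIs): with a fixed split size, the number of map containers is the ceiling of input size over split size, so containers grow in direct proportion to input.

```java
public class ContainerCountSketch {
    // Simplified model: one map container per input split (ceiling division)
    static long mapContainers(long inputSizeMb, long splitSizeMb) {
        return (inputSizeMb + splitSizeMb - 1) / splitSizeMb;
    }

    public static void main(String[] args) {
        long splitSizeMb = 128; // a common HDFS block/split size
        for (long inputGb : new long[]{1, 10, 100}) {
            long containers = mapContainers(inputGb * 1024, splitSizeMb);
            System.out.println(inputGb + " GB -> " + containers + " containers");
        }
        // Doubling the input doubles the container count: O(n) in input splits.
    }
}
```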
[X] Wrong: "Increasing container memory always makes the job run faster."
[OK] Correct: More memory per container can reduce parallelism when total cluster memory is fixed, which can increase total job time.
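The tradeoff can be made concrete with a back-of-the-envelope model (plain Java, with assumed numbers, not a Hadoop API): a fixed-size cluster can host only so many containers at once, so the job runs in sequential "waves." Doubling container memory halves the concurrent slots and roughly doubles the number of waves.

```java
public class ParallelismSketch {
    // How many containers the cluster can run at once (fixed-memory model)
    static long concurrentContainers(long clusterMemMb, long containerMemMb) {
        return clusterMemMb / containerMemMb;
    }

    // How many sequential waves are needed to run all tasks
    static long waves(long tasks, long concurrent) {
        return (tasks + concurrent - 1) / concurrent;
    }

    public static void main(String[] args) {
        long clusterMemMb = 64 * 1024; // assume a 64 GB cluster
        long tasks = 800;              // assume 800 map tasks

        long slots2g = concurrentContainers(clusterMemMb, 2048);
        long slots4g = concurrentContainers(clusterMemMb, 4096);

        System.out.println("2 GB containers: " + slots2g + " at once, "
                + waves(tasks, slots2g) + " waves");
        System.out.println("4 GB containers: " + slots4g + " at once, "
                + waves(tasks, slots4g) + " waves");
    }
}
```

The asymptotic complexity stays O(n), but the constant factor changes: fewer concurrent slots means more waves, so the same number of tasks takes longer in wall-clock time unless the extra memory speeds each task up enough to compensate.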
Understanding how container memory sizing affects job runtime helps you design efficient Hadoop jobs and demonstrates that you can reason about resource use and scaling.
"What if we doubled the container memory but kept total cluster memory the same? How would the time complexity change?"