YARN vs MapReduce v1 in Hadoop - Performance Comparison
We want to understand how the time to run jobs changes when using YARN compared to MapReduce v1.
How does each system handle tasks as the data size grows?
Analyze the time complexity of the resource management in these simplified code snippets.
// MapReduce v1
while (job not finished) {
    allocate slots;
    run map tasks;
    run reduce tasks;
}

// YARN
while (job not finished) {
    request containers;
    run tasks in containers;
    monitor and reallocate;
}
These snippets show how each system manages resources and tasks during job execution.
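Both loops have the same shape, which a small Python sketch can make concrete. This is a toy model, not Hadoop code: the function name `count_operations`, the `batch_size` parameter (standing in for the slots or containers available per round), and the rule of one allocation per task are assumptions made for illustration.

```python
def count_operations(num_tasks, batch_size):
    """Return (loop_iterations, task_allocations) for a toy job.

    batch_size is a hypothetical stand-in for the number of slots
    (MapReduce v1) or containers (YARN) available in each round.
    """
    remaining = num_tasks
    iterations = 0
    allocations = 0
    while remaining > 0:                      # while (job not finished)
        batch = min(batch_size, remaining)    # allocate slots / request containers
        allocations += batch                  # one allocation per task in the batch
        remaining -= batch                    # run the tasks in this batch
        iterations += 1                       # next round (monitor and reallocate)
    return iterations, allocations


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, count_operations(n, batch_size=4))
```

In this model, a larger `batch_size` reduces the number of loop iterations, but the number of task allocations always equals `num_tasks`.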
Look at what repeats as the job runs.
- Primary operation: Looping over tasks to allocate resources and run them.
- How many times: Once per task batch until all tasks finish.
As data size grows, the job is split into more tasks, and each task needs its own resource allocation and run time.
| Number of Tasks (n) | Approx. Operations |
|---|---|
| 10 | About 10 task allocations and runs |
| 100 | About 100 task allocations and runs |
| 1000 | About 1000 task allocations and runs |
Pattern observation: The number of operations grows roughly in direct proportion to the number of tasks.
Time Complexity: O(n)
This means the total allocation and execution work grows linearly with the number of tasks, which in turn grows with the data size.
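The linear bound follows from a short calculation. Assume, purely for illustration, that allocating a task costs a constant c_a, running it costs a constant c_r, and each loop iteration handles up to b tasks. These symbols are not part of the snippets; they only name the quantities being counted.

```latex
T(n) \;=\; \sum_{i=1}^{\lceil n/b \rceil} k_i \,(c_a + c_r)
     \;=\; (c_a + c_r) \sum_{i=1}^{\lceil n/b \rceil} k_i
     \;=\; n\,(c_a + c_r)
     \;=\; O(n),
\qquad \text{where } k_i \le b \text{ is the number of tasks in batch } i .
```

The loop runs ⌈n/b⌉ times, but the batches together contain exactly n tasks, so the total work is linear in n whatever the batch size.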
[X] Wrong: "YARN always runs faster because it is newer and more complex."
[OK] Correct: Both systems handle tasks in loops. YARN can improve cluster utilization by replacing MapReduce v1's fixed map and reduce slots with general-purpose containers, but the total work still grows linearly with data size.
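A toy comparison of "waves" (rounds of tasks that run together) shows the difference between better resource use and lower complexity. The sketch assumes a MapReduce v1 cluster with 6 map slots and 4 reduce slots, and a YARN cluster with the same total capacity as 10 interchangeable containers. The function names, the slot counts, and the wave-counting model are illustrative assumptions; the model ignores overlap between map and reduce phases, container sizing, and scheduling overhead.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b."""
    return -(-a // b)


def mrv1_waves(map_tasks, reduce_tasks, map_slots, reduce_slots):
    """Map tasks may only use map slots; reduce tasks only reduce slots."""
    return ceil_div(map_tasks, map_slots) + ceil_div(reduce_tasks, reduce_slots)


def yarn_waves(map_tasks, reduce_tasks, containers):
    """Any container can host either kind of task."""
    return ceil_div(map_tasks, containers) + ceil_div(reduce_tasks, containers)


if __name__ == "__main__":
    for n in (100, 1000, 10000):
        maps, reduces = n, n // 10   # hypothetical 10:1 map-to-reduce ratio
        print(n, mrv1_waves(maps, reduces, 6, 4), yarn_waves(maps, reduces, 10))
```

Under these assumptions the YARN-style model needs fewer waves at every size, yet both counts grow about tenfold whenever the task count grows tenfold. The constant factor improves; the linear growth does not.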
Understanding how resource management affects time helps you explain system design choices clearly and confidently.
What if YARN could run multiple tasks in parallel containers without waiting? How would the time complexity change?