YARN vs MapReduce v1 in Hadoop - Visual Side-by-Side Comparison

Concept Flow - YARN vs MapReduce v1
MapReduce v1: Job Submission → JobTracker manages resources & tasks → TaskTrackers run map/reduce tasks → Job completes
YARN Architecture: Job Submission → ResourceManager manages cluster resources → NodeManagers run containers → ApplicationMaster manages job tasks → Job completes
This flow shows job execution in MapReduce v1 (JobTracker and TaskTrackers) versus YARN (ResourceManager, NodeManagers, and ApplicationMaster).
Execution Sample
# Pseudocode for job flow
submit_job()
if system == 'MapReduce v1':
  JobTracker.assign_tasks()
  TaskTrackers.run_tasks()
else:
  ResourceManager.allocate_resources()
  ApplicationMaster.manage_tasks()
  NodeManagers.run_containers()
This pseudocode shows how job submission differs between MapReduce v1 and YARN.
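The pseudocode above can be turned into a small runnable sketch. The Python below is a toy model only: `job_flow` and the step strings are hypothetical names chosen for illustration and do not correspond to any Hadoop API. It returns the ordered steps a job passes through in each architecture.

```python
def job_flow(system: str) -> list[str]:
    """Return the ordered steps a job goes through (toy model, not a Hadoop API)."""
    steps = ["User: submit job"]
    if system == "MapReduce v1":
        steps += [
            "JobTracker: schedule tasks and manage cluster resources",
            "TaskTrackers: run map/reduce tasks",
            "JobTracker: monitor tasks and handle failures",
        ]
    elif system == "YARN":
        steps += [
            "ResourceManager: allocate cluster resources",
            "ApplicationMaster: coordinate the job's tasks",
            "NodeManagers: run tasks in containers",
            "ApplicationMaster: monitor tasks and handle failures",
        ]
    else:
        raise ValueError(f"unknown system: {system!r}")
    steps.append("Job completes")
    return steps


if __name__ == "__main__":
    for system in ("MapReduce v1", "YARN"):
        print(system)
        for number, step in enumerate(job_flow(system), start=1):
            print(f"  {number}. {step}")
```

The only structural difference between the two branches is who does the monitoring: the JobTracker in MapReduce v1, and a per-job ApplicationMaster in YARN.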
Execution Table
| Step | Component | Action | Result |
|---|---|---|---|
| 1 | User | Submit job | Job request sent |
| 2 | MapReduce v1: JobTracker | Receive job | Schedules tasks and manages cluster |
| 3 | MapReduce v1: TaskTrackers | Run map/reduce tasks | Execute tasks and report status |
| 4 | MapReduce v1: JobTracker | Monitor tasks | Track progress and handle failures |
| 5 | MapReduce v1 | Job completes | Output data ready |
| 6 | User | Submit job | Job request sent |
| 7 | YARN: ResourceManager | Receive job | Allocates cluster resources |
| 8 | YARN: ApplicationMaster | Manage job tasks | Coordinates task execution |
| 9 | YARN: NodeManagers | Run containers | Execute tasks in containers |
| 10 | YARN: ApplicationMaster | Monitor tasks | Track progress and handle failures |
| 11 | YARN | Job completes | Output data ready |
💡 Job completes after all tasks finish successfully in both systems
Variable Tracker
| Component | Initial State | After Job Submission | During Execution | Final State |
|---|---|---|---|---|
| JobTracker | Idle | Receives job | Schedules and monitors tasks | Idle |
| TaskTrackers | Idle | Waiting for tasks | Running map/reduce tasks | Idle |
| ResourceManager | Idle | Receives job | Allocates resources | Idle |
| ApplicationMaster | Not running | Starts for job | Manages tasks | Stops after job |
| NodeManagers | Idle | Ready | Run containers | Idle |
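One detail in the tracker is worth isolating: the ResourceManager and NodeManagers are long-running daemons, while an ApplicationMaster exists only for the lifetime of its job. Here is a minimal sketch of that lifecycle, using hypothetical class and method names (`ToyResourceManager`, `submit`, `finish`, `state_of`) that are not part of Hadoop:

```python
class ToyResourceManager:
    """Toy stand-in for the ResourceManager; holds one ApplicationMaster state per running job."""

    def __init__(self) -> None:
        self.application_masters: dict[str, str] = {}  # job id -> ApplicationMaster state

    def submit(self, job_id: str) -> None:
        # A new ApplicationMaster starts when the job is submitted.
        self.application_masters[job_id] = "Manages tasks"

    def finish(self, job_id: str) -> None:
        # The ApplicationMaster stops once its job is done.
        del self.application_masters[job_id]

    def state_of(self, job_id: str) -> str:
        return self.application_masters.get(job_id, "Not running")


rm = ToyResourceManager()
before = rm.state_of("job-1")
rm.submit("job-1")
during = rm.state_of("job-1")
rm.finish("job-1")
after = rm.state_of("job-1")
print(before, "->", during, "->", after)
# prints: Not running -> Manages tasks -> Not running
```

The `rm` object outlives the job, which mirrors the Idle → ... → Idle row for the ResourceManager, while the per-job entry appears and disappears.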
Key Moments - 3 Insights
Why does MapReduce v1 have a single JobTracker managing everything?
In MapReduce v1, the JobTracker is the single central daemon for both resource management and task scheduling. The design is simple, but it makes the JobTracker a bottleneck and a single point of failure as the cluster grows, as steps 2-4 of the Execution Table show.
How does YARN improve resource management compared to MapReduce v1?
YARN separates resource management (ResourceManager) from job management (ApplicationMaster), distributing responsibilities and improving scalability, as seen in steps 7-10.
What role do NodeManagers play in YARN that TaskTrackers played in MapReduce v1?
NodeManagers in YARN run containers to execute tasks, similar to TaskTrackers running map/reduce tasks, but with more flexibility and better resource isolation (steps 9 vs 3).
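The "more flexibility" point can be made concrete. In MapReduce v1 each TaskTracker is configured with a fixed number of map slots and reduce slots, whereas a YARN NodeManager offers a pool of resources from which containers are allocated. The sketch below is a toy model with made-up numbers (2 map slots and 2 reduce slots, or 8 GB per node with 2 GB per task); the function names are hypothetical and it considers memory only.

```python
def v1_runnable_maps(pending_maps: int, map_slots: int) -> int:
    """MapReduce v1: map tasks can only use map slots; idle reduce slots do not help."""
    return min(pending_maps, map_slots)


def yarn_runnable_tasks(pending_tasks: int, node_memory_gb: int, task_memory_gb: int) -> int:
    """YARN (toy view): a task runs if the node has enough free memory for its container."""
    return min(pending_tasks, node_memory_gb // task_memory_gb)


# Assumed node: 2 map slots + 2 reduce slots in v1, or 8 GB of container memory in YARN.
pending_maps = 4  # a map-heavy phase with no reduce work yet
v1 = v1_runnable_maps(pending_maps, map_slots=2)
yarn = yarn_runnable_tasks(pending_maps, node_memory_gb=8, task_memory_gb=2)
print(f"v1 runs {v1} map tasks at once; YARN runs {yarn}")
# prints: v1 runs 2 map tasks at once; YARN runs 4
```

Under these assumptions the v1 node leaves its two reduce slots idle during the map phase, while the YARN node can use its whole memory pool for whatever tasks are pending.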
Visual Quiz - 3 Questions
Test your understanding
Look at the Execution Table. Which component in MapReduce v1 schedules tasks?
A. JobTracker
B. ResourceManager
C. TaskTrackers
D. ApplicationMaster
💡 Hint
Check step 2 in the Execution Table, where scheduling is mentioned.
At which step does YARN allocate cluster resources?
A. Step 9
B. Step 8
C. Step 7
D. Step 10
💡 Hint
Look for the result 'Allocates cluster resources' in the Execution Table.
If the ApplicationMaster fails during execution, which step in YARN is affected?
A. Step 7
B. Step 10
C. Step 9
D. Step 8
💡 Hint
Refer to the ApplicationMaster's monitoring and failure-handling step in the Execution Table.
Concept Snapshot
YARN vs MapReduce v1:
- MapReduce v1 uses JobTracker (single manager) and TaskTrackers (workers).
- YARN splits roles: ResourceManager (resources), ApplicationMaster (job), NodeManagers (containers).
- YARN improves scalability and cluster utilization by replacing fixed map/reduce slots with general-purpose containers.
- JobTracker is a bottleneck in v1; YARN distributes control.
- Both systems run map and reduce tasks but differ in architecture.
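To see why the single JobTracker becomes a bottleneck, it helps to count what each master has to track. The following back-of-the-envelope sketch uses assumed workload numbers (100 concurrent jobs, 1,000 tasks each) and hypothetical function names. It models only per-task bookkeeping; in a real cluster the ResourceManager still allocates containers, but it does not follow the progress of individual tasks.

```python
def v1_jobtracker_load(jobs: int, tasks_per_job: int) -> int:
    """The single JobTracker tracks every task of every job."""
    return jobs * tasks_per_job


def yarn_load(jobs: int, tasks_per_job: int) -> tuple[int, int]:
    """Return (applications tracked by the ResourceManager, tasks tracked by each ApplicationMaster)."""
    return jobs, tasks_per_job


jobs, tasks_per_job = 100, 1_000  # assumed workload
print("JobTracker tracks:", v1_jobtracker_load(jobs, tasks_per_job), "tasks")
rm_apps, am_tasks = yarn_load(jobs, tasks_per_job)
print("ResourceManager tracks:", rm_apps, "applications")
print("Each ApplicationMaster tracks:", am_tasks, "tasks")
```

With these numbers the JobTracker follows 100,000 tasks on its own, while in YARN no single process follows more than 1,000.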
Full Transcript
This visual execution compares YARN and MapReduce v1 job flows. In MapReduce v1, the JobTracker manages both resources and tasks, while TaskTrackers run the actual map and reduce tasks. This centralization can cause bottlenecks. YARN improves this by splitting responsibilities: the ResourceManager handles cluster resources, the ApplicationMaster manages job tasks, and NodeManagers run containers to execute tasks. The execution table shows each step from job submission to completion in both systems. Variable tracking highlights how components change state during execution. Key moments clarify common confusions about the roles of JobTracker and ApplicationMaster. The quiz tests understanding of which components perform key actions at specific steps. Overall, YARN offers better scalability and resource management than MapReduce v1.