YARN vs MapReduce v1 in Hadoop: Key Differences and Usage
YARN is the resource management layer introduced in Hadoop 2 that separates resource management from job scheduling, while MapReduce v1 tightly couples resource management and job execution in a single daemon. YARN improves cluster utilization and scalability, and supports multiple processing models beyond MapReduce.
Quick Comparison
This table summarizes the main differences between YARN and MapReduce v1 in Hadoop.
| Feature | MapReduce v1 | YARN |
|---|---|---|
| Architecture | Monolithic with JobTracker managing resources and jobs | Separated ResourceManager and ApplicationMaster for resource and job management |
| Resource Management | JobTracker handles both resource allocation and job scheduling | ResourceManager manages resources; ApplicationMaster manages job scheduling |
| Scalability | Limited; single JobTracker is a bottleneck | Highly scalable; multiple ApplicationMasters run concurrently |
| Fault Tolerance | JobTracker failure halts all running jobs | An ApplicationMaster failure affects only its own application; the ResourceManager can run in high-availability mode |
| Support for Processing Models | Only MapReduce | Supports MapReduce and other models like Spark, Tez |
| Cluster Utilization | Lower due to static resource allocation | Higher due to dynamic resource allocation |
Key Differences
MapReduce v1 uses a single master daemon, the JobTracker, that manages both resource allocation and job scheduling across the cluster's TaskTrackers. This design creates a bottleneck and limits scalability because every task depends on the JobTracker's availability and capacity.
YARN (Yet Another Resource Negotiator) separates resource management from job scheduling by introducing a ResourceManager and per-application ApplicationMasters. This allows multiple applications to run simultaneously with better resource sharing and fault isolation.
While MapReduce v1 supports only the MapReduce programming model, YARN is a general-purpose resource management platform that supports various processing frameworks like Spark and Tez. This flexibility improves cluster utilization and enables modern big data workloads.
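This separation is visible even in configuration: the same MapReduce job is pointed at YARN purely by setting the execution framework. A minimal sketch of a `mapred-site.xml`, assuming an otherwise default cluster setup, might look like:

```xml
<?xml version="1.0"?>
<!-- mapred-site.xml: run MapReduce jobs on YARN instead of the v1 runtime. -->
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <!-- "yarn" submits to the ResourceManager; "local" runs in-process for testing -->
    <value>yarn</value>
  </property>
</configuration>
```

The job code itself does not change; only the framework that allocates resources and schedules its tasks does.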
MapReduce v1 Code Example
This is a simple MapReduce v1 word count job in Java.
```java
import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;

public class WordCount {

    // Mapper: emits (word, 1) for every token in the input line.
    public static class TokenizerMapper extends MapReduceBase
            implements Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private final Text word = new Text();

        public void map(Object key, Text value,
                        OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                output.collect(word, one);
            }
        }
    }

    // Reducer: sums the counts for each word.
    public static class IntSumReducer extends MapReduceBase
            implements Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        public void reduce(Text key, Iterator<IntWritable> values,
                           OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            result.set(sum);
            output.collect(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        // JobConf and JobClient belong to the old (v1) mapred API; the JobTracker
        // handles both resource allocation and scheduling for this job.
        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("word count");
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
        conf.setMapperClass(TokenizerMapper.class);
        conf.setCombinerClass(IntSumReducer.class);
        conf.setReducerClass(IntSumReducer.class);
        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);
    }
}
```
YARN Equivalent Code Example
This is the same word count job targeting a YARN cluster. It is written against the newer `org.apache.hadoop.mapreduce` API, but the job logic is identical; what changes is the cluster management and execution environment.
```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountYARN {

    // Mapper: emits (word, 1) for every whitespace-separated token.
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] tokens = value.toString().split("\\s+");
            for (String token : tokens) {
                word.set(token);
                context.write(word, one);
            }
        }
    }

    // Reducer: sums the counts for each word.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Job (org.apache.hadoop.mapreduce) is the newer API; on a YARN cluster
        // the ResourceManager allocates containers and a per-job
        // ApplicationMaster schedules the map and reduce tasks.
        Job job = Job.getInstance(conf, "word count on YARN");
        job.setJarByClass(WordCountYARN.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```
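Both listings implement the same underlying algorithm. Stripped of the Hadoop APIs, the map phase (tokenize, emit a count of 1) and the reduce phase (sum per key) can be sketched in plain Java, with a `HashMap` standing in for the framework's shuffle-and-sort; `WordCountSketch` and `wordCount` are illustrative names, not part of either API:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WordCountSketch {
    // Map phase: split each line into tokens; reduce phase: sum counts per word.
    public static Map<String, Integer> wordCount(Iterable<String> lines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            for (String token : line.trim().split("\\s+")) {
                if (!token.isEmpty()) {
                    counts.merge(token, 1, Integer::sum); // plays the reducer's role
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = wordCount(List.of("hello world", "hello yarn"));
        System.out.println(counts.get("hello")); // prints 2
    }
}
```

In the real frameworks the per-word grouping happens across machines during the shuffle; the `HashMap` merge here collapses that distributed step into a single in-memory table.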
When to Use Which
Choose MapReduce v1 only if you are working with legacy Hadoop clusters that do not support YARN. It is simpler but limited in scalability and flexibility.
Choose YARN for modern Hadoop deployments because it offers better resource management, supports multiple processing frameworks, and scales well for large clusters and diverse workloads.
YARN is the recommended choice for new projects and production environments due to its improved fault tolerance and cluster utilization.