
YARN vs MapReduce v1 in Hadoop: Key Differences and Usage

YARN is the resource management layer introduced in Hadoop 2 that separates resource management from job scheduling, while MapReduce v1 tightly couples resource management and job execution. YARN improves cluster utilization, scalability, and supports multiple processing models beyond MapReduce.
⚖️

Quick Comparison

This table summarizes the main differences between YARN and MapReduce v1 in Hadoop.

| Feature | MapReduce v1 | YARN |
| --- | --- | --- |
| Architecture | Monolithic; the JobTracker manages both resources and jobs | Separated; ResourceManager handles resources, per-application ApplicationMasters handle jobs |
| Resource management | JobTracker handles both resource allocation and job scheduling | ResourceManager manages resources; each ApplicationMaster manages its job's scheduling |
| Scalability | Limited; the single JobTracker is a bottleneck | Highly scalable; multiple ApplicationMasters run concurrently |
| Fault tolerance | JobTracker failure affects all jobs | ResourceManager and ApplicationMasters can fail independently, without cluster-wide impact |
| Processing models | MapReduce only | MapReduce plus other frameworks such as Spark and Tez |
| Cluster utilization | Lower, due to static slot allocation | Higher, due to dynamic resource allocation |
⚖️

Key Differences

MapReduce v1 uses a single master node called the JobTracker that manages both resource allocation and job scheduling. This design creates a bottleneck and limits scalability because all tasks depend on the JobTracker's availability and capacity.

YARN (Yet Another Resource Negotiator) separates resource management from job scheduling by introducing a ResourceManager and per-application ApplicationMasters. This allows multiple applications to run simultaneously with better resource sharing and fault isolation.

While MapReduce v1 supports only the MapReduce programming model, YARN is a general-purpose resource management platform that supports various processing frameworks like Spark and Tez. This flexibility improves cluster utilization and enables modern big data workloads.

💻

MapReduce v1 Code Example

A simple word count job written against the classic MapReduce v1 API (org.apache.hadoop.mapred), submitted to the JobTracker via JobClient.

```java
import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;

public class WordCount {

  // With TextInputFormat, the input key is the byte offset of each line (LongWritable).
  public static class TokenizerMapper extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    public void map(LongWritable key, Text value,
        OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        output.collect(word, one);  // Emit (word, 1) for every token
      }
    }
  }

  public static class IntSumReducer extends MapReduceBase
      implements Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable result = new IntWritable();

    public void reduce(Text key, Iterator<IntWritable> values,
        OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
      int sum = 0;
      while (values.hasNext()) {
        sum += values.next().get();  // Sum all counts for this word
      }
      result.set(sum);
      output.collect(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(WordCount.class);
    conf.setJobName("word count");

    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(IntWritable.class);

    conf.setMapperClass(TokenizerMapper.class);
    conf.setCombinerClass(IntSumReducer.class);
    conf.setReducerClass(IntSumReducer.class);

    conf.setInputFormat(TextInputFormat.class);
    conf.setOutputFormat(TextOutputFormat.class);

    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));

    JobClient.runJob(conf);  // Submits to the JobTracker and blocks until completion
  }
}
```

Output

```
word1	3
word2	5
word3	2
...
```
💻

YARN Equivalent Code Example

The same word count job on YARN is typically written against the newer org.apache.hadoop.mapreduce API. The application logic is identical; the difference lies in the execution environment: the job is submitted to the ResourceManager, and a per-job ApplicationMaster, rather than a central JobTracker, drives its tasks.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountYARN {

  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      String[] tokens = value.toString().split("\\s+");
      for (String token : tokens) {
        word.set(token);
        context.write(word, one);  // Emit (word, 1) for every token
      }
    }
  }

  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();  // Sum all counts for this word
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count on YARN");
    job.setJarByClass(WordCountYARN.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    // On a YARN cluster, submission launches a per-job ApplicationMaster that
    // negotiates containers with the ResourceManager.
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Output

```
word1	3
word2	5
word3	2
...
```
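Whichever runtime executes the job, the word-count reduce step is simply a per-key sum. As a quick sanity check of that logic, here is the same aggregation in plain Java with no Hadoop dependencies (the class name `WordCountLocal` is illustrative, not from the examples above):

```java
import java.util.Map;
import java.util.TreeMap;

public class WordCountLocal {

  // Same logic as the mapper + reducer pair: tokenize on whitespace,
  // then sum a count of 1 per occurrence of each word.
  public static Map<String, Integer> countWords(String text) {
    Map<String, Integer> counts = new TreeMap<>();  // TreeMap for deterministic order
    for (String token : text.split("\\s+")) {
      if (!token.isEmpty()) {
        counts.merge(token, 1, Integer::sum);
      }
    }
    return counts;
  }

  public static void main(String[] args) {
    System.out.println(countWords("to be or not to be"));
    // prints {be=2, not=1, or=1, to=2}
  }
}
```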
🎯

When to Use Which

Choose MapReduce v1 only if you are maintaining legacy Hadoop 1.x clusters that do not support YARN. It is simpler but limited in scalability and flexibility.

Choose YARN for modern Hadoop deployments because it offers better resource management, supports multiple processing frameworks, and scales well for large clusters and diverse workloads.
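On Hadoop 2 and later, routing MapReduce jobs through YARN is controlled by the `mapreduce.framework.name` property in `mapred-site.xml`. A minimal fragment (shown with its standard value; surrounding properties omitted):

```xml
<!-- mapred-site.xml: execute MapReduce jobs on YARN -->
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <!-- "yarn" submits to the ResourceManager; "local" runs in-process for testing -->
    <value>yarn</value>
  </property>
</configuration>
```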

YARN is the recommended choice for new projects and production environments due to its improved fault tolerance and cluster utilization.

Key Takeaways

YARN separates resource management from job scheduling, improving scalability and flexibility over MapReduce v1.
MapReduce v1 uses a single JobTracker, which limits cluster size and fault tolerance.
YARN supports multiple processing models beyond MapReduce, enabling modern big data applications.
Use MapReduce v1 only for legacy systems; prefer YARN for all new Hadoop deployments.
YARN improves cluster utilization with dynamic resource allocation and better fault isolation.