
YARN vs MapReduce v1 in Hadoop: Key Differences and Usage

YARN is the resource management layer introduced in Hadoop 2 that separates resource management from job scheduling, while MapReduce v1 tightly couples resource management and job execution. YARN improves cluster utilization, scalability, and supports multiple processing models beyond MapReduce.
⚖️

Quick Comparison

This table summarizes the main differences between YARN and MapReduce v1 in Hadoop.

| Feature | MapReduce v1 | YARN |
| --- | --- | --- |
| Architecture | Monolithic; the JobTracker manages both resources and jobs | Separated; ResourceManager handles resources, per-application ApplicationMasters handle jobs |
| Resource management | JobTracker handles both resource allocation and job scheduling | ResourceManager manages resources; each ApplicationMaster manages its job's scheduling |
| Scalability | Limited; the single JobTracker is a bottleneck | Highly scalable; multiple ApplicationMasters run concurrently |
| Fault tolerance | JobTracker failure affects all jobs | ResourceManager and ApplicationMasters can fail independently, without cluster-wide impact |
| Processing models | MapReduce only | MapReduce plus other frameworks such as Spark and Tez |
| Cluster utilization | Lower, due to static slot allocation | Higher, due to dynamic resource allocation |
⚖️

Key Differences

MapReduce v1 uses a single master node called the JobTracker that manages both resource allocation and job scheduling. This design creates a bottleneck and limits scalability because all tasks depend on the JobTracker's availability and capacity.

YARN (Yet Another Resource Negotiator) separates resource management from job scheduling by introducing a ResourceManager and per-application ApplicationMasters. This allows multiple applications to run simultaneously with better resource sharing and fault isolation.

While MapReduce v1 supports only the MapReduce programming model, YARN is a general-purpose resource management platform that supports various processing frameworks like Spark and Tez. This flexibility improves cluster utilization and enables modern big data workloads.

💻

MapReduce v1 Code Example

A simple word count job written against the classic MapReduce v1 API (org.apache.hadoop.mapred), submitted to the JobTracker via JobClient.

```java
import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;

public class WordCount {

  // With TextInputFormat, the input key is the byte offset of each line (LongWritable).
  public static class TokenizerMapper extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    public void map(LongWritable key, Text value,
        OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        output.collect(word, one);  // Emit (word, 1) for every token
      }
    }
  }

  public static class IntSumReducer extends MapReduceBase
      implements Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable result = new IntWritable();

    public void reduce(Text key, Iterator<IntWritable> values,
        OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
      int sum = 0;
      while (values.hasNext()) {
        sum += values.next().get();  // Sum all counts for this word
      }
      result.set(sum);
      output.collect(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(WordCount.class);
    conf.setJobName("word count");

    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(IntWritable.class);

    conf.setMapperClass(TokenizerMapper.class);
    conf.setCombinerClass(IntSumReducer.class);
    conf.setReducerClass(IntSumReducer.class);

    conf.setInputFormat(TextInputFormat.class);
    conf.setOutputFormat(TextOutputFormat.class);

    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));

    JobClient.runJob(conf);  // Submits to the JobTracker and blocks until completion
  }
}
```

Output

```
word1	3
word2	5
word3	2
...
```
💻

YARN Equivalent Code Example

The same word count job on YARN is typically written against the newer org.apache.hadoop.mapreduce API. The application logic is identical; the difference lies in the execution environment: the job is submitted to the ResourceManager, and a per-job ApplicationMaster, rather than a central JobTracker, drives its tasks.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountYARN {

  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      String[] tokens = value.toString().split("\\s+");
      for (String token : tokens) {
        word.set(token);
        context.write(word, one);  // Emit (word, 1) for every token
      }
    }
  }

  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();  // Sum all counts for this word
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count on YARN");
    job.setJarByClass(WordCountYARN.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    // On a YARN cluster, submission launches a per-job ApplicationMaster that
    // negotiates containers with the ResourceManager.
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Output

```
word1	3
word2	5
word3	2
...
```
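Whichever runtime executes the job, the word-count reduce step is simply a per-key sum. As a quick sanity check of that logic, here is the same aggregation in plain Java with no Hadoop dependencies (the class name `WordCountLocal` is illustrative, not from the examples above):

```java
import java.util.Map;
import java.util.TreeMap;

public class WordCountLocal {

  // Same logic as the mapper + reducer pair: tokenize on whitespace,
  // then sum a count of 1 per occurrence of each word.
  public static Map<String, Integer> countWords(String text) {
    Map<String, Integer> counts = new TreeMap<>();  // TreeMap for deterministic order
    for (String token : text.split("\\s+")) {
      if (!token.isEmpty()) {
        counts.merge(token, 1, Integer::sum);
      }
    }
    return counts;
  }

  public static void main(String[] args) {
    System.out.println(countWords("to be or not to be"));
    // prints {be=2, not=1, or=1, to=2}
  }
}
```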
🎯

When to Use Which

Choose MapReduce v1 only if you are maintaining legacy Hadoop 1.x clusters that do not support YARN. It is simpler but limited in scalability and flexibility.

Choose YARN for modern Hadoop deployments because it offers better resource management, supports multiple processing frameworks, and scales well for large clusters and diverse workloads.
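On Hadoop 2 and later, routing MapReduce jobs through YARN is controlled by the `mapreduce.framework.name` property in `mapred-site.xml`. A minimal fragment (shown with its standard value; surrounding properties omitted):

```xml
<!-- mapred-site.xml: execute MapReduce jobs on YARN -->
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <!-- "yarn" submits to the ResourceManager; "local" runs in-process for testing -->
    <value>yarn</value>
  </property>
</configuration>
```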

YARN is the recommended choice for new projects and production environments due to its improved fault tolerance and cluster utilization.

Key Takeaways

YARN separates resource management from job scheduling, improving scalability and flexibility over MapReduce v1.
MapReduce v1 uses a single JobTracker, which limits cluster size and fault tolerance.
YARN supports multiple processing models beyond MapReduce, enabling modern big data applications.
Use MapReduce v1 only for legacy systems; prefer YARN for all new Hadoop deployments.
YARN improves cluster utilization with dynamic resource allocation and better fault isolation.