What Is a Reducer in Hadoop MapReduce: Explained Simply
Reducer is a component that processes and summarizes the output from the Mapper phase by aggregating data with the same key. It takes intermediate key-value pairs, combines them, and produces the final output for analysis.

How It Works
Imagine you have a big pile of data, like a list of words from many books, and you want to count how many times each word appears. The Mapper breaks this big job into smaller pieces and tags each word with the number 1. But these pieces are scattered and mixed up.
The Reducer acts like a sorter and counter. It collects all the same words from the mappers, groups them together, and adds up their counts. This way, it turns many small pieces of data into a clear summary, like a final word count.
In simple terms, the reducer takes the shuffled data from the mappers, processes each key's group of values, and outputs a combined result. This step is essential for getting meaningful answers from large datasets.
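To make the shuffle-and-reduce idea concrete, here is a plain-Java sketch (no Hadoop dependencies) that simulates both steps on made-up mapper output. The class name, method name, and sample words are all hypothetical, chosen only for illustration; in a real job, Hadoop's framework performs the grouping for you between the map and reduce phases.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ShuffleReduceSketch {

    // Simulates shuffle (group values by key) followed by reduce (sum each group).
    static Map<String, Integer> reduceCounts(List<Map.Entry<String, Integer>> mapped) {
        // "Shuffle": collect every value emitted for the same key into one list.
        Map<String, List<Integer>> grouped = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> pair : mapped) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }

        // "Reduce": sum each key's values, like the word-count reducer does.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            int sum = 0;
            for (int value : entry.getValue()) {
                sum += value;
            }
            counts.put(entry.getKey(), sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical mapper output: each word tagged with the number 1.
        List<Map.Entry<String, Integer>> mapped = List.of(
                Map.entry("hadoop", 1),
                Map.entry("reduce", 1),
                Map.entry("hadoop", 1),
                Map.entry("map", 1),
                Map.entry("hadoop", 1));

        // Prints the final word counts: hadoop=3, reduce=1, map=1.
        System.out.println(reduceCounts(mapped));
    }
}
```

The single-machine loop stands in for what Hadoop distributes across many reducer tasks, but the grouping-then-summing logic is the same.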
Example
This example shows a simple reducer in Java for Hadoop that sums counts for each word key.
```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // The framework calls reduce() once per key, passing all values for that key.
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        // Emit the word together with its total count.
        context.write(key, new IntWritable(sum));
    }
}
```
When to Use
Use a reducer when you need to aggregate or summarize large amounts of data after mapping. For example, counting word frequencies, calculating averages, or combining logs by user ID.
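The same grouping idea extends beyond counting. As a sketch of the "calculating averages" case, here is plain Java (no Hadoop dependencies) that averages values per key once they have been grouped; the class name and the sales figures are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AverageSketch {

    // Reduce step for averages: for each key, sum the values and divide by their count.
    static Map<String, Double> averageByKey(Map<String, List<Double>> grouped) {
        Map<String, Double> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<Double>> entry : grouped.entrySet()) {
            double sum = 0;
            for (double value : entry.getValue()) {
                sum += value;
            }
            result.put(entry.getKey(), sum / entry.getValue().size());
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical input, already grouped by region as the shuffle phase would do.
        Map<String, List<Double>> sales = Map.of(
                "north", List.of(10.0, 20.0),
                "south", List.of(5.0, 15.0, 10.0));
        System.out.println(averageByKey(sales));
    }
}
```

A Hadoop version would follow the same `Reducer` pattern as the word-count example, with `DoubleWritable` values instead of `IntWritable` counts.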
In real life, companies use reducers to analyze big data like sales totals per region, website visits per day, or sensor data summaries. It helps turn raw data into useful insights efficiently.
Key Points
- The reducer processes grouped data from mappers by key.
- It combines values to produce summarized output.
- Reducers are essential for final data aggregation in MapReduce.
- They help handle big data by breaking tasks into smaller, manageable parts.