Hadoop · Concept · Beginner · 3 min read

What is Mapper in MapReduce in Hadoop: Explained Simply

In Hadoop MapReduce, a mapper is the first processing step that takes input data and converts it into key-value pairs for further processing. It reads raw data, processes it line by line, and outputs intermediate data for the reducer to aggregate.
⚙️ How It Works

Think of the mapper as a worker who reads a big book line by line and writes down important notes as pairs of words and counts. It breaks down large data into smaller pieces that are easier to handle. Each mapper works independently on a chunk of data, making the process fast and scalable.

In MapReduce, the mapper reads input data from storage, processes each record, and outputs key-value pairs. These pairs are then shuffled and sorted before being sent to reducers. This division of work helps process huge datasets efficiently, like sorting mail by zip code before delivery.
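The map-then-shuffle flow above can be sketched in plain Java, with no Hadoop dependency. This is a conceptual illustration only: the input lines are made up, and the `TreeMap` stands in for the framework's shuffle-and-sort, which groups every value emitted for the same key and orders the keys before handing them to reducers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ShuffleSketch {
    // "Map" phase: each input line becomes (word, 1) pairs.
    // "Shuffle" phase: the 1s are grouped under each word, sorted by key,
    // which is the shape of data a reducer receives.
    static Map<String, List<Integer>> mapAndShuffle(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (String w : line.split("\\s+")) {
                if (!w.isEmpty()) {
                    grouped.computeIfAbsent(w, k -> new ArrayList<>()).add(1);
                }
            }
        }
        return grouped;
    }

    public static void main(String[] args) {
        System.out.println(mapAndShuffle(List.of("hello world", "hello hadoop")));
        // prints {hadoop=[1], hello=[1, 1], world=[1]}
    }
}
```

A reducer would then only need to sum each list to finish the word count.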

💻 Example

This example shows a simple mapper in Java that counts words from input text lines. It outputs each word with a count of 1.

java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    // Reuse Writable objects instead of allocating new ones on every call.
    private final static IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // key is the byte offset of the line in the input file; value is the line itself.
        String line = value.toString();
        String[] words = line.split("\\s+");
        for (String w : words) {
            if (!w.isEmpty()) {           // skip empty tokens from leading whitespace
                word.set(w);
                context.write(word, one); // emit the pair (word, 1)
            }
        }
    }
}
Output

Input: "hello world hello"

Output key-value pairs:
hello 1
world 1
hello 1
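To actually run this mapper, it has to be registered with a job in a driver class. Below is a minimal driver sketch using the standard Hadoop job API; the job name and the command-line input/output paths are placeholders, and a `WordCountReducer` class is assumed to exist alongside the mapper.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(WordCountMapper.class);       // the mapper above
        job.setReducerClass(WordCountReducer.class);     // assumed reducer class
        job.setOutputKeyClass(Text.class);               // must match the mapper's output key type
        job.setOutputValueClass(IntWritable.class);      // must match the mapper's output value type
        FileInputFormat.addInputPath(job, new Path(args[0]));    // input directory
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  // output directory (must not exist)
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

This is job configuration, so it only does something meaningful when submitted to a Hadoop installation.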
🎯 When to Use

Use a mapper in Hadoop MapReduce when you need to process large datasets by breaking them into smaller pieces. It is ideal for tasks like counting words, filtering data, or transforming raw data into a structured form.

Real-world uses include analyzing server logs, processing social media data, or preparing data for machine learning. The mapper helps distribute the workload across many machines, speeding up big data processing.
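Filtering is one of the simplest mapper patterns: emit a pair for records you want to keep and drop the rest. Here is a plain-Java sketch of that idea applied to log analysis (the log lines and the "ERROR" criterion are illustrative; in a real Hadoop mapper, the filtered lines would be emitted with `context.write`).

```java
import java.util.List;
import java.util.stream.Collectors;

public class LogFilterSketch {
    // A filtering "map" step: keep only lines that contain ERROR,
    // discarding all other records instead of emitting them.
    static List<String> filterErrors(List<String> logLines) {
        return logLines.stream()
                .filter(line -> line.contains("ERROR"))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> logs = List.of("INFO start", "ERROR disk full", "INFO done");
        System.out.println(filterErrors(logs)); // prints [ERROR disk full]
    }
}
```

Because each mapper filters its own chunk independently, this pattern scales to logs far larger than any single machine's memory.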

Key Points

  • The mapper processes input data line by line and outputs key-value pairs.
  • It runs in parallel on chunks of data to improve speed and scalability.
  • Mapper output is intermediate data sent to reducers for aggregation.
  • Commonly used for filtering, transforming, and preparing data in big data workflows.

Key Takeaways

  • Mapper is the first step in MapReduce that converts raw input into key-value pairs.
  • It processes data in parallel chunks to handle large datasets efficiently.
  • Mapper output is intermediate data used by reducers to produce final results.
  • Use mappers for tasks like filtering, counting, and transforming big data.
  • Mappers help distribute workload and speed up data processing in Hadoop.