Map phase explained in Hadoop - Step-by-Step Execution

Concept Flow - Map phase explained

Input Data Split

↓

Map Function Applied

↓

Process Each Record

↓

Emit Key-Value Pairs

↓

Shuffle and Sort (Prepare for Reduce)

The Map phase takes input data splits, applies the map function to each record, and emits key-value pairs for the next phase.

Execution Sample

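def map(record):
    # emit() stands for the framework's output collector; it is not defined here.
    words = record.split()  # split the text record on whitespace
    for word in words:
        emit(word, 1)  # one (word, 1) pair per word occurrence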

This map function splits a text record into words on whitespace and emits each word as a key with a count of 1.
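The sample relies on an `emit` call that the framework would supply, so it does not run on its own. A minimal runnable sketch in plain Python follows, assuming a generator stands in for `emit` and a hypothetical `run_map_phase` driver feeds it records one at a time. Neither name is a Hadoop API.

```python
def map_record(record):
    """Yield one (word, 1) pair per whitespace-separated word."""
    for word in record.split():
        yield (word, 1)


def run_map_phase(records):
    """Hypothetical driver: apply map_record to every record in order."""
    emitted_pairs = []
    for record in records:
        emitted_pairs.extend(map_record(record))
    return emitted_pairs


if __name__ == "__main__":
    for pair in run_map_phase(["hello world", "hello hadoop"]):
        print(pair)
```

Run on the two sample records "hello world" and "hello hadoop", this produces four pairs, one per word, in the order the words appear.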

Execution Table

Step	Input Record	Action	Output Key-Value Pairs
1	"hello world"	Split into words ['hello', 'world']	None
2	First word 'hello'	Emit ('hello', 1)	('hello', 1)
3	Second word 'world'	Emit ('world', 1)	('world', 1)
4	Next record 'hello hadoop'	Split into words ['hello', 'hadoop']	None
5	First word 'hello'	Emit ('hello', 1)	('hello', 1)
6	Second word 'hadoop'	Emit ('hadoop', 1)	('hadoop', 1)
7	No more records	Map phase ends	End of map output

Once all input records have been processed, the map phase is complete and its emitted key-value pairs form the map output.

Variable Tracker

Variable	Start	After 1	After 2	After 3	After 4	After 5	After 6	Final
record	None	"hello world"	"hello world"	"hello world"	"hello hadoop"	"hello hadoop"	"hello hadoop"	None
words	None	['hello', 'world']	['hello', 'world']	['hello', 'world']	['hello', 'hadoop']	['hello', 'hadoop']	['hello', 'hadoop']	None
word	None	None	'hello'	'world'	None	'hello'	'hadoop'	None
emitted_pairs	[]	[]	[('hello', 1)]	[('hello', 1), ('world', 1)]	[('hello', 1), ('world', 1)]	[('hello', 1), ('world', 1), ('hello', 1)]	[('hello', 1), ('world', 1), ('hello', 1), ('hadoop', 1)]	[('hello', 1), ('world', 1), ('hello', 1), ('hadoop', 1)]

Key Moments - 2 Insights

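Why does the map function emit multiple key-value pairs for one input record?

Because each word in the record is processed separately, a record with several words yields one pair per word, as in steps 2 and 3 of the execution table.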

What happens after the map function emits key-value pairs?

The emitted pairs are collected, then shuffled and sorted to prepare them for the reduce phase.
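One way to picture the step that follows the map output is a grouping of values by key, with the keys put in sorted order. The sketch below is a single-process illustration only; in a real cluster the pairs are also partitioned across reducers and moved between machines. The function name `shuffle_and_sort` is a hypothetical one chosen for this example.

```python
from collections import defaultdict


def shuffle_and_sort(emitted_pairs):
    """Group values by key, then return the groups ordered by key."""
    grouped = defaultdict(list)
    for key, value in emitted_pairs:
        grouped[key].append(value)
    # Keys are unique after grouping, so sorting orders the groups by key.
    return sorted(grouped.items())


# Map output from the execution table.
map_output = [("hello", 1), ("world", 1), ("hello", 1), ("hadoop", 1)]
reduce_input = shuffle_and_sort(map_output)
print(reduce_input)
# [('hadoop', [1]), ('hello', [1, 1]), ('world', [1])]
```

Both ('hello', 1) pairs end up under the single key 'hello', which is the form a reduce function would receive.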

Visual Quiz - 3 Questions

Test your understanding

Look at the execution table. Which key-value pair is emitted at step 3?

A. ('hello', 1)

B. ('hadoop', 1)

C. ('world', 1)

D. No pair emitted

Concept Snapshot

Map phase takes input data split
Processes each record with map function
Splits record into parts (e.g., words)
Emits key-value pairs for each part
Prepares data for shuffle and reduce phases
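The first two points of the snapshot can be sketched as follows: the input is cut into splits, and each split is handled by its own map task, independently of the others. This is a simplified illustration. Hadoop itself computes splits from the input files rather than by counting records, and `make_splits`, `map_task`, and `records_per_split` are hypothetical names, not Hadoop APIs.

```python
def make_splits(records, records_per_split):
    """Cut the record list into consecutive chunks (a stand-in for input splits)."""
    return [
        records[i:i + records_per_split]
        for i in range(0, len(records), records_per_split)
    ]


def map_task(split):
    """Process one split: emit (word, 1) for every word of every record."""
    return [(word, 1) for record in split for word in record.split()]


records = ["hello world", "hello hadoop"]
splits = make_splits(records, records_per_split=1)
outputs = [map_task(split) for split in splits]  # one map task per split

for split, output in zip(splits, outputs):
    print(split, "->", output)
```

Because no map task needs another task's output, the tasks could run in parallel on different machines.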

Full Transcript

The Map phase in Hadoop starts by taking a split of the input data. Each record in this split is processed by the map function. For example, a text record is split into words. For each word, the map function emits a key-value pair, typically the word and the count 1. These emitted pairs are collected and later shuffled and sorted to prepare for the reduce phase. The execution table shows step by step how records are split and pairs are emitted. Variables like 'record', 'words', and 'emitted_pairs' change as the map function runs. Beginners often wonder why multiple pairs come from one record; this is because each word is processed separately. After the map phase, the emitted pairs are ready for shuffle and sort, the next stage in Hadoop's data processing pipeline.
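The word-count mapper described above could also be written in the style of a Hadoop Streaming script. Hadoop Streaming passes input records to the mapper on standard input and reads tab-separated key/value lines from its standard output. The sketch below assumes that convention; `run_mapper` is a hypothetical name, and the demonstration uses an in-memory stream so it can be run without a cluster.

```python
import io
import sys


def run_mapper(stdin, stdout):
    """Read records line by line and write 'word<TAB>1' for each word."""
    for line in stdin:
        for word in line.split():
            stdout.write(f"{word}\t1\n")


# Demonstration with an in-memory input stream. A real streaming script
# would instead call run_mapper(sys.stdin, sys.stdout).
sample_input = io.StringIO("hello world\nhello hadoop\n")
run_mapper(sample_input, sys.stdout)
```

Taking the streams as parameters keeps the mapping logic testable outside a Hadoop job.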