
Map phase explained in Hadoop - Step-by-Step Execution

Concept Flow - Map phase explained
Input Data Split
Map Function Applied
Process Each Record
Emit Key-Value Pairs
Shuffle and Sort (Prepare for Reduce)
Each map task reads one input split, applies the map function to every record in that split, and emits intermediate key-value pairs for the next phase.
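The flow above can be simulated in plain Python. This is a minimal sketch, not Hadoop's API. The helper names `make_splits`, `word_count_map`, and `run_map_task` are hypothetical. Splits are cut by line count here, whereas Hadoop sizes splits in bytes (by default close to the HDFS block size) and uses a record reader to turn each split into records. Each split goes to its own map task, which processes it independently of the others.

```python
from typing import Callable

# An emit callback receives one intermediate (key, value) pair.
Emit = Callable[[str, int], None]


def make_splits(lines: list[str], lines_per_split: int) -> list[list[str]]:
    """Hypothetical splitter: group input lines into fixed-size splits."""
    return [lines[i:i + lines_per_split] for i in range(0, len(lines), lines_per_split)]


def word_count_map(record: str, emit: Emit) -> None:
    """Map function from the example: emit (word, 1) for each word."""
    for word in record.split():
        emit(word, 1)


def run_map_task(split: list[str], map_fn: Callable[[str, Emit], None]) -> list[tuple[str, int]]:
    """One map task: apply map_fn to every record of a single split, in order."""
    output: list[tuple[str, int]] = []
    for record in split:
        map_fn(record, lambda key, value: output.append((key, value)))
    return output


lines = ["hello world", "hello hadoop"]
splits = make_splits(lines, lines_per_split=1)  # two splits -> two map tasks
map_outputs = [run_map_task(split, word_count_map) for split in splits]
print(map_outputs)
# [[('hello', 1), ('world', 1)], [('hello', 1), ('hadoop', 1)]]
```

With `lines_per_split=2`, both records land in one split and a single map task produces all four pairs, which matches the single-task trace in the execution table.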
Execution Sample
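def word_count_map(record, emit):
    # emit(key, value) stands in for the framework's output collector (context.write in Hadoop's Java Mapper)
    words = record.split()  # split the text record on whitespace
    for word in words:
        emit(word, 1)  # one (word, 1) pair per word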
This map function splits a text record into words on whitespace and emits each word paired with a count of 1.
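To run the sample outside Hadoop, the map function needs something to emit into. The sketch below assumes a simple collector, here called `collect`, that appends each pair to a list named `emitted_pairs` (mirroring the variable tracker). In Hadoop's Java API the framework plays this role through `context.write`. The names `collect` and `word_count_map` are illustrative.

```python
emitted_pairs: list[tuple[str, int]] = []  # stands in for the map output buffer


def collect(key: str, value: int) -> None:
    """Assumed collector for this sketch: store one intermediate pair."""
    emitted_pairs.append((key, value))


def word_count_map(record: str, emit) -> None:
    words = record.split()  # split the record on whitespace
    for word in words:
        emit(word, 1)  # one (word, 1) pair per word


# Feed the two sample records to the map function, one at a time
for record in ["hello world", "hello hadoop"]:
    word_count_map(record, collect)

print(emitted_pairs)
# [('hello', 1), ('world', 1), ('hello', 1), ('hadoop', 1)]
```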
Execution Table
Step | Input Record | Action | Output Key-Value Pairs
1 | "hello world" | Split into words ['hello', 'world'] | None
2 | First word 'hello' | Emit ('hello', 1) | ('hello', 1)
3 | Second word 'world' | Emit ('world', 1) | ('world', 1)
4 | Next record "hello hadoop" | Split into words ['hello', 'hadoop'] | None
5 | First word 'hello' | Emit ('hello', 1) | ('hello', 1)
6 | Second word 'hadoop' | Emit ('hadoop', 1) | ('hadoop', 1)
7 | No more records | Map phase ends | End of map output
💡 Once every input record has been processed, the map phase is complete and all of its key-value pairs have been emitted.
Variable Tracker
Point | record | words | word | emitted_pairs
Start | None | None | None | []
After 1 | "hello world" | ['hello', 'world'] | None | []
After 2 | "hello world" | ['hello', 'world'] | 'hello' | [('hello', 1)]
After 3 | "hello world" | ['hello', 'world'] | 'world' | [('hello', 1), ('world', 1)]
After 4 | "hello hadoop" | ['hello', 'hadoop'] | None | [('hello', 1), ('world', 1)]
After 5 | "hello hadoop" | ['hello', 'hadoop'] | 'hello' | [('hello', 1), ('world', 1), ('hello', 1)]
After 6 | "hello hadoop" | ['hello', 'hadoop'] | 'hadoop' | [('hello', 1), ('world', 1), ('hello', 1), ('hadoop', 1)]
Final | None | None | None | [('hello', 1), ('world', 1), ('hello', 1), ('hadoop', 1)]
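The tracker can be reproduced by instrumenting the map loop and taking a snapshot of each variable after every step. This is a teaching sketch only: the `snapshot` helper and the `history` list are hypothetical and are not part of Hadoop.

```python
from typing import Any

history: list[dict[str, Any]] = []
emitted_pairs: list[tuple[str, int]] = []


def snapshot(label: str, record: Any, words: Any, word: Any) -> None:
    """Hypothetical helper: copy the current variable values into history."""
    history.append({
        "point": label,
        "record": record,
        "words": list(words) if words is not None else None,
        "word": word,
        "emitted_pairs": list(emitted_pairs),  # copy, so later appends don't leak in
    })


step = 0
snapshot("Start", None, None, None)
for record in ["hello world", "hello hadoop"]:
    words = record.split()
    step += 1
    snapshot(f"After {step}", record, words, None)  # the split step
    for word in words:
        emitted_pairs.append((word, 1))  # emit (word, 1)
        step += 1
        snapshot(f"After {step}", record, words, word)
snapshot("Final", None, None, None)

for row in history:
    print(row["point"], row["record"], row["word"], row["emitted_pairs"])
```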
Key Moments - 2 Insights
Why does the map function emit multiple key-value pairs for one input record?
Because each input record is split into several words, and the map function emits one key-value pair per word, as shown in steps 2, 3, 5, and 6 of the execution table.
What happens after the map function emits key-value pairs?
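The framework partitions the emitted pairs by key and sorts them, and during the shuffle each reducer fetches its partition from every map task. As a result, all values for a given key arrive together at the same reducer. This is the 'Shuffle and Sort' step that follows 'Emit Key-Value Pairs' in the concept flow.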
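A rough single-process model of shuffle and sort: sort the emitted pairs by key, then group consecutive pairs with the same key so each key is paired with the list of its values, which is the shape a reducer receives. This is a simplification: real Hadoop also partitions map output across reducers (by default with a hash of the key), spills and merges sorted runs on disk, and copies each partition over the network, none of which is modeled here. The function name `shuffle_and_sort` is hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def shuffle_and_sort(pairs: list[tuple[str, int]]) -> list[tuple[str, list[int]]]:
    """Hypothetical helper: sort map output by key, then group values per key."""
    ordered = sorted(pairs, key=itemgetter(0))  # sort on the key only
    return [
        (key, [value for _, value in group])
        for key, group in groupby(ordered, key=itemgetter(0))
    ]


# Map output from the execution table
map_output = [("hello", 1), ("world", 1), ("hello", 1), ("hadoop", 1)]
print(shuffle_and_sort(map_output))
# [('hadoop', [1]), ('hello', [1, 1]), ('world', [1])]
```

A word-count reducer would then sum each list, giving hadoop 1, hello 2, and world 1.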
Visual Quiz - 3 Questions
Test your understanding
In the execution table, what key-value pair is emitted at step 3?
A. ('hello', 1)
B. ('hadoop', 1)
C. ('world', 1)
D. No pair emitted
💡 Hint
Check the 'Output Key-Value Pairs' column at step 3 of the execution table.
At which step does the map function start processing the record 'hello hadoop'?
A. Step 4
B. Step 1
C. Step 6
D. Step 7
💡 Hint
Look at the 'Input Record' column of the execution table to find where 'hello hadoop' first appears.
If the input record were empty, what would the map function emit?
A. ('null', 0)
B. No key-value pairs
C. ('empty', 1)
D. One pair with empty string key
💡 Hint
Remember that the map function emits one pair for each word found in the record, as the execution table shows.
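The last question can be checked directly. Python's `str.split()` with no arguments returns an empty list for an empty or whitespace-only string, so the loop body never runs and nothing is emitted. This sketch returns the pairs as a list instead of calling `emit`, purely so the result is easy to inspect; the name `word_count_map` is illustrative.

```python
def word_count_map(record: str) -> list[tuple[str, int]]:
    """Return the pairs the sample map function would emit for one record."""
    pairs: list[tuple[str, int]] = []
    for word in record.split():  # no-arg split() splits on whitespace runs, never yields ""
        pairs.append((word, 1))
    return pairs


print(word_count_map(""))              # []
print(word_count_map("   "))           # []
print(word_count_map("hello hadoop"))  # [('hello', 1), ('hadoop', 1)]
```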
Concept Snapshot
Map phase takes an input data split
Processes each record with the map function
Splits record into parts (e.g., words)
Emits key-value pairs for each part
Prepares data for shuffle and reduce phases
Full Transcript
The Map phase in Hadoop starts by taking a split of input data. Each record in this split is processed by the map function. For example, a text record is split into words, and for each word the map function emits a key-value pair, typically the word and the count 1. These emitted pairs are collected and later shuffled and sorted to prepare for the reduce phase. The execution table shows step by step how records are split and pairs are emitted, and the variable tracker shows how 'record', 'words', 'word', and 'emitted_pairs' change as the map function runs. Beginners often wonder why one record produces multiple pairs; this is because each word is processed separately. After the map phase, the data is ready for the next stage of Hadoop's data processing pipeline.