Word count as MapReduce example in Hadoop - Step-by-Step Execution

Concept Flow - Word count as MapReduce example
Input Text -> Map Function -> Emit (word, 1) -> Shuffle & Sort -> Reduce Function -> Sum counts per word -> Output (word, total count)
The Map function splits the input text into words and emits a (word, 1) pair for each occurrence. The framework then groups the pairs by word, and the Reduce function sums each group to produce the final count per word.
Execution Sample
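# Python-style sketch; yield plays the role of emit
def map_fn(key, value):
    # key: input key (unused here); value: a piece of input text
    for word in value.split():
        yield (word, 1)  # emit (word, 1)

def reduce_fn(word, counts):
    # counts: every 1 emitted for this word
    total = 0
    for c in counts:
        total += c
    yield (word, total)  # emit (word, total)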
This code counts how many times each word appears in the input text using Map and Reduce functions.
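To try the same idea outside pseudocode, here is a minimal sketch in the style of Hadoop Streaming, where the mapper and reducer exchange tab-separated text lines and the reducer receives its input already sorted by key. The function names `mapper` and `reducer`, and the in-memory `sorted` call standing in for the framework's Shuffle & Sort, are assumptions made for illustration and are not part of the original example.

```python
def mapper(lines):
    """Yield one 'word<TAB>1' line per word occurrence."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(sorted_lines):
    """Sum counts for runs of identical keys; input must be sorted by key."""
    current_word, total = None, 0
    for line in sorted_lines:
        word, count = line.rsplit("\t", 1)
        if word != current_word:
            # A new key starts: flush the total of the previous key, if any.
            if current_word is not None:
                yield f"{current_word}\t{total}"
            current_word, total = word, 0
        total += int(count)
    # Flush the last key.
    if current_word is not None:
        yield f"{current_word}\t{total}"


# Local stand-in for the framework: sorted() plays the role of Shuffle & Sort.
mapped = list(mapper(["hello world hello"]))
result = list(reducer(sorted(mapped)))
print(result)  # ['hello\t2', 'world\t1']
```

The reducer only compares each key with the previous one, so it never needs to hold all pairs in memory; this works only because its input arrives sorted.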
Execution Table
Step | Input | Action | Output
1 | "hello world hello" | Map splits text and emits (word,1) | (hello,1), (world,1), (hello,1)
2 | (hello,1), (world,1), (hello,1) | Shuffle groups by word | (hello: [1,1]), (world: [1])
3 | (hello: [1,1]) | Reduce sums counts | (hello, 2)
4 | (world: [1]) | Reduce sums counts | (world, 1)
5 | All words processed | Output final counts | (hello, 2), (world, 1)
💡 Once all words are processed and their counts summed, the MapReduce job completes.
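The five steps can be reproduced in a few lines of plain Python. This is only a single-process sketch of the data flow, not Hadoop code; the variable names `map_output`, `shuffle_output`, and `reduce_output` are chosen here to mirror the intermediate results of each phase.

```python
from collections import defaultdict

text = "hello world hello"

# Step 1: Map emits one (word, 1) pair per occurrence.
map_output = [(word, 1) for word in text.split()]

# Step 2: Shuffle groups the 1s by word.
groups = defaultdict(list)
for word, count in map_output:
    groups[word].append(count)
shuffle_output = dict(groups)

# Steps 3-4: Reduce sums each group; step 5 is the collected result.
reduce_output = [(word, sum(counts)) for word, counts in shuffle_output.items()]

print(map_output)      # [('hello', 1), ('world', 1), ('hello', 1)]
print(shuffle_output)  # {'hello': [1, 1], 'world': [1]}
print(reduce_output)   # [('hello', 2), ('world', 1)]
```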
Variable Tracker
Variable | Start | After Step 1 | After Step 2 | After Step 3 | Final
map_output | empty | (hello,1),(world,1),(hello,1) | (hello,1),(world,1),(hello,1) | (hello,1),(world,1),(hello,1) | N/A
shuffle_output | empty | N/A | (hello:[1,1]),(world:[1]) | (hello:[1,1]),(world:[1]) | N/A
reduce_output | empty | N/A | N/A | (hello,2) | (hello,2),(world,1)
Key Moments - 3 Insights
Why does the Map function emit (word, 1) instead of just the word?
Map emits a count of 1 for each word occurrence so that Reduce can sum these counts to get the total number of occurrences. See step 1 of the Execution Table, where each word is paired with 1.
What happens during the Shuffle & Sort phase?
Shuffle groups all (word, 1) pairs by word so that Reduce receives all counts for a word together. This is shown in step 2 of the Execution Table, where each word is grouped with its list of counts.
How does the Reduce function calculate the total count?
Reduce sums all counts for a word from the grouped list. For example, in step 3, it sums [1,1] for 'hello' to get 2.
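The Shuffle & Sort step discussed above can be sketched by sorting the pairs by key and then collecting the values of each run of equal keys. The helper name `shuffle_and_sort` is hypothetical, and the whole step runs in memory here, whereas the real framework does this across machines.

```python
from itertools import groupby
from operator import itemgetter


def shuffle_and_sort(pairs):
    """Sort pairs by key, then collect the values of each key into a list."""
    ordered = sorted(pairs, key=itemgetter(0))
    return [
        (key, [value for _, value in group])
        for key, group in groupby(ordered, key=itemgetter(0))
    ]


# The pairs may arrive in any order; grouping does not depend on it.
pairs = [("world", 1), ("hello", 1), ("hello", 1)]
grouped = shuffle_and_sort(pairs)
print(grouped)  # [('hello', [1, 1]), ('world', [1])]
```

Sorting first is what lets `groupby` see all pairs for a word as one contiguous run, which matches the name of the phase.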
Visual Quiz - 3 Questions
Test your understanding
According to the Execution Table, what is the output of the Map function at step 1?
A. (hello,1), (world,1), (hello,1)
B. (hello,2), (world,1)
C. (hello:[1,1]), (world:[1])
D. (hello,0), (world,0)
💡 Hint
Check the 'Output' column for step 1 in the Execution Table.
At which step does the Reduce function output the total count for 'hello'?
A. Step 1
B. Step 2
C. Step 3
D. Step 5
💡 Hint
Look at the 'Action' and 'Output' columns of the Execution Table rows for the Reduce function.
If the input text had one more 'world', how would the Reduce output for 'world' change?
A. It would remain (world,1)
B. It would become (world,2)
C. It would become (world,3)
D. It would be removed
💡 Hint
Refer to the Variable Tracker and to how counts are summed in Reduce.
Concept Snapshot
MapReduce Word Count:
- Map splits text, emits (word,1) pairs
- Shuffle groups pairs by word
- Reduce sums counts per word
- Output is (word, total count)
- Used for counting words in large data sets
Full Transcript
This example shows how MapReduce counts words in text. The Map function reads input text and emits each word with a count of 1. Then, the framework groups all counts by word in the Shuffle phase. The Reduce function sums these counts to get total occurrences per word. Finally, the output lists each word with its total count. This process allows counting words efficiently in big data by splitting work across many machines.
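To illustrate how the work can be split, the sketch below maps several input splits independently and routes each word to one of several reducers. Everything here is an assumption for illustration: the helper names `map_split`, `partition`, and `run_job` are hypothetical, the CRC32-based partitioning is a stand-in rather than Hadoop's own partitioner, and the loops run in one process where a cluster would run them on separate machines.

```python
import zlib
from collections import defaultdict


def map_split(split):
    """Map one input split to (word, 1) pairs."""
    return [(word, 1) for word in split.split()]


def partition(word, num_reducers):
    """Deterministic hash so the same word always goes to the same reducer."""
    return zlib.crc32(word.encode("utf-8")) % num_reducers


def run_job(splits, num_reducers=2):
    # One bucket per reducer, each mapping word -> list of counts.
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for split in splits:  # each split could be mapped on a different machine
        for word, count in map_split(split):
            buckets[partition(word, num_reducers)][word].append(count)
    totals = {}
    for bucket in buckets:  # each bucket could be reduced on a different machine
        for word, counts in bucket.items():
            totals[word] = sum(counts)
    return totals


two_splits = run_job(["hello world", "hello"])
one_split = run_job(["hello world hello"])
print(sorted(two_splits.items()))  # [('hello', 2), ('world', 1)]
print(two_splits == one_split)     # True
```

Because every occurrence of a word lands in the same bucket, each reducer can sum its words without talking to the others, and the result does not depend on how the input was split.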