
Reduce phase explained in Hadoop - Step-by-Step Execution

Concept Flow - Reduce phase explained
1. Receive key-value pairs from Map phase
2. Group values by key
3. Apply Reduce function to each key and its list of values
4. Combine results into final output
5. Write output to storage
The Reduce phase takes grouped data from the Map phase, processes each group by key, and produces summarized output.
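The flow above can be sketched in plain Python. This is only a small in-memory simulation, not Hadoop code: the names `reduce_sum` and `run_reduce_phase` and the sample `map_output` list are assumptions made for illustration, and the final "write output to storage" step is replaced by returning a list.

```python
from collections import defaultdict

def reduce_sum(key, values):
    """Reduce function: collapse all values for one key into a single total."""
    return key, sum(values)

def run_reduce_phase(map_output):
    """Simulate the Reduce phase on a list of (key, value) pairs."""
    # Step 2: group values by key.
    grouped = defaultdict(list)
    for key, value in map_output:
        grouped[key].append(value)
    # Steps 3 and 4: apply the reduce function once per key and collect results.
    results = []
    for key in sorted(grouped):
        results.append(reduce_sum(key, grouped[key]))
    return results

# Hypothetical Map output (step 1), chosen to match the sample keys on this page.
map_output = [("banana", 1), ("apple", 2), ("orange", 7),
              ("apple", 3), ("banana", 4), ("apple", 5)]
print(run_reduce_phase(map_output))
# [('apple', 10), ('banana', 5), ('orange', 7)]
```

The pairs arrive in mixed order, yet each key ends up with exactly one total because grouping happens before the reduce function is called.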
Execution Sample
Hadoop
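def emit(key, value):
    # Stand-in for the framework's output collector (an assumption for this sample).
    print((key, value))

def reduce(key, values):
    # Sum every value grouped under this key, then output one (key, total) pair.
    total = 0
    for v in values:
        total += v
    emit(key, total)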
This code sums all values for each key and outputs the total.
Execution Table
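| Step | Input Key | Input Values | Action | Output |
| --- | --- | --- | --- | --- |
| 1 | "apple" | [2, 3, 5] | Sum values: 2+3+5 | ("apple", 10) |
| 2 | "banana" | [1, 4] | Sum values: 1+4 | ("banana", 5) |
| 3 | "orange" | [7] | Sum values: 7 | ("orange", 7) |
| 4 | No more keys | N/A | Reduce phase ends | All key totals emitted |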
💡 All keys processed, no more data to reduce
Variable Tracker
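| Variable | Start | After 1 | After 2 | After 3 | Final |
| --- | --- | --- | --- | --- | --- |
| key | None | "apple" | "banana" | "orange" | None |
| values | None | [2, 3, 5] | [1, 4] | [7] | None |
| total | 0 | 10 | 5 | 7 | N/A |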
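The tracker shows `total` going 10, 5, 7 rather than accumulating across keys, because `total` starts again at 0 in every reduce call. A small sketch that records the running total makes this visible; `traced_reduce` and its `history` list are hypothetical names added for illustration only.

```python
def traced_reduce(key, values):
    """Sum the values for one key, recording total after each addition."""
    total = 0  # reset at the start of every reduce call
    history = []
    for v in values:
        total += v
        history.append(total)
    return key, total, history

grouped = [("apple", [2, 3, 5]), ("banana", [1, 4]), ("orange", [7])]
for key, values in grouped:
    print(traced_reduce(key, values))
# ('apple', 10, [2, 5, 10])
# ('banana', 5, [1, 5])
# ('orange', 7, [7])
```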
Key Moments - 2 Insights
Why do we sum values only for the same key?
Because the Reduce phase groups all values by their key before processing, each reduce call handles exactly one key and its values, as shown in rows 1-3 of the Execution Table.
What happens if a key has only one value?
The reduce function still runs and sums that single value, as in row 3 of the Execution Table, where the key "orange" has the values [7].
Visual Quiz - 3 Questions
Test your understanding
According to the Execution Table, what is the output for the key "banana"?
A. ("banana", 4)
B. ("banana", 5)
C. ("banana", 1)
D. ("banana", 0)
💡 Hint
Check row 2 of the Execution Table for the output of the key "banana".
At which step does the Reduce phase finish processing all keys?
A. Step 2
B. Step 3
C. Step 4
D. Step 1
💡 Hint
Look at the last row of the Execution Table and the note beneath it.
If the values for "apple" were [2, 3, 5, 1], what would be the new output?
A. ("apple", 11)
B. ("apple", 10)
C. ("apple", 9)
D. ("apple", 12)
💡 Hint
Sum all values in the list: 2+3+5+1 = 11, then compare with row 1 of the Execution Table.
Concept Snapshot
Reduce phase summary:
- Input: each key with its list of values, grouped from the Map phase output
- Process: apply the reduce function to aggregate the values for each key
- Output: one key with its combined result (e.g., a sum)
- Ends when all keys have been processed
- Key step in Hadoop data processing
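Summing is only one possible way to combine values. As a minimal sketch, assuming `values` is a plain Python list, the aggregation can be passed in as a function; `reduce_with` is a hypothetical helper name, not part of Hadoop.

```python
def reduce_with(aggregate, key, values):
    """Apply any aggregation function to the values of one key."""
    return key, aggregate(values)

values = [2, 3, 5]
print(reduce_with(sum, "apple", values))  # ('apple', 10)
print(reduce_with(max, "apple", values))  # ('apple', 5)
print(reduce_with(len, "apple", values))  # ('apple', 3)
```

The input and output shape stay the same in each case: one key with a list of values goes in, and one key with a combined result comes out.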
Full Transcript
The Reduce phase in Hadoop takes the output of the Map phase after it has been grouped by key; Hadoop's shuffle and sort step performs this grouping. For each key, the reduce function receives the list of values for that key. It processes these values, often by summing or otherwise aggregating them, and outputs a single result per key. This repeats until no keys remain. For example, if the key is "apple" and the values are [2, 3, 5], the reduce function sums them to 10 and outputs ("apple", 10). This phase is how Hadoop workflows summarize and combine data.
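When a reducer is written as a script for Hadoop Streaming, it reads text lines that are already sorted by key, so equal keys arrive next to each other. The sketch below imitates that style under some assumptions: each line is well-formed `key<TAB>value` with an integer value, and the names `parse`, `reduce_stream`, and `sample_lines` are made up for illustration.

```python
from itertools import groupby

def parse(lines):
    """Yield (key, int(value)) from tab-separated 'key<TAB>value' lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        key, _, value = line.partition("\t")
        yield key, int(value)

def reduce_stream(lines):
    """Yield one 'key<TAB>total' line per run of consecutive equal keys."""
    for key, group in groupby(parse(lines), key=lambda pair: pair[0]):
        yield f"{key}\t{sum(value for _, value in group)}"

# Hypothetical sorted input; in a streaming job the lines would come from sys.stdin.
sample_lines = ["apple\t2\n", "apple\t3\n", "apple\t5\n",
                "banana\t1\n", "banana\t4\n", "orange\t7\n"]
for out_line in reduce_stream(sample_lines):
    print(out_line)
```

Because `groupby` only merges neighbouring lines with the same key, this approach relies on the input being sorted; unsorted input would produce several partial totals for one key.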