Reduce phase explained in Hadoop - Step-by-Step Execution

Concept Flow - Reduce phase explained

Receive key-value pairs from Map phase

↓

Group values by key

↓

Apply Reduce function to each key and its list of values

↓

Combine results into final output

↓

Write output to storage

The Reduce phase receives the Map phase output after it has been shuffled, sorted, and grouped by key, applies the reduce function to each key's list of values, and writes one summarized result per key.
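The flow above can be sketched in plain Python. This is a minimal local simulation, not Hadoop's API: the names `group_by_key` and `run_reduce_phase`, the sample pairs, and the use of a list in place of output storage are all assumptions made for illustration.

```python
from collections import defaultdict


def group_by_key(pairs):
    """Collect every value that shares a key into one list, keys in sorted order."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def run_reduce_phase(pairs, reduce_fn):
    """Group the pairs, call reduce_fn once per key, and collect the results."""
    output = []  # stands in for the output storage
    for key, values in group_by_key(pairs).items():
        output.append(reduce_fn(key, values))
    return output


# Hypothetical key-value pairs as they might leave the Map phase.
map_output = [("apple", 2), ("banana", 1), ("apple", 3),
              ("orange", 7), ("banana", 4), ("apple", 5)]

result = run_reduce_phase(map_output, lambda key, values: (key, sum(values)))
print(result)
```

Passing the reduce function as an argument keeps the grouping logic separate from the aggregation, which mirrors how the framework groups the data while the user supplies only the per-key logic.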

Execution Sample

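def emit(key, value):
    print((key, value))  # stand-in for the framework's output call

def reduce(key, values):
    total = 0  # running sum for this key
    for v in values:
        total += v
    emit(key, total)  # one output record per key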

This code sums all values for each key and outputs the total.
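One way a reducer like this could run on a cluster is through Hadoop Streaming, which hands the reducer key-sorted, tab-separated `key<TAB>value` lines on standard input. The sketch below assumes that input format and integer values. `reduce_lines` is a hypothetical helper name, and the sample lines come from a list rather than `sys.stdin` so the example runs on its own.

```python
from itertools import groupby


def reduce_lines(lines):
    """Yield one 'key<TAB>total' line per key from key-sorted 'key<TAB>value' lines."""
    parsed = (line.rstrip("\n").split("\t", 1) for line in lines)
    # groupby only merges adjacent items, so the input must already be sorted by key.
    for key, group in groupby(parsed, key=lambda kv: kv[0]):
        total = sum(int(value) for _, value in group)
        yield f"{key}\t{total}"


# Hypothetical sorted input, as it might arrive after shuffle and sort.
sample = ["apple\t2\n", "apple\t3\n", "apple\t5\n",
          "banana\t1\n", "banana\t4\n", "orange\t7\n"]
streamed = list(reduce_lines(sample))
print(streamed)
```

In an actual streaming job the same function would presumably be fed `sys.stdin`, with each yielded line printed to standard output.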

Execution Table

Step	Input Key	Input Values	Action	Output
1	"apple"	[2, 3, 5]	Sum values: 2+3+5	("apple", 10)
2	"banana"	[1, 4]	Sum values: 1+4	("banana", 5)
3	"orange"	[7]	Sum values: 7	("orange", 7)
4	No more keys	N/A	Reduce phase ends	All key totals emitted

💡 All keys processed, no more data to reduce

Variable Tracker

Variable	Start	After 1	After 2	After 3	Final
key	None	"apple"	"banana"	"orange"	None
values	None	[2, 3, 5]	[1, 4]	[7]	None
total	0	10	5	7	N/A
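The tracker shows `total` only after each key has finished. To make the reset between keys visible, the sketch below records the running total after every addition. `trace_totals` is a hypothetical helper written for this illustration, not part of Hadoop.

```python
def trace_totals(grouped):
    """Return, for each key, the value of total after each addition."""
    trace = {}
    for key, values in grouped.items():
        total = 0  # total restarts at 0 for every key
        steps = []
        for v in values:
            total += v
            steps.append(total)
        trace[key] = steps
    return trace


# Grouped input taken from the execution table.
grouped = {"apple": [2, 3, 5], "banana": [1, 4], "orange": [7]}
trace = trace_totals(grouped)
for key, steps in trace.items():
    print(key, steps)
```

The last entry of each list matches the corresponding "After" column in the tracker.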

Key Moments - 2 Insights

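Why do we sum values only for the same key?

The reduce function is called once per key and sees only that key's values, so "apple" ([2, 3, 5]) and "banana" ([1, 4]) are totaled separately.

What happens if a key has only one value?

The reduce function still runs, and the total equals that single value: in step 3, "orange" with [7] yields ("orange", 7).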

Visual Quiz

Test your understanding

Look at the execution table, what is the output for the key "banana"?

A. ("banana", 4)

B. ("banana", 5)

C. ("banana", 1)

D. ("banana", 0)

Concept Snapshot

Reduce phase summary:
- Input: each key with the list of its values, grouped from the Map phase output
- Process: apply the reduce function to aggregate the values for each key
- Output: one (key, result) pair per key (e.g., the sum of its values)
- Ends when all keys have been processed
- Produces the final, summarized output of a Hadoop MapReduce job
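Summing is only one possible aggregation. As a rough illustration, the reduce step can be written so that the aggregation function is a parameter. `reduce_with` is a hypothetical name, and the grouped input reuses the values from the execution table.

```python
def reduce_with(aggregate, grouped):
    """Apply aggregate to each key's list of values and return (key, result) pairs."""
    return [(key, aggregate(values)) for key, values in grouped.items()]


grouped = {"apple": [2, 3, 5], "banana": [1, 4], "orange": [7]}

sums = reduce_with(sum, grouped)    # total per key
maxima = reduce_with(max, grouped)  # largest value per key
counts = reduce_with(len, grouped)  # number of values per key
print(sums)
print(maxima)
print(counts)
```

Whatever the aggregation, the shape of the result stays the same: one pair per key.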

Full Transcript

The Reduce phase in Hadoop takes the output of the Map phase after it has been grouped by key. For each key, the reduce function receives the list of that key's values, aggregates them (often by summing), and outputs a single result. This repeats until no keys remain. For example, if the key is "apple" and the values are [2, 3, 5], the reduce function sums them to 10 and outputs ("apple", 10). The phase ends once every key's result has been written to storage.