
Reduce phase explained in Hadoop

Introduction

The Reduce phase collects all values that the Map phase produced for each key and combines them into final results, summarizing or aggregating data so it is easier to understand. Typical uses include:

When counting total sales per product from many sales records.
When summing up votes for candidates in an election.
When grouping and averaging temperatures from weather sensors.
When merging logs from multiple servers to find total errors.
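The vote-counting case above can be sketched in a few lines of Python (the reduce_votes name and the vote data are illustrative, not a Hadoop API):

```python
# Hypothetical shuffled Map output: candidate -> list of individual votes
votes = {'alice': [1, 1, 1], 'bob': [1, 1]}

def reduce_votes(key, list_of_values):
    # Combine all values for one key into a single total
    return key, sum(list_of_values)

results = dict(reduce_votes(k, v) for k, v in votes.items())
print(results)  # {'alice': 3, 'bob': 2}
```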
Syntax
Pseudocode
reduce(key, list_of_values) {
  // process values
  // emit(key, combined_value)
}

The reduce function takes a key and all values for that key from the Map phase.

It combines these values to produce a smaller set of results.

Examples
This example sums all counts for the key 'apple'.
Pseudocode
reduce(key='apple', list_of_values=[2, 3, 5]) {
  sum = 0
  for value in list_of_values:
    sum += value
  emit(key, sum)   // emits ('apple', 10)
}
This example counts total errors by summing all 1s.
Pseudocode
reduce(key='error', list_of_values=[1, 1, 1, 1]) {
  total_errors = sum(list_of_values)
  emit(key, total_errors)   // emits ('error', 4)
}
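The same pattern works for the temperature-averaging use case; here is a minimal Python sketch (the sensor key and readings are made-up sample data):

```python
def reduce_avg(key, list_of_values):
    # Average all readings collected for one key
    return key, sum(list_of_values) / len(list_of_values)

print(reduce_avg('sensor-1', [20.0, 22.0, 24.0]))  # ('sensor-1', 22.0)
```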
Sample Program

This code simulates the Reduce phase by summing values for each key and printing the result.

Python
def reduce(key, values):
    # Sum all values for this key and print the combined result
    total = sum(values)
    print(f"{key}: {total}")

# Example data from Map phase
mapped_data = {
    'apple': [2, 3, 5],
    'banana': [1, 1],
    'orange': [4]
}

for key, values in mapped_data.items():
    reduce(key, values)
Output
apple: 10
banana: 2
orange: 5
Important Notes

The Reduce phase only sees data that has already been grouped by key in the shuffle step between Map and Reduce.

It is important to write reduce logic that correctly combines all values for a key, whatever order they arrive in.
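To see what "grouped by key" means, here is a small Python sketch of the shuffle step that runs between Map and Reduce (the map_output pairs are made-up sample data, not Hadoop internals):

```python
from collections import defaultdict

# Made-up Map output: a flat stream of (key, value) pairs
map_output = [('apple', 2), ('banana', 1), ('apple', 3),
              ('orange', 4), ('apple', 5), ('banana', 1)]

# Shuffle step: collect every value under its key
grouped = defaultdict(list)
for key, value in map_output:
    grouped[key].append(value)

# Reduce now receives one (key, list_of_values) pair per key
for key, values in grouped.items():
    print(f"{key}: {sum(values)}")  # prints apple: 10, banana: 2, orange: 4
```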

Summary

The Reduce phase combines data from Map outputs by key.

It helps summarize or aggregate large data sets.

Reduce functions take a key and list of values, then emit a combined result.