
Reduce phase explained in Hadoop

Introduction

The Reduce phase collects all values that the Map phase produced for each key and combines them into final results, summarizing or aggregating data so it is easier to understand. Typical uses include:

When counting total sales per product from many sales records.
When summing up votes for candidates in an election.
When grouping and averaging temperatures from weather sensors.
When merging logs from multiple servers to find total errors.
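The vote-counting case above can be sketched in a few lines of Python (the reduce_votes name and the vote data are illustrative, not a Hadoop API):

```python
# Hypothetical shuffled Map output: candidate -> list of individual votes
votes = {'alice': [1, 1, 1], 'bob': [1, 1]}

def reduce_votes(key, list_of_values):
    # Combine all values for one key into a single total
    return key, sum(list_of_values)

results = dict(reduce_votes(k, v) for k, v in votes.items())
print(results)  # {'alice': 3, 'bob': 2}
```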
Syntax
Pseudocode
reduce(key, list_of_values) {
  // process values
  // emit(key, combined_value)
}

The reduce function takes a key and all values for that key from the Map phase.

It combines these values to produce a smaller set of results.

Examples
This example sums all counts for the key 'apple'.
Pseudocode
reduce(key='apple', list_of_values=[2, 3, 5]) {
  sum = 0
  for value in list_of_values:
    sum += value
  emit(key, sum)   // emits ('apple', 10)
}
This example counts total errors by summing all 1s.
Pseudocode
reduce(key='error', list_of_values=[1, 1, 1, 1]) {
  total_errors = sum(list_of_values)
  emit(key, total_errors)   // emits ('error', 4)
}
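The same pattern works for the temperature-averaging use case; here is a minimal Python sketch (the sensor key and readings are made-up sample data):

```python
def reduce_avg(key, list_of_values):
    # Average all readings collected for one key
    return key, sum(list_of_values) / len(list_of_values)

print(reduce_avg('sensor-1', [20.0, 22.0, 24.0]))  # ('sensor-1', 22.0)
```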
Sample Program

This code simulates the Reduce phase by summing values for each key and printing the result.

Python
def reduce(key, values):
    # Sum all values for this key and print the combined result
    total = sum(values)
    print(f"{key}: {total}")

# Example data from Map phase
mapped_data = {
    'apple': [2, 3, 5],
    'banana': [1, 1],
    'orange': [4]
}

for key, values in mapped_data.items():
    reduce(key, values)
Output
apple: 10
banana: 2
orange: 5
Important Notes

The Reduce phase only sees data that has already been grouped by key in the shuffle step between Map and Reduce.

It is important to write reduce logic that correctly combines all values for a key, whatever order they arrive in.
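To see what "grouped by key" means, here is a small Python sketch of the shuffle step that runs between Map and Reduce (the map_output pairs are made-up sample data, not Hadoop internals):

```python
from collections import defaultdict

# Made-up Map output: a flat stream of (key, value) pairs
map_output = [('apple', 2), ('banana', 1), ('apple', 3),
              ('orange', 4), ('apple', 5), ('banana', 1)]

# Shuffle step: collect every value under its key
grouped = defaultdict(list)
for key, value in map_output:
    grouped[key].append(value)

# Reduce now receives one (key, list_of_values) pair per key
for key, values in grouped.items():
    print(f"{key}: {sum(values)}")  # prints apple: 10, banana: 2, orange: 4
```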

Summary

The Reduce phase combines data from Map outputs by key.

It helps summarize or aggregate large data sets.

Reduce functions take a key and list of values, then emit a combined result.