The Map phase breaks large input data into smaller pieces and processes them in parallel, turning each input record into intermediate key-value pairs. This parallelism makes it faster to extract useful information from big data.
Map phase explained in Hadoop
Introduction
The Map phase is the first step of a MapReduce job. It is useful in situations such as these:
When you have a large list of sales records and want to count total sales per product.
When you want to analyze website logs to find how many times each page was visited.
When you need to process many text files to count how often each word appears.
When you want to filter data to keep only records matching a condition before further analysis.
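The filtering case above can be sketched in plain Python. Here "emit" is simulated by returning a list of pairs, and the record format and threshold are illustrative assumptions, not part of Hadoop's API:

```python
# A map function that filters: emit a record only if it matches a
# condition (here, sale amount >= threshold). Record format and
# threshold are illustrative assumptions.
def filter_map(record_id, record, threshold=100):
    product, amount = record          # record is a (product, amount) pair
    if amount >= threshold:
        return [(product, amount)]    # one intermediate pair: record kept
    return []                         # zero pairs: record dropped

print(filter_map(1, ("laptop", 250)))  # → [('laptop', 250)]
print(filter_map(2, ("mouse", 30)))    # → []
```

Because a map call may emit zero pairs, filtering happens naturally: dropped records simply produce no output for the next phase.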
Syntax
Hadoop
map(key, value):
    # process input key-value pair
    # emit intermediate key-value pairs
The map function takes one input key and one input value at a time.
It outputs zero or more intermediate key-value pairs for the next phase.
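The "zero or more pairs" contract can be sketched in plain Python, with "emit" simulated by returning a list (a sketch for illustration, not Hadoop's actual API):

```python
# Simulated map contract: one input key-value pair in,
# zero or more intermediate pairs out.
def identity_map(key, value):
    return [(key, value)]       # exactly one pair: pass the record through

def skip_empty_map(key, value):
    if not value:
        return []               # zero pairs: empty record is dropped
    return [(value, 1)]         # one pair per non-empty record

print(identity_map("user42", "purchase"))  # → [('user42', 'purchase')]
print(skip_empty_map(1, ""))               # → []
print(skip_empty_map(2, "apple"))          # → [('apple', 1)]
```

Both functions obey the same contract; what varies is how many pairs each input produces.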
Examples
This example splits text into words and emits each word with count 1.
Hadoop
map(document_id, document_text):
    for word in document_text.split():
        emit(word, 1)
This example passes user purchases forward keyed by user ID.
Hadoop
map(user_id, purchase_amount):
    emit(user_id, purchase_amount)

Sample Program
This simple map function takes a document ID and text, splits the text into words, and prints each word with count 1 separated by a tab. This simulates the Map phase output.
Hadoop
def map(key, value):
    words = value.split()
    for word in words:
        print(f"{word}\t1")

# Simulate input
input_data = [(1, "apple banana apple"), (2, "banana orange")]
for key, value in input_data:
    map(key, value)
Output
apple	1
banana	1
apple	1
banana	1
orange	1
Important Notes
The Map phase runs on many machines at once, each working on a small part of the data.
Map outputs are grouped by key before being passed to the next phase (Reduce).
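The grouping step can be simulated by collecting map outputs into a dictionary keyed by the intermediate key. This is a sketch of the shuffle-and-group idea, not Hadoop's internals:

```python
from collections import defaultdict

def map_fn(doc_id, text):
    # Word-count map: emit (word, 1) for every word in the document.
    return [(word, 1) for word in text.split()]

# Run map over two input "splits", then group values by key,
# which is what happens between the Map and Reduce phases.
grouped = defaultdict(list)
for key, value in [(1, "apple banana apple"), (2, "banana orange")]:
    for k, v in map_fn(key, value):
        grouped[k].append(v)

print(dict(grouped))  # → {'apple': [1, 1], 'banana': [1, 1], 'orange': [1]}
```

Each key now carries the list of values emitted for it across all inputs, which is exactly the shape the Reduce phase consumes.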
Summary
The Map phase processes input data piece by piece.
It outputs intermediate key-value pairs for further processing.
This phase helps handle big data by working in parallel.