The Map phase breaks large input data into smaller pieces and processes them in parallel, turning each input record into intermediate key-value pairs. This parallelism makes it faster to extract useful information from big data.
Map phase explained in Hadoop
Introduction
The Map phase is the first step of a MapReduce job. It is useful in situations such as these:
When you have a large list of sales records and want to count total sales per product.
When you want to analyze website logs to find how many times each page was visited.
When you need to process many text files to count how often each word appears.
When you want to filter data to keep only records matching a condition before further analysis.
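The filtering case above can be sketched in plain Python. Here "emit" is simulated by returning a list of pairs, and the record format and threshold are illustrative assumptions, not part of Hadoop's API:

```python
# A map function that filters: emit a record only if it matches a
# condition (here, sale amount >= threshold). Record format and
# threshold are illustrative assumptions.
def filter_map(record_id, record, threshold=100):
    product, amount = record          # record is a (product, amount) pair
    if amount >= threshold:
        return [(product, amount)]    # one intermediate pair: record kept
    return []                         # zero pairs: record dropped

print(filter_map(1, ("laptop", 250)))  # → [('laptop', 250)]
print(filter_map(2, ("mouse", 30)))    # → []
```

Because a map call may emit zero pairs, filtering happens naturally: dropped records simply produce no output for the next phase.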
Syntax
Hadoop
map(key, value):
    # process input key-value pair
    # emit intermediate key-value pairs
The map function takes one input key and one input value at a time.
It outputs zero or more intermediate key-value pairs for the next phase.
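The "zero or more pairs" contract can be sketched in plain Python, with "emit" simulated by returning a list (a sketch for illustration, not Hadoop's actual API):

```python
# Simulated map contract: one input key-value pair in,
# zero or more intermediate pairs out.
def identity_map(key, value):
    return [(key, value)]       # exactly one pair: pass the record through

def skip_empty_map(key, value):
    if not value:
        return []               # zero pairs: empty record is dropped
    return [(value, 1)]         # one pair per non-empty record

print(identity_map("user42", "purchase"))  # → [('user42', 'purchase')]
print(skip_empty_map(1, ""))               # → []
print(skip_empty_map(2, "apple"))          # → [('apple', 1)]
```

Both functions obey the same contract; what varies is how many pairs each input produces.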
Examples
This example splits text into words and emits each word with count 1.
Hadoop
map(document_id, document_text):
    for word in document_text.split():
        emit(word, 1)
This example passes user purchases forward keyed by user ID.
Hadoop
map(user_id, purchase_amount):
    emit(user_id, purchase_amount)

Sample Program
This simple map function takes a document ID and text, splits the text into words, and prints each word with count 1 separated by a tab. This simulates the Map phase output.
Hadoop
def map(key, value):
    words = value.split()
    for word in words:
        print(f"{word}\t1")

# Simulate input
input_data = [(1, "apple banana apple"), (2, "banana orange")]
for key, value in input_data:
    map(key, value)
Output
apple	1
banana	1
apple	1
banana	1
orange	1
Important Notes
The Map phase runs on many machines at once, each working on a small part of the data.
Map outputs are grouped by key before being passed to the next phase (Reduce).
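The grouping step can be simulated by collecting map outputs into a dictionary keyed by the intermediate key. This is a sketch of the shuffle-and-group idea, not Hadoop's internals:

```python
from collections import defaultdict

def map_fn(doc_id, text):
    # Word-count map: emit (word, 1) for every word in the document.
    return [(word, 1) for word in text.split()]

# Run map over two input "splits", then group values by key,
# which is what happens between the Map and Reduce phases.
grouped = defaultdict(list)
for key, value in [(1, "apple banana apple"), (2, "banana orange")]:
    for k, v in map_fn(key, value):
        grouped[k].append(v)

print(dict(grouped))  # → {'apple': [1, 1], 'banana': [1, 1], 'orange': [1]}
```

Each key now carries the list of values emitted for it across all inputs, which is exactly the shape the Reduce phase consumes.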
Summary
The Map phase processes input data piece by piece.
It outputs intermediate key-value pairs for further processing.
This phase helps handle big data by working in parallel.