
Word count as MapReduce example in Hadoop

Introduction

Word count finds how many times each word appears in a large body of text. It is a common first MapReduce example because it shows how a big task is broken into small independent parts whose results are then combined.

Counting words in a large book or document collection.
Analyzing common words in social media posts.
Finding popular search terms from logs.
Summarizing text data for reports.
Learning how big data tools process information.
Syntax
Hadoop
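# key: position of the line in the input (unused here); value: the line's text
map(key, value):
    for word in value.split():
        emit(word, 1)

# key: one word; values: every count emitted for that word
reduce(key, values):
    total = sum(values)
    emit(key, total)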

map processes one line at a time and outputs a (word, 1) pair for every word in it.

reduce adds up all the counts for one word to get its total number of occurrences.
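The pseudocode above can be made runnable in plain Python by replacing emit with yield. This is only a sketch of the idea, not Hadoop's API: the names map_fn and reduce_fn are hypothetical, chosen here to avoid shadowing Python's built-in map.

```python
def map_fn(key, value):
    """Yield a (word, 1) pair for every word in one line of text.

    key is the line's position in the input; word count ignores it.
    """
    for word in value.split():
        yield (word, 1)


def reduce_fn(key, values):
    """Yield a single (word, total) pair for one word and all its counts."""
    yield (key, sum(values))


# Hypothetical usage on a single line and a single word.
print(list(map_fn(1, "hello world hello")))
print(list(reduce_fn("hello", [1, 1])))
```

Using generators keeps the "emit zero or more pairs" behavior of the pseudocode: a mapper can yield many pairs per line, while this reducer yields exactly one pair per word.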

Examples
The map function splits the line into words and emits each word with count 1.
Hadoop
map(1, "hello world hello")
# emits ('hello', 1), ('world', 1), ('hello', 1)
The reduce function sums the counts for 'hello' to get 2.
Hadoop
reduce('hello', [1, 1])
# emits ('hello', 2)
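Between these two calls, the map output is grouped by key, which is how reduce comes to receive [1, 1] for 'hello'. A minimal sketch of that grouping in plain Python, assuming all pairs fit in memory. The function name shuffle is hypothetical; in Hadoop the framework performs this step itself.

```python
from collections import defaultdict


def shuffle(pairs):
    """Group (key, value) pairs into {key: [values, ...]} with keys in sorted order."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(sorted(grouped.items()))


# The pairs emitted by the map example above.
map_output = [("hello", 1), ("world", 1), ("hello", 1)]
print(shuffle(map_output))
# prints {'hello': [1, 1], 'world': [1]}
```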
Sample Program

This code simulates MapReduce word count: it maps words to counts, groups them, then reduces by summing counts.

Hadoop
from collections import defaultdict

# Sample input: list of lines
input_data = [
    "hello world",
    "hello hadoop",
    "hello mapreduce world"
]

# Map step
mapped = []
for line in input_data:
    for word in line.split():
        mapped.append((word, 1))

# Shuffle and sort step (group by word)
grouped = defaultdict(list)
for word, count in mapped:
    grouped[word].append(count)

# Reduce step
reduced = {}
for word, counts in grouped.items():
    reduced[word] = sum(counts)

# Print results
for word, total in sorted(reduced.items()):
    print(f"{word}: {total}")
Output
hadoop: 1
hello: 3
mapreduce: 1
world: 2
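As a sanity check, the same totals can be computed directly with collections.Counter from the Python standard library. This is not MapReduce, only an independent way to confirm the result on small inputs. The input list is repeated here so the snippet runs on its own.

```python
from collections import Counter

input_data = [
    "hello world",
    "hello hadoop",
    "hello mapreduce world",
]

# Count every word across all lines in a single pass.
counts = Counter(word for line in input_data for word in line.split())

for word, total in sorted(counts.items()):
    print(f"{word}: {total}")
```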
Important Notes

MapReduce splits big data into small pieces to process in parallel.

The shuffle step groups all occurrences of the same word together before reducing.

This example simulates the MapReduce logic in plain Python on a single machine; it does not run on a Hadoop cluster.
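The notes above can be illustrated by mapping several chunks of input concurrently and then merging the results. This is a rough sketch under stated assumptions: threads from concurrent.futures on one machine stand in for Hadoop's distributed map tasks, and map_chunk and chunks are hypothetical names. It shows the structure of the computation, not a realistic speedup.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_chunk(lines):
    """Map step for one chunk: return a list of (word, 1) pairs."""
    return [(word, 1) for line in lines for word in line.split()]


# Each chunk stands in for one input split handled by a separate map task.
chunks = [
    ["hello world"],
    ["hello hadoop"],
    ["hello mapreduce world"],
]

# Run the map step on all chunks concurrently.
with ThreadPoolExecutor(max_workers=3) as pool:
    mapped_chunks = list(pool.map(map_chunk, chunks))

# Shuffle: bring together the counts for the same word from every chunk.
grouped = defaultdict(list)
for pairs in mapped_chunks:
    for word, count in pairs:
        grouped[word].append(count)

# Reduce: sum the counts per word.
totals = {word: sum(counts) for word, counts in grouped.items()}
print(sorted(totals.items()))
```

Because each chunk is mapped without looking at any other chunk, the map calls can run in any order or at the same time; only the shuffle needs to see the output of all of them.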

Summary

The map step breaks text into (word, 1) pairs.

The reduce step sums the counts for each word.

Word count shows the general MapReduce pattern: split the work, process the pieces independently, then combine the results.