Word count as MapReduce example in Hadoop - Mini Project: Build & Apply

Word count as MapReduce example

📖 Scenario: Imagine you have a large text document and you want to find out how many times each word appears. This is useful for understanding which words are most common in a book, article, or any text data.

🎯 Goal: You will build a simple MapReduce-style program that counts the occurrences of each word in a given text. The program splits the text into words, emits a count of 1 for each word, and then sums those counts to get the total for each word.

📋 What You'll Learn

Create a mapper function that splits lines into words and outputs each word with a count of 1

Create a reducer function that sums counts for each word

Use a configuration variable to set the input text

Print the final word counts as output
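The mapper and reducer described above can be sketched in plain Python as follows. This is a minimal illustration rather than Hadoop's actual API: the names `mapper`, `reducer`, and `run_word_count` are hypothetical, and the grouping dictionary only imitates the shuffle step that a real framework performs between the two phases.

```python
from collections import defaultdict


def mapper(line):
    """Map phase: emit a (word, 1) pair for every word in the line."""
    for word in line.split():
        yield (word, 1)


def reducer(word, counts):
    """Reduce phase: sum all the counts emitted for one word."""
    return (word, sum(counts))


def run_word_count(lines):
    """Hypothetical driver: map every line, group by word, then reduce."""
    grouped = defaultdict(list)  # stands in for the shuffle step
    for line in lines:
        for word, count in mapper(line):
            grouped[word].append(count)
    return dict(reducer(word, counts) for word, counts in grouped.items())


result = run_word_count(["hello world hello hadoop"])
print(result)  # {'hello': 2, 'world': 1, 'hadoop': 1}
```

The steps that follow collapse these phases into a single loop over one dictionary, which gives the same totals for a small in-memory input.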

💡 Why This Matters

🌍 Real World

Counting words helps analyze text data like customer reviews, social media posts, or books to find popular topics or keywords.

💼 Career

Understanding MapReduce and word counting is a foundational skill for big data processing jobs, especially when working with Hadoop or similar frameworks.

DATA SETUP: Create the input text variable

Create a variable called input_text and set it to the string: "hello world hello hadoop"

Python

# Create a variable called input_text with the exact string
# Your code here

Need a hint?

Use a simple string assignment like input_text = "hello world hello hadoop".

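CONFIGURATION: Split the input text into a list of words

Create a variable called lines by splitting input_text on whitespace. Despite its name, lines holds the individual words of the text, which serve as the input records for the counting step.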

Python

input_text = "hello world hello hadoop"
# Create a list called lines by splitting input_text by spaces
# Your code here

Need a hint?

Use the split() method on input_text to get words.
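As a quick illustration of the hint, the snippet below shows what `split()` returns for the tutorial's string. The second string, `messy_text`, is a made-up example added here to show that calling `split()` with no arguments splits on any run of whitespace, not only single spaces.

```python
input_text = "hello world hello hadoop"

# With no arguments, split() breaks the string on runs of whitespace
lines = input_text.split()
print(lines)  # ['hello', 'world', 'hello', 'hadoop']

# Hypothetical messy input: repeated spaces, a tab, and a trailing newline
messy_text = "hello   world\thadoop\n"
messy_words = messy_text.split()
print(messy_words)  # ['hello', 'world', 'hadoop']
```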

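CORE LOGIC: Count words with a combined map and reduce step

Create an empty dictionary called word_counts. Use a for loop with the variable word to iterate over lines. For each word, add 1 to its count in word_counts, starting from 0 if the word is not yet present. Reading each word and treating it as a count of 1 plays the role of the mapper; adding that 1 to the running total plays the role of the reducer.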

Python

input_text = "hello world hello hadoop"
lines = input_text.split()
# Create a dictionary word_counts and count each word in lines
# Your code here

Need a hint?

Use word_counts.get(word, 0) + 1 to update counts safely.
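To see why the hint calls this pattern safe, here is a small sketch on made-up data (`sample_words` and `counts` are illustrative names, not part of the exercise). `dict.get` returns the supplied default when the key is missing, so the first occurrence of a word does not raise a `KeyError`.

```python
# Hypothetical sample data, not the tutorial's input_text
sample_words = ["map", "reduce", "map"]

counts = {}
for word in sample_words:
    # get(word, 0) returns 0 for an unseen word instead of raising KeyError
    counts[word] = counts.get(word, 0) + 1

print(counts)  # {'map': 2, 'reduce': 1}
```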

OUTPUT: Print the final word counts

Write a print statement to display the word_counts dictionary.

Python

input_text = "hello world hello hadoop"
lines = input_text.split()
word_counts = {}
for word in lines:
    word_counts[word] = word_counts.get(word, 0) + 1
# Print the word_counts dictionary
# Your code here

Need a hint?

Use print(word_counts) to show the result.
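Printing the dictionary directly is enough for this exercise. As an optional variation, the sketch below prints one word per line with a tab between the word and its count, which resembles the key and value layout of Hadoop's default text output. The hard-coded `word_counts` value and the `formatted` list are assumptions made so the snippet runs on its own.

```python
# Assumed result of the counting loop for "hello world hello hadoop"
word_counts = {"hello": 2, "world": 1, "hadoop": 1}

# Sort by word so the output order is predictable
formatted = [f"{word}\t{count}" for word, count in sorted(word_counts.items())]

for line in formatted:
    print(line)
# hadoop	1
# hello	2
# world	1
```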