Hadoop | data | ~30 mins

Word count as MapReduce example in Hadoop - Mini Project: Build & Apply

Word count as MapReduce example
📖 Scenario: Imagine you have a large text document and you want to find out how many times each word appears. This is useful for understanding which words are most common in a book, article, or any text data.
🎯 Goal: You will build a simple MapReduce-style program that counts the occurrences of each word in a given text. The program splits the text into words, emits a count of 1 for each word (the map step), and then sums those counts per word to get the totals (the reduce step).
📋 What You'll Learn
Create a mapper function that splits lines into words and outputs each word with a count of 1
Create a reducer function that sums counts for each word
Store the input text in a variable and split it into a list of words
Print the final word counts as output
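
The steps in this project fold the mapper and reducer into a single loop. As a rough sketch of how the two roles look when written as separate functions, the example below runs the same word count in plain Python. The names mapper, shuffle, and reducer are illustrative choices, not Hadoop APIs, and the shuffle function only imitates the grouping that a MapReduce framework performs between the two phases.

```python
from collections import defaultdict

def mapper(line):
    """Map phase: emit a (word, 1) pair for every word in the line."""
    for word in line.split():
        yield (word, 1)

def shuffle(pairs):
    """Shuffle phase (imitated here): group all values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reducer(word, counts):
    """Reduce phase: sum the counts collected for one word."""
    return (word, sum(counts))

input_text = "hello world hello hadoop"

# Run the mapper over every line of the input and collect all pairs.
mapped = [pair for line in input_text.splitlines() for pair in mapper(line)]

# Group the pairs by word, then reduce each group to a single total.
grouped = shuffle(mapped)
result = dict(reducer(word, counts) for word, counts in grouped.items())

print(result)  # {'hello': 2, 'world': 1, 'hadoop': 1}
```

In a real cluster the mapper and reducer would run on different machines and the framework would handle the grouping, but the data flow is the same.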
💡 Why This Matters
🌍 Real World
Counting words helps analyze text data like customer reviews, social media posts, or books to find popular topics or keywords.
💼 Career
Understanding MapReduce and word counting is a foundational skill for big data processing jobs, especially when working with Hadoop or similar frameworks.
1
DATA SETUP: Create the input text variable
Create a variable called input_text and set it to the string: "hello world hello hadoop"
Need a hint?

Use a simple string assignment like input_text = "hello world hello hadoop".

2
CONFIGURATION: Create a list of words from the input text
Create a variable called lines that splits input_text on spaces. Despite its name, lines holds a list of words, not lines.
Need a hint?

Use the split() method on input_text to get words.
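
A short sketch of this step, which also shows why calling split() with no argument is usually the safer choice. The variable messy is a made-up example and is not part of the exercise.

```python
input_text = "hello world hello hadoop"

# With no argument, split() breaks on any run of whitespace.
lines = input_text.split()
print(lines)  # ['hello', 'world', 'hello', 'hadoop']

# Hypothetical input with a double space, to compare the two forms.
messy = "hello  world"
print(messy.split(" "))  # ['hello', '', 'world']  (empty string kept)
print(messy.split())     # ['hello', 'world']
```

For the exercise input both forms give the same list, because the words are separated by single spaces.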

3
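CORE LOGIC: Count the words (map and reduce in one loop)
Create a dictionary called word_counts. Use a for loop with variable word to iterate over lines. For each word, add 1 to its count in word_counts, starting from 0 if the word is not yet present. Treating each word as a count of 1 plays the role of the mapper, and adding it to the running total plays the role of the reducer.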
Need a hint?

Use word_counts.get(word, 0) + 1 to update counts safely.
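
One way the counting loop might look, using the variable names from the steps. The comparison with collections.Counter at the end is an optional extra, not something the exercise asks for.

```python
from collections import Counter

input_text = "hello world hello hadoop"
lines = input_text.split()

word_counts = {}
for word in lines:
    # get() returns 0 the first time a word is seen, so no KeyError occurs.
    word_counts[word] = word_counts.get(word, 0) + 1

# Optional cross-check: the standard library's Counter gives the same totals.
matches_counter = word_counts == dict(Counter(lines))
```

Writing word_counts[word] += 1 on an empty dictionary would raise a KeyError for the first occurrence of each word, which is why the hint suggests get() with a default.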

4
OUTPUT: Print the final word counts
Write a print statement to display the word_counts dictionary.
Need a hint?

Use print(word_counts) to show the result.
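
A minimal sketch of the output step, assuming word_counts already holds the result of step 3 (it is written out literally here so the snippet runs on its own). The sorted, tab-separated listing goes beyond what the exercise requires. It resembles the key and value layout of Hadoop's default text output, but it is only an illustration.

```python
# Assumed result of step 3 for the input "hello world hello hadoop".
word_counts = {"hello": 2, "world": 1, "hadoop": 1}

# What the exercise asks for: print the dictionary as-is.
print(word_counts)  # {'hello': 2, 'world': 1, 'hadoop': 1}

# Optional: most frequent words first, ties broken alphabetically.
ranked = sorted(word_counts.items(), key=lambda item: (-item[1], item[0]))
for word, count in ranked:
    print(f"{word}\t{count}")
```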