
User-defined functions (UDFs) in Hadoop - Mini Project: Build & Apply

User-defined functions (UDFs) in Hadoop MapReduce
📖 Scenario: You work with a large text dataset stored in Hadoop and want to count how many times each word appears. To do this, you will write a simple user-defined function (UDF) for Hadoop MapReduce that processes the text data.
🎯 Goal: Build a Hadoop MapReduce program whose user-defined mapper function splits lines into words, then count each word's occurrences.
📋 What You'll Learn
Create a mapper function that splits input lines into words
Create a reducer function that sums counts for each word
Use the Hadoop MapReduce framework to run the job
Print the final word counts
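
The map and reduce roles listed above can be sketched locally in plain Python. This is only an illustration of the idea, not Hadoop's API: the names map_line, reduce_word, and run_local_job are hypothetical, and the grouping loop stands in for the shuffle phase that the framework would normally perform.

```python
from collections import defaultdict

def map_line(line):
    # Hypothetical mapper: emit a (word, 1) pair for every word in the line.
    return [(word, 1) for word in line.split()]

def reduce_word(word, counts):
    # Hypothetical reducer: sum all counts collected for one word.
    return word, sum(counts)

def run_local_job(lines):
    # Stand-in for the shuffle phase: group mapper output by word.
    grouped = defaultdict(list)
    for line in lines:
        for word, one in map_line(line):
            grouped[word].append(one)
    # Apply the reducer once per distinct word.
    return dict(reduce_word(word, counts) for word, counts in grouped.items())

result = run_local_job(['hello world', 'hello hadoop'])
```

Here result maps each distinct word to its total, computed entirely in local memory.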
💡 Why This Matters
🌍 Real World
Counting word frequencies is a common task in analyzing large text data like logs, documents, or social media posts using Hadoop.
💼 Career
Understanding how to write user-defined functions in Hadoop MapReduce is essential for data engineers and data scientists working with big data.
Step 1: Create the input data list
Create a list called input_data with these exact three strings: 'hello world', 'hello hadoop', and 'hello mapreduce'.
Hint: Use square brackets to create a list and put each string inside quotes separated by commas.
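
One way this step might look in Python. The list here is a small in-memory stand-in for lines that a real job would read from files stored in Hadoop.

```python
# Three sample lines, standing in for records read from a text file.
input_data = ['hello world', 'hello hadoop', 'hello mapreduce']
```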

Step 2: Define the mapper function
Define a function called mapper that takes a single argument line. Inside, split line into words using line.split() and return the list of words.
Hint: Use the split() method on the input string to get words.
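
A minimal sketch of the mapper described in this step. With no arguments, split() breaks the string on any run of whitespace. Note that this exercise's mapper returns a plain list of words, whereas a mapper in an actual Hadoop word count would typically emit (word, 1) pairs.

```python
def mapper(line):
    """Split one input line into a list of words on whitespace."""
    return line.split()

print(mapper('hello world'))  # prints ['hello', 'world']
```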

Step 3: Apply the mapper and count words
Create an empty dictionary called word_counts. Use a for loop with variable line to iterate over input_data. Inside the loop, call mapper(line) to get words. Then use another for loop with variable word to iterate over the words. For each word, add 1 to its count in word_counts (initialize to 0 if not present).
Hint: Use dict.get(key, 0) to handle missing keys when counting.
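
The counting loop from this step could be written as follows. The input list and mapper from the earlier steps are repeated so the snippet runs on its own. The variable name words is an assumption, since the step does not name the mapper's result.

```python
# Repeated from Steps 1 and 2 so this snippet is self-contained.
input_data = ['hello world', 'hello hadoop', 'hello mapreduce']

def mapper(line):
    return line.split()

word_counts = {}
for line in input_data:
    words = mapper(line)
    for word in words:
        # get() returns 0 for a word not seen yet, so the first sighting stores 1.
        word_counts[word] = word_counts.get(word, 0) + 1
```

This loop plays the part of the reducer: it sums one count per occurrence of each word.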

Step 4: Print the final word counts
Write a print statement to display the word_counts dictionary.
Hint: The printed dictionary should show the count of each word.
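
For the three sample lines, the final step might look like this. The dictionary is written out literally here, as the result Step 3 produces for that input, so that the snippet runs by itself.

```python
# Counts that Step 3 yields for the sample input, written literally for this sketch.
word_counts = {'hello': 3, 'world': 1, 'hadoop': 1, 'mapreduce': 1}

print(word_counts)
# prints {'hello': 3, 'world': 1, 'hadoop': 1, 'mapreduce': 1}
```

The keys appear in the order the words were first seen, because Python dictionaries (3.7 and later) preserve insertion order.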