In Hadoop's MapReduce, what does the Map phase primarily do?
Think about what happens right after data is read in Hadoop.
The Map phase reads input data and transforms it into intermediate key-value pairs for further processing.
Given a Map function that reads lines of text and outputs each word with count 1, what is the output for the input line: "apple banana apple"?
def map(line):
    words = line.split()
    for word in words:
        print(f"{word}\t1")

map("apple banana apple")
Each word is output separately with count 1.
The Map function outputs each word with a count of 1, so repeated words appear multiple times: apple\t1, banana\t1, apple\t1.
What intermediate key-value pairs does the Map phase produce from the input data: ["cat dog", "dog mouse", "cat mouse"] using a word count Map function?
input_data = ["cat dog", "dog mouse", "cat mouse"]
intermediate = []
for line in input_data:
    words = line.split()
    for word in words:
        intermediate.append((word, 1))
print(intermediate)
Each word in each line is output with count 1 separately.
The Map phase outputs each word occurrence as a separate key-value pair with count 1, so duplicates appear multiple times: [('cat', 1), ('dog', 1), ('dog', 1), ('mouse', 1), ('cat', 1), ('mouse', 1)].
What error will this Map function cause when processing input lines?
def map(line):
    words = line.split()
    for word in words
        print(f"{word}\t1")

Check the for loop syntax carefully.
The for loop is missing a colon at the end, causing a SyntaxError.
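As a sketch, here is one corrected version (renaming the function to `map_words` is my own choice, to avoid shadowing Python's built-in `map`):

```python
def map_words(line):
    words = line.split()
    for word in words:  # colon added, fixing the SyntaxError
        print(f"{word}\t1")

map_words("apple banana apple")
```

With the colon restored, the function runs and emits one tab-separated key-value pair per word occurrence.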
In Hadoop, when processing a very large dataset, how does the Map phase efficiently handle the data?
Think about how Hadoop uses multiple computers to speed up processing.
Hadoop divides the input into chunks called input splits, and the Map phase processes these splits in parallel on different nodes, which lets it handle large datasets efficiently.