
Map phase explained in Hadoop - Practice Problems & Coding Challenges

Challenge - 5 Problems
🧠 Conceptual
intermediate
What is the main role of the Map phase in Hadoop?

In Hadoop's MapReduce, what does the Map phase primarily do?

A. It combines all intermediate data into a single output file.
B. It processes input data and produces intermediate key-value pairs.
C. It sorts the final output data before saving.
D. It distributes the output data to different storage nodes.
💡 Hint

Think about what happens right after data is read in Hadoop.

Predict Output
intermediate
Output of a simple Map function in Hadoop

Given a Map function that reads lines of text and outputs each word with count 1, what is the output for the input line: "apple banana apple"?

Hadoop
def map(line):
    words = line.split()
    for word in words:
        print(f"{word}\t1")

map("apple banana apple")
A.
apple	1
banana	1
apple	1
B. apple banana apple 1
C.
apple	3
banana	1
D.
1	apple
1	banana
1	apple
💡 Hint

Each word is output separately with count 1.
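
The same idea can be written as a generator that emits key-value pairs and formats them separately. This is only a minimal sketch: the names `map_lines` and `format_record` and the sample line are illustrative assumptions, not part of the challenge. In Hadoop Streaming, a mapper script would typically read its lines from standard input and write tab-separated records to standard output instead of using a hard-coded list.

```python
def map_lines(lines):
    """Yield one (word, 1) pair per word; nothing is summed in the map step."""
    for line in lines:
        for word in line.split():
            yield (word, 1)


def format_record(key, value):
    """Render a pair as a tab-separated text record."""
    return f"{key}\t{value}"


# Hypothetical sample input, chosen only for illustration.
sample = ["red green red"]
records = [format_record(key, value) for key, value in map_lines(sample)]
print("\n".join(records))
```

A repeated word yields a repeated record, because the map step emits pairs without aggregating them.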

Data Output
advanced
Intermediate key-value pairs from Map phase

What intermediate key-value pairs does the Map phase produce from the input data: ["cat dog", "dog mouse", "cat mouse"] using a word count Map function?

Hadoop
input_data = ["cat dog", "dog mouse", "cat mouse"]
intermediate = []
for line in input_data:
    words = line.split()
    for word in words:
        intermediate.append((word, 1))
print(intermediate)
A. [('cat', 1), ('dog', 1), ('mouse', 1)]
B. [('cat', 2), ('dog', 2), ('mouse', 2)]
C. [('cat dog', 1), ('dog mouse', 1), ('cat mouse', 1)]
D. [('cat', 1), ('dog', 1), ('dog', 1), ('mouse', 1), ('cat', 1), ('mouse', 1)]
💡 Hint

Each word in each line is output with count 1 separately.

🔧 Debug
advanced
Identify the error in this Map function

What error will this Map function cause when processing input lines?

def map(line):
    words = line.split()
    for word in words
        print(f"{word}\t1")
A. IndentationError due to wrong indentation
B. TypeError because split() returns None
C. SyntaxError due to missing colon after for loop
D. NameError because 'word' is not defined
💡 Hint

Check the for loop syntax carefully.

🚀 Application
expert
How does the Map phase handle large datasets in Hadoop?

In Hadoop, when processing a very large dataset, how does the Map phase efficiently handle the data?

A. It splits the input data into chunks and processes each chunk in parallel on different nodes.
B. It loads the entire dataset into memory on a single node before processing.
C. It sends all data to the Reduce phase without processing.
D. It compresses the data first and then processes it sequentially.
💡 Hint

Think about how Hadoop uses multiple computers to speed up processing.
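
The idea of spreading map work over several workers can be mimicked on a single machine. The sketch below is only an analogy under stated assumptions: `make_splits`, `map_split`, the split size, and the sample lines are hypothetical, and a thread pool stands in for the cluster nodes that would run map tasks in Hadoop.

```python
from concurrent.futures import ThreadPoolExecutor


def make_splits(lines, split_size):
    """Cut the input into consecutive chunks (stand-ins for input splits)."""
    return [lines[i:i + split_size] for i in range(0, len(lines), split_size)]


def map_split(split):
    """Run a word count map over one split, independently of the others."""
    return [(word, 1) for line in split for word in line.split()]


# Hypothetical input, small enough to read the result by eye.
lines = ["cat dog", "dog mouse", "cat mouse", "mouse dog"]
splits = make_splits(lines, split_size=2)

# Each split is handled by its own worker; no worker needs the whole dataset.
with ThreadPoolExecutor(max_workers=2) as pool:
    per_split_output = list(pool.map(map_split, splits))

for index, pairs in enumerate(per_split_output):
    print(f"split {index}: {pairs}")
```

Because each call to `map_split` touches only its own chunk, the workers need no coordination during the map step, which is what makes this kind of processing easy to run in parallel.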