In Hadoop's MapReduce, what does the Map phase primarily do?
Think about what happens right after data is read in Hadoop.
The Map phase reads input data and transforms it into intermediate key-value pairs for further processing.
Given a Map function that reads lines of text and outputs each word with count 1, what is the output for the input line: "apple banana apple"?
def map(line):
    words = line.split()
    for word in words:
        print(f"{word}\t1")

map("apple banana apple")
Each word is output separately with count 1.
The Map function outputs each word with a count of 1, so repeated words appear multiple times: apple\t1, banana\t1, apple\t1.
What intermediate key-value pairs does the Map phase produce from the input data: ["cat dog", "dog mouse", "cat mouse"] using a word count Map function?
input_data = ["cat dog", "dog mouse", "cat mouse"]
intermediate = []
for line in input_data:
    words = line.split()
    for word in words:
        intermediate.append((word, 1))
print(intermediate)
Each word in each line is output with count 1 separately.
The Map phase outputs each word occurrence as a separate key-value pair with count 1, so duplicates appear multiple times: [('cat', 1), ('dog', 1), ('dog', 1), ('mouse', 1), ('cat', 1), ('mouse', 1)].
What error will this Map function cause when processing input lines?
def map(line):
    words = line.split()
    for word in words
        print(f"{word}\t1")

Check the for loop syntax carefully.
The for loop is missing a colon at the end, causing a SyntaxError.
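As a sketch, here is one corrected version (renaming the function to `map_words` is my own choice, to avoid shadowing Python's built-in `map`):

```python
def map_words(line):
    words = line.split()
    for word in words:  # colon added, fixing the SyntaxError
        print(f"{word}\t1")

map_words("apple banana apple")
```

With the colon restored, the function runs and emits one tab-separated key-value pair per word occurrence.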
In Hadoop, when processing a very large dataset, how does the Map phase efficiently handle the data?
Think about how Hadoop uses multiple computers to speed up processing.
Hadoop divides the input into chunks called input splits, and the Map phase processes these splits in parallel on different nodes, which lets it handle large datasets efficiently.