In the MapReduce job execution flow, which step happens first?
Think about how the data is prepared before processing.
The first step is splitting the input data into chunks so that each map task can process a part of the data in parallel.
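The splitting step can be sketched in plain Python; this is a simulation of input splitting, not the Hadoop API, and `split_input` and `chunk_size` are hypothetical names:

```python
# Hypothetical sketch of input splitting: divide a list of records
# into fixed-size chunks, one chunk per map task.
def split_input(records, chunk_size):
    # Slice the records into successive chunks of at most chunk_size items.
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]

chunks = split_input(["apple", "banana", "apple", "cherry"], 2)
print(chunks)  # [['apple', 'banana'], ['apple', 'cherry']]
```

In Hadoop itself, split sizes are typically tied to the HDFS block size rather than a record count, but the principle of one split per map task is the same.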
During the MapReduce job execution, what is the main purpose of the shuffle phase?
Think about how data moves from mappers to reducers.
The shuffle phase sorts and transfers the output of map tasks to the appropriate reduce tasks based on keys.
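A minimal single-process sketch of the shuffle step, grouping map output pairs by key and sorting the keys (a simulation, not Hadoop's implementation; `shuffle` is a hypothetical helper name):

```python
from collections import defaultdict

# Sketch of the shuffle phase: collect all values for each key so that
# every reducer receives the complete list of values for its keys.
def shuffle(map_output):
    grouped = defaultdict(list)
    for key, value in map_output:
        grouped[key].append(value)
    # Sort by key, mirroring the sort step of the shuffle phase.
    return dict(sorted(grouped.items()))

print(shuffle([("apple", 1), ("banana", 1), ("apple", 1)]))
# {'apple': [1, 1], 'banana': [1]}
```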
Given the input data ["apple", "banana", "apple"] and a map function that outputs (word, 1) for each word, what is the map phase output?
input_data = ["apple", "banana", "apple"]
map_output = [(word, 1) for word in input_data]
print(map_output)
The map function emits one pair per input word.
The map phase outputs a list of key-value pairs, one for each input element, without aggregation.
Given the intermediate data {'apple': [1, 1], 'banana': [1]}, what is the output of the reducer that sums the values?
intermediate_data = {'apple': [1, 1], 'banana': [1]}
reducer_output = {k: sum(v) for k, v in intermediate_data.items()}
print(reducer_output)
The reducer sums all values for each key.
The reducer adds the list of values for each key to produce the final count.
In Hadoop's MapReduce architecture, which component is responsible for managing the entire job execution, including resource allocation and task scheduling?
Think about the component that coordinates tasks across the cluster.
The JobTracker manages job execution, resource allocation, and task scheduling in Hadoop MapReduce.
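The full job flow covered above can be combined into one minimal word-count sketch that runs in a single process (a simulation of the map, shuffle, and reduce phases, not the Hadoop API; the function names are hypothetical):

```python
from collections import defaultdict

# Map phase: emit (word, 1) for each input word.
def map_phase(records):
    return [(word, 1) for word in records]

# Shuffle phase: group values by key and sort the keys.
def shuffle_phase(pairs):
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(sorted(grouped.items()))

# Reduce phase: sum the values for each key.
def reduce_phase(grouped):
    return {key: sum(values) for key, values in grouped.items()}

result = reduce_phase(shuffle_phase(map_phase(["apple", "banana", "apple"])))
print(result)  # {'apple': 2, 'banana': 1}
```

In a real cluster, the JobTracker (or, in YARN-based Hadoop, the ResourceManager and per-job ApplicationMaster) schedules these phases across many machines rather than running them in one process.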