Challenge - 5 Problems

Reduce Phase Mastery

🧠 Conceptual

intermediate

What is the main role of the Reduce phase in Hadoop MapReduce?

In Hadoop MapReduce, after the Map phase processes input data, the Reduce phase takes over. What is the main role of the Reduce phase?

A. It sorts the input data before mapping.

B. It splits the input data into smaller chunks for processing.

C. It aggregates and summarizes the intermediate data produced by the Map phase.

D. It reads raw input data from the source files.

❓ Predict Output

intermediate

Output of Reduce function aggregation

Consider the following simplified Reduce function for a Hadoop MapReduce job, written in Python, which sums the values for each key:

def reduce(key, values):
    total = 0
    for v in values:
        total += v
    print(f"{key}: {total}")

What will be the output if the input to reduce is key = 'apple' and values = [2, 3, 5]?

reduce('apple', [2, 3, 5])

A. apple: 10

B. apple: 235

C. apple: 0

D. apple: 5
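The reduce function above receives its values as a ready-made list. In a Hadoop Streaming job, a reducer instead reads sorted, tab-separated key/value lines from standard input and has to detect the key boundaries itself. The following is a minimal sketch of that pattern, not a reference implementation: `stream_reduce` and the `sample` lines are hypothetical names and data chosen for illustration, and the input is passed as a list rather than read from stdin so the example runs on its own.

```python
from itertools import groupby


def stream_reduce(lines):
    """Sum the values per key from 'key<TAB>value' lines sorted by key."""
    # Split each line into a [key, value] pair (hypothetical helper logic).
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    # groupby only merges adjacent equal keys, so the input must already
    # be sorted by key, as reducer input is after the shuffle and sort.
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield key, sum(int(value) for _, value in group)


# Illustrative data, deliberately different from the quiz input.
sample = ["pear\t7\n", "pear\t8\n", "plum\t4\n"]
results = list(stream_reduce(sample))
for key, total in results:
    print(f"{key}: {total}")
```

Yielding pairs instead of printing inside the function keeps the aggregation logic testable, with formatting left to the caller.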

❓ Data Output

advanced

Resulting key-value pairs after Reduce phase

Given the following intermediate key-value pairs from the Map phase:

{'cat': [1, 1, 1], 'dog': [1, 1], 'bird': [1]}

If the Reduce phase sums the values for each key, what is the resulting output?

A. {'cat': 1, 'dog': 1, 'bird': 1}

B. {'cat': 3, 'dog': 2, 'bird': 1}

C. {'cat': [3], 'dog': [2], 'bird': [1]}

D. {'cat': 6, 'dog': 4, 'bird': 2}
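The grouped dictionary in this question is what the shuffle step hands to the reducers. As a rough sketch of how flat Map output could end up in that shape, the example below groups `(key, value)` pairs by key in memory. The `shuffle` function and the ordering of `map_output` are assumptions for illustration; a real Hadoop job performs this grouping across the cluster with sorting and partitioning rather than with a single dictionary.

```python
from collections import defaultdict


def shuffle(pairs):
    """Group (key, value) pairs into {key: [values]} (in-memory sketch)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


# Hypothetical flat Map output, one (word, 1) pair per occurrence.
map_output = [
    ("cat", 1), ("dog", 1), ("cat", 1),
    ("bird", 1), ("dog", 1), ("cat", 1),
]
grouped = shuffle(map_output)
print(grouped)
```

The sketch stops before the Reduce step on purpose, so it reproduces only the input given in the question.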

🔧 Debug

advanced

Identify the error in this Reduce function

Look at this Reduce function code snippet:

def reduce(key, values):
    total = 0
    for v in values
        total += v
    print(f"{key}: {total}")

What error will this code produce when run?

A. NameError because total is not defined

B. IndentationError due to wrong indentation

C. TypeError because values is not iterable

D. SyntaxError due to missing colon after for loop

🚀 Application

expert

Choosing the correct Reduce phase output for word count

In a word count MapReduce job, the Map phase outputs key-value pairs where the key is a word and the value is 1 for each occurrence. The Reduce phase sums these counts. Given the following Map output for the word 'data':

[('data', 1), ('data', 1), ('data', 1), ('data', 1)]

Which of the following is the correct Reduce phase output for the key 'data'?

A. ('data', 4)

B. ('data', [1, 1, 1, 1])

C. ('data', 1)

D. ('data', 0)
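For context on where such Map output comes from, here is a small sketch of a word count mapper that emits one `(word, 1)` pair per occurrence, as the question describes. The function name `map_words`, the lowercasing, the whitespace tokenization, and the sample sentence are all assumptions for illustration; a production mapper would typically also handle punctuation.

```python
def map_words(line):
    """Emit a (word, 1) pair for every whitespace-separated word in a line."""
    for word in line.lower().split():
        yield word, 1


# Hypothetical input line, unrelated to the quiz data.
map_output = list(map_words("To be or not to be"))
print(map_output)
```

The mapper does no counting itself: repeated words simply produce repeated pairs, and combining them is left to the Reduce phase.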
