What is the main purpose of the shuffle and sort phase in a Hadoop MapReduce job?
Think about what happens between the map and reduce steps.
The shuffle and sort phase collects all intermediate key-value pairs from mappers, groups them by key, and sorts them by key. This prepares the data so each reducer can process all values for a given key together.
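The grouping and sorting described above can be simulated in a few lines of Python. This is a conceptual sketch, not Hadoop code (the real shuffle is distributed across the network and implemented in Java), but it produces the same logical result that reducers see:

```python
from collections import defaultdict

def shuffle_and_sort(mapper_outputs):
    """Group intermediate (key, value) pairs by key and sort the keys,
    mimicking the data layout Hadoop's shuffle and sort delivers to reducers."""
    groups = defaultdict(list)
    for key, value in mapper_outputs:
        groups[key].append(value)
    # Reducers receive keys in sorted order, each with its list of values.
    return sorted(groups.items())
```

For example, `shuffle_and_sort([("b", 1), ("a", 2), ("a", 3)])` returns `[("a", [2, 3]), ("b", [1])]`.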
Given the following intermediate mapper outputs, what will be the grouped and sorted data after the shuffle and sort phase?
Mapper outputs: (apple, 1), (banana, 1), (apple, 1), (banana, 1), (cherry, 1)
Shuffle groups by key and collects all values in a list.
After shuffle and sort, all values for each key are grouped into lists: (apple, [1, 1]), (banana, [1, 1]), (cherry, [1]). The keys are sorted alphabetically, so 'apple' comes before 'banana' and 'cherry'.
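Working through the example above in Python (again a local simulation, not Hadoop itself) confirms the grouped and sorted result:

```python
from collections import defaultdict

# The mapper outputs from the question above.
pairs = [("apple", 1), ("banana", 1), ("apple", 1), ("banana", 1), ("cherry", 1)]

grouped = defaultdict(list)
for key, value in pairs:
    grouped[key].append(value)

# Sorting the grouped items by key yields the shuffle-and-sort output.
result = sorted(grouped.items())
# result == [("apple", [1, 1]), ("banana", [1, 1]), ("cherry", [1])]
```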
If a MapReduce job processes 1000 unique keys in the map phase, how many keys will be present in the shuffle and sort phase output?
Shuffle groups keys but does not create new keys or remove unique ones.
The shuffle and sort phase groups all intermediate data by the unique keys from the map output. The number of unique keys remains unchanged, so the shuffle and sort output still contains 1000 keys.
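A quick simulation illustrates the key-preservation property: even if each key is emitted many times and the pairs arrive in arbitrary order, grouping leaves exactly the original set of unique keys.

```python
import random
from collections import defaultdict

# Simulated map output: 1000 unique keys, each emitted three times.
pairs = [(f"key{i}", 1) for i in range(1000) for _ in range(3)]
random.shuffle(pairs)  # arrival order at the shuffle is arbitrary

grouped = defaultdict(list)
for key, value in pairs:
    grouped[key].append(value)

# Shuffle and sort neither adds nor removes unique keys.
assert len(grouped) == 1000
```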
In a MapReduce job, the reducer receives unsorted keys and values. Which issue in the shuffle and sort phase could cause this?
Sorting happens during shuffle and sort phase.
Hadoop sorts intermediate keys by default during shuffle and sort. If keys arrive at reducers unsorted, the sort step of that phase failed: intermediate keys were not sorted before being merged and handed to the reducers.
You want to reduce network traffic during the shuffle and sort phase in a large MapReduce job. Which approach will help achieve this?
Think about reducing data volume before shuffle.
Using a combiner (set in Hadoop with job.setCombinerClass) reduces the amount of data sent over the network by partially aggregating each mapper's output locally before shuffle and sort.
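The savings can be seen in a small word-count simulation. This Python sketch applies reducer-style summation on the mapper side, the way a combiner would, and shows that fewer pairs need to cross the network:

```python
from collections import defaultdict

# One mapper's local output before any combining.
mapper_output = [("apple", 1), ("banana", 1), ("apple", 1), ("apple", 1)]

# Without a combiner, all four pairs would be shuffled over the network.
# A word-count combiner runs the same summation logic as the reducer,
# but locally on the mapper's output:
combined = defaultdict(int)
for key, value in mapper_output:
    combined[key] += value

shuffled_pairs = list(combined.items())
# Only two pairs remain to shuffle: ("apple", 3) and ("banana", 1).
```

Note that a combiner is only safe when the reduce function is associative and commutative (as summation is), since Hadoop may run it zero or more times.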