MapReduce processes large data by splitting it into parts. What is the main reason for splitting data into chunks?
Think about how splitting helps speed up work by sharing it.
MapReduce splits data so many machines can work on different parts simultaneously, speeding up processing.
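The splitting idea can be sketched in a few lines. This is a minimal illustration, not a real MapReduce framework: the `chunk` helper is a hypothetical name, and the list of integers stands in for a large dataset.

```python
# Minimal sketch: split a dataset into fixed-size chunks so each
# chunk could be handed to a different worker machine.
def chunk(data, size):
    """Yield successive chunks of `size` items from `data`."""
    for i in range(0, len(data), size):
        yield data[i:i + size]

records = list(range(10))          # stand-in for a large dataset
chunks = list(chunk(records, 4))   # → [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

Each chunk is independent of the others, which is exactly what lets different machines process them at the same time.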
MapReduce has two main steps: map and reduce. Why is this two-step process important for parallel data processing?
Consider how independent work and combining results help in parallel tasks.
The map step works on data chunks independently in parallel, and the reduce step gathers and combines these results to produce the final output.
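The two-step shape can be shown with a toy job. The sum-of-squares task and the chunk values below are made up for illustration; the point is that each map call touches only its own chunk, so the map step is parallelizable, and only the small partial results need to be combined.

```python
from functools import reduce

chunks = [[1, 2, 3], [4, 5], [6]]

# Map step: process each chunk independently (each of these could
# run on a separate machine at the same time).
partials = [sum(x * x for x in c) for c in chunks]  # [14, 41, 36]

# Reduce step: combine the independent partial results into one answer.
total = reduce(lambda a, b: a + b, partials)
print(total)  # 91
```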
Given a MapReduce job that counts words in the text: 'cat dog cat bird', what is the output after the reduce step?
Input text: 'cat dog cat bird'. Map step output (key-value pairs): [('cat',1), ('dog',1), ('cat',1), ('bird',1)]. The reduce step sums the counts for each word.
Count how many times each word appears in the input.
The word 'cat' appears twice, 'dog' once, and 'bird' once, so after the reduce step sums the counts, the output is [('cat',2), ('dog',1), ('bird',1)].
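The worked example above can be run end to end. This is a single-process sketch of the word-count job, with the grouping-by-key (the "shuffle") folded into the reduce loop:

```python
from collections import defaultdict

text = "cat dog cat bird"

# Map step: emit one (word, 1) pair per word.
mapped = [(word, 1) for word in text.split()]
# → [('cat', 1), ('dog', 1), ('cat', 1), ('bird', 1)]

# Shuffle + reduce step: group pairs by key and sum the counts.
counts = defaultdict(int)
for word, n in mapped:
    counts[word] += n

print(dict(counts))  # {'cat': 2, 'dog': 1, 'bird': 1}
```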
A MapReduce job is running slower than expected. What is the most likely cause related to parallel processing?
Think about how workload balance affects parallel speed.
If data chunks are uneven in size, some machines finish early and sit idle waiting for the slowest one, so the whole job runs at the pace of the most overloaded machine. This imbalance is known as data skew.
You have a dataset with many small files. Which data splitting strategy will best improve MapReduce parallel processing efficiency?
Consider the overhead of starting many small tasks versus fewer larger tasks.
Combining small files reduces the overhead of managing many tiny tasks, improving parallel processing efficiency in MapReduce.
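One simple combining strategy is to greedily pack small files into splits near a target size, so each task has enough work to amortize its startup overhead. The function, file names, and sizes below are hypothetical (this is not a real HDFS or `CombineFileInputFormat` API), just a sketch of the idea:

```python
# Greedily pack small files into splits of at most `target` MB each,
# so one task handles many small files instead of one tiny file.
def combine_files(file_sizes, target=128):
    splits, current, current_size = [], [], 0
    for name, size in file_sizes:
        if current and current_size + size > target:
            splits.append(current)          # close the current split
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        splits.append(current)
    return splits

files = [("a", 10), ("b", 40), ("c", 90), ("d", 30), ("e", 60)]
print(combine_files(files, target=128))  # [['a', 'b'], ['c', 'd'], ['e']]
```

Five files become three tasks, each with a reasonable amount of work, instead of five tasks that each spend most of their time on startup overhead.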