Which of the following tuning parameters directly controls the maximum memory allocated to each map task in a Hadoop MapReduce job?
Think about which parameter sets memory limits specifically for map tasks.
The parameter mapreduce.map.memory.mb sets the maximum memory allocated to each map task. Other options control reduce task memory, sort buffer size, or number of reducers.
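As a minimal sketch (the 2048 value is illustrative, not a recommendation), the map-task memory limit can be set like this, with mapreduce.reduce.memory.mb as the separate counterpart for reduce tasks:

```
# container memory limit per map task, in MB (illustrative value)
mapreduce.map.memory.mb=2048
```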
Given a MapReduce job configured with 10 reducers, how many output files will be generated in HDFS after the job completes?
Each reducer writes one output file.
The number of output files equals the number of reducers because each reducer writes its own output file.
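For example, a job with 10 reducers leaves one part file per reducer in the output directory, named with the reducer number; a _SUCCESS marker file is also written on successful completion but is not counted as reducer output:

```
part-r-00000
part-r-00001
...
part-r-00009
_SUCCESS
```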
Consider the following MapReduce configuration snippet:
mapreduce.task.io.sort.mb=100
What is the effect of increasing mapreduce.task.io.sort.mb from 100 to 300?
Focus on what io.sort.mb controls in the map phase.
The mapreduce.task.io.sort.mb parameter sets the size of the in-memory buffer used to sort map outputs before they are spilled to disk. Increasing it from 100 MB to 300 MB reduces the number of spill files (and the merge passes needed to combine them), which can improve map-phase performance.
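Building on the snippet above, an illustrative change (values assumed, not prescriptive) might also tune the spill threshold, which controls how full the buffer gets before spilling begins:

```
# larger sort buffer means fewer spills (illustrative value)
mapreduce.task.io.sort.mb=300
# fraction of the buffer filled before a spill starts (0.80 is the default)
mapreduce.map.sort.spill.percent=0.80
```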
A MapReduce job has many reduce tasks running slowly. Which tuning parameter adjustment is most likely to improve reduce task performance?
Consider how reduce tasks fetch map outputs.
Increasing mapreduce.reduce.shuffle.parallelcopies allows reduce tasks to fetch map outputs in parallel, speeding up the shuffle phase and improving reduce task performance.
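A hedged example of this adjustment (the value 10 is illustrative; the default is 5):

```
# number of parallel fetch threads each reducer uses during shuffle (illustrative value)
mapreduce.reduce.shuffle.parallelcopies=10
```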
You have a MapReduce job processing very large input files. The job is slow due to frequent spills during the map phase. Which combination of tuning parameters is best to reduce spills and improve performance?
Think about memory available for sorting map outputs and map task memory limits.
Increasing mapreduce.task.io.sort.mb enlarges the in-memory buffer used to sort map outputs, reducing the number of spills to disk. Because that buffer is allocated from the map task's JVM heap, mapreduce.map.memory.mb (and the heap size set via mapreduce.map.java.opts) should be raised alongside it so map tasks have enough memory to hold the larger buffer.
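One possible combination, sketched with illustrative values (the heap is set below the container limit to leave headroom for non-heap memory):

```
# larger sort buffer to cut down on spills (illustrative value)
mapreduce.task.io.sort.mb=300
# container memory limit per map task, in MB (illustrative value)
mapreduce.map.memory.mb=2048
# JVM heap kept below the container limit; the sort buffer must fit inside it
mapreduce.map.java.opts=-Xmx1638m
```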