
MapReduce job tuning parameters in Hadoop - Practice Problems & Coding Challenges

Challenge - 5 Problems
🧠 Conceptual · intermediate
Understanding the impact of map task memory allocation

Which of the following tuning parameters directly controls the maximum memory allocated to each map task in a Hadoop MapReduce job?

A. <code>mapreduce.reduce.memory.mb</code>
B. <code>mapreduce.job.reduces</code>
C. <code>mapreduce.task.io.sort.mb</code>
D. <code>mapreduce.map.memory.mb</code>
💡 Hint

Think about which parameter sets memory limits specifically for map tasks.
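For reference, per-task memory limits are typically set in <code>mapred-site.xml</code> or per job; the values below are illustrative, not recommendations:

```xml
<!-- mapred-site.xml: memory limits for map tasks (illustrative values) -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>2048</value> <!-- container memory for each map task -->
</property>
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx1638m</value> <!-- JVM heap, typically ~80% of container memory -->
</property>
```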

Data Output · intermediate
Effect of changing number of reducers on output files

Given a MapReduce job configured with 10 reducers, how many output files will be generated in HDFS after the job completes?

A. 10
B. Depends on input splits
C. Number of map tasks
D. 1
💡 Hint

Each reducer writes one output file.
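Each reducer writes exactly one <code>part-r-NNNNN</code> file, so the reducer count also fixes the number of output files. The count can be set in the job configuration (illustrative):

```xml
<!-- Job configuration: 10 reducers → 10 output files
     (part-r-00000 through part-r-00009) -->
<property>
  <name>mapreduce.job.reduces</name>
  <value>10</value>
</property>
```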

Predict Output · advanced
Analyzing the effect of io.sort.mb on map task output

Consider the following MapReduce configuration snippet:

<code>mapreduce.task.io.sort.mb=100</code>

What is the effect of increasing mapreduce.task.io.sort.mb from 100 to 300?

A. Increases the buffer size for sorting map outputs, reducing spills and improving performance
B. Increases the memory allocated to reduce tasks, allowing more parallelism
C. Limits the maximum size of input splits processed by map tasks
D. Controls the number of map tasks launched per node
💡 Hint

Focus on what io.sort.mb controls in the map phase.
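The sort buffer is sized with the same property; a larger buffer means fewer mid-map spills to disk, at the cost of heap inside the map task's JVM (illustrative values):

```xml
<!-- Map-side sort buffer: raising this reduces spill frequency -->
<property>
  <name>mapreduce.task.io.sort.mb</name>
  <value>300</value>
</property>
<property>
  <name>mapreduce.map.sort.spill.percent</name>
  <value>0.80</value> <!-- spilling begins when the buffer is 80% full -->
</property>
```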

🔧 Debug · advanced
Identifying cause of slow reduce tasks

A MapReduce job has many reduce tasks running slowly. Which tuning parameter adjustment is most likely to improve reduce task performance?

A. Lower <code>mapreduce.map.memory.mb</code> to free cluster memory
B. Decrease <code>mapreduce.task.io.sort.mb</code> to reduce map output buffer size
C. Increase <code>mapreduce.reduce.shuffle.parallelcopies</code> to allow more parallel fetches
D. Set <code>mapreduce.job.reduces</code> to 1 to reduce overhead
💡 Hint

Consider how reduce tasks fetch map outputs.
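Shuffle fetch parallelism is a single property (the default is 5; the value below is illustrative):

```xml
<!-- Reduce-side shuffle: number of map outputs fetched in parallel -->
<property>
  <name>mapreduce.reduce.shuffle.parallelcopies</name>
  <value>20</value>
</property>
```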

🚀 Application · expert
Optimizing a MapReduce job for large input data

You have a MapReduce job processing very large input files. The job is slow due to frequent spills during the map phase. Which combination of tuning parameters is best to reduce spills and improve performance?

A. Decrease <code>mapreduce.task.io.sort.mb</code> and increase <code>mapreduce.reduce.memory.mb</code>
B. Increase <code>mapreduce.task.io.sort.mb</code> and increase <code>mapreduce.map.memory.mb</code>
C. Increase <code>mapreduce.job.reduces</code> and decrease <code>mapreduce.map.memory.mb</code>
D. Decrease <code>mapreduce.reduce.shuffle.parallelcopies</code> and increase <code>mapreduce.task.io.sort.mb</code>
💡 Hint

Think about memory available for sorting map outputs and map task memory limits.
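A combined sketch of spill-reducing settings: a bigger sort buffer plus enough container memory and JVM heap to hold it (values are illustrative and should be sized against actual container limits):

```xml
<!-- Reduce map-side spills: larger sort buffer backed by a larger map container -->
<property>
  <name>mapreduce.task.io.sort.mb</name>
  <value>400</value>
</property>
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>3072</value>
</property>
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx2458m</value> <!-- heap must accommodate the sort buffer -->
</property>
```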