Which of the following tuning parameters directly controls the maximum memory allocated to each map task in a Hadoop MapReduce job?
Think about which parameter sets memory limits specifically for map tasks.
The parameter mapreduce.map.memory.mb sets the maximum memory allocated to each map task. Other options control reduce task memory, sort buffer size, or number of reducers.
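As a minimal sketch (the 2048 value is illustrative, not a recommendation), the map-task memory limit can be set like this, with mapreduce.reduce.memory.mb as the separate counterpart for reduce tasks:

```
# container memory limit per map task, in MB (illustrative value)
mapreduce.map.memory.mb=2048
```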
Given a MapReduce job configured with 10 reducers, how many output files will be generated in HDFS after the job completes?
Each reducer writes one output file.
The number of output files equals the number of reducers because each reducer writes its own output file.
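For example, a job with 10 reducers leaves one part file per reducer in the output directory, named with the reducer number; a _SUCCESS marker file is also written on successful completion but is not counted as reducer output:

```
part-r-00000
part-r-00001
...
part-r-00009
_SUCCESS
```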
Consider the following MapReduce configuration snippet:
mapreduce.task.io.sort.mb=100
What is the effect of increasing mapreduce.task.io.sort.mb from 100 to 300?
Focus on what io.sort.mb controls in the map phase.
The mapreduce.task.io.sort.mb parameter sets the size of the in-memory buffer used to sort map outputs before they are spilled to disk. Increasing it from 100 MB to 300 MB reduces the number of spill files (and the merge passes needed to combine them), which can improve map-phase performance.
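Building on the snippet above, an illustrative change (values assumed, not prescriptive) might also tune the spill threshold, which controls how full the buffer gets before spilling begins:

```
# larger sort buffer means fewer spills (illustrative value)
mapreduce.task.io.sort.mb=300
# fraction of the buffer filled before a spill starts (0.80 is the default)
mapreduce.map.sort.spill.percent=0.80
```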
A MapReduce job has many reduce tasks running slowly. Which tuning parameter adjustment is most likely to improve reduce task performance?
Consider how reduce tasks fetch map outputs.
Increasing mapreduce.reduce.shuffle.parallelcopies allows reduce tasks to fetch map outputs in parallel, speeding up the shuffle phase and improving reduce task performance.
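A hedged example of this adjustment (the value 10 is illustrative; the default is 5):

```
# number of parallel fetch threads each reducer uses during shuffle (illustrative value)
mapreduce.reduce.shuffle.parallelcopies=10
```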
You have a MapReduce job processing very large input files. The job is slow due to frequent spills during the map phase. Which combination of tuning parameters is best to reduce spills and improve performance?
Think about memory available for sorting map outputs and map task memory limits.
Increasing mapreduce.task.io.sort.mb enlarges the in-memory buffer used to sort map outputs, reducing the number of spills to disk. Because that buffer is allocated from the map task's JVM heap, mapreduce.map.memory.mb (and the heap size set via mapreduce.map.java.opts) should be raised alongside it so map tasks have enough memory to hold the larger buffer.
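One possible combination, sketched with illustrative values (the heap is set below the container limit to leave headroom for non-heap memory):

```
# larger sort buffer to cut down on spills (illustrative value)
mapreduce.task.io.sort.mb=300
# container memory limit per map task, in MB (illustrative value)
mapreduce.map.memory.mb=2048
# JVM heap kept below the container limit; the sort buffer must fit inside it
mapreduce.map.java.opts=-Xmx1638m
```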