Recall & Review
beginner
What is the purpose of the mapreduce.map.memory.mb parameter?
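It sets the amount of memory (in MB) requested from the scheduler for each map task. Increasing it helps when map tasks process large records or hold large in-memory structures, but each task then takes a bigger share of the node's memory, so fewer tasks can run at once.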
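As a rough sketch of how this setting is usually paired with the JVM heap: the heap passed through the related mapreduce.map.java.opts property has to fit inside the memory requested by mapreduce.map.memory.mb, and a common rule of thumb leaves about 20% headroom for non-heap memory. The helper name and the 0.8 ratio are assumptions for illustration; the right ratio depends on the workload.

```python
def map_task_settings(container_mb, heap_ratio=0.8):
    """Return illustrative map-task settings for a given container size.

    heap_ratio is an assumed rule of thumb, not a required value.
    """
    if container_mb <= 0:
        raise ValueError("container_mb must be positive")
    # Keep the JVM heap below the container size to leave non-heap headroom.
    heap_mb = int(container_mb * heap_ratio)
    return {
        "mapreduce.map.memory.mb": str(container_mb),
        "mapreduce.map.java.opts": f"-Xmx{heap_mb}m",
    }


# Print the settings as -D options, the form accepted by jobs run via ToolRunner.
for key, value in map_task_settings(2048).items():
    print(f"-D {key}={value}")
```

The same pattern applies to reduce tasks with mapreduce.reduce.memory.mb and mapreduce.reduce.java.opts.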
intermediate
How does mapreduce.reduce.shuffle.parallelcopies affect job performance?
It sets the number of parallel transfers each reduce task runs while copying map output during the shuffle phase. More parallel copies can shorten the shuffle when there are many map outputs to fetch, but they increase network load.
beginner
What does the mapreduce.task.io.sort.mb parameter control?
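It sets the size (in MB) of the in-memory buffer used to sort map output before it is spilled to disk. A larger buffer means fewer spills and less disk I/O, but it is taken from the map task's heap, so it must fit within the task's memory.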
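The trade-off can be made concrete with a back-of-the-envelope estimate of how many times a map task spills to disk. This is a simplified sketch: it assumes the stock defaults of 100 MB for mapreduce.task.io.sort.mb and 0.80 for mapreduce.map.sort.spill.percent, and it ignores the space the buffer uses for record metadata. The function name is hypothetical.

```python
import math


def estimate_spills(map_output_mb, sort_mb=100, spill_percent=0.80):
    """Estimate spill count for one map task (simplified model)."""
    if sort_mb <= 0 or not 0 < spill_percent <= 1:
        raise ValueError("sort_mb must be positive and spill_percent in (0, 1]")
    # A spill starts once the buffer fills to spill_percent of its size.
    usable_mb = sort_mb * spill_percent
    return max(1, math.ceil(map_output_mb / usable_mb))


# Compare the assumed default buffer with a larger one for 400 MB of map output.
print(estimate_spills(400))
print(estimate_spills(400, sort_mb=256))
```

Fewer spills means fewer intermediate files to merge at the end of the map task, which is where the disk I/O saving comes from.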
beginner
Why would you adjust mapreduce.reduce.memory.mb in a MapReduce job?
To allocate more memory to reduce tasks, which can help when reduce tasks process large amounts of data or perform complex computations.
intermediate
What is the effect of increasing mapreduce.job.reduces?
It increases the number of reduce tasks. This can speed up the reduce phase by spreading the work across more tasks, but setting it too high adds task startup overhead and produces many small output files.
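One starting point for choosing a value is the heuristic from the Hadoop MapReduce tutorial: 0.95 or 1.75 multiplied by (number of nodes × maximum containers per node). The sketch below applies it; the function name and the example cluster size are made up for illustration, and the result is only a first guess to refine by measurement.

```python
def suggested_reduces(nodes, containers_per_node, factor=0.95):
    """Apply the tutorial heuristic: factor * (nodes * containers per node).

    factor=0.95 lets all reduces start at once as the maps finish;
    factor=1.75 gives faster nodes a second wave for better load balancing.
    """
    if nodes <= 0 or containers_per_node <= 0:
        raise ValueError("nodes and containers_per_node must be positive")
    return max(1, round(nodes * containers_per_node * factor))


# Hypothetical cluster: 10 nodes with 8 containers each.
single_wave = suggested_reduces(10, 8)
two_waves = suggested_reduces(10, 8, factor=1.75)
print(f"-D mapreduce.job.reduces={single_wave}")
print(f"-D mapreduce.job.reduces={two_waves}")
```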
Which parameter controls the memory allocated to map tasks?
mapreduce.map.memory.mb sets memory for map tasks.
What does increasing mapreduce.job.reduces do?
mapreduce.job.reduces sets the number of reduce tasks.
Which parameter affects the buffer size for sorting map output?
mapreduce.task.io.sort.mb controls the sort buffer size.
What is the role of mapreduce.reduce.shuffle.parallelcopies?
It controls how many parallel transfers each reduce task runs during the shuffle phase.
Why increase mapreduce.reduce.memory.mb?
It allocates more memory to reduce tasks for heavy processing.
Explain how tuning mapreduce.task.io.sort.mb and mapreduce.map.memory.mb can improve map task performance.
Think about memory and sorting buffers in map tasks.
Describe the impact of increasing mapreduce.job.reduces and mapreduce.reduce.shuffle.parallelcopies on reduce phase performance.
Consider parallelism and data transfer in reduce tasks.