A MapReduce job is running slowly because map tasks are spilling data to disk too often. Which parameter should you increase to reduce spills?

Medium | Debug | Q6 of 15
Hadoop - Performance Tuning
A. mapreduce.task.io.sort.mb
B. mapreduce.job.reduces
C. mapreduce.reduce.shuffle.parallelcopies
D. mapreduce.task.io.sort.factor
Step-by-Step Solution
Solution:
  1. Step 1: Identify cause of frequent spills

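    Map output is collected in an in-memory sort buffer. When the buffer fills past its spill threshold (mapreduce.map.sort.spill.percent, 0.80 by default), its contents are sorted and written to disk as a spill file. A buffer that is small relative to the map output therefore spills repeatedly.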
  2. Step 2: Choose parameter to increase buffer size

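    mapreduce.task.io.sort.mb sets the size of the sort buffer in megabytes (100 by default). Increasing it lets more map output stay in memory, so fewer spills occur. The buffer is allocated from the map task's JVM heap, so the heap must be large enough to hold it.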
  3. Final Answer:

    mapreduce.task.io.sort.mb -> Option A
  4. Quick Check:

    Increase sort buffer size to reduce spills [OK]
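
To make the effect of the buffer size concrete, here is a rough back-of-the-envelope sketch in Python. The function name `estimate_spills` and the 300 MB map output figure are hypothetical, chosen only for illustration. The model assumes a spill is triggered each time the buffer reaches `sort_mb * spill_percent` and ignores per-record metadata, combiners, and compression, so real spill counts will differ.

```python
import math

def estimate_spills(map_output_mb: float, sort_mb: int = 100,
                    spill_percent: float = 0.80) -> int:
    """Rough estimate of spill files written by a single map task.

    Simplified model (an assumption, not Hadoop's exact accounting):
    one spill each time the buffer reaches sort_mb * spill_percent.
    """
    threshold_mb = sort_mb * spill_percent
    # A map task that produces any output writes at least one spill file.
    return max(1, math.ceil(map_output_mb / threshold_mb))

# Hypothetical map task emitting about 300 MB of output.
print(estimate_spills(300, sort_mb=100))  # default-sized buffer
print(estimate_spills(300, sort_mb=512))  # larger buffer
```

Under this model the default 100 MB buffer spills four times for 300 MB of output, while a 512 MB buffer holds everything and writes a single spill.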
Quick Trick: Increase mapreduce.task.io.sort.mb to reduce map spills [OK]
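
One common place to set this parameter is mapred-site.xml, which holds `<property>` entries with a `<name>` and a `<value>` inside a `<configuration>` root. The Python sketch below generates such an entry. The helper name `property_element` and the value 256 are illustrative assumptions, not recommendations; a suitable value depends on the map task heap available in your cluster.

```python
import xml.etree.ElementTree as ET

def property_element(name: str, value: str) -> ET.Element:
    """Build one <property> entry in Hadoop's configuration file layout."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value
    return prop

# 256 MB is an illustrative value only.
config = ET.Element("configuration")
config.append(property_element("mapreduce.task.io.sort.mb", "256"))

print(ET.tostring(config, encoding="unicode"))
```

If the job driver is run through ToolRunner, the same property can usually be passed per job as a generic option of the form `-D mapreduce.task.io.sort.mb=256` instead of editing the cluster-wide file.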
Common Mistakes:
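  • Increasing the number of reduce tasks (mapreduce.job.reduces), which does not change the map-side sort buffer
  • Changing the shuffle parallel copies (mapreduce.reduce.shuffle.parallelcopies), which affects how reducers fetch map output, not how maps spill
  • Adjusting the sort factor (mapreduce.task.io.sort.factor), which controls how many streams are merged at once, not how often the buffer spills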
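
The last mistake is easier to see with a small sketch. The hypothetical function `merge_passes` below uses a deliberately simplified model in which every pass merges groups of up to `sort_factor` files into one file each; Hadoop's actual merge scheduling is more involved. The point is only that the sort factor changes how spill files are merged afterwards, while the number of spill files is already fixed by the buffer size.

```python
import math

def merge_passes(spill_files: int, sort_factor: int = 10) -> int:
    """Count merge rounds needed to reduce spill files to a single file.

    Simplified model (an assumption): each round merges groups of up to
    sort_factor files into one file per group.
    """
    if sort_factor < 2:
        raise ValueError("sort_factor must be at least 2")
    passes = 0
    remaining = spill_files
    while remaining > 1:
        remaining = math.ceil(remaining / sort_factor)
        passes += 1
    return passes

# 40 spill files is a hypothetical count; the spills were already
# written before any merge starts, whatever the factor is.
print(merge_passes(40, sort_factor=10))
print(merge_passes(40, sort_factor=100))
```

Raising the factor from 10 to 100 cuts the merge rounds from two to one in this model, but all 40 spill files were still written to disk. Only a larger sort buffer reduces that count.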
