
Which combination of parameters would best reduce map output spills and improve shuffle performance?

Hard · Application · Q8 of 15
Hadoop - Performance Tuning
You want to optimize a MapReduce job that processes large input splits and has many small map outputs. Which combination of parameters would best reduce map output spills and improve shuffle performance?
A. Increase mapreduce.job.reduces and decrease mapreduce.task.io.sort.factor
B. Increase mapreduce.task.io.sort.mb and increase mapreduce.reduce.shuffle.parallelcopies
C. Decrease mapreduce.task.io.sort.mb and decrease mapreduce.reduce.shuffle.parallelcopies
D. Decrease mapreduce.job.maps and increase mapreduce.task.io.sort.factor
Step-by-Step Solution
  1. Step 1: Reduce map output spills

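    Increasing mapreduce.task.io.sort.mb (default 100 MB) enlarges the in-memory sort buffer of each map task, so more map output fits before the buffer crosses its spill threshold (mapreduce.map.sort.spill.percent, default 0.80) and is written to disk. Fewer spills also mean fewer spill files to merge when the map finishes. The buffer is allocated from the map task's JVM heap, so the heap must be large enough to hold it.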
  2. Step 2: Improve shuffle speed

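    Increasing mapreduce.reduce.shuffle.parallelcopies (default 5) lets each reducer fetch map outputs from more map tasks at the same time, which shortens the copy phase when there are many small map outputs.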
  3. Final Answer:

    Increase mapreduce.task.io.sort.mb and increase mapreduce.reduce.shuffle.parallelcopies -> Option B
  4. Quick Check:

    Bigger sort buffer (fewer spills) + more parallel shuffle copies (faster fetch) = Option B
Quick Trick: A bigger sort buffer cuts spills; more parallel copies speed up the shuffle.
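To make the effect of the sort buffer concrete, here is a minimal back-of-the-envelope sketch in Python. It is a simplified model, not Hadoop's actual accounting: it assumes a spill is triggered each time the buffer reaches sort_mb * spill_percent, and it ignores the per-record metadata that also occupies the buffer. The function name estimate_spills and the 300 MB map output are hypothetical, chosen only for illustration.

```python
import math


def estimate_spills(map_output_mb, sort_mb=100, spill_percent=0.80):
    """Rough number of spill files one map task writes (simplified model).

    Defaults mirror mapreduce.task.io.sort.mb (100) and
    mapreduce.map.sort.spill.percent (0.80). Per-record metadata
    overhead is ignored, so real jobs may spill more often.
    """
    if map_output_mb <= 0:
        return 0
    usable_mb = sort_mb * spill_percent  # buffer space filled before a spill starts
    return math.ceil(map_output_mb / usable_mb)


# Hypothetical map task producing 300 MB of output
print(estimate_spills(300))               # default 100 MB buffer
print(estimate_spills(300, sort_mb=512))  # enlarged buffer
```

Under this model the default buffer gives 4 spills for the 300 MB example, while a 512 MB buffer holds the whole output and writes it once.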
Common Mistakes:
  • Decreasing buffer size which increases spills
  • Reducing shuffle parallel copies which slows shuffle
  • Changing number of maps instead of tuning buffers
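One way to apply Option B without editing cluster configuration files is to pass the two properties as -D generic options when submitting the job. The sketch below only builds the argument list; it assumes the job driver parses generic options (for example via ToolRunner), and the helper name tuning_flags plus the values 256 and 20 are illustrative assumptions, not recommended settings.

```python
def tuning_flags(sort_mb, parallel_copies):
    """Build -D options for the two properties named in Option B."""
    if sort_mb <= 0 or parallel_copies <= 0:
        # Guards against the "decrease to nothing" mistakes listed above
        raise ValueError("both values must be positive")
    return [
        "-D", f"mapreduce.task.io.sort.mb={sort_mb}",
        "-D", f"mapreduce.reduce.shuffle.parallelcopies={parallel_copies}",
    ]


# Illustrative values only; tune against the task heap size and cluster load
flags = tuning_flags(sort_mb=256, parallel_copies=20)
print(" ".join(flags))
```

The resulting options would go between the driver class and the job's own arguments on the hadoop jar command line.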
