Hadoop - Performance Tuning
Consider this Hadoop code snippet to read small files and combine them into a SequenceFile format:
from pydoop import hdfs
with hdfs.open('/input/file1') as f1, hdfs.open('/input/file2') as f2:
data1 = f1.read()
data2 = f2.read()
# Combine data1 and data2 into a SequenceFile here
What is the main benefit of using SequenceFile for small files?