Recall & Review
beginner
What is the 'small files problem' in Hadoop?
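It arises when HDFS holds a large number of files that are each much smaller than the block size (128 MB by default in Hadoop 2 and later), instead of fewer large files. Every file carries its own metadata and is usually processed by its own map task, so the per-file overhead outweighs the useful work and slows processing down.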
beginner
Why do many small files cause performance issues in Hadoop?
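Because the NameNode keeps the metadata for every file and block in memory, millions of small files inflate its memory use. Jobs also slow down, since each small file typically becomes a separate input split with its own task startup cost.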
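To make the memory cost concrete, here is a rough back-of-the-envelope sketch. It assumes about 150 bytes of NameNode memory per file or block object, which is a commonly quoted rule of thumb rather than a measured value, and a 128 MB block size. The function name `namenode_bytes` and the 10 GiB dataset are made up for illustration.

```python
# Assumption: ~150 bytes of NameNode memory per file or block object (rule of thumb).
BYTES_PER_OBJECT = 150
BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB, the default block size in Hadoop 2+


def namenode_bytes(num_files: int, file_size: int, block_size: int = BLOCK_SIZE) -> int:
    """Estimate NameNode memory for num_files files of file_size bytes each."""
    blocks_per_file = max(1, -(-file_size // block_size))  # ceiling division
    objects = num_files * (1 + blocks_per_file)  # one file object plus its blocks
    return objects * BYTES_PER_OBJECT


total = 10 * 1024**3  # the same 10 GiB of data, stored two ways

# Stored as 100 KiB files versus stored as 128 MB files.
small = namenode_bytes(num_files=total // (100 * 1024), file_size=100 * 1024)
large = namenode_bytes(num_files=total // BLOCK_SIZE, file_size=BLOCK_SIZE)

print(f"small files: {small:,} bytes of metadata")
print(f"large files: {large:,} bytes of metadata")
```

Under these assumptions the same data costs over a thousand times more NameNode memory when it is split into 100 KiB files.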
beginner
Name one common solution to the small files problem.
Combine small files into larger ones, for example with Hadoop Archive (HAR) or the SequenceFile format, so that Hadoop has fewer files to manage.
intermediate
What is Hadoop Archive (HAR) and how does it help?
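HAR packs many small files into a single archive stored on HDFS. This reduces the NameNode's metadata overhead while keeping the individual files accessible through the har:// scheme. The gain is in NameNode memory; reading a file through the archive's index can be slower than reading it directly.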
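As a sketch of what creating an archive looks like, the helper below assembles the arguments for the `hadoop archive` command (`-archiveName`, `-p` for the parent path, then the sources and the destination). The helper name `har_command` and all paths are hypothetical, and the snippet only builds and prints the command rather than running it.

```python
import shlex


def har_command(archive_name, parent, sources, dest):
    """Build the argument list for `hadoop archive`; sources are relative to parent."""
    if not archive_name.endswith(".har"):
        raise ValueError("archive name must end in .har")
    return ["hadoop", "archive", "-archiveName", archive_name,
            "-p", parent, *sources, dest]


# Hypothetical paths, chosen only for illustration.
cmd = har_command("logs.har", "/user/demo",
                  ["logs/2023", "logs/2024"], "/user/demo/archives")
print(shlex.join(cmd))
```

Once such an archive exists, its contents can be listed with a path like `har:///user/demo/archives/logs.har`. Creating the archive runs a MapReduce job and leaves the original files in place, so they have to be deleted separately before the NameNode sees any saving.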
intermediate
How does using SequenceFile format solve the small files problem?
A SequenceFile stores many small files as key-value pairs in one large file, typically with the file name as the key and the file contents as the value. This reduces the number of files and improves read efficiency.
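The idea of packing files as key-value records can be illustrated in plain Python. This is a simplified sketch of the concept only: it does not produce the real SequenceFile binary format (which also has a header, sync markers, and optional compression), and the `pack` and `unpack` helpers are invented for this example.

```python
import io
import struct


def pack(files: dict) -> bytes:
    """Write each (name, content) pair as one length-prefixed record."""
    out = io.BytesIO()
    for name, content in files.items():
        key = name.encode("utf-8")
        # Two big-endian 4-byte lengths, then the key and the value.
        out.write(struct.pack(">II", len(key), len(content)))
        out.write(key)
        out.write(content)
    return out.getvalue()


def unpack(blob: bytes):
    """Yield (name, content) pairs back out of the container."""
    offset = 0
    while offset < len(blob):
        key_len, val_len = struct.unpack_from(">II", blob, offset)
        offset += 8
        key = blob[offset:offset + key_len].decode("utf-8")
        offset += key_len
        yield key, blob[offset:offset + val_len]
        offset += val_len


small_files = {"a.txt": b"alpha", "b.txt": b"beta"}
container = pack(small_files)      # one blob instead of two files
restored = dict(unpack(container))
print(restored == small_files)
```

The point of the design is that the storage layer sees a single large file, while each original file remains individually recoverable by its key.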
What causes the small files problem in Hadoop?
The small files problem arises when Hadoop stores many small files, causing metadata overhead.
Which Hadoop component struggles with many small files?
The NameNode manages metadata for all files and struggles with many small files because each one adds to its memory overhead.
Which tool can combine small files into a single archive in Hadoop?
HAR archives many small files into one, reducing metadata overhead.
SequenceFile format stores data as:
SequenceFile stores many small files as key-value pairs in a single file.
What is a main benefit of solving the small files problem?
Reducing small files improves performance by lowering metadata overhead and speeding up file access.
Explain the small files problem in Hadoop and why it affects performance.
Think about how Hadoop handles file metadata and what happens when there are many tiny files.
Describe two solutions to the small files problem and how they help.
Focus on how combining files reduces the number of files Hadoop manages.