Concept Flow - Small files problem and solutions
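Many small files are created (each much smaller than the HDFS block size, 128 MB by default)
HDFS stores each file in at least one block of its own; even a tiny file gets a separate block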
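NameNode keeps metadata for every file and every block in memory
NameNode memory fills up: metadata grows with the number of files, not with the amount of data
Performance degrades: NameNode heap pressure, more metadata lookups per read, and typically one MapReduce map task per small file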
Apply solutions: combine files, use SequenceFile, use Hadoop Archives (HAR), use HBase
Reduced metadata and improved performance
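A back-of-the-envelope model makes the flow above concrete. This is a minimal sketch, assuming a simplified accounting of one NameNode object per file plus one per block, and a 128 MB block size; directories, replication, and the actual bytes each object costs are ignored. The function name `namenode_objects` is hypothetical.

```python
import math

BLOCK_SIZE_MB = 128  # assumed HDFS block size (the dfs.blocksize default in Hadoop 2.x and later)

def namenode_objects(file_sizes_mb, block_size_mb=BLOCK_SIZE_MB):
    """Hypothetical rough count of NameNode objects: one per file plus one per block.

    Simplified model: directories, replicas, and per-object memory cost are ignored.
    A non-empty file needs at least one block no matter how small it is;
    an empty file needs none.
    """
    total = 0
    for size in file_sizes_mb:
        blocks = math.ceil(size / block_size_mb)  # 1 MB -> 1 block, 10,000 MB -> 79 blocks
        total += 1 + blocks                       # the file entry plus its blocks
    return total

small = [1] * 10_000   # 10,000 files of 1 MB each
combined = [10_000]    # the same 10,000 MB packed into a single file

print(namenode_objects(small))     # 20000
print(namenode_objects(combined))  # 80
```

Under these assumptions the same data costs about 250 times fewer NameNode objects once it is combined, which is why every solution in the flow aims to reduce the file count.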
Small files overload the HDFS NameNode with metadata and slow down the cluster. The solutions combine or reorganize the data into fewer, larger files, so the NameNode has fewer objects to track.
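The core idea behind combining files can be sketched without Hadoop at all: concatenate many small files into one large blob and keep an index of where each one starts. This is a simplified, in-memory illustration in the spirit of a SequenceFile keyed by filename or a HAR archive with its index; it is not the real format of either, and `pack_files` and `read_file` are hypothetical helpers.

```python
from typing import Dict, Tuple

def pack_files(files: Dict[str, bytes]) -> Tuple[bytes, Dict[str, Tuple[int, int]]]:
    """Hypothetical helper: concatenate small files into one blob plus an (offset, length) index."""
    index: Dict[str, Tuple[int, int]] = {}
    chunks = []
    offset = 0
    for name in sorted(files):  # sorted for a deterministic layout
        data = files[name]
        index[name] = (offset, len(data))
        chunks.append(data)
        offset += len(data)
    return b"".join(chunks), index

def read_file(blob: bytes, index: Dict[str, Tuple[int, int]], name: str) -> bytes:
    """Hypothetical helper: recover one original file by slicing the combined blob."""
    offset, length = index[name]
    return blob[offset:offset + length]

small_files = {f"log-{i:03d}.txt": f"record {i}\n".encode() for i in range(3)}
blob, index = pack_files(small_files)
print(f"{len(small_files)} small files -> 1 blob of {len(blob)} bytes")
print(read_file(blob, index, "log-001.txt"))
```

Stored in HDFS, the blob would be one large file instead of many small ones, so the NameNode tracks far fewer objects; the trade-off is that individual files must now be located through the index rather than by path.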