Hadoop | data | ~10 mins

Small files problem and solutions in Hadoop - Interactive Code Practice

Practice - 5 Tasks
Answer the questions below
1. Fill in the blank
easy

Complete the code to list files in HDFS directory.

Hadoop
hdfs dfs -ls [1]
A. /user/hadoop/input
B. /tmp/data
C. /var/log
D. /home/user
Common Mistakes
Using a local file system path instead of an HDFS path.
Forgetting the leading slash in the path.
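
The second mistake is easy to make because HDFS resolves a path without a leading slash against the current user's home directory, `/user/<username>`, rather than against the root. A minimal shell sketch of that rule follows; `resolve_hdfs_path` and the user name `alice` are hypothetical names used only for illustration, and the function mimics the resolution locally without contacting a cluster.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: mimic how HDFS resolves a path argument.
# Absolute paths are kept; relative ones are placed under /user/<username>.
resolve_hdfs_path() {
  local path="$1" user="$2"
  case "$path" in
    /*) printf '%s\n' "$path" ;;
    *)  printf '/user/%s/%s\n' "$user" "$path" ;;
  esac
}

resolve_hdfs_path "/tmp/data" "alice"   # absolute: printed unchanged
resolve_hdfs_path "tmp/data" "alice"    # relative: lands under alice's home directory
```
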
2. Fill in the blank
medium

Complete the code to merge the small files in an HDFS directory into one large file on the local file system.

Hadoop
hdfs dfs -getmerge [1] merged_file.txt
A. /user/hadoop/smallfiles
B. /tmp/largefiles
C. /user/hadoop/output
D. /var/data
Common Mistakes
Using a directory that does not contain the small files.
Confusing local and HDFS paths.
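
`-getmerge` reads the files under an HDFS source directory and concatenates them into one file on the local file system. The sketch below imitates that behaviour with ordinary local files so it can run without a cluster; `merge_dir` and the sample file names are hypothetical, and the equivalent HDFS call appears only as a comment.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Local stand-in for: hdfs dfs -getmerge <hdfs_src_dir> <local_dst>
# Hypothetical helper that concatenates every file in a directory into one file.
merge_dir() {
  local src_dir="$1" dst="$2"
  # The glob expands in sorted name order, so part-00000 precedes part-00001.
  cat "$src_dir"/* > "$dst"
}

# Build two tiny sample files in a scratch directory and merge them.
workdir="$(mktemp -d)"
mkdir "$workdir/parts"
printf 'first\n'  > "$workdir/parts/part-00000"
printf 'second\n' > "$workdir/parts/part-00001"

merge_dir "$workdir/parts" "$workdir/merged_file.txt"
cat "$workdir/merged_file.txt"
```

Because the destination is local, the merged file has to be uploaded again (for example with `hdfs dfs -put`) if it is meant to replace the small files in HDFS.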
3. Fill in the blank
hard

Complete the command to combine small files into a Hadoop archive (HAR).

Hadoop
hadoop archive -archiveName [1] -p /user/hadoop smallfiles /user/hadoop/output
A. smallfiles.har
B. archive.har
C. mergedfiles.har
D. combined.har
Common Mistakes
Using the source directory name as the archive name without the .har extension.
Not specifying the target directory correctly.
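
The general form of the command is `hadoop archive -archiveName <name>.har -p <parent> <src> <dest>`, and the archive name must end in `.har`. The sketch below guards against the first mistake before building the command line; `build_har_command`, `example.har`, and the `/data/...` paths are hypothetical, and the script only prints the command rather than running it.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print a `hadoop archive` command line, refusing
# archive names that lack the required .har extension.
build_har_command() {
  local name="$1" parent="$2" src="$3" dest="$4"
  case "$name" in
    *.har) ;;
    *) echo "archive name must end in .har: $name" >&2; return 1 ;;
  esac
  printf 'hadoop archive -archiveName %s -p %s %s %s\n' \
    "$name" "$parent" "$src" "$dest"
}

# Source "logs" is given relative to the parent directory passed with -p.
build_har_command "example.har" "/data/project" "logs" "/data/archives"
```

Once an archive exists, its contents can be listed through the `har://` scheme, for example `hdfs dfs -ls har:///data/archives/example.har` under the same assumed paths.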
4. Fill in the blank
hard

Fill both blanks to combine small files into a single output file using Hadoop streaming.

Hadoop
hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar -input [1] -output [2] -mapper cat -reducer cat
A. /user/hadoop/smallfiles
B. /user/hadoop/sequencefile_output
C. /user/hadoop/output
D. /tmp/streaming
Common Mistakes
Using the same directory for input and output.
Choosing a local path instead of an HDFS path.
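
With `cat` as both mapper and reducer the job changes no records: the map phase passes lines through, the shuffle sorts them by key, and each reducer writes one `part-*` file, so the output directory holds as many files as there are reducers. The sketch below is only a rough local analogy of that pipeline using `cat` and `sort` on sample files; `simulate_identity_job` and the sample data are hypothetical, and it does not reproduce Hadoop's exact key/value formatting.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: imitate map(cat) -> shuffle(sort) -> reduce(cat)
# with a single reducer, writing one part file into a new output directory.
simulate_identity_job() {
  local input_dir="$1" output_dir="$2"
  # Like a MapReduce job, refuse to run if the output directory already exists.
  # This also rejects using the input directory as the output directory.
  if [ -e "$output_dir" ]; then
    echo "output directory already exists: $output_dir" >&2
    return 1
  fi
  mkdir -p "$output_dir"
  cat "$input_dir"/* | LC_ALL=C sort | cat > "$output_dir/part-00000"
}

# Two small sample inputs become one sorted part file.
workdir="$(mktemp -d)"
mkdir "$workdir/smallfiles"
printf 'pear\n'  > "$workdir/smallfiles/a.txt"
printf 'apple\n' > "$workdir/smallfiles/b.txt"

simulate_identity_job "$workdir/smallfiles" "$workdir/output"
cat "$workdir/output/part-00000"
```
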
5. Fill in the blank
hard

Fill all three blanks to write a Spark job that reads small files and coalesces them into fewer partitions.

Hadoop
spark.read.text([1]).coalesce([2]).write.text([3])
A. "hdfs:///user/hadoop/smallfiles"
B. 5
C. "hdfs:///user/hadoop/merged_output"
D. "/local/path/output"
Common Mistakes
Using local file paths instead of HDFS paths.
Setting coalesce to a number larger than the current number of partitions, which has no effect because coalesce can only reduce the partition count.