Complete the code to list files in HDFS directory.
hdfs dfs -ls [1]
The command hdfs dfs -ls /user/hadoop/input lists the files in the specified HDFS directory.
Complete the code to merge small files into one large file in HDFS.
hdfs dfs -getmerge [1] merged_file.txt
The getmerge command merges all files in the HDFS directory /user/hadoop/smallfiles into the single local file merged_file.txt.
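Since getmerge simply concatenates the directory's files, in name order, into one local file, its effect can be sketched locally with plain cat. The directory and file names below are hypothetical stand-ins for the demo, not the HDFS paths from the exercise:

```shell
# Local sketch of what getmerge does: concatenate the files of a
# directory, in name order, into a single output file.
# smallfiles_demo and merged_demo.txt are hypothetical names.
mkdir -p smallfiles_demo
printf 'alpha\n' > smallfiles_demo/part-00000
printf 'beta\n'  > smallfiles_demo/part-00001
# The shell glob expands in sorted order, matching getmerge's behavior.
cat smallfiles_demo/part-* > merged_demo.txt
cat merged_demo.txt
```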
Fix the error in the command to combine small files using Hadoop archive.
hadoop archive -archiveName [1] -p /user/hadoop smallfiles /user/hadoop/output
The archive name should be archive.har; the corrected command is hadoop archive -archiveName archive.har -p /user/hadoop smallfiles /user/hadoop/output, which archives the smallfiles directory (relative to the parent path /user/hadoop) into a Hadoop archive named archive.har in the target directory /user/hadoop/output.
Fill both blanks to combine small files into a single output file using Hadoop streaming.
hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar -input [1] -output [2] -mapper cat -reducer cat
The input directory is /user/hadoop/smallfiles and the output directory is /user/hadoop/sequencefile_output; with cat as both the mapper and the reducer, the default single reduce task writes the combined records to one output file.
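One caveat worth knowing about this approach: even with cat as both mapper and reducer, the records still pass through the shuffle, which sorts them by key (the text up to the first tab), so the combined output comes back key-sorted rather than in the original file order. A local sketch of that effect, using hypothetical file names:

```shell
# Simulate the identity-mapper/identity-reducer streaming job locally:
# concatenate the inputs, then apply the sort the shuffle phase performs.
# streaming_demo and streaming_merged.txt are hypothetical names.
mkdir -p streaming_demo
printf 'banana\n' > streaming_demo/a.txt
printf 'apple\n'  > streaming_demo/b.txt
cat streaming_demo/*.txt | sort > streaming_merged.txt
cat streaming_merged.txt
```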
Fill all three blanks to write a Spark job that reads small files and coalesces them into fewer partitions.
spark.read.text([1]).coalesce([2]).write.text([3])
The Spark job reads from hdfs:///user/hadoop/smallfiles, reduces partitions to 5, and writes output to hdfs:///user/hadoop/merged_output.