Challenge - 5 Problems
HDFS Mastery Badge
Get all challenges correct to earn this badge!
Test your skills under time pressure!
❓ Predict Output
Intermediate · 2:00 remaining
What is the output of this HDFS write command?
Consider the following Hadoop shell command to write a local file to HDFS:
hdfs dfs -put /local/path/file.txt /user/hadoop/

What will be the result if the file already exists in the destination path?
Attempts: 2 left
💡 Hint
Think about how HDFS handles file overwrites by default.
✗ Incorrect
By default, 'hdfs dfs -put' does not overwrite an existing file: the command fails with a "File exists" error unless the destination is removed first or the -f (force overwrite) flag is supplied.
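As a quick illustration (a sketch that assumes a running HDFS client; the paths are the hypothetical ones from the question), the -f flag is what changes the overwrite behavior:

```shell
# Fails with "File exists" if /user/hadoop/file.txt is already present:
hdfs dfs -put /local/path/file.txt /user/hadoop/

# -f forces the overwrite of an existing destination file:
hdfs dfs -put -f /local/path/file.txt /user/hadoop/
```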
❓ Data Output
Intermediate · 2:00 remaining
What data is read by this HDFS command?
Given the command:
hdfs dfs -cat /user/hadoop/data/sample.txt

What will be the output?
Attempts: 2 left
💡 Hint
The '-cat' option is similar to the Unix 'cat' command.
✗ Incorrect
The 'hdfs dfs -cat' command outputs the full content of the specified file to the console.
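In practice (a sketch assuming a running HDFS client; the path is the hypothetical one from the question), -cat streams the entire file to stdout, so for large files it is common to pipe it through head:

```shell
# Stream the whole file to the console:
hdfs dfs -cat /user/hadoop/data/sample.txt

# Preview only the first lines of a large file:
hdfs dfs -cat /user/hadoop/data/sample.txt | head -n 10
```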
🔧 Debug
Advanced · 2:00 remaining
Why does this HDFS read command fail?
You run:
hdfs dfs -text /user/hadoop/data/sample.gz

But get an error:

gzip: stdin: not in gzip format

What is the most likely cause?
Attempts: 2 left
💡 Hint
Check the file format and compression type.
✗ Incorrect
The error indicates the file is not in gzip format despite the .gz extension, so the command fails to decompress.
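The diagnosis is easy to reproduce locally (a sketch; the file name is hypothetical): real gzip data always begins with the magic bytes 0x1f 0x8b, so a file that merely carries a .gz extension can be unmasked by inspecting its first two bytes.

```python
def looks_like_gzip(path):
    """Return True if the file starts with the gzip magic bytes 0x1f 0x8b."""
    with open(path, "rb") as f:
        return f.read(2) == b"\x1f\x8b"

if __name__ == "__main__":
    # Write plain text into a file that only *looks* compressed.
    with open("sample.gz", "wb") as f:
        f.write(b"plain text, not gzip")
    # False: 'hdfs dfs -text' would fail on this file the same way.
    print(looks_like_gzip("sample.gz"))
```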
🚀 Application
Advanced · 2:00 remaining
How to efficiently write large data to HDFS in a Spark job?
You have a Spark job that generates a large DataFrame. You want to save it to HDFS as a single compressed file. Which approach is best?
Attempts: 2 left
💡 Hint
Think about controlling the number of output files and compression in Spark.
✗ Incorrect
Using coalesce(1) reduces the output to a single file, and a compression codec (such as gzip or Snappy) reduces the storage size. The trade-off is that a single output file is written by a single task, so write parallelism is sacrificed for the one-file requirement.
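A sketch of that approach in PySpark, assuming an existing SparkSession named `spark` and a DataFrame `df`; the output path is hypothetical:

```python
# coalesce(1) funnels all partitions into a single output file;
# the "compression" writer option selects the codec (gzip here).
(df.coalesce(1)
   .write
   .mode("overwrite")
   .option("compression", "gzip")
   .csv("hdfs:///user/hadoop/output/"))
```

The same writer options apply to other formats such as .parquet() or .json(); for Parquet the codec would typically be snappy rather than gzip.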
🧠 Conceptual
Expert · 3:00 remaining
What happens internally during an HDFS read operation?
When a client reads a file from HDFS, which sequence best describes the internal process?
Attempts: 2 left
💡 Hint
Recall the roles of NameNode and DataNodes in HDFS architecture.
✗ Incorrect
The NameNode supplies the metadata and block locations; the client then reads each block directly from the DataNodes, so file data never flows through the NameNode.
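That sequence can be sketched as a toy model (all names and data here are illustrative, not the real HDFS client API): the client first asks the NameNode for each block's DataNode replicas, then streams the blocks directly from those DataNodes.

```python
# Toy model of the HDFS read path (illustrative only, not the real client API).

# NameNode metadata: file path -> ordered list of (block_id, [replica DataNodes])
namenode = {
    "/user/hadoop/data/sample.txt": [
        ("blk_1", ["dn1", "dn2", "dn3"]),
        ("blk_2", ["dn2", "dn3", "dn1"]),
    ],
}

# DataNode storage: datanode -> {block_id: bytes}
datanodes = {
    "dn1": {"blk_1": b"hello ", "blk_2": b"world"},
    "dn2": {"blk_1": b"hello ", "blk_2": b"world"},
    "dn3": {"blk_1": b"hello ", "blk_2": b"world"},
}

def read_file(path):
    # 1. Ask the NameNode for the file's block IDs and replica locations.
    blocks = namenode[path]
    data = b""
    # 2. Read each block directly from one of its DataNodes
    #    (the real client prefers the topologically closest replica).
    for block_id, replicas in blocks:
        data += datanodes[replicas[0]][block_id]
    return data

print(read_file("/user/hadoop/data/sample.txt"))  # b'hello world'
```

Note that the NameNode only ever answers the metadata query in step 1; the bytes in step 2 move straight between the client and the DataNodes.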