
HDFS read and write operations in Hadoop - Practice Problems & Coding Challenges

Challenge - 5 Problems
Predict Output (intermediate)
What is the output of this HDFS write command?
Consider the following Hadoop shell command to write a local file to HDFS:

hdfs dfs -put /local/path/file.txt /user/hadoop/

What will be the result if the file already exists in the destination path?
A. The command fails with an error saying the file already exists.
B. The command overwrites the existing file without warning.
C. The command appends the new file content to the existing file.
D. The command creates a duplicate file with a different name.
💡 Hint
Think about how HDFS handles file overwrites by default.
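As a refresher, the default behavior of `-put` and the flag that changes it can be seen in a short shell session (paths are the ones from the problem; the exact error text may differ slightly between Hadoop versions):

```shell
# First copy succeeds: the destination does not exist yet.
hdfs dfs -put /local/path/file.txt /user/hadoop/

# Running the same command again fails, because -put refuses
# to overwrite an existing destination by default:
hdfs dfs -put /local/path/file.txt /user/hadoop/
# put: `/user/hadoop/file.txt': File exists

# The -f flag forces an overwrite of the existing file.
hdfs dfs -put -f /local/path/file.txt /user/hadoop/
```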
Data Output (intermediate)
What data is read by this HDFS command?
Given the command:

hdfs dfs -cat /user/hadoop/data/sample.txt

What will be the output?
A. The entire content of the file sample.txt printed to the console.
B. Only the first 10 lines of sample.txt.
C. Metadata information about sample.txt but not the content.
D. The command will copy sample.txt to the local file system.
💡 Hint
The '-cat' option is similar to the Unix 'cat' command.
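Because `-cat` streams the whole file, it can be unwieldy for large datasets. A common pattern (sketched here with the path from the problem) is to combine it with standard Unix pipes or use the related subcommands:

```shell
# Print the entire file to the console (the behavior asked about above).
hdfs dfs -cat /user/hadoop/data/sample.txt

# For large files, limit the output with a Unix pipe:
hdfs dfs -cat /user/hadoop/data/sample.txt | head -n 10

# -tail prints only the last kilobyte of the file.
hdfs dfs -tail /user/hadoop/data/sample.txt
```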
🔧 Debug (advanced)
Why does this HDFS read command fail?
You run:

hdfs dfs -text /user/hadoop/data/sample.gz

But get an error:

gzip: stdin: not in gzip format

What is the most likely cause?
A. The file path is incorrect and points to a directory.
B. The HDFS cluster is down and cannot read files.
C. The command requires root privileges to read compressed files.
D. The file sample.gz is not actually compressed with gzip format.
💡 Hint
Check the file format and compression type.
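One way to debug this is to inspect the file's magic bytes: real gzip data always starts with `1f 8b`. The check can be tried entirely on the local filesystem (the two files below are stand-ins created for the demo; on a cluster you would pipe `hdfs dfs -cat ... | head -c 2` instead):

```shell
# A genuine gzip file, and a plain-text file with a misleading .gz name.
printf 'hello\n' | gzip -c > real.gz
printf 'plain text\n' > fake.gz

# gzip data always begins with the two magic bytes 1f 8b.
head -c 2 real.gz | od -An -tx1   # prints: 1f 8b
head -c 2 fake.gz | od -An -tx1   # prints: 70 6c ("pl"), so decompression fails
```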
🚀 Application (advanced)
How to efficiently write large data to HDFS in a Spark job?
You have a Spark job that generates a large DataFrame. You want to save it to HDFS as a single compressed file. Which approach is best?
A. Write the DataFrame as multiple files and then merge them manually in HDFS.
B. Use the DataFrame's write method with coalesce(1) and a compression option.
C. Save the DataFrame without compression to speed up the write.
D. Write the DataFrame to local disk first, then upload to HDFS.
💡 Hint
Think about controlling the number of output files and compression in Spark.
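In PySpark terms, the coalesce-plus-compression approach looks roughly like the sketch below. The DataFrame `df` and the output path are placeholders, and note the trade-off: `coalesce(1)` funnels all data through a single task, so it only makes sense when a single output file is a hard requirement.

```python
# Sketch only: assumes a running SparkSession and an existing DataFrame `df`.
# coalesce(1) reduces the output to one partition, so Spark writes one file;
# the compression option makes that file gzip-compressed.
(df.coalesce(1)
   .write
   .option("compression", "gzip")
   .csv("hdfs:///user/hadoop/output/"))
```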
🧠 Conceptual (expert)
What happens internally during an HDFS read operation?
When a client reads a file from HDFS, which sequence best describes the internal process?
A. The client requests data from DataNodes, which coordinate with the NameNode to send blocks sequentially.
B. The client reads the entire file from the NameNode, which streams data from DataNodes.
C. The client contacts the NameNode for block locations, then reads blocks directly from DataNodes in parallel.
D. The client downloads the file from a single DataNode chosen by the NameNode.
💡 Hint
Recall the roles of NameNode and DataNodes in HDFS architecture.
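You can observe the first half of this sequence, the block-location lookup that the NameNode answers, with `hdfs fsck` (the path is the sample file from earlier; output details vary by Hadoop version):

```shell
# Ask the NameNode for the block layout of a file: fsck reports each
# block's ID and the DataNodes holding its replicas, which is the same
# location information a client receives before it reads the blocks
# directly from the DataNodes.
hdfs fsck /user/hadoop/data/sample.txt -files -blocks -locations
```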