Challenge - 5 Problems
HDFS Mastery Badge
Get all challenges correct to earn this badge!
Test your skills under time pressure!
❓ Predict Output
Intermediate · 2:00 remaining
What is the output of this HDFS write command?
Consider the following Hadoop shell command to write a local file to HDFS:
hdfs dfs -put /local/path/file.txt /user/hadoop/

What will be the result if the file already exists in the destination path?
Attempts: 2 left
💡 Hint
Think about how HDFS handles file overwrites by default.
✗ Incorrect
By default, 'hdfs dfs -put' does not overwrite an existing file: the command fails with a "File exists" error unless the destination is removed first or the -f (force overwrite) flag is supplied.
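As a quick illustration (a sketch that assumes a running HDFS client; the paths are the hypothetical ones from the question), the -f flag is what changes the overwrite behavior:

```shell
# Fails with "File exists" if /user/hadoop/file.txt is already present:
hdfs dfs -put /local/path/file.txt /user/hadoop/

# -f forces the overwrite of an existing destination file:
hdfs dfs -put -f /local/path/file.txt /user/hadoop/
```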
❓ Data Output
Intermediate · 2:00 remaining
What data is read by this HDFS command?
Given the command:
hdfs dfs -cat /user/hadoop/data/sample.txt

What will be the output?
Attempts: 2 left
💡 Hint
The '-cat' option is similar to the Unix 'cat' command.
✗ Incorrect
The 'hdfs dfs -cat' command outputs the full content of the specified file to the console.
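In practice (a sketch assuming a running HDFS client; the path is the hypothetical one from the question), -cat streams the entire file to stdout, so for large files it is common to pipe it through head:

```shell
# Stream the whole file to the console:
hdfs dfs -cat /user/hadoop/data/sample.txt

# Preview only the first lines of a large file:
hdfs dfs -cat /user/hadoop/data/sample.txt | head -n 10
```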
🔧 Debug
Advanced · 2:00 remaining
Why does this HDFS read command fail?
You run:
hdfs dfs -text /user/hadoop/data/sample.gz

But get an error:

gzip: stdin: not in gzip format

What is the most likely cause?
Attempts: 2 left
💡 Hint
Check the file format and compression type.
✗ Incorrect
The error indicates the file is not in gzip format despite the .gz extension, so the command fails to decompress.
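The diagnosis is easy to reproduce locally (a sketch; the file name is hypothetical): real gzip data always begins with the magic bytes 0x1f 0x8b, so a file that merely carries a .gz extension can be unmasked by inspecting its first two bytes.

```python
def looks_like_gzip(path):
    """Return True if the file starts with the gzip magic bytes 0x1f 0x8b."""
    with open(path, "rb") as f:
        return f.read(2) == b"\x1f\x8b"

if __name__ == "__main__":
    # Write plain text into a file that only *looks* compressed.
    with open("sample.gz", "wb") as f:
        f.write(b"plain text, not gzip")
    # False: 'hdfs dfs -text' would fail on this file the same way.
    print(looks_like_gzip("sample.gz"))
```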
🚀 Application
Advanced · 2:00 remaining
How to efficiently write large data to HDFS in a Spark job?
You have a Spark job that generates a large DataFrame. You want to save it to HDFS as a single compressed file. Which approach is best?
Attempts: 2 left
💡 Hint
Think about controlling the number of output files and compression in Spark.
✗ Incorrect
Using coalesce(1) reduces the output to a single file, and a compression codec (such as gzip or Snappy) reduces the storage size. The trade-off is that a single output file is written by a single task, so write parallelism is sacrificed for the one-file requirement.
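A sketch of that approach in PySpark, assuming an existing SparkSession named `spark` and a DataFrame `df`; the output path is hypothetical:

```python
# coalesce(1) funnels all partitions into a single output file;
# the "compression" writer option selects the codec (gzip here).
(df.coalesce(1)
   .write
   .mode("overwrite")
   .option("compression", "gzip")
   .csv("hdfs:///user/hadoop/output/"))
```

The same writer options apply to other formats such as .parquet() or .json(); for Parquet the codec would typically be snappy rather than gzip.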
🧠 Conceptual
Expert · 3:00 remaining
What happens internally during an HDFS read operation?
When a client reads a file from HDFS, which sequence best describes the internal process?
Attempts: 2 left
💡 Hint
Recall the roles of NameNode and DataNodes in HDFS architecture.
✗ Incorrect
The NameNode supplies the metadata and block locations; the client then reads each block directly from the DataNodes, so file data never flows through the NameNode.
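That sequence can be sketched as a toy model (all names and data here are illustrative, not the real HDFS client API): the client first asks the NameNode for each block's DataNode replicas, then streams the blocks directly from those DataNodes.

```python
# Toy model of the HDFS read path (illustrative only, not the real client API).

# NameNode metadata: file path -> ordered list of (block_id, [replica DataNodes])
namenode = {
    "/user/hadoop/data/sample.txt": [
        ("blk_1", ["dn1", "dn2", "dn3"]),
        ("blk_2", ["dn2", "dn3", "dn1"]),
    ],
}

# DataNode storage: datanode -> {block_id: bytes}
datanodes = {
    "dn1": {"blk_1": b"hello ", "blk_2": b"world"},
    "dn2": {"blk_1": b"hello ", "blk_2": b"world"},
    "dn3": {"blk_1": b"hello ", "blk_2": b"world"},
}

def read_file(path):
    # 1. Ask the NameNode for the file's block IDs and replica locations.
    blocks = namenode[path]
    data = b""
    # 2. Read each block directly from one of its DataNodes
    #    (the real client prefers the topologically closest replica).
    for block_id, replicas in blocks:
        data += datanodes[replicas[0]][block_id]
    return data

print(read_file("/user/hadoop/data/sample.txt"))  # b'hello world'
```

Note that the NameNode only ever answers the metadata query in step 1; the bytes in step 2 move straight between the client and the DataNodes.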