Practice - 5 Tasks
Answer the questions below
Task 1 - Fill in the blank (easy)
Complete the code to write data to HDFS using the Hadoop FileSystem API.

fs = FileSystem.get(conf)
output = fs.create(Path("/user/hadoop/[1]"))
output.write(bytes("Hello HDFS", "utf-8"))
output.close()
Common Mistakes:
- Using an input file name when writing data.
- Forgetting to close the output stream.
Explanation: The file name 'output.txt' is commonly used for writing output data to HDFS.
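The quiz snippet mixes the Java FileSystem API with Python syntax. As a minimal sketch of the same create-write-close pattern, here is a plain-Python local-filesystem analogue; the file name "hello.txt" is illustrative, not part of the quiz.

```python
# Local-filesystem analogue of the HDFS create/write/close pattern.
output = open("hello.txt", "wb")            # like fs.create(Path(...))
output.write(bytes("Hello HDFS", "utf-8"))  # the data must be bytes, not str
output.close()                              # always close the output stream
```

Forgetting the close() call (the second common mistake) can leave buffered data unflushed, on a local file or on HDFS alike.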
Task 2 - Fill in the blank (medium)
Complete the code to read data from HDFS using the Hadoop FileSystem API.

fs = FileSystem.get(conf)
input = fs.open(Path("/user/hadoop/[1]"))
data = input.readLine()
input.close()
Common Mistakes:
- Using an output file name when reading data.
- Not closing the input stream after reading.
Explanation: The file 'input.txt' is typically used as the source file when reading data from HDFS.
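The same open-read-close pattern can be sketched against a local file in plain Python; the sample file and its contents are illustrative assumptions.

```python
# Create a sample input file first (stand-in for data already in HDFS).
with open("input.txt", "w", encoding="utf-8") as f:
    f.write("Hello HDFS\n")

inp = open("input.txt", "r", encoding="utf-8")  # like fs.open(Path(...))
line = inp.readline()                           # like input.readLine()
inp.close()                                     # close the input stream
```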
Task 3 - Fill in the blank (hard)
Fix the error in the code to properly write a string to HDFS.

fs = FileSystem.get(conf)
output = fs.create(Path("/user/hadoop/output.txt"))
output.write([1]("Hello HDFS", "utf-8"))
output.close()
Common Mistakes:
- Using 'str.encode' which is not a standalone function.
- Using 'encode' without calling it on the string.
- Using a non-existent function 'toBytes'.
Explanation: The 'bytes' function converts a string to bytes, which is required for writing to the HDFS output stream.
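In Python, `bytes(s, "utf-8")` and `s.encode("utf-8")` are equivalent ways to get UTF-8 bytes from a string; the point of the task is that the write call needs one of these, not a bare name like `encode` or a nonexistent `toBytes`:

```python
text = "Hello HDFS"
as_bytes = bytes(text, "utf-8")    # bytes(str, encoding) - the quiz answer
also_bytes = text.encode("utf-8")  # equivalent str method, called on the string
```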
Task 4 - Fill in the blank (hard)
Fill both blanks to create a dictionary comprehension that maps file names to their sizes in HDFS for files larger than 1000 bytes.

file_sizes = {file.getName(): file.getLen()
              for file in fs.listStatus(Path("/user/hadoop"))
              if file.getLen() [1] [2]}
Common Mistakes:
- Using '<' instead of '>' for filtering larger files.
- Using a smaller size threshold like 500 instead of 1000.
Explanation: We want files larger than 1000 bytes, so the condition is file.getLen() > 1000.
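The same name-to-size comprehension can be exercised locally with `os.scandir` in place of `fs.listStatus`; the directory and file sizes below are made-up test data, not part of the quiz.

```python
import os
import tempfile

# Build a throwaway directory with files of known sizes (illustrative data).
d = tempfile.mkdtemp()
for name, size in [("big.dat", 2048), ("small.dat", 100)]:
    with open(os.path.join(d, name), "wb") as f:
        f.write(b"\0" * size)

# Same shape as the quiz comprehension: name -> size, keeping files > 1000 bytes.
file_sizes = {e.name: e.stat().st_size
              for e in os.scandir(d)
              if e.stat().st_size > 1000}
```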
Task 5 - Fill in the blank (hard)
Fill all three blanks to create a dictionary comprehension that maps file names to their modification times for files modified after 2023-01-01.

from datetime import datetime

threshold = datetime(2023, 1, 1)
file_mod_times = {file.getName(): datetime.fromtimestamp(file.getModificationTime() / 1000)
                  for file in fs.listStatus(Path("/user/hadoop"))
                  if datetime.fromtimestamp(file.getModificationTime() / 1000) [1] threshold
                  and file.getName() [2] [3]}
Common Mistakes:
- Using '<' instead of '>' for date comparison.
- Using '==' instead of '!=' to exclude files.
- Using the wrong string for file exclusion.
Explanation: We filter for files modified after the threshold (>) and exclude temporary files by checking that the name is not equal (!=) to '.tmp'.
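The filled-in comprehension can be checked in plain Python by substituting a list of (name, modification time in milliseconds) pairs for `fs.listStatus`; the listing below is invented sample data. Note the `/ 1000`: HDFS modification times are in milliseconds, while `datetime.fromtimestamp` expects seconds.

```python
from datetime import datetime

# Stand-in for fs.listStatus(): (name, modification time in ms), illustrative only.
listing = [("report.csv", 1_700_000_000_000),  # ~Nov 2023, after the threshold
           ("old.csv",    1_600_000_000_000),  # ~Sep 2020, before the threshold
           (".tmp",       1_700_000_000_000)]  # recent, but excluded by name

threshold = datetime(2023, 1, 1)
file_mod_times = {name: datetime.fromtimestamp(ms / 1000)
                  for name, ms in listing
                  if datetime.fromtimestamp(ms / 1000) > threshold
                  and name != ".tmp"}
```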