Hadoop · Data · ~10 mins

HDFS read and write operations in Hadoop - Interactive Code Practice

Practice - 5 Tasks
Answer the questions below
Task 1: Fill in the blank (easy)

Complete the code to write data to HDFS using the Hadoop FileSystem API.

Hadoop
fs = FileSystem.get(conf)
output = fs.create(Path("/user/hadoop/[1]"))
output.write(bytes("Hello HDFS", "utf-8"))
output.close()
Options:
A. input.txt
B. data.csv
C. readme.md
D. output.txt
Common Mistakes
Using an input file name when writing data.
Forgetting to close the output stream.
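Running the completed snippet requires a live HDFS cluster. As a minimal local-file analogue in plain Python (no Hadoop assumed; `open(..., "wb")` stands in for `fs.create(...)`), the same write, encode, close pattern looks like this:

```python
# Local-file analogue of the HDFS write pattern (no Hadoop required).
# fs.create(Path(...)) returns an output stream; open(..., "wb") plays that role here.
output = open("output.txt", "wb")
output.write(bytes("Hello HDFS", "utf-8"))  # a str must be encoded to bytes before writing
output.close()  # always close the stream, just like output.close() in the HDFS version
```

The file name is an output name (`output.txt`), matching the correct answer: you are writing data, not reading an input file.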
Task 2: Fill in the blank (medium)

Complete the code to read data from HDFS using the Hadoop FileSystem API.

Hadoop
fs = FileSystem.get(conf)
input = fs.open(Path("/user/hadoop/[1]"))
data = input.readLine()
input.close()
Options:
A. output.txt
B. log.txt
C. input.txt
D. config.xml
Common Mistakes
Using an output file name when reading data.
Not closing the input stream after reading.
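The read pattern can be sketched the same way with a plain local file standing in for `/user/hadoop/input.txt` (no Hadoop assumed; the file is created first so the sketch is self-contained, and `readline()` plays the role of `readLine()`):

```python
# Create a file to read, standing in for the HDFS input file.
with open("input.txt", "w", encoding="utf-8") as f:
    f.write("Hello HDFS\n")

# Local-file analogue of the HDFS read pattern (no Hadoop required).
# fs.open(Path(...)) returns an input stream; open(..., "r") plays that role here.
input_stream = open("input.txt", "r", encoding="utf-8")
data = input_stream.readline()  # read one line, like readLine()
input_stream.close()  # close the stream after reading, as in the HDFS version
```

Here the file name is an input name, mirroring the correct answer: when reading, you open an existing input file rather than an output file.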
Task 3: Fill in the blank (hard)

Fix the error in the code to properly write a string to HDFS.

Hadoop
fs = FileSystem.get(conf)
output = fs.create(Path("/user/hadoop/output.txt"))
output.write([1]("Hello HDFS", "utf-8"))
output.close()
Options:
A. str.encode
B. bytes
C. encode
D. toBytes
Common Mistakes
Using 'str.encode' instead of the bytes() builtin.
Using 'encode' without calling it on the string.
Using a non-existent function 'toBytes'.
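A quick check in plain Python of why `bytes` is the intended fix: the `bytes()` builtin encodes a `str` to `bytes`, and is equivalent to calling `.encode()` on the string itself.

```python
text = "Hello HDFS"
encoded = bytes(text, "utf-8")           # the bytes() builtin encodes a str to bytes
assert encoded == text.encode("utf-8")   # equivalent: call .encode() on the string
assert encoded == b"Hello HDFS"
```

`toBytes`, by contrast, is not a Python builtin at all, and a bare `encode` is not defined as a standalone name.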
Task 4: Fill in the blank (hard)

Fill both blanks to create a dictionary comprehension that maps file names to their sizes in HDFS for files larger than 1000 bytes.

Hadoop
file_sizes = {file.getName(): file.getLen() for file in fs.listStatus(Path("/user/hadoop")) if file.getLen() [1] [2]}
Options:
A. >
B. 1000
C. <
D. 500
Common Mistakes
Using '<' instead of '>' for filtering larger files.
Using a smaller size threshold like 500 instead of 1000.
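The completed comprehension can be tried without a cluster by using stand-in objects for the results of `fs.listStatus(...)`. `FakeStatus` below is a hypothetical class that mimics only the `getName()` and `getLen()` accessors of Hadoop's `FileStatus`:

```python
# Stand-in for Hadoop's FileStatus, exposing just getName() and getLen().
class FakeStatus:
    def __init__(self, name, length):
        self._name, self._len = name, length
    def getName(self):
        return self._name
    def getLen(self):
        return self._len

# Two fake directory entries: one above and one below the 1000-byte threshold.
statuses = [FakeStatus("big.log", 4096), FakeStatus("small.txt", 120)]

# The completed comprehension: keep only files larger than 1000 bytes.
file_sizes = {f.getName(): f.getLen() for f in statuses if f.getLen() > 1000}
print(file_sizes)  # only big.log survives the filter
```

With `>` and `1000` in the blanks, `small.txt` is filtered out and only `big.log` is kept, which is the behavior the prompt asks for.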
Task 5: Fill in the blank (hard)

Fill all three blanks to create a dictionary comprehension that maps file names to their modification times for files modified after 2023-01-01.

Hadoop
from datetime import datetime
threshold = datetime(2023, 1, 1)
file_mod_times = {file.getName(): datetime.fromtimestamp(file.getModificationTime() / 1000) for file in fs.listStatus(Path("/user/hadoop")) if datetime.fromtimestamp(file.getModificationTime() / 1000) [1] threshold and file.getName() [2] [3]}
Options:
A. >
B. !=
C. ".tmp"
D. ==
Common Mistakes
Using '<' instead of '>' for date comparison.
Using '==' instead of '!=' to exclude files.
Using wrong string for file exclusion.
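This comprehension can also be exercised with stand-in objects. `FakeStatus` here is hypothetical, mimicking `getName()` and `getModificationTime()` of Hadoop's `FileStatus`; note that HDFS reports modification times in milliseconds since the epoch, hence the division by 1000:

```python
from datetime import datetime

# Stand-in for Hadoop's FileStatus, exposing getName() and getModificationTime().
class FakeStatus:
    def __init__(self, name, mod_millis):
        self._name, self._mod = name, mod_millis
    def getName(self):
        return self._name
    def getModificationTime(self):
        return self._mod  # HDFS reports milliseconds since the epoch

threshold = datetime(2023, 1, 1)
statuses = [
    FakeStatus("report.csv", int(datetime(2023, 6, 1).timestamp() * 1000)),  # after threshold
    FakeStatus(".tmp",       int(datetime(2024, 1, 1).timestamp() * 1000)),  # excluded by name
    FakeStatus("old.log",    int(datetime(2022, 1, 1).timestamp() * 1000)),  # before threshold
]

# The completed comprehension: modified after 2023-01-01 and not named ".tmp".
file_mod_times = {
    f.getName(): datetime.fromtimestamp(f.getModificationTime() / 1000)
    for f in statuses
    if datetime.fromtimestamp(f.getModificationTime() / 1000) > threshold
    and f.getName() != ".tmp"
}
print(sorted(file_mod_times))  # only report.csv passes both filters
```

With `>`, `!=`, and `".tmp"` in the blanks, the date filter drops `old.log`, the name filter drops `.tmp`, and only `report.csv` remains.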