Hadoop data (~10 mins)

Why data lake architecture centralizes data in Hadoop - Test Your Understanding

Practice - 5 Tasks
Answer the questions below
1. Fill in the blank (easy)

Complete the code to read data from a Hadoop data lake using Spark.

df = spark.read.format([1]).load("/data/lake/path")
A. csv
B. json
C. xml
D. parquet
Common Mistakes
Choosing a format that is not optimized for big data, such as XML.
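The completed answer as a minimal sketch (the helper name and path are illustrative; it assumes an existing SparkSession is passed in). Parquet is the columnar format typically used in Hadoop data lakes, unlike row-based CSV or XML:

```python
# Minimal sketch of the completed read. Assumes a SparkSession `spark`
# already exists; the path is the quiz's example path.
def read_lake(spark, path="/data/lake/path"):
    # format("parquet") selects Spark's columnar Parquet reader,
    # the usual storage format for data-lake tables.
    return spark.read.format("parquet").load(path)
```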
2. Fill in the blank (medium)

Complete the code to filter data for a specific year in the data lake.

filtered_df = df.filter(df.year == [1])
A. 2020
B. "2020"
C. '2020'
D. year
Common Mistakes
Using quotes around the year number, causing a type mismatch.
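Why quoting the year is a mistake can be seen even in plain Python, where an int and a str never compare equal:

```python
# The filter df.year == 2020 compares integers; quoting the literal
# turns it into a string comparison, which never matches an int year.
year = 2020
assert year == 2020          # int vs int: matches
assert not (year == "2020")  # int vs str: never equal in plain Python
```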
3. Fill in the blank (hard)

Complete the code to write data back to the data lake in Parquet format.

filtered_df.write.mode([1]).format("parquet").save("/data/lake/output")
A. append
B. add
C. insert
D. update
Common Mistakes
Using invalid modes such as 'add' or 'update', which raise errors.
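A small sketch of why 'add' fails: Spark's write API accepts only a fixed set of save modes. The mode names below are Spark's documented SaveMode values; the validation helper itself is hypothetical, mirroring what `.mode(...)` enforces:

```python
# Spark's documented save modes; anything else errors at write time.
VALID_SAVE_MODES = {"append", "overwrite", "ignore", "error", "errorifexists"}

def check_save_mode(mode):
    # Hypothetical helper mirroring Spark's validation of .mode(...)
    if mode.lower() not in VALID_SAVE_MODES:
        raise ValueError(f"Unknown save mode: {mode!r}")
    return mode.lower()
```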
4. Fill in the blank (hard)

Fill both blanks to create a dictionary of word lengths for words longer than 3 characters.

lengths = {word: [1] for word in words if len(word) [2] 3}
A. len(word)
B. >
C. <
D. word
Common Mistakes
Using '<' instead of '>', which inverts the filter.
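A worked version of the completed comprehension with a sample word list (the words are illustrative):

```python
# Sample input for the comprehension (illustrative values).
words = ["spark", "data", "big", "io", "lake"]
# Each value is len(word); the condition len(word) > 3 drops short words,
# so "big" (3 chars) and "io" (2 chars) are filtered out.
lengths = {word: len(word) for word in words if len(word) > 3}
```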
5. Fill in the blank (hard)

Fill all three blanks to create a filtered dictionary with uppercase keys and values greater than 0.

result = {[1]: [2] for k, v in data.items() if v [3] 0}
A. k.upper()
B. v
C. >
D. k.lower()
Common Mistakes
Using k.lower() instead of k.upper().
Using '<' instead of '>'.
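A worked version of the completed comprehension with sample data (the dictionary values are illustrative):

```python
# Sample input (illustrative values).
data = {"a": 2, "b": 0, "c": -1, "d": 5}
# k.upper() uppercases each key; v > 0 keeps only positive values,
# so "b" (0) and "c" (-1) are filtered out.
result = {k.upper(): v for k, v in data.items() if v > 0}
```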