
Why ingestion pipelines feed the data lake in Hadoop - Test Your Understanding

Practice - 5 Tasks
Answer the questions below
1. Fill in the blank (easy)

Complete the code to read data from a source into the ingestion pipeline.

data = spark.read.format([1]).load("/source/data")
A. csv
B. xml
C. json
D. txt
Common Mistakes
Using unsupported formats like 'txt' or 'xml' without proper parsing.
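Outside Spark, the same idea can be sketched with Python's standard library. The snippet below is an analogy, not Spark's API: structured formats like CSV have parsers that turn raw text into records, which is the job `spark.read.format(...)` delegates at cluster scale. The sample data here is hypothetical.

```python
import csv
import io

# A small in-memory CSV standing in for "/source/data" (hypothetical sample).
raw = "name,score\nasha,120\nben,90\n"

# csv.DictReader parses each line into a record, the same job that
# spark.read.format("csv").load(...) performs across a cluster.
rows = list(csv.DictReader(io.StringIO(raw)))
print(rows[0])  # {'name': 'asha', 'score': '120'}
```

A plain `.txt` file has no such structure, which is why it needs extra parsing before it can feed an ingestion pipeline.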
2. Fill in the blank (medium)

Complete the code to write data into the data lake in parquet format.

data.write.format([1]).save("/data_lake/raw")
A. csv
B. parquet
C. json
D. orc
Common Mistakes
Choosing CSV which is row-based and less efficient for big data.
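The row-based vs. columnar difference can be sketched in plain Python (an analogy, not Parquet itself): a columnar layout keeps each field's values together, so an aggregate over one column touches only that column's data.

```python
# Row-based layout (like CSV): each record stored whole.
rows = [
    {"name": "asha", "score": 120},
    {"name": "ben", "score": 90},
]

# Columnar layout (like Parquet or ORC): each field stored contiguously.
columns = {
    "name": ["asha", "ben"],
    "score": [120, 90],
}

# Summing one column: the columnar layout reads only the "score" values,
# while the row layout must walk through every full record.
row_sum = sum(r["score"] for r in rows)
col_sum = sum(columns["score"])
print(row_sum, col_sum)  # 210 210
```

Both layouts hold the same data; the columnar one simply makes single-column scans cheaper, which is why Parquet suits analytical workloads in a data lake.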
3. Fill in the blank (hard)

Fix the error in the code to append new data to the existing data lake folder.

data.write.mode([1]).format("parquet").save("/data_lake/raw")
A. error
B. overwrite
C. ignore
D. append
Common Mistakes
Using 'overwrite', which deletes the existing data instead of adding to it.
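Spark's save modes roughly mirror how a plain file can be opened. This sketch uses Python file modes as an analogy (not Spark's implementation): 'append' adds to what is already there, while 'overwrite' replaces it.

```python
import os
import tempfile

# Stand-in for "/data_lake/raw" (a throwaway temp file).
path = os.path.join(tempfile.mkdtemp(), "raw.txt")

with open(path, "w") as f:          # like mode("overwrite"): start fresh
    f.write("day1\n")

with open(path, "a") as f:          # like mode("append"): keep existing data
    f.write("day2\n")

with open(path) as f:
    contents = f.read()
print(contents)  # day1 and day2 both survive the append
```

Had the second write used `"w"` again, the day-1 data would have been lost, which is exactly the mistake `mode("overwrite")` makes in this scenario.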
4. Fill in the blank (hard)

Fill both blanks to filter data before saving to the data lake.

filtered_data = data.filter(data.[1] [2] 100)
filtered_data.write.format("parquet").save("/data_lake/filtered")
A. score
B. >
C. <
D. age
Common Mistakes
Using wrong column names or operators that do not filter correctly.
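In plain Python the same filter can be sketched over a list of records (an analogy to `DataFrame.filter`, with hypothetical sample rows): only records whose score exceeds 100 survive.

```python
# Hypothetical rows standing in for the DataFrame `data`.
data = [
    {"name": "asha", "score": 120},
    {"name": "ben", "score": 90},
    {"name": "carol", "score": 150},
]

# Equivalent of data.filter(data.score > 100): keep rows with score above 100.
filtered_data = [row for row in data if row["score"] > 100]
names = [row["name"] for row in filtered_data]
print(names)  # ['asha', 'carol']
```

Referencing a column that does not exist (e.g. `age` here) or flipping the operator would either fail or silently keep the wrong rows, which is the common mistake noted above.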
5. Fill in the blank (hard)

Fill all three blanks to create a dictionary of word lengths for words longer than 3 characters.

lengths = { [1]: [2] for [3] in words if len([3]) > 3 }
A. word
B. len(word)
D. len(words)
💡 Hint
Common Mistakes
Using the wrong variable or length function.
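With the blanks filled as `word`, `len(word)`, and `word`, the comprehension reads as follows (a worked sketch with a hypothetical word list):

```python
words = ["data", "lake", "etl", "pipeline", "io"]

# {key: value for item in iterable if condition}
# key = the word itself, value = its length, kept only when longer than 3 chars.
lengths = {word: len(word) for word in words if len(word) > 3}
print(lengths)  # {'data': 4, 'lake': 4, 'pipeline': 8}
```

Note that `len(words)` would give the length of the whole list, not of each word, which is the "wrong length function" mistake the question warns about.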