Complete the code to read data from a source into the ingestion pipeline.
data = spark.read.format([1]).load("/source/data")
The ingestion pipeline reads data in JSON format from the source.
Complete the code to write data into the data lake in parquet format.
data.write.format([1]).save("/data_lake/raw")
Parquet is a columnar storage format optimized for big data and commonly used in data lakes.
Fix the error in the code to append new data to the existing data lake folder.
data.write.mode([1]).format("parquet").save("/data_lake/raw")
Using 'append' mode adds new data without deleting existing data in the data lake.
Fill both blanks to filter data before saving to the data lake.
filtered_data = data.filter(data.[1] [2] 100)
filtered_data.write.format("parquet").save("/data_lake/filtered")
The code filters rows where the 'score' column is greater than 100 before saving.
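Running the Spark version requires an active Spark session, so here is a plain-Python analogue of the same filter logic, with made-up sample rows represented as dicts that stand in for a DataFrame:

```python
# Sample rows standing in for a Spark DataFrame (made-up data for illustration).
rows = [
    {"id": 1, "score": 150},
    {"id": 2, "score": 80},
    {"id": 3, "score": 101},
]

# Analogue of data.filter(data.score > 100): keep only rows with score above 100.
filtered_rows = [row for row in rows if row["score"] > 100]

print([row["id"] for row in filtered_rows])  # [1, 3]
```

Rows 1 and 3 pass the filter because their scores (150 and 101) exceed 100, while row 2 (score 80) is dropped, mirroring what Spark's `filter` does before the write.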
Fill all three blanks to create a dictionary of word lengths for words longer than 3 characters.
lengths = { [1]: [2] for [3] in words if len([3]) > 3 }
The dictionary comprehension maps each word to its length for words longer than 3 characters.
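For reference, the completed comprehension described by the explanation can be run in plain Python; the `words` list below is a made-up sample:

```python
words = ["data", "is", "parquet", "big", "lake"]

# Map each word to its length, keeping only words longer than 3 characters.
lengths = {word: len(word) for word in words if len(word) > 3}

print(lengths)  # {'data': 4, 'parquet': 7, 'lake': 4}
```

Note that "big" is excluded: the condition is strictly greater than 3, so three-letter words do not qualify.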