
Why ingestion pipelines feed the data lake in Hadoop - Test Your Understanding

Practice - 5 Tasks
Answer the questions below
1. Fill in the blank (easy)

Complete the code to read data from a source into the ingestion pipeline.

data = spark.read.format([1]).load("/source/data")
A. csv
B. xml
C. json
D. txt
Common Mistakes
Using unsupported formats like 'txt' or 'xml' without proper parsing.
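Outside Spark, the same idea can be sketched with Python's standard library. The snippet below is an analogy, not Spark's API: structured formats like CSV have parsers that turn raw text into records, which is the job `spark.read.format(...)` delegates at cluster scale. The sample data here is hypothetical.

```python
import csv
import io

# A small in-memory CSV standing in for "/source/data" (hypothetical sample).
raw = "name,score\nasha,120\nben,90\n"

# csv.DictReader parses each line into a record, the same job that
# spark.read.format("csv").load(...) performs across a cluster.
rows = list(csv.DictReader(io.StringIO(raw)))
print(rows[0])  # {'name': 'asha', 'score': '120'}
```

A plain `.txt` file has no such structure, which is why it needs extra parsing before it can feed an ingestion pipeline.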
2. Fill in the blank (medium)

Complete the code to write data into the data lake in parquet format.

data.write.format([1]).save("/data_lake/raw")
A. csv
B. parquet
C. json
D. orc
Common Mistakes
Choosing CSV which is row-based and less efficient for big data.
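The row-based vs. columnar difference can be sketched in plain Python (an analogy, not Parquet itself): a columnar layout keeps each field's values together, so an aggregate over one column touches only that column's data.

```python
# Row-based layout (like CSV): each record stored whole.
rows = [
    {"name": "asha", "score": 120},
    {"name": "ben", "score": 90},
]

# Columnar layout (like Parquet or ORC): each field stored contiguously.
columns = {
    "name": ["asha", "ben"],
    "score": [120, 90],
}

# Summing one column: the columnar layout reads only the "score" values,
# while the row layout must walk through every full record.
row_sum = sum(r["score"] for r in rows)
col_sum = sum(columns["score"])
print(row_sum, col_sum)  # 210 210
```

Both layouts hold the same data; the columnar one simply makes single-column scans cheaper, which is why Parquet suits analytical workloads in a data lake.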
3. Fill in the blank (hard)

Fix the error in the code to append new data to the existing data lake folder.

data.write.mode([1]).format("parquet").save("/data_lake/raw")
A. error
B. overwrite
C. ignore
D. append
Common Mistakes
Using 'overwrite', which deletes the existing data instead of adding to it.
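Spark's save modes roughly mirror how a plain file can be opened. This sketch uses Python file modes as an analogy (not Spark's implementation): 'append' adds to what is already there, while 'overwrite' replaces it.

```python
import os
import tempfile

# Stand-in for "/data_lake/raw" (a throwaway temp file).
path = os.path.join(tempfile.mkdtemp(), "raw.txt")

with open(path, "w") as f:          # like mode("overwrite"): start fresh
    f.write("day1\n")

with open(path, "a") as f:          # like mode("append"): keep existing data
    f.write("day2\n")

with open(path) as f:
    contents = f.read()
print(contents)  # day1 and day2 both survive the append
```

Had the second write used `"w"` again, the day-1 data would have been lost, which is exactly the mistake `mode("overwrite")` makes in this scenario.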
4. Fill in the blank (hard)

Fill both blanks to filter data before saving to the data lake.

filtered_data = data.filter(data.[1] [2] 100)
filtered_data.write.format("parquet").save("/data_lake/filtered")
A. score
B. >
C. <
D. age
Common Mistakes
Using wrong column names or operators that do not filter correctly.
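In plain Python the same filter can be sketched over a list of records (an analogy to `DataFrame.filter`, with hypothetical sample rows): only records whose score exceeds 100 survive.

```python
# Hypothetical rows standing in for the DataFrame `data`.
data = [
    {"name": "asha", "score": 120},
    {"name": "ben", "score": 90},
    {"name": "carol", "score": 150},
]

# Equivalent of data.filter(data.score > 100): keep rows with score above 100.
filtered_data = [row for row in data if row["score"] > 100]
names = [row["name"] for row in filtered_data]
print(names)  # ['asha', 'carol']
```

Referencing a column that does not exist (e.g. `age` here) or flipping the operator would either fail or silently keep the wrong rows, which is the common mistake noted above.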
5. Fill in the blank (hard)

Fill all three blanks to create a dictionary of word lengths for words longer than 3 characters.

lengths = { [1]: [2] for [3] in words if len([3]) > 3 }
A. word
B. len(word)
D. len(words)
💡 Hint
Common Mistakes
Using the wrong variable or length function.
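With the blanks filled as `word`, `len(word)`, and `word`, the comprehension reads as follows (a worked sketch with a hypothetical word list):

```python
words = ["data", "lake", "etl", "pipeline", "io"]

# {key: value for item in iterable if condition}
# key = the word itself, value = its length, kept only when longer than 3 chars.
lengths = {word: len(word) for word in words if len(word) > 3}
print(lengths)  # {'data': 4, 'lake': 4, 'pipeline': 8}
```

Note that `len(words)` would give the length of the whole list, not of each word, which is the "wrong length function" mistake the question warns about.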