Complete the code to read a Parquet file into a Spark DataFrame.
df = spark.read.[1]("data/sample.parquet")
The parquet method reads Parquet files into a DataFrame.
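For reference, the completed call can be sketched as a small helper; the function name and path are illustrative, and the SparkSession is passed in rather than assumed:

```python
def read_parquet_df(spark, path="data/sample.parquet"):
    # spark.read returns a DataFrameReader; its parquet() method
    # (the answer to blank [1]) loads the file into a DataFrame.
    return spark.read.parquet(path)
```

Calling `read_parquet_df(spark)` with an active SparkSession returns the loaded DataFrame.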
Complete the code to write a DataFrame to Parquet format with overwrite mode.
df.write.mode("[1]").parquet("output/path")
The overwrite mode replaces existing data at the output path.
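A minimal sketch of the completed write, again wrapped in a hypothetical helper (the output path is illustrative):

```python
def write_parquet_overwrite(df, path="output/path"):
    # mode("overwrite") — the answer to blank [1] — replaces any
    # existing data at `path`; parquet() performs the write.
    df.write.mode("overwrite").parquet(path)
```

Other accepted modes include "append", "ignore", and "error" (the default, which fails if the path exists).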
Complete the code to select only the 'name' and 'age' columns from a Parquet-backed DataFrame.
selected_df = df.select([1])
The select method accepts multiple column names as separate string arguments.
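The filled-in answer can be sketched as follows; the wrapper function is hypothetical:

```python
def select_name_age(df):
    # select() takes each column name as its own string argument,
    # so the blank is filled with: "name", "age"
    return df.select("name", "age")
```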
Fill both blanks to create a dictionary comprehension that maps each column name to its data type from a DataFrame schema.
col_types = {col.name: col.[1] for col in df.schema.[2]}
The schema's fields list contains StructField objects, each with name and dataType attributes.
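With both blanks filled (`dataType` and `fields`), the comprehension looks like this; the helper name is illustrative:

```python
def column_types(df):
    # df.schema.fields is a list of StructField objects; each exposes
    # .name and .dataType, so the comprehension maps name -> type.
    return {f.name: f.dataType for f in df.schema.fields}
```

For a DataFrame with string and long columns this yields something like `{"name": StringType(), "age": LongType()}`.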
Fill all three blanks to filter rows where the 'age' column is greater than 30 and select 'name' and 'age' columns.
filtered_df = df.filter(df.[1] [2] [3]).select("name", "age")
Use the column name age, the operator >, and the value 30 to filter rows.
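Putting all three blanks together (`age`, `>`, `30`), the completed expression can be sketched as a hypothetical helper:

```python
def adults_name_age(df):
    # df.age builds a Column reference; comparing it with > 30 yields a
    # boolean Column expression that filter() applies row by row before
    # select() projects the two columns.
    return df.filter(df.age > 30).select("name", "age")
```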