Apache Spark · data · ~10 mins

Understanding the Catalyst optimizer in Apache Spark - Interactive Quiz & Practice

Practice - 5 Tasks
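Answer the questions below. In each code snippet, a bracketed number such as [1] marks a blank to fill with one of the lettered options.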
1. Fill in the blank (easy)

Complete the code to create a Spark DataFrame from a list of tuples.

Apache Spark
df = spark.createDataFrame([1], ['name', 'age'])
A. ('Alice', 30), ('Bob', 25)
B. {'Alice': 30, 'Bob': 25}
C. [('Alice', 30), ('Bob', 25)]
D. ['name', 'age']
Common Mistakes
Using a dictionary instead of a list of tuples.
Passing column names as data.
2. Fill in the blank (medium)

Complete the code to select the 'age' column from the DataFrame.

Apache Spark
ages = df.select([1])
A. 'name'
B. 'age'
C. age
D. df.age
Common Mistakes
Passing the column name without quotes, so Python looks for an undefined variable named age.
Selecting the wrong column name.
3. Fill in the blank (hard)

Complete the code to filter rows where age is greater than 25.

Apache Spark
filtered_df = df.filter(df.age [1] 25)
A. >
B. <
C. ==
D. <=
Common Mistakes
Using '<' instead of '>'.
Using '==', which keeps only rows where age equals 25.
4. Fill in the blanks (hard)

Fill both blanks to create a dictionary comprehension that maps names to ages for people older than 25.

Apache Spark
{ [1]: [2] for row in filtered_df.collect() }
A. row.name
B. row.age
C. row
D. row['age']
Common Mistakes
Using the whole row as key or value.
Using incorrect attribute names.
5. Fill in the blanks (hard)

Fill all three blanks to create a new DataFrame with an added column 'age_plus_5' that adds 5 to the age.

Apache Spark
new_df = df.withColumn('[1]', df.[2] [3] 5)
A. age_plus_5
B. age
C. +
D. -
Common Mistakes
Using subtraction instead of addition.
Wrong column names.