
Unit Testing Transformations in Apache Spark - Interactive Code Practice

Practice - 5 Tasks
Answer the questions below
1. Fill in the blank (easy)

Complete the code to create a Spark DataFrame from a list of tuples.

Apache Spark
df = spark.createDataFrame([1], ['name', 'age'])
Options:
A. [('Alice', 30), ('Bob', 25)]
B. ['name', 'age']
C. ('Alice', 30, 'Bob', 25)
D. {'Alice': 30, 'Bob': 25}
Common Mistakes
Using a dictionary instead of a list of tuples.
Passing column names as data.
2. Fill in the blank (medium)

Complete the code to filter rows where age is greater than 25.

Apache Spark
filtered_df = df.filter(df.age [1] 25)
Options:
A. ==
B. >
C. <=
D. <
Common Mistakes
Using '<' instead of '>', which keeps ages below 25.
Using '==', which keeps only rows where age equals 25.
3. Fill in the blank (hard)

Complete the code to collect the 'name' column as a Python list.

Apache Spark
names = df.select([1]).rdd.flatMap(lambda x: x).collect()
Options:
A. 'age'
B. df.name
C. 'name'
D. name
Common Mistakes
Selecting the wrong column, such as 'age'.
Using the bare identifier name without quotes, which Python treats as an undefined variable.
4. Fill in the blanks (hard)

Fill both blanks to create a dictionary of names and ages for people older than 20.

Apache Spark
result = [1].rdd.filter(lambda x: x.[2] > 20).collectAsMap()
Options:
A. df.select('name', 'age')
B. df.select('age')
C. age
D. name
Common Mistakes
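Selecting only one column, which leaves no value to pair with each key.
Using the wrong attribute in the lambda, so the comparison runs against the wrong field.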
5. Fill in the blanks (hard)

Fill all three blanks to create a unit test that checks if the filtered DataFrame has the expected count.

Apache Spark
def test_filter_count():
    filtered = df.filter(df.[1] [2] [3])
    assert filtered.count() == 1
Options:
A. age
B. >
C. 25
D. name
Common Mistakes
Using the wrong column name.
Using the wrong comparison operator.
Using the wrong value for filtering.