Practice - 5 Tasks
Answer the questions below
1. Fill in the blank
Easy: Complete the code to create a Spark DataFrame from a list of tuples.
Apache Spark
df = spark.createDataFrame([1], ['name', 'age'])
Common Mistakes
Using a dictionary instead of a list of tuples.
Passing column names as data.
Explanation
createDataFrame takes the row data first, here a list of tuples with one tuple per row, followed by the list of column names.
2. Fill in the blank
Medium: Complete the code to filter rows where age is greater than 25.
Apache Spark
filtered_df = df.filter(df.age [1] 25)
Common Mistakes
Using '<' instead of '>'.
Using '==', which keeps only rows where age equals 25.
Explanation
The condition df.age > 25 keeps only the rows whose age is greater than 25.
3. Fill in the blank
Hard: Fix the error in the code to select the 'name' column as a list.
Apache Spark
names = df.select([1]).rdd.flatMap(lambda x: x).collect()
Common Mistakes
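Passing a Column object such as df.name where the exercise expects the quoted column name (select accepts both forms, but the expected answer is the string).
Using the bare name without quotes, which Python treats as an undefined variable.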
Explanation
The select method accepts column names as strings (as well as Column objects); the blank here is the quoted name 'name'.
4. Fill in the blank
Hard: Fill both blanks to create a dictionary of names and ages for people older than 20.
Apache Spark
result = [1].rdd.filter(lambda x: x.[2] > 20).collectAsMap()
Common Mistakes
Selecting only one column.
Using the wrong attribute in the lambda.
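Explanation
Select both the 'name' and 'age' columns, then filter on the 'age' attribute in the lambda. collectAsMap() treats each two-field row as a key-value pair and returns a dictionary keyed by name.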
5. Fill in the blank
Hard: Fill all three blanks to create a unit test that checks if the filtered DataFrame has the expected count.
Apache Spark
def test_filter_count():
    filtered = df.filter(df.[1] [2] [3])
    assert filtered.count() == 1
Common Mistakes
Using the wrong column name.
Using the wrong comparison operator.
Using the wrong value for filtering.
Explanation
The test filters rows where age is greater than 25 and asserts that exactly one row remains.