Practice - 5 Tasks
Answer the questions below
1. Fill in the blank (easy)
Complete the code to check if the DataFrame has any null values.
Apache Spark
df.selectExpr('count(*) as total', 'count([1]) as non_null').show()
Common Mistakes
Using a specific column name without context causes errors.
Using 'null' as a string is invalid in selectExpr.
Explanation: count(*) counts all rows while count(column) counts only non-null values, so comparing the total with the non-null count reveals whether the column has nulls.
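The same total-versus-non-null comparison can be sketched in plain Python, without a Spark session. The sample rows and the 'age' column below are hypothetical; in Spark SQL, count(*) counts every row while count(col) skips nulls.

```python
# Plain-Python analogue of the Spark null check (no pyspark required).
rows = [{"age": 30}, {"age": None}, {"age": 25}]  # hypothetical data

total = len(rows)                                        # like count(*)
non_null = sum(1 for r in rows if r["age"] is not None)  # like count(age)

has_nulls = total != non_null
print(total, non_null, has_nulls)  # 3 2 True
```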
2. Fill in the blank (medium)
Complete the code to assert that the column 'age' has no null values.
Apache Spark
assert df.filter(df.age.[1](None)).count() == 0, 'Null values found in age column'
Common Mistakes
Using isNotNull() filters non-null rows, which is opposite of the goal.
Using isnan() is for NaN values, not nulls.
Explanation: Filtering the rows where 'age' is null and asserting that the count is zero ensures there are no nulls.
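A plain-Python sketch of the isNull()-style assertion; the sample data is hypothetical and stands in for `df.filter(df.age.isNull()).count()`.

```python
# Count rows whose 'age' is null and assert none exist.
rows = [{"age": 30}, {"age": 25}, {"age": 41}]  # hypothetical data

null_count = sum(1 for r in rows if r["age"] is None)
assert null_count == 0, "Null values found in age column"
print(null_count)  # 0
```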
3. Fill in the blank (hard)
Fix the error in the code to assert that all values in 'salary' are positive.
Apache Spark
assert df.filter(df.salary [1] 0).count() == 0, 'Negative or zero salary found'
Common Mistakes
Using '>=' filters salaries greater or equal to zero, which is incorrect here.
Using '<' misses zero values.
Explanation: Filtering salaries less than or equal to zero and asserting that the count is zero ensures all salaries are positive.
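The positivity check can be sketched in plain Python: count the offending rows (salary <= 0) and assert none exist, mirroring `df.filter(df.salary <= 0).count()`. The sample salaries are hypothetical.

```python
# Assert every salary is strictly positive.
rows = [{"salary": 52000}, {"salary": 61000}, {"salary": 48000}]  # hypothetical data

bad = sum(1 for r in rows if r["salary"] <= 0)  # offending rows
assert bad == 0, "Negative or zero salary found"
print(bad)  # 0
```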
4. Fill in the blank (hard)
Fill both blanks to create a dictionary of counts for each unique value in 'department' where the count is greater than 5.
Apache Spark
dept_counts = {row['[1]']: row['[2]'] for row in df.groupBy('department').count().collect() if row['count'] > 5}
Common Mistakes
Using incorrect keys like 'dept' or 'value', which don't exist in the rows.
Confusing 'count' with other column names.
Explanation: Use 'department' as the key and 'count' as the value to build the dictionary.
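A plain-Python sketch of `groupBy('department').count()` followed by the filtering dict comprehension. The department names and row counts below are hypothetical; the threshold of 5 mirrors the exercise.

```python
from collections import Counter

# Hypothetical data: 7 engineering rows, 3 HR rows.
rows = [{"department": "eng"}] * 7 + [{"department": "hr"}] * 3

counts = Counter(r["department"] for r in rows)  # like groupBy + count
dept_counts = {dept: n for dept, n in counts.items() if n > 5}
print(dept_counts)  # {'eng': 7}
```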
5. Fill in the blank (hard)
Fill all three blanks to create a filtered DataFrame with no nulls in the 'email' and 'phone' columns and only rows where 'age' is greater than 18.
Apache Spark
filtered_df = df.filter(df.email.[1]() & df.phone.[2]() & (df.age [3] 18))
Common Mistakes
Using isNull() instead of isNotNull() includes nulls.
Using '<' instead of '>' filters wrong age range.
Explanation: Use isNotNull() to keep rows with non-null emails and phones, and '>' to keep ages above 18.
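The combined filter (non-null email, non-null phone, age > 18) can be sketched in plain Python; the sample rows are hypothetical.

```python
# Keep only rows passing all three conditions, like the chained & filter in Spark.
rows = [
    {"email": "a@x.com", "phone": "555-0001", "age": 30},  # passes all checks
    {"email": None,      "phone": "555-0002", "age": 40},  # null email
    {"email": "c@x.com", "phone": "555-0003", "age": 15},  # underage
]

filtered = [
    r for r in rows
    if r["email"] is not None and r["phone"] is not None and r["age"] > 18
]
print(len(filtered))  # 1
```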