Complete the code to count the number of null values in the 'age' column of the DataFrame.
null_count = df.filter(df['age'].[1]()).count()
To find null values, use the isNull() method on the Column object.
Complete the code to drop duplicate rows from the DataFrame.
df_no_duplicates = df.[1]()
The dropDuplicates() method removes duplicate rows from a DataFrame.
Fix the code to filter rows where the 'salary' column is not null.
filtered_df = df.filter(df['salary'].[1]())
To filter non-null values, use the isNotNull() method on the Column object.
Complete the code to drop duplicate rows based on the 'name' and 'age' columns.
df_no_duplicates = df.dropDuplicates([[1], [2]])
A common mistake is passing Column objects such as df['name'] instead of strings. Pass a list of column names (as strings) to dropDuplicates() to remove duplicates based on those columns.
Fill all three blanks to count the number of duplicate groups based on 'name' and 'age' columns.
dupe_groups_count = df.groupBy([1], [2]).count().filter(col('[3]') > 1).count()
Common mistakes include passing Column objects like df['name'] instead of strings to groupBy(), or using the wrong syntax in the filter (e.g. omitting col()). Group by the columns, count rows per group, filter where count > 1, then count the remaining groups. Assumes col is imported from pyspark.sql.functions.