Practice - 5 Tasks
Answer the questions below
Task 1: Fill in the blank (easy)
Complete the code to import the window function module in PySpark.
Apache Spark
from pyspark.sql import [1]
Common Mistakes
Using 'functions' instead of 'Window' for window specifications.
Importing 'window' in lowercase which does not exist.
The correct import for window functions in PySpark is 'Window' from pyspark.sql.
Task 2: Fill in the blank (medium)
Complete the code to create a window specification partitioned by 'department'.
Apache Spark
from pyspark.sql import Window

windowSpec = Window.partitionBy([1])
Common Mistakes
Using a numeric column like 'salary' instead of a categorical one.
Not using quotes around the column name.
Partitioning by 'department' groups rows by department for window calculations.
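PySpark itself needs a running Spark session, but the effect of partitionBy can be sketched in plain Python: grouping rows by their 'department' value so that each window calculation later runs within one group. The sample rows below are hypothetical stand-ins for a Spark DataFrame.

```python
from collections import defaultdict

# Hypothetical sample rows standing in for a Spark DataFrame.
rows = [
    {"department": "eng", "salary": 100},
    {"department": "hr", "salary": 80},
    {"department": "eng", "salary": 120},
]

# Window.partitionBy('department') conceptually splits the rows into
# one group per distinct department value, like this:
partitions = defaultdict(list)
for row in rows:
    partitions[row["department"]].append(row)

print(sorted(partitions))          # ['eng', 'hr']
print(len(partitions["eng"]))      # 2
```

Each window function (row_number, sum, and so on) is then evaluated independently inside each of these groups.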
Task 3: Fill in the blank (hard)
Fix the error in the code to calculate the row number over the window specification.
Apache Spark
from pyspark.sql.functions import row_number

result = df.withColumn('row_num', row_number().over([1]))
Common Mistakes
Using 'Window' class instead of the window specification instance.
Using lowercase 'window' which is undefined.
The window specification variable is named 'windowSpec' and must be passed to 'over()'.
Task 4: Fill in the blank (hard)
Fill both blanks to create a window specification partitioned by 'department' and ordered by 'salary' descending.
Apache Spark
from pyspark.sql import Window
from pyspark.sql.functions import col

windowSpec = Window.partitionBy([1]).orderBy(col([2]).desc())
Common Mistakes
Ordering by a non-numeric column like 'name'.
Not using the correct column names as strings.
Partition by 'department' and order by 'salary' descending to rank salaries within each department.
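As a plain-Python analogue (not PySpark itself; the data below is hypothetical), ranking salaries within each department by descending salary mirrors what row_number() produces over this window specification:

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical rows standing in for a Spark DataFrame.
rows = [
    {"department": "eng", "name": "a", "salary": 100},
    {"department": "eng", "name": "b", "salary": 120},
    {"department": "hr", "name": "c", "salary": 80},
]

# Sort by the partition key, then by salary descending within it,
# mimicking Window.partitionBy('department').orderBy(col('salary').desc()).
rows.sort(key=lambda r: (r["department"], -r["salary"]))

ranked = []
for _, group in groupby(rows, key=itemgetter("department")):
    for i, row in enumerate(group, start=1):
        ranked.append({**row, "row_num": i})

print(ranked[0])  # the highest 'eng' salary gets row_num 1
```

The numbering restarts at 1 for each department, which is exactly what partitioning buys you over a single global ordering.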
Task 5: Fill in the blank (hard)
Fill all three blanks to calculate the cumulative sum of 'sales' partitioned by 'region' and ordered by 'date'.
Apache Spark
from pyspark.sql import Window
from pyspark.sql.functions import sum

windowSpec = Window.partitionBy([1]).orderBy([2])
cum_sum = sum([3]).over(windowSpec)
Common Mistakes
Using 'profit' instead of 'sales' for the sum.
Mixing up partition and order columns.
Partition by 'region', order by 'date', and sum 'sales' cumulatively over the window.
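The cumulative sum this window computes can be sketched in plain Python (hypothetical data, not PySpark): within each region, rows sorted by date accumulate a running total of sales.

```python
from itertools import accumulate, groupby
from operator import itemgetter

# Hypothetical rows standing in for a Spark DataFrame.
rows = [
    {"region": "east", "date": "2024-01-01", "sales": 10},
    {"region": "east", "date": "2024-01-02", "sales": 5},
    {"region": "west", "date": "2024-01-01", "sales": 7},
]

# Sort by partition key then order key, mimicking
# Window.partitionBy('region').orderBy('date').
rows.sort(key=itemgetter("region", "date"))

cum = []
for _, group in groupby(rows, key=itemgetter("region")):
    group = list(group)
    for row, total in zip(group, accumulate(r["sales"] for r in group)):
        cum.append({**row, "cum_sum": total})

print([r["cum_sum"] for r in cum])  # [10, 15, 7]
```

Note how the running total resets when the region changes: east accumulates 10 then 15, while west starts over at 7.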