
Window functions in Apache Spark - Interactive Code Practice

Practice - 5 Tasks
Answer the questions below
Task 1: Fill in the blank (easy)

Complete the code to import the window function module in PySpark.

from pyspark.sql import [1]
A. sql
B. window
C. functions
D. Window
Common Mistakes
Using 'functions' instead of 'Window' for window specifications.
Importing lowercase 'window', which does not exist in pyspark.sql.
Task 2: Fill in the blank (medium)

Complete the code to create a window specification partitioned by 'department'.

from pyspark.sql import Window
windowSpec = Window.partitionBy([1])
A. 'department'
B. 'salary'
C. 'age'
D. 'name'
Common Mistakes
Using a numeric column like 'salary' instead of a categorical one.
Not using quotes around the column name.
Task 3: Fill in the blank (hard)

Complete the code to calculate the row number over the window specification.

from pyspark.sql.functions import row_number
result = df.withColumn('row_num', row_number().over([1]))
A. window
B. windowSpec
C. Window
D. window_spec
Common Mistakes
Using the 'Window' class instead of the window specification instance.
Using lowercase 'window', which is undefined.
Task 4: Fill in the blank (hard)

Fill both blanks to create a window specification partitioned by 'department' and ordered by 'salary' descending.

from pyspark.sql import Window
from pyspark.sql.functions import col
windowSpec = Window.partitionBy([1]).orderBy(col([2]).desc())
A. 'department'
B. 'salary'
C. 'age'
D. 'name'
Common Mistakes
Ordering by a non-numeric column like 'name'.
Not using the correct column names as strings.
Task 5: Fill in the blank (hard)

Fill all three blanks to calculate the cumulative sum of 'sales' partitioned by 'region' and ordered by 'date'.

from pyspark.sql import Window
from pyspark.sql.functions import sum
windowSpec = Window.partitionBy([1]).orderBy([2])
cum_sum = sum([3]).over(windowSpec)
A. 'region'
B. 'date'
C. 'sales'
D. 'profit'
Common Mistakes
Using 'profit' instead of 'sales' for the sum.
Mixing up partition and order columns.