
GroupBy and aggregations in Apache Spark - Interactive Code Practice

Practice - 5 Tasks
Answer the questions below
1. Fill in the blank (easy)

Complete the code to group the DataFrame by the 'category' column.

Apache Spark
grouped_df = df.[1]('category')
A. groupBy
B. select
C. filter
D. orderBy
Common Mistakes
Using select, which picks columns rather than grouping rows
Using filter, which keeps or drops rows rather than grouping them
Using orderBy, which sorts rows rather than grouping them
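
To make the difference between grouping, filtering, and sorting concrete, here is a minimal plain-Python sketch (not Spark code) of what grouping does: every row is kept and placed into a bucket keyed by the grouping column. The `rows` sample data and the `group_rows` helper are hypothetical and exist only for illustration.

```python
from collections import defaultdict

# Hypothetical sample rows standing in for a DataFrame.
rows = [
    {"category": "toys", "sales": 10.0},
    {"category": "books", "sales": 4.0},
    {"category": "toys", "sales": 30.0},
]

def group_rows(rows, key):
    """Bucket rows by the value of `key`; no row is dropped or reordered."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return dict(groups)

groups = group_rows(rows, "category")
print({k: len(v) for k, v in groups.items()})  # {'toys': 2, 'books': 1}
```

In PySpark, `df.groupBy('category')` returns a `GroupedData` object rather than a DataFrame, so an aggregation has to follow before any result can be displayed.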
2. Fill in the blank (medium)

Complete the code to calculate the average of the 'sales' column after grouping.

Apache Spark
result = df.groupBy('category').[1]('sales')
A. count
B. max
C. avg
D. min
Common Mistakes
Using count, which counts the rows in each group instead of averaging a column
Using max or min, which return the extremes of each group rather than the mean
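
As a rough illustration of what a per-group average computes, the following plain-Python sketch (not Spark code) keeps a running sum and count for each group and divides at the end. The sample `rows` and the `avg_by` helper are hypothetical, and the sketch assumes no missing values.

```python
from collections import defaultdict

# Hypothetical sample rows standing in for a DataFrame.
rows = [
    {"category": "toys", "sales": 10.0},
    {"category": "books", "sales": 4.0},
    {"category": "toys", "sales": 30.0},
]

def avg_by(rows, key, value):
    """Return the mean of `value` for each distinct `key`."""
    totals = defaultdict(lambda: [0.0, 0])  # running [sum, count] per group
    for row in rows:
        acc = totals[row[key]]
        acc[0] += row[value]
        acc[1] += 1
    return {k: s / n for k, (s, n) in totals.items()}

result = avg_by(rows, "category", "sales")
print(result)  # {'toys': 20.0, 'books': 4.0}
```

In PySpark, the equivalent aggregation produces one row per category with the averaged column named `avg(sales)`.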
3. Fill in the blank (hard)

Complete the code to sum the 'quantity' column after grouping by 'product'.

Apache Spark
total_quantity = df.groupBy('product').[1]('quantity')
A. sum
B. average
C. summation
D. total
Common Mistakes
Using 'average' instead of 'sum' (the averaging methods on grouped data are named avg and mean)
Using non-existent methods like 'summation' or 'total'
4. Fill in the blank (hard)

Fill both blanks to create a dictionary for aggregation: count 'id' and max 'price'.

Apache Spark
agg_result = df.groupBy('store').agg({[1]: 'count', [2]: 'max'})
A. 'id'
B. 'price'
C. 'quantity'
D. 'sales'
Common Mistakes
Swapping the columns for count and max
Using columns not present in the DataFrame
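
The dictionary form maps each column name to the name of an aggregation function. The plain-Python sketch below (not Spark code) mimics that shape to show how such a spec is applied per group. The sample `rows`, the `AGG_FUNCS` table, and the `agg_by` helper are hypothetical; the result keys follow the `function(column)` naming that PySpark uses for its output columns.

```python
from collections import defaultdict

# Hypothetical sample rows standing in for a DataFrame.
rows = [
    {"store": "north", "id": 1, "price": 9.5},
    {"store": "north", "id": 2, "price": 12.0},
    {"store": "south", "id": 3, "price": 7.25},
]

# Aggregation names mapped to plain-Python functions (a small, assumed subset).
AGG_FUNCS = {
    "count": len,
    "max": max,
    "min": min,
    "sum": sum,
    "avg": lambda values: sum(values) / len(values),
}

def agg_by(rows, key, spec):
    """Apply a {column: function name} spec to each group of rows."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    out = {}
    for group_key, members in groups.items():
        out[group_key] = {
            f"{func}({col})": AGG_FUNCS[func]([m[col] for m in members])
            for col, func in spec.items()
        }
    return out

agg_result = agg_by(rows, "store", {"id": "count", "price": "max"})
print(agg_result)
# {'north': {'count(id)': 2, 'max(price)': 12.0}, 'south': {'count(id)': 1, 'max(price)': 7.25}}
```

Because dictionary keys are unique, this form allows only one aggregation per column.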
5. Fill in the blank (hard)

Fill all three blanks to create a dictionary for aggregation: sum 'quantity', avg 'sales', and min 'discount'.

Apache Spark
agg_df = df.groupBy('region').agg({[1]: 'sum', [2]: [3], 'discount': 'min'})
A. 'quantity'
B. 'sales'
C. 'avg'
D. 'sum'
Common Mistakes
Mixing up aggregation functions and column names
Using 'sum' instead of 'avg' for sales