GroupBy and aggregations in Apache Spark - Cheat Sheet & Quick Revision

Recall & Review
beginner
What does the groupBy() function do in Apache Spark?
It groups rows in a DataFrame based on one or more columns, so you can perform aggregations on each group separately.
beginner
Name three common aggregation functions used after groupBy() in Spark.
Common aggregation functions are count(), sum(), and avg() (average).
beginner
How do you calculate the average of a column named sales after grouping by region?
Use df.groupBy('region').avg('sales') to get the average sales per region.
intermediate
What is the difference between agg() and direct aggregation functions like sum() after groupBy()?
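agg() accepts several aggregate expressions in one call, so you can compute different statistics (for example sum and avg) together and name the result columns. Shortcut methods such as sum() apply a single function to the columns you pass.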
beginner
Why is grouping and aggregation useful in real life?
It helps summarize large data sets, like finding total sales per store or average temperature per city, making data easier to understand.
What does df.groupBy('category').count() do?
A. Counts the number of columns in the DataFrame
B. Counts the total number of categories
C. Counts the number of unique values in the DataFrame
D. Counts the number of rows in each category group
Which function calculates the sum of a column after grouping?
A. sum()
B. avg()
C. count()
D. max()
How do you apply multiple aggregations like sum and average together?
A. Use multiple groupBy() calls
B. Use agg() with several aggregate expressions or a column-to-function dictionary
C. Use count() twice
D. Use select() only
What type of object does groupBy() return?
A. GroupedData
B. DataFrame
C. RDD
D. List
Which of these is NOT an aggregation function in Spark?
A. max()
B. min()
C. filter()
D. avg()
Explain how you would use groupBy() and aggregation functions to find the total sales per product category.
Think about grouping by category and then summing sales.
Describe the difference between using agg() and direct aggregation functions after groupBy().
Consider when you want one summary statistic versus several.