Recall & Review
beginner
What does the groupBy() function do in Apache Spark?
It groups rows in a DataFrame based on one or more columns, so you can perform aggregations on each group separately.
beginner
Name three common aggregation functions used after groupBy() in Spark.
Common aggregation functions are count(), sum(), and avg() (average).
beginner
How do you calculate the average of a column named sales after grouping by region?
Use df.groupBy('region').avg('sales') to get the average sales per region.
intermediate
What is the difference between agg() and direct aggregation functions like sum() after groupBy()?
agg() lets you apply several different aggregation functions in one call, while a direct function like sum() applies one kind of aggregation (to one or more columns).
beginner
Why is grouping and aggregation useful in real life?
It helps summarize large data sets, like finding total sales per store or average temperature per city, making data easier to understand.
What does df.groupBy('category').count() do?
The count() after groupBy() counts rows per group.
Which function calculates the sum of a column after grouping?
sum() adds up values in each group.
How do you apply multiple aggregations like sum and average together?
The agg() function allows multiple aggregations at once.
What type of object does groupBy() return?
groupBy() returns a GroupedData object to apply aggregations.
Which of these is NOT an aggregation function in Spark?
filter() is for filtering rows, not aggregation.
Explain how you would use groupBy() and aggregation functions to find the total sales per product category.
Think about grouping by category and then summing sales.
Describe the difference between using agg() and direct aggregation functions after groupBy().
Consider when you want one vs many summary stats.