Recall & Review
beginner
What are aggregation-based features in data science?
Aggregation-based features are new data columns created by summarizing or combining existing data points, like calculating averages, sums, or counts over groups of data.
beginner
Why do we use aggregation-based features in machine learning?
We use aggregation-based features to capture group-level patterns and trends that single data points might miss, helping models learn better from the data.
beginner
Example: What does the aggregation 'mean' represent when applied to a group of data?
The 'mean' is the average value of the data points in the group, showing the central tendency of that group.
intermediate
How can you create aggregation-based features using Python's pandas library?
Call the DataFrame's groupby() method to group rows by one or more columns, then apply aggregation functions such as mean(), sum(), or count() to each group. To use the results as features, join the group-level values back onto the original rows.
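As a minimal sketch of that workflow, the example below computes several summaries per group and merges them back onto each row. The table and its column names (`customer_id`, `amount`) are made up for illustration.

```python
import pandas as pd

# Hypothetical transactions table; column names are illustrative only.
df = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 2],
    "amount": [10.0, 30.0, 5.0, 15.0, 40.0],
})

# One row per customer, with several summaries computed together.
customer_stats = (
    df.groupby("customer_id")["amount"]
      .agg(amount_mean="mean", amount_sum="sum", amount_count="count")
      .reset_index()
)

# Join the group-level summaries back onto every transaction row,
# so each row carries its customer's aggregates as new feature columns.
features = df.merge(customer_stats, on="customer_id", how="left")
print(features)
```

Named aggregation (`amount_mean="mean"`) is one way to get readable column names; passing a plain list such as `["mean", "sum"]` also works but names the columns after the functions.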
intermediate
What is the difference between 'count' and 'nunique' aggregations?
'count' gives the number of non-null values in a group, while 'nunique' gives the number of distinct values in that group (nulls are excluded by default).
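The difference is easiest to see on a small table with a repeated value and a missing value. This is an illustrative sketch; the `orders` data and column names are invented.

```python
import numpy as np
import pandas as pd

# Hypothetical orders: customer 1 buys "pen" twice, customer 2 has one missing product.
orders = pd.DataFrame({
    "customer_id": [1, 1, 1, 2, 2],
    "product": ["pen", "pen", "ink", "pad", np.nan],
})

# count: non-null entries per group; nunique: distinct non-null entries per group.
summary = orders.groupby("customer_id")["product"].agg(["count", "nunique"])
print(summary)
```

Customer 1 has a count of 3 but only 2 distinct products. Customer 2 has a count of 1 because the missing value is skipped by both aggregations.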
Which of the following is NOT an aggregation function?
Filter is used to select data, not to aggregate or summarize it.
What does the 'groupby' method in pandas do?
groupby splits the data into groups so you can apply aggregation functions to each group.
If you want to find the total sales per store, which aggregation would you use?
Sum adds all sales values to get the total per store.
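A short sketch of the per-store total, assuming a hypothetical `sales` table with `store` and `sales` columns:

```python
import pandas as pd

# Hypothetical sales records; store labels and amounts are invented.
sales = pd.DataFrame({
    "store": ["A", "A", "B", "B", "B"],
    "sales": [100, 150, 80, 20, 50],
})

# Group rows by store, then add up the sales values within each group.
total_per_store = sales.groupby("store")["sales"].sum()
print(total_per_store)
```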
What does the 'nunique' aggregation tell you?
nunique counts how many distinct values exist in the group.
Why might aggregation-based features improve a model's performance?
Aggregation-based features help models learn from group trends and patterns.
Explain how you would create an aggregation-based feature to find the average purchase amount per customer using pandas.
Think about grouping data by customer and then calculating the average.
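One possible answer, sketched with invented data: `transform("mean")` computes the per-customer average and broadcasts it back to every row, so the result can be assigned directly as a new column. The `purchases` table and the `avg_purchase` column name are assumptions for illustration.

```python
import pandas as pd

# Hypothetical purchase log; customer IDs and amounts are invented.
purchases = pd.DataFrame({
    "customer_id": ["c1", "c1", "c2", "c2", "c2"],
    "amount": [20.0, 40.0, 10.0, 10.0, 40.0],
})

# transform returns a Series aligned with the original rows,
# so every purchase row receives its customer's average amount.
purchases["avg_purchase"] = (
    purchases.groupby("customer_id")["amount"].transform("mean")
)
print(purchases)
```

Using `transform` avoids a separate merge step; a plain `.mean()` on the grouped column would instead return one row per customer.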
Describe the difference between 'count' and 'nunique' aggregations and give an example of when each might be useful.
Consider counting total purchases vs. counting unique products bought.