Challenge - 5 Problems
Aggregation Master
❓ Predict Output
intermediate
Output of group aggregation with multiple functions
What is the output of the following code snippet that groups data by 'Category' and calculates the mean and max of 'Value'?
Data Analysis Python
import pandas as pd

data = pd.DataFrame({
    'Category': ['A', 'A', 'B', 'B', 'C'],
    'Value': [10, 20, 10, 30, 40]
})
result = data.groupby('Category')['Value'].agg(['mean', 'max']).reset_index()
print(result)
💡 Hint
Remember that mean is the average and max is the highest value in each group.
Explanation
The group 'A' has values 10 and 20, so mean is (10+20)/2=15 and max is 20. Group 'B' has 10 and 30, mean is 20 and max is 30. Group 'C' has one value 40, so mean and max are both 40.
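The arithmetic above can be checked with a short sketch. It reuses the snippet's DataFrame; the `means` and `maxes` variables are names introduced here only for illustration.

```python
import pandas as pd

data = pd.DataFrame({
    'Category': ['A', 'A', 'B', 'B', 'C'],
    'Value': [10, 20, 10, 30, 40],
})

# One row per category, one column per aggregation function.
result = data.groupby('Category')['Value'].agg(['mean', 'max']).reset_index()

# Hypothetical helper variables, used only to inspect the columns.
means = result['mean'].tolist()  # per-group averages
maxes = result['max'].tolist()   # per-group maxima
print(result)
```

The result has three columns: 'Category', 'mean', and 'max', with one row for each of A, B, and C.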
❓ Data Output
intermediate
Number of unique users per product
Given a DataFrame with user purchases, what is the number of unique users who bought each product?
Data Analysis Python
import pandas as pd

data = pd.DataFrame({
    'User': ['Alice', 'Bob', 'Alice', 'David', 'Bob', 'Eve'],
    'Product': ['X', 'X', 'Y', 'Y', 'X', 'Z']
})
result = data.groupby('Product')['User'].nunique().reset_index(name='UniqueUsers')
print(result)
💡 Hint
Count distinct users per product.
Explanation
Product 'X' was bought by Alice and Bob; Bob bought it twice, but nunique() counts him once, so there are 2 unique users. Product 'Y' was bought by Alice and David (2 unique users). Product 'Z' was bought by Eve (1 unique user).
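To see why distinct counting matters here, the following sketch puts the number of purchase rows next to the number of unique users for each product. The `summary` variable is a name introduced for this illustration and is not part of the original question.

```python
import pandas as pd

data = pd.DataFrame({
    'User': ['Alice', 'Bob', 'Alice', 'David', 'Bob', 'Eve'],
    'Product': ['X', 'X', 'Y', 'Y', 'X', 'Z'],
})

# 'count' tallies every purchase row; 'nunique' tallies distinct users.
summary = (
    data.groupby('Product')['User']
        .agg(['count', 'nunique'])
        .reset_index()
)
print(summary)
```

Only product 'X' differs between the two columns, because Bob appears twice in its group.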
❓ Visualization
advanced
Visualizing average sales per region
Which plot correctly shows the average sales per region from the given data?
Data Analysis Python
import pandas as pd
import matplotlib.pyplot as plt

data = pd.DataFrame({
    'Region': ['North', 'South', 'East', 'West', 'North', 'South'],
    'Sales': [100, 150, 200, 130, 120, 170]
})
avg_sales = data.groupby('Region')['Sales'].mean().reset_index()
plt.bar(avg_sales['Region'], avg_sales['Sales'])
plt.title('Average Sales per Region')
plt.xlabel('Region')
plt.ylabel('Average Sales')
plt.show()
💡 Hint
Look for a bar chart showing average values per category.
Explanation
The code groups sales by region and calculates mean, then plots a bar chart with those averages. The heights correspond to the average sales per region.
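The expected bar heights can be worked out without drawing anything. This sketch computes the same per-region means and leaves the plotting step out; `heights` is a name introduced here for illustration.

```python
import pandas as pd

data = pd.DataFrame({
    'Region': ['North', 'South', 'East', 'West', 'North', 'South'],
    'Sales': [100, 150, 200, 130, 120, 170],
})

# groupby sorts the group keys by default, so regions come out alphabetically.
avg_sales = data.groupby('Region')['Sales'].mean()

# Map each region to the height its bar should have.
heights = avg_sales.to_dict()
print(heights)
```

North averages (100 + 120) / 2 = 110 and South averages (150 + 170) / 2 = 160, while East and West each have a single value (200 and 130). Because the keys are sorted, the bars appear in the order East, North, South, West.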
🔧 Debug
advanced
Identify the error in aggregation code
What error will the following code raise when executed?
Data Analysis Python
import pandas as pd

data = pd.DataFrame({
    'Category': ['A', 'B', 'A'],
    'Value': [10, 20, 30]
})
result = data.groupby('Category')['Value'].agg('sum', 'mean')
print(result)
💡 Hint
Check how multiple aggregation functions are passed to agg().
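Explanation
agg() takes the aggregation as its first argument: a single function or function name, or a list of them. Extra positional arguments are forwarded to that function rather than treated as further aggregations, so agg('sum', 'mean') does not compute both. The expected answer here is a TypeError, though the exact behavior may vary between pandas versions.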
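For comparison, here is a minimal sketch of the working form, with both function names passed together in one list. It uses the same data as the question.

```python
import pandas as pd

data = pd.DataFrame({
    'Category': ['A', 'B', 'A'],
    'Value': [10, 20, 30],
})

# Both aggregations go in a single list argument.
result = data.groupby('Category')['Value'].agg(['sum', 'mean'])
print(result)
```

The result is a DataFrame indexed by 'Category' with one column for each function name.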
🚀 Application
expert
Calculate weighted average feature per group
You have a DataFrame with 'Group', 'Value', and 'Weight' columns. Which code correctly calculates the weighted average of 'Value' per 'Group' using 'Weight' as weights?
Data Analysis Python
import pandas as pd

data = pd.DataFrame({
    'Group': ['X', 'X', 'Y', 'Y', 'Y'],
    'Value': [10, 20, 30, 40, 50],
    'Weight': [1, 3, 2, 1, 1]
})
💡 Hint
Weighted average is sum of value*weight divided by sum of weights per group.
Explanation
The correct option multiplies 'Value' by 'Weight', sums the products per group, then divides by the sum of weights per group. The other options fail in different ways: one divides by the weights of the full data instead of the group's weights, one takes the simple mean and ignores the weights, and one multiplies the group means, which is not a weighted average.
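The answer options are not reproduced here, so the following is only one possible way to write the calculation described above, not necessarily the exact code of the correct option. The names `weighted_sum`, `weight_total`, and `weighted_avg` are introduced for this sketch.

```python
import pandas as pd

data = pd.DataFrame({
    'Group': ['X', 'X', 'Y', 'Y', 'Y'],
    'Value': [10, 20, 30, 40, 50],
    'Weight': [1, 3, 2, 1, 1],
})

# Sum of value * weight within each group.
weighted_sum = (data['Value'] * data['Weight']).groupby(data['Group']).sum()

# Sum of weights within each group (not across the whole DataFrame).
weight_total = data.groupby('Group')['Weight'].sum()

# Both Series are indexed by group, so the division aligns per group.
weighted_avg = weighted_sum / weight_total
print(weighted_avg)
```

For group X this gives (10*1 + 20*3) / (1 + 3) = 17.5, and for group Y (30*2 + 40*1 + 50*1) / (2 + 1 + 1) = 37.5. The plain means would be 15 and 40, which shows the effect of the weights.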