Challenge - 5 Problems
Data Aggregation Master
❓ Predict Output
Difficulty: intermediate
Output of groupby aggregation with multiple functions
What is the output of this code snippet that groups data by 'Category' and applies multiple aggregation functions?
Pandas
import pandas as pd

data = pd.DataFrame({
    'Category': ['A', 'A', 'B', 'B', 'C'],
    'Value': [10, 20, 10, 30, 40]
})
result = data.groupby('Category').agg({'Value': ['sum', 'mean']})
print(result)
💡 Hint
Remember that sum adds all values per group and mean calculates the average.
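Explanation
groupby('Category') splits the rows into groups A, B, and C, and agg applies sum and mean to 'Value' within each group. For 'A', the sum is 10 + 20 = 30 and the mean is 15.0. For 'B', the sum is 10 + 30 = 40 and the mean is 20.0. 'C' has a single value, 40, so its sum is 40 and its mean is 40.0. Because a list of functions is passed, the result has two-level column labels: 'Value' on top, with 'sum' and 'mean' beneath it.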
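As a check on the arithmetic above, the following sketch reruns the snippet and reads the two result columns back one at a time. Selecting columns with tuple keys such as ('Value', 'sum') is an illustration added here and is not part of the original challenge.

```python
import pandas as pd

data = pd.DataFrame({
    'Category': ['A', 'A', 'B', 'B', 'C'],
    'Value': [10, 20, 10, 30, 40],
})

# Passing a list of functions yields two-level column labels:
# ('Value', 'sum') and ('Value', 'mean').
result = data.groupby('Category').agg({'Value': ['sum', 'mean']})

# One entry per group, in the order A, B, C.
sums = result[('Value', 'sum')].tolist()
means = result[('Value', 'mean')].tolist()

print(sums)   # [30, 40, 40]
print(means)  # [15.0, 20.0, 40.0]
```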
❓ Data Output
Difficulty: intermediate
Number of groups after filtering and aggregation
Given this DataFrame, how many groups remain after grouping by 'Type' and keeping only the groups whose 'Score' sum is greater than 50?
Pandas
import pandas as pd

data = pd.DataFrame({
    'Type': ['X', 'X', 'Y', 'Y', 'Z', 'Z'],
    'Score': [30, 25, 10, 15, 40, 20]
})
filtered = data.groupby('Type').filter(lambda x: x['Score'].sum() > 50)
groups = filtered.groupby('Type').ngroups
print(groups)
💡 Hint
Calculate sum of 'Score' per 'Type' and check which sums are greater than 50.
Explanation
Sum of 'Score' per 'Type': X = 30 + 25 = 55, Y = 10 + 15 = 25, Z = 40 + 20 = 60. Only X and Z have a sum greater than 50, so 2 groups remain after filtering.
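The per-group sums can be inspected directly before filtering. The sketch below is a minimal illustration of that check; the names totals, totals_dict, and surviving are introduced here and do not appear in the original snippet.

```python
import pandas as pd

data = pd.DataFrame({
    'Type': ['X', 'X', 'Y', 'Y', 'Z', 'Z'],
    'Score': [30, 25, 10, 15, 40, 20],
})

# Per-group totals that the filter condition is tested against.
totals = data.groupby('Type')['Score'].sum()
totals_dict = {key: int(value) for key, value in totals.items()}
print(totals_dict)  # {'X': 55, 'Y': 25, 'Z': 60}

# filter() keeps every row of each group for which the lambda returns True.
filtered = data.groupby('Type').filter(lambda g: g['Score'].sum() > 50)
surviving = sorted(filtered['Type'].unique().tolist())
print(surviving)  # ['X', 'Z']

groups = filtered.groupby('Type').ngroups
print(groups)  # 2
```

Note that filter() returns the original rows of the surviving groups, not one aggregated row per group, which is why the snippet groups a second time to count them.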
🔧 Debug
Difficulty: advanced
Identify the error in aggregation code
What error does this code raise when trying to aggregate data?
Pandas
import pandas as pd

data = pd.DataFrame({
    'Group': ['A', 'A', 'B'],
    'Value': [1, 2, 3]
})
result = data.groupby('Group').agg({'Value': 'sum', 'NonExistent': 'mean'})
print(result)
💡 Hint
Check whether every column named in the aggregation dictionary exists in the DataFrame.
Explanation
The aggregation dictionary refers to a 'NonExistent' column that is not in the DataFrame, so agg raises a KeyError.
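The sketch below shows one way to spot the problem before aggregating and one possible fix. It assumes a recent pandas version, where the failing call raises KeyError. The names spec, missing, and valid_spec are hypothetical helpers introduced for this illustration, and dropping the missing column is only one option (correcting the column name is another).

```python
import pandas as pd

data = pd.DataFrame({'Group': ['A', 'A', 'B'], 'Value': [1, 2, 3]})
spec = {'Value': 'sum', 'NonExistent': 'mean'}

# Columns named in the spec but absent from the DataFrame.
missing = [col for col in spec if col not in data.columns]
print(missing)  # ['NonExistent']

# The original call fails; assumption: recent pandas versions raise KeyError.
try:
    data.groupby('Group').agg(spec)
    error_name = None
except KeyError as exc:
    error_name = type(exc).__name__
print(error_name)

# One possible fix: aggregate only the columns that actually exist.
valid_spec = {col: func for col, func in spec.items() if col in data.columns}
result = data.groupby('Group').agg(valid_spec)
print(result['Value'].tolist())  # [3, 3]
```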
❓ Visualization
Difficulty: advanced
Correct plot for aggregated data
Which option shows the correct bar plot code to visualize the sum of 'Sales' per 'Region' from this DataFrame?
Pandas
import pandas as pd
import matplotlib.pyplot as plt

data = pd.DataFrame({
    'Region': ['North', 'South', 'East', 'West', 'North', 'South'],
    'Sales': [100, 150, 200, 130, 120, 170]
})
agg_data = data.groupby('Region')['Sales'].sum()
💡 Hint
Remember that in a bar plot the x-axis holds the categories and the y-axis holds the values.
Explanation
agg_data is a Series whose index holds the 'Region' labels and whose values are the summed 'Sales'. plt.bar takes the categories (the index) as x and the values as height.
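The answer options are not reproduced here, so the following is only a sketch of a bar plot that matches the explanation, not necessarily the code in the correct option. The headless "Agg" backend, the axis labels, and the title are assumptions added for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so no display is needed
import matplotlib.pyplot as plt
import pandas as pd

data = pd.DataFrame({
    'Region': ['North', 'South', 'East', 'West', 'North', 'South'],
    'Sales': [100, 150, 200, 130, 120, 170],
})

# groupby sorts the group keys, so the index is East, North, South, West.
agg_data = data.groupby('Region')['Sales'].sum()

fig, ax = plt.subplots()
# x = categories (the Series index), height = summed values.
bars = ax.bar(agg_data.index.tolist(), agg_data.values)
ax.set_xlabel('Region')          # illustrative label
ax.set_ylabel('Total Sales')     # illustrative label
ax.set_title('Sales per Region') # illustrative title

heights = [int(bar.get_height()) for bar in bars]
print(agg_data.index.tolist())  # ['East', 'North', 'South', 'West']
print(heights)                  # [200, 220, 320, 130]
```

In an interactive session, plt.show() would display the figure, and fig.savefig(...) would write it to a file.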
🚀 Application
Difficulty: expert
Calculate weighted average after aggregation
Given this DataFrame, which code correctly calculates the weighted average 'Score' per 'Class' using 'Weight' as weights?
Pandas
import pandas as pd

data = pd.DataFrame({
    'Class': ['X', 'X', 'Y', 'Y', 'Y'],
    'Score': [80, 90, 70, 60, 75],
    'Weight': [1, 3, 2, 1, 2]
})
💡 Hint
Weighted average is sum of (value * weight) divided by sum of weights.
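Explanation
Option A correctly computes the weighted average per group: it multiplies 'Score' by 'Weight' for each row, sums the products within each group, then divides by the sum of the weights in that group. For 'X' this gives (80×1 + 90×3) / (1 + 3) = 87.5, and for 'Y' it gives (70×2 + 60×1 + 75×2) / (2 + 1 + 2) = 70.0.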
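The answer options are not reproduced here, so the following is only a sketch of the computation the explanation describes, not necessarily the code in Option A. The names weighted_sum, weight_total, and weighted_avg are introduced for this illustration.

```python
import pandas as pd

data = pd.DataFrame({
    'Class': ['X', 'X', 'Y', 'Y', 'Y'],
    'Score': [80, 90, 70, 60, 75],
    'Weight': [1, 3, 2, 1, 2],
})

# Row-wise products of score and weight, summed within each class.
weighted_sum = (data['Score'] * data['Weight']).groupby(data['Class']).sum()

# Total weight within each class.
weight_total = data.groupby('Class')['Weight'].sum()

# Both Series are indexed by 'Class', so the division aligns per group.
weighted_avg = weighted_sum / weight_total

print(weighted_avg.to_dict())  # {'X': 87.5, 'Y': 70.0}
```

Compare this with the unweighted mean for 'X', (80 + 90) / 2 = 85.0: the weight of 3 on the score of 90 pulls the weighted result up to 87.5.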