Challenge - 5 Problems
GroupBy Transform Master
❓ Predict Output
Difficulty: intermediate
Output of group normalization using transform
What is the output of this code that normalizes values within each group by subtracting the group mean?
Pandas
import pandas as pd

df = pd.DataFrame({
    'group': ['A', 'A', 'B', 'B', 'B'],
    'value': [10, 20, 30, 40, 50]
})
df['norm'] = df.groupby('group')['value'].transform(lambda x: x - x.mean())
print(df)
💡 Hint
Think about how subtracting the mean affects each value within its group.
Explanation
The transform subtracts the mean of each group from each value, so values in group A (10,20) become (10-15, 20-15) = (-5,5), and in group B (30,40,50) become (30-40, 40-40, 50-40) = (-10,0,10).
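The arithmetic above can be checked with a minimal sketch that uses the same data as the question. It passes the built-in aggregation name "mean" to transform instead of a lambda; the variable name group_mean is an assumption introduced here for illustration, and the result should match the lambda version.

```python
import pandas as pd

df = pd.DataFrame({
    "group": ["A", "A", "B", "B", "B"],
    "value": [10, 20, 30, 40, 50],
})

# Broadcast each group's mean back onto every row of that group.
# group_mean is an illustrative name, not part of the original question.
group_mean = df.groupby("group")["value"].transform("mean")

# Subtract the broadcast mean row by row.
df["norm"] = df["value"] - group_mean

print(df["norm"].tolist())  # [-5.0, 5.0, -10.0, 0.0, 10.0]
```

Because the mean is a float, the norm column comes out as floats even though the input values are integers.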
❓ Data Output
Difficulty: intermediate
Number of normalized values above zero per group
After normalizing values within groups by subtracting the group mean, how many values in group 'B' are greater than zero?
Pandas
import pandas as pd

df = pd.DataFrame({
    'group': ['A', 'A', 'B', 'B', 'B'],
    'value': [5, 15, 10, 20, 30]
})
df['norm'] = df.groupby('group')['value'].transform(lambda x: x - x.mean())
count = df[(df['group'] == 'B') & (df['norm'] > 0)].shape[0]
print(count)
💡 Hint
Calculate the mean of group B and count values above it.
Explanation
Group B values are 10, 20, 30 with mean 20, so the normalized values are 10-20 = -10, 20-20 = 0, and 30-20 = 10. Zero is not greater than zero, so only the value 30 (normalized to 10) passes the filter. The correct answer is 1.
🔧 Debug
Difficulty: advanced
Identify the error in group normalization code
What error does this code raise when trying to normalize values by group?
Pandas
import pandas as pd

df = pd.DataFrame({
    'group': ['X', 'X', 'Y'],
    'value': [1, 2, 3]
})
df['norm'] = df.groupby('group')['value'].transform(lambda x: x / x.mean())
print(df)
💡 Hint
Check if dividing by mean is valid for these values.
Explanation
Dividing each value by the mean of its group is valid and does not raise an error. The group means are 1.5 for X and 3 for Y, both nonzero, so the code prints norm values of about 0.667 and 1.333 for X and 1.0 for Y.
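As a hedged illustration of the hint, the sketch below reruns the question's code and then tries a second, hypothetical DataFrame (zero_mean, not part of the original quiz) whose single group has a mean of zero. Under pandas' usual floating-point semantics, that case should produce infinities rather than raise an exception.

```python
import pandas as pd

# Data from the question: both group means are nonzero.
df = pd.DataFrame({"group": ["X", "X", "Y"], "value": [1, 2, 3]})
df["norm"] = df.groupby("group")["value"].transform(lambda x: x / x.mean())
print(df)

# Hypothetical edge case (an assumption for illustration): the group mean is 0.
zero_mean = pd.DataFrame({"group": ["Z", "Z"], "value": [-1, 1]})
zero_mean["norm"] = zero_mean.groupby("group")["value"].transform(
    lambda x: x / x.mean()
)
# Division by a zero mean yields -inf and inf here; no exception is raised.
print(zero_mean)
```

If infinite values are a concern in real data, checking for zero group means before dividing is one option.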
🚀 Application
Difficulty: advanced
Apply group normalization with standard deviation scaling
Which code snippet correctly normalizes 'score' within each 'team' by subtracting the mean and dividing by the standard deviation?
💡 Hint
Remember transform keeps the original index and length.
Explanation
Option C, the correct choice, uses transform with a lambda that subtracts the mean and divides by the standard deviation per group, preserving the original DataFrame shape. The other options each fail in a different way: one returns a Series indexed by group only, causing misalignment; one tries to subtract and divide Series of different lengths without alignment; and one divides by the standard deviation but does not subtract the mean, so the result is not normalized.
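The answer options themselves are not reproduced here, so the following is only a sketch of the approach the explanation describes, not the quiz's own code. The column names 'team' and 'score' come from the question; the data values and the zscore column name are assumptions made for illustration. Note that Series.std uses the sample standard deviation (ddof=1) by default.

```python
import pandas as pd

# Hypothetical data; only the column names 'team' and 'score' are from the question.
df = pd.DataFrame({
    "team": ["red", "red", "red", "blue", "blue", "blue"],
    "score": [10.0, 20.0, 30.0, 5.0, 5.0, 20.0],
})

# transform returns one value per original row, aligned to df's index.
df["zscore"] = df.groupby("team")["score"].transform(
    lambda s: (s - s.mean()) / s.std()
)

# A plain aggregation returns one value per team instead, indexed by team name.
per_team_mean = df.groupby("team")["score"].mean()

print(df)
print(len(df["zscore"]), len(per_team_mean))  # 6 2
```

The length mismatch in the last line is the misalignment that the incorrect options run into when their result is assigned back as a column.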
🧠 Conceptual
Difficulty: expert
Why use transform instead of apply for group normalization?
What is the main reason to use groupby with transform instead of apply when normalizing data within groups?
💡 Hint
Think about the shape and alignment of the output from each method.
Explanation
The transform method returns a Series with the same index as the original DataFrame, so it can be assigned directly as a new column. The apply method can return a reduced or differently indexed result (for example, one value per group), which may not align for direct assignment.
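The index difference can be seen in a small sketch. The data is borrowed from the first problem, and the names by_group, aligned, and reduced are assumptions introduced for illustration. Here apply is given a function that returns a scalar, which is one common case where its result is indexed by group key.

```python
import pandas as pd

df = pd.DataFrame({
    "group": ["A", "A", "B", "B", "B"],
    "value": [10, 20, 30, 40, 50],
})

by_group = df.groupby("group")["value"]

# transform: one value per original row, same index as df.
aligned = by_group.transform("mean")

# apply with a scalar-returning function: one value per group.
reduced = by_group.apply(lambda x: x.mean())

print(aligned.index.tolist())  # [0, 1, 2, 3, 4]
print(reduced.index.tolist())  # ['A', 'B']
print(aligned.index.equals(df.index))  # True
print(reduced.index.equals(df.index))  # False
```

Since aligned shares df's index, writing df['group_mean'] = aligned lines up row by row, while reduced would first need to be mapped back onto the rows.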