Challenge - 5 Problems
Master of group-level filtering
❓ Predict Output
intermediate
Output of group filtering with filter()
What is the output DataFrame after applying the filter to keep groups with sum of values greater than 10?
Pandas
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'A', 'B', 'B', 'C', 'C'],
    'Value': [4, 8, 3, 2, 7, 6]
})

# Keep only the groups whose 'Value' sum exceeds 10
filtered = df.groupby('Category').filter(lambda x: x['Value'].sum() > 10)
print(filtered)
💡 Hint
Sum the 'Value' column for each group and keep groups where sum > 10.
Explanation
Groups A and C have sums 12 and 13 respectively, which are greater than 10, so their rows are kept. Group B sum is 5, so it is removed.
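The reasoning can be checked by computing the per-group sums directly. The sketch below reuses the challenge's data; the names group_sums, mask, and same_rows are illustrative and not part of the original exercise. It also shows that a boolean mask built with transform('sum') selects the same rows as filter().

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'A', 'B', 'B', 'C', 'C'],
    'Value': [4, 8, 3, 2, 7, 6],
})

# Sum of 'Value' per category: A -> 12, B -> 5, C -> 13
group_sums = df.groupby('Category')['Value'].sum()

# filter() keeps every row of each group for which the function returns True
filtered = df.groupby('Category').filter(lambda x: x['Value'].sum() > 10)

# Equivalent selection: broadcast each group's sum back onto its rows
mask = df.groupby('Category')['Value'].transform('sum') > 10
same_rows = df[mask]

print(group_sums)
print(filtered)
```

The filtered result keeps the original row labels (0, 1, 4, 5) rather than renumbering them.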
❓ Data Output
intermediate
Number of rows after group-level filter
After filtering groups where the mean of 'Score' is at least 75, how many rows remain in the DataFrame?
Pandas
import pandas as pd

df = pd.DataFrame({
    'Team': ['X', 'X', 'Y', 'Y', 'Z', 'Z', 'Z'],
    'Score': [80, 70, 60, 90, 75, 85, 65]
})

# Keep only the teams whose mean 'Score' is at least 75
filtered = df.groupby('Team').filter(lambda g: g['Score'].mean() >= 75)
print(len(filtered))
💡 Hint
Calculate mean score per team and count rows of teams meeting the condition.
Explanation
Team X mean is (80+70)/2 = 75, Team Y mean is (60+90)/2 = 75, and Team Z mean is (75+85+65)/3 = 75. Every team meets the condition mean >= 75, so no group is dropped and all 7 rows remain. The correct answer is 7.
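The arithmetic can be confirmed with a short check. This is a minimal sketch on the challenge's data; team_means, qualifying, and strict are illustrative names introduced here. The last line shows how sensitive the result is to the comparison operator.

```python
import pandas as pd

df = pd.DataFrame({
    'Team': ['X', 'X', 'Y', 'Y', 'Z', 'Z', 'Z'],
    'Score': [80, 70, 60, 90, 75, 85, 65],
})

# Mean score per team; each one works out to exactly 75.0
team_means = df.groupby('Team')['Score'].mean()

# Teams that satisfy the >= 75 condition
qualifying = team_means[team_means >= 75].index.tolist()

# Inclusive comparison: every team passes, so every row is kept
filtered = df.groupby('Team').filter(lambda g: g['Score'].mean() >= 75)

# Strict comparison: no team passes, so the result is empty
strict = df.groupby('Team').filter(lambda g: g['Score'].mean() > 75)

print(team_means)
print(len(filtered), len(strict))
```

Because every mean sits exactly on the threshold, using > instead of >= would change the answer from 7 rows to 0.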
🔧 Debug
advanced
Identify the error in group filtering code
What error will this code raise when run?
Pandas
import pandas as pd

df = pd.DataFrame({
    'Group': ['G1', 'G1', 'G2', 'G2'],
    'Value': [10, 20, 5, 15]
})

filtered = df.groupby('Group').filter(lambda x: x['Value'].sum > 20)
print(filtered)
💡 Hint
Check if sum is called as a method or accessed as an attribute.
Explanation
The code writes x['Value'].sum without parentheses, so the expression evaluates to the bound method itself rather than to a number. Comparing that method object with 20 raises TypeError: '>' not supported between instances of 'method' and 'int'.
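The sketch below reproduces the failure and then applies the fix of calling sum(). It uses the challenge's data; error_name is a hypothetical variable added here only to record which exception was raised.

```python
import pandas as pd

df = pd.DataFrame({
    'Group': ['G1', 'G1', 'G2', 'G2'],
    'Value': [10, 20, 5, 15],
})

# Buggy version: .sum without parentheses is a bound method, not a number
try:
    df.groupby('Group').filter(lambda x: x['Value'].sum > 20)
    error_name = None
except TypeError as exc:
    error_name = type(exc).__name__
    print(error_name, exc)

# Fixed version: call the method so the comparison sees a number
filtered = df.groupby('Group').filter(lambda x: x['Value'].sum() > 20)
print(filtered)
```

With the fix, G1 (sum 30) is kept and G2 (sum 20) is dropped, because 20 is not strictly greater than 20.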
🚀 Application
advanced
Filter groups with at least 3 rows and max value > 50
Which code correctly filters groups in DataFrame df where each group has at least 3 rows and the maximum 'Score' is greater than 50?
Pandas
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'C', 'C'],
    'Score': [40, 55, 60, 30, 45, 70, 65, 50, 40]
})
💡 Hint
Both conditions must be true: group size at least 3 and max score greater than 50.
Explanation
The correct option combines len(g) >= 3 and a maximum 'Score' > 50 with 'and'. The other options are wrong for different reasons: one joins the conditions with 'or', one uses > 3 instead of >= 3 together with >= 50 instead of > 50, and one uses >= 50 instead of > 50, which changes the condition.
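The answer options are not reproduced here, so the following is only a sketch of one filter that satisfies both stated conditions, not necessarily the exact text of the correct option.

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'C', 'C'],
    'Score': [40, 55, 60, 30, 45, 70, 65, 50, 40],
})

# Keep a group only if it has at least 3 rows AND its max 'Score' exceeds 50
filtered = df.groupby('Category').filter(
    lambda g: len(g) >= 3 and g['Score'].max() > 50
)

print(filtered)
```

On this data, A (3 rows, max 60) and C (4 rows, max 70) pass both conditions, while B has only 2 rows and is dropped, leaving 7 rows.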
🧠 Conceptual
expert
Understanding filter() behavior on empty groups
If a group in a pandas groupby object is empty, what will be the behavior of filter() when the filtering function returns True for that group?
💡 Hint
Consider if empty groups have any rows to include.
Explanation
An empty group has no rows, so even if the filter function returns True for it, there is nothing to include. An empty group therefore never contributes rows to the filtered result.