
filter() for group-level filtering in Data Analysis Python - Step-by-Step Execution

Concept Flow - filter() for group-level filtering
Start with DataFrame
Group data by key
Apply filter function to each group
Keep groups where filter returns True
Combine filtered groups into new DataFrame
Result: Filtered grouped data
We start with data, group it by a key, then apply a filter function to each group. Only groups passing the filter stay in the result.
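The flow above can be spelled out by hand. The following is only a sketch of the idea, not how pandas implements filter() internally; the helper name keep_group is made up for illustration, and the sample data matches the Team/Points example used on this page.

```python
import pandas as pd

df = pd.DataFrame({
    'Team': ['A', 'A', 'B', 'B', 'C'],
    'Points': [10, 15, 5, 7, 20]
})

# Hypothetical helper: the group-level condition (True keeps the group).
def keep_group(g):
    return g['Points'].sum() > 20

# Steps 2-4: group by key, test each group, keep the ones that pass.
kept = [g for _, g in df.groupby('Team') if keep_group(g)]

# Step 5: combine the kept groups, restoring the original row order.
# Note: pd.concat raises if no group passes, so this sketch assumes at least one does.
manual = pd.concat(kept).sort_index()

# The built-in call does the same job in one line.
builtin = df.groupby('Team').filter(keep_group)

print(manual.equals(builtin))
```

In practice the one-line filter() call is preferable; the manual loop is here only to make each step of the flow visible.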
Execution Sample
import pandas as pd

df = pd.DataFrame({
    'Team': ['A', 'A', 'B', 'B', 'C'],
    'Points': [10, 15, 5, 7, 20]
})

filtered = df.groupby('Team').filter(lambda g: g['Points'].sum() > 20)
print(filtered)
This code groups data by 'Team' and keeps only teams with total Points over 20.
Execution Table
Step | Group | Group Points | Sum Points | Filter Condition (sum > 20) | Keep Group?
1 | A | [10, 15] | 25 | 25 > 20 is True | Yes
2 | B | [5, 7] | 12 | 12 > 20 is False | No
3 | C | [20] | 20 | 20 > 20 is False | No
4 | Combine kept groups | - | - | - | Only group A kept
💡 All groups are checked; only group A meets the condition and is kept.
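The 'Sum Points' and 'Keep Group?' columns can be checked directly. This is a small sketch on the same sample data; the variable names sums and keep are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Team': ['A', 'A', 'B', 'B', 'C'],
    'Points': [10, 15, 5, 7, 20]
})

# Total points per team (the 'Sum Points' column).
sums = df.groupby('Team')['Points'].sum()

# The filter condition evaluated once per group (the 'Keep Group?' column).
keep = sums > 20

print(pd.DataFrame({'Sum Points': sums, 'Keep Group?': keep}))
```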
Variable Tracker
Variable | Start | After Group A | After Group B | After Group C | Final
filtered | empty | contains rows of Team A | no change | no change | DataFrame with Team A rows only
Key Moments - 2 Insights
Why does group C get removed even though it has 20 points?
Because the filter condition is sum > 20, not >= 20. Group C's sum is exactly 20, so it fails the condition (see execution_table row 3).
Does filter() keep individual rows or whole groups?
filter() keeps or removes entire groups based on the condition applied to the group as a whole, not individual rows (see execution_table rows 1-3).
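One way to see that the decision is made per group and then applied to every row of that group is to build the same result with a boolean mask. This is a sketch of an equivalent approach using transform('sum'), shown for comparison on the same sample data.

```python
import pandas as pd

df = pd.DataFrame({
    'Team': ['A', 'A', 'B', 'B', 'C'],
    'Points': [10, 15, 5, 7, 20]
})

# transform('sum') repeats each team's total on every row of that team,
# so the comparison gives one True/False per row, identical within a group.
mask = df.groupby('Team')['Points'].transform('sum') > 20
via_mask = df[mask]

# The group-level filter from the example above.
via_filter = df.groupby('Team').filter(lambda g: g['Points'].sum() > 20)

print(via_mask.equals(via_filter))
```

Both rows of Team A survive together, and both rows of Team B are dropped together; no row is judged on its own Points value.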
Visual Quiz - 3 Questions
Test your understanding
According to the execution table, what is the sum of points for group B at step 2?
A. 7
B. 12
C. 5
D. 20
💡 Hint
Check the 'Sum Points' column in execution_table row 2.
At which step does the filter condition evaluate to False for group C?
A. Step 1
B. Step 2
C. Step 3
D. Step 4
💡 Hint
Look at the 'Filter Condition' column for group C in execution_table.
If the filter condition changed to sum >= 20, which groups would be kept?
A. Groups A and C
B. Groups B and C
C. Only group A
D. All groups
💡 Hint
Compare sum points and condition in execution_table rows 1 and 3.
Concept Snapshot
filter() for group-level filtering:
- Use df.groupby('key').filter(func)
- func gets each group DataFrame
- Return True to keep group, False to drop
- Keeps whole groups, not rows
- Useful to filter groups by aggregate stats
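The condition does not have to be a sum. As a sketch on the same sample data, here are two other group-level statistics; the thresholds are arbitrary values picked for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Team': ['A', 'A', 'B', 'B', 'C'],
    'Points': [10, 15, 5, 7, 20]
})

# Keep teams that have at least two rows (group size).
by_size = df.groupby('Team').filter(lambda g: len(g) >= 2)

# Keep teams whose average Points is above 10 (group mean).
by_mean = df.groupby('Team').filter(lambda g: g['Points'].mean() > 10)

print(by_size)
print(by_mean)
```

With this data, the size condition keeps Teams A and B, while the mean condition keeps Teams A and C.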
Full Transcript
We start with a DataFrame and group it by a key column. Then we apply a filter function to each group. This function checks a condition on the group, such as whether the sum of a column is greater than some value. Groups that pass the condition are kept whole; the others are removed. For example, grouping by 'Team' and filtering for teams with total points over 20 keeps only those teams. The filter function returns True or False per group, not per row. This lets us keep or remove entire groups based on group-level statistics.