
filter() for group-level filtering in Pandas - Step-by-Step Execution

Concept Flow - filter() for group-level filtering
Start with DataFrame
Group data by key
Apply filter function to each group
Keep groups where filter returns True
Combine filtered groups into new DataFrame
Result
We start with a DataFrame, group it by a key, apply a filter function to each group, keep only groups that pass the filter, and combine them into a new DataFrame.
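The flow above can be sketched by hand, without filter(), to make each step visible. This is only an illustration under assumptions: it uses the same small Team/Points data as the sample in this walkthrough, and `keep_group` is a hypothetical helper name, not part of pandas.

```python
import pandas as pd

# Start with a DataFrame (same data as the walkthrough's sample)
df = pd.DataFrame({
    'Team': ['A', 'A', 'B', 'B', 'C', 'C'],
    'Points': [10, 15, 5, 7, 20, 25]
})

def keep_group(group):
    # Hypothetical helper: the group-level condition, returns one True/False
    return group['Points'].mean() > 10

# Group by key, apply the condition to each group, keep the groups that pass
kept = [group for _, group in df.groupby('Team') if keep_group(group)]

# Combine the kept groups into a new DataFrame, restoring the original row order
manual = pd.concat(kept).sort_index()
```

filter() performs these steps in a single call, so the manual loop is mainly useful for understanding or debugging.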
Execution Sample
import pandas as pd

df = pd.DataFrame({
    'Team': ['A', 'A', 'B', 'B', 'C', 'C'],
    'Points': [10, 15, 5, 7, 20, 25]
})

filtered = df.groupby('Team').filter(lambda x: x['Points'].mean() > 10)
This code groups the DataFrame by 'Team' and keeps only teams whose average 'Points' is greater than 10.
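To check the result, one can print `filtered`. The sketch below repeats the sample so it runs on its own; the output shown in the comments assumes default pandas display settings.

```python
import pandas as pd

df = pd.DataFrame({
    'Team': ['A', 'A', 'B', 'B', 'C', 'C'],
    'Points': [10, 15, 5, 7, 20, 25]
})

filtered = df.groupby('Team').filter(lambda x: x['Points'].mean() > 10)

print(filtered)
#   Team  Points
# 0    A      10
# 1    A      15
# 4    C      20
# 5    C      25
```

Note that the surviving rows keep their original index labels (0, 1, 4, 5) and their original order.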
Execution Table
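| Step | Group | Group Points | Mean Points | Filter Result | Action | Output Rows |
|------|-------|--------------|-------------|---------------|--------|-------------|
| 1 | A | [10, 15] | 12.5 | True | Keep group | Rows with Team A |
| 2 | B | [5, 7] | 6.0 | False | Drop group | No rows from Team B |
| 3 | C | [20, 25] | 22.5 | True | Keep group | Rows with Team C |
| 4 | Combine | - | - | - | Combine kept groups | Rows from Team A and C |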
💡 All groups processed; group B is dropped because its mean points (6.0) is not greater than 10.
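The 'Mean Points' and 'Filter Result' columns of the execution table can be reproduced directly. This is a small sketch for inspection only; the names `means` and `passes` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Team': ['A', 'A', 'B', 'B', 'C', 'C'],
    'Points': [10, 15, 5, 7, 20, 25]
})

# One mean per group: A -> 12.5, B -> 6.0, C -> 22.5
means = df.groupby('Team')['Points'].mean()

# The condition evaluated per group: A -> True, B -> False, C -> True
passes = means > 10

print(means)
print(passes)
```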
Variable Tracker
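| Variable | Start | After Step 1 | After Step 2 | After Step 3 | Final |
|----------|-------|--------------|--------------|--------------|-------|
| filtered | empty | Rows with Team A | Rows with Team A | Rows with Team A and C | Rows with Team A and C |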
Key Moments - 2 Insights
Why does the group 'B' disappear from the output?
Row 2 of the execution table shows that the mean points for group B is 6.0, which is not greater than 10, so the filter function returns False and every row of group B is dropped.
Does filter() keep individual rows or whole groups?
filter() keeps or drops entire groups. The function is evaluated once per group and returns a single True or False, so a group's rows are either all kept or all dropped, as rows 1-3 of the execution table show.
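The difference between group-level and row-level filtering is easiest to see side by side. The sketch below is an illustration using the same sample data; `row_level` and `group_level` are names chosen here.

```python
import pandas as pd

df = pd.DataFrame({
    'Team': ['A', 'A', 'B', 'B', 'C', 'C'],
    'Points': [10, 15, 5, 7, 20, 25]
})

# Row-level: each row is tested on its own Points value
row_level = df[df['Points'] > 10]

# Group-level: each team is tested on its mean, and all of its rows follow
group_level = df.groupby('Team').filter(lambda g: g['Points'].mean() > 10)

print(row_level.index.tolist())    # [1, 4, 5]
print(group_level.index.tolist())  # [0, 1, 4, 5]
```

Row 0 (Team A, 10 points) fails the row-level test but survives the group-level one, because Team A's mean of 12.5 passes.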
Visual Quiz - 3 Questions
Test your understanding
Look at the execution table. What is the mean points for group 'C' at step 3?
A. 20.0
B. 22.5
C. 25.0
D. 15.0
💡 Hint
Check the 'Mean Points' column in execution_table row 3.
At which step does the filter decide to drop a group?
A. Step 2
B. Step 3
C. Step 1
D. Step 4
💡 Hint
Look for 'Drop group' in the 'Action' column of execution_table.
If the filter condition changed to mean points > 5, which groups would be kept?
A. Only group A
B. Groups A and B
C. Groups A, B, and C
D. Only group C
💡 Hint
Check mean points values in execution_table rows 1-3 and compare to new condition.
Concept Snapshot
df.groupby('key').filter(func)
- Groups data by 'key'
- Applies func to each group
- Keeps groups where func returns True
- Returns combined filtered DataFrame
- Useful for group-level filtering based on aggregate conditions
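A common alternative, not used in this walkthrough, is to broadcast the group aggregate back to every row with transform() and use it as a boolean mask. The sketch below assumes the same sample data and condition; for this example it selects the same rows as filter().

```python
import pandas as pd

df = pd.DataFrame({
    'Team': ['A', 'A', 'B', 'B', 'C', 'C'],
    'Points': [10, 15, 5, 7, 20, 25]
})

# filter(): the function returns one True/False per group
filtered = df.groupby('Team').filter(lambda g: g['Points'].mean() > 10)

# transform('mean'): each row receives its group's mean, giving a row-aligned mask
mask = df.groupby('Team')['Points'].transform('mean') > 10
via_transform = df[mask]

print(via_transform.equals(filtered))
```

The transform form avoids calling a Python function per group, which may matter when there are many groups; filter() reads more directly when the condition is easier to state per group.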
Full Transcript
We start with a DataFrame containing teams and their points. We group the data by the 'Team' column. For each group, we calculate the average points. We use filter() to keep only groups where the average points are greater than 10. Group A has an average of 12.5, so it is kept. Group B has an average of 6.0, so it is dropped. Group C has an average of 22.5, so it is kept. The final output combines rows from groups A and C only.