Pandasdata~10 mins

filter() for group-level filtering in Pandas - Step-by-Step Execution

Choose your learning style9 modes available

Learn Why Deep Visual Try Challenge Project Recall Time

Concept Flow - filter() for group-level filtering

Start with DataFrame

↓

Group data by key

↓

Apply filter function to each group

↓

Keep groups where filter returns True

↓

Combine filtered groups into new DataFrame

↓

Result

We start with a DataFrame, group it by a key, apply a filter function to each group, keep only groups that pass the filter, and combine them into a new DataFrame.

Execution Sample

Pandas

import pandas as pd

df = pd.DataFrame({
    'Team': ['A', 'A', 'B', 'B', 'C', 'C'],
    'Points': [10, 15, 5, 7, 20, 25]
})

filtered = df.groupby('Team').filter(lambda x: x['Points'].mean() > 10)

This code groups the DataFrame by 'Team' and keeps only teams whose average 'Points' is greater than 10.

Execution Table

Step	Group	Group Points	Mean Points	Filter Result	Action	Output Rows
1	A	[10, 15]	12.5	True	Keep group	Rows with Team A
2	B	[5, 7]	6.0	False	Drop group	No rows from Team B
3	C	[20, 25]	22.5	True	Keep group	Rows with Team C
4	Combine	-	-	-	Combine kept groups	Rows from Team A and C

💡 All groups processed; groups B dropped because mean points <= 10

Variable Tracker

Variable	Start	After Step 1	After Step 2	After Step 3	Final
filtered	empty	Rows with Team A	Rows with Team A	Rows with Team A and C	Rows with Team A and C

Key Moments - 2 Insights

Why does the group 'B' disappear from the output?

Does filter() keep individual rows or whole groups?

Visual Quiz - 3 Questions

Test your understanding

Look at the execution_table, what is the mean points for group 'C' at step 3?

A20.0

B22.5

C25.0

D15.0

Concept Snapshot

pandas.DataFrame.groupby('key').filter(func)
- Groups data by 'key'
- Applies func to each group
- Keeps groups where func returns True
- Returns combined filtered DataFrame
- Useful for group-level filtering based on aggregate conditions

Full Transcript

We start with a DataFrame containing teams and their points. We group the data by the 'Team' column. For each group, we calculate the average points. We use filter() to keep only groups where the average points are greater than 10. Group A has an average of 12.5, so it is kept. Group B has an average of 6.0, so it is dropped. Group C has an average of 22.5, so it is kept. The final output combines rows from groups A and C only.