
filter() for group-level filtering in Pandas

Introduction

After a groupby(), we use filter() to keep or drop whole groups based on a rule that is evaluated once per group. It helps us focus on the groups that matter.

You want to keep only groups with more than 5 members in a sales dataset.
You want to analyze only the classes whose average score is above a threshold.
You want to remove product categories with low total sales from your report.
You want to keep only months where total expenses exceeded a budget.
Syntax
Pandas
grouped_data.filter(function)

# where grouped_data is a DataFrameGroupBy object
# function takes a group (DataFrame) and returns a single True or False

The function receives each group as a small DataFrame.

If the function returns True, the whole group is kept; if False, it is removed.
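To make the two statements above concrete, here is a minimal sketch that records what the function receives for each group. The data, the `seen` list, and the `has_three_rows` helper are invented for this illustration.

```python
import pandas as pd

# Hypothetical data, invented for this illustration.
df = pd.DataFrame({
    "Category": ["x", "x", "x", "y", "y"],
    "Value": [1, 2, 3, 10, 20],
})

seen = []  # records the type and row count of each object passed to the function

def has_three_rows(group):
    # `group` is a DataFrame holding only the rows of one Category.
    seen.append((type(group).__name__, len(group)))
    # One boolean decides whether every row of this group stays or goes.
    return len(group) >= 3

result = df.groupby("Category").filter(has_three_rows)

print(seen)    # type name and size of each group the function was given
print(result)  # only the rows of Category "x" remain
```

Category "y" has two rows, so the function returns False for it and both of its rows are dropped together.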

Examples
Keep groups where the group size is more than 3 rows.
Pandas
df.groupby('Category').filter(lambda x: len(x) > 3)
Keep teams with average score above 80.
Pandas
df.groupby('Team').filter(lambda x: x['Score'].mean() > 80)
Keep months where total sales are more than 1000.
Pandas
df.groupby('Month').filter(lambda x: x['Sales'].sum() > 1000)
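The one-liners above assume a DataFrame named df that already has the right columns. As a rough sketch, the snippet below builds a small hypothetical df (all values invented) so that each of the three filters can be run and inspected.

```python
import pandas as pd

# Hypothetical data; the column names match the one-liners above.
df = pd.DataFrame({
    "Category": ["a", "a", "a", "a", "b", "b"],
    "Team":     ["T1", "T1", "T2", "T2", "T2", "T1"],
    "Month":    ["Jan", "Jan", "Jan", "Feb", "Feb", "Feb"],
    "Score":    [90, 85, 70, 75, 80, 95],
    "Sales":    [400, 500, 300, 200, 100, 150],
})

# Category "a" has 4 rows and is kept; "b" has 2 rows and is dropped.
big_categories = df.groupby("Category").filter(lambda x: len(x) > 3)

# T1 averages 90 and is kept; T2 averages 75 and is dropped.
strong_teams = df.groupby("Team").filter(lambda x: x["Score"].mean() > 80)

# Jan totals 1200 and is kept; Feb totals 450 and is dropped.
busy_months = df.groupby("Month").filter(lambda x: x["Sales"].sum() > 1000)

print(big_categories)
print(strong_teams)
print(busy_months)
```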
Sample Program

This code keeps only teams where the average player score is >= 75. Team A averages 87.5 and Team B averages exactly 75, so their rows stay. Team C averages 62.5, so all of its rows are removed.

Pandas
import pandas as pd

data = {
    'Team': ['A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'C'],
    'Player': ['John', 'Mike', 'Anna', 'Tom', 'Sara', 'Bob', 'Liz', 'Sam', 'Eva'],
    'Score': [90, 85, 70, 75, 80, 60, 65, 70, 55]
}

df = pd.DataFrame(data)

# Group by 'Team' and keep only teams with average Score >= 75
filtered_df = df.groupby('Team').filter(lambda x: x['Score'].mean() >= 75)

print(filtered_df)
Output
  Team Player  Score
0    A   John     90
1    A   Mike     85
2    B   Anna     70
3    B    Tom     75
4    B   Sara     80
Important Notes

The filter() method returns an ordinary DataFrame (not a grouped object) containing the rows of the groups that pass the test, with their original index labels.

Inside the lambda, you can use any condition based on the group's data, as long as it evaluates to a single True or False for the group.

If no groups pass, the result is an empty DataFrame.
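The notes above can be checked with a short sketch. It rebuilds the Team and Score columns from the sample program so it runs on its own. The variable names and the thresholds (at least 3 rows, mean above 95) are assumptions chosen for illustration.

```python
import pandas as pd

# Team and Score columns from the sample program, rebuilt so this runs standalone.
df = pd.DataFrame({
    "Team": ["A", "A", "B", "B", "B", "C", "C", "C", "C"],
    "Score": [90, 85, 70, 75, 80, 60, 65, 70, 55],
})

# A condition can combine several group-level checks:
# here a team needs at least 3 rows AND a mean score of at least 75.
big_and_strong = df.groupby("Team").filter(
    lambda g: len(g) >= 3 and g["Score"].mean() >= 75
)
print(big_and_strong)

# The kept rows carry the index labels they had in df.
print(big_and_strong.index.tolist())

# A condition that no team meets gives an empty DataFrame with the same columns.
nobody = df.groupby("Team").filter(lambda g: g["Score"].mean() > 95)
print(nobody.empty, list(nobody.columns))
```

Team A is dropped by the size check, Team C by the mean check, so only Team B survives the combined condition.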

Summary

filter() helps keep or remove whole groups based on a condition.

It works on grouped data and returns only groups where the condition is True.

Use it to focus analysis on important groups easily.
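Because filter() decides per group, it cannot keep some rows of a group and drop others. For row-level selection relative to a group statistic, one common approach is transform() with a boolean mask. The sketch below contrasts the two on hypothetical class data; all names and values are invented.

```python
import pandas as pd

# Hypothetical data, invented for this comparison.
df = pd.DataFrame({
    "Class": ["X", "X", "X", "Y", "Y", "Y"],
    "Student": ["Ann", "Ben", "Cy", "Dee", "Eli", "Fay"],
    "Score": [90, 60, 75, 50, 55, 45],
})

# Group-level: keep every row of classes whose mean score is at least 70.
strong_classes = df.groupby("Class").filter(lambda g: g["Score"].mean() >= 70)

# Row-level: keep individual students who scored above their own class mean.
# transform("mean") returns one value per row, aligned with df.
class_mean = df.groupby("Class")["Score"].transform("mean")
above_class_mean = df[df["Score"] > class_mean]

print(strong_classes["Student"].tolist())
print(above_class_mean["Student"].tolist())
```

The first result contains all students of class X, including those below the class mean. The second contains students from both classes, but only those above their class mean.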