
filter() for group-level filtering in Data Analysis Python

Introduction

In pandas, calling filter() on a grouped DataFrame keeps or removes whole groups of rows based on a rule, so the analysis can focus on the groups that matter. Typical situations:

You want to keep only groups with more than 5 sales records.
You want to analyze customers who bought more than 3 different products.
You want to remove groups where the average score is below 70.
You want to keep only months where total revenue passed a target.
You want to filter groups of students who attended more than 80% of classes.
Syntax
grouped_data.filter(function)

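The function receives each group as a DataFrame (a small table holding that group's rows) and must return a single True or False.

If it returns True, every row of the group stays; if it returns False, every row of the group is removed. The result is a DataFrame made up of the original rows of the groups that passed.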
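The function does not have to be a lambda. As a minimal sketch, the snippet below uses a named function on a small made-up table (the `Category` and `Value` columns and the helper `has_at_least_three_rows` are illustrative, not part of the lesson) and records the type of object pandas hands to it.

```python
import pandas as pd

# Hypothetical data, used only for illustration
df = pd.DataFrame({
    "Category": ["x", "x", "x", "y"],
    "Value": [1, 2, 3, 4],
})

seen_types = []  # type names of the objects passed to the function

def has_at_least_three_rows(group):
    # `group` holds all rows that belong to one category
    seen_types.append(type(group).__name__)
    return len(group) >= 3  # a single True or False for the whole group

kept = df.groupby("Category").filter(has_at_least_three_rows)
print(kept)
print(set(seen_types))
```

A named function is easier to test and reuse than a lambda once the rule grows beyond a single expression.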

Examples
Keep groups where the number of rows is more than 3.
df.groupby('Category').filter(lambda x: len(x) > 3)
Keep teams with average score above 70.
df.groupby('Team').filter(lambda x: x['Score'].mean() > 70)
Keep months where total sales are more than 1000.
df.groupby('Month').filter(lambda x: x['Sales'].sum() > 1000)
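The rule can use any aggregate that reduces a group to one value. The sketch below covers the "more than 3 different products" case from the introduction, assuming a hypothetical purchase log with `Customer` and `Product` columns (both names and the data are made up for this example).

```python
import pandas as pd

# Hypothetical purchase log; column names and values are assumptions
purchases = pd.DataFrame({
    "Customer": ["ann", "ann", "ann", "ann", "bob", "bob", "bob", "bob"],
    "Product": ["pen", "ink", "pad", "cap", "pen", "pen", "ink", "pen"],
})

# ann bought 4 distinct products, bob bought only 2 (pen and ink)
frequent = purchases.groupby("Customer").filter(
    lambda g: g["Product"].nunique() > 3
)
print(frequent)
```

Note that bob has four purchase rows but only two distinct products, so counting rows with len() would give a different answer than nunique().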
Sample Program

This code creates a small table with teams and scores. It keeps only teams whose average score is above 70.

import pandas as pd

data = {'Team': ['A', 'A', 'B', 'B', 'B', 'C', 'C'],
        'Score': [80, 85, 60, 70, 65, 90, 95]}
df = pd.DataFrame(data)

# Group by 'Team' and keep only teams with average score > 70
filtered_df = df.groupby('Team').filter(lambda x: x['Score'].mean() > 70)

print(filtered_df)
Output
  Team  Score
0    A     80
1    A     85
5    C     90
6    C     95
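To see why a team is kept or dropped by the sample program, it can help to compute the per-team averages directly. This is a small check on the same data, not part of the original program; the names `team_means` and `passing` are introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Team": ["A", "A", "B", "B", "B", "C", "C"],
    "Score": [80, 85, 60, 70, 65, 90, 95],
})

# Mean score per team: the value the filter rule compares against 70
team_means = df.groupby("Team")["Score"].mean()
print(team_means)

# Teams whose mean passes the rule
passing = team_means[team_means > 70].index.tolist()
print(passing)
```

Team B averages 65, so all three of its rows are removed, while teams A (82.5) and C (92.5) keep every row.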
Important Notes

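The filter() method of a grouped object works on groups, not individual rows. It is not the same as DataFrame.filter(), which selects rows or columns by their labels.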

Make sure your function returns a single True or False for each group.
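A common slip is to return a row-by-row comparison instead of one value per group. The sketch below, on made-up data, shows this case raising a TypeError (the behavior in the pandas versions this example was written against; the exact message may vary) and one way to reduce the comparison to a single boolean.

```python
import pandas as pd

# Hypothetical data, used only for illustration
df = pd.DataFrame({
    "Team": ["A", "A", "B", "B"],
    "Score": [80, 85, 60, 70],
})

error_message = None
try:
    # Wrong: the comparison yields one boolean per row, not one per group
    df.groupby("Team").filter(lambda g: g["Score"] > 70)
except TypeError as err:
    error_message = str(err)
    print(error_message)

# Fix: reduce the row-wise result to a single boolean with .all() (or .any())
all_above = df.groupby("Team").filter(lambda g: (g["Score"] > 70).all())
print(all_above)
```

Whether .all(), .any(), or an aggregate such as .mean() is the right reduction depends on the rule you actually mean to apply.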

Use filter() when you want to keep or drop entire groups, not just rows.
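When the rule is a simple aggregate comparison, the same group-level selection can also be written as a boolean mask built with transform(). This is an alternative idiom, not something the lesson itself uses, and it is sketched here on the sample program's data; it is often faster on large tables because it avoids calling a Python function once per group.

```python
import pandas as pd

df = pd.DataFrame({
    "Team": ["A", "A", "B", "B", "B", "C", "C"],
    "Score": [80, 85, 60, 70, 65, 90, 95],
})

# Broadcast each team's mean back onto its rows, then compare row by row
mask = df.groupby("Team")["Score"].transform("mean") > 70
via_mask = df[mask]

# The filter() version from the sample program, for comparison
via_filter = df.groupby("Team").filter(lambda g: g["Score"].mean() > 70)

print(via_mask.index.tolist())
print(via_filter.index.tolist())
```

Both approaches select whole groups; filter() tends to read more clearly when the rule is complex or involves several columns.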

Summary

filter() helps keep or remove whole groups based on a rule.

It works by applying a function to each group and checking if it returns True or False.

Use it to focus your data on groups that meet your conditions.