How to Use Filter in pandas GroupBy: Syntax and Examples
Use filter on a pandas DataFrameGroupBy object to keep groups that satisfy a condition. Pass filter a function that returns True for groups you want to keep and False for those you want to drop.
Syntax
The filter method is called on a groupby object. It takes a function that receives each group as a DataFrame and returns True to keep the group or False to drop it.
- grouped.filter(func): Applies func to each group.
- func: A function that takes a DataFrame (group) and returns a boolean.
python
grouped = df.groupby('column_name')
filtered = grouped.filter(lambda x: condition_on_x)
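The function does not have to be a lambda. As a minimal sketch of the syntax above, a named function works the same way. The data, the column names `team` and `score`, and the helper `has_large_total` are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical data, for illustration only
df = pd.DataFrame({
    "team": ["x", "x", "y", "y", "z"],
    "score": [1, 2, 10, 20, 5],
})


def has_large_total(group: pd.DataFrame) -> bool:
    """Return True when the group's scores sum to at least 5."""
    return bool(group["score"].sum() >= 5)


# Each group is passed to has_large_total as a DataFrame;
# only groups for which it returns True are kept.
kept = df.groupby("team").filter(has_large_total)
print(kept)
```

Here team x (total 3) is dropped, while teams y and z are kept with all of their rows.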
Example
This example groups data by a category and keeps only groups with more than 2 rows.
python
import pandas as pd

data = {'Category': ['A', 'A', 'B', 'B', 'B', 'C'],
        'Value': [10, 15, 10, 20, 30, 40]}
df = pd.DataFrame(data)

# Group by category, then keep only groups with more than 2 rows
grouped = df.groupby('Category')
filtered = grouped.filter(lambda x: len(x) > 2)
print(filtered)
Output
Category Value
2 B 10
3 B 20
4 B 30
Common Pitfalls
Common mistakes when using filter with groupby include:
- Returning a non-boolean value from the function.
- Using filter when you want to transform or aggregate data.
- Expecting filter to modify the original DataFrame instead of returning a filtered copy.
Always ensure your function returns a single boolean per group.
python
import pandas as pd

data = {'Category': ['A', 'A', 'B', 'B', 'B', 'C'],
        'Value': [10, 15, 10, 20, 30, 40]}
df = pd.DataFrame(data)
grouped = df.groupby('Category')

# Wrong: returns a non-boolean value (a float), so filter raises a TypeError
# filtered = grouped.filter(lambda x: x['Value'].mean())

# Right: the comparison returns a boolean for each group
filtered = grouped.filter(lambda x: x['Value'].mean() > 15)
print(filtered)
Output
Category Value
2 B 10
3 B 20
4 B 30
5 C 40
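To make the difference between filtering, aggregating, and transforming concrete, here is a rough sketch that reuses the data from the examples above. The variable names (`kept_rows`, `group_means`, `row_means`, `masked`) are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "Category": ["A", "A", "B", "B", "B", "C"],
    "Value": [10, 15, 10, 20, 30, 40],
})
grouped = df.groupby("Category")

# filter: keeps or drops whole groups; the rows themselves are unchanged
kept_rows = grouped.filter(lambda g: g["Value"].mean() > 15)

# aggregate: one result per group, indexed by the group key
group_means = grouped["Value"].mean()

# transform: one result per original row, aligned to df's index
row_means = grouped["Value"].transform("mean")

# A boolean mask built from transform selects the same rows as filter
masked = df[row_means > 15]

print(kept_rows)
print(group_means)
```

Note that filter keeps every row of a qualifying group, including rows whose own value is below the threshold (row 2, with Value 10, stays because group B's mean is 20).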
Quick Reference
- filter(func): Keep groups where func(group) is True.
- func: Function receiving the group DataFrame; returns True/False.
- Use for selecting groups, not for modifying data.
- Returns a filtered DataFrame with original index preserved.
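The last two points can be checked with a short sketch, again using the example data from this article. The call to `reset_index(drop=True)` is an optional extra step, shown here on the assumption that you may want row labels starting from 0.

```python
import pandas as pd

df = pd.DataFrame({
    "Category": ["A", "A", "B", "B", "B", "C"],
    "Value": [10, 15, 10, 20, 30, 40],
})

# filter returns a new DataFrame; df itself is left untouched
filtered = df.groupby("Category").filter(lambda g: len(g) > 2)

print(len(df))                  # the original still has all 6 rows
print(filtered.index.tolist())  # kept rows retain their original labels

# Optional: renumber the rows from 0 if the old labels are not needed
renumbered = filtered.reset_index(drop=True)
print(renumbered)
```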
Key Takeaways
Use filter on a groupby object to keep groups based on a condition function.
The function passed to filter must return a boolean for each group.
Filter returns a new DataFrame with only the groups that meet the condition.
Do not use filter to modify data; use transform or apply for that.
Common error: returning non-boolean values from the filter function.