How to Iterate Over Groups in pandas: Simple Guide
Use DataFrame.groupby() to split data into groups, then iterate over the groups with a for loop. Each iteration yields a tuple of the group name and that group's rows as a DataFrame.
Syntax
The basic syntax to iterate over groups in pandas is:
- grouped = df.groupby('column_name') creates one group for each unique value in the specified column.
- for group_name, group_data in grouped: loops over the groups.
- group_name is the unique value that identifies the group.
- group_data is a DataFrame containing the rows of that group.
```python
grouped = df.groupby('column_name')

for group_name, group_data in grouped:
    print(group_name)
    print(group_data)
```
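When you pass a list of columns to groupby() instead of a single column name, each group name is a tuple of key values, and you can unpack it directly in the for statement. A minimal sketch; the Region column and the sample values are hypothetical:

```python
import pandas as pd

# Hypothetical sample data with two grouping columns
df = pd.DataFrame({
    'Category': ['A', 'A', 'B', 'B'],
    'Region': ['East', 'West', 'East', 'East'],
    'Value': [10, 15, 20, 25],
})

# Grouping by a list of columns makes each group name a tuple,
# which can be unpacked directly in the for statement
for (category, region), group_data in df.groupby(['Category', 'Region']):
    print(category, region, len(group_data))
```

This prints one line per (Category, Region) pair in sorted key order: A East 1, A West 1, B East 2.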
Example
This example shows how to group a DataFrame by the 'Category' column and iterate over each group to print the group name and its rows.
```python
import pandas as pd

data = {'Category': ['A', 'B', 'A', 'B', 'C'], 'Value': [10, 20, 15, 25, 30]}
df = pd.DataFrame(data)

grouped = df.groupby('Category')

for group_name, group_data in grouped:
    print(f"Group: {group_name}")
    print(group_data)
    print('---')
```
Output
```
Group: A
  Category  Value
0        A     10
2        A     15
---
Group: B
  Category  Value
1        B     20
3        B     25
---
Group: C
  Category  Value
4        C     30
---
```
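Because group_data is an ordinary DataFrame, any DataFrame method can be called on it inside the loop. A minimal sketch that reuses the example data and collects each group's total into a dictionary (totals is a hypothetical name); int() converts the NumPy integer that sum() returns into a plain Python int:

```python
import pandas as pd

df = pd.DataFrame({'Category': ['A', 'B', 'A', 'B', 'C'],
                   'Value': [10, 20, 15, 25, 30]})

# Each group_data is a regular DataFrame, so column selection and sum() work on it
totals = {}
for group_name, group_data in df.groupby('Category'):
    totals[group_name] = int(group_data['Value'].sum())

print(totals)  # {'A': 25, 'B': 45, 'C': 30}

# The same per-group totals without an explicit loop
print(df.groupby('Category')['Value'].sum())
```

For a simple aggregation like this, the loop-free groupby(...).sum() form is usually shorter; the loop is more useful when each group needs custom handling.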
Common Pitfalls
Common mistakes when iterating over groups include:
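- Forgetting that group_data is a DataFrame, so every DataFrame operation works on it.
- Assigning to group_data inside the loop and expecting the original DataFrame to change; each group is a separate object, so use transform or apply instead.
- Assuming groups come out in the order their rows appear in the data; by default groupby() sorts the group keys, and sort=False keeps the order of first appearance.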
```python
import pandas as pd

data = {'Category': ['A', 'B', 'A', 'B'], 'Value': [1, 2, 3, 4]}
df = pd.DataFrame(data)
grouped = df.groupby('Category')

# Wrong: assigning to a group inside the loop
for name, group in grouped:
    # group is a separate DataFrame, so this does NOT change df
    group['Value'] = group['Value'] * 2
print(df)

# Right: transform returns a Series aligned to df's index,
# so it can be assigned back as a column of df
df['Value'] = grouped['Value'].transform(lambda x: x * 2)
print(df)
```
Output
```
  Category  Value
0        A      1
1        B      2
2        A      3
3        B      4
  Category  Value
0        A      2
1        B      4
2        A      6
3        B      8
```
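Doubling every value gives the same result with or without grouping; transform earns its place when the new value depends on the whole group. A minimal sketch that subtracts each group's mean from its rows; the GroupMean and Diff column names are hypothetical:

```python
import pandas as pd

df = pd.DataFrame({'Category': ['A', 'B', 'A', 'B'],
                   'Value': [1, 2, 3, 4]})

# transform('mean') returns one value per original row (the mean of that
# row's group), aligned to df's index, so it can be assigned back directly
df['GroupMean'] = df.groupby('Category')['Value'].transform('mean')

# Each row's distance from its own group's mean
df['Diff'] = df['Value'] - df['GroupMean']
print(df)
```

Group A has mean 2.0 and group B has mean 3.0, so Diff comes out as -1.0, -1.0, 1.0, 1.0.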
Quick Reference
Summary tips for iterating over groups in pandas:
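- Use df.groupby('col') to create groups.
- Loop with for name, group in grouped:.
- group is a DataFrame holding that group's rows.
- Use apply or transform, rather than assigning inside the loop, to compute per-group results and write them back to the original DataFrame.
- Groups are sorted by key by default; pass sort=False to keep the order of first appearance.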
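Because groupby() sorts the keys by default, the group whose rows appear first in the data is not necessarily the first one the loop visits. A small sketch with hypothetical data comparing the default order with sort=False:

```python
import pandas as pd

# Hypothetical data where the keys first appear in the order C, A, B
df = pd.DataFrame({'Category': ['C', 'A', 'B', 'A'],
                   'Value': [1, 2, 3, 4]})

# Default: group keys are visited in sorted order
print([name for name, _ in df.groupby('Category')])              # ['A', 'B', 'C']

# sort=False: groups follow the order in which keys first appear
print([name for name, _ in df.groupby('Category', sort=False)])  # ['C', 'A', 'B']
```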
Key Takeaways
Use DataFrame.groupby() to split data into groups by column values.
Iterate groups with a for loop receiving group name and group DataFrame.
Group data is a DataFrame; you can apply all DataFrame operations on it.
Assigning to a group inside the loop does not change the original data; use transform or apply to write per-group results back.
Groups are sorted by default when iterating.