How to Group by Multiple Columns in pandas: Simple Guide
In pandas, you can group data by multiple columns using df.groupby([col1, col2]). This creates groups based on unique combinations of the specified columns, allowing you to perform aggregate operations on each group.
Syntax
The basic syntax to group by multiple columns in pandas is:
- df.groupby([col1, col2, ...]): Groups the DataFrame by the listed columns.
- agg() or other aggregation functions: Apply operations like sum, mean, or count to each group.
This groups rows that share the same values in all specified columns.
python
df.groupby(['column1', 'column2'])
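As a minimal sketch of how groupby() and agg() fit together, the snippet below groups by two columns and computes several aggregates at once. The DataFrame and its column names (region, product, units) are hypothetical and invented purely for illustration.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only.
df = pd.DataFrame({
    'region': ['East', 'East', 'East', 'West'],
    'product': ['A', 'A', 'B', 'A'],
    'units': [10, 20, 5, 7],
})

# One row per unique (region, product) pair, one column per aggregate.
summary = df.groupby(['region', 'product'])['units'].agg(['sum', 'mean', 'count'])
print(summary)
```

Passing a list of function names to agg() returns a DataFrame with one column per function, which is convenient when a single aggregate is not enough.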
Example
This example shows how to group a DataFrame by two columns and calculate the sum of another column for each group.
python
import pandas as pd

data = {
    'City': ['Paris', 'Paris', 'London', 'London', 'Berlin', 'Berlin'],
    'Year': [2020, 2021, 2020, 2021, 2020, 2021],
    'Sales': [100, 150, 200, 250, 300, 350],
}
df = pd.DataFrame(data)

grouped = df.groupby(['City', 'Year'])['Sales'].sum()
print(grouped)
Output
City    Year
Berlin  2020    300
        2021    350
London  2020    200
        2021    250
Paris   2020    100
        2021    150
Name: Sales, dtype: int64
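The result is a Series whose index has two levels, City and Year. A short sketch of how individual values might be read back from it, reusing the same data (the variable names paris_2021 and london are illustrative only):

```python
import pandas as pd

df = pd.DataFrame({
    'City': ['Paris', 'Paris', 'London', 'London', 'Berlin', 'Berlin'],
    'Year': [2020, 2021, 2020, 2021, 2020, 2021],
    'Sales': [100, 150, 200, 250, 300, 350],
})
grouped = df.groupby(['City', 'Year'])['Sales'].sum()

# A (City, Year) tuple selects the value for one combination.
paris_2021 = grouped.loc[('Paris', 2021)]
print(paris_2021)

# A single City key returns a smaller Series indexed by Year.
london = grouped.loc['London']
print(london)
```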
Common Pitfalls
Common mistakes when grouping by multiple columns include:
- Passing a single string instead of a list of columns, which groups by one column only.
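- Forgetting to select the column to aggregate after grouping, so the aggregation is applied to every non-grouping column and may fail or give meaningless results on non-numeric columns.
- Not resetting the index when you want the grouped columns back as regular columns. By default, the grouping columns become the index of the result.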
python
import pandas as pd

data = {'A': ['foo', 'foo', 'bar'], 'B': ['one', 'two', 'one'], 'C': [1, 2, 3]}
df = pd.DataFrame(data)

# Wrong: grouping by a single string instead of list
wrong_group = df.groupby('A')['C'].sum()
print(wrong_group)

# Right: grouping by multiple columns
right_group = df.groupby(['A', 'B'])['C'].sum()
print(right_group)
Output
A
bar    3
foo    3
Name: C, dtype: int64
A    B
bar  one    3
foo  one    1
     two    2
Name: C, dtype: int64
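The example above covers the first pitfall. As a minimal sketch of the other two, the snippet below selects the column to aggregate explicitly and then shows two ways to get the grouping columns back as regular columns: calling reset_index() on the result, or passing as_index=False to groupby(). The variable names flat and flat_alt are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['foo', 'foo', 'bar'],
    'B': ['one', 'two', 'one'],
    'C': [1, 2, 3],
})

# Select column C, aggregate, then move A and B from the index into columns.
flat = df.groupby(['A', 'B'])['C'].sum().reset_index()
print(flat)

# Alternative: as_index=False keeps A and B as regular columns from the start.
flat_alt = df.groupby(['A', 'B'], as_index=False)['C'].sum()
print(flat_alt)
```

Both approaches should produce a flat DataFrame with columns A, B, and C, which is usually easier to merge or export than a result with a multi-level index.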
Quick Reference
| Operation | Description | Example |
|---|---|---|
| Group by multiple columns | Groups data by unique combinations of columns | df.groupby(['col1', 'col2']) |
| Aggregate sum | Sum values in each group | df.groupby(['col1', 'col2'])['col3'].sum() |
| Aggregate mean | Calculate mean of groups | df.groupby(['col1', 'col2'])['col3'].mean() |
| Reset index | Convert grouped index back to columns | df.groupby(['col1', 'col2']).sum().reset_index() |
Key Takeaways
- Use a list of column names inside groupby() to group by multiple columns.
- After grouping, apply aggregation functions like sum() or mean() to summarize data.
- Remember to reset the index if you want grouped columns as regular columns again.
- Passing a single string to groupby() groups by only one column, not multiple.
- Grouping by multiple columns creates groups based on unique combinations of those columns.