
How to Group by Multiple Columns in pandas: Simple Guide

In pandas, you can group data by multiple columns using df.groupby([col1, col2]). This creates groups based on unique combinations of the specified columns, allowing you to perform aggregate operations on each group.

📐

Syntax

The basic syntax to group by multiple columns in pandas is:

df.groupby([col1, col2, ...]): Groups the DataFrame by the listed columns.
agg() or other aggregation functions: Apply operations like sum, mean, count on each group.

This groups rows that share the same values in all specified columns.

python

df.groupby(['column1', 'column2'])
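The syntax list mentions agg() without showing it, so here is a minimal sketch of how it might be combined with a multi-column groupby. The DataFrame and the column name `value` are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
df = pd.DataFrame({
    'column1': ['a', 'a', 'a', 'b'],
    'column2': ['x', 'x', 'y', 'y'],
    'value': [1, 2, 3, 4],
})

# Compute several aggregations of 'value' for each
# unique (column1, column2) combination in one call
summary = df.groupby(['column1', 'column2'])['value'].agg(['sum', 'mean', 'count'])
print(summary)
```

Passing a list of function names to agg() produces one result column per function, which is handy when you need more than one statistic per group.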

💻

Example

This example shows how to group a DataFrame by two columns and calculate the sum of another column for each group.

python

import pandas as pd

data = {
    'City': ['Paris', 'Paris', 'London', 'London', 'Berlin', 'Berlin'],
    'Year': [2020, 2021, 2020, 2021, 2020, 2021],
    'Sales': [100, 150, 200, 250, 300, 350]
}

df = pd.DataFrame(data)

grouped = df.groupby(['City', 'Year'])['Sales'].sum()
print(grouped)

Output

City    Year
Berlin  2020    300
        2021    350
London  2020    200
        2021    250
Paris   2020    100
        2021    150
Name: Sales, dtype: int64
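Because the result is indexed by both City and Year, a single combination can be looked up with a tuple. The following is a small sketch that reuses the example data; the variable names `paris_2020` and `london_by_year` are introduced here for illustration only.

```python
import pandas as pd

df = pd.DataFrame({
    'City': ['Paris', 'Paris', 'London', 'London', 'Berlin', 'Berlin'],
    'Year': [2020, 2021, 2020, 2021, 2020, 2021],
    'Sales': [100, 150, 200, 250, 300, 350],
})

grouped = df.groupby(['City', 'Year'])['Sales'].sum()

# The result is indexed by (City, Year), so one combination is a tuple lookup
paris_2020 = grouped.loc[('Paris', 2020)]

# Selecting only the first level returns every Year for that City
london_by_year = grouped.loc['London']

print(paris_2020)
print(london_by_year)
```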

⚠️

Common Pitfalls

Common mistakes when grouping by multiple columns include:

Passing a single string instead of a list of columns, which groups by that one column only.
Forgetting to select the column to aggregate after grouping, so the aggregation is applied to every remaining column instead of only the one you need.
Not resetting the index when you want the grouped columns back as regular columns; by default they become the index of the result.

python

import pandas as pd

data = {'A': ['foo', 'foo', 'bar'], 'B': ['one', 'two', 'one'], 'C': [1, 2, 3]}
df = pd.DataFrame(data)

# Wrong: grouping by a single string instead of list
wrong_group = df.groupby('A')['C'].sum()
print(wrong_group)

# Right: grouping by multiple columns
right_group = df.groupby(['A', 'B'])['C'].sum()
print(right_group)

Output

A
bar    3
foo    3
Name: C, dtype: int64
A    B
bar  one    3
foo  one    1
     two    2
Name: C, dtype: int64
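The example above covers the first pitfall only. For the index pitfall, here is a short sketch of two ways to get the grouped columns back as regular columns, using the same data. The variable names `indexed`, `flat` and `flat_direct` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': ['foo', 'foo', 'bar'], 'B': ['one', 'two', 'one'], 'C': [1, 2, 3]})

# Default result: 'A' and 'B' live in the index, not in the columns
indexed = df.groupby(['A', 'B'])['C'].sum()

# Option 1: move the group keys back into columns afterwards
flat = indexed.reset_index()

# Option 2: ask groupby to keep the keys as columns from the start
flat_direct = df.groupby(['A', 'B'], as_index=False)['C'].sum()

print(flat)
print(flat_direct)
```

Both options should give a flat DataFrame with columns A, B and C. Which one reads better is mostly a matter of taste.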

📊

Quick Reference

Operation	Description	Example
Group by multiple columns	Groups data by unique combinations of columns	df.groupby(['col1', 'col2'])
Aggregate sum	Sum values in each group	df.groupby(['col1', 'col2'])['col3'].sum()
Aggregate mean	Calculate mean of groups	df.groupby(['col1', 'col2'])['col3'].mean()
Reset index	Convert grouped index back to columns	df.groupby(['col1', 'col2']).sum().reset_index()

✅

Key Takeaways

Use a list of column names inside groupby() to group by multiple columns.

After grouping, apply aggregation functions like sum() or mean() to summarize data.

Remember to reset the index if you want grouped columns as regular columns again.

Passing a single string to groupby() groups by only one column, not multiple.

Grouping by multiple columns creates groups based on unique combinations of those columns.