
How to Group by Multiple Columns in pandas: Simple Guide

In pandas, you can group data by multiple columns by passing a list of column names to groupby(), for example df.groupby(['col1', 'col2']). Each group corresponds to one unique combination of values in those columns that appears in the data, and you can then run an aggregate operation on each group.

Syntax

The basic syntax to group by multiple columns in pandas is:

  • df.groupby([col1, col2, ...]): Groups the DataFrame by the listed columns.
  • agg() or other aggregation functions: Apply operations like sum, mean, count on each group.

This groups rows that share the same values in all specified columns.

python
df.groupby(['column1', 'column2'])
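As a minimal sketch of the agg() step mentioned above, the snippet below groups by two columns and computes several statistics in one call. The DataFrame and its column names (team, role, score) are made up for illustration and are not part of the original example.

```python
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
df = pd.DataFrame({
    'team': ['a', 'a', 'a', 'b', 'b'],
    'role': ['dev', 'dev', 'ops', 'dev', 'dev'],
    'score': [10, 20, 30, 40, 60],
})

# Group by two columns, select one column, then compute
# several statistics per group in a single agg() call.
summary = df.groupby(['team', 'role'])['score'].agg(['sum', 'mean', 'count'])
print(summary)
```

Passing a list of function names to agg() returns a DataFrame with one column per statistic and one row per (team, role) combination.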

Example

This example shows how to group a DataFrame by two columns and calculate the sum of another column for each group.

python
import pandas as pd

data = {
    'City': ['Paris', 'Paris', 'London', 'London', 'Berlin', 'Berlin'],
    'Year': [2020, 2021, 2020, 2021, 2020, 2021],
    'Sales': [100, 150, 200, 250, 300, 350]
}

df = pd.DataFrame(data)

grouped = df.groupby(['City', 'Year'])['Sales'].sum()
print(grouped)
Output
City    Year
Berlin  2020    300
        2021    350
London  2020    200
        2021    250
Paris   2020    100
        2021    150
Name: Sales, dtype: int64
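In the example above every (City, Year) pair occurs exactly once, so each sum equals the original value. The sketch below uses hypothetical data with repeated pairs to show how rows that share a combination are merged into one group. The variable names and figures are assumptions for illustration.

```python
import pandas as pd

# Hypothetical data in which some (City, Year) pairs occur more than once.
sales = pd.DataFrame({
    'City': ['Paris', 'Paris', 'Paris', 'London', 'London'],
    'Year': [2020, 2020, 2021, 2020, 2020],
    'Sales': [100, 50, 150, 200, 25],
})

by_city_year = sales.groupby(['City', 'Year'])

# Number of distinct (City, Year) pairs present in the data.
print(by_city_year.ngroups)

# Number of rows that fall into each group.
print(by_city_year.size())

# Rows that share a pair are combined into a single total.
totals = by_city_year['Sales'].sum()
print(totals)
```

Here the two Paris 2020 rows and the two London 2020 rows each collapse into one group, so five input rows produce three groups.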

Common Pitfalls

Common mistakes when grouping by multiple columns include:

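  • Passing a single string instead of a list of columns, which groups by that one column only.
  • Forgetting to select the column to aggregate after grouping, which applies the aggregation to every non-grouping column. In pandas 2.0 and later, functions such as mean() can then raise a TypeError if any of those columns is non-numeric.
  • Not resetting the index when you want the grouping columns back as regular columns. By default they become a MultiIndex on the result; call reset_index() or pass as_index=False to groupby() to keep them as columns.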
python
import pandas as pd

data = {'A': ['foo', 'foo', 'bar'], 'B': ['one', 'two', 'one'], 'C': [1, 2, 3]}
df = pd.DataFrame(data)

# Wrong (if you meant to group by A and B): a single string groups by column A only
wrong_group = df.groupby('A')['C'].sum()
print(wrong_group)

# Right: grouping by multiple columns
right_group = df.groupby(['A', 'B'])['C'].sum()
print(right_group)
Output
A
bar    3
foo    3
Name: C, dtype: int64
A    B
bar  one    3
foo  one    1
     two    2
Name: C, dtype: int64
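The other two pitfalls can be sketched in the same way. The snippet below reuses the A, B, C columns from the example and adds a hypothetical numeric column D (not in the original) to show what happens when no column is selected, and two ways to get the grouping columns back as regular columns.

```python
import pandas as pd

# Same A, B, C data as above, plus a hypothetical extra column D.
df = pd.DataFrame({
    'A': ['foo', 'foo', 'bar'],
    'B': ['one', 'two', 'one'],
    'C': [1, 2, 3],
    'D': [10.0, 20.0, 30.0],
})

# No column selected: every non-grouping column (C and D) is summed.
all_cols = df.groupby(['A', 'B']).sum()
print(all_cols)

# Column selected: only C is summed, and the result is a Series
# whose index is a MultiIndex built from A and B.
only_c = df.groupby(['A', 'B'])['C'].sum()

# Option 1: move the index levels back into regular columns.
flat = only_c.reset_index()
print(flat)

# Option 2: ask groupby() not to use the keys as the index at all.
flat_alt = df.groupby(['A', 'B'], as_index=False)['C'].sum()
print(flat_alt)
```

Both options yield a flat DataFrame with columns A, B and C, which is usually easier to merge or export than a MultiIndex result.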

Quick Reference

| Operation | Description | Example |
| --- | --- | --- |
| Group by multiple columns | Groups data by unique combinations of columns | df.groupby(['col1', 'col2']) |
| Aggregate sum | Sum values in each group | df.groupby(['col1', 'col2'])['col3'].sum() |
| Aggregate mean | Calculate mean of groups | df.groupby(['col1', 'col2'])['col3'].mean() |
| Reset index | Convert grouped index back to columns | df.groupby(['col1', 'col2']).sum().reset_index() |

Key Takeaways

  • Use a list of column names inside groupby() to group by multiple columns.
  • After grouping, apply aggregation functions like sum() or mean() to summarize data.
  • Remember to reset the index if you want grouped columns as regular columns again.
  • Passing a single string to groupby() groups by only one column, not multiple.
  • Grouping by multiple columns creates groups based on unique combinations of those columns.