Grouping by multiple columns in Pandas - Step-by-Step Execution

Concept Flow - Grouping by multiple columns

Start with DataFrame

↓

Select multiple columns to group by

↓

Apply groupby on these columns

↓

Aggregate or summarize each group

↓

Get grouped result with multi-level index or reset index

↓

Use or display grouped data

We start with a table, pick two or more columns to group by, then summarize data in each group.

Execution Sample

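import pandas as pd

# Sales records: one row per observation, tagged with City and Year
data = {'City': ['NY', 'LA', 'NY', 'LA', 'NY'],
        'Year': [2020, 2020, 2021, 2021, 2020],
        'Sales': [100, 200, 150, 250, 130]}
df = pd.DataFrame(data)

# Group rows by each unique (City, Year) pair, then total Sales per group
grouped = df.groupby(['City', 'Year'])['Sales'].sum()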

This code groups sales data by City and Year, then sums sales in each group.
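To make the result concrete, here is a minimal sketch that rebuilds the same data, prints the grouped result, and looks up a single group. The variable name `ny_2020` is introduced only for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'City': ['NY', 'LA', 'NY', 'LA', 'NY'],
    'Year': [2020, 2020, 2021, 2021, 2020],
    'Sales': [100, 200, 150, 250, 130],
})
grouped = df.groupby(['City', 'Year'])['Sales'].sum()

# The result is a Series whose index has two levels: City and Year.
print(grouped)

# Look up one group by passing the (City, Year) pair as a tuple.
ny_2020 = grouped.loc[('NY', 2020)]
print(ny_2020)  # 100 + 130 = 230
```

Because both grouping columns end up in the index, a single group is addressed with a tuple rather than a single label.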

Execution Table

Step	Action	Group Keys Created	Aggregation Result	Notes
1	Create DataFrame	-	DataFrame with 5 rows and 3 columns	Initial data loaded
2	Select columns 'City' and 'Year' for grouping	Unique pairs: ('LA',2020), ('LA',2021), ('NY',2020), ('NY',2021)	-	Groups identified by these pairs
3	Group data by ['City', 'Year']	Groups formed as above	-	Data grouped but not yet aggregated
4	Sum 'Sales' in each group	-	LA 2020: 200; LA 2021: 250; NY 2020: 230; NY 2021: 150	Sales summed per group
5	Result stored in 'grouped'	MultiIndex with City and Year	Aggregated sums shown	Ready for use or display
6	End	-	-	Grouping and aggregation complete

💡 All groups processed and sales summed; grouping by multiple columns done.
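The steps in the table above can be followed one at a time by splitting the chained call. This is a sketch for inspection only; the names `gb`, `group_keys`, `ny_2020_rows`, and `totals` are not part of the original sample.

```python
import pandas as pd

# Step 1: create the DataFrame (5 rows, 3 columns).
df = pd.DataFrame({
    'City': ['NY', 'LA', 'NY', 'LA', 'NY'],
    'Year': [2020, 2020, 2021, 2021, 2020],
    'Sales': [100, 200, 150, 250, 130],
})

# Steps 2-3: group by both columns. No sums are computed yet.
gb = df.groupby(['City', 'Year'])
print(gb.ngroups)  # 4 distinct (City, Year) pairs

# The group keys are the unique (City, Year) pairs.
group_keys = sorted(gb.groups.keys())
print(group_keys)

# Rows that fall into one particular group.
ny_2020_rows = gb.get_group(('NY', 2020))
print(ny_2020_rows)

# Step 4: aggregate 'Sales' within each group.
totals = gb['Sales'].sum()
print(totals)
```

The NY 2020 group contains two rows (100 and 130), which is why its total is 230 while every other group holds a single row.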

Variable Tracker

Variable	Start	After Step 2	After Step 3	After Step 4	Final
df	Undefined	DataFrame with 5 rows	Same DataFrame	Same DataFrame	Same DataFrame
grouped	Undefined	Undefined	Undefined (intermediate GroupBy object is not stored)	Aggregated Series with sums	Aggregated Series with sums

Key Moments - 3 Insights

Why does the grouped result have a MultiIndex?

What happens if we don't aggregate after grouping?

Can we reset the index after grouping?
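The three questions above can be explored with a short experiment. This is a sketch under the same sample data; the names `gb` and `flat` are introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'City': ['NY', 'LA', 'NY', 'LA', 'NY'],
    'Year': [2020, 2020, 2021, 2021, 2020],
    'Sales': [100, 200, 150, 250, 130],
})

# Without an aggregation, groupby returns a GroupBy object,
# not a table of results.
gb = df.groupby(['City', 'Year'])['Sales']
print(type(gb).__name__)

# After aggregating, each grouping column becomes one index level,
# so two grouping columns give a two-level MultiIndex.
grouped = gb.sum()
print(isinstance(grouped.index, pd.MultiIndex))  # True
print(grouped.index.names)

# reset_index() moves the index levels back into ordinary columns.
flat = grouped.reset_index()
print(flat)
```

After `reset_index()`, the result is a regular DataFrame with `City`, `Year`, and `Sales` columns and a default integer index, which is often easier to merge or export.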

Visual Quiz - 3 Questions

Test your understanding

Look at step 4 of the Execution Table. What is the sum of Sales for City 'NY' in Year 2020?

A. 100

B. 230

C. 150

D. 130

Concept Snapshot

Grouping by multiple columns in pandas:
- Use df.groupby(['col1', 'col2'])
- Groups are formed by unique combinations of values (pairs, triples, and so on)
- Aggregation like sum() summarizes each group
- Result has a MultiIndex by default
- Use reset_index() to flatten the result
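The same pattern extends beyond two columns. The sketch below groups by three columns; the `Product` column and its values are hypothetical and were added only for this illustration.

```python
import pandas as pd

# Hypothetical data: 'Product' is not part of the original sample.
df3 = pd.DataFrame({
    'City':    ['NY', 'NY', 'NY', 'LA', 'LA'],
    'Year':    [2020, 2020, 2020, 2020, 2021],
    'Product': ['A', 'A', 'B', 'A', 'A'],
    'Sales':   [100, 130, 50, 200, 250],
})

# Each unique (City, Year, Product) triple forms one group.
by_triple = df3.groupby(['City', 'Year', 'Product'])['Sales'].sum()
print(by_triple.index.nlevels)            # 3 index levels
print(by_triple.loc[('NY', 2020, 'A')])   # 100 + 130 = 230

# Flatten the three-level index into ordinary columns.
flat3 = by_triple.reset_index()
print(flat3)
```

The number of index levels in the result matches the number of grouping columns, and `reset_index()` flattens it the same way regardless of how many levels there are.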

Full Transcript

We start with a DataFrame containing sales data by city and year. We choose to group by both the 'City' and 'Year' columns. This creates groups identified by the unique combinations of city and year. Then we sum the sales within each group. The result is a Series indexed by both city and year (a MultiIndex), showing total sales per group. This method lets us analyze data across multiple categories at once.