Single and multiple column grouping in Data Analysis Python - Step-by-Step Execution

Concept Flow - Single and multiple column grouping

Start with DataFrame

↓

Choose column(s) to group by

↓

Group rows sharing same value(s)

↓

Apply aggregation function

↓

Get grouped summary DataFrame

↓

End

We start with a table, pick one or more columns to group by, then combine rows with the same values in those columns, and finally calculate summary stats for each group.

Execution Sample

import pandas as pd

data = {'City': ['NY', 'LA', 'NY', 'LA', 'NY'],
        'Year': [2020, 2020, 2021, 2021, 2020],
        'Sales': [100, 200, 150, 250, 300]}
df = pd.DataFrame(data)

result = df.groupby(['City', 'Year']).sum()

This code groups sales data by City and Year, then sums sales for each group.
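To make the shape of the output concrete, here is a minimal sketch that inspects result from the sample above. The names ny_2020 and flat are introduced only for this illustration.

```python
import pandas as pd

data = {'City': ['NY', 'LA', 'NY', 'LA', 'NY'],
        'Year': [2020, 2020, 2021, 2021, 2020],
        'Sales': [100, 200, 150, 250, 300]}
df = pd.DataFrame(data)

result = df.groupby(['City', 'Year']).sum()

# The grouping columns become a two-level index (a MultiIndex).
print(list(result.index.names))   # ['City', 'Year']

# Look up one group's total by its tuple key.
ny_2020 = result.loc[('NY', 2020), 'Sales']
print(ny_2020)                    # 400

# reset_index() turns the index levels back into ordinary columns.
flat = result.reset_index()
print(list(flat.columns))         # ['City', 'Year', 'Sales']
```

Calling reset_index() is one common way to get a flat table back if the MultiIndex is inconvenient for later steps.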

Execution Table

Step	Action	Group Key	Rows in Group	Aggregation Result
1	Start with DataFrame	-	5 rows	-
2	Group by City and Year	('NY', 2020)	2 rows (index 0,4)	-
3	Group by City and Year	('LA', 2020)	1 row (index 1)	-
4	Group by City and Year	('NY', 2021)	1 row (index 2)	-
5	Group by City and Year	('LA', 2021)	1 row (index 3)	-
6	Sum Sales in group ('NY', 2020)	-	2 rows	Sales = 100 + 300 = 400
7	Sum Sales in group ('LA', 2020)	-	1 row	Sales = 200
8	Sum Sales in group ('NY', 2021)	-	1 row	Sales = 150
9	Sum Sales in group ('LA', 2021)	-	1 row	Sales = 250
10	Create result DataFrame with sums	-	-	4 rows with grouped sums

💡 All groups processed and aggregated, grouping complete.
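The grouping steps in the table can be checked directly in pandas. The sketch below rebuilds the sample DataFrame and looks at group sizes and membership; the variable names are illustrative only. One detail worth knowing: with the default sort=True, pandas returns group keys in sorted order, whereas the table above lists them in order of first appearance.

```python
import pandas as pd

df = pd.DataFrame({'City': ['NY', 'LA', 'NY', 'LA', 'NY'],
                   'Year': [2020, 2020, 2021, 2021, 2020],
                   'Sales': [100, 200, 150, 250, 300]})

grouped = df.groupby(['City', 'Year'])

# Number of rows in each group; keys come back sorted (LA before NY).
sizes = grouped.size()
print(sizes)

# Row labels belonging to one group (steps 2 and 6 in the table).
ny_2020_rows = grouped.get_group(('NY', 2020)).index.tolist()
print(ny_2020_rows)   # [0, 4]

# sort=False keeps groups in order of first appearance, matching the table:
# NY 2020, LA 2020, NY 2021, LA 2021.
unsorted_keys = list(df.groupby(['City', 'Year'], sort=False).sum().index)
print(unsorted_keys)
```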

Variable Tracker

Variable	Start	After Grouping	After Aggregation	Final
df	Original 5-row DataFrame	Unchanged (groupby returns a separate GroupBy object)	Unchanged	Original 5-row DataFrame
result	Not defined	Not defined	DataFrame with 4 rows, sums of Sales per group	Same as after aggregation
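A quick way to confirm how the two variables behave is to compare df before and after the call. This is a small sketch; before and grouped are names added here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'City': ['NY', 'LA', 'NY', 'LA', 'NY'],
                   'Year': [2020, 2020, 2021, 2021, 2020],
                   'Sales': [100, 200, 150, 250, 300]})
before = df.copy()

# groupby() does not touch df; it returns a GroupBy object
# that computes nothing until an aggregation is called.
grouped = df.groupby(['City', 'Year'])
print(type(grouped).__name__)   # DataFrameGroupBy

# The aggregation produces a new DataFrame, assigned to result.
result = grouped.sum()

print(df.equals(before))        # True
print(df.shape, result.shape)   # (5, 3) (4, 1)
```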

Key Moments - 3 Insights

Why does grouping by multiple columns create groups with tuples as keys?

Why do some groups have multiple rows and others only one?

What does the aggregation function sum() do after grouping?

Visual Quiz - 3 Questions

Test your understanding

Look at step 6 of the Execution Table. What is the sum of Sales for group ('NY', 2020)?

A. 300

B. 100

C. 400

D. 250

Concept Snapshot

Use df.groupby('col1') to group by a single column, or df.groupby(['col1', 'col2']) to group by several.
Each group collects the rows that share the same values in those columns.
Apply an aggregation such as sum() or mean() to get summary stats per group.
The result is a new DataFrame indexed by the grouping columns (a MultiIndex when there are several).
Grouping by multiple columns gives each group a tuple key.
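The snapshot can be sketched side by side for the single-column and multiple-column cases, reusing the sample data. Selecting ['Sales'] before aggregating is a choice made for this example, so that the Year column is not summed along with it; as_index=False is an optional groupby parameter that keeps the grouping keys as regular columns.

```python
import pandas as pd

df = pd.DataFrame({'City': ['NY', 'LA', 'NY', 'LA', 'NY'],
                   'Year': [2020, 2020, 2021, 2021, 2020],
                   'Sales': [100, 200, 150, 250, 300]})

# Single column: one group per city, keyed by a plain value.
by_city = df.groupby('City')['Sales'].sum()
print(by_city['LA'], by_city['NY'])   # 450 550

# Multiple columns: one group per (City, Year) pair, keyed by a tuple.
by_city_year = df.groupby(['City', 'Year'])['Sales'].sum()
print(by_city_year[('NY', 2020)])     # 400

# as_index=False returns the keys as ordinary columns instead of an index.
flat = df.groupby(['City', 'Year'], as_index=False)['Sales'].sum()
print(list(flat.columns))             # ['City', 'Year', 'Sales']
```

Grouping by fewer columns yields fewer, larger groups: two city totals here versus four city-year totals.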

Full Transcript

This visual execution shows how to group data in a table by one or more columns using Python's pandas library. We start with a DataFrame of sales data with columns City, Year, and Sales. Grouping by City and Year means rows with the same City and Year values are combined into groups. Each group is identified by a tuple key like ('NY', 2020). Then, we apply sum() to add up the Sales values in each group. The execution table traces each step: grouping rows, counting rows per group, summing sales, and creating the final grouped summary. The variable tracker follows df and result through these steps: groupby does not modify df, and the grouped summary DataFrame is stored in result. Key moments clarify why group keys are tuples, why groups have different sizes, and what aggregation does. The quiz tests understanding of sums, group identification steps, and effects of grouping by fewer columns. The snapshot summarizes the syntax and behavior of single and multiple column grouping.