0
0
Data Analysis Pythondata~10 mins

Single and multiple column grouping in Data Analysis Python - Step-by-Step Execution

Choose your learning style9 modes available
Concept Flow - Single and multiple column grouping
Start with DataFrame
Choose column(s) to group by
Group rows sharing same value(s)
Apply aggregation function
Get grouped summary DataFrame
End
We start with a table, pick one or more columns to group by, then combine rows with the same values in those columns, and finally calculate summary stats for each group.
Execution Sample
Data Analysis Python
import pandas as pd

data = {'City': ['NY', 'LA', 'NY', 'LA', 'NY'],
        'Year': [2020, 2020, 2021, 2021, 2020],
        'Sales': [100, 200, 150, 250, 300]}
df = pd.DataFrame(data)

result = df.groupby(['City', 'Year']).sum()
This code groups sales data by City and Year, then sums sales for each group.
Execution Table
StepActionGroup KeyRows in GroupAggregation Result
1Start with DataFrame-5 rows-
2Group by City and Year('NY', 2020)2 rows (index 0,4)-
3Group by City and Year('LA', 2020)1 row (index 1)-
4Group by City and Year('NY', 2021)1 row (index 2)-
5Group by City and Year('LA', 2021)1 row (index 3)-
6Sum Sales in group ('NY', 2020)-2 rowsSales = 100 + 300 = 400
7Sum Sales in group ('LA', 2020)-1 rowSales = 200
8Sum Sales in group ('NY', 2021)-1 rowSales = 150
9Sum Sales in group ('LA', 2021)-1 rowSales = 250
10Create result DataFrame with sums--4 rows with grouped sums
💡 All groups processed and aggregated, grouping complete.
Variable Tracker
VariableStartAfter GroupingAfter AggregationFinal
dfOriginal 5-row DataFrameGrouped by City and YearAggregation appliedGrouped summary DataFrame with sums
resultNot definedNot definedDataFrame with 4 rows, sums of Sales per groupSame as after aggregation
Key Moments - 3 Insights
Why does grouping by multiple columns create groups with tuples as keys?
Because each group is defined by the combination of values in all grouping columns, the keys become tuples representing those combined values, as shown in execution_table rows 2-5.
Why do some groups have multiple rows and others only one?
Groups with the same values in the grouping columns combine multiple rows, like ('NY', 2020) with 2 rows (rows 0 and 4), while others like ('LA', 2020) have only one matching row, as seen in execution_table rows 2-5.
What does the aggregation function sum() do after grouping?
It adds up the values in the 'Sales' column for each group, producing a single summary number per group, as shown in execution_table rows 6-9.
Visual Quiz - 3 Questions
Test your understanding
Look at the execution_table at step 6, what is the sum of Sales for group ('NY', 2020)?
A300
B100
C400
D250
💡 Hint
Check the 'Aggregation Result' column at step 6 in execution_table.
At which step does the grouping identify the group ('LA', 2021)?
AStep 4
BStep 5
CStep 3
DStep 2
💡 Hint
Look at the 'Group Key' column in execution_table rows 2-5.
If we grouped only by 'City' instead of ['City', 'Year'], how would the number of groups change?
AIt would decrease
BIt would increase
CIt would stay the same
DIt would be zero
💡 Hint
Refer to variable_tracker and think about grouping by fewer columns combining more rows.
Concept Snapshot
Use df.groupby(['col1', 'col2']) to group data by one or more columns.
Groups combine rows with same values in those columns.
Apply aggregation like sum(), mean() to get summary stats per group.
Result is a new DataFrame indexed by the grouping columns.
Multiple columns create tuple keys for groups.
Full Transcript
This visual execution shows how to group data in a table by one or more columns using Python's pandas library. We start with a DataFrame of sales data with columns City, Year, and Sales. Grouping by City and Year means rows with the same City and Year values are combined into groups. Each group is identified by a tuple key like ('NY', 2020). Then, we apply sum() to add up Sales values in each group. The execution table traces each step: grouping rows, counting rows per group, summing sales, and creating the final grouped summary. The variable tracker shows how the original DataFrame changes into a grouped summary DataFrame. Key moments clarify why group keys are tuples, why groups have different sizes, and what aggregation does. The quiz tests understanding of sums, group identification steps, and effects of grouping by fewer columns. The snapshot summarizes the syntax and behavior of single and multiple column grouping.