0
0
Pandasdata~10 mins

Grouping by multiple columns in Pandas - Step-by-Step Execution

Choose your learning style9 modes available
Concept Flow - Grouping by multiple columns
Start with DataFrame
Select multiple columns to group by
Apply groupby on these columns
Aggregate or summarize each group
Get grouped result with multi-level index or reset index
Use or display grouped data
We start with a table, pick two or more columns to group by, then summarize data in each group.
Execution Sample
Pandas
import pandas as pd

data = {'City': ['NY', 'LA', 'NY', 'LA', 'NY'],
        'Year': [2020, 2020, 2021, 2021, 2020],
        'Sales': [100, 200, 150, 250, 130]}
df = pd.DataFrame(data)
grouped = df.groupby(['City', 'Year'])['Sales'].sum()
This code groups sales data by City and Year, then sums sales in each group.
Execution Table
StepActionGroup Keys CreatedAggregation ResultNotes
1Create DataFrame-DataFrame with 5 rows and 3 columnsInitial data loaded
2Select columns 'City' and 'Year' for groupingUnique pairs: ('LA',2020), ('LA',2021), ('NY',2020), ('NY',2021)-Groups identified by these pairs
3Group data by ['City', 'Year']Groups formed as above-Data grouped but not yet aggregated
4Sum 'Sales' in each group-LA 2020: 200 LA 2021: 250 NY 2020: 230 NY 2021: 150Sales summed per group
5Result stored in 'grouped'MultiIndex with City and YearAggregated sums shownReady for use or display
6End--Grouping and aggregation complete
💡 All groups processed and sales summed; grouping by multiple columns done.
Variable Tracker
VariableStartAfter Step 2After Step 3After Step 4Final
dfEmptyDataFrame with 5 rowsSame DataFrameSame DataFrameSame DataFrame
groupedUndefinedUndefinedGrouped object createdAggregated Series with sumsAggregated Series with sums
Key Moments - 3 Insights
Why does the grouped result have a MultiIndex?
Because grouping by multiple columns creates groups identified by combinations of those columns, pandas uses a MultiIndex to represent these pairs (see execution_table step 4).
What happens if we don't aggregate after grouping?
The groupby object holds groups but no summary; aggregation like sum() is needed to get meaningful results (see execution_table step 3 vs 4).
Can we reset the index after grouping?
Yes, calling reset_index() on the grouped result turns the MultiIndex back into columns for easier viewing or further processing.
Visual Quiz - 3 Questions
Test your understanding
Look at the execution_table at step 4, what is the sum of Sales for City 'NY' in Year 2020?
A100
B230
C150
D130
💡 Hint
Check the Aggregation Result column in step 4 for 'NY 2020'.
At which step does the DataFrame get grouped but not yet aggregated?
AStep 3
BStep 4
CStep 2
DStep 5
💡 Hint
Look for the step where groups are formed but no sum is calculated yet.
If we add another column 'Product' to group by, how would the number of groups change?
AStay the same
BDecrease
CIncrease
DBecome zero
💡 Hint
Adding more columns to group by creates more specific groups (see concept_flow).
Concept Snapshot
Grouping by multiple columns in pandas:
- Use df.groupby(['col1', 'col2'])
- Groups are formed by unique pairs/triples...
- Aggregation like sum() summarizes each group
- Result has MultiIndex by default
- Use reset_index() to flatten the result
Full Transcript
We start with a DataFrame containing sales data by city and year. We choose to group by both 'City' and 'Year' columns. This creates groups identified by unique combinations of city and year. Then, we sum the sales within each group. The result is a DataFrame indexed by both city and year, showing total sales per group. This method helps us analyze data across multiple categories at once.