0
0
Pandasdata~10 mins

pivot_table() for summarization in Pandas - Step-by-Step Execution

Choose your learning style9 modes available
Concept Flow - pivot_table() for summarization
Start with DataFrame
Choose index (rows)
Choose columns (optional)
Choose values to summarize
Choose aggregation function
pivot_table() creates summary table
Use or display summarized data
pivot_table() takes a DataFrame and summarizes data by grouping rows and columns, then applying an aggregation like sum or mean.
Execution Sample
Pandas
import pandas as pd

data = {'City': ['NY', 'LA', 'NY', 'LA'],
        'Year': [2020, 2020, 2021, 2021],
        'Sales': [100, 200, 150, 250]}

df = pd.DataFrame(data)

summary = df.pivot_table(index='City', columns='Year', values='Sales', aggfunc='sum')
This code summarizes sales by city and year, showing total sales for each city-year pair.
Execution Table
StepActionDataFrame StateResult
1Create DataFrame{'City': ['NY', 'LA', 'NY', 'LA'], 'Year': [2020, 2020, 2021, 2021], 'Sales': [100, 200, 150, 250]}DataFrame with 4 rows and 3 columns
2Call pivot_table with index='City', columns='Year', values='Sales', aggfunc='sum'Same DataFrameGrouped sales by City and Year, summed values
3Group rows by City: NY and LAGroups: NY (rows 0,2), LA (rows 1,3)Two groups formed
4Group columns by Year: 2020 and 2021Columns split into 2020 and 2021Two columns for years
5Sum Sales for NY in 2020Rows with City=NY and Year=2020100
6Sum Sales for NY in 2021Rows with City=NY and Year=2021150
7Sum Sales for LA in 2020Rows with City=LA and Year=2020200
8Sum Sales for LA in 2021Rows with City=LA and Year=2021250
9Create pivot table with sumsAggregated sumspivot_table with City as index, Year as columns, Sales sums as values
10Display pivot tableFinal summarized tableCity\Year 2020 2021 LA 200 250 NY 100 150
11EndNo further actionExecution complete
💡 All groups processed and summarized, pivot_table created successfully
Variable Tracker
VariableStartAfter Step 2After Step 9Final
dfundefinedDataFrame with 4 rows and 3 columnsSame DataFrameSame DataFrame
summaryundefinedundefinedpivot_table DataFrame with summarized salespivot_table DataFrame with summarized sales
Key Moments - 3 Insights
Why do we need to specify 'index' and 'columns' in pivot_table?
The 'index' defines the rows to group by, and 'columns' define the columns to group by. This creates a grid to summarize data. See execution_table steps 3 and 4.
What happens if we don't specify an aggregation function?
pivot_table defaults to 'mean' aggregation. In this example, we used 'sum' explicitly (step 2). Without it, the result would show average sales instead of total.
Why is the result a new DataFrame and not modifying the original?
pivot_table returns a new summarized DataFrame, leaving the original data unchanged. See variable_tracker where 'summary' is new, original 'df' stays the same.
Visual Quiz - 3 Questions
Test your understanding
Look at the execution_table at step 5. What is the sum of Sales for NY in 2020?
A100
B150
C200
D250
💡 Hint
Check the 'Result' column at step 5 in the execution_table.
At which step does the pivot_table create the final summarized table?
AStep 3
BStep 7
CStep 9
DStep 11
💡 Hint
Look for the step where the pivot_table with sums is created in the execution_table.
If we change aggfunc from 'sum' to 'mean', how would the summary change?
AIt would show the total sales instead of average
BIt would show the average sales instead of total
CIt would show the maximum sales
DIt would cause an error
💡 Hint
Recall the key_moments explanation about default aggregation and aggfunc parameter.
Concept Snapshot
pivot_table() summarizes data by grouping rows (index) and columns,
then applies an aggregation function (aggfunc) like sum or mean.
Syntax: df.pivot_table(index='row_col', columns='col_col', values='val_col', aggfunc='sum')
Returns a new DataFrame with summarized values.
Useful for quick data summaries and cross-tabulations.
Full Transcript
We start with a DataFrame containing sales data by city and year. Using pivot_table(), we choose 'City' as rows (index), 'Year' as columns, and 'Sales' as values to summarize. The aggregation function 'sum' adds sales for each city-year pair. Step by step, the data groups by city and year, sums sales, and creates a new summarized table. The original DataFrame remains unchanged. This method helps quickly see total sales per city and year in a clear table format.