0
0
Pandasdata~10 mins

Pivot with aggregation functions in Pandas - Step-by-Step Execution

Choose your learning style9 modes available
Concept Flow - Pivot with aggregation functions
Start with DataFrame
Choose index, columns, values
Apply aggregation function
Create pivot table
View summarized data
We start with a DataFrame, select which columns to use as index, columns, and values, then apply an aggregation function to summarize data in a pivot table.
Execution Sample
Pandas
import pandas as pd

data = {'City': ['NY', 'LA', 'NY', 'LA', 'NY'],
        'Year': [2020, 2020, 2021, 2021, 2021],
        'Sales': [100, 200, 150, 250, 300]}

df = pd.DataFrame(data)

pivot = df.pivot_table(index='City', columns='Year', values='Sales', aggfunc='sum')
This code creates a pivot table that sums sales for each city by year.
Execution Table
StepActionDataFrame StatePivot Table State
1Create DataFrame from data dictionaryCity Year Sales 0 NY 2020 100 1 LA 2020 200 2 NY 2021 150 3 LA 2021 250 4 NY 2021 300N/A
2Call pivot_table with index='City', columns='Year', values='Sales', aggfunc='sum'Same as step 1Year 2020 2021 City LA 200 250 NY 100 450
3Pivot table created with sums of Sales by City and YearSame as step 1Same as step 2
💡 Pivot table created after aggregation sums sales grouped by City and Year.
Variable Tracker
VariableStartAfter Step 1After Step 2Final
dfNoneDataFrame with 5 rows and 3 columnsSameSame
pivotNoneNonePivot table with summed salesSame
Key Moments - 2 Insights
Why do we use aggfunc='sum' in pivot_table?
Because there are multiple sales entries for the same City and Year, aggfunc='sum' tells pandas to add these sales together. See execution_table step 2 where aggregation happens.
What happens if we don't specify values in pivot_table?
Pandas will try to aggregate all numeric columns. Here, specifying values='Sales' focuses aggregation only on sales data, as shown in execution_table step 2.
Visual Quiz - 3 Questions
Test your understanding
Look at the execution_table at step 2, what is the sum of sales for NY in 2021?
A450
B300
C150
D100
💡 Hint
Check the pivot table state in execution_table step 2 under City 'NY' and Year '2021'.
At which step is the pivot table actually created?
AStep 1
BStep 2
CStep 3
DPivot table is never created
💡 Hint
See execution_table rows describing when pivot_table is called and created.
If we change aggfunc from 'sum' to 'mean', what changes in the pivot table?
APivot table will not change
BPivot table will have counts instead of sums
CSales values will be averaged instead of summed
DPivot table will show maximum sales
💡 Hint
aggfunc controls how values are combined; 'mean' calculates average instead of sum.
Concept Snapshot
pivot_table(index, columns, values, aggfunc)
- index: rows to group by
- columns: columns to spread
- values: data to aggregate
- aggfunc: function to summarize (sum, mean, etc.)
Creates a summary table by grouping and aggregating data.
Full Transcript
We start with a DataFrame containing sales data by city and year. Using pandas pivot_table, we select 'City' as index, 'Year' as columns, and 'Sales' as values. We apply the sum aggregation function to add sales for each city-year pair. The pivot table shows total sales per city for each year. This process groups data and summarizes it in a clear table format.