0
0
Data Analysis Pythondata~10 mins

Pivot tables with pivot_table() in Data Analysis Python - Step-by-Step Execution

Choose your learning style9 modes available
Concept Flow - Pivot tables with pivot_table()
Start with DataFrame
Choose index (rows)
Choose columns (optional)
Choose values to aggregate
Choose aggregation function
pivot_table() creates summary table
Use or display pivot table
Pivot tables summarize data by grouping rows and columns and applying aggregation functions.
Execution Sample
Data Analysis Python
import pandas as pd

data = {'City': ['NY', 'LA', 'NY', 'LA'],
        'Year': [2020, 2020, 2021, 2021],
        'Sales': [100, 200, 150, 250]}

df = pd.DataFrame(data)
pivot = df.pivot_table(index='City', columns='Year', values='Sales', aggfunc='sum')
print(pivot)
This code creates a pivot table showing total sales by city and year.
Execution Table
StepActionDataFrame StatePivot Table StateExplanation
1Create DataFrame df{'City': ['NY', 'LA', 'NY', 'LA'], 'Year': [2020, 2020, 2021, 2021], 'Sales': [100, 200, 150, 250]}NoneInitial data with city, year, and sales columns.
2Call pivot_table()df unchangedEmptyStart creating pivot table with index='City', columns='Year', values='Sales', aggfunc='sum'.
3Group data by City and Yeardf unchangedGroups: NY-2020, LA-2020, NY-2021, LA-2021Data grouped by city and year combinations.
4Aggregate Sales with sumdf unchanged{NY: {2020: 100, 2021: 150}, LA: {2020: 200, 2021: 250}}Sum sales for each city-year group.
5Build pivot table DataFramedf unchangedYear 2020 2021 City LA 200 250 NY 100 150Pivot table shows sales by city (rows) and year (columns).
6Print pivot tabledf unchangedSame as step 5Output the pivot table to console.
7Enddf unchangedPivot table readyPivot table creation complete.
💡 All data grouped and aggregated; pivot table created successfully.
Variable Tracker
VariableStartAfter pivot_table callFinal
dfEmpty{'City': ['NY', 'LA', 'NY', 'LA'], 'Year': [2020, 2020, 2021, 2021], 'Sales': [100, 200, 150, 250]}Same as after call
pivotNoneEmptyYear 2020 2021 City LA 200 250 NY 100 150
Key Moments - 3 Insights
Why do we specify 'index' and 'columns' in pivot_table?
The 'index' defines the rows of the pivot table, and 'columns' define the columns. This organizes data into a grid for easy comparison, as shown in execution_table step 4 and 5.
What happens if multiple rows have the same index and column values?
pivot_table groups these rows and applies the aggregation function (like sum) to combine their values, as seen in step 4 where sales are summed.
Why do we use aggfunc='sum'?
Because there can be multiple sales values per group, sum adds them up to give total sales per city-year group, demonstrated in step 4.
Visual Quiz - 3 Questions
Test your understanding
Look at the execution_table at step 4, what is the sum of sales for city 'LA' in year 2021?
A150
B200
C250
D100
💡 Hint
Check the 'Pivot Table State' column at step 4 for LA and year 2021.
At which step does the pivot table get its final shape with rows and columns?
AStep 5
BStep 3
CStep 2
DStep 6
💡 Hint
Look for when the pivot table shows the grid with City as rows and Year as columns.
If we change aggfunc from 'sum' to 'mean', how would the pivot table values change?
AValues would be the total sales per city-year group
BValues would be the average sales per city-year group
CValues would be the count of sales entries
DPivot table would not change
💡 Hint
aggfunc controls how grouped values are combined; 'mean' calculates average.
Concept Snapshot
pivot_table() creates a summary table from data.
Use index= for rows, columns= for columns.
values= selects data to aggregate.
aggfunc= defines how to combine data (sum, mean, etc.).
Result is a DataFrame showing grouped summaries.
Full Transcript
We start with a DataFrame containing city, year, and sales data. Using pivot_table(), we choose 'City' as the index (rows), 'Year' as columns, and 'Sales' as values to summarize. The function groups data by city and year, then sums sales for each group. The result is a new table showing total sales per city for each year. This pivot table helps us quickly compare sales across cities and years. Key points include specifying index and columns to organize data, and using aggfunc to control aggregation. The process ends with a clear summary table ready for analysis.