The split-apply-combine model helps you analyze data by breaking it into groups, doing calculations on each group, and then putting the results back together.
Split-apply-combine mental model in Pandas
Introduction
You want to find the average sales for each store in a chain.
You need to count how many times each category appears in a list.
You want to sum expenses by month from a big table of transactions.
You want to find the maximum score for each player in a game.
Syntax
Pandas
df.groupby('column_name').agg({'column_to_aggregate': 'aggregation_function'})
groupby() splits the data into groups based on one or more columns.
agg() applies a function like sum, mean, or count to each group.
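To make the three steps concrete, the sketch below performs split, apply, and combine by hand and then repeats them in a single groupby() call. The toy data and the variable names (pieces, totals, manual, grouped) are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical toy data, used only for illustration.
df = pd.DataFrame({
    "Store": ["A", "A", "B", "B"],
    "Sales": [100, 150, 200, 250],
})

# Split: iterating over a GroupBy yields (key, sub-DataFrame) pairs.
pieces = {key: group for key, group in df.groupby("Store")}

# Apply: compute one value per piece.
totals = {key: group["Sales"].sum() for key, group in pieces.items()}

# Combine: assemble the per-group values into one labeled result.
manual = pd.Series(totals, name="Sales")
manual.index.name = "Store"

# The same three steps expressed as one groupby/agg call.
grouped = df.groupby("Store").agg({"Sales": "sum"})["Sales"]

print(manual)
print(grouped)
```

Both results hold one total per store. The manual version is only a mental model; in practice the single groupby() call is shorter and usually faster.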
Examples
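Groups data by 'Category' and sums every other column in each group. Pass numeric_only=True to skip non-numeric columns (in pandas 2.0 and later the default is to include them).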
Pandas
df.groupby('Category').sum()
Groups data by 'Store' and calculates the average sales for each store.
Pandas
df.groupby('Store')['Sales'].mean()
Groups data by 'Region' and 'Product' and counts the non-null values in each remaining column per group. Use size() instead to count rows regardless of missing values.
Pandas
df.groupby(['Region', 'Product']).count()
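The difference between counting values and counting rows shows up only when data is missing. The sketch below uses a made-up table with one missing 'Units' value; the column names and figures are assumptions for illustration.

```python
import pandas as pd

# Hypothetical data with one missing value in 'Units'.
df = pd.DataFrame({
    "Region": ["East", "East", "West"],
    "Product": ["Pen", "Pen", "Ink"],
    "Units": [5, None, 7],
})

keys = ["Region", "Product"]

# count() reports non-null values per column in each group.
non_null = df.groupby(keys)["Units"].count()

# size() reports the number of rows in each group, missing values included.
rows = df.groupby(keys).size()

print(non_null)
print(rows)
```

For the ('East', 'Pen') group, count() sees one non-null value while size() sees two rows.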
Sample Program
This code creates a small table of sales and expenses for stores A, B, and C. It then groups the data by store and sums the sales and expenses for each store.
Pandas
import pandas as pd

data = {
    'Store': ['A', 'A', 'B', 'B', 'C', 'C'],
    'Sales': [100, 150, 200, 250, 300, 350],
    'Expenses': [80, 90, 120, 130, 160, 170],
}
df = pd.DataFrame(data)

# Split by 'Store', then sum 'Sales' and 'Expenses' for each store
result = df.groupby('Store').sum()
print(result)
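Output
The result has one row per store: A totals 250 in sales and 170 in expenses, B totals 450 and 250, and C totals 650 and 330.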
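The dictionary form of agg() from the Syntax section lets each column use a different function. As a sketch on the same sample data, the call below takes the mean of 'Sales' and the maximum of 'Expenses'; the choice of functions and the name summary are illustrative assumptions.

```python
import pandas as pd

data = {
    "Store": ["A", "A", "B", "B", "C", "C"],
    "Sales": [100, 150, 200, 250, 300, 350],
    "Expenses": [80, 90, 120, 130, 160, 170],
}
df = pd.DataFrame(data)

# One aggregation function per column, keyed by column name.
summary = df.groupby("Store").agg({"Sales": "mean", "Expenses": "max"})
print(summary)
```

Store A, for example, gets a mean sale of 125.0 and a maximum expense of 90.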
Important Notes
You can group by multiple columns by passing a list to groupby().
The aggregation function can be sum, mean, count, max, min, and others.
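After aggregating, you get a new DataFrame (or a Series, if you selected a single column) with the group keys as the index. Pass as_index=False to groupby() or call reset_index() on the result to keep the keys as ordinary columns.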
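Where the group keys end up matters when the result feeds into further steps such as merging or plotting. The sketch below, on assumed toy data, compares the default result with two ways of getting the keys back as a regular column.

```python
import pandas as pd

# Hypothetical toy data, used only for illustration.
df = pd.DataFrame({
    "Store": ["A", "A", "B"],
    "Sales": [100, 150, 200],
})

# Default: the group key becomes the index of the result.
indexed = df.groupby("Store").sum()

# Option 1: ask groupby() to keep the key as a column.
flat = df.groupby("Store", as_index=False).sum()

# Option 2: move the index back into a column afterwards.
also_flat = indexed.reset_index()

print(indexed)
print(flat)
```

Both options produce a table with 'Store' and 'Sales' columns and a default integer index.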
Summary
Split data into groups using groupby().
Apply calculations to each group with aggregation functions.
Combine the results into a new summarized DataFrame.