Split-apply-combine mental model in Pandas

Introduction

The split-apply-combine model helps you analyze data by breaking it into groups, doing calculations on each group, and then putting the results back together.

You want to find the average sales for each store in a chain.
You need to count how many times each category appears in a list.
You want to sum expenses by month from a big table of transactions.
You want to find the maximum score for each player in a game.
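The three steps can also be written out by hand, which makes it easier to see what groupby() does for you. This is a minimal sketch; the table and the names `pieces`, `manual`, and `shortcut` are made up for illustration.

```python
import pandas as pd

# Hypothetical sales table, used only for illustration
df = pd.DataFrame({
    "Store": ["A", "A", "B", "B"],
    "Sales": [100, 150, 200, 250],
})

# Split: iterating over a GroupBy yields (key, sub-DataFrame) pairs
pieces = {}
for store, group in df.groupby("Store"):
    # Apply: compute one value per group
    pieces[store] = group["Sales"].mean()

# Combine: collect the per-group results into one Series
manual = pd.Series(pieces, name="Sales")

# The one-line equivalent performs all three steps at once
shortcut = df.groupby("Store")["Sales"].mean()
```

Both `manual` and `shortcut` hold one average per store. In practice the one-line form is preferred because it is shorter and usually faster.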
Syntax
Pandas
df.groupby('column_name').agg({'column_to_aggregate': 'aggregation_function'})

groupby() splits the data into groups based on one or more columns.

agg() applies a function like sum, mean, or count to each group.
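As a rough illustration of the syntax above, agg() can take a different function for each column, or named aggregations that also set the output column names. The data and the output names `total_sales` and `max_expense` are assumptions chosen for this sketch.

```python
import pandas as pd

# Hypothetical data, used only for illustration
df = pd.DataFrame({
    "Store": ["A", "A", "B", "B"],
    "Sales": [100, 150, 200, 250],
    "Expenses": [80, 90, 120, 130],
})

# Dict form: one aggregation function per column
per_column = df.groupby("Store").agg({"Sales": "sum", "Expenses": "mean"})

# Named aggregation: output_name=(input_column, function)
named = df.groupby("Store").agg(
    total_sales=("Sales", "sum"),
    max_expense=("Expenses", "max"),
)
```

The dict form keeps the original column names, while named aggregation is handy when the result should say which calculation was applied.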

Examples
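Groups data by 'Category' and sums every other column in each group. In pandas 2.0 and later this includes non-numeric columns (strings are concatenated); pass numeric_only=True to sum only the numeric columns.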
Pandas
df.groupby('Category').sum()
Groups data by 'Store' and calculates the average sales for each store.
Pandas
df.groupby('Store')['Sales'].mean()
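Groups data by 'Region' and 'Product' and counts the non-missing values in each remaining column for every group. To count rows regardless of missing values, use size() instead.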
Pandas
df.groupby(['Region', 'Product']).count()
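The difference between counting values and counting rows only shows up when data is missing. The sketch below uses a made-up table with one missing 'Sales' value to compare count() and size().

```python
import pandas as pd

# Hypothetical data with one missing 'Sales' value
df = pd.DataFrame({
    "Region": ["East", "East", "West"],
    "Product": ["Pen", "Pen", "Ink"],
    "Sales": [10.0, float("nan"), 30.0],
})

grouped = df.groupby(["Region", "Product"])

non_null = grouped.count()  # DataFrame: non-missing values per column
rows = grouped.size()       # Series: number of rows per group
```

Here the ('East', 'Pen') group has two rows but only one non-missing 'Sales' value, so the two results differ for that group.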
Sample Program

This code creates a small table of sales and expenses for stores A, B, and C. It then groups the data by store and sums the sales and expenses for each store.

Pandas
import pandas as pd

data = {'Store': ['A', 'A', 'B', 'B', 'C', 'C'],
        'Sales': [100, 150, 200, 250, 300, 350],
        'Expenses': [80, 90, 120, 130, 160, 170]}
df = pd.DataFrame(data)

# Split by 'Store', then sum 'Sales' and 'Expenses' for each store
result = df.groupby('Store').sum()
print(result)
Output
       Sales  Expenses
Store                 
A        250       170
B        450       250
C        650       330
Important Notes

You can group by multiple columns by passing a list to groupby().

The aggregation function can be sum, mean, count, max, min, and others.

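After aggregating, you get a new DataFrame (or a Series if you selected a single column) with the group keys as the index. Pass as_index=False to groupby(), or call reset_index() on the result, to keep the keys as regular columns.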
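A short sketch of where the group keys end up, using a small made-up table. The variable names are chosen for illustration only.

```python
import pandas as pd

# Hypothetical data, used only for illustration
df = pd.DataFrame({
    "Store": ["A", "A", "B"],
    "Sales": [100, 150, 200],
})

# Default: 'Store' becomes the index of the result
indexed = df.groupby("Store").sum()

# as_index=False keeps 'Store' as a regular column
flat = df.groupby("Store", as_index=False).sum()

# reset_index() converts the index back into a column after the fact
also_flat = indexed.reset_index()
```

Keeping the keys as columns is often more convenient when the result will be merged with other tables or written to a file.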

Summary

Split data into groups using groupby().

Apply calculations to each group with aggregation functions.

Combine the results into a new summarized DataFrame.