
Split-apply-combine mental model in Pandas - Step-by-Step Execution

Concept Flow - Split-apply-combine mental model
Start with DataFrame
Split data into groups
Apply function to each group
Combine results into a new Series or DataFrame
Output
The data is split into groups, a function is applied to each group, then results are combined into one output.
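To make the three steps concrete, here is a minimal sketch that writes them out by hand with ordinary pandas indexing. It uses the same four-row DataFrame as the Execution Sample and then checks the result against a single `groupby` call. The names `pieces`, `means`, `manual` and `built_in` are illustrative only; this is not how pandas implements `groupby` internally.

```python
import pandas as pd

df = pd.DataFrame({'Team': ['A', 'A', 'B', 'B'], 'Points': [10, 15, 10, 20]})

# Split: collect the rows that belong to each distinct Team value
pieces = {team: df[df['Team'] == team] for team in df['Team'].unique()}

# Apply: compute the mean of Points separately for each piece
means = {team: piece['Points'].mean() for team, piece in pieces.items()}

# Combine: gather the per-group results into one Series indexed by Team
manual = pd.Series(means, name='Points')
manual.index.name = 'Team'

# groupby performs the same split, apply and combine in one expression
built_in = df.groupby('Team')['Points'].mean()

print(manual)
print(manual.equals(built_in))  # True
```

The hand-written version is useful only for building intuition. In practice `groupby` is shorter and usually faster.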
Execution Sample (Pandas)
import pandas as pd

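# Four rows: two for team A, two for team B
df = pd.DataFrame({'Team': ['A', 'A', 'B', 'B'], 'Points': [10, 15, 10, 20]})
# Split by 'Team', apply mean to 'Points', combine into one Series
result = df.groupby('Team')['Points'].mean()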
print(result)
This code groups data by 'Team', calculates average 'Points' per team, and prints the result.
Execution Table
Step | Action | Data state | Result
1 | Create DataFrame | df with 4 rows, columns Team and Points | DataFrame ready
2 | Group by 'Team' | Groups: 'A' with 2 rows, 'B' with 2 rows | Groups formed
3 | Apply mean to 'Points' in each group | Group A Points: [10, 15]; Group B Points: [10, 20] | Means: A = 12.5, B = 15.0
4 | Combine results | Series indexed by Team, holding the mean Points | Output Series ready
5 | Print result | Series displayed | Printed output shown below

Printed output:
Team
A    12.5
B    15.0
Name: Points, dtype: float64
💡 All groups processed and combined, output produced
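Steps 2 and 3 of the table can be observed directly. The sketch below builds the GroupBy object, asks it how many groups it formed, then iterates over it. Each iteration yields a `(key, sub-DataFrame)` pair. The variable names `grouped` and `splits` are illustrative and are not part of the original example.

```python
import pandas as pd

df = pd.DataFrame({'Team': ['A', 'A', 'B', 'B'], 'Points': [10, 15, 10, 20]})
grouped = df.groupby('Team')

# Step 2 (split): the GroupBy object knows which groups it formed
print(grouped.ngroups)            # 2
print(grouped.size().to_dict())   # {'A': 2, 'B': 2}

# Iterating yields (key, sub-DataFrame) pairs, one per group
splits = {}
for team, group in grouped:
    splits[team] = group['Points'].tolist()
    # Step 3 (apply) for this group alone: the mean of its Points
    print(team, splits[team], group['Points'].mean())
# prints: A [10, 15] 12.5   then   B [10, 20] 15.0
```

The printed lists match the "Data state" column of step 3, and the printed means match its "Result" column.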
Variable Tracker
Variable | Start | After Step 1 | After Step 2 | After Step 3 | After Step 4 | Final
df | None | DataFrame with 4 rows | Same DataFrame | Same DataFrame | Same DataFrame | Same DataFrame
groups (intermediate object, not assigned to a name in the code) | None | None | DataFrameGroupBy object with keys 'A' and 'B' | Same groups | Same groups | Same groups
result | None | None | None | Series with means per group | Same Series | Same Series
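The one-line expression in the Execution Sample never names its intermediate objects. The sketch below splits the chain apart so each object can be inspected. The names `grouped` and `points_by_team` are hypothetical, chosen for illustration. Selecting `'Points'` from a DataFrameGroupBy gives a SeriesGroupBy, and only the final aggregation produces an ordinary Series.

```python
import pandas as pd

df = pd.DataFrame({'Team': ['A', 'A', 'B', 'B'], 'Points': [10, 15, 10, 20]})

grouped = df.groupby('Team')         # step 2: groups formed
points_by_team = grouped['Points']   # the Points column within each group
result = points_by_team.mean()       # steps 3-4: apply mean, combine

print(type(grouped).__name__)          # DataFrameGroupBy
print(type(points_by_team).__name__)   # SeriesGroupBy
print(type(result).__name__)           # Series
print(result.index.name, result.name)  # Team Points
```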
Key Moments - 3 Insights
Why do we split the data before applying the function?
Splitting partitions the rows by a key (here, 'Team') so that the function runs separately on each group's rows, as shown in steps 2 and 3 of the Execution Table.
What does 'combine' mean in this model?
Combine means gathering the per-group results back into one structure, such as the Series indexed by Team in steps 4 and 5.
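The combined result does not have to be a Series. The sketch below shows two standard ways to get a DataFrame with Team as an ordinary column instead of the index: `as_index=False` on `groupby`, or `reset_index()` on the Series afterwards. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Team': ['A', 'A', 'B', 'B'], 'Points': [10, 15, 10, 20]})

# Default combine: a Series indexed by the group keys
as_series = df.groupby('Team')['Points'].mean()

# Alternative combine: keep Team as a regular column of a DataFrame
as_frame = df.groupby('Team', as_index=False)['Points'].mean()

# Equivalent route: move the Series index back into a column afterwards
via_reset = as_series.reset_index()

print(as_frame)
print(list(as_frame.columns))   # ['Team', 'Points']
```

Which form to choose is mostly a matter of what comes next. A Series indexed by Team is convenient for lookups such as `as_series['A']`. A flat DataFrame is easier to merge or export.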
Is the original DataFrame changed after groupby?
No. The original DataFrame stays the same; groupby builds groups without modifying it, as the 'df' row of the Variable Tracker shows.
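This can be checked directly by taking a copy of the DataFrame before grouping and comparing afterwards. The name `snapshot` is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Team': ['A', 'A', 'B', 'B'], 'Points': [10, 15, 10, 20]})
snapshot = df.copy()  # independent copy taken before grouping

result = df.groupby('Team')['Points'].mean()

print(df.equals(snapshot))  # True: groupby left df untouched
print(df.shape)             # (4, 2)
```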
Visual Quiz - 3 Questions
Test your understanding
Using the Execution Table, what is the mean Points value for group 'A' after step 3?
A) 12.5
B) 15.0
C) 10
D) 20
💡 Hint: Check the 'Apply mean' row (step 3) in the Execution Table.
At which step are the groups actually formed?
A) Step 1
B) Step 2
C) Step 3
D) Step 4
💡 Hint: Look at the 'Group by' action in the Execution Table.
If we changed the function from mean to sum, what would change in the Execution Table?
A) The groups would change
B) The DataFrame would have more rows
C) The step 3 result values would be sums instead of means
D) Nothing would change
💡 Hint: Focus on the apply step (step 3) in the Execution Table.
Concept Snapshot
Split-apply-combine model:
1. Split data into groups by key
2. Apply function to each group separately
3. Combine results into one output
In pandas, groupby() followed by an aggregation such as mean() performs all three steps in one expression
Example: df.groupby('key')['value'].mean()
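Only the apply step changes when a different function is used; the split and combine steps stay the same. The sketch below swaps `mean` for the built-in `sum`. It also passes a custom function to `agg`, here the range of each group (max minus min), which was chosen only as an illustration.

```python
import pandas as pd

df = pd.DataFrame({'Team': ['A', 'A', 'B', 'B'], 'Points': [10, 15, 10, 20]})
grouped_points = df.groupby('Team')['Points']

# Same split and combine, different apply: total Points per team
totals = grouped_points.sum()

# A custom function in the apply step: range of Points within each team
spread = grouped_points.agg(lambda s: s.max() - s.min())

print(totals.to_dict())   # {'A': 25, 'B': 30}
print(spread.to_dict())   # {'A': 5, 'B': 10}
```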
Full Transcript
The split-apply-combine model starts with a DataFrame, splits its rows into groups by a key column, applies a function such as mean to each group separately, then combines the per-group results into one output. In the example, we group by 'Team', calculate the average 'Points' per team, and get a Series of those averages indexed by Team (A = 12.5, B = 15.0). The original DataFrame stays unchanged. The same pattern works for any per-group question: swap the key column to change how the rows are split, or swap the function to change what is computed for each group.