
GroupBy with custom functions in Pandas - Step-by-Step Execution

Concept Flow - GroupBy with custom functions
1. Start with DataFrame
2. Group data by column(s)
3. Apply custom function to each group
4. Combine results into new DataFrame
5. Output grouped summary
We start with a table, group rows by a column, apply a custom function to each group, then combine the results.
Execution Sample
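import pandas as pd

# Two rows of points for each team
df = pd.DataFrame({'Team': ['A', 'A', 'B', 'B'], 'Points': [10, 15, 10, 20]})

def range_func(x):
    # x is the Series of 'Points' values for one team
    return x.max() - x.min()

# Group rows by 'Team', select 'Points', and apply range_func to each group
result = df.groupby('Team')['Points'].apply(range_func)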
This code groups points by team and applies a custom function to find the range of points in each team.
Execution Table
| Step | Action | Group | Data in Group | Custom Function Applied | Result |
|---|---|---|---|---|---|
| 1 | Group data by 'Team' | A | [10, 15] | range_func([10, 15]) | 15 - 10 = 5 |
| 2 | Group data by 'Team' | B | [10, 20] | range_func([10, 20]) | 20 - 10 = 10 |
| 3 | Combine results | - | - | - | {'A': 5, 'B': 10} |
| 4 | End | - | - | - | Grouping and custom function complete |
💡 All groups processed, results combined into final output
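The steps in the table can be reproduced by hand with an explicit loop over the groups. This is only a sketch to make the split, apply, and combine stages visible; `manual` and `manual_result` are names introduced here for illustration and are not part of the original sample.

```python
import pandas as pd

df = pd.DataFrame({'Team': ['A', 'A', 'B', 'B'], 'Points': [10, 15, 10, 20]})

def range_func(x):
    return x.max() - x.min()

# Steps 1 and 2: visit each group and apply the custom function to it.
# Iterating a grouped column yields (group key, Series of that group's values).
manual = {}  # illustrative name, not from the original sample
for team, points in df.groupby('Team')['Points']:
    manual[team] = int(range_func(points))

# Step 3: combine the per-group values into a Series keyed by team.
manual_result = pd.Series(manual, name='Points')
```

In practice `apply` does this bookkeeping for you; the loop is useful mainly for debugging a custom function one group at a time.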
Variable Tracker
VariableStartAfter Step 1After Step 2After Step 3Final
df{'Team': ['A','A','B','B'], 'Points': [10,15,10,20]}Grouped by 'Team'Grouped by 'Team'Grouped by 'Team'Unchanged
group 'A'N/A[10, 15][10, 15][10, 15]Used for calculation
group 'B'N/AN/A[10, 20][10, 20]Used for calculation
resultN/AN/AN/A{'A': 5, 'B': 10}{'A': 5, 'B': 10}
Key Moments - 3 Insights
Why does the custom function receive only the 'Points' data for each group?
Because we select the 'Points' column before applying the function (df.groupby('Team')['Points'].apply), the function receives only that column's values for each group, as shown in steps 1 and 2 of the Execution Table.
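One way to check this is to record what the function is handed on each call. The sketch below assumes the same DataFrame as the sample; `inspect_group` and `received` are hypothetical names used only for this demonstration.

```python
import pandas as pd

df = pd.DataFrame({'Team': ['A', 'A', 'B', 'B'], 'Points': [10, 15, 10, 20]})

received = []  # hypothetical log of what the function sees on each call

def inspect_group(x):
    # Record the type and values of the argument, then compute the range.
    received.append((type(x).__name__, x.tolist()))
    return x.max() - x.min()

df.groupby('Team')['Points'].apply(inspect_group)

# Each logged entry is a Series holding one team's points.
for kind, values in received:
    print(kind, values)
```

Because 'Points' was selected first, the 'Team' column never reaches the function.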
How does pandas combine the results from each group after applying the custom function?
Pandas collects the output of each group's function call and builds a new Series indexed by the group keys, as seen in step 3 of the Execution Table.
What happens if the custom function returns a single value for each group?
Pandas creates a Series with the group keys as its index and the returned values as its data, like the final 'result' variable in the Variable Tracker.
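A quick way to confirm the shape of the output is to inspect the index and values of `result` from the sample. This is a minimal check, not a full description of every return shape `apply` can produce.

```python
import pandas as pd

df = pd.DataFrame({'Team': ['A', 'A', 'B', 'B'], 'Points': [10, 15, 10, 20]})

def range_func(x):
    return x.max() - x.min()

result = df.groupby('Team')['Points'].apply(range_func)

# One scalar per group comes back as a Series keyed by the group labels.
print(isinstance(result, pd.Series))  # True
print(result.index.tolist())          # ['A', 'B']
print(result.tolist())                # [5, 10]
```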
Visual Quiz - 3 Questions
Test your understanding
Look at Step 2 of the Execution Table. What is the result of applying the custom function to group 'B'?
A. 5
B. 15
C. 10
D. 20
💡 Hint
Check the 'Result' column for Step 2 in the Execution Table.
At which step does pandas combine the results from all groups?
A. Step 3
B. Step 1
C. Step 2
D. Step 4
💡 Hint
Look for the step where the 'Combine results' action happens in the Execution Table.
If the custom function returned the sum instead of the range, what would be the result for group 'A'?
A. 15
B. 25
C. 10
D. 5
💡 Hint
Sum of [10, 15] is 25; check the Variable Tracker for group 'A' data.
Concept Snapshot
GroupBy with custom functions:
- Use df.groupby('col')['target'].apply(func)
- func gets each group's target column data
- func returns a value per group
- pandas combines results into Series
- Useful for custom summaries beyond built-in agg
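The snapshot mentions `agg` only for built-in summaries, but a custom function that returns one value per group can also be passed to `agg`. The sketch below is an addition to the original sample: it shows that `agg` gives the same per-team range as `apply` here and can place the custom function next to built-ins. The names `via_apply`, `via_agg`, and `summary` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Team': ['A', 'A', 'B', 'B'], 'Points': [10, 15, 10, 20]})

def range_func(x):
    return x.max() - x.min()

grouped = df.groupby('Team')['Points']

# Same custom function, two entry points.
via_apply = grouped.apply(range_func)
via_agg = grouped.agg(range_func)
print(via_apply.tolist() == via_agg.tolist())  # True

# agg also accepts a list, mixing built-in names with the custom function.
# The custom column takes its label from the function's __name__.
summary = grouped.agg(['min', 'max', range_func])
print(summary.columns.tolist())  # ['min', 'max', 'range_func']
```

Which entry point to use is a judgment call: `agg` signals that each group reduces to a single value, while `apply` also allows functions that return something other than a scalar.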
Full Transcript
We start with a DataFrame and group rows by a column, here 'Team'. For each group, we select the 'Points' column and apply a custom function that calculates the range (max - min). The function runs separately on each group's points. The results are combined into a new Series indexed by team names. This process lets us summarize data in flexible ways using our own functions.