transform() for group-level operations in Pandas

Introduction

transform() applies a calculation to each group and returns a result aligned with the original rows, so the data keeps its original number of rows. This makes it a convenient way to add group-level values as new columns. Typical situations where it helps:

You want to add a new column showing the average value within each group.
You need to normalize data inside groups but keep the original table size.
You want to fill missing values in each group using group-specific statistics.
You want to compare each row to the group's maximum or minimum value.
You want to create a new column that ranks values within each group.
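A few of these use cases can be sketched together. This is only an illustration with made-up data; the column names Score_Filled, Gap_To_Max, and Rank_In_Team are hypothetical and not part of pandas.

```python
import numpy as np
import pandas as pd

# Hypothetical data: team A has one missing score
df = pd.DataFrame({
    "Team": ["A", "A", "A", "B", "B"],
    "Score": [10.0, np.nan, 20.0, 4.0, 8.0],
})

# Fill each missing value with the mean of its own group
# (the group mean ignores NaN by default)
group_mean = df.groupby("Team")["Score"].transform("mean")
df["Score_Filled"] = df["Score"].fillna(group_mean)

# Compare each row to its group's maximum
group_max = df.groupby("Team")["Score_Filled"].transform("max")
df["Gap_To_Max"] = group_max - df["Score_Filled"]

# Rank values within each group (1 = lowest score in the team)
df["Rank_In_Team"] = df.groupby("Team")["Score_Filled"].transform("rank")

print(df)
```

Every new column has one value per original row, so each can be assigned straight back to df.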
Syntax
Pandas
DataFrame.groupby('column')['column_to_transform'].transform(function)

The function can be the name of a built-in groupby method given as a string, such as 'mean' or 'max', or a custom function.

transform() returns a Series (when one column is selected) or a DataFrame (when several are) with the same index and the same number of rows as the original data.
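As a rough illustration of the syntax, the sketch below passes a string name and two custom functions to transform() and checks that the results line up with the original rows. The data and variable names are made up for the example.

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({
    "Team": ["A", "A", "B", "B"],
    "Score": [10, 15, 10, 20],
})
scores = df.groupby("Team")["Score"]

# 1. A built-in method given by name
by_name = scores.transform("mean")

# 2. A custom function that returns one value per group
by_callable = scores.transform(lambda s: s.mean())

# 3. A custom function that returns one value per row of the group
centered = scores.transform(lambda s: s - s.mean())

# Each result has one entry per original row and shares the original index
print(len(by_name) == len(df))
print(by_name.index.equals(df.index))
print(centered.tolist())
```

The string form is usually the better choice when a built-in exists, since pandas can use its optimized implementation instead of calling Python code once per group.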

Examples
Calculate the average score for each team and assign it to each row in the original DataFrame.
Pandas
df.groupby('Team')['Score'].transform('mean')
Subtract the minimum value in each category from each value, keeping the original shape.
Pandas
df.groupby('Category')['Value'].transform(lambda x: x - x.min())
Find the maximum value in each group and assign it to all rows in that group.
Pandas
df.groupby('Group')['Data'].transform('max')
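The snippets above assume a DataFrame that already has the named columns. A runnable sketch of the second example, using made-up data, might look like the following; the min-max scaling step and the column names From_Min and Scaled are additions for illustration only.

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({
    "Category": ["x", "x", "x", "y", "y"],
    "Value": [3, 5, 7, 10, 30],
})
values = df.groupby("Category")["Value"]

# Shift each category so its smallest value becomes zero
df["From_Min"] = values.transform(lambda x: x - x.min())

# Min-max scale within each category to the range [0, 1]
df["Scaled"] = values.transform(lambda x: (x - x.min()) / (x.max() - x.min()))

print(df)
```

Note that a category whose values are all identical would make the denominator zero in the scaling step, so real data may need a guard for that case.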
Sample Program

This code creates a small table with teams and scores. It then calculates the average score for each team and adds it as a new column. The original number of rows stays the same.

Pandas
import pandas as pd

data = {'Team': ['A', 'A', 'B', 'B', 'C', 'C'],
        'Score': [10, 15, 10, 20, 5, 7]}
df = pd.DataFrame(data)

# Calculate mean score per team and add as new column
df['Team_Mean_Score'] = df.groupby('Team')['Score'].transform('mean')

print(df)
Output
  Team  Score  Team_Mean_Score
0    A     10             12.5
1    A     15             12.5
2    B     10             15.0
3    B     20             15.0
4    C      5              6.0
5    C      7              6.0
Important Notes

transform() returns one value per original row, unlike agg(), which returns one row per group.
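To make the difference concrete, here is a small sketch that runs both methods on the teams and scores from the sample program. The variable names per_group and per_row are just for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "Team": ["A", "A", "B", "B", "C", "C"],
    "Score": [10, 15, 10, 20, 5, 7],
})

# agg() collapses each team to a single row, indexed by team
per_group = df.groupby("Team")["Score"].agg("mean")

# transform() returns one value per original row, indexed like df
per_row = df.groupby("Team")["Score"].transform("mean")

print(per_group.shape)  # (3,)
print(per_row.shape)    # (6,)

# Mapping the aggregated values back onto the Team column
# gives the same numbers that transform() produces directly
mapped_back = df["Team"].map(per_group)
print(mapped_back.tolist() == per_row.tolist())
```

In other words, transform() saves the extra step of aggregating and then mapping or merging the result back onto the original rows.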

You can use built-in functions like 'mean', 'sum', or your own custom functions with transform().

If your function returns a single value per group, transform() repeats it for each row in that group. Otherwise, the function should return a result with the same length as the group it receives.
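A minimal sketch of this repeating behavior, using made-up data and a hypothetical Group_Range column: the custom function returns one number per group, and that number appears on every row of the group.

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({
    "Group": ["g1", "g1", "g1", "g2", "g2"],
    "Data": [1, 4, 9, 2, 6],
})

# The lambda returns a single number per group (max minus min);
# transform() repeats it on every row belonging to that group
df["Group_Range"] = df.groupby("Group")["Data"].transform(
    lambda s: s.max() - s.min()
)

print(df)
```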

Summary

transform() applies a function to each group but keeps the original data size.

It is useful to add group-level calculations as new columns without losing rows.

You can use built-in or custom functions with transform().