What is transform() for group-level operations in Pandas?

Introduction

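We use transform() to compute a group-level result for every row. pandas splits the data into groups, applies a function to each group, and returns the results aligned with the original index, so the output has the same number of rows as the input. Typical situations where this helps: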

You want to add a new column showing the average value within each group.

You need to normalize data inside groups but keep the original table size.

You want to fill missing values in each group using group-specific statistics.

You want to compare each row to the group's maximum or minimum value.

You want to create a new column that ranks values within each group.
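Several of the situations above can be sketched in a few lines. This is a minimal example on hypothetical data (the `Group` and `Value` column names and their values are made up for illustration). It normalizes values within each group, fills a missing value with its own group's mean, and ranks values per group.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: a 'Group' key and a 'Value' column with one gap.
df = pd.DataFrame({
    "Group": ["x", "x", "x", "y", "y", "y"],
    "Value": [1.0, 2.0, np.nan, 10.0, 20.0, 30.0],
})

g = df.groupby("Group")["Value"]

# Normalize within each group: subtract the group mean, divide by the group std.
# The missing value stays NaN; pandas skips it when computing mean and std.
df["Value_z"] = (df["Value"] - g.transform("mean")) / g.transform("std")

# Fill the missing value with the mean of the row's own group (1.5 for 'x').
df["Value_filled"] = df["Value"].fillna(g.transform("mean"))

# Rank values inside each group (1.0 = smallest); NaN stays unranked.
df["Value_rank"] = g.transform("rank")

print(df)
```

Each transform() call returns a Series indexed like `df`, which is why it can be combined with `df["Value"]` or passed to `fillna()` without any manual alignment.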

Syntax

Pandas

DataFrame.groupby('column')['column_to_transform'].transform(function)

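The function can be a string naming a built-in method, such as 'mean' or 'max', or a callable, such as a lambda or your own function, which receives each group's values as a Series.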
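To illustrate the two forms, here is a small sketch on hypothetical data reusing this page's `Team` and `Score` column names. The string form and an equivalent lambda produce the same per-row result.

```python
import pandas as pd

# Hypothetical data reusing the Team/Score column names from this page.
df = pd.DataFrame({"Team": ["A", "A", "B"], "Score": [10, 20, 40]})

by_team = df.groupby("Team")["Score"]

# A string names a built-in groupby method.
from_string = by_team.transform("mean")

# A callable is called once per group with that group's values as a Series.
from_callable = by_team.transform(lambda s: s.mean())

print(from_string.tolist())  # [15.0, 15.0, 40.0]
print(from_callable.tolist())
```

The string form is usually preferable for common statistics because pandas can dispatch it to an optimized implementation, while the callable form is the general escape hatch for custom logic.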

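When you select a single column, transform() returns a Series; when you call it on the whole grouped DataFrame, it returns a DataFrame of the non-grouping columns. Either way, the result has the same index, and therefore the same number of rows, as the original data.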
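The difference between the Series and DataFrame results can be seen in a short sketch. The data and the extra `Assists` column are hypothetical, added only so there is more than one column to transform.

```python
import pandas as pd

# Hypothetical data with two numeric columns besides the 'Team' key.
df = pd.DataFrame({
    "Team": ["A", "A", "B"],
    "Score": [10, 20, 40],
    "Assists": [1, 3, 5],
})

# Selecting one column gives a SeriesGroupBy, so transform returns a Series.
as_series = df.groupby("Team")["Score"].transform("sum")

# Transforming the whole DataFrameGroupBy returns a DataFrame of the
# non-grouping columns; the 'Team' key column is not included.
as_frame = df.groupby("Team").transform("sum")

print(type(as_series).__name__, as_series.shape)  # Series (3,)
print(type(as_frame).__name__, as_frame.shape)    # DataFrame (3, 2)
print(as_frame.index.equals(df.index))            # True
```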

Examples

Calculate the average score for each team and return it for every row, aligned with the original DataFrame.

Pandas

df.groupby('Team')['Score'].transform('mean')

Subtract the minimum value in each category from each value, keeping the original shape.

Pandas

df.groupby('Category')['Value'].transform(lambda x: x - x.min())
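The one-liner above assumes a `df` with `Category` and `Value` columns. A runnable version on made-up data might look like this; because the lambda returns a Series as long as the group, each row keeps its own shifted value instead of a repeated one.

```python
import pandas as pd

# Hypothetical data using the Category/Value column names from the example.
df = pd.DataFrame({
    "Category": ["a", "a", "b", "b", "b"],
    "Value": [5, 8, 2, 7, 3],
})

# Each group arrives as a Series; x - x.min() has the same length as the group,
# so the result maps back onto the original rows one-to-one.
df["Value_from_min"] = df.groupby("Category")["Value"].transform(lambda x: x - x.min())

print(df["Value_from_min"].tolist())  # [0, 3, 0, 5, 1]
```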

Find the maximum value in each group and assign it to all rows in that group.

Pandas

df.groupby('Group')['Data'].transform('max')
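Because the broadcast maximum lines up with the original rows, it can be compared against each row directly, which covers the "compare each row to the group's maximum" case from the introduction. A minimal sketch with hypothetical data and hypothetical new column names (`Share_of_max`, `Is_group_max`):

```python
import pandas as pd

# Hypothetical data using the Group/Data column names from the example.
df = pd.DataFrame({
    "Group": ["g1", "g1", "g2", "g2"],
    "Data": [4, 9, 6, 2],
})

# Group maximum repeated on every row of that group.
group_max = df.groupby("Group")["Data"].transform("max")

# Aligned with df, so plain column arithmetic and comparisons work row by row.
df["Share_of_max"] = df["Data"] / group_max
df["Is_group_max"] = df["Data"] == group_max

# Keep only the rows that hold their group's maximum.
print(df[df["Is_group_max"]])
```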

Sample Program

This code creates a small table with teams and scores. It then calculates the average score for each team and adds it as a new column. The original number of rows stays the same.

Pandas

import pandas as pd

data = {'Team': ['A', 'A', 'B', 'B', 'C', 'C'],
        'Score': [10, 15, 10, 20, 5, 7]}
df = pd.DataFrame(data)

# Calculate mean score per team and add as new column
df['Team_Mean_Score'] = df.groupby('Team')['Score'].transform('mean')

print(df)

Output

  Team  Score  Team_Mean_Score
0    A     10             12.5
1    A     15             12.5
2    B     10             15.0
3    B     20             15.0
4    C      5              6.0
5    C      7              6.0
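A natural next step, sketched here as an illustration rather than part of the original program, is to use the new column in ordinary arithmetic. The `Diff_From_Team_Mean` column name is hypothetical.

```python
import pandas as pd

# Same data as the sample program.
data = {'Team': ['A', 'A', 'B', 'B', 'C', 'C'],
        'Score': [10, 15, 10, 20, 5, 7]}
df = pd.DataFrame(data)

df['Team_Mean_Score'] = df.groupby('Team')['Score'].transform('mean')

# The team mean lines up row by row, so each score can be compared to it directly.
df['Diff_From_Team_Mean'] = df['Score'] - df['Team_Mean_Score']

print(df['Diff_From_Team_Mean'].tolist())  # [-2.5, 2.5, -5.0, 5.0, -1.0, 1.0]
```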

Important Notes

transform() returns one value per original row, whereas agg() returns one value per group, so agg() reduces the number of rows.
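The contrast with agg() is easiest to see by running both on the same grouped column. This sketch reuses the sample program's data.

```python
import pandas as pd

df = pd.DataFrame({'Team': ['A', 'A', 'B', 'B', 'C', 'C'],
                   'Score': [10, 15, 10, 20, 5, 7]})

per_group = df.groupby('Team')['Score'].agg('mean')      # one value per team
per_row = df.groupby('Team')['Score'].transform('mean')  # one value per original row

print(per_group.shape, per_row.shape)  # (3,) (6,)
print(per_group.index.tolist())        # ['A', 'B', 'C']
print(per_row.index.tolist())          # [0, 1, 2, 3, 4, 5]
```

The agg() result is indexed by team, while the transform() result keeps the original row index, which is what allows it to be assigned straight back as a new column.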

You can use built-in functions like 'mean', 'sum', or your own custom functions with transform().

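If your function returns a single value per group, transform() repeats it for each row in that group. Otherwise, the function must return a result of the same length as the group, and transform() places each value back on its own row.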
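Both return shapes can be shown side by side in a small sketch on hypothetical data: a lambda that returns one number per group (broadcast) and one that returns a Series as long as the group (kept per row).

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({'Team': ['A', 'A', 'B'], 'Score': [10, 20, 40]})
by_team = df.groupby('Team')['Score']

# Returns one number per group (the range), so it is repeated on every row.
broadcast = by_team.transform(lambda s: s.max() - s.min())

# Returns a Series the same length as the group, so values stay per row.
per_row = by_team.transform(lambda s: s.cumsum())

print(broadcast.tolist())  # [10, 10, 0]
print(per_row.tolist())    # [10, 30, 40]
```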

Summary

transform() applies a function to each group but keeps the original number of rows.

It is useful to add group-level calculations as new columns without losing rows.

You can use built-in or custom functions with transform().