We use transform() to compute a value for each group and place it back on every row of that group. It lets us apply group-level calculations while keeping the same number of rows, in the same order, as the original data.
transform() for group-level operations in Pandas
DataFrame.groupby('column')['column_to_transform'].transform(function)
The function can be a string like 'mean', 'max', or a custom function.
transform() returns a Series (when one column is selected) or a DataFrame (when several are selected) with the same index and the same number of rows as the original data.
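To make the return type concrete, here is a minimal sketch. The small table and the 'Bonus' column are made up for illustration and are not part of the lesson's data.

```python
import pandas as pd

# Hypothetical sample data, used only for this illustration
df = pd.DataFrame({
    "Team": ["A", "A", "B"],
    "Score": [10, 20, 30],
    "Bonus": [1, 3, 5],
})

# One selected column: the result is a Series aligned to df's index
one_col = df.groupby("Team")["Score"].transform("mean")

# Several selected columns: the result is a DataFrame aligned to df's index
two_cols = df.groupby("Team")[["Score", "Bonus"]].transform("mean")

print(type(one_col).__name__, one_col.shape)
print(type(two_cols).__name__, two_cols.shape)
```

In both cases the result has one row per original row, so it can be assigned straight back to the DataFrame as new columns.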
df.groupby('Team')['Score'].transform('mean')
df.groupby('Category')['Value'].transform(lambda x: x - x.min())
df.groupby('Group')['Data'].transform('max')
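The one-line examples above assume a DataFrame that already exists. As a rough sketch, the lambda example could be run like this. The values are invented, and 'Value_Above_Min' is a hypothetical column name chosen for this illustration.

```python
import pandas as pd

# Hypothetical data; the column names follow the example above
df = pd.DataFrame({
    "Category": ["x", "x", "y", "y"],
    "Value": [4, 9, 10, 12],
})

# Subtract each category's minimum from every value in that category.
# The lambda receives one group's values at a time as a Series.
df["Value_Above_Min"] = (
    df.groupby("Category")["Value"].transform(lambda x: x - x.min())
)

print(df)
```

Here the lambda returns a Series as long as the group itself, so each row gets its own result instead of a single repeated value.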
This code creates a small table with teams and scores. It then calculates the average score for each team and adds it as a new column. The original number of rows stays the same.
import pandas as pd

data = {'Team': ['A', 'A', 'B', 'B', 'C', 'C'],
        'Score': [10, 15, 10, 20, 5, 7]}
df = pd.DataFrame(data)

# Calculate mean score per team and add as new column
df['Team_Mean_Score'] = df.groupby('Team')['Score'].transform('mean')
print(df)
transform() returns one result for every original row, unlike agg(), which returns one row per group.
You can use built-in functions like 'mean', 'sum', or your own custom functions with transform().
If your function returns a single value per group, transform() repeats it for each row in that group.
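The notes above can be checked with a short comparison. This is only a sketch: it reuses the team and score data from the example, and score_range is a hypothetical helper written for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Team": ["A", "A", "B", "B", "C", "C"],
    "Score": [10, 15, 10, 20, 5, 7],
})

def score_range(s):
    # Hypothetical custom function: returns a single value per group
    return s.max() - s.min()

# agg() reduces the data to one row per team
per_group = df.groupby("Team")["Score"].agg(score_range)

# transform() repeats each team's value on every row of that team
per_row = df.groupby("Team")["Score"].transform(score_range)

print(len(per_group), len(per_row))
print(per_row.tolist())
```

The same function gives three values with agg() and six with transform(), because transform() repeats the single group value on each row of the group.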
transform() applies a function to each group but keeps the original data size.
It is useful to add group-level calculations as new columns without losing rows.
You can use built-in or custom functions with transform().