
transform() for group-level operations in Data Analysis Python


Introduction

The pandas transform() method, called on a grouped object, computes a value for each group and returns a result aligned row by row with the original data. This lets you add or adjust columns using group-level information without changing the shape of the data.

It is a good fit in situations like these:

You want to add a new column showing each person's score compared to their group's average.

You need to fill missing values in a column with the average of the group each row belongs to.

You want to create a column that shows the rank of each item within its group.

You want to normalize values inside each group without losing the original row structure.
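The last three situations can be sketched in a few lines. This is a minimal illustration, not a definitive recipe: the data and the column names (Group, Score, Score_Filled, Rank_in_Group, Z_in_Group) are made up for the example, and the normalization shown is a z-score using the sample standard deviation, which is only one of several ways to normalize.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: two groups, one missing score in group B.
df = pd.DataFrame({
    "Group": ["A", "A", "A", "B", "B", "B"],
    "Score": [10.0, 20.0, 30.0, 40.0, np.nan, 60.0],
})

# Fill each missing value with the mean of its own group.
# The group mean skips NaN, so group B's mean is (40 + 60) / 2.
group_mean = df.groupby("Group")["Score"].transform("mean")
df["Score_Filled"] = df["Score"].fillna(group_mean)

filled = df.groupby("Group")["Score_Filled"]

# Rank within each group (1 = lowest score in the group).
df["Rank_in_Group"] = filled.transform("rank")

# Normalize within each group: subtract the group mean,
# then divide by the group's sample standard deviation.
df["Z_in_Group"] = (df["Score_Filled"] - filled.transform("mean")) / filled.transform("std")

print(df)
```

Every new column has six values, one per original row, which is what makes direct assignment back to df possible.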

Syntax


DataFrame.groupby('column')['target_column'].transform(function)

transform() applies the function to each group and returns a result with the same length and index as the original data.

Because the result lines up with the original rows, you can assign it directly as a new or replacement column without changing the number of rows.
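A quick way to see the shape guarantee is to compare the result's length and index with the original DataFrame. The sketch below uses made-up data with a custom index (w, x, y, z) and a hypothetical Team_Total column, chosen only to make the alignment visible.

```python
import pandas as pd

# Hypothetical data with a non-default index to show that row labels are preserved.
df = pd.DataFrame(
    {"Team": ["A", "B", "A", "B"], "Score": [1, 2, 3, 4]},
    index=["w", "x", "y", "z"],
)

team_total = df.groupby("Team")["Score"].transform("sum")

# Same length and same index as df, so it can be assigned directly as a column.
print(len(team_total) == len(df))
print(team_total.index.equals(df.index))

df["Team_Total"] = team_total
print(df)
```

The rows stay in their original order even though the teams are interleaved, because the result is aligned on the index rather than sorted by group.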

Examples

Calculate the average score for each team and assign it to each member's row.


df.groupby('Team')['Score'].transform('mean')

Subtract the minimum value in each category from each value in that category.


df.groupby('Category')['Value'].transform(lambda x: x - x.min())
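To make the lambda version concrete, here is the same expression run on a small, made-up table. The values and the Above_Min column name are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical data: two categories with different minimums (5 and 20).
df = pd.DataFrame({
    "Category": ["X", "X", "Y", "Y", "Y"],
    "Value": [5, 8, 20, 25, 22],
})

# The lambda receives each category's values as a Series, one group at a time,
# and returns a Series of the same length as that group.
df["Above_Min"] = df.groupby("Category")["Value"].transform(lambda x: x - x.min())

print(df)
```

Each category's smallest value becomes 0, and the other values show their distance from that minimum.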

Find the maximum sales in each group and assign it to all rows in that group.


df.groupby('Group')['Sales'].transform('max')

Sample Program

This code creates a table of players with their scores and teams. It then adds the average score for each team to every player in that team. Finally, it shows how much each player's score differs from their team's average.


import pandas as pd

data = {'Team': ['A', 'A', 'B', 'B', 'B'],
        'Player': ['John', 'Mike', 'Anna', 'Tom', 'Sara'],
        'Score': [10, 15, 10, 20, 30]}
df = pd.DataFrame(data)

# Calculate average score per team and add as new column
df['Team_Avg'] = df.groupby('Team')['Score'].transform('mean')

# Calculate score difference from team average
df['Diff_from_Avg'] = df['Score'] - df['Team_Avg']

print(df)

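Output

Running the program prints the five rows with two new columns: Team_Avg is 12.5 for both team A players and 20.0 for all three team B players, and Diff_from_Avg is -2.5, 2.5, -10.0, 0.0, and 10.0 for John, Mike, Anna, Tom, and Sara.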

Important Notes

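transform() keeps the original number of rows, unlike agg(), which returns one row per group.

You can pass the name of a built-in function as a string, such as 'mean' or 'max', or your own function, for example a lambda. A custom function should return either a single value per group, which is repeated for every row in that group, or a result with the same length as the group.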

It is useful when you want to add group-level info back to the original data.
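The difference from agg() noted above is easiest to see side by side. This sketch reuses the team scores from the sample program; the variable names (per_group, per_row, score_range) are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Team": ["A", "A", "B", "B", "B"],
    "Score": [10, 15, 10, 20, 30],
})

scores = df.groupby("Team")["Score"]

per_group = scores.agg("mean")      # one value per team, indexed by team
per_row = scores.transform("mean")  # one value per original row

print(per_group)
print(per_row)

# A custom function that returns a single value per group
# has that value repeated for every row in the group.
score_range = scores.transform(lambda s: s.max() - s.min())
print(score_range.tolist())
```

Here per_group has two entries while per_row and score_range each have five, so only the latter two can be assigned straight back to df as columns.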

Summary

transform() applies a function to groups and returns a result matching the original data size.

It helps add or adjust columns based on group calculations without changing row count.

Use it to compare individual values to group stats or fill missing data within groups.