beginner

What does the groupby method do in pandas?

It splits the rows into groups based on some criterion, such as the values in a column, so that you can apply an operation to each group separately.
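
As a minimal sketch of the split-then-operate idea, the snippet below groups a small, made-up DataFrame by one column and computes a mean per group. The column names 'group' and 'value' and the data are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical toy data; the column names are illustrative only.
df = pd.DataFrame({
    "group": ["a", "a", "b", "b"],
    "value": [1.0, 3.0, 10.0, 20.0],
})

# Split the rows by 'group', then take the mean of 'value' inside each group.
group_means = df.groupby("group")["value"].mean()

print(group_means)
```

The result has one entry per group, indexed by the group label, rather than one entry per original row.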

beginner

What is the purpose of the transform function after using groupby?

It applies a function to each group and returns a result aligned to the original rows (same length and same index), so you can assign it straight back as a new column without losing the original structure.
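
To make the shape difference concrete, here is a small sketch comparing an aggregation with transform on the same grouping. The data and column names are assumptions for illustration.

```python
import pandas as pd

# Hypothetical toy data; the column names are illustrative only.
df = pd.DataFrame({
    "group": ["a", "a", "b", "b"],
    "value": [1.0, 3.0, 10.0, 20.0],
})

# Aggregation: one value per group (2 rows here).
per_group = df.groupby("group")["value"].mean()

# Transform: the group mean broadcast back to every original row (4 rows here).
per_row = df.groupby("group")["value"].transform("mean")

# Because per_row shares df's index, it can be stored as a new column.
df["group_mean"] = per_row
```

Here per_group is indexed by group label, while per_row keeps the index of df, which is what makes the column assignment line up.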

intermediate

How can you normalize data within groups using groupby and transform?

Subtract the group mean from each value and divide by the group standard deviation, which gives a per-group z-score. Use transform('mean') and transform('std') to get those group statistics aligned to every row.
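
The recipe above can be sketched as follows. The data and the column names 'group', 'value' and 'zscore' are assumptions for illustration. Note that pandas computes the sample standard deviation (ddof=1) by default.

```python
import pandas as pd

# Hypothetical toy data: two groups on very different scales.
df = pd.DataFrame({
    "group": ["a", "a", "a", "b", "b", "b"],
    "value": [1.0, 2.0, 3.0, 10.0, 20.0, 30.0],
})

grouped = df.groupby("group")["value"]

# Group statistics, repeated on every row of the matching group.
group_mean = grouped.transform("mean")
group_std = grouped.transform("std")  # sample std (ddof=1) by default

# Per-group z-score: centre on the group mean, scale by the group std.
df["zscore"] = (df["value"] - group_mean) / group_std

print(df)
```

After this step both groups are on the same scale, even though their raw values differ by a factor of ten.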

beginner

Why is normalization within groups useful in data analysis?

It helps compare values fairly by removing group-specific effects, making patterns clearer when groups have different scales or averages.

intermediate

Example: What does this code do?

df['normalized'] = df.groupby('group')['value'].transform(lambda x: (x - x.mean()) / x.std())

It creates a new column 'normalized' where each 'value' is adjusted by subtracting the mean and dividing by the standard deviation of its group, scaling values within each group.
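
The sketch below runs the same lambda on made-up data and shows one edge case worth knowing: a group with a single row has an undefined sample standard deviation, so its normalized value comes out as NaN. The data is an assumption for illustration.

```python
import pandas as pd

# Hypothetical toy data: group 'a' has three rows, group 'c' has only one.
df = pd.DataFrame({
    "group": ["a", "a", "a", "c"],
    "value": [1.0, 2.0, 3.0, 5.0],
})

# Same expression as in the card: z-score each value within its own group.
df["normalized"] = df.groupby("group")["value"].transform(
    lambda x: (x - x.mean()) / x.std()
)

# The single-row group has no sample std, so its result is NaN.
print(df)
```

A group whose values are all identical behaves similarly, because its standard deviation is zero. Whether to drop, fill, or leave such rows depends on the analysis.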

What does transform return when used after groupby?

A. A Series with the same length as the original data

B. A single aggregated value per group

C. A DataFrame with fewer rows

D. A list of groups

Which of these is a correct way to normalize values within groups using pandas?

A. df['norm'] = df['value'] / df['value'].max()

B. df['norm'] = df.groupby('group')['value'].sum()

C. df['norm'] = df['value'] - df['value'].mean()

D. df['norm'] = df.groupby('group')['value'].transform(lambda x: (x - x.mean()) / x.std())

Why might you use groupby before normalizing data?

A. To remove missing values

B. To apply normalization across the entire dataset

C. To normalize values within each group separately

D. To sort the data

What happens if you use transform('mean') on a grouped column?

A. It returns the mean of each group repeated for each row in that group

B. It returns the original values unchanged

C. It returns the sum of each group

D. It returns the mean of the entire column

Which pandas groupby method would you use to apply a custom function to each group while keeping the original data shape?

A. apply()

B. transform()

C. agg()

D. filter()

Explain how to normalize data within groups using pandas groupby and transform.

Why is it important to keep the original data shape when normalizing with transform?