Challenge - 5 Problems

❓ Predict Output

intermediate

Output of transform() with group mean

The following code uses transform() to calculate the group mean. What does df contain after it runs? Each option lists the columns of df as a dict of lists.

Data Analysis Python

import pandas as pd

df = pd.DataFrame({'Team': ['A', 'A', 'B', 'B', 'B'], 'Points': [10, 20, 30, 40, 50]})
df['MeanPoints'] = df.groupby('Team')['Points'].transform('mean')
print(df)

A. {'Team': ['A', 'A', 'B', 'B', 'B'], 'Points': [10, 20, 30, 40, 50], 'MeanPoints': [10.0, 20.0, 30.0, 40.0, 50.0]}

B. {'Team': ['A', 'A', 'B', 'B', 'B'], 'Points': [10, 20, 30, 40, 50], 'MeanPoints': [15.0, 15.0, 40.0, 40.0, 40.0]}

C. {'Team': ['A', 'A', 'B', 'B', 'B'], 'Points': [10, 20, 30, 40, 50], 'MeanPoints': [15.0, 20.0, 40.0, 40.0, 50.0]}

D. {'Team': ['A', 'A', 'B', 'B', 'B'], 'Points': [10, 20, 30, 40, 50], 'MeanPoints': [10.0, 10.0, 30.0, 30.0, 30.0]}
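
A minimal sketch of the same pattern on a small hypothetical DataFrame (the Dept and Salary columns are invented for illustration, not taken from the challenge), assuming only that pandas is installed. It contrasts agg('mean'), which returns one value per group, with transform('mean'), which repeats each group's mean on every row of that group.

```python
import pandas as pd

# Hypothetical data (not from the challenge): salaries in two departments.
staff = pd.DataFrame({
    "Dept": ["Ops", "Ops", "Ops", "Dev", "Dev"],
    "Salary": [40, 50, 60, 90, 110],
})

# agg('mean') collapses each group to a single value, indexed by group label.
dept_means = staff.groupby("Dept")["Salary"].agg("mean")

# transform('mean') computes the same per-group means but repeats each one
# on every row of its group, aligned to the original row index.
staff["DeptMean"] = staff.groupby("Dept")["Salary"].transform("mean")

print(dept_means)
print(staff)
```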

❓ Data Output

intermediate

Number of items after transform()

After applying transform() on a grouped DataFrame, how many rows does the resulting Series have compared to the original DataFrame?

Data Analysis Python

import pandas as pd

df = pd.DataFrame({'Category': ['X', 'X', 'Y', 'Y', 'Y'], 'Value': [5, 10, 15, 20, 25]})
result = df.groupby('Category')['Value'].transform('max')
print(len(result))
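
To check the row count on different data, the sketch below compares agg('max') with transform('max') on one grouped column. The Sensor and Temp columns are hypothetical, and pandas is assumed to be installed.

```python
import pandas as pd

# Hypothetical data (not from the challenge): 6 rows spread over 3 groups.
readings = pd.DataFrame({
    "Sensor": ["s1", "s1", "s2", "s2", "s2", "s3"],
    "Temp": [20, 22, 18, 19, 25, 30],
})
grouped = readings.groupby("Sensor")["Temp"]

per_group = grouped.agg("max")       # one row per group
per_row = grouped.transform("max")   # one row per original row

print(len(readings), len(per_group), len(per_row))  # prints: 6 3 6
```

The aggregated result has one row per group, while the transformed result has one row per original row and shares the original index.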

🔧 Debug

advanced

Identify the error in transform() usage

What error, if any, does the following code raise?

Data Analysis Python

import pandas as pd

df = pd.DataFrame({'Group': ['G1', 'G1', 'G2'], 'Score': [1, 2, 3]})
df['Result'] = df.groupby('Group')['Score'].transform(lambda x: x + x.shift())
print(df)

A. TypeError: unsupported operand type(s) for +: 'int' and 'NoneType'

B. KeyError: 'Score'

C. ValueError: Length of values does not match length of index

D. No error; prints the DataFrame with NaN in the first row of each group
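
The behaviour of shift() inside transform() can be checked on a small hypothetical dataset (the Store and Units columns are invented for illustration; pandas is assumed to be installed). Within each group, x.shift() has no previous value for the group's first row, so that position becomes NaN, and adding NaN to a number yields NaN rather than raising. The sketch also compares the lambda with GroupBy.shift(), which shifts within groups directly.

```python
import pandas as pd

# Hypothetical data (not from the challenge): unit counts for two stores.
sales = pd.DataFrame({
    "Store": ["S1", "S1", "S1", "S2", "S2"],
    "Units": [5, 7, 4, 10, 12],
})

# Inside transform, x is the Series for one group, so x.shift() moves values
# down by one within that group only. Each group's first row has no
# predecessor and becomes NaN; NaN + number is NaN, not an exception.
sales["PlusPrev"] = sales.groupby("Store")["Units"].transform(lambda x: x + x.shift())

# The same values without a lambda: GroupBy.shift() also shifts within groups.
alt = sales["Units"] + sales.groupby("Store")["Units"].shift()

print(sales)
```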

🚀 Application

advanced

Using transform() to normalize data by group

You want to normalize the 'Sales' column within each 'Region' by subtracting the group mean and dividing by the group standard deviation. Which code snippets correctly do this using transform()? Select all that apply.

Data Analysis Python

import pandas as pd

df = pd.DataFrame({'Region': ['East', 'East', 'West', 'West', 'West'], 'Sales': [100, 150, 200, 250, 300]})

A. df['Normalized'] = df.groupby('Region')['Sales'].transform(lambda x: (x - x.mean()) / x.std())

B. df['Normalized'] = df.groupby('Region')['Sales'].apply(lambda x: (x - x.mean()) / x.std())

C. df['Normalized'] = df['Sales'] / df.groupby('Region')['Sales'].transform('mean')

D. df['Normalized'] = (df['Sales'] - df.groupby('Region')['Sales'].transform('mean')) / df.groupby('Region')['Sales'].transform('std')
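
A minimal sketch of per-group standardization on hypothetical data (the Class and Score columns are invented for illustration), assuming pandas is installed. It builds the z-score once with a single lambda passed to transform() and once by broadcasting transform('mean') and transform('std'), then checks that the two agree. pandas' std() defaults to the sample standard deviation (ddof=1), so a group with only one row would get NaN.

```python
import pandas as pd

# Hypothetical data (not from the challenge): scores in two classes.
scores = pd.DataFrame({
    "Class": ["C1", "C1", "C1", "C2", "C2", "C2"],
    "Score": [60, 70, 80, 10, 20, 30],
})
by_class = scores.groupby("Class")["Score"]

# Version 1: one transform whose lambda sees each group's Series.
z1 = by_class.transform(lambda x: (x - x.mean()) / x.std())

# Version 2: broadcast the group mean and sample std (ddof=1) with two
# built-in transforms, then do the arithmetic on whole columns.
z2 = (scores["Score"] - by_class.transform("mean")) / by_class.transform("std")

scores["Z"] = z1
print(scores)
```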

🧠 Conceptual

expert

Why use transform() instead of apply() for group-level operations?

Which statement best explains why transform() is preferred over apply() when you want to add a group-level statistic as a new column to the original DataFrame?

A. apply() always returns a scalar value for each group, so it cannot be used to create new columns.

B. transform() modifies the original DataFrame in place, while apply() creates a copy.

C. transform() returns a Series with the same index and length as the original DataFrame, allowing direct assignment as a new column.

D. apply() is slower because it does not support lambda functions, unlike transform().
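
The index alignment behind this can be seen in a small sketch on hypothetical data (the Customer and Amount columns and the non-default row labels are invented for illustration; pandas is assumed to be installed). The agg('sum') result is indexed by customer, so assigning it straight to a column aligns on labels that do not match and fills the column with NaN. transform('sum') keeps the original row labels and lines up row by row. Mapping each row's key onto the aggregated Series is one way to get the same column starting from agg().

```python
import pandas as pd

# Hypothetical data (not from the challenge) with a non-default row index,
# which makes the label alignment visible.
orders = pd.DataFrame(
    {"Customer": ["ann", "bob", "ann", "cat"], "Amount": [30, 5, 10, 8]},
    index=[101, 102, 103, 104],
)
grouped = orders.groupby("Customer")["Amount"]

# agg returns one row per customer, indexed by customer name.
totals = grouped.agg("sum")

# Column assignment aligns on index labels; none of 101..104 appear in
# totals' index, so every value in this column is NaN.
orders["Misaligned"] = totals

# transform keeps the original row labels, so this lines up row by row.
orders["CustomerTotal"] = grouped.transform("sum")

# Starting from agg, an explicit lookup is needed to bring totals back to rows.
orders["ViaMap"] = orders["Customer"].map(totals)

print(orders)
```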
