Challenge - 5 Problems

🎖️

Transform Mastery

❓ Predict Output

intermediate

Output of group transform with mean subtraction

What is the output of the following code snippet?

Pandas

import pandas as pd

df = pd.DataFrame({'Team': ['A', 'A', 'B', 'B'], 'Points': [10, 20, 30, 40]})
df['AdjPoints'] = df.groupby('Team')['Points'].transform(lambda x: x - x.mean())
print(df)

A.
  Team  Points  AdjPoints
0    A      10       -5.0
1    A      20        5.0
2    B      30       -5.0
3    B      40        5.0

B.
  Team  Points  AdjPoints
0    A      10       10.0
1    A      20       20.0
2    B      30       30.0
3    B      40       40.0

C.
  Team  Points  AdjPoints
0    A      10       15.0
1    A      20       15.0
2    B      30       35.0
3    B      40       35.0

D.
  Team  Points  AdjPoints
0    A      10      -10.0
1    A      20       10.0
2    B      30      -30.0
3    B      40       30.0

❓ Data Output

intermediate

Number of rows after a group-wise rank transform

Given the code below, how many rows does the resulting DataFrame have?

Pandas

import pandas as pd

df = pd.DataFrame({'Category': ['X', 'X', 'Y', 'Y', 'Y'], 'Value': [1, 2, 3, 4, 5]})
df['Rank'] = df.groupby('Category')['Value'].transform('rank')
print(len(df))
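As a side note on the string form used above, a name passed to transform is looked up as a groupby method, so it can also be written as an explicit callable. The sketch below uses hypothetical data (the `Key` and `Val` columns are not part of the challenge) and only illustrates that equivalence.

```python
import pandas as pd

# Hypothetical data, not taken from the challenge.
df = pd.DataFrame({"Key": ["a", "a", "b"], "Val": [5, 3, 7]})
grouped = df.groupby("Key")["Val"]

by_name = grouped.transform("rank")                # string alias for the method
by_lambda = grouped.transform(lambda s: s.rank())  # explicit callable per group

# Ranks restart inside each group: 5 is the larger of (5, 3), 7 is alone.
print(by_name.tolist())    # [2.0, 1.0, 1.0]
print(by_lambda.tolist())  # [2.0, 1.0, 1.0]
```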

🔧 Debug

advanced

Identify the error in transform usage

What error, if any, does the following code raise?

Pandas

import pandas as pd

df = pd.DataFrame({'Group': ['G1', 'G1', 'G2'], 'Score': [10, 20, 30]})
df['Scaled'] = df.groupby('Group')['Score'].transform(lambda x: x / x.sum())
print(df)

A. TypeError: unsupported operand type(s) for /: 'int' and 'method'

B. No error, outputs scaled scores per group

C. AttributeError: 'SeriesGroupBy' object has no attribute 'sum'

D. ValueError: Length of values does not match length of index

🚀 Application

advanced

Using transform to fill missing values with group mean

You have a DataFrame with missing values in the 'Sales' column. Which code snippet correctly fills missing 'Sales' values with the mean sales of their group in 'Region'?

Pandas

import pandas as pd
import numpy as np

df = pd.DataFrame({'Region': ['East', 'East', 'West', 'West'], 'Sales': [100, np.nan, 200, np.nan]})

A. df['Sales'] = df['Sales'].fillna(df.groupby('Region')['Sales'].mean())

B. df['Sales'] = df.groupby('Region')['Sales'].transform('mean')

C. df['Sales'] = df.groupby('Region')['Sales'].apply(lambda x: x.fillna(x.mean()))

D. df['Sales'] = df.groupby('Region')['Sales'].transform(lambda x: x.fillna(x.mean()))
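When weighing options like these, it can help to inspect which index each intermediate result carries, because pandas aligns on index labels when one Series fills another. The following is a minimal sketch on hypothetical data (the `City` and `Temp` columns are assumptions, not part of the challenge); it shows the indexes involved rather than a recommended solution.

```python
import numpy as np
import pandas as pd

# Hypothetical data, not taken from the challenge.
df = pd.DataFrame({"City": ["P", "P", "Q"], "Temp": [10.0, np.nan, 30.0]})

# Aggregated means are indexed by the group keys.
group_means = df.groupby("City")["Temp"].mean()

# Transformed means are indexed like the original rows.
row_means = df.groupby("City")["Temp"].transform("mean")

print(group_means.index.tolist())  # ['P', 'Q']
print(row_means.index.tolist())    # [0, 1, 2]
print(row_means.tolist())          # [10.0, 10.0, 30.0]

# fillna with a Series matches on index labels, so a row-aligned
# Series can supply a value for the missing row.
filled = df["Temp"].fillna(row_means)
print(filled.tolist())             # [10.0, 10.0, 30.0]
```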

🧠 Conceptual

expert

Why use transform instead of apply for group-level operations?

Which statement best explains why transform is preferred over apply for group-level operations whose result must have the same length and index as the original DataFrame?

A. apply always returns a DataFrame, transform always returns a scalar.

B. apply is faster than transform but cannot handle group operations.

C. transform returns a Series aligned with the original DataFrame, preserving index and shape, while apply may return aggregated or differently shaped results.

D. transform can only be used with numeric data, apply works with all data types.
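The shape difference these statements refer to can be checked directly. The sketch below uses hypothetical data (the `Shop` and `Units` columns are assumptions) and a plain aggregation as a stand-in for a result that is shaped differently from the input.

```python
import pandas as pd

# Hypothetical data, not taken from the challenge.
df = pd.DataFrame({"Shop": ["N", "N", "S"], "Units": [2, 4, 9]})
grouped = df.groupby("Shop")["Units"]

# Aggregation: one value per group, indexed by the group keys.
per_group = grouped.sum()

# transform: one value per original row, indexed like df.
per_row = grouped.transform("sum")

print(per_group.tolist())  # [6, 9]
print(per_row.tolist())    # [6, 6, 9]
print(len(per_group), len(per_row), len(df))  # 2 3 3
```

Because `per_row` shares the index of `df`, it can be assigned as a new column without any reindexing, whereas `per_group` would first need to be mapped back onto the rows.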
