
Target encoding in ML Python - Interactive Code Practice

Practice - 5 Tasks
Answer the questions below
Task 1: Fill in the blank (easy)

Complete the code to calculate the mean target value for each category.

ML Python
mean_target = df.groupby('category')['target'].[1]()
Options:
A. sum
B. max
C. mean
D. count
Common mistakes:
- Using sum, which adds up the target values instead of averaging them
- Using count, which counts the entries in each category instead of averaging the target
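To see what the Task 1 groupby returns, here is a minimal sketch on a small hypothetical DataFrame. The column names `category` and `target` follow the task; the values are made up for illustration. It also prints `sum` and `count` so the difference from `mean` is visible.

```python
import pandas as pd

# Hypothetical toy data: one categorical feature and a binary target
df = pd.DataFrame({
    "category": ["a", "a", "b", "b", "b", "c"],
    "target": [1, 0, 1, 1, 0, 1],
})

# Mean of the target within each category: a Series indexed by category
mean_target = df.groupby("category")["target"].mean()
print(mean_target)

# For contrast, sum and count answer different questions
print(df.groupby("category")["target"].sum())    # positives per category
print(df.groupby("category")["target"].count())  # rows per category
```

Here `a` gets 0.5, `b` about 0.67 and `c` 1.0. Note that the value for `c` rests on a single row, which is the kind of noise the smoothing in Task 5 is meant to reduce.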
Task 2: Fill in the blank (medium)

Complete the code to map each category's target mean back onto the original DataFrame's category column.

ML Python
df['category_encoded'] = df['category'].[1](mean_target)
Options:
A. map
B. apply
C. replace
D. fillna
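Common mistakes:
- Using apply, which is slower and less direct than map
- Using replace: it does accept a mapping, but categories without a match keep their original label instead of becoming NaN, so the column can end up mixing strings and numbers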
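A small sketch contrasting `map` and `replace` when one category has no entry in the mapping. The categories and mean values are hypothetical; the column is built with object dtype so it holds plain Python strings.

```python
import pandas as pd

# Hypothetical data as plain Python strings (object dtype); "d" has no mapping entry
df = pd.DataFrame({"category": pd.Series(["a", "b", "c", "d"], dtype=object)})
mean_target = pd.Series({"a": 0.5, "b": 0.75, "c": 1.0})

# map looks each value up in the Series index; unmatched values become NaN
df["category_encoded"] = df["category"].map(mean_target)

# replace also accepts a dict, but unmatched values keep their original label
replaced = df["category"].replace(mean_target.to_dict())

print(df)
print(replaced.tolist())
```

With `map`, the unmatched category becomes NaN, which can then be filled explicitly; with `replace`, it silently stays the string `"d"` in an otherwise numeric column.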
Task 3: Fill in the blanks (hard)

Fill both blanks so that the target encoding is computed only on the training set (avoiding data leakage) and categories unseen in training do not leave NaN values in the test set.

ML Python
mean_target_train = train_df.groupby('category')['target'].[1]()
train_df['category_encoded'] = train_df['category'].map(mean_target_train)
test_df['category_encoded'] = test_df['category'].map(mean_target_train).fillna([2])
Options:
A. mean
B. sum
C. count
D. 0
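Common mistakes:
- Calculating the mean on the whole dataset (train and test together), which leaks test-set targets into the features
- Not filling missing values in the test data, which leaves NaN for categories never seen in training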
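A minimal sketch of the leakage-free pattern from Task 3 on a hypothetical train/test split. One assumption differs from the quiz: unseen test categories are filled with the training set's overall target mean (`global_mean`, a name introduced here) instead of `0`. For a 0/1 target, filling with `0` encodes an unseen category as if none of its rows were positive, whereas the training mean falls back to the overall rate.

```python
import pandas as pd

# Hypothetical split; category "c" appears only in the test set
train_df = pd.DataFrame({"category": ["a", "a", "b", "b"],
                         "target": [1, 0, 1, 1]})
test_df = pd.DataFrame({"category": ["a", "b", "c"]})

# Statistics come from the training rows only, so test targets never leak in
mean_target_train = train_df.groupby("category")["target"].mean()
global_mean = train_df["target"].mean()

train_df["category_encoded"] = train_df["category"].map(mean_target_train)
# Unseen categories map to NaN; fill them with the training-set fallback
test_df["category_encoded"] = (
    test_df["category"].map(mean_target_train).fillna(global_mean)
)
print(test_df)
```

Here the test rows encode to 0.5 (`a`), 1.0 (`b`) and 0.75 (`c`, the training mean).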
Task 4: Fill in the blanks (hard)

Fill both blanks to create a target encoding function that fits on training data and transforms new data.

ML Python
def target_encode(train, test, cat_col, target_col):
    mean_target = train.groupby([1])[[2]].mean()
    train_encoded = train[cat_col].map(mean_target)
    test_encoded = test[cat_col].map(mean_target).fillna(mean_target.mean())
    return train_encoded, test_encoded
Options:
A. cat_col
B. target_col
C. 'category_encoded'
D. 'target_encoded'
Common mistakes:
- Using the encoded column names instead of the original columns
- Mixing up the category and target columns
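Since the function takes the column names as parameters, a completed version should group by `cat_col` and select `target_col`, so it works for any column names rather than only for columns literally named `category` and `target`. A sketch, exercised on hypothetical columns named `city` and `clicked`:

```python
import pandas as pd

def target_encode(train, test, cat_col, target_col):
    # Fit: per-category target means from the training data only
    mean_target = train.groupby(cat_col)[target_col].mean()
    train_encoded = train[cat_col].map(mean_target)
    # Transform: unseen test categories fall back to the mean of the category means
    test_encoded = test[cat_col].map(mean_target).fillna(mean_target.mean())
    return train_encoded, test_encoded

# Hypothetical data whose columns are not named 'category' / 'target'
train = pd.DataFrame({"city": ["x", "x", "y"], "clicked": [1, 1, 0]})
test = pd.DataFrame({"city": ["y", "z"]})
train_enc, test_enc = target_encode(train, test, "city", "clicked")
print(train_enc.tolist())
print(test_enc.tolist())
```

The fallback for the unseen city `z` is the unweighted mean of the per-category means (0.5 here), not the overall training mean (2/3 here); the smoothing version in Task 5 falls back to the overall mean instead.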
Task 5: Fill in the blanks (hard)

Fill both blanks to apply target encoding with smoothing, which pulls the encoding of categories with few samples toward the global mean and so reduces noise.

ML Python
def smooth_target_encode(train, test, cat_col, target_col, m=5):
    mean = train[target_col].mean()
    agg = train.groupby([1])[[2]].agg(['mean', 'count'])
    smooth = (agg['count'] * agg['mean'] + m * mean) / (agg['count'] + m)
    train_encoded = train[cat_col].map(smooth)
    test_encoded = test[cat_col].map(smooth).fillna(mean)
    return train_encoded, test_encoded
Options:
A. cat_col
B. target_col
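Common mistakes:
- Swapping the category and target columns (grouping by the target instead of the category)
- Not filling missing values in the test data, which leaves NaN for categories never seen in training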
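The smoothed value for a category is (count * category_mean + m * global_mean) / (count + m), so `m` behaves like `m` extra rows sitting at the global mean. A sketch of the completed function (the local variable `mean` is renamed `global_mean` here for readability) on hypothetical data with one rare and one common category:

```python
import pandas as pd

def smooth_target_encode(train, test, cat_col, target_col, m=5):
    global_mean = train[target_col].mean()
    agg = train.groupby(cat_col)[target_col].agg(["mean", "count"])
    # Blend each category mean with the global mean; m acts as a pseudo-count
    smooth = (agg["count"] * agg["mean"] + m * global_mean) / (agg["count"] + m)
    train_encoded = train[cat_col].map(smooth)
    # Categories never seen in training get the global mean
    test_encoded = test[cat_col].map(smooth).fillna(global_mean)
    return train_encoded, test_encoded

# Hypothetical data: "rare" has one row, "common" has ten
train = pd.DataFrame({
    "category": ["rare"] + ["common"] * 10,
    "target": [1] + [1] * 5 + [0] * 5,
})
test = pd.DataFrame({"category": ["rare", "common", "unseen"]})
_, test_enc = smooth_target_encode(train, test, "category", "target", m=5)
print(test_enc.round(4).tolist())
```

With m=5, the single-row category "rare" moves from a raw mean of 1.0 to 41/66 ≈ 0.62, while "common" (10 rows) moves only from 0.5 to 17/33 ≈ 0.52; the unseen category gets the global mean 6/11 ≈ 0.55.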