0
0
ML Pythonml~20 mins

Target encoding in ML Python - Practice Problems & Coding Challenges

Choose your learning style9 modes available
Challenge - 5 Problems
🎖️
Target Encoding Mastery
Get all challenges correct to earn this badge!
Test your skills under time pressure!
🧠 Conceptual
intermediate
2:00remaining
Why use target encoding instead of one-hot encoding?

Which of the following is the main advantage of using target encoding for categorical variables compared to one-hot encoding?

ATarget encoding always prevents overfitting better than one-hot encoding.
BTarget encoding converts categorical variables into multiple binary columns.
CTarget encoding reduces dimensionality by converting categories into a single numeric value based on the target variable.
DTarget encoding does not require the target variable during training.
Attempts:
2 left
💡 Hint

Think about how target encoding transforms categories compared to one-hot encoding.

Predict Output
intermediate
2:00remaining
Output of target encoding with smoothing

Given the following data and target encoding with smoothing, what is the encoded value for category 'B'?

ML Python
import pandas as pd

data = pd.DataFrame({'category': ['A', 'B', 'B', 'C', 'A', 'B'], 'target': [1, 0, 1, 0, 1, 1]})

# Global mean of target
global_mean = data['target'].mean()

# Calculate category mean and count
category_stats = data.groupby('category')['target'].agg(['mean', 'count'])

# Smoothing factor
alpha = 2

# Calculate smoothed mean
category_stats['smoothed'] = (category_stats['mean'] * category_stats['count'] + global_mean * alpha) / (category_stats['count'] + alpha)

encoded_value_B = category_stats.loc['B', 'smoothed']
print(round(encoded_value_B, 3))
A0.5
B0.667
C0.75
D0.4
Attempts:
2 left
💡 Hint

Calculate the category mean and count for 'B', then apply the smoothing formula.

Model Choice
advanced
2:00remaining
Choosing target encoding for model type

For which type of model is target encoding most beneficial compared to one-hot encoding?

ANeural networks that require one-hot encoded inputs.
BTree-based models like random forests that handle categorical variables natively.
CK-nearest neighbors models that rely on distance metrics.
DLinear regression models that assume numeric input features.
Attempts:
2 left
💡 Hint

Consider how different models interpret numeric vs categorical features.

Metrics
advanced
2:00remaining
Effect of target encoding on model evaluation metrics

What is a common risk when using target encoding without proper cross-validation, and how does it affect evaluation metrics?

AData leakage causing overly optimistic metrics like accuracy or RMSE on validation data.
BUnderfitting leading to poor training accuracy but good validation accuracy.
CIncreased model bias causing metrics to be consistently low on both train and test sets.
DNo effect on metrics because target encoding is independent of the target variable.
Attempts:
2 left
💡 Hint

Think about what happens if the target information leaks into the features during training.

🔧 Debug
expert
2:00remaining
Debugging target encoding implementation error

Consider this code snippet for target encoding. What error will it raise when run?

ML Python
import pandas as pd

data = pd.DataFrame({'cat': ['x', 'y', 'x', 'z'], 'target': [1, 0, 1, 0]})

means = data.groupby('cat')['target'].mean()

# Incorrect: trying to map means to original data without converting to dict
encoded = data['cat'].map(means)

print(encoded)
ANo error; prints encoded values correctly.
BKeyError because 'cat' values are missing in means index.
CTypeError because map expects a dict but gets a Series.
DValueError due to mismatched index during mapping.
Attempts:
2 left
💡 Hint

Check if pandas Series can be used directly with map for mapping values.