Jump into concepts and practice - no test required
or
Recommended
Test this pattern10 questions across easy, medium, and hard to know if this pattern is strong
Recall & Review
beginner
What is target encoding in machine learning?
Target encoding is a technique that replaces categorical values with the average of the target variable for those categories. It helps models use categorical data effectively.
Click to reveal answer
intermediate
Why do we use target encoding instead of one-hot encoding for high-cardinality features?
Target encoding reduces the number of new features created, avoiding a large increase in data size and helping models learn better from categories with many unique values.
Click to reveal answer
intermediate
How does target encoding help prevent overfitting?
By using techniques like smoothing and cross-validation, target encoding avoids leaking target information from the training set to the model, which helps prevent overfitting.
Click to reveal answer
advanced
What is smoothing in target encoding?
Smoothing blends the category's target mean with the overall target mean to reduce noise from categories with few samples, making the encoding more stable.
Click to reveal answer
beginner
Give a simple example of target encoding for a categorical feature with three categories and their target means.
If a feature has categories A, B, C with target means 0.2, 0.5, and 0.8 respectively, target encoding replaces A with 0.2, B with 0.5, and C with 0.8 in the dataset.
Click to reveal answer
What does target encoding replace a categorical value with?
AA unique integer ID
BA one-hot vector
CThe average target value for that category
DThe frequency of the category
✗ Incorrect
Target encoding replaces each category with the average of the target variable for that category.
Why is smoothing used in target encoding?
ATo reduce noise from categories with few samples
BTo increase the number of features
CTo speed up encoding
DTo convert numerical data to categorical
✗ Incorrect
Smoothing helps reduce noise by blending category target means with the overall mean, especially for rare categories.
Which problem can target encoding help solve better than one-hot encoding?
AHandling missing values
BEncoding high-cardinality categorical features
CScaling numerical features
DReducing dataset size by removing features
✗ Incorrect
Target encoding is useful for high-cardinality categorical features where one-hot encoding would create too many new columns.
What is a risk of using target encoding without precautions?
AModel underfitting
BLoss of categorical information
CSlower training time
DData leakage leading to overfitting
✗ Incorrect
Without careful handling, target encoding can leak target information into features, causing overfitting.
Which method helps avoid overfitting when applying target encoding?
AUsing cross-validation to compute encodings
BIgnoring rare categories
CUsing one-hot encoding instead
DNormalizing the target variable
✗ Incorrect
Computing target encodings using cross-validation prevents the model from seeing the target values of the same data points during training.
Explain what target encoding is and why it is useful for categorical data.
Think about how categorical values can be turned into numbers that carry information about the target.
You got /3 concepts.
Describe how smoothing and cross-validation help make target encoding more reliable.
Consider how to avoid mistakes when using target information to encode features.
You got /3 concepts.
Practice
(1/5)
1. What is the main purpose of target encoding in machine learning?
easy
A. Remove missing values from the dataset
B. Normalize numerical features to a 0-1 scale
C. Create new categorical features by combining existing ones
D. Convert categorical variables into numbers using the average target value
Solution
Step 1: Understand what target encoding does
Target encoding replaces categories with the average value of the target variable for each category.
Step 2: Compare with other options
Normalization scales numbers, missing value removal cleans data, and feature creation combines categories, none of which describe target encoding.
Final Answer:
Convert categorical variables into numbers using the average target value -> Option D
Quick Check:
Target encoding = average target per category [OK]
Hint: Target encoding uses target averages to convert categories [OK]
Common Mistakes:
Confusing target encoding with normalization
Thinking target encoding creates new categories
Assuming target encoding removes missing data
2. Which of the following Python code snippets correctly applies target encoding using pandas for a training dataset train_df with categorical column cat_col and target target?
easy
A. mean_target = train_df.groupby('cat_col')['target'].mean(); train_df['cat_encoded'] = train_df['cat_col'].map(mean_target)
B. train_df['cat_encoded'] = train_df['cat_col'].astype('category').cat.codes
C. train_df['cat_encoded'] = train_df['target'].mean()
D. train_df['cat_encoded'] = train_df['cat_col'].apply(lambda x: len(x))
Solution
Step 1: Identify correct target encoding method
Target encoding maps each category to the mean target value for that category, done by grouping and mapping.
Step 2: Check code correctness
mean_target = train_df.groupby('cat_col')['target'].mean(); train_df['cat_encoded'] = train_df['cat_col'].map(mean_target) groups by category, calculates mean target, then maps it back correctly. Other options do not compute mean target per category.
Final Answer:
mean_target = train_df.groupby('cat_col')['target'].mean(); train_df['cat_encoded'] = train_df['cat_col'].map(mean_target) -> Option A
Quick Check:
Group by category and map mean target [OK]
Hint: Group by category and map mean target for encoding [OK]
Common Mistakes:
Using category codes instead of target mean
Assigning overall mean target to all rows
Mapping category length instead of target mean
3. Given the following code, what will be the output of print(test_df['cat_encoded'].tolist())?
Step 1: Calculate mean target per category from training data
'A' has targets [1,1] mean=1.0, 'B' has [0] mean=0.0, 'C' has [0] mean=0.0.
Step 2: Map test categories and fill missing
Test categories 'A','B','C' map to 1.0,0.0,0.0 respectively. 'D' is missing, so fillna(0.5) sets it to 0.5.
Final Answer:
[1.0, 0.0, 0.0, 0.5] -> Option A
Quick Check:
Map known means, fill unknown with 0.5 [OK]
Hint: Fill missing categories with default value after mapping [OK]
Common Mistakes:
Not filling missing categories, resulting in NaN
Using overall mean instead of per-category mean
Miscomputing mean target values
4. You applied target encoding on your training data and then directly applied the same encoding on test data using the training means. However, your model shows signs of overfitting. What is the most likely mistake?
medium
A. You replaced missing values with zero instead of the mean
B. You did not normalize the target variable before encoding
C. You used target encoding on the entire dataset before splitting into train and test
D. You used one-hot encoding instead of target encoding
Solution
Step 1: Understand overfitting cause in target encoding
Overfitting often happens if target encoding uses information from the test set or entire data before splitting.
Step 2: Identify mistake in data leakage
Encoding before splitting leaks target info from test data into training, causing overfitting. Other options do not explain this leakage.
Final Answer:
You used target encoding on the entire dataset before splitting into train and test -> Option C
Quick Check:
Encoding before split causes data leakage [OK]
Hint: Always fit encoding only on training data to avoid leakage [OK]
Common Mistakes:
Encoding before train-test split causing leakage
Confusing normalization with encoding
Ignoring missing value handling
5. You have a categorical feature with many rare categories in your training data. How can you apply target encoding to reduce overfitting caused by these rare categories?
hard
A. Use one-hot encoding instead of target encoding for rare categories
B. Use smoothing by combining category mean with overall mean weighted by category frequency
C. Apply target encoding only on the most frequent category and ignore others
D. Replace rare categories with a fixed constant before encoding
Solution
Step 1: Understand overfitting from rare categories
Rare categories have few samples, so their target mean can be noisy and cause overfitting.
Step 2: Apply smoothing to reduce noise
Smoothing blends the category mean with the overall mean, weighted by how many samples the category has, reducing noise for rare categories.
Final Answer:
Use smoothing by combining category mean with overall mean weighted by category frequency -> Option B
Quick Check:
Smoothing balances rare category means with global mean [OK]
Hint: Smooth rare categories by mixing with overall mean [OK]
Common Mistakes:
Ignoring rare categories causing noisy means
Replacing rare categories with constants losing info
Using one-hot encoding which increases dimensionality