Target encoding in ML Python - Model Pipeline Trace

Model Pipeline - Target encoding

Target encoding replaces each category in a feature with the mean of the target over the rows that belong to that category. The resulting number carries information about the prediction goal, so a model can use the categorical feature directly as a numeric input.
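
The definition above can be sketched in a few lines of plain Python. This is a minimal illustration, not a library implementation: the helper name `target_means` and the six toy rows are assumptions made up for this example and are not the 1000-row dataset traced on this page.

```python
from collections import defaultdict

def target_means(categories, targets):
    """Return {category: mean target} for two equal-length sequences."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cat, y in zip(categories, targets):
        sums[cat] += y
        counts[cat] += 1
    return {cat: sums[cat] / counts[cat] for cat in sums}

# Toy data, invented for illustration only.
colors = ["Red", "Blue", "Red", "Green", "Blue", "Red"]
sales = [1, 0, 1, 1, 1, 0]

means = target_means(colors, sales)      # one mean per category
encoded = [means[c] for c in colors]     # each category replaced by its mean
```

Each entry of `encoded` is the mean of `sales` over the rows that share that row's color, which is exactly the replacement the definition describes.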

Data Flow - 4 Stages

Stage 1: Raw data input
Input: 1000 rows x 3 columns
Operation: Original dataset with categorical feature and target
Output: 1000 rows x 3 columns
Example: Feature: Color = ['Red', 'Blue', 'Green', ...], Target: Sale = [1, 0, 1, ...]

Stage 2: Calculate target mean per category
Input: 1000 rows x 3 columns
Operation: Group by categorical feature and compute mean target
Output: Number of unique categories x 2 columns
Example: Color: Red -> mean target 0.7, Blue -> 0.4, Green -> 0.5

Stage 3: Replace categories with target mean
Input: 1000 rows x 3 columns
Operation: Map each category to its target mean value
Output: 1000 rows x 3 columns
Example: Color: Red replaced by 0.7, Blue by 0.4, Green by 0.5

Stage 4: Model training
Input: 1000 rows x 3 columns
Operation: Train model using encoded feature and other features
Output: Trained model
Example: Model learns relationship between encoded color and sale
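
The first three stages map naturally onto pandas operations. The sketch below assumes pandas is available and uses eight invented rows; the `Price` column is a hypothetical stand-in for the unnamed third column, and the resulting means are those of the toy rows, not the 0.7 / 0.4 / 0.5 from the example.

```python
import pandas as pd

# Stage 1: raw data (toy rows; "Price" is a hypothetical third column).
df = pd.DataFrame({
    "Color": ["Red", "Blue", "Green", "Red", "Blue", "Green", "Red", "Blue"],
    "Price": [10.0, 12.0, 9.0, 11.0, 13.0, 8.0, 10.5, 12.5],
    "Sale":  [1, 0, 1, 1, 1, 0, 0, 0],
})

# Stage 2: group by the categorical feature and compute the mean target.
category_means = df.groupby("Color")["Sale"].mean()

# Stage 3: map each category to its mean; row and column counts are unchanged.
encoded = df.copy()
encoded["Color"] = encoded["Color"].map(category_means)

# Stage 4 input: the encoded feature plus the other feature, and the target.
X = encoded[["Color", "Price"]]
y = encoded["Sale"]
```

`X` and `y` can then be passed to whichever model is being trained. In practice the means are usually computed from training rows only, so that held-out targets do not leak into the feature.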
Training Trace - Epoch by Epoch

[Chart: loss (y-axis, 0.3 to 0.7) against epochs 1 to 5 (x-axis)]
Epoch | Loss ↓ | Accuracy ↑ | Observation
1 | 0.65 | 0.6 | Model starts learning, loss is high, accuracy low
2 | 0.5 | 0.72 | Loss decreases, accuracy improves as model learns patterns
3 | 0.4 | 0.8 | Model continues improving, better fit to data
4 | 0.35 | 0.85 | Loss decreases further, accuracy rises
5 | 0.32 | 0.87 | Model converges with good accuracy
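
A trace like this comes from recording loss and accuracy after every pass over the data. The page does not say which model produced these numbers, so the sketch below assumes a one-feature logistic regression trained by full-batch gradient descent with NumPy. The function name, learning rate, and the 30 toy rows are all assumptions, and the sketch is not expected to reproduce the values in the trace.

```python
import numpy as np

def train_logistic(x, y, epochs=5, lr=0.5):
    """Fit p = sigmoid(w * x + b) by full-batch gradient descent.

    Returns (w, b, history); history holds one (epoch, loss, accuracy)
    tuple per epoch, measured after that epoch's update.
    """
    w, b = 0.0, 0.0
    history = []
    eps = 1e-12  # keeps log() away from zero
    for epoch in range(1, epochs + 1):
        p = 1.0 / (1.0 + np.exp(-(w * x + b)))
        # Gradient of the mean log loss with respect to w and b.
        w -= lr * np.mean((p - y) * x)
        b -= lr * np.mean(p - y)
        p = 1.0 / (1.0 + np.exp(-(w * x + b)))
        loss = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
        accuracy = np.mean((p >= 0.5) == (y == 1))
        history.append((epoch, float(loss), float(accuracy)))
    return w, b, history

# Toy data: 10 rows per color, built so the category means are 0.7, 0.4, 0.5.
x = np.repeat([0.7, 0.4, 0.5], 10)  # target-encoded Color
y = np.array([1] * 7 + [0] * 3 + [1] * 4 + [0] * 6 + [1] * 5 + [0] * 5, dtype=float)

w, b, history = train_logistic(x, y)
for epoch, loss, accuracy in history:
    print(f"epoch {epoch}: loss={loss:.3f} accuracy={accuracy:.2f}")
```

With a single encoded feature and a small step size the loss falls slowly; the point of the sketch is the per-epoch bookkeeping, not the specific numbers.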
Prediction Trace - 3 Layers
Layer 1: Input sample
Layer 2: Target encoding mapping
Layer 3: Model prediction
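
The three layers can be followed for a single sample in a short sketch. The mapping reuses the means from the data-flow example (Red 0.7, Blue 0.4, Green 0.5). The model weights, the function name, and the fallback value for unseen categories are hypothetical, chosen only to make the trace concrete.

```python
import math

# Layer 2 lookup table: means taken from the data-flow example.
CATEGORY_MEANS = {"Red": 0.7, "Blue": 0.4, "Green": 0.5}

# Hypothetical logistic-model parameters, for illustration only.
WEIGHT, BIAS = 4.0, -2.0

def predict_sale_probability(sample, fallback=0.5):
    """Trace one sample through the three layers.

    `fallback` is an assumed value for categories missing from the mapping.
    """
    color = sample["Color"]                          # Layer 1: input sample
    encoded = CATEGORY_MEANS.get(color, fallback)    # Layer 2: target encoding mapping
    logit = WEIGHT * encoded + BIAS
    return 1.0 / (1.0 + math.exp(-logit))            # Layer 3: model prediction

prob_red = predict_sale_probability({"Color": "Red"})
prob_blue = predict_sale_probability({"Color": "Blue"})
```

With these assumed weights a higher encoded value yields a higher predicted probability, so `prob_red` exceeds `prob_blue`.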
Model Quiz - 3 Questions
Test your understanding
What does target encoding replace in the data?
A. Target values with categories
B. Categories with their average target value
C. Categories with random numbers
D. Missing values with zero
Key Insight
Target encoding converts each category into a number derived from the target. The model then receives a numeric feature that already reflects how each category relates to the outcome, which makes patterns easier to find and can improve prediction accuracy.