Target encoding in ML Python - Model Metrics & Evaluation

Target encoding turns categorical features into numbers derived from the target variable, with the goal of improving model predictions. The key metrics to check are therefore the model's own evaluation metrics: mean squared error (MSE) for regression, or accuracy, precision, recall, and F1-score for classification. These metrics tell us whether the encoding helps the model learn useful patterns.
For classification tasks using target encoding, a confusion matrix helps us see how well the model predicts each class.
Actual \ Predicted | Positive | Negative
-------------------|----------|---------
Positive           |    80    |    20
Negative           |    10    |    90
Here, True Positives (TP) = 80, False Positives (FP) = 10, True Negatives (TN) = 90, False Negatives (FN) = 20.
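These counts translate directly into the standard metrics. A minimal sketch in plain Python, using the counts from the table above:

```python
# Metrics from the confusion matrix above (TP=80, FP=10, TN=90, FN=20).
tp, fp, tn, fn = 80, 10, 90, 20

accuracy = (tp + tn) / (tp + fp + tn + fn)   # fraction of all predictions that are correct
precision = tp / (tp + fp)                   # of predicted positives, how many are real
recall = tp / (tp + fn)                      # of real positives, how many were found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of precision and recall

print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")
# → accuracy=0.850 precision=0.889 recall=0.800 f1=0.842
```

The same formulas apply to any binary confusion matrix, so this is a handy sanity check when an encoding change shifts the numbers.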
Target encoding can cause overfitting if not done carefully. High precision means that when the model predicts the positive class, it is usually right (few false alarms). High recall means it finds most of the actual positive cases.
For example, in fraud detection, recall is more important because missing fraud is costly. If target encoding leaks information from the target, recall might look very high but the model fails on new data.
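The leakage risk is easiest to see with a toy example: if each category appears only once (as with a user-ID column), a mean encoding computed on the full data simply copies the target back into the feature. A hypothetical sketch:

```python
# Hypothetical data: a high-cardinality column with one category per row.
ids = [f"user_{i}" for i in range(6)]
y = [1, 0, 1, 1, 0, 0]

# Full-data mean target per category: the mean of a single value is the value
# itself, so the encoding reproduces the target exactly on training rows.
means = {c: t for c, t in zip(ids, y)}
leaky = [means[c] for c in ids]

print(leaky == y)  # → True: the feature "predicts" the target perfectly on train data
```

A model trained on `leaky` will look flawless on the training set and fail on new users, which is exactly the pattern described above.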
Good: Balanced precision and recall, stable accuracy on training and test sets, and low error rates. This means target encoding helped the model learn useful patterns without overfitting.
Bad: Very high accuracy on training but low on test, or very high precision but low recall. This suggests target leakage or overfitting caused by improper target encoding.
- Data leakage: Using target values from the same data row to encode can leak information and inflate metrics.
- Overfitting: Target encoding can cause the model to memorize training data, leading to poor generalization.
- Accuracy paradox: High accuracy might hide poor recall or precision, especially with imbalanced classes.
- Ignoring validation: Not using proper cross-validation when applying target encoding can give misleading metrics.
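A common remedy for the leakage and validation pitfalls above is out-of-fold encoding: each row is encoded with category means computed only from the other folds, so a row's own target never influences its encoding. A minimal sketch in plain Python (the function name and the simple modulo fold split are illustrative, not a library API):

```python
from collections import defaultdict

def target_encode_oof(categories, targets, n_folds=3, prior=None):
    """Out-of-fold mean target encoding: each row is encoded using
    category means computed only on the other folds."""
    if prior is None:
        prior = sum(targets) / len(targets)  # global mean as fallback for unseen categories
    encoded = [0.0] * len(categories)
    for fold in range(n_folds):
        # Toy split: rows with i % n_folds == fold form the held-out fold.
        train_sum = defaultdict(float)
        train_cnt = defaultdict(int)
        for i, (c, t) in enumerate(zip(categories, targets)):
            if i % n_folds != fold:
                train_sum[c] += t
                train_cnt[c] += 1
        for i, c in enumerate(categories):
            if i % n_folds == fold:
                encoded[i] = train_sum[c] / train_cnt[c] if train_cnt[c] else prior
    return encoded

cats = ["a", "a", "b", "b", "a", "b"]
ys = [1, 0, 1, 1, 1, 0]
print(target_encode_oof(cats, ys))
```

In practice a shuffled or stratified fold split (and smoothing toward the prior for rare categories) is preferable; the point here is only that no row sees its own target.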
Your model uses target encoding and shows 98% accuracy but only 12% recall on fraud cases. Is it good for production?
Answer: No. With rare fraud, a model that almost always predicts "not fraud" can still score high accuracy, so the 98% figure is misleading; the 12% recall means the model misses most fraud cases, which is dangerous. Check the encoding for target leakage and improve recall before putting this model into production.
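To see why the accuracy figure is misleading, here is one hypothetical confusion matrix consistent with those numbers, assuming 10,000 transactions with a 1% fraud rate (all counts illustrative):

```python
# Hypothetical counts: 10,000 transactions, 100 of them fraud.
tp, fn = 12, 88      # only 12 of 100 fraud cases caught → 12% recall
fp, tn = 112, 9788   # chosen so overall accuracy still comes out to 98%

accuracy = (tp + tn) / (tp + fn + fp + tn)
recall = tp / (tp + fn)

print(f"accuracy={accuracy:.0%} recall={recall:.0%}")
# → accuracy=98% recall=12%
```

Because the negative class dominates, accuracy barely registers the 88 missed fraud cases, which is the accuracy paradox from the pitfalls list above.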