
Target encoding in ML Python - Model Metrics & Evaluation

Metrics & Evaluation - Target encoding
Which metric matters for Target Encoding and WHY

Target encoding replaces each category with a number derived from the target values, typically the mean target for that category. Its purpose is to improve model predictions, so judge it by the model's performance on held-out data: mean squared error (MSE) for regression, or accuracy, precision, recall, and F1-score for classification. These metrics show whether the encoding helps the model learn patterns that generalize.

Confusion Matrix Example

For classification tasks using target encoding, a confusion matrix helps us see how well the model predicts each class.

      Actual \ Predicted | Positive | Negative
      -------------------|----------|---------
      Positive           |    80    |   20
      Negative           |    10    |   90
    

Here, True Positives (TP) = 80, False Positives (FP) = 10, True Negatives (TN) = 90, False Negatives (FN) = 20.
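As a quick check, the standard metrics can be computed directly from these four counts. The snippet below is a minimal sketch that uses only the numbers from the confusion matrix above; the variable names are illustrative.

```python
# Counts taken from the confusion matrix above.
tp, fn = 80, 20   # actual positives: predicted positive / predicted negative
fp, tn = 10, 90   # actual negatives: predicted positive / predicted negative

accuracy = (tp + tn) / (tp + tn + fp + fn)   # share of all predictions that are correct
precision = tp / (tp + fp)                   # share of predicted positives that are real
recall = tp / (tp + fn)                      # share of real positives that are found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")
# prints: accuracy=0.850 precision=0.889 recall=0.800 f1=0.842
```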

Precision vs Recall Tradeoff with Target Encoding

Target encoding can cause overfitting if not done carefully, and overfitting distorts both precision and recall. High precision means that most of the cases the model predicts as positive really are positive, so there are few false alarms. High recall means the model finds most of the actual positive cases.

For example, in fraud detection, recall is more important because missing fraud is costly. If target encoding leaks information from the target, recall can look very high on the training data even though the model fails on new data.

Good vs Bad Metric Values for Target Encoding

Good: Balanced precision and recall, similar accuracy on the training and test sets, and low error rates. This means target encoding helped the model learn useful patterns without overfitting.

Bad: Very high accuracy on training but much lower accuracy on test, which suggests target leakage or overfitting caused by improper target encoding. Very high precision with low recall is also a warning sign, because the model is missing many positive cases.

Common Pitfalls with Target Encoding Metrics
  • Data leakage: Encoding a row with a statistic that includes that row's own target value leaks information and inflates metrics.
  • Overfitting: Target encoding can cause the model to memorize training data, leading to poor generalization.
  • Accuracy paradox: High accuracy might hide poor recall or precision, especially with imbalanced classes.
  • Ignoring validation: Not using proper cross-validation when applying target encoding can give misleading metrics.
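
One common way to avoid the leakage and validation pitfalls above is out-of-fold encoding: each row is encoded with target statistics computed from the other folds only. The sketch below is a minimal pure-Python illustration, not a production implementation. The function names `mean_encode` and `out_of_fold_encode`, the toy data, and the simple `i % n_folds` fold assignment are all assumptions made for this example; real data should be shuffled before it is split into folds.

```python
from collections import defaultdict

def mean_encode(train_cats, train_targets, apply_cats, prior):
    """Map each category in apply_cats to its mean target among the training rows.

    Categories that never appear in the training rows fall back to the prior.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for cat, y in zip(train_cats, train_targets):
        sums[cat] += y
        counts[cat] += 1
    return [sums[c] / counts[c] if c in counts else prior for c in apply_cats]

def out_of_fold_encode(cats, targets, n_folds=5):
    """Encode each row using target statistics from the other folds only."""
    n = len(cats)
    encoded = [0.0] * n
    for fold in range(n_folds):
        holdout = [i for i in range(n) if i % n_folds == fold]
        rest = [i for i in range(n) if i % n_folds != fold]
        rest_cats = [cats[i] for i in rest]
        rest_targets = [targets[i] for i in rest]
        prior = sum(rest_targets) / len(rest_targets)  # mean target of the other folds
        values = mean_encode(rest_cats, rest_targets,
                             [cats[i] for i in holdout], prior)
        for i, value in zip(holdout, values):
            encoded[i] = value
    return encoded

# Worst case for leakage: every category is unique (for example, a user ID).
cats = [f"user_{i}" for i in range(10)]
targets = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0]

# Naive encoding uses each row's own target, so it copies the target exactly.
naive = mean_encode(cats, targets, cats, prior=sum(targets) / len(targets))
# Out-of-fold encoding never sees the row's own target.
oof = out_of_fold_encode(cats, targets, n_folds=5)

print("naive copies the target:", naive == targets)
print("out-of-fold copies the target:", oof == targets)
```

With the naive encoding, a model trained on this feature would score perfectly on the training data and learn nothing useful for new users. The out-of-fold values carry no copy of the row's own label, so metrics computed on them are a more honest estimate.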

Self Check

Your model uses target encoding and shows 98% accuracy but only 12% recall on fraud cases. Is it good for production?

Answer: No. The low recall means the model misses most fraud cases, which is dangerous. The high accuracy is misleading because fraud is rare. You should improve recall before using this model.
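
To see how both figures can hold at once, here is a small worked example. The counts are hypothetical, chosen only to reproduce the 98% accuracy and 12% recall in the question; they are not taken from real data.

```python
# Hypothetical counts: 10,000 transactions, of which 200 (2%) are fraud.
tp, fn = 24, 176      # fraud cases caught / fraud cases missed
fp, tn = 24, 9776     # legitimate flagged as fraud / legitimate passed

total = tp + fn + fp + tn
accuracy = (tp + tn) / total
recall = tp / (tp + fn)
# Accuracy of a model that always predicts "not fraud".
baseline_accuracy = (fp + tn) / total

print(f"accuracy={accuracy:.2%} recall={recall:.2%} "
      f"baseline={baseline_accuracy:.2%}")
# prints: accuracy=98.00% recall=12.00% baseline=98.00%
```

In this scenario, a model that never flags fraud reaches the same 98% accuracy, which is why accuracy alone says little when the positive class is rare.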

Key Result
Target encoding improves model metrics only if it avoids data leakage and overfitting; balanced precision and recall are key indicators.