
Threshold tuning in ML Python - Model Metrics & Evaluation

Which metric matters for Threshold tuning and WHY

Threshold tuning changes the cutoff applied to a model's predicted probability when deciding whether a prediction is positive or negative. The metrics that matter here are Precision and Recall, because moving the threshold shifts the counts of true positives, false positives, and false negatives. Accuracy alone can hide these shifts, so Precision and Recall are what reveal the balance between catching positives and avoiding false alarms.

Confusion matrix example
      +----------+-------+-------+
      |          |   Predicted   |
      | Actual   |  Pos  |  Neg  |
      +----------+-------+-------+
      | Pos      | TP=80 | FN=20 |
      | Neg      | FP=10 | TN=90 |
      +----------+-------+-------+

      Total samples = 80 + 20 + 10 + 90 = 200

      Precision = TP / (TP + FP) = 80 / (80 + 10) = 0.89
      Recall = TP / (TP + FN) = 80 / (80 + 20) = 0.80
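The arithmetic above can be checked in a few lines of Python:

```python
# Counts taken from the confusion matrix above
tp, fn, fp, tn = 80, 20, 10, 90

precision = tp / (tp + fp)  # 80 / 90
recall = tp / (tp + fn)     # 80 / 100

print(f"Precision = {precision:.2f}")  # Precision = 0.89
print(f"Recall    = {recall:.2f}")     # Recall    = 0.80
```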
    
Precision vs Recall tradeoff with examples

When you lower the threshold, the model predicts more positives:

  • Recall increases: You catch more true positives (good for disease detection).
  • Precision decreases: You get more false positives (more healthy people flagged).

When you raise the threshold, the model predicts fewer positives:

  • Precision increases: Most predicted positives are correct (good for spam filters).
  • Recall decreases: You miss some true positives.

Threshold tuning helps find the best balance for your problem.
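The tradeoff above can be sketched in a few lines; the probabilities and labels below are made-up illustrative values, not output from a real model:

```python
def precision_recall(y_true, y_prob, threshold):
    """Compute precision and recall for a given probability cutoff."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    tp = sum(1 for t, y in zip(y_true, y_pred) if t == 1 and y == 1)
    fp = sum(1 for t, y in zip(y_true, y_pred) if t == 0 and y == 1)
    fn = sum(1 for t, y in zip(y_true, y_pred) if t == 1 and y == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Hypothetical model scores, sorted high to low, with true labels
y_true = [1, 1, 1, 0, 0, 1, 0, 0, 1, 0]
y_prob = [0.9, 0.8, 0.6, 0.55, 0.5, 0.45, 0.4, 0.3, 0.2, 0.1]

# Lower threshold -> higher recall, lower precision (and vice versa)
for threshold in (0.3, 0.5, 0.7):
    p, r = precision_recall(y_true, y_prob, threshold)
    print(f"threshold={threshold:.1f}  precision={p:.2f}  recall={r:.2f}")
```

At the low cutoff (0.3) recall is highest but precision drops; at the high cutoff (0.7) every prediction is correct but most positives are missed.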

Good vs Bad metric values for Threshold tuning

Good: Precision and Recall values close to each other and high (e.g., both above 0.8) show a balanced threshold.

Bad: Very high Precision but very low Recall means many positives are missed. Very high Recall but very low Precision means many false alarms.

Example: Precision=0.95 and Recall=0.30 is bad if missing positives is costly.
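One standard way to summarize this balance (not covered above) is the F1 score, the harmonic mean of Precision and Recall. A quick sketch shows how badly the lopsided pair from the example scores compared with a balanced one:

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

print(f"Balanced: P=0.85, R=0.85 -> F1={f1(0.85, 0.85):.2f}")  # F1=0.85
print(f"Lopsided: P=0.95, R=0.30 -> F1={f1(0.95, 0.30):.2f}")  # F1=0.46
```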

Common pitfalls in Threshold tuning metrics

  • Ignoring class imbalance: Accuracy can be misleading if one class is much bigger.
  • Overfitting the threshold: Tuning the threshold on test data leaks information and inflates performance.
  • Using only one metric: Focusing only on Precision or Recall hides the full picture.
  • Not validating the threshold: The threshold should be chosen using validation data, not training or test data.
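A minimal sketch of the last point, assuming hypothetical validation-set labels and scores: among candidate cutoffs, pick the one with the highest precision that still meets a required recall floor, evaluated on held-out validation data rather than the training or test set:

```python
def precision_recall(y_true, y_prob, threshold):
    """Compute precision and recall for a given probability cutoff."""
    y_pred = [int(p >= threshold) for p in y_prob]
    tp = sum(1 for t, y in zip(y_true, y_pred) if t == 1 and y == 1)
    fp = sum(1 for t, y in zip(y_true, y_pred) if t == 0 and y == 1)
    fn = sum(1 for t, y in zip(y_true, y_pred) if t == 1 and y == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Hypothetical validation-set labels and model scores
val_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
val_prob = [0.95, 0.7, 0.65, 0.6, 0.5, 0.45, 0.4, 0.3, 0.25, 0.1]

RECALL_FLOOR = 0.8  # business requirement: catch at least 80% of positives

best = None
for t in [i / 20 for i in range(1, 20)]:  # candidate cutoffs 0.05 .. 0.95
    prec, rec = precision_recall(val_true, val_prob, t)
    if rec >= RECALL_FLOOR and (best is None or prec > best[1]):
        best = (t, prec, rec)

print(f"chosen threshold={best[0]:.2f}  precision={best[1]:.2f}  recall={best[2]:.2f}")
```

Because the threshold was selected on validation data, the final test-set evaluation at that fixed cutoff remains an honest estimate.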

Self-check question

Your model has 98% accuracy but 12% recall on fraud detection. Is it good for production? Why or why not?

Answer: No, it is not good. The low recall (12%) means the model misses most fraud cases, which is dangerous. High accuracy is misleading because fraud is rare, so the model mostly predicts no fraud correctly but fails to catch fraud.
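The answer can be made concrete with hypothetical counts for 10,000 transactions where 2% are fraud:

```python
total = 10_000
fraud = 200                  # 2% of transactions are fraud
tp = 24                      # fraud cases caught (12% recall)
fn = fraud - tp              # 176 fraud cases missed
fp = 24                      # legitimate transactions falsely flagged
tn = total - fraud - fp      # 9776 legitimate, correctly passed

accuracy = (tp + tn) / total
recall = tp / (tp + fn)
print(f"accuracy={accuracy:.2%}  recall={recall:.0%}")  # accuracy=98.00%  recall=12%
```

The model earns 98% accuracy almost entirely by predicting "no fraud", while letting 176 of 200 fraud cases through.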

Key Result
Threshold tuning balances Precision and Recall by adjusting the cutoff, crucial for matching model behavior to real-world needs.