Threshold tuning changes the cutoff used to decide whether a prediction counts as positive or negative. We focus on Precision and Recall because moving the threshold shifts the balance of true positives, false positives, and false negatives, and accuracy alone can hide these shifts. Precision and Recall together show the trade-off between catching positives and avoiding false alarms.
Threshold tuning in ML Python - Model Metrics & Evaluation
+--------------------------+
|     Confusion Matrix     |
+----------+---------------+
|          |   Predicted   |
|  Actual  |  Pos  |  Neg  |
+----------+-------+-------+
|   Pos    | TP=80 | FN=20 |
|   Neg    | FP=10 | TN=90 |
+----------+-------+-------+
Total samples = 80 + 20 + 10 + 90 = 200
Precision = TP / (TP + FP) = 80 / (80 + 10) ≈ 0.89
Recall = TP / (TP + FN) = 80 / (80 + 20) = 0.80
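The arithmetic above can be checked with a few lines of Python (the counts come straight from the matrix):

```python
# Precision and Recall from the confusion matrix above.
tp, fn, fp, tn = 80, 20, 10, 90

total = tp + fn + fp + tn      # 80 + 20 + 10 + 90 = 200 samples
precision = tp / (tp + fp)     # 80 / 90 ≈ 0.89
recall = tp / (tp + fn)        # 80 / 100 = 0.80

print(f"Total: {total}")
print(f"Precision: {precision:.2f}")
print(f"Recall: {recall:.2f}")
```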
When you lower the threshold, the model predicts more positives:
- Recall increases: You catch more true positives (good for disease detection).
- Precision decreases: You get more false positives (more healthy people flagged).
When you raise the threshold, the model predicts fewer positives:
- Precision increases: Most predicted positives are correct (good for spam filters).
- Recall decreases: You miss some true positives.
Threshold tuning helps find the best balance for your problem.
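A minimal sketch of this trade-off, using made-up probability scores and labels (illustration data only, not real model output): as the threshold rises, precision climbs and recall drops.

```python
# Sweep a decision threshold over hypothetical model scores and
# watch precision and recall move in opposite directions.
scores = [0.95, 0.90, 0.85, 0.70, 0.60, 0.55, 0.40, 0.30, 0.20, 0.10]
labels = [1,    1,    1,    1,    0,    1,    0,    1,    0,    0]

def precision_recall(scores, labels, threshold):
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(p == 1 and y == 1 for p, y in zip(preds, labels))
    fp = sum(p == 1 and y == 0 for p, y in zip(preds, labels))
    fn = sum(p == 0 and y == 1 for p, y in zip(preds, labels))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

for t in (0.25, 0.5, 0.75):
    p, r = precision_recall(scores, labels, t)
    print(f"threshold={t:.2f}  precision={p:.2f}  recall={r:.2f}")
# threshold=0.25  precision=0.75  recall=1.00
# threshold=0.50  precision=0.83  recall=0.83
# threshold=0.75  precision=1.00  recall=0.50
```

Lowering the threshold to 0.25 catches every positive (recall 1.00) at the cost of more false alarms; raising it to 0.75 makes every flagged case correct (precision 1.00) but misses half the positives.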
Good: Precision and Recall values close to each other and high (e.g., both above 0.8) show a balanced threshold.
Bad: Very high Precision but very low Recall means many positives are missed. Very high Recall but very low Precision means many false alarms.
Example: Precision = 0.95 with Recall = 0.30 is bad when missing positives is costly (e.g., disease detection).
- Ignoring class imbalance: Accuracy can be misleading if one class is much bigger.
- Overfitting threshold: Tuning threshold on test data leaks information and inflates performance.
- Using only one metric: Focusing only on Precision or Recall hides the full picture.
- Not validating threshold: Threshold should be chosen using validation data, not training or test data.
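A sketch of the last point: search candidate thresholds on a validation split only, then evaluate once on held-out test data with the frozen threshold. The scores, labels, and the F1 selection criterion below are illustrative assumptions, not a prescribed recipe.

```python
# Choose the threshold on VALIDATION data, report on TEST data.
# All scores and labels here are made-up illustration data.
def f1_at(scores, labels, threshold):
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(p and y for p, y in zip(preds, labels))
    fp = sum(p and not y for p, y in zip(preds, labels))
    fn = sum((not p) and y for p, y in zip(preds, labels))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

val_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
val_labels = [1,   1,   0,   1,   1,   0,   0,   0]

# Search candidate thresholds on validation data only.
candidates = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
best_t = max(candidates, key=lambda t: f1_at(val_scores, val_labels, t))
print(f"chosen threshold: {best_t}")

# Only now touch the test data, once, with the frozen threshold.
test_scores = [0.85, 0.65, 0.45, 0.25]
test_labels = [1,    1,    0,    0]
print(f"test F1: {f1_at(test_scores, test_labels, best_t):.2f}")
```

Tuning the threshold directly on the test set would let test information leak into the choice, which is exactly the "overfitting threshold" pitfall above.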
Your model has 98% accuracy but 12% recall on fraud detection. Is it good for production? Why or why not?
Answer: No. A recall of 12% means the model misses roughly 88% of fraud cases, which is unacceptable in production. The 98% accuracy is misleading because fraud is rare: a model that predicts "no fraud" almost every time scores high accuracy while failing at the task that matters.