ROC and AUC curves in TensorFlow - Model Metrics & Evaluation

The ROC curve shows how well a model separates positive and negative cases across different decision thresholds. The AUC (Area Under the Curve) summarizes this ability as a single number between 0 and 1: a higher AUC means the model more often ranks positive cases above negative ones. This matters when you want to assess the model's overall ability to distinguish classes, independent of any specific cutoff.
The ROC curve plots the True Positive Rate (Recall) against the False Positive Rate. The threshold varies from 1.0 down to 0.0, and for each threshold:

TPR = TP / (TP + FN)
FPR = FP / (FP + TN)
Example confusion matrix at one threshold:
|     | Predicted Positive | Predicted Negative |
|-----|--------------------|--------------------|
| Pos | TP = 80            | FN = 20            |
| Neg | FP = 10            | TN = 90            |
TPR = 80 / (80 + 20) = 0.8
FPR = 10 / (10 + 90) = 0.1
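The arithmetic above can be checked with a few lines of plain Python (no TensorFlow needed), using the counts from the example matrix:

```python
# Confusion-matrix counts from the example above
TP, FN, FP, TN = 80, 20, 10, 90

tpr = TP / (TP + FN)  # True Positive Rate (Recall)
fpr = FP / (FP + TN)  # False Positive Rate

print(tpr)  # 0.8
print(fpr)  # 0.1
```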
The ROC curve connects these (FPR, TPR) points for all thresholds.
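The whole sweep can be sketched in pure Python: treat each distinct predicted score as a threshold, collect the (FPR, TPR) points, and integrate with the trapezoidal rule to get AUC. The `labels` and `scores` below are illustrative toy data, not output from a real model:

```python
# Toy data: 1 = positive, 0 = negative, with model scores
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1]

def roc_points(labels, scores):
    """Return (fpr, tpr) pairs for thresholds from high to low."""
    P = sum(labels)
    N = len(labels) - P
    points = [(0.0, 0.0)]  # threshold above every score
    # Each distinct score, visited from highest to lowest, is a threshold.
    for thresh in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= thresh)
        fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= thresh)
        points.append((fp / N, tp / P))
    return points

def auc(points):
    """Trapezoidal area under the (fpr, tpr) curve."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

pts = roc_points(labels, scores)
print(auc(pts))  # ~0.889: 8 of the 9 positive-negative pairs are ranked correctly
```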
ROC and AUC are built from TPR (Recall) and FPR, not precision, but understanding the underlying tradeoffs helps:
- High Recall (TPR): Important in cancer detection to catch all sick patients, even if some healthy are flagged.
- Low False Positive Rate (FPR): Important in spam filters to avoid marking good emails as spam.
The ROC curve shows how changing the threshold affects this balance. A model with a curve closer to the top-left corner has better tradeoffs.
- Good AUC: Close to 1.0 means the model ranks positives above negatives almost perfectly.
- Random model: AUC around 0.5 means no better than guessing.
- Bad AUC: Close to 0 means the model ranks negatives above positives (worse than guessing).
Example: AUC = 0.85 is good, showing strong class separation. AUC = 0.55 is weak, barely better than random.
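This ranking interpretation is exact: AUC equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one (ties counted as half). A minimal sketch with illustrative toy scores:

```python
# AUC as a ranking probability: the fraction of (positive, negative)
# pairs where the positive example gets the higher score.
pos_scores = [0.9, 0.8, 0.3]  # toy scores for positive examples
neg_scores = [0.6, 0.2, 0.1]  # toy scores for negative examples

def rank_auc(pos, neg):
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0  # ties count as half
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

print(rank_auc(pos_scores, neg_scores))  # 8 of 9 pairs correct -> ~0.889
```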
- Ignoring class imbalance: AUC can look good even if the model performs poorly on the minority class.
- Overfitting: High AUC on training but low on test means the model memorized data, not learned general patterns.
- Data leakage: If test data leaks into training, AUC will be unrealistically high.
- Misinterpretation: AUC does not tell you the best threshold to use; it only measures ranking ability.
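The class-imbalance pitfall is easy to demonstrate with synthetic data: with 98% negatives, a model that scores every example identically reaches 98% accuracy under a "predict negative" threshold, yet its AUC is exactly 0.5 because it cannot rank at all. This is a contrived sketch, not a real model:

```python
# Synthetic illustration of the class-imbalance pitfall:
# 98 negatives, 2 positives, and a model that outputs one constant score.
labels = [0] * 98 + [1] * 2
scores = [0.1] * 100

# Accuracy when thresholding at 0.5 (everything predicted negative)
preds = [1 if s >= 0.5 else 0 for s in scores]
accuracy = sum(p == y for p, y in zip(preds, labels)) / len(labels)

# Rank-based AUC: every positive-negative pair is a tie, counted as 0.5
pos = [s for s, y in zip(scores, labels) if y == 1]
neg = [s for s, y in zip(scores, labels) if y == 0]
wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
auc = wins / (len(pos) * len(neg))

print(accuracy)  # 0.98 -- looks great
print(auc)       # 0.5  -- no better than guessing
```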
Your model has 98% accuracy but an AUC of 0.6 on the test set. Is it good for production? Why or why not?
Answer: No, it is not good. The high accuracy is likely an artifact of class imbalance: if 98% of samples are negative, a model that always predicts negative already reaches 98% accuracy while learning nothing. The AUC of 0.6 shows the model barely distinguishes positives from negatives, so it will miss many positive cases or wrongly flag negatives, making it unreliable for production.