When you replace a classifier head in a model, you need to verify that the new head predicts the correct classes. The key metrics are accuracy for overall correctness, plus precision and recall for understanding how well the head finds true positives without raising too many false alarms. Together they tell you whether the new head is learning properly and making reliable predictions.
Replacing classifier head in PyTorch - Model Metrics & Evaluation
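Before measuring anything, the head has to be swapped. A minimal sketch of what that looks like in PyTorch (the model name `SmallNet`, the layer sizes, and the freezing step are illustrative assumptions, not from the original text):

```python
import torch
import torch.nn as nn

# Hypothetical backbone with a 10-class head we want to replace.
class SmallNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(nn.Linear(16, 32), nn.ReLU())
        self.classifier = nn.Linear(32, 10)  # original head

    def forward(self, x):
        return self.classifier(self.features(x))

model = SmallNet()

# Replace the head for a new 2-class task (e.g., spam vs. not-spam).
num_new_classes = 2
model.classifier = nn.Linear(model.classifier.in_features, num_new_classes)

# Optionally freeze the backbone so only the new head trains.
for p in model.features.parameters():
    p.requires_grad = False

out = model(torch.randn(4, 16))
print(out.shape)  # torch.Size([4, 2])
```

Once the new head is in place, all the metrics below are computed on its predictions against held-out labels.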
Confusion Matrix (for 100 samples):

                 Predicted
                 Pos    Neg
Actual   Pos      40     10
         Neg       5     45

TP = 40, FP = 5, TN = 45, FN = 10
Precision = TP / (TP + FP) = 40 / (40 + 5) = 0.89
Recall = TP / (TP + FN) = 40 / (40 + 10) = 0.80
Accuracy = (TP + TN) / Total = (40 + 45) / 100 = 0.85
F1 Score = 2 * (Precision * Recall) / (Precision + Recall) ≈ 0.84
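The worked numbers above can be recomputed directly from the four confusion-matrix counts:

```python
# Metrics from the confusion-matrix counts used in the example above.
tp, fp, tn, fn = 40, 5, 45, 10

precision = tp / (tp + fp)                   # 40 / 45 ≈ 0.89
recall = tp / (tp + fn)                      # 40 / 50 = 0.80
accuracy = (tp + tn) / (tp + fp + tn + fn)   # 85 / 100 = 0.85
f1 = 2 * precision * recall / (precision + recall)

print(round(precision, 2), round(recall, 2), round(accuracy, 2), round(f1, 2))
# 0.89 0.8 0.85 0.84
```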
If your new classifier head is used for spam detection, high precision is important: you want to avoid marking good emails as spam (false positives), so fewer false alarms matter more than catching every spam message.
If it is used for medical diagnosis, such as cancer detection, high recall is critical: you want to catch as many true cases as possible, even at the cost of some false alarms.
Replacing the classifier head can change this balance. You must check which metric fits your goal and tune the head accordingly.
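One concrete way this balance shows up is the decision threshold applied to the head's scores. A small sketch with synthetic scores and labels (both made up for illustration) shows precision and recall moving in opposite directions as the threshold changes:

```python
# Synthetic scores/labels, purely illustrative: raising the threshold
# trades recall for precision, and vice versa.
scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.4, 0.35, 0.3, 0.2, 0.1]
labels = [1,    1,   1,   0,   1,   0,   1,    0,   0,   0]

def precision_recall(threshold):
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(p == 1 and y == 1 for p, y in zip(preds, labels))
    fp = sum(p == 1 and y == 0 for p, y in zip(preds, labels))
    fn = sum(p == 0 and y == 1 for p, y in zip(preds, labels))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

for t in (0.25, 0.5, 0.75):
    p, r = precision_recall(t)
    print(f"threshold={t}: precision={p:.2f} recall={r:.2f}")
# threshold=0.25: precision=0.62 recall=1.00
# threshold=0.5: precision=0.80 recall=0.80
# threshold=0.75: precision=1.00 recall=0.60
```

For spam detection you would pick a high threshold (high precision); for medical screening, a low one (high recall).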
Good: Accuracy above 80%, precision and recall both above 75%, and balanced F1 score. This means the new head predicts well and finds most true cases without many mistakes.
Bad: Accuracy below 60%, or very low precision (e.g., 30%) or recall (e.g., 20%). This shows the new head is not learning well or is biased, missing many true cases or making many wrong predictions.
- Accuracy paradox: High accuracy can be misleading if classes are imbalanced. For example, if 90% of data is one class, predicting that class always gives 90% accuracy but poor real performance.
- Data leakage: If test data leaks into training, metrics look too good but the model fails in real use.
- Overfitting: New head may memorize training data but perform poorly on new data. Watch for big gaps between training and validation metrics.
- Ignoring class balance: Metrics like precision and recall per class matter more than overall accuracy when classes differ in size.
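The accuracy paradox from the list above is easy to demonstrate: on a set with 90% negatives, a "model" that always predicts the majority class scores 90% accuracy while finding zero positives.

```python
# Accuracy paradox demo: 90 negatives, 10 positives,
# and a degenerate model that always predicts the majority class.
labels = [0] * 90 + [1] * 10
preds = [0] * 100

accuracy = sum(p == y for p, y in zip(preds, labels)) / len(labels)
tp = sum(p == 1 and y == 1 for p, y in zip(preds, labels))
fn = sum(p == 0 and y == 1 for p, y in zip(preds, labels))
recall_pos = tp / (tp + fn)

print(accuracy)    # 0.9
print(recall_pos)  # 0.0
```

This is exactly why per-class precision and recall matter more than overall accuracy on imbalanced data.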
Your model with the replaced classifier head has 98% accuracy but only 12% recall on the positive class (e.g., fraud). Is it good for production? Why or why not?
Answer: No, it is not good. The 12% recall means the model misses almost 9 out of 10 fraud cases. The high accuracy likely reflects class imbalance (most transactions are legitimate), not real performance. For fraud detection, high recall on the positive class is critical, so this model should not go to production.