Linear (fully connected) layers in PyTorch - Model Metrics & Evaluation

Linear layers are often the output stage of classification and regression models. For classification, accuracy, precision, recall, and F1 score tell us how well the model predicts each class. For regression, mean squared error (MSE) and mean absolute error (MAE) measure how close predictions are to the true values. These metrics help us know whether the linear layer is learning useful patterns.
Actual \ Predicted | Positive | Negative
-------------------|----------|---------
Positive           | 50 (TP)  | 10 (FN)
Negative           | 5 (FP)   | 35 (TN)
Total samples = 50 + 10 + 5 + 35 = 100
Precision = TP / (TP + FP) = 50 / (50 + 5) ≈ 0.91
Recall = TP / (TP + FN) = 50 / (50 + 10) ≈ 0.83
Accuracy = (TP + TN) / Total = (50 + 35) / 100 = 0.85
F1 Score = 2 * (Precision * Recall) / (Precision + Recall) ≈ 0.87
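The calculations above can be reproduced in a few lines of plain Python, using the same counts as the confusion matrix:

```python
# Metrics computed from the confusion matrix above.
tp, fn = 50, 10   # actual positives: correctly / incorrectly classified
fp, tn = 5, 35    # actual negatives: incorrectly / correctly classified
total = tp + fn + fp + tn

precision = tp / (tp + fp)
recall = tp / (tp + fn)
accuracy = (tp + tn) / total
f1 = 2 * precision * recall / (precision + recall)

print(f"Precision: {precision:.2f}")  # 0.91
print(f"Recall:    {recall:.2f}")     # 0.83
print(f"Accuracy:  {accuracy:.2f}")   # 0.85
print(f"F1 score:  {f1:.2f}")         # 0.87
```

In practice these are rarely computed by hand; libraries such as scikit-learn provide the same metrics, but seeing the arithmetic once makes the tradeoffs below easier to follow.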
Imagine a spam email detector built on a linear layer. If it flags emails aggressively, it catches most spam (high recall) but wrongly blocks legitimate emails (low precision). If it flags emails conservatively, the ones it flags are almost always spam (high precision), but many spam emails slip through (low recall). Depending on which mistake costs more, we adjust the model or the decision threshold.
For example, in medical diagnosis, missing a disease (a false negative, i.e., low recall) is usually worse than a false alarm (a false positive, i.e., low precision), so recall is prioritized. The model or its decision threshold can be tuned to balance this tradeoff.
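The threshold tradeoff described above can be seen directly. This minimal sketch uses made-up model scores and labels (not output from a real model) and shows how moving the cutoff shifts precision against recall:

```python
# Sketch: how the decision threshold trades precision against recall.
# Scores and labels are illustrative, not from a trained model.
scores = [0.95, 0.9, 0.8, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1,    1,   0,   1,   0,    1,   0,   0]  # 1 = positive class

def precision_recall(threshold):
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(p == 1 and y == 1 for p, y in zip(preds, labels))
    fp = sum(p == 1 and y == 0 for p, y in zip(preds, labels))
    fn = sum(p == 0 and y == 1 for p, y in zip(preds, labels))
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    return prec, rec

# A high threshold is conservative: fewer positives, higher precision.
print(precision_recall(0.85))  # (1.0, 0.5)
# A low threshold catches more positives: higher recall, lower precision.
print(precision_recall(0.35))  # (~0.67, 1.0)
```

Note that the model itself is unchanged; only the cutoff applied to its scores moves, which is why threshold tuning is a cheap way to match a deployed model to its cost of errors.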
Good metrics: As a rough rule of thumb, accuracy above 80%, precision and recall balanced above 0.8, and an F1 score near 0.85 or higher indicate the model is learning well.
Bad metrics: Accuracy near 50% (random guessing on two balanced classes), precision or recall below 0.5, or a large precision/recall imbalance (e.g., precision 0.9 but recall 0.2) show the model is struggling to learn or is biased toward one class.
- Accuracy paradox: High accuracy can be misleading when classes are imbalanced. For example, a model that always predicts the majority class reaches 95% accuracy if 95% of the data belongs to that class.
- Data leakage: If test data leaks into training, metrics look unrealistically good.
- Overfitting: Very high training accuracy but low test accuracy means the linear layer memorized training data but won't generalize.
- Ignoring class imbalance: Not using precision, recall, or F1 when classes are uneven can hide poor performance on minority classes.
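The accuracy paradox from the first pitfall above is easy to demonstrate with synthetic labels: a "model" that always predicts the majority class looks accurate while catching nothing.

```python
# Accuracy paradox demo with synthetic, imbalanced labels.
labels = [0] * 95 + [1] * 5   # 95% negative, 5% positive
preds = [0] * 100             # always predict the majority class

accuracy = sum(p == y for p, y in zip(preds, labels)) / len(labels)
tp = sum(p == 1 and y == 1 for p, y in zip(preds, labels))
fn = sum(p == 0 and y == 1 for p, y in zip(preds, labels))
recall = tp / (tp + fn)

print(accuracy)  # 0.95 -- looks great
print(recall)    # 0.0  -- misses every positive case
```

This is exactly why precision, recall, and F1 are reported alongside accuracy whenever classes are uneven.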
Your linear model has 98% accuracy but only 12% recall on fraud cases. Is it good for production? Why or why not?
Answer: No, it is not good. The model misses 88% of fraud cases (low recall), which is dangerous. Even though accuracy is high, it mostly predicts non-fraud correctly but fails to catch fraud. For fraud detection, recall is critical.
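A concrete (hypothetical) confusion matrix makes the answer tangible. Only the 98% accuracy and 12% recall come from the question; the transaction counts below are assumed for illustration:

```python
# Hypothetical fraud-detection confusion matrix reproducing the
# numbers in the question: 98% accuracy with only 12% fraud recall.
total = 10_000          # transactions (assumed)
fraud = 200             # 2% fraud rate (assumed)
tp, fn = 24, 176        # catches 24 of 200 fraud cases -> recall 0.12
tn, fp = 9_776, 24      # the rest are legitimate transactions

accuracy = (tp + tn) / total
recall = tp / (tp + fn)
print(accuracy, recall)                         # 0.98 0.12
print(f"Missed fraud: {fn} of {fraud} cases")   # 176 of 200 = 88%
```

Because fraud is only 2% of the data, the model can be wrong on almost every fraud case and still score 98% accuracy, which is the accuracy paradox from the pitfalls list applied to production.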