Defining a model class in PyTorch - Model Metrics & Evaluation
When you define a model class in PyTorch, the key metric to focus on during training is the training loss. It tells you how well your model is learning from the data. Loss measures the difference between the model's predictions and the actual answers; a lower loss means the model is making better predictions. Once training loss is decreasing, check accuracy or other metrics on held-out test data to see whether the model generalizes well.
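Loss is just a number computed from predictions and targets. As a minimal sketch in plain Python (the values here are made up for illustration), this is mean squared error, the same quantity PyTorch's nn.MSELoss computes for regression:

```python
# Hypothetical model outputs and ground-truth targets
predictions = [2.5, 0.0, 2.1, 7.8]
targets     = [3.0, -0.5, 2.0, 7.5]

# Mean squared error: average of squared prediction errors
mse = sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(predictions)
# (0.25 + 0.25 + 0.01 + 0.09) / 4 = 0.15
```

During training, the optimizer adjusts the model's weights to push this number down; watching it decrease epoch over epoch is how you know learning is happening.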
For classification models defined in PyTorch, a confusion matrix helps visualize how well the model predicts each class. It shows counts of:
- True Positives (TP): Correct positive predictions
- False Positives (FP): Incorrect positive predictions
- True Negatives (TN): Correct negative predictions
- False Negatives (FN): Missed positive cases
Confusion Matrix Example:

                Predicted
                 P    N
  Actual   P    50   10
           N     5   35

Total samples = 50 + 10 + 5 + 35 = 100

Here TP = 50, FN = 10, FP = 5, and TN = 35. This matrix helps calculate precision, recall, and accuracy to evaluate the model defined in your class.
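Using the counts from the example matrix above (TP = 50, FN = 10, FP = 5, TN = 35), the three metrics can be computed directly from their standard formulas:

```python
# Counts taken from the example confusion matrix
TP, FN = 50, 10
FP, TN = 5, 35

precision = TP / (TP + FP)                   # of predicted positives, how many were right: 50/55
recall    = TP / (TP + FN)                   # of actual positives, how many were found: 50/60
accuracy  = (TP + TN) / (TP + FP + TN + FN)  # overall fraction correct: 85/100
```

For this matrix, precision is about 0.91, recall about 0.83, and accuracy 0.85.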
When defining a model class, you decide how it learns and predicts. Depending on the task, you might want to optimize for:
- Precision: How many predicted positives are actually correct? Important when false alarms are costly. Example: Email spam filter - you don't want to mark good emails as spam.
- Recall: How many actual positives did the model find? Important when missing positives is risky. Example: Cancer detection - you want to catch as many cases as possible.
Adjusting your model's structure or training can shift this balance. For example, changing the output activation or loss function in your model class affects precision and recall.
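One of the simplest ways to see this trade-off is the classification threshold applied to the model's output probabilities. The sketch below uses made-up probabilities and labels (not output from a real model) to show how raising the threshold favors precision while lowering it favors recall:

```python
# Hypothetical predicted probabilities and true labels
probs  = [0.95, 0.80, 0.60, 0.55, 0.40, 0.30, 0.20, 0.10]
labels = [1,    1,    0,    1,    1,    0,    0,    0]

def precision_recall(probs, labels, threshold):
    """Compute precision and recall for a given decision threshold."""
    preds = [1 if p >= threshold else 0 for p in probs]
    tp = sum(1 for p, l in zip(preds, labels) if p == 1 and l == 1)
    fp = sum(1 for p, l in zip(preds, labels) if p == 1 and l == 0)
    fn = sum(1 for p, l in zip(preds, labels) if p == 0 and l == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

strict = precision_recall(probs, labels, 0.70)  # (1.0, 0.5): no false alarms, misses half
loose  = precision_recall(probs, labels, 0.35)  # (0.8, 1.0): finds everything, some false alarms
```

A strict threshold (0.70) gives perfect precision but only 50% recall; a loose one (0.35) catches every positive at the cost of a false alarm. The right choice depends on which error is costlier for your task.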
For a well-defined PyTorch model class:
- Good: Training loss steadily decreases, validation loss decreases or stabilizes, accuracy improves, and precision/recall are balanced based on the task.
- Bad: Training loss stays high or fluctuates wildly, validation loss increases (overfitting), accuracy is low, or precision/recall are very unbalanced (e.g., high precision but very low recall).
Good metrics suggest your model class is correctly defined and learning well; bad metrics point to bugs in the model code or poor design choices.
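A practical way to distinguish the good pattern from the bad one is to log training and validation loss each epoch and compare their trends. This sketch uses hypothetical loss curves (not real training output) to show a textbook overfitting signal:

```python
# Hypothetical per-epoch losses logged during training
train_loss = [0.90, 0.55, 0.30, 0.15, 0.05]
val_loss   = [0.95, 0.70, 0.60, 0.72, 0.90]

# Epoch with the lowest validation loss
best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__)  # epoch 2

# Overfitting signal: training loss keeps falling while validation loss rises
overfitting = (val_loss[-1] > val_loss[best_epoch]
               and train_loss[-1] < train_loss[best_epoch])
```

Here validation loss bottoms out at epoch 2 and climbs afterward even though training loss keeps shrinking, which is exactly the "bad" pattern described above; stopping at the best validation epoch (early stopping) is the usual remedy.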
- Accuracy paradox: High accuracy can be misleading if classes are imbalanced. For example, if 95% of data is one class, predicting that class always gives 95% accuracy but no real learning.
- Data leakage: If your model class accidentally uses test data during training, metrics will look unrealistically good.
- Overfitting indicators: Training loss very low but validation loss high means the model memorizes training data but fails on new data.
- Incorrect metric calculation: Using wrong formulas for precision or recall leads to wrong conclusions about model quality.
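The accuracy paradox from the first pitfall is easy to demonstrate with a tiny synthetic example: a "model" that always predicts the majority class scores high accuracy while learning nothing.

```python
# Imbalanced data: 95 negatives, 5 positives
labels = [0] * 95 + [1] * 5
# A useless model that always predicts the majority class
preds  = [0] * 100

accuracy = sum(p == l for p, l in zip(preds, labels)) / len(labels)  # 0.95

tp = sum(1 for p, l in zip(preds, labels) if p == 1 and l == 1)
fn = sum(1 for p, l in zip(preds, labels) if p == 0 and l == 1)
recall = tp / (tp + fn)  # 0.0 — the model never finds a single positive
```

95% accuracy, 0% recall: this is why accuracy alone is never enough on imbalanced data.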
Your PyTorch model class training shows 98% accuracy but only 12% recall on fraud cases. Is it good for production? Why or why not?
Answer: No, it is not good. The model misses 88% of fraud cases (low recall), which is dangerous because many frauds go undetected. High accuracy is misleading here because fraud cases are rare. You should improve recall to catch more fraud.
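To make the answer concrete, here is one hypothetical set of counts consistent with the stated 98% accuracy and 12% recall (the exact counts are invented for illustration; many other combinations would also fit):

```python
# Hypothetical fraud-detection counts: 10,000 transactions, 100 actual frauds (1%)
TP, FN = 12, 88      # only 12 of 100 frauds caught
FP, TN = 112, 9788   # legitimate transactions

accuracy = (TP + TN) / (TP + FP + TN + FN)  # 0.98
recall   = TP / (TP + FN)                   # 0.12
missed   = FN / (TP + FN)                   # 0.88 — 88% of frauds slip through
```

Because fraud is only 1% of the data, the model can be wrong on almost every fraud case and still report 98% accuracy, which is exactly why recall (or a recall-oriented metric) must drive the production decision here.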