Defining a model class in PyTorch - Model Metrics & Evaluation
When you define a model class in PyTorch, the key metric to focus on during training is the training loss. It tells you how well your model is learning from the data. Loss measures the difference between the model's predictions and the actual answers; a lower loss means the model is making better predictions. Once training loss is decreasing, check accuracy or other metrics on held-out test data to see whether the model generalizes well.
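Loss is just a number computed from predictions and targets. As a minimal sketch in plain Python (the values here are made up for illustration), this is mean squared error, the same quantity PyTorch's nn.MSELoss computes for regression:

```python
# Hypothetical model outputs and ground-truth targets
predictions = [2.5, 0.0, 2.1, 7.8]
targets     = [3.0, -0.5, 2.0, 7.5]

# Mean squared error: average of squared prediction errors
mse = sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(predictions)
# (0.25 + 0.25 + 0.01 + 0.09) / 4 = 0.15
```

During training, the optimizer adjusts the model's weights to push this number down; watching it decrease epoch over epoch is how you know learning is happening.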
For classification models defined in PyTorch, a confusion matrix helps visualize how well the model predicts each class. It shows counts of:
- True Positives (TP): Correct positive predictions
- False Positives (FP): Incorrect positive predictions
- True Negatives (TN): Correct negative predictions
- False Negatives (FN): Missed positive cases
Confusion Matrix Example:

                Predicted
                 P    N
  Actual   P    50   10
           N     5   35

Total samples = 50 + 10 + 5 + 35 = 100

Here TP = 50, FN = 10, FP = 5, and TN = 35. This matrix helps calculate precision, recall, and accuracy to evaluate the model defined in your class.
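Using the counts from the example matrix above (TP = 50, FN = 10, FP = 5, TN = 35), the three metrics can be computed directly from their standard formulas:

```python
# Counts taken from the example confusion matrix
TP, FN = 50, 10
FP, TN = 5, 35

precision = TP / (TP + FP)                   # of predicted positives, how many were right: 50/55
recall    = TP / (TP + FN)                   # of actual positives, how many were found: 50/60
accuracy  = (TP + TN) / (TP + FP + TN + FN)  # overall fraction correct: 85/100
```

For this matrix, precision is about 0.91, recall about 0.83, and accuracy 0.85.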
When defining a model class, you decide how it learns and predicts. Depending on the task, you might want to optimize for:
- Precision: How many predicted positives are actually correct? Important when false alarms are costly. Example: Email spam filter - you don't want to mark good emails as spam.
- Recall: How many actual positives did the model find? Important when missing positives is risky. Example: Cancer detection - you want to catch as many cases as possible.
Adjusting your model's structure or training can shift this balance. For example, changing the output activation or loss function in your model class affects precision and recall.
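One of the simplest ways to see this trade-off is the classification threshold applied to the model's output probabilities. The sketch below uses made-up probabilities and labels (not output from a real model) to show how raising the threshold favors precision while lowering it favors recall:

```python
# Hypothetical predicted probabilities and true labels
probs  = [0.95, 0.80, 0.60, 0.55, 0.40, 0.30, 0.20, 0.10]
labels = [1,    1,    0,    1,    1,    0,    0,    0]

def precision_recall(probs, labels, threshold):
    """Compute precision and recall for a given decision threshold."""
    preds = [1 if p >= threshold else 0 for p in probs]
    tp = sum(1 for p, l in zip(preds, labels) if p == 1 and l == 1)
    fp = sum(1 for p, l in zip(preds, labels) if p == 1 and l == 0)
    fn = sum(1 for p, l in zip(preds, labels) if p == 0 and l == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

strict = precision_recall(probs, labels, 0.70)  # (1.0, 0.5): no false alarms, misses half
loose  = precision_recall(probs, labels, 0.35)  # (0.8, 1.0): finds everything, some false alarms
```

A strict threshold (0.70) gives perfect precision but only 50% recall; a loose one (0.35) catches every positive at the cost of a false alarm. The right choice depends on which error is costlier for your task.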
For a well-defined PyTorch model class:
- Good: Training loss steadily decreases, validation loss decreases or stabilizes, accuracy improves, and precision/recall are balanced based on the task.
- Bad: Training loss stays high or fluctuates wildly, validation loss increases (overfitting), accuracy is low, or precision/recall are very unbalanced (e.g., high precision but very low recall).
Good metrics suggest your model class is correctly defined and learning well; bad metrics point to bugs in the model code or poor design choices.
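A practical way to distinguish the good pattern from the bad one is to log training and validation loss each epoch and compare their trends. This sketch uses hypothetical loss curves (not real training output) to show a textbook overfitting signal:

```python
# Hypothetical per-epoch losses logged during training
train_loss = [0.90, 0.55, 0.30, 0.15, 0.05]
val_loss   = [0.95, 0.70, 0.60, 0.72, 0.90]

# Epoch with the lowest validation loss
best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__)  # epoch 2

# Overfitting signal: training loss keeps falling while validation loss rises
overfitting = (val_loss[-1] > val_loss[best_epoch]
               and train_loss[-1] < train_loss[best_epoch])
```

Here validation loss bottoms out at epoch 2 and climbs afterward even though training loss keeps shrinking, which is exactly the "bad" pattern described above; stopping at the best validation epoch (early stopping) is the usual remedy.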
- Accuracy paradox: High accuracy can be misleading if classes are imbalanced. For example, if 95% of data is one class, predicting that class always gives 95% accuracy but no real learning.
- Data leakage: If your model class accidentally uses test data during training, metrics will look unrealistically good.
- Overfitting indicators: Training loss very low but validation loss high means the model memorizes training data but fails on new data.
- Incorrect metric calculation: Using wrong formulas for precision or recall leads to wrong conclusions about model quality.
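The accuracy paradox from the first pitfall is easy to demonstrate with a tiny synthetic example: a "model" that always predicts the majority class scores high accuracy while learning nothing.

```python
# Imbalanced data: 95 negatives, 5 positives
labels = [0] * 95 + [1] * 5
# A useless model that always predicts the majority class
preds  = [0] * 100

accuracy = sum(p == l for p, l in zip(preds, labels)) / len(labels)  # 0.95

tp = sum(1 for p, l in zip(preds, labels) if p == 1 and l == 1)
fn = sum(1 for p, l in zip(preds, labels) if p == 0 and l == 1)
recall = tp / (tp + fn)  # 0.0 — the model never finds a single positive
```

95% accuracy, 0% recall: this is why accuracy alone is never enough on imbalanced data.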
Your PyTorch model class training shows 98% accuracy but only 12% recall on fraud cases. Is it good for production? Why or why not?
Answer: No, it is not good. The model misses 88% of fraud cases (low recall), which is dangerous because many frauds go undetected. High accuracy is misleading here because fraud cases are rare. You should improve recall to catch more fraud.
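To make the answer concrete, here is one hypothetical set of counts consistent with the stated 98% accuracy and 12% recall (the exact counts are invented for illustration; many other combinations would also fit):

```python
# Hypothetical fraud-detection counts: 10,000 transactions, 100 actual frauds (1%)
TP, FN = 12, 88      # only 12 of 100 frauds caught
FP, TN = 112, 9788   # legitimate transactions

accuracy = (TP + TN) / (TP + FP + TN + FN)  # 0.98
recall   = TP / (TP + FN)                   # 0.12
missed   = FN / (TP + FN)                   # 0.88 — 88% of frauds slip through
```

Because fraud is only 1% of the data, the model can be wrong on almost every fraud case and still report 98% accuracy, which is exactly why recall (or a recall-oriented metric) must drive the production decision here.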