Data augmentation in PyTorch - Model Metrics & Evaluation

Data augmentation helps models generalize by exposing them to more varied training examples. The key metrics to watch are validation accuracy and validation loss: they show whether the model is improving on new, unseen data rather than just memorizing the training set. A falling validation loss and rising validation accuracy indicate the augmentation is helping the model generalize well.
Actual \ Predicted | Positive | Negative
-------------------|----------|---------
Positive           |    85    |    15
Negative           |    10    |    90
This confusion matrix summarizes the model's predictions on 200 samples after training with data augmentation. From it we can calculate precision, recall, and the F1 score to assess how well the model performs.
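The counts above can be plugged in directly. A short Python sketch, treating "Positive" as the positive class:

```python
# Metrics from the confusion matrix above (positive class = "Positive").
tp, fn = 85, 15   # actual Positive row
fp, tn = 10, 90   # actual Negative row

accuracy = (tp + tn) / (tp + fn + fp + tn)
precision = tp / (tp + fp)      # of predicted positives, how many were right
recall = tp / (tp + fn)         # of actual positives, how many were found
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")
# accuracy=0.875 precision=0.895 recall=0.850 f1=0.872
```

In practice you would use a library helper such as `sklearn.metrics.precision_recall_fscore_support` rather than computing these by hand.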
Data augmentation can improve both precision and recall by exposing the model to more varied examples. In a face recognition app, for example, high precision means fewer wrong matches, while high recall means fewer missed faces. Augmentation helps balance the two by reducing overfitting and making the model robust to changes such as lighting or angle.
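To make the idea concrete, here is a minimal pure-Python sketch of two common augmentations (random horizontal flip and brightness jitter) on an image stored as a nested list of pixel intensities. In a real PyTorch pipeline you would use `torchvision.transforms` (e.g. `RandomHorizontalFlip`, `ColorJitter`) instead; this sketch only illustrates the mechanics.

```python
import random

def augment(image, rng):
    """Apply a random horizontal flip and brightness jitter to an image
    given as a list of rows of pixel intensities (0-255)."""
    out = [row[:] for row in image]
    if rng.random() < 0.5:                    # random horizontal flip
        out = [row[::-1] for row in out]
    factor = 1.0 + rng.uniform(-0.2, 0.2)     # random brightness jitter
    return [[min(255, max(0, int(p * factor))) for p in row] for row in out]

img = [[10, 50, 200],
       [30, 80, 120]]
rng = random.Random(0)
print(augment(img, rng))   # a slightly different image on each call
```

Each call produces a different variant of the same underlying image, which is exactly what gives the model more varied examples at training time.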
- Good: Validation accuracy steadily improves or stays stable, validation loss decreases, and confusion matrix shows balanced true positives and true negatives.
- Bad: Validation accuracy drops or fluctuates wildly, validation loss increases, or confusion matrix shows many false positives or false negatives, indicating the model is confused despite augmentation.
- Accuracy Paradox: High accuracy with poor recall or precision can hide problems. On imbalanced data, a model can score high accuracy simply by always predicting the majority class, so accuracy alone is misleading.
- Data Leakage: Augmented samples that are near-duplicates of test samples can leak information and falsely inflate metrics.
- Overfitting Indicators: Training accuracy much higher than validation accuracy suggests the augmentation is insufficient or not diverse enough.
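The overfitting indicator above can be checked mechanically. A minimal sketch with a hypothetical helper; the 0.10 gap threshold is an illustrative choice, not a standard value:

```python
# Flag a possible overfitting gap between training and validation accuracy.
# The 0.10 threshold is an illustrative assumption, tune it for your task.
def overfitting_gap(train_acc, val_acc, threshold=0.10):
    gap = train_acc - val_acc
    return gap, gap > threshold

gap, flagged = overfitting_gap(train_acc=0.97, val_acc=0.78)
print(f"gap={gap:.2f} flagged={flagged}")   # gap=0.19 flagged=True
```

If the gap is flagged, the usual responses are stronger or more diverse augmentation, regularization, or more training data.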
Your model trained with data augmentation has 98% accuracy but only 12% recall on the positive class (e.g., fraud). Is it good for production?
No. With only 12% recall, the model misses most positive cases, which is unacceptable in fraud detection: despite the high accuracy, it fails to catch the cases that matter most. You should improve the augmentation strategy, rebalance the classes, or change the model to raise recall.
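To see how 98% accuracy and 12% recall can coexist, here are illustrative counts (made up for this sketch, not from the question) for an imbalanced dataset of 10,000 transactions containing 200 fraud cases:

```python
# Illustrative counts: 10,000 transactions, 200 of which are fraud.
tp, fn = 24, 176    # only 24 of 200 fraud cases caught -> recall 12%
fp, tn = 24, 9776   # the huge legitimate majority keeps accuracy high

accuracy = (tp + tn) / (tp + fn + fp + tn)
recall = tp / (tp + fn)
print(f"accuracy={accuracy:.2%} recall={recall:.2%}")
# accuracy=98.00% recall=12.00%
```

The majority class dominates the accuracy figure, which is exactly the accuracy paradox described above.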