When using data augmentation in computer vision, the key metric to watch is generalization performance, often measured by validation accuracy or validation loss. Augmentation creates new, varied images from existing ones, helping the model learn to recognize objects under different conditions. This reduces overfitting and improves how well the model works on new, unseen images. So, metrics that show how well the model performs on data it has never seen before are the most important.
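As a minimal sketch of what "creating new, varied images" means in practice (assuming images are NumPy arrays; the transforms chosen here are illustrative, not a fixed recipe):

```python
import numpy as np

def augment(image: np.ndarray) -> list[np.ndarray]:
    """Return simple augmented variants of an H x W x C image array."""
    return [
        image,                          # original
        np.fliplr(image),               # horizontal flip
        np.rot90(image),                # 90-degree rotation
        np.clip(image * 1.2, 0, 255),   # brightness increase
    ]

img = np.zeros((32, 32, 3))
variants = augment(img)
print(len(variants))  # 4 training samples from 1 source image
```

Each source image yields several training samples, which is why augmentation is said to multiply the training data.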
Why augmentation multiplies training data in Computer Vision
Which metric matters for this concept and WHY
Confusion matrix or equivalent visualization (ASCII)
Confusion Matrix Example (Augmented Data Model):

                    Predicted
                    Cat    Dog
    Actual   Cat     45      5
             Dog      4     46

Total samples: 100
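As a quick sanity check, the metrics derived below can be reproduced in a few lines of Python (treating Cat as the positive class, as in the breakdown that follows):

```python
# Counts read off the confusion matrix (positive class = Cat)
tp, fn = 45, 5   # actual-Cat row
fp, tn = 4, 46   # actual-Dog row

precision = tp / (tp + fp)                          # 45 / 49
recall = tp / (tp + fn)                             # 45 / 50
f1 = 2 * precision * recall / (precision + recall)

print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
# precision=0.918 recall=0.900 f1=0.909
```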
- True Positives (TP): 45 (correctly predicted Cat)
- True Negatives (TN): 46 (correctly predicted Dog)
- False Positives (FP): 4 (Dog predicted as Cat)
- False Negatives (FN): 5 (Cat predicted as Dog)
Precision = TP / (TP + FP) = 45 / (45 + 4) = 0.918
Recall = TP / (TP + FN) = 45 / (45 + 5) = 0.9
F1 Score = 2 * (Precision * Recall) / (Precision + Recall) ≈ 0.909

Precision vs Recall tradeoff with concrete examples
Imagine a model trained with and without augmentation:
- Without augmentation: The model sees fewer variations, so it might memorize training images but fail on new ones. This can cause high precision but low recall because it only confidently predicts what it has seen.
- With augmentation: The model learns to recognize objects in many forms (rotated, brightened, flipped). This usually improves recall because it finds more true positives, even if precision slightly drops due to more challenging cases.
For example, in a dog vs cat classifier, augmentation helps the model spot a dog even if the photo is blurry or rotated, increasing recall.
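A hypothetical numeric comparison (these dog-class counts are invented for illustration) makes the tradeoff concrete:

```python
def pr(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Return (precision, recall) from raw counts."""
    return tp / (tp + fp), tp / (tp + fn)

# Hypothetical dog-class counts on the same 50-dog validation set
no_aug = pr(tp=30, fp=2, fn=20)   # memorizes: confident, but misses many dogs
with_aug = pr(tp=45, fp=5, fn=5)  # generalizes: finds far more dogs

print(f"No augmentation:   precision={no_aug[0]:.2f}, recall={no_aug[1]:.2f}")
print(f"With augmentation: precision={with_aug[0]:.2f}, recall={with_aug[1]:.2f}")
```

Here precision dips slightly (0.94 to 0.90) while recall jumps (0.60 to 0.90), matching the pattern described above.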
What "good" vs "bad" metric values look like for this use case
Good metrics:
- Validation accuracy above 85% on varied images
- Balanced precision and recall around 90%
- F1 score close to precision and recall, showing consistent performance
Bad metrics:
- High training accuracy but low validation accuracy (overfitting)
- Very high precision but very low recall (model misses many true cases)
- Validation accuracy below 70%, showing poor generalization
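One way to operationalize these rules of thumb is a small checker function; the thresholds are the illustrative values above, not universal standards:

```python
def evaluate_health(train_acc: float, val_acc: float,
                    precision: float, recall: float) -> list[str]:
    """Flag the failure modes listed above. Thresholds are illustrative."""
    warnings = []
    if train_acc - val_acc > 0.10:
        warnings.append("overfitting: large train/validation gap")
    if precision > 0.9 and recall < 0.5:
        warnings.append("imbalanced: high precision but very low recall")
    if val_acc < 0.70:
        warnings.append("poor generalization: validation accuracy below 70%")
    return warnings

print(evaluate_health(train_acc=0.98, val_acc=0.65, precision=0.95, recall=0.30))
```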
Metrics pitfalls (accuracy paradox, data leakage, overfitting indicators)
- Accuracy paradox: High accuracy can be misleading if classes are imbalanced. For example, if 90% of images are cats, predicting all cats yields 90% accuracy but no real learning.
- Data leakage: If augmented versions of training images end up in the validation set, metrics are falsely inflated because the model has effectively already seen those images.
- Overfitting indicators: Large gap between training and validation accuracy means the model memorizes training data but fails to generalize.
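The accuracy paradox from the 90%-cats example can be demonstrated directly:

```python
# Accuracy paradox: a degenerate "always predict cat" model on a 90%-cat dataset
labels = ["cat"] * 90 + ["dog"] * 10
preds = ["cat"] * 100  # the model never predicts dog

accuracy = sum(p == y for p, y in zip(preds, labels)) / len(labels)
dog_recall = sum(p == "dog" and y == "dog" for p, y in zip(preds, labels)) / 10

print(f"Accuracy:   {accuracy:.2f}")    # 0.90 -- looks good
print(f"Dog recall: {dog_recall:.2f}")  # 0.00 -- no real learning
```

This is why per-class recall (and the confusion matrix) must be checked alongside overall accuracy on imbalanced data.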
Self-check question
Your model trained with augmentation has 98% training accuracy but only 12% recall on the dog class in validation. Is it good for production? Why or why not?
Answer: No, it is not good. The very low recall means the model misses most dogs, which is critical if detecting dogs is important. Despite high training accuracy, the model fails to generalize and recognize dogs in new images. This suggests overfitting or ineffective augmentation.
Key Result
Data augmentation improves generalization: models trained on augmented data typically show higher validation accuracy and recall and a smaller train/validation gap, i.e. less overfitting.