MixUp strategy in Computer Vision - Model Metrics & Evaluation

MixUp is a data augmentation method that blends pairs of images and their labels using a convex combination, which helps models generalize. Because it alters the training distribution, the key metrics to watch are validation accuracy and validation loss, which show whether the model performs well on unseen data. Robustness metrics, such as accuracy on noisy or adversarial examples, also matter, since MixUp aims to improve model stability.
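A minimal NumPy sketch of the MixUp idea described above. The Beta parameter `alpha=0.2`, the fixed seed, and the toy 2x2 "images" are illustrative choices, not values from this text:

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2, seed=0):
    """Blend two samples and their one-hot labels with a Beta-sampled weight."""
    lam = np.random.default_rng(seed).beta(alpha, alpha)  # mixing coefficient in (0, 1)
    x = lam * x1 + (1 - lam) * x2   # blended image
    y = lam * y1 + (1 - lam) * y2   # blended (soft) label
    return x, y, lam

# toy 2x2 "images" and one-hot labels for a 3-class problem
x_cat = np.ones((2, 2));  y_cat = np.array([1.0, 0.0, 0.0])
x_dog = np.zeros((2, 2)); y_dog = np.array([0.0, 1.0, 0.0])
x_mix, y_mix, lam = mixup(x_cat, y_cat, x_dog, y_dog)
```

Note that the blended label is soft: it still sums to 1 but spreads mass over both classes, which is what discourages overconfident predictions on ambiguous inputs.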
Actual \ Predicted | Cat | Dog | Bird
-------------------|-----|-----|-----
Cat                | 45  |  3  |  2
Dog                |  4  | 43  |  3
Bird               |  1  |  2  | 47
Total samples = 150
TP (example for Cat) = 45
FP (Cat predicted but not Cat) = 4 + 1 = 5
FN (Cat actual but not predicted) = 3 + 2 = 5
Precision (Cat) = TP / (TP + FP) = 45 / (45 + 5) = 0.9
Recall (Cat) = TP / (TP + FN) = 45 / (45 + 5) = 0.9
This matrix shows balanced precision and recall across classes, suggesting that training with MixUp helped the model generalize well.
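The per-class numbers above can be reproduced directly from the matrix. A short NumPy sketch (the matrix values are copied from the table; class order is Cat, Dog, Bird):

```python
import numpy as np

# rows = actual, cols = predicted (Cat, Dog, Bird)
cm = np.array([[45,  3,  2],
               [ 4, 43,  3],
               [ 1,  2, 47]])

tp = np.diag(cm)           # true positives per class
fp = cm.sum(axis=0) - tp   # column sum minus diagonal = false positives
fn = cm.sum(axis=1) - tp   # row sum minus diagonal = false negatives
precision = tp / (tp + fp)
recall = tp / (tp + fn)
print(precision[0], recall[0])  # Cat: 0.9 0.9
```

The same two lines of arithmetic give precision and recall for Dog and Bird as well, which is why a confusion matrix is the most compact way to audit per-class behavior.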
MixUp encourages smoother decision boundaries, which can improve both precision and recall. But sometimes, increasing recall (catching more true positives) may lower precision (more false positives). For example:
- High precision, low recall: Model is very sure but misses some true cases.
- High recall, low precision: Model catches most true cases but also many wrong ones.
MixUp helps balance this tradeoff by making the model less confident on ambiguous samples, improving overall robustness.
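The precision/recall tradeoff above can be illustrated by sweeping a decision threshold over predicted scores. The scores and labels below are made up purely for illustration:

```python
import numpy as np

# toy positive-class probabilities and ground-truth labels (illustrative only)
scores = np.array([0.95, 0.9, 0.8, 0.6, 0.55, 0.4, 0.3, 0.2])
labels = np.array([1,    1,   0,   1,   0,    1,   0,   0  ])

def precision_recall(threshold):
    """Classify as positive when score >= threshold, then compute both metrics."""
    pred = scores >= threshold
    tp = np.sum(pred & (labels == 1))
    fp = np.sum(pred & (labels == 0))
    fn = np.sum(~pred & (labels == 1))
    return tp / (tp + fp), tp / (tp + fn)

print(precision_recall(0.85))  # strict threshold: precision 1.0, recall 0.5
print(precision_recall(0.35))  # loose threshold: precision 2/3, recall 1.0
```

Raising the threshold makes the model "very sure but misses some true cases"; lowering it "catches most true cases but also many wrong ones", exactly the two bullets above.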
Good:
- Validation accuracy steadily improves or stays stable compared to baseline.
- Validation loss decreases smoothly without sudden jumps.
- Precision and recall are balanced and high (e.g., above 85%).
- Model performs well on noisy or mixed inputs.
Bad:
- Validation accuracy drops or fluctuates wildly.
- Validation loss is high or unstable.
- Precision or recall is very low, showing poor generalization.
- Model fails on augmented or mixed samples, indicating overfitting.
Common pitfalls:
- Accuracy paradox: accuracy may look good overall, yet the model can still fail on real-world mixed or imbalanced data.
- Data leakage: accidentally mixing training and validation data inflates metrics.
- Overfitting signs: a very low training loss combined with a high validation loss means the model memorizes instead of generalizing.
- Ignoring robustness: Only checking accuracy on clean data misses MixUp benefits on noisy inputs.
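One way to watch for the overfitting sign listed above is to flag epochs where the gap between validation and training loss grows too large. A sketch with an illustrative threshold (`gap_tol=0.5` is not a standard value, just a plausible default):

```python
def overfitting_flags(train_losses, val_losses, gap_tol=0.5):
    """Return epoch indices where validation loss exceeds training loss by more
    than gap_tol. The threshold is illustrative, not a standard value."""
    return [epoch for epoch, (tr, va) in enumerate(zip(train_losses, val_losses))
            if va - tr > gap_tol]

# toy loss curves: training keeps dropping while validation stalls
train = [1.2, 0.8, 0.5, 0.3, 0.1]
val   = [1.3, 1.0, 0.9, 0.9, 1.0]
print(overfitting_flags(train, val))  # flags epochs 3 and 4
```

In practice this check belongs in the training loop alongside early stopping, so a widening gap is caught before many epochs are wasted.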
Your model trained with MixUp has 98% accuracy but only 12% recall on a rare class. Is it good for production? Why or why not?
Answer: No, it is not ready for production. High accuracy is misleading when the rare class makes up only a small fraction of the data, because the model can score well by mostly predicting the majority class. A recall of 12% means the model misses almost all true cases of the rare class, which can be critical depending on the task. You should improve recall on that class (e.g., by reweighting or resampling), even if overall accuracy drops slightly.
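The accuracy-paradox numbers in this scenario are easy to reproduce with a toy imbalanced dataset. The class counts below (975 common, 25 rare, 3 rare samples caught) are chosen only to mirror the 98% / 12% figures:

```python
import numpy as np

# 975 common-class (0) samples, 25 rare-class (1) samples
y_true = np.array([0] * 975 + [1] * 25)
# model predicts the common class almost always: only 3 rare samples caught
y_pred = np.array([0] * 975 + [1] * 3 + [0] * 22)

accuracy = np.mean(y_true == y_pred)
rare_recall = np.sum((y_true == 1) & (y_pred == 1)) / np.sum(y_true == 1)
print(accuracy)     # 0.978 — looks excellent
print(rare_recall)  # 0.12  — the model misses 22 of 25 rare cases
```

This is exactly why per-class recall, not overall accuracy, is the metric to gate production deployment on when classes are imbalanced.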