Data augmentation exposes a model to more varied examples by applying small transformations to the training images, such as flips, rotations, or brightness changes. This usually improves accuracy and generalization. We focus on validation accuracy and validation loss to check whether the model learns well on new, unseen images. Higher accuracy and lower loss on validation data indicate that augmentation is helping the model avoid overfitting and perform better on real-world data.
Data augmentation importance in Computer Vision - Model Metrics & Evaluation
Imagine a model classifying images into cats and dogs. After training with augmentation, the confusion matrix might look like this:
|            | Predicted Cat  | Predicted Dog |
|------------|----------------|---------------|
| Actual Cat | 45 (true cat)  | 5 (false dog) |
| Actual Dog | 3 (false cat)  | 47 (true dog) |
Total samples = 45 + 5 + 3 + 47 = 100
From this, we calculate:
- Precision (Cat) = 45 / (45 + 3) ≈ 0.94
- Recall (Cat) = 45 / (45 + 5) = 0.90
- Accuracy = (45 + 47) / 100 = 0.92
This shows the model is good at recognizing cats and dogs after augmentation.
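The arithmetic above can be reproduced in a few lines of Python. This is a minimal sketch that treats "cat" as the positive class; the variable names are chosen here for illustration and are not part of any library.

```python
# Counts taken from the cat/dog confusion matrix ("cat" is the positive class).
true_cat = 45    # actual cats predicted as cats (true positives)
false_dog = 5    # actual cats predicted as dogs (false negatives)
false_cat = 3    # actual dogs predicted as cats (false positives)
true_dog = 47    # actual dogs predicted as dogs (true negatives)

total = true_cat + false_dog + false_cat + true_dog

precision_cat = true_cat / (true_cat + false_cat)  # of all "cat" predictions, how many were cats
recall_cat = true_cat / (true_cat + false_dog)     # of all actual cats, how many were found
accuracy = (true_cat + true_dog) / total           # share of all predictions that were correct

print(f"Precision (Cat): {precision_cat:.2f}")
print(f"Recall (Cat):    {recall_cat:.2f}")
print(f"Accuracy:        {accuracy:.2f}")
```

Precision divides by everything predicted as cat, while recall divides by everything that really is a cat, which is why the two denominators differ.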
Data augmentation can help balance precision and recall by making the model robust to variations.
- High Precision, Low Recall: Model is very sure when it predicts a class but misses many true cases. For example, it only labels very clear cat images as cats, missing some cats that look different.
- High Recall, Low Precision: Model finds most cats but sometimes mistakes dogs for cats.
Augmentation helps increase both by showing the model many versions of cats and dogs, so it learns to recognize them better in different conditions.
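One way to see the two regimes described above is to vary the score threshold at which a classifier says "cat". The sketch below is an illustration only: the scores and labels are made up, and the helper `precision_recall` is a hypothetical function written for this example.

```python
def precision_recall(scores, labels, threshold):
    """Precision and recall for the positive class ("cat") at a given score threshold."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s >= threshold and y == 1)
    fp = sum(1 for s, y in pairs if s >= threshold and y == 0)
    fn = sum(1 for s, y in pairs if s < threshold and y == 1)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Hypothetical "cat" scores and true labels (1 = cat, 0 = dog), for illustration only.
scores = [0.95, 0.90, 0.80, 0.65, 0.60, 0.45, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 1, 0, 1, 1, 0, 0, 1, 0]

# A strict threshold only accepts very clear cats: high precision, low recall.
strict = precision_recall(scores, labels, threshold=0.75)
# A lenient threshold accepts almost everything: high recall, lower precision.
lenient = precision_recall(scores, labels, threshold=0.15)

print("strict :", strict)
print("lenient:", lenient)
```

With these toy numbers the strict threshold finds only half of the cats but makes no false "cat" predictions, while the lenient threshold finds every cat at the cost of labelling three dogs as cats.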
Good:
- Validation accuracy improves or stays stable compared to no augmentation.
- Validation loss decreases, showing better learning on new data.
- Balanced precision and recall above 85% for key classes.
Bad:
- Validation accuracy drops significantly, meaning augmentation is hurting learning.
- Validation loss increases or fluctuates wildly.
- Precision or recall very low, showing model confusion.
Common pitfalls:
- Overfitting despite augmentation: Augmentation is not a fix-all; if the model is too complex, it can still memorize training data.
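- Data leakage: If augmented copies of an image end up in training while the original (or another copy) sits in the validation set, validation accuracy will be falsely high. Split the data first, then augment only the training split.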
- Ignoring class imbalance: Augmentation might increase some classes more than others, skewing metrics.
- Accuracy paradox: High accuracy can hide poor performance on rare classes; always check precision and recall.
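The data leakage pitfall is usually avoided by splitting first and augmenting afterwards. The following is a minimal sketch using plain image IDs instead of real images; `split_then_augment`, its parameters, and the `(source_id, variant)` tuples are hypothetical names used only for this illustration.

```python
import random

def split_then_augment(image_ids, val_fraction=0.2, copies=3, seed=0):
    """Split source images first, then create augmented variants for training only."""
    rng = random.Random(seed)
    ids = list(image_ids)
    rng.shuffle(ids)

    n_val = int(len(ids) * val_fraction)
    val_ids, train_ids = ids[:n_val], ids[n_val:]

    # Each sample is (source_id, variant); variant 0 stands for the unmodified image.
    train_samples = [(i, k) for i in train_ids for k in range(copies + 1)]
    val_samples = [(i, 0) for i in val_ids]  # validation images are never augmented
    return train_samples, val_samples

train_samples, val_samples = split_then_augment(range(100))

train_sources = {source for source, _ in train_samples}
val_sources = {source for source, _ in val_samples}
print("shared source images:", len(train_sources & val_sources))
```

Because the split happens on source images, no augmented variant of a validation image can appear in the training set.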
Your model, trained with data augmentation, shows 98% accuracy but only 12% recall on a rare class such as fraud. Is it good?
Answer: No, it is not good. The low recall means the model misses most fraud cases, which is the critical failure here. The high accuracy is misleading because most of the data is non-fraud, so a model can score well by rarely predicting fraud at all. You need to improve recall, possibly through augmentation targeted at the rare class or other techniques such as class weighting.
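To see how 98% accuracy and 12% recall can coexist, consider the sketch below. The counts are hypothetical, chosen only so that they reproduce the two percentages in the question; they are not results from a real model.

```python
# Hypothetical counts for 10,000 transactions, of which 200 are fraud.
tp = 24      # fraud correctly flagged
fn = 176     # fraud missed
fp = 24      # legitimate transactions wrongly flagged
tn = 9776    # legitimate transactions correctly passed

total = tp + fn + fp + tn
accuracy = (tp + tn) / total
recall = tp / (tp + fn)
precision = tp / (tp + fp)

print(f"Accuracy:  {accuracy:.2f}")
print(f"Recall:    {recall:.2f}")
print(f"Precision: {precision:.2f}")
```

The large number of correctly passed legitimate transactions dominates the accuracy figure, even though 176 of the 200 fraud cases go undetected.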
Practice
Solution
Step 1: Understand data augmentation purpose
Data augmentation creates new images by slightly changing existing ones to increase variety.
Step 2: Connect augmentation to model learning
More variety helps the model learn features that work on new, unseen images, improving generalization.
Final Answer:
It increases the variety of training images to help the model generalize better. -> Option A
Quick Check:
Data augmentation = better generalization [OK]
- Confusing augmentation with data reduction
- Believing augmentation removes bad images
- Assuming augmentation guarantees perfect accuracy
Solution
Step 1: Recall torchvision syntax for horizontal flip
The correct transform is RandomHorizontalFlip with a probability parameter p.
Step 2: Check each option's correctness
Only transforms.RandomHorizontalFlip(p=0.5) matches the correct syntax and parameter name.
Final Answer:
transforms.RandomHorizontalFlip(p=0.5) -> Option C
Quick Check:
Correct torchvision flip syntax = transforms.RandomHorizontalFlip(p=0.5) [OK]
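To make the role of p concrete, here is a pure-Python sketch of the idea behind a random horizontal flip. It works on nested lists and is an illustration only, not torchvision's implementation; the function name `random_horizontal_flip` and its `rng` parameter are hypothetical.

```python
import random

def random_horizontal_flip(image, p=0.5, rng=random):
    """With probability p, reverse every row (left-right flip); otherwise return the image as-is."""
    if rng.random() < p:
        return [row[::-1] for row in image]
    return image

# A tiny 2x3 "image" of pixel values.
image = [[1, 2, 3],
         [4, 5, 6]]

always_flipped = random_horizontal_flip(image, p=1.0)  # p=1.0 flips every time
never_flipped = random_horizontal_flip(image, p=0.0)   # p=0.0 never flips

print(always_flipped)
print(never_flipped)
```

With p=0.5, roughly half of the images seen during training are mirrored, so the model sees both orientations of each object over many epochs.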
- Using wrong class names like HorizontalFlip
- Incorrect parameter names like prob instead of p
- Missing the probability parameter
transform = transforms.Compose([
    transforms.Resize((128, 128)),
    transforms.RandomRotation(30),
    transforms.ToTensor()
])
augmented_image = transform(original_image)
Solution
Step 1: Analyze the transform steps
Resize changes the image to 128x128 pixels. RandomRotation keeps the size unchanged (with its default expand=False). ToTensor converts the image to a tensor with channels first.
Step 2: Determine tensor shape format
PyTorch tensors from images have shape [channels, height, width]. For RGB images, channels=3.
Final Answer:
[3, 128, 128] -> Option D
Quick Check:
PyTorch image tensor shape = [channels, height, width] [OK]
- Confusing channel order with height and width
- Assuming rotation changes image size
- Mixing up tensor shape formats
transform = transforms.Compose([
    transforms.RandomRotation(45),
    transforms.RandomHorizontalFlip(0.3),
    transforms.ToTensor()
])
What is the likely cause?
Solution
Step 1: Check RandomHorizontalFlip usage
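The quiz treats the positional 0.3 as the problem. In torchvision, however, p is an ordinary parameter, so RandomHorizontalFlip(0.3) is accepted and behaves the same as RandomHorizontalFlip(p=0.3); the keyword form is simply clearer.
Step 2: Verify other transform usages
RandomRotation accepts float degrees, ToTensor can be last, and Compose supports these transforms.
Final Answer:
The quiz's expected answer is Option A (pass the probability as the keyword argument p), although a positional float does not by itself raise an error in torchvision.
Quick Check:
RandomHorizontalFlip(p=0.3) correct syntax [OK]
- Passing the probability positionally, which is valid but less readable than p=0.3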
- Thinking rotation degrees must be integer
- Misordering transforms in Compose
Solution
Step 1: Consider dataset size and augmentation needs
Small datasets benefit from augmentations that create varied views of images to prevent overfitting.
Step 2: Evaluate augmentation types
Random flips, rotations, and brightness changes simulate real-world variations, improving generalization better than noise alone or no augmentation.
Final Answer:
Apply random flips, rotations up to 30 degrees, and brightness changes during training. -> Option B
Quick Check:
Varied augmentations = better generalization on small data [OK]
- Ignoring augmentation on small datasets
- Using only noise without geometric changes
- Relying on bigger models instead of data variety
