Image augmentation helps models generalize by training them on more varied versions of the same pictures. The key metrics for checking whether augmentation works are validation accuracy and validation loss, because they measure how well the model recognizes new, unseen images. If augmentation helps, validation accuracy should improve or stay stable, while training accuracy may be somewhat lower because the augmented training images are harder to fit. This pattern means the model is generalizing rather than memorizing.
Image augmentation transforms in Computer Vision - Model Metrics & Evaluation
Imagine a model classifying cats and dogs. After training with augmentation, the confusion matrix might look like this:
|            | Predicted Cat | Predicted Dog |
|------------|---------------|---------------|
| Actual Cat | 45 (TP)       | 5 (FN)        |
| Actual Dog | 3 (FP)        | 47 (TN)       |
Total samples = 45 + 5 + 3 + 47 = 100
Precision = 45 / (45 + 3) = 45 / 48 ≈ 0.94
Recall = 45 / (45 + 5) = 0.90
F1 Score = 2 * (0.94 * 0.90) / (0.94 + 0.90) ≈ 0.92
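The arithmetic above can be checked with a few lines of plain Python. This is a minimal sketch: `metrics_from_confusion` is a hypothetical helper name, and it treats "cat" as the positive class, matching the table.

```python
def metrics_from_confusion(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Accuracy, precision, recall, and F1 for the positive class (here: cat)."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # of predicted cats, how many are cats
    recall = tp / (tp + fn) if (tp + fn) else 0.0     # of actual cats, how many were found
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


m = metrics_from_confusion(tp=45, fn=5, fp=3, tn=47)
for name, value in m.items():
    print(f"{name}: {value:.2f}")
# accuracy: 0.92
# precision: 0.94
# recall: 0.90
# f1: 0.92
```

Using the unrounded precision (0.9375) gives F1 ≈ 0.918, which still rounds to 0.92.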
Augmentation can help balance precision and recall. For the cat class:
- High precision: The model rarely mistakes dogs for cats, so its "cat" predictions are trustworthy.
- High recall: The model finds most of the actual cats, even if some dogs are also labeled as cats.
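If augmentation is too weak, recall might be low because the model has not seen enough variation to recognize cats in unfamiliar poses, lighting, or framing. If augmentation is too strong or unrealistic, precision might drop because the model trains on distorted images that no longer look like real cats, so its "cat" predictions become less reliable.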
Good: Validation accuracy close to training accuracy, high F1 score (above 0.85), and balanced precision and recall.
Bad: Large gap between training and validation accuracy (overfitting), low recall (many actual positives missed), or very low precision (many false positives).
- Accuracy paradox: High accuracy but poor recall if classes are imbalanced (e.g., many dog images, few cat images).
- Data leakage: If augmented copies of training images end up in the validation set, they are nearly identical to images the model has already seen, so validation scores are falsely inflated.
- Overfitting indicators: Training accuracy very high but validation accuracy low, meaning augmentation is not helping generalization.
Your model trained with image augmentation has 98% training accuracy but only 60% validation accuracy. What does this mean?
Answer: The model is overfitting. It learned the training images too well but cannot generalize to new images. The augmentation might be too weak or not diverse enough to help the model learn general features.
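The Good/Bad criteria above can be written as a rough rule of thumb. In this sketch, `diagnose` is a hypothetical helper; the 0.85 F1 threshold comes from the criteria above, while the 0.10 maximum train/validation gap is an assumed value you would tune for your own project.

```python
def diagnose(train_acc, val_acc, val_f1=None, max_gap=0.10, min_f1=0.85):
    """Hypothetical rule of thumb for reading train/validation metrics.

    min_f1=0.85 follows the "Good" criterion above; max_gap=0.10 is an assumed threshold.
    """
    gap = train_acc - val_acc
    if gap > max_gap:
        return "overfitting"  # memorizing training data; try stronger or more diverse augmentation
    if val_f1 is not None and val_f1 < min_f1:
        return "weak validation F1"  # inspect precision and recall separately
    return "good"


print(diagnose(0.98, 0.60))  # overfitting (gap is about 0.38)
```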
Practice
What is the purpose of image augmentation in training machine learning models?
Solution
Step 1: Understand image augmentation
Image augmentation means making small changes to original images to create new ones.
Step 2: Purpose in training
This exposes the model to more variety, which helps it learn general features and reduces overfitting.
Final Answer:
To create more varied training images by modifying originals -> Option C
Quick Check:
Image augmentation = create varied images [OK]
Common mistakes:
- Thinking augmentation reduces dataset size
- Confusing augmentation with noise removal
- Assuming augmentation only changes color
Solution
Step 1: Recall torchvision syntax
PyTorch (torchvision) uses transforms.RandomHorizontalFlip(p=probability) to flip images horizontally with probability p.
Step 2: Check options
Only transforms.RandomHorizontalFlip(p=1.0) matches the correct class name and parameter style; p=1.0 means every image is flipped.
Final Answer:
transforms.RandomHorizontalFlip(p=1.0) -> Option A
Quick Check:
Correct PyTorch flip = RandomHorizontalFlip [OK]
Common mistakes:
- Using non-existent transform names
- Missing the probability parameter
- Confusing horizontal with vertical flip
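```python
from PIL import Image
from torchvision import transforms

transform = transforms.Compose([
    transforms.Resize((128, 128)),  # resize to exactly 128x128
    transforms.RandomCrop(100),     # take a random 100x100 patch
    transforms.ToTensor()           # PIL image -> float tensor [C, H, W]
])
image = Image.open('sample.jpg')
output = transform(image)
print(output.shape)
```
Solution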
Step 1: Analyze each transform step
First, the image is resized to 128x128 pixels; assuming sample.jpg is an RGB image, it has 3 color channels. Then a random 100x100 crop is taken.
Step 2: Determine output tensor shape
After cropping, the image is 100x100 with 3 channels. ToTensor() converts it to a tensor with shape [channels, height, width] = [3, 100, 100].
Final Answer:
[3, 100, 100] -> Option B
Quick Check:
Resize then crop = final size 100x100 [OK]
Common mistakes:
- Ignoring the crop step size
- Confusing channel dimension with batch size
- Assuming crop keeps original size
```python
from PIL import Image
from torchvision import transforms

transform = transforms.Compose([
    transforms.Rotate(45),
    transforms.ToTensor()
])
image = Image.open('sample.jpg')
output = transform(image)
```
Solution
Step 1: Check torchvision transform names
There is no transforms.Rotate class. Rotation is done with transforms.RandomRotation or with the functional API.
Step 2: Identify correct usage
To rotate by a fixed angle, use transforms.RandomRotation([45, 45]) or transforms.functional.rotate. As written, the code raises an AttributeError when it reaches transforms.Rotate.
Final Answer:
transforms.Rotate doesn't exist; use transforms.functional.rotate or transforms.RandomRotation instead -> Option A
Quick Check:
No transforms.Rotate in torchvision [OK]
Common mistakes:
- Using non-existent transform classes
- Confusing degrees and radians
- Wrong order of transforms
Solution
Step 1: Understand augmentation goals
We want to simulate real-world changes such as scale, orientation, and lighting while keeping the output size fixed.
Step 2: Evaluate options
transforms.RandomResizedCrop(224), transforms.RandomHorizontalFlip(), transforms.ColorJitter(brightness=0.2, contrast=0.2) randomly crops and resizes to 224x224, flips horizontally, and changes brightness and contrast. These are all common augmentations, and the output is always 224x224.
Step 3: Check other options
transforms.Resize(256), transforms.CenterCrop(224), transforms.RandomVerticalFlip() keeps a 224x224 output, but its only random step is a vertical flip, and it has no color changes.
transforms.RandomRotation(90), transforms.RandomCrop(200), transforms.ToTensor() outputs 200x200 instead of 224x224, and rotations of up to ±90 degrees are often unrealistic.
transforms.RandomCrop(224), transforms.RandomRotation(180), transforms.Resize(128) resizes after cropping, so the final images are 128x128 rather than 224x224.
Final Answer:
transforms.RandomResizedCrop(224), transforms.RandomHorizontalFlip(), transforms.ColorJitter(brightness=0.2, contrast=0.2) -> Option D
Quick Check:
Best augmentations keep size fixed and add variety [OK]
Common mistakes:
- Choosing transforms that change image size unpredictably
- Ignoring color augmentations
- Using only vertical flips which are less common
