How to Augment Images for Training in Computer Vision

To augment images for training in computer vision, apply transformations such as rotation, flipping, scaling, and color changes to create varied versions of each image. The model then sees more diverse examples during training, which tends to improve generalization without collecting additional data.

Syntax

Image augmentation typically involves applying a series of transformations to input images during training. Common transformations include:

  • Rotation: Rotating images by a certain angle.
  • Flipping: Horizontally or vertically flipping images.
  • Scaling: Zooming in or out.
  • Translation: Shifting images along width or height.
  • Color jitter: Changing brightness, contrast, or saturation.
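
To make the list above concrete, here is a minimal NumPy-only sketch of what each transformation does to the pixel array. The random 8x8 image, the 2-pixel shift, the 2x zoom, and the 1.2 brightness factor are illustrative assumptions, and real pipelines usually rotate by arbitrary angles and interpolate rather than using these simple array operations.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical 8x8 RGB image with values in 0..255 (stand-in for a real photo)
image = rng.integers(0, 256, size=(8, 8, 3), dtype=np.uint8)

# Rotation: 90 degrees counter-clockwise in the height/width plane
rotated_90 = np.rot90(image, k=1, axes=(0, 1))

# Flipping: reverse the width axis (horizontal) or the height axis (vertical)
flipped_lr = image[:, ::-1, :]
flipped_ud = image[::-1, :, :]

# Scaling: zoom in 2x by cropping the center and repeating each pixel
center = image[2:6, 2:6, :]
zoomed = np.repeat(np.repeat(center, 2, axis=0), 2, axis=1)

# Translation: shift 2 pixels to the right, filling the gap with zeros
shifted = np.zeros_like(image)
shifted[:, 2:, :] = image[:, :-2, :]

# Color jitter: brighten by 20%, clipping back into the valid range
brighter = np.clip(image.astype(np.float32) * 1.2, 0, 255).astype(np.uint8)
```

Every result keeps the original 8x8x3 shape, which is why these operations can be applied on the fly without changing the model's input size.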

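In Python with TensorFlow/Keras, you can use tf.keras.preprocessing.image.ImageDataGenerator, or preprocessing layers such as tf.keras.layers.RandomFlip, RandomRotation, and RandomZoom. TensorFlow's documentation marks ImageDataGenerator as deprecated and recommends the preprocessing layers for new code; the generator is shown here because its arguments map directly onto the transformations listed above.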

python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

data_gen = ImageDataGenerator(
    rotation_range=20,      # rotate images up to 20 degrees
    width_shift_range=0.1,  # shift images horizontally by 10%
    height_shift_range=0.1, # shift images vertically by 10%
    shear_range=0.1,        # shear transformation
    zoom_range=0.1,         # zoom in/out by 10%
    horizontal_flip=True,   # flip images horizontally
    fill_mode='nearest'     # fill missing pixels
)
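
The layer-based alternative mentioned above can be sketched as follows. This is a minimal example, assuming a recent TensorFlow 2.x release in which these layers live directly under tf.keras.layers; the factor values and the random input batch are illustrative assumptions, not recommended settings.

```python
import tensorflow as tf

# Augmentation pipeline built from Keras preprocessing layers
augment = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal"),      # mirror left-right at random
    tf.keras.layers.RandomRotation(0.05),          # up to +/-5% of a full turn (about 18 degrees)
    tf.keras.layers.RandomZoom(0.1),               # zoom in or out by up to 10%
    tf.keras.layers.RandomTranslation(0.1, 0.1),   # shift up to 10% of height and width
])

# Hypothetical batch of 4 RGB images, 64x64 pixels, values in 0..255
images = tf.random.uniform((4, 64, 64, 3), maxval=255.0)

train_out = augment(images, training=True)   # random transformations are applied
eval_out = augment(images, training=False)   # layers pass the images through unchanged
```

Because the layers are inactive when training=False, the same pipeline can sit at the front of a model and will not alter validation or test images.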

Example

This example shows how to augment images using TensorFlow's ImageDataGenerator. It loads a sample image, applies random transformations, and displays the augmented images.

python
import matplotlib.pyplot as plt
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator, img_to_array, load_img

# Load sample image
img = load_img('elephant.jpg')  # replace with your image path
x = img_to_array(img)           # convert to numpy array
x = x.reshape((1,) + x.shape)   # reshape to (1, height, width, channels)

# Create ImageDataGenerator with augmentations
datagen = ImageDataGenerator(
    rotation_range=40,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    fill_mode='nearest'
)

# Generate and plot 5 augmented images
plt.figure(figsize=(15, 3))  # wide, short figure to fit a single row of 5 images
for i, batch in enumerate(datagen.flow(x, batch_size=1)):
    plt.subplot(1, 5, i + 1)
    plt.imshow(batch[0].astype('uint8'))
    plt.axis('off')
    if i == 4:
        break
plt.show()
Output
A window displaying 5 different augmented versions of the input image with random rotations, shifts, zooms, and flips.

Common Pitfalls

Common mistakes when augmenting images include:

  • Applying augmentations that distort the image too much, making it unrealistic.
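  • Using augmentations that change the label meaning (e.g., rotating a 6 by 180 degrees turns it into a 9, and mirroring a digit produces a shape that is not a valid digit).
  • Applying random augmentations to validation or test data. Only deterministic preprocessing, such as resizing and rescaling, should be shared between training and evaluation.
  • Applying range-dependent augmentations (such as brightness changes) after normalization, when the transformation assumes a different pixel range than the data actually has.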

Always check that augmented images still represent the correct class and keep augmentations reasonable.
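
Two of these pitfalls can be reproduced with a few lines of NumPy. The sketch below uses tiny hand-drawn 5x3 glyphs as stand-ins for handwritten digits and a brightness offset of 30 chosen for the 0..255 range; both are illustrative assumptions.

```python
import numpy as np

# Pitfall: a transformation that changes the label.
# Tiny 5x3 binary glyphs standing in for the digits 6 and 9 (illustrative only).
six = np.array([
    [1, 1, 1],
    [1, 0, 0],
    [1, 1, 1],
    [1, 0, 1],
    [1, 1, 1],
])
nine = np.array([
    [1, 1, 1],
    [1, 0, 1],
    [1, 1, 1],
    [0, 0, 1],
    [1, 1, 1],
])

rotated_180 = np.rot90(six, k=2)   # rotate by 180 degrees
mirrored = six[:, ::-1]            # horizontal flip

six_becomes_nine = np.array_equal(rotated_180, nine)
mirror_is_valid_digit = (
    np.array_equal(mirrored, six) or np.array_equal(mirrored, nine)
)
print(six_becomes_nine)        # True: the rotated 6 matches the 9 glyph
print(mirror_is_valid_digit)   # False: the mirrored 6 matches neither glyph

# Pitfall: augmenting after normalization with a range-dependent parameter.
pixels = np.array([0, 64, 128, 255], dtype=np.float32)
delta = 30.0  # brightness offset that only makes sense on the 0..255 scale

augment_then_normalize = np.clip(pixels + delta, 0, 255) / 255.0
normalize_then_augment = np.clip(pixels / 255.0 + delta, 0.0, 1.0)
print(normalize_then_augment)  # every pixel saturates at 1.0
```

In the second case the image turns completely white, because an offset of 30 is far larger than the normalized 0..1 range.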

python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

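# Wrong for digit recognition: a mirrored digit is no longer a valid digit
wrong_datagen = ImageDataGenerator(horizontal_flip=True)

# Right: small rotations and shifts keep digits readable; no flipping
right_datagen = ImageDataGenerator(rotation_range=10, width_shift_range=0.1)

# Use augmentation only on training data; validation gets no random transforms
val_datagen = ImageDataGenerator()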
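
The rule of augmenting only the training data can also be sketched without any framework. In the NumPy example below, augment and batches are hypothetical helper names, and the random flip plus 2-pixel shift is just one assumed augmentation for generic images (not for digits); the point is only that the training flag gates both shuffling and augmentation.

```python
import numpy as np

def augment(image, rng):
    """Randomly mirror an image and shift it horizontally by up to 2 pixels."""
    if rng.random() < 0.5:
        image = image[:, ::-1]
    shift = int(rng.integers(-2, 3))  # one of -2, -1, 0, 1, 2
    shifted = np.zeros_like(image)
    if shift > 0:
        shifted[:, shift:] = image[:, :-shift]
    elif shift < 0:
        shifted[:, :shift] = image[:, -shift:]
    else:
        shifted = image.copy()
    return shifted

def batches(images, labels, batch_size, rng, training):
    """Yield (images, labels) batches; shuffle and augment only when training."""
    order = rng.permutation(len(images)) if training else np.arange(len(images))
    for start in range(0, len(images), batch_size):
        idx = order[start:start + batch_size]
        batch = images[idx]
        if training:
            batch = np.stack([augment(img, rng) for img in batch])
        yield batch, labels[idx]

rng = np.random.default_rng(0)
images = rng.random((10, 8, 8, 3)).astype(np.float32)  # hypothetical dataset
labels = np.arange(10) % 2

train_batches = list(batches(images, labels, 4, rng, training=True))
val_batches = list(batches(images, labels, 4, rng, training=False))
```

Validation batches come back in their original order and with untouched pixels, so metrics computed on them reflect the data the model will actually see.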

Quick Reference

Here is a quick cheat-sheet for common image augmentations:

| Augmentation | Description | Use Case |
| --- | --- | --- |
| Rotation | Rotate image by degrees | General variation |
| Horizontal Flip | Flip image left-right | Objects symmetric horizontally |
| Vertical Flip | Flip image up-down | Rare, use carefully |
| Zoom | Scale image in/out | Simulate distance changes |
| Shift | Move image horizontally/vertically | Simulate position changes |
| Shear | Slant image | Simulate perspective |
| Color Jitter | Change brightness/contrast | Lighting variation |
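
Color jitter is the one row above that the earlier ImageDataGenerator examples do not exercise, so here is a minimal NumPy sketch of it. The color_jitter helper, its parameter names, and the 0.2 limits are hypothetical choices for illustration.

```python
import numpy as np

def color_jitter(image, rng, max_brightness=0.2, max_contrast=0.2):
    """Randomly change the brightness and contrast of a uint8 image."""
    img = image.astype(np.float32) / 255.0
    contrast = 1.0 + rng.uniform(-max_contrast, max_contrast)
    brightness = rng.uniform(-max_brightness, max_brightness)
    mean = img.mean()
    # Stretch values around the mean (contrast), then add an offset (brightness)
    img = (img - mean) * contrast + mean + brightness
    # Clip back into the valid range before converting to uint8
    return (np.clip(img, 0.0, 1.0) * 255.0).round().astype(np.uint8)

rng = np.random.default_rng(42)
image = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)  # hypothetical image
jittered = color_jitter(image, rng)
```

The jitter is applied on the 0..1 scale and clipped before converting back, which keeps the result a valid uint8 image regardless of the sampled factors.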

Key Takeaways

  • Use image augmentation to create varied training data and improve model generalization.
  • Apply realistic transformations that keep the image label valid and meaningful.
  • Use libraries like TensorFlow's ImageDataGenerator for easy augmentation pipelines.
  • Do not augment validation or test data, so that performance metrics reflect unmodified inputs.
  • Check augmented images visually to ensure quality and correctness.