How to Augment Images for Training in Computer Vision

To augment images for training in computer vision, apply transformations such as rotation, flipping, scaling, and color changes to create varied versions of your images. The model sees more diverse examples without you collecting more data, which usually improves generalization.

📐

Syntax

Image augmentation typically involves applying a series of transformations to input images during training. Common transformations include:

Rotation: Rotating images by a certain angle.
Flipping: Horizontally or vertically flipping images.
Scaling: Zooming in or out.
Translation: Shifting images along width or height.
Color jitter: Changing brightness, contrast, or saturation.
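To make the list above concrete, here is a minimal NumPy sketch of three of these transformations applied to an image stored as an (height, width, channels) array. The function names are hypothetical and chosen for illustration only; real pipelines would normally rely on a library rather than hand-written helpers.

```python
import numpy as np

def horizontal_flip(img):
    """Mirror an (H, W, C) image left-right."""
    return img[:, ::-1, :]

def translate(img, dy, dx):
    """Shift an image by (dy, dx) pixels, filling vacated pixels with zeros."""
    h, w = img.shape[:2]
    out = np.zeros_like(img)
    # Source region that stays visible, and where it lands in the output
    ys_src = slice(max(0, -dy), min(h, h - dy))
    xs_src = slice(max(0, -dx), min(w, w - dx))
    ys_dst = slice(max(0, dy), min(h, h + dy))
    xs_dst = slice(max(0, dx), min(w, w + dx))
    out[ys_dst, xs_dst] = img[ys_src, xs_src]
    return out

def adjust_brightness(img, factor):
    """Scale pixel values by `factor`, clipping to the valid 0-255 range."""
    scaled = img.astype(np.float32) * factor
    return np.clip(scaled, 0, 255).astype(np.uint8)

# Hypothetical 64x64 RGB image with random pixel values
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)

flipped = horizontal_flip(image)
shifted = translate(image, dy=5, dx=-3)      # 5 px down, 3 px left
brighter = adjust_brightness(image, 1.2)     # 20% brighter
```

Each helper returns an array with the same shape as its input, so the results can be fed to the model exactly like the original image.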

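In Python with TensorFlow/Keras, you can use tf.keras.preprocessing.image.ImageDataGenerator, or preprocessing layers such as tf.keras.layers.RandomFlip, RandomRotation, and RandomZoom. Recent TensorFlow releases mark ImageDataGenerator as deprecated and recommend the preprocessing layers for new code; the generator is shown here because its configuration is compact.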

python

from tensorflow.keras.preprocessing.image import ImageDataGenerator

data_gen = ImageDataGenerator(
    rotation_range=20,      # rotate randomly by up to 20 degrees either way
    width_shift_range=0.1,  # shift horizontally by up to 10% of the width
    height_shift_range=0.1, # shift vertically by up to 10% of the height
    shear_range=0.1,        # shear intensity (shear angle in degrees)
    zoom_range=0.1,         # zoom randomly between 90% and 110%
    horizontal_flip=True,   # randomly flip images left-right
    fill_mode='nearest'     # fill exposed pixels with the nearest edge value
)
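The layer-based alternative mentioned above can be sketched as follows. This is an illustrative configuration, not an exact equivalent of the generator settings: the factor values and the random input batch are assumptions made for the example.

```python
import numpy as np
import tensorflow as tf

# Assumed factors, chosen to roughly resemble the generator settings above
augment = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal"),
    tf.keras.layers.RandomRotation(20 / 360),  # factor is a fraction of a full turn
    tf.keras.layers.RandomTranslation(height_factor=0.1, width_factor=0.1),
    tf.keras.layers.RandomZoom(0.1),
])

# Hypothetical batch of 4 random RGB images, 64x64 pixels
images = np.random.randint(0, 256, size=(4, 64, 64, 3)).astype("float32")

# training=True is required here: at inference these layers pass inputs through unchanged
augmented = augment(images, training=True)
print(augmented.shape)
```

Because the layers are only active in training mode, the same stack can sit at the front of a model without affecting validation or prediction.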

💻

Example

This example shows how to augment images using TensorFlow's ImageDataGenerator. It loads a sample image, applies random transformations, and displays the augmented images.

python

import matplotlib.pyplot as plt
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator, img_to_array, load_img

# Load sample image
img = load_img('elephant.jpg')  # replace with your image path
x = img_to_array(img)           # convert to numpy array
x = x.reshape((1,) + x.shape)   # reshape to (1, height, width, channels)

# Create ImageDataGenerator with augmentations
datagen = ImageDataGenerator(
    rotation_range=40,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    fill_mode='nearest'
)

# Generate and plot 5 augmented images
plt.figure(figsize=(15, 3))  # wide figure to fit one row of 5 images
for i, batch in enumerate(datagen.flow(x, batch_size=1)):
    plt.subplot(1, 5, i + 1)
    plt.imshow(batch[0].astype('uint8'))
    plt.axis('off')
    if i == 4:
        break
plt.show()

Output

A window displaying 5 different augmented versions of the input image with random rotations, shifts, zooms, and flips.

⚠️

Common Pitfalls

Common mistakes when augmenting images include:

Applying augmentations that distort the image too much, making it unrealistic.
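Using augmentations that change the label meaning (e.g., rotating a 6 by 180 degrees turns it into a 9, and mirroring digits or letters produces invalid characters).
Applying augmentations to validation or test data, which should stay unmodified so metrics reflect real inputs.
Augmenting images after normalization: brightness or contrast changes tuned for 0-255 pixel values can behave unexpectedly on inputs already scaled to a different range.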

Always check that augmented images still represent the correct class and keep augmentations reasonable.

python

from tensorflow.keras.preprocessing.image import ImageDataGenerator

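# Wrong: mirrored digits are not valid digits, so the label no longer fits
wrong_datagen = ImageDataGenerator(horizontal_flip=True)

# Right: for digit recognition, skip flips and keep rotations small
# (10 degrees is an illustrative value)
right_datagen = ImageDataGenerator(rotation_range=10)

# Use augmentation only on training data; validation gets a plain generator
val_datagen = ImageDataGenerator()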
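Two of the pitfalls above, augmenting validation data and augmenting after normalization, come down to the order and scope of preprocessing. The sketch below shows one way to arrange it in plain NumPy. The helper names and the random data are hypothetical, and the left-right flip is only a stand-in for whatever label-preserving augmentation suits your dataset.

```python
import numpy as np

def random_flip(images, rng):
    """Mirror each image in an (N, H, W, C) batch left-right with probability 0.5."""
    out = images.copy()
    flip = rng.random(len(images)) < 0.5
    out[flip] = out[flip][:, :, ::-1, :]
    return out

def prepare(images, rng, training):
    """Augment raw 0-255 pixels first (training only), then normalize to [0, 1]."""
    if training:
        images = random_flip(images, rng)
    return images.astype(np.float32) / 255.0

rng = np.random.default_rng(0)
# Hypothetical data: 8 training and 4 validation RGB images of 32x32 pixels
x_train = rng.integers(0, 256, size=(8, 32, 32, 3), dtype=np.uint8)
x_val = rng.integers(0, 256, size=(4, 32, 32, 3), dtype=np.uint8)

train_batch = prepare(x_train, rng, training=True)   # augmented, then scaled
val_batch = prepare(x_val, rng, training=False)      # only scaled
```

Keeping the training flag in one function makes it harder to accidentally augment the validation split.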

📊

Quick Reference

Here is a quick cheat-sheet for common image augmentations:

Augmentation	Description	Use Case
Rotation	Rotate image by degrees	General variation
Horizontal Flip	Flip image left-right	Objects whose label survives mirroring
Vertical Flip	Flip image up-down	Rare, use carefully
Zoom	Scale image in/out	Simulate distance changes
Shift	Move image horizontally/vertically	Simulate position changes
Shear	Slant image	Approximate viewpoint changes
Color Jitter	Change brightness/contrast/saturation	Lighting variation

✅

Key Takeaways

Use image augmentation to create varied training data and improve model generalization.

Apply realistic transformations that keep the image label valid and meaningful.

Use tools like Keras's ImageDataGenerator or its preprocessing layers for easy augmentation pipelines.

Leave validation and test data unaugmented so performance metrics stay accurate.

Check augmented images visually to ensure quality and correctness.