How to Preprocess Image for Deep Learning in Computer Vision
To preprocess images for deep learning in computer vision, first resize images to a fixed size, then normalize pixel values typically to the range
0-1 or -1 to 1. Optionally, apply data augmentation like flipping or rotation to improve model robustness.
Syntax
Image preprocessing usually involves these steps:
- Resize: Change image size to a fixed shape for the model.
- Normalize: Scale pixel values to a standard range like 0 to 1.
- Augment (optional): Apply random changes like flips or rotations to increase data variety.
```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.utils import img_to_array, load_img

# Load image from file
image = load_img('path_to_image.jpg')

# Resize to 224x224 pixels
image = image.resize((224, 224))

# Convert image to a numpy array
image_array = img_to_array(image)

# Scale pixels to the 0-1 range
image_array = image_array / 255.0

# Data augmentation example
datagen = ImageDataGenerator(horizontal_flip=True, rotation_range=20)
augmented_images = datagen.flow(image_array.reshape((1, 224, 224, 3)))
```
Example
This example shows how to load an image, resize it, normalize pixel values, and apply simple data augmentation using TensorFlow Keras utilities.
```python
import numpy as np
from tensorflow.keras.preprocessing.image import load_img, img_to_array, ImageDataGenerator

# Load and resize the image in one step
image = load_img('sample.jpg', target_size=(128, 128))

# Convert to a numpy array
image_array = img_to_array(image)

# Normalize pixels to 0-1
image_array = image_array / 255.0
print(f'Image shape after resize: {image_array.shape}')
print(f'Min pixel value: {np.min(image_array):.3f}, Max pixel value: {np.max(image_array):.3f}')

# Set up augmentation
datagen = ImageDataGenerator(horizontal_flip=True, rotation_range=30)

# Add a batch dimension before feeding the generator
image_batch = image_array.reshape((1, 128, 128, 3))

# Generate one augmented image
aug_iter = datagen.flow(image_batch, batch_size=1)
augmented_image = next(aug_iter)[0]
print(f'Augmented image shape: {augmented_image.shape}')
print(f'Augmented image pixel range: {augmented_image.min():.3f} to {augmented_image.max():.3f}')
```
Output
Image shape after resize: (128, 128, 3)
Min pixel value: 0.000, Max pixel value: 1.000
Augmented image shape: (128, 128, 3)
Augmented image pixel range: 0.000 to 1.000
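The example above scales pixels to 0-1, but some pretrained models expect the -1 to 1 range mentioned earlier. A minimal numpy-only sketch of both normalizations (the sample values are illustrative stand-ins for a loaded photo):

```python
import numpy as np

# Dummy 1x3 "image" with pixel values spanning 0-255
image_array = np.array([[[0, 128, 255]]], dtype=np.float32)

# Scale to [0, 1]
scaled_01 = image_array / 255.0

# Scale to [-1, 1] by dividing by half the range and shifting
scaled_11 = image_array / 127.5 - 1.0

print(scaled_01.min(), scaled_01.max())  # 0.0 1.0
print(scaled_11.min(), scaled_11.max())  # -1.0 1.0
```

Pick whichever range the target model was trained with; mixing the two silently degrades accuracy.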
Common Pitfalls
Common mistakes when preprocessing images include:
- Not resizing images to a consistent size, causing errors in model input.
- Forgetting to normalize pixel values, which slows down training or causes poor results.
- Applying augmentation incorrectly, such as augmenting test data or using unrealistic transformations.
- Mixing up color channel order (RGB vs BGR), which can confuse pretrained models.
```python
from tensorflow.keras.preprocessing.image import load_img, img_to_array

# Wrong: no resize, so the array shape depends on the original file
image = load_img('sample.jpg')
image_array = img_to_array(image) / 255.0
print(f'Image shape without resize: {image_array.shape}')

# Right: resize while loading, before converting to an array
image_resized = load_img('sample.jpg', target_size=(128, 128))
image_array_resized = img_to_array(image_resized) / 255.0
print(f'Image shape after resize: {image_array_resized.shape}')
```
Output
Image shape without resize: (500, 400, 3)
Image shape after resize: (128, 128, 3)
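The channel-order pitfall is easy to fix once spotted: reversing the last axis converts between BGR and RGB. A small numpy sketch (the pixel values are made up for illustration; OpenCV's `cv2.imread`, for example, returns BGR arrays):

```python
import numpy as np

# Hypothetical 1x2 BGR image: a blue pixel and a red pixel
bgr = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)

# Reverse the channel axis to get RGB for models pretrained on RGB inputs
rgb = bgr[..., ::-1]

print(rgb[0, 0].tolist())  # [0, 0, 255] -> the blue pixel in RGB order
```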
Quick Reference
Summary tips for image preprocessing in deep learning:
- Always resize images to the model's expected input size.
- Normalize pixel values to 0-1 or -1 to 1 for faster training.
- Use data augmentation only on training data to improve generalization.
- Check color channel order matches model requirements (usually RGB).
- Convert images to arrays before feeding into models.
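The train-only augmentation tip can be sketched without any framework. This numpy example (the function name and flag are illustrative, not a library API) normalizes every image but flips only when a training flag is set:

```python
import numpy as np

rng = np.random.default_rng(0)

def preprocess(image, training=False):
    """Normalize to 0-1; apply a random horizontal flip only during training."""
    image = image.astype(np.float32) / 255.0
    if training and rng.random() < 0.5:  # never triggers at evaluation time
        image = np.fliplr(image)         # flip along the width axis
    return image

image = np.arange(12, dtype=np.uint8).reshape(2, 2, 3)

train_out = preprocess(image, training=True)   # may be flipped
test_out = preprocess(image, training=False)   # always deterministic
print(test_out.shape)  # (2, 2, 3)
```

Keeping evaluation deterministic this way ensures validation and test metrics measure the model, not the augmentation.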
Key Takeaways
Resize all images to a fixed size before feeding into the model.
Normalize pixel values to a standard range like 0-1 for better training.
Apply data augmentation only on training data to improve model robustness.
Ensure color channels match model expectations (usually RGB).
Convert images to arrays after resizing and before normalization.