How to Preprocess Image for Deep Learning in Computer Vision
To preprocess images for deep learning in computer vision, first resize images to a fixed size, then normalize pixel values typically to the range
0-1 or -1 to 1. Optionally, apply data augmentation like flipping or rotation to improve model robustness.
Syntax
Image preprocessing usually involves these steps:
- Resize: Change image size to a fixed shape for the model.
- Normalize: Scale pixel values to a standard range like 0 to 1.
- Augment (optional): Apply random changes like flips or rotations to increase data variety.
```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.utils import img_to_array, load_img

# Load image from file
image = load_img('path_to_image.jpg')

# Resize to 224x224 pixels
image = image.resize((224, 224))

# Convert image to a numpy array
image_array = img_to_array(image)

# Scale pixels to the 0-1 range
image_array = image_array / 255.0

# Data augmentation example
datagen = ImageDataGenerator(horizontal_flip=True, rotation_range=20)
augmented_images = datagen.flow(image_array.reshape((1, 224, 224, 3)))
```
Example
This example shows how to load an image, resize it, normalize pixel values, and apply simple data augmentation using TensorFlow Keras utilities.
```python
import numpy as np
from tensorflow.keras.preprocessing.image import load_img, img_to_array, ImageDataGenerator

# Load and resize the image in one step
image = load_img('sample.jpg', target_size=(128, 128))

# Convert to a numpy array
image_array = img_to_array(image)

# Normalize pixels to 0-1
image_array = image_array / 255.0
print(f'Image shape after resize: {image_array.shape}')
print(f'Min pixel value: {np.min(image_array):.3f}, Max pixel value: {np.max(image_array):.3f}')

# Set up augmentation
datagen = ImageDataGenerator(horizontal_flip=True, rotation_range=30)

# Add a batch dimension before feeding the generator
image_batch = image_array.reshape((1, 128, 128, 3))

# Generate one augmented image
aug_iter = datagen.flow(image_batch, batch_size=1)
augmented_image = next(aug_iter)[0]
print(f'Augmented image shape: {augmented_image.shape}')
print(f'Augmented image pixel range: {augmented_image.min():.3f} to {augmented_image.max():.3f}')
```
Output
Image shape after resize: (128, 128, 3)
Min pixel value: 0.000, Max pixel value: 1.000
Augmented image shape: (128, 128, 3)
Augmented image pixel range: 0.000 to 1.000
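The example above scales pixels to 0-1, but some pretrained models expect the -1 to 1 range mentioned earlier. A minimal numpy-only sketch of both normalizations (the sample values are illustrative stand-ins for a loaded photo):

```python
import numpy as np

# Dummy 1x3 "image" with pixel values spanning 0-255
image_array = np.array([[[0, 128, 255]]], dtype=np.float32)

# Scale to [0, 1]
scaled_01 = image_array / 255.0

# Scale to [-1, 1] by dividing by half the range and shifting
scaled_11 = image_array / 127.5 - 1.0

print(scaled_01.min(), scaled_01.max())  # 0.0 1.0
print(scaled_11.min(), scaled_11.max())  # -1.0 1.0
```

Pick whichever range the target model was trained with; mixing the two silently degrades accuracy.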
Common Pitfalls
Common mistakes when preprocessing images include:
- Not resizing images to a consistent size, causing errors in model input.
- Forgetting to normalize pixel values, which slows down training or causes poor results.
- Applying augmentation incorrectly, such as augmenting test data or using unrealistic transformations.
- Mixing up color channel order (RGB vs BGR), which can confuse pretrained models.
```python
from tensorflow.keras.preprocessing.image import load_img, img_to_array

# Wrong: no resize, so the array shape depends on the original file
image = load_img('sample.jpg')
image_array = img_to_array(image) / 255.0
print(f'Image shape without resize: {image_array.shape}')

# Right: resize while loading, before converting to an array
image_resized = load_img('sample.jpg', target_size=(128, 128))
image_array_resized = img_to_array(image_resized) / 255.0
print(f'Image shape after resize: {image_array_resized.shape}')
```
Output
Image shape without resize: (500, 400, 3)
Image shape after resize: (128, 128, 3)
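The channel-order pitfall is easy to fix once spotted: reversing the last axis converts between BGR and RGB. A small numpy sketch (the pixel values are made up for illustration; OpenCV's `cv2.imread`, for example, returns BGR arrays):

```python
import numpy as np

# Hypothetical 1x2 BGR image: a blue pixel and a red pixel
bgr = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)

# Reverse the channel axis to get RGB for models pretrained on RGB inputs
rgb = bgr[..., ::-1]

print(rgb[0, 0].tolist())  # [0, 0, 255] -> the blue pixel in RGB order
```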
Quick Reference
Summary tips for image preprocessing in deep learning:
- Always resize images to the model's expected input size.
- Normalize pixel values to 0-1 or -1 to 1 for faster training.
- Use data augmentation only on training data to improve generalization.
- Check color channel order matches model requirements (usually RGB).
- Convert images to arrays before feeding into models.
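The train-only augmentation tip can be sketched without any framework. This numpy example (the function name and flag are illustrative, not a library API) normalizes every image but flips only when a training flag is set:

```python
import numpy as np

rng = np.random.default_rng(0)

def preprocess(image, training=False):
    """Normalize to 0-1; apply a random horizontal flip only during training."""
    image = image.astype(np.float32) / 255.0
    if training and rng.random() < 0.5:  # never triggers at evaluation time
        image = np.fliplr(image)         # flip along the width axis
    return image

image = np.arange(12, dtype=np.uint8).reshape(2, 2, 3)

train_out = preprocess(image, training=True)   # may be flipped
test_out = preprocess(image, training=False)   # always deterministic
print(test_out.shape)  # (2, 2, 3)
```

Keeping evaluation deterministic this way ensures validation and test metrics measure the model, not the augmentation.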
Key Takeaways
Resize all images to a fixed size before feeding into the model.
Normalize pixel values to a standard range like 0-1 for better training.
Apply data augmentation only on training data to improve model robustness.
Ensure color channels match model expectations (usually RGB).
Convert images to arrays after resizing and before normalization.