How to Use VGG for Image Classification in Computer Vision
To use VGG for classification in computer vision, load a pre-trained VGG model (such as VGG16 or VGG19) from a deep learning library like tf.keras.applications. Then preprocess your images to match VGG's expected input format, run predictions, and interpret the output class probabilities.
Syntax
The typical steps to use VGG for classification are:
- Import the VGG model (e.g., VGG16) from a library.
- Load the model with pre-trained weights (usually trained on ImageNet).
- Preprocess input images to the required size and format.
- Use the model to predict class probabilities.
- Decode predictions to human-readable labels.
python
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input, decode_predictions
from tensorflow.keras.preprocessing import image
import numpy as np

# Load VGG16 model with pretrained ImageNet weights
model = VGG16(weights='imagenet')

# Load and preprocess image
img = image.load_img('path_to_image.jpg', target_size=(224, 224))
img_array = image.img_to_array(img)
img_array = np.expand_dims(img_array, axis=0)
img_array = preprocess_input(img_array)

# Predict
preds = model.predict(img_array)

# Decode predictions
print(decode_predictions(preds, top=3)[0])
Example
This example shows how to classify an image using VGG16 pretrained on ImageNet. It loads an image, preprocesses it, runs the model, and prints the top 3 predicted classes with probabilities.
python
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input, decode_predictions
from tensorflow.keras.preprocessing import image
import numpy as np
import urllib.request

# Load VGG16 model
model = VGG16(weights='imagenet')

# Download an example image
img_url = 'https://upload.wikimedia.org/wikipedia/commons/9/9a/Pug_600.jpg'
img_path = 'pug.jpg'
urllib.request.urlretrieve(img_url, img_path)

# Load and preprocess image
img = image.load_img(img_path, target_size=(224, 224))
img_array = image.img_to_array(img)
img_array = np.expand_dims(img_array, axis=0)
img_array = preprocess_input(img_array)

# Predict
preds = model.predict(img_array)

# Decode and print top 3 predictions
results = decode_predictions(preds, top=3)[0]
for i, (imagenet_id, label, prob) in enumerate(results):
    print(f"{i+1}. {label}: {prob:.4f}")
Output
1. pug: 0.9275
2. bull_mastiff: 0.0342
3. boxer: 0.0123
Common Pitfalls
Common mistakes when using VGG for classification include:
- Not resizing images to 224x224 pixels, which VGG expects.
- Failing to preprocess images with preprocess_input, causing wrong input scaling.
- Using the wrong model weights or architecture variant.
- Not expanding image dimensions to include batch size (should be 4D tensor).
- Ignoring the need to decode predictions to get readable labels.
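To see why skipping preprocess_input matters, here is a minimal NumPy-only sketch of what VGG16's preprocess_input does in its default 'caffe' mode: it flips channels from RGB to BGR and subtracts the ImageNet per-channel means, with no 0-1 scaling. The function name vgg_preprocess and the dummy mid-gray batch are illustrative, not part of the Keras API.

```python
import numpy as np

# ImageNet per-channel means in BGR order, as used by VGG's 'caffe' preprocessing
IMAGENET_BGR_MEANS = np.array([103.939, 116.779, 123.68])

def vgg_preprocess(img_array):
    """Sketch of tf.keras.applications.vgg16.preprocess_input on a float array."""
    bgr = img_array[..., ::-1]          # flip RGB -> BGR
    return bgr - IMAGENET_BGR_MEANS     # subtract per-channel means (no 0-1 scaling)

# Dummy mid-gray image batch in place of a real photo
batch = np.full((1, 224, 224, 3), 128.0)
out = vgg_preprocess(batch)
print(out.shape)     # shape is unchanged: (1, 224, 224, 3)
print(out[0, 0, 0])  # values are mean-centered, not scaled to [0, 1]
```

Feeding raw 0-255 (or 0-1) pixels instead of mean-centered BGR values is why predictions come out wrong even when the shape is correct.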
python
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
from tensorflow.keras.preprocessing import image
import numpy as np

model = VGG16(weights='imagenet')

# Wrong: image is not resized to 224x224
img = image.load_img('pug.jpg')  # no target_size
img_array = image.img_to_array(img)
img_array = np.expand_dims(img_array, axis=0)

# This raises an error because the input shape does not match the model
try:
    preds = model.predict(img_array)
except Exception as e:
    print(f"Error: {e}")

# Right: resize and preprocess
img = image.load_img('pug.jpg', target_size=(224, 224))
img_array = image.img_to_array(img)
img_array = np.expand_dims(img_array, axis=0)
img_array = preprocess_input(img_array)
preds = model.predict(img_array)
print("Prediction successful after correct preprocessing.")
Output
Error: Input 0 of layer "vgg16" is incompatible with the layer: expected shape=(None, 224, 224, 3), found shape=(1, ..., ..., 3)
Prediction successful after correct preprocessing.
Quick Reference
Key points to remember when using VGG for classification:
- Input images must be 224x224 pixels.
- Use preprocess_input to scale pixel values correctly.
- Load pretrained weights with weights='imagenet' for transfer learning.
- Expand image dimensions to include batch size before prediction.
- Use decode_predictions to convert model output to labels.
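The batch-dimension point above can be sketched without TensorFlow at all: Keras models predict on batches, so a single image array of shape (224, 224, 3) must gain a leading batch axis before model.predict. The zero-filled array below stands in for a real image loaded with image.img_to_array.

```python
import numpy as np

# A single RGB image as image.img_to_array would return it: (height, width, channels)
img_array = np.zeros((224, 224, 3), dtype=np.float32)

# Add a leading batch axis so the model sees a batch of one image
batch = np.expand_dims(img_array, axis=0)
print(img_array.shape)  # (224, 224, 3)
print(batch.shape)      # (1, 224, 224, 3)

# The same layout scales to larger batches, e.g. stacking 8 images:
batch8 = np.stack([img_array] * 8, axis=0)
print(batch8.shape)     # (8, 224, 224, 3)
```

Forgetting this step is the usual cause of "expected 4 dimensions, got 3"-style shape errors at prediction time.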
Key Takeaways
- Always resize input images to 224x224 pixels before feeding them to VGG.
- Use the preprocess_input function to prepare images correctly for VGG.
- Load VGG with pretrained ImageNet weights for effective classification.
- Expand image dimensions to include a batch axis (shape: (1, 224, 224, 3)).
- Decode model predictions with decode_predictions to get human-readable class labels.