How to Use Pretrained Models for Image Classification in Computer Vision

To use a pretrained model for image classification, load the model with its pretrained weights, preprocess each input image to match what the model was trained on, and then run the model to get predictions. Libraries such as PyTorch (through torchvision) and TensorFlow provide ready-made loaders for ImageNet-trained models like ResNet and MobileNet, so a working classifier takes only a few lines of code.

📐

Syntax

Using a pretrained model typically involves these steps:

Load the model: Import the model architecture with pretrained weights.
Preprocess input: Resize, normalize, and format images as the model expects.
Predict: Pass the processed image to the model to get class probabilities or labels.

python

from torchvision import models, transforms
from PIL import Image
import torch

# Load pretrained model
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.eval()  # Set to evaluation mode

# Define preprocessing
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])
])

# Load image (convert to RGB so grayscale or RGBA files still have 3 channels)
img = Image.open("path_to_image.jpg").convert("RGB")
input_tensor = preprocess(img)
input_batch = input_tensor.unsqueeze(0)  # Create batch dimension

# Predict
with torch.no_grad():
    output = model(input_batch)

# Output is raw scores for each class

💻

Example

This example shows how to classify an image using PyTorch's pretrained ResNet18 model. It loads an image, preprocesses it, runs the model, and prints the top predicted class.

python

from torchvision import models, transforms
from PIL import Image
import torch
import urllib.request

# Download labels for ImageNet classes
url = "https://raw.githubusercontent.com/pytorch/hub/master/imagenet_classes.txt"
labels_path = "imagenet_classes.txt"
urllib.request.urlretrieve(url, labels_path)

with open(labels_path) as f:
    labels = [line.strip() for line in f.readlines()]

# Load pretrained model
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.eval()

# Preprocessing pipeline
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])
])

# Load an example image
img_url = "https://upload.wikimedia.org/wikipedia/commons/2/26/YellowLabradorLooking_new.jpg"
img_path = "dog.jpg"
urllib.request.urlretrieve(img_url, img_path)
img = Image.open(img_path).convert("RGB")  # Ensure 3 channels for Normalize

# Prepare input
input_tensor = preprocess(img)
input_batch = input_tensor.unsqueeze(0)

# Run prediction
with torch.no_grad():
    output = model(input_batch)

# Get probabilities
probabilities = torch.nn.functional.softmax(output[0], dim=0)

# Get top 1 prediction
top_prob, top_catid = torch.topk(probabilities, 1)

# .item() converts the one-element tensors to plain Python numbers
print(f"Predicted class: {labels[top_catid.item()]} with probability {top_prob.item():.4f}")

Output (the exact probability can vary with the torchvision version and weights)

Predicted class: Labrador retriever with probability 0.8423
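
A single top prediction hides how confident the model is about the alternatives. The sketch below extends the same softmax and topk calls to return several candidates. The helper name `top_k_predictions` and the toy labels and scores are hypothetical, chosen so the snippet runs without downloading a model.

```python
import torch

def top_k_predictions(logits, labels, k=5):
    """Return the k most likely (label, probability) pairs for one image.

    logits: 1-D tensor of raw class scores (for example output[0] from the model).
    labels: list of class names in the same order as the scores.
    """
    probabilities = torch.softmax(logits, dim=0)
    top_probs, top_ids = torch.topk(probabilities, k)
    return [(labels[i], p) for i, p in zip(top_ids.tolist(), top_probs.tolist())]

# Toy stand-ins for the model output and the ImageNet label list (hypothetical values)
toy_labels = ["cat", "dog", "bird", "fish"]
toy_logits = torch.tensor([1.0, 3.0, 0.5, 2.0])

for label, prob in top_k_predictions(toy_logits, toy_labels, k=3):
    print(f"{label}: {prob:.4f}")
```

With the example above, the equivalent call would be `top_k_predictions(output[0], labels, k=5)`.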

⚠️

Common Pitfalls

Not preprocessing input correctly: Models expect images resized, cropped, and normalized in a specific way.
Forgetting to set the model to evaluation mode: Call model.eval() so that layers such as batch normalization and dropout switch to their inference behavior.
Passing wrong input shape: Models expect a batch dimension, so add one if needed.
Ignoring device placement: For GPU use, move model and data to the same device.

python

from torchvision import models
import torch

# Wrong: Not setting eval mode
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
# model.eval()  # Without this, batch norm uses per-batch statistics and predictions become unreliable

# Wrong: Passing input without batch dimension
input_tensor = torch.randn(3, 224, 224)  # Shape [C, H, W], missing batch
# output = model(input_tensor)  # Raises an error: the model expects [N, C, H, W]

# Right way:
model.eval()
input_batch = input_tensor.unsqueeze(0)  # Add batch dimension -> [1, 3, 224, 224]
with torch.no_grad():  # No gradients needed for inference
    output = model(input_batch)

📊

Quick Reference

Tips for using pretrained models:

Always preprocess images as the model expects (resize, crop, normalize).
Set the model to evaluation mode with model.eval().
Add a batch dimension to inputs before prediction.
Use softmax on outputs to get probabilities.
Use official pretrained weights from trusted libraries.
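
To see why the evaluation-mode tip matters, the sketch below compares training and evaluation mode on a small stand-in network. The network is hypothetical and is not ResNet; it only needs to contain batch normalization and dropout layers, which are the layers that behave differently between the two modes.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical stand-in network; only the BatchNorm and Dropout layers matter here
net = nn.Sequential(
    nn.Linear(8, 16),
    nn.BatchNorm1d(16),
    nn.ReLU(),
    nn.Dropout(p=0.5),
    nn.Linear(16, 3),
)

x = torch.randn(4, 8)  # a batch of 4 random inputs

# Training mode: dropout samples a new random mask on every call
net.train()
with torch.no_grad():
    train_a = net(x)
    train_b = net(x)
train_mode_repeatable = torch.allclose(train_a, train_b)

# Evaluation mode: dropout is disabled and batch norm uses its stored running statistics
net.eval()
with torch.no_grad():
    eval_a = net(x)
    eval_b = net(x)
    eval_single = net(x[:1])  # first sample on its own
eval_mode_repeatable = torch.allclose(eval_a, eval_b)
single_matches_batch = torch.allclose(eval_single, eval_a[:1], atol=1e-5)

print("train mode repeatable:", train_mode_repeatable)
print("eval mode repeatable:", eval_mode_repeatable)
print("eval result independent of batch:", single_matches_batch)
```

In evaluation mode the prediction for an image no longer depends on random masks or on which other images share its batch, which is what you want at inference time.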

✅

Key Takeaways

Load models with pretrained weights from libraries like PyTorch or TensorFlow.

Preprocess input images correctly to match model requirements before prediction.

Always set the model to evaluation mode using model.eval() before inference.

Add a batch dimension to input tensors to avoid shape errors.

Use softmax on model outputs to interpret predictions as probabilities.