How to Use Pretrained Object Detection Models in Computer Vision

To use a pretrained object detection model, load the model with pretrained weights, prepare your input image, and run the model to get detected objects with bounding boxes, labels, and confidence scores. PyTorch's torchvision library provides pretrained detection models such as Faster R-CNN that you can use directly for inference.

📐

Syntax

Using a pretrained object detection model typically involves these steps:

Load the model: Import and load a pretrained model with weights.
Prepare input: Transform your image into the format the model expects.
Run inference: Pass the image through the model to get predictions.
Process output: Extract bounding boxes, labels, and scores from the model output.

python

import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.transforms import functional as F
from PIL import Image

# Load pretrained Faster R-CNN model
# (torchvision >= 0.13; older versions use the deprecated pretrained=True)
model = fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()  # Set to evaluation mode

# Load and prepare image
image = Image.open('path_to_image.jpg').convert('RGB')
image_tensor = F.to_tensor(image)

# Run inference
with torch.no_grad():
    predictions = model([image_tensor])

# Extract predictions
boxes = predictions[0]['boxes']
labels = predictions[0]['labels']
scores = predictions[0]['scores']

💻

Example

This example shows how to load a pretrained Faster R-CNN model, run it on an image, and print detected objects with their confidence scores.

python

import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.transforms import functional as F
from PIL import Image

# Load pretrained model
# (torchvision >= 0.13; older versions use the deprecated pretrained=True)
model = fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

# Load image
image = Image.open('test_image.jpg').convert('RGB')
image_tensor = F.to_tensor(image)

# Run inference
with torch.no_grad():
    outputs = model([image_tensor])

# Print detected objects with scores above 0.8
# labels_map lists only the first ten COCO categories; other ids print as 'unknown'
labels_map = {1: 'person', 2: 'bicycle', 3: 'car', 4: 'motorcycle', 5: 'airplane', 6: 'bus', 7: 'train', 8: 'truck', 9: 'boat', 10: 'traffic light'}

for box, label, score in zip(outputs[0]['boxes'], outputs[0]['labels'], outputs[0]['scores']):
    if score > 0.8:
        print(f"Detected {labels_map.get(label.item(), 'unknown')} with confidence {score:.2f} at {box.tolist()}")

Output

Detected person with confidence 0.95 at [34.0, 50.0, 200.0, 400.0]
Detected car with confidence 0.87 at [220.0, 80.0, 400.0, 300.0]
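
The `labels_map` in the example covers only the first ten COCO categories, so any other class prints as 'unknown'. A minimal sketch of building the map from a full category list follows. The helper name `build_label_map` is an assumption for illustration, and the demo list is shortened to the first entries. With torchvision 0.13 or later, the full list should be available as `FasterRCNN_ResNet50_FPN_Weights.DEFAULT.meta["categories"]`, where the list index is the label id and unused ids are marked `"N/A"`.

```python
def build_label_map(categories):
    """Map label ids to names, where the list index is the label id."""
    placeholders = {"__background__", "N/A"}  # entries that are not real classes
    return {
        idx: name
        for idx, name in enumerate(categories)
        if name not in placeholders
    }


# Shortened category list for this demo only; pass the full list in practice.
demo_categories = [
    "__background__", "person", "bicycle", "car", "motorcycle", "airplane",
    "bus", "train", "truck", "boat", "traffic light", "fire hydrant",
    "N/A", "stop sign",
]
label_map = build_label_map(demo_categories)
print(label_map[1], label_map[13])
```

The resulting dict can replace the hand-written `labels_map`, so `labels_map.get(label.item(), 'unknown')` keeps working unchanged.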

⚠️

Common Pitfalls

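Not setting the model to evaluation mode: Always call model.eval() before inference. In training mode, torchvision detection models expect targets and return losses instead of predictions, and layers such as dropout and batch normalization behave differently.
Incorrect image preprocessing: The model expects a list of float tensors of shape [C, H, W] with values in [0, 1], which is what to_tensor() produces. torchvision detection models resize and normalize images internally, so do not apply mean/std normalization yourself.
Ignoring device placement: For faster inference, move the model and the input tensors to the GPU, if one is available, using to('cuda'). Both must be on the same device, or the forward pass raises an error.
Misinterpreting output: The model returns a list with one dict per input image. Each dict holds 'boxes' (in [x1, y1, x2, y2] pixel coordinates), 'labels' (integer class ids), and 'scores' (confidence values).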

python

import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.transforms import functional as F
from PIL import Image

# Wrong: model is still in training mode
model = fasterrcnn_resnet50_fpn(weights="DEFAULT")

image = Image.open('test_image.jpg').convert('RGB')
image_tensor = F.to_tensor(image)

try:
    with torch.no_grad():
        outputs = model([image_tensor])  # Training mode expects targets as well
except (AssertionError, ValueError) as err:
    print(f"Inference failed in training mode: {err}")

# Right: Set eval mode
model.eval()
with torch.no_grad():
    outputs = model([image_tensor])  # Returns a list of prediction dicts
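
For the "misinterpreting output" pitfall, it helps to remember that torchvision's detection models return each box as `[x1, y1, x2, y2]`: the top-left and bottom-right corners in pixel coordinates of the input image. The sketch below converts such a box to width, height, and area. `box_size` is a hypothetical helper name, and the sample box is the first one from the example output above.

```python
def box_size(box):
    """Return (width, height, area) for a box given as [x1, y1, x2, y2]."""
    x1, y1, x2, y2 = box
    width = x2 - x1
    height = y2 - y1
    return width, height, width * height


# Sample box; with real output, pass box.tolist() for one row of outputs[0]['boxes'].
print(box_size([34.0, 50.0, 200.0, 400.0]))  # (166.0, 350.0, 58100.0)
```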

📊

Quick Reference

Tips for using pretrained object detection models:

Always use model.eval() before inference.
Convert images to tensors with torchvision.transforms.functional.to_tensor().
Use torch.no_grad() to save memory during inference.
Filter predictions by confidence score to get reliable detections.
Use a GPU if one is available: call model.to('cuda') and image_tensor = image_tensor.to('cuda'). Tensor.to() returns a new tensor, so keep the result.
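
The confidence filter from the tips above can be sketched as a small helper. It takes one prediction dict (the 'boxes', 'labels', and 'scores' keys come from the model output shown earlier) and keeps detections whose score exceeds a threshold. The names `filter_detections` and `_to_list` and the default of 0.8 are assumptions for illustration, and the demo uses made-up plain lists instead of tensors so that it runs without a model.

```python
def _to_list(values):
    """Convert a tensor-like object to a plain list; copy lists as they are."""
    return values.tolist() if hasattr(values, "tolist") else list(values)


def filter_detections(prediction, threshold=0.8):
    """Return (box, label, score) tuples whose score exceeds the threshold."""
    boxes = _to_list(prediction["boxes"])
    labels = _to_list(prediction["labels"])
    scores = _to_list(prediction["scores"])
    return [
        (box, label, score)
        for box, label, score in zip(boxes, labels, scores)
        if score > threshold
    ]


# Made-up prediction for demonstration only.
demo = {
    "boxes": [
        [34.0, 50.0, 200.0, 400.0],
        [220.0, 80.0, 400.0, 300.0],
        [5.0, 5.0, 20.0, 20.0],
    ],
    "labels": [1, 3, 10],
    "scores": [0.95, 0.87, 0.30],
}
for box, label, score in filter_detections(demo):
    print(f"label {label}: {score:.2f} at {box}")
```

With real model output, `filter_detections(outputs[0])` should work the same way, since tensors are converted with `.tolist()` first.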

✅

Key Takeaways

Load pretrained models and set them to evaluation mode before inference.

Prepare input images as tensors in the correct format the model expects.

Use torch.no_grad() to run inference efficiently without tracking gradients.

Extract bounding boxes, labels, and confidence scores from model outputs.

Filter predictions by confidence to keep only reliable detections.