What are torchvision detection models in PyTorch?

Introduction

Object detection models find and label objects in images. For each object they find, they predict a bounding box, a category (such as car or person) and a confidence score. Typical uses:

You want to find all the faces in a photo.

You need to detect cars and pedestrians in street images for a self-driving car.

You want to count how many animals are in a wildlife photo.

You want to highlight objects in a video for security cameras.

Syntax

PyTorch

import torchvision.models.detection as detection
model = detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")  # COCO-pretrained weights
model.eval()  # switch to inference mode

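Pass weights="DEFAULT" to load the best available weights, pretrained on the COCO dataset of common objects. Older code uses pretrained=True, which torchvision 0.13 deprecated in favor of the weights parameter.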

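Call model.eval() to switch the model to evaluation (inference) mode. In this mode it takes only a list of images and returns a list of predictions. In training mode it also requires target annotations and returns losses instead.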

Examples

Loads a Faster R-CNN model with a ResNet-50 backbone and a Feature Pyramid Network (FPN), ready to detect objects.

PyTorch

import torchvision.models.detection as detection
model = detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

Loads a Mask R-CNN model, which detects objects like Faster R-CNN and also predicts a segmentation mask for each one, outlining the object's exact shape.

PyTorch

import torchvision.models.detection as detection
model = detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

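Loads a RetinaNet model, a one-stage detector: it predicts boxes and classes directly from the feature maps instead of first proposing candidate regions as Faster R-CNN does. This design is simpler and can be faster.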

PyTorch

import torchvision.models.detection as detection
model = detection.retinanet_resnet50_fpn(weights="DEFAULT")
model.eval()

Sample Model

This program loads an image, applies a pretrained Faster R-CNN model to detect objects, and prints the labels and confidence scores of detections scoring above 0.5.

PyTorch

import torch
from PIL import Image
import torchvision.transforms as T
import torchvision.models.detection as detection

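# Load a sample image and make sure it has 3 RGB channels
img = Image.open("sample.jpg").convert("RGB")

# Convert the image to a float tensor of shape [3, H, W] with values in [0, 1]
transform = T.Compose([T.ToTensor()])
img_tensor = transform(img)

# Load Faster R-CNN with COCO-pretrained weights (downloaded on first use)
model = detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()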

# Run detection
with torch.no_grad():
    predictions = model([img_tensor])

# Print detected labels and scores
labels = predictions[0]['labels']
scores = predictions[0]['scores']
print("Detected objects and scores:")
for label, score in zip(labels, scores):
    if score > 0.5:
        print(f"Label: {label.item()}, Score: {score.item():.2f}")
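
The loop above checks scores one at a time. Because every value in a prediction dict is a tensor, the same 0.5 filter can be applied to boxes, labels and scores at once with a boolean mask. This sketch uses a hand-made prediction dict (the numbers are invented, not real model output) so it runs without a model; the function name filter_by_score is hypothetical.

```python
import torch

def filter_by_score(pred, threshold=0.5):
    """Keep only the detections whose score exceeds `threshold`."""
    keep = pred["scores"] > threshold  # boolean mask, one entry per detection
    return {key: value[keep] for key, value in pred.items()}

# Hypothetical prediction in the format torchvision detection models return.
pred = {
    "boxes": torch.tensor([[10.0, 20.0, 50.0, 80.0],
                           [30.0, 30.0, 60.0, 90.0],
                           [5.0, 5.0, 15.0, 15.0]]),
    "labels": torch.tensor([1, 3, 1]),
    "scores": torch.tensor([0.92, 0.40, 0.75]),
}

kept = filter_by_score(pred)
print(kept["labels"].tolist(), kept["scores"].tolist())
```

Indexing every tensor with the same mask keeps boxes, labels and scores aligned, which the per-item loop does only implicitly.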

Important Notes

Labels are numbers that correspond to object categories (like 1 = person, 3 = car).

To turn label numbers into names, look them up in the COCO dataset's category list. torchvision also stores these names in the weights metadata, weights.meta["categories"].

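Input images must be float tensors of shape [C, H, W] with values in [0, 1], which is exactly what T.ToTensor() produces. Do not normalize them yourself: the model resizes and normalizes images internally.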

Summary

torchvision provides ready-to-use detection models like Faster R-CNN, Mask R-CNN, and RetinaNet.

For each image they return bounding boxes, category labels and confidence scores; Mask R-CNN also returns segmentation masks.

Use pretrained models for quick results without training from scratch.