
torchvision detection models in PyTorch

Introduction
Detection models find objects in an image and label each one with a bounding box, a class, and a confidence score, for example spotting cars or people. Typical use cases:
You want to find all the faces in a photo.
You need to detect cars and pedestrians in street images for a self-driving car.
You want to count how many animals are in a wildlife photo.
You want to highlight objects in security camera footage.
Syntax
PyTorch
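import torchvision.models.detection as detection
model = detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()
Use weights="DEFAULT" to load a model pretrained on the COCO dataset. In torchvision 0.13 and later this replaces pretrained=True, which is deprecated.
Call model.eval() to put the model in inference mode. In training mode these models expect target annotations and return losses instead of predictions.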
Examples
Loads a Faster R-CNN model with a ResNet-50 backbone, ready to detect objects.
PyTorch
import torchvision.models.detection as detection
model = detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")  # torchvision >= 0.13; older releases use pretrained=True
model.eval()
Loads a Mask R-CNN model, which detects objects and also predicts a segmentation mask outlining the shape of each one.
PyTorch
import torchvision.models.detection as detection
model = detection.maskrcnn_resnet50_fpn(weights="DEFAULT")  # torchvision >= 0.13; older releases use pretrained=True
model.eval()
Loads a RetinaNet model, a one-stage detector that predicts boxes in a single pass and uses focal loss to cope with the many background regions in an image.
PyTorch
import torchvision.models.detection as detection
model = detection.retinanet_resnet50_fpn(weights="DEFAULT")  # torchvision >= 0.13; older releases use pretrained=True
model.eval()
Sample Model
This program loads an image, runs a pretrained Faster R-CNN model on it, and prints the label and confidence score of every detection whose score is above 0.5.
PyTorch
import torch
from PIL import Image
import torchvision.transforms as T
import torchvision.models.detection as detection

# Load a sample image and force 3 channels (handles grayscale and RGBA files)
img = Image.open("sample.jpg").convert("RGB")

# Convert the image to a float tensor of shape [C, H, W] with values in [0, 1]
transform = T.Compose([T.ToTensor()])
img_tensor = transform(img)

# Load pretrained Faster R-CNN model (torchvision >= 0.13; older releases use pretrained=True)
model = detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

# Run detection
with torch.no_grad():
    predictions = model([img_tensor])

# Print detected labels and scores
labels = predictions[0]['labels']
scores = predictions[0]['scores']
print("Detected objects and scores:")
for label, score in zip(labels, scores):
    if score > 0.5:
        print(f"Label: {label.item()}, Score: {score.item():.2f}")
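Each prediction dictionary also holds a boxes entry with one [x1, y1, x2, y2] box per detection. As a minimal sketch, the hypothetical helper below (filter_detections is not part of torchvision) pairs each box with its label and score and drops low-confidence detections. It takes plain Python lists, which you can get from the tensors with .tolist(); the sample values are made up.

```python
def filter_detections(boxes, labels, scores, threshold=0.5):
    """Return one dict per detection whose score exceeds the threshold."""
    kept = []
    for box, label, score in zip(boxes, labels, scores):
        if score > threshold:
            kept.append({"box": box, "label": label, "score": score})
    return kept

# Made-up values standing in for predictions[0] converted with .tolist().
boxes = [[12.0, 30.5, 200.0, 310.0], [50.0, 60.0, 90.0, 120.0]]
labels = [1, 3]
scores = [0.92, 0.31]

detections = filter_detections(boxes, labels, scores)
for det in detections:
    print(f"Label: {det['label']}, Score: {det['score']:.2f}, Box: {det['box']}")
```

With the sample program above, the inputs would be predictions[0]['boxes'].tolist(), predictions[0]['labels'].tolist(), and predictions[0]['scores'].tolist().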
Important Notes
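Labels are integers that index COCO object categories (for example, 1 = person, 3 = car); 0 is reserved for the background.
You can look up label names in the COCO dataset documentation. In torchvision 0.13 and later they are also available from the weights metadata, for example detection.FasterRCNN_ResNet50_FPN_Weights.DEFAULT.meta["categories"].
Pass images as float tensors of shape [C, H, W] with values between 0 and 1. The models shown here resize and normalize their inputs internally, so no extra normalization is needed.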
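Turning label IDs into names is a dictionary lookup. The sketch below uses a hand-copied subset of the COCO categories and a hypothetical helper, label_name; both are for illustration only, and a real program would load the full category list.

```python
# Hand-copied subset of the COCO category IDs (illustration only).
COCO_SUBSET = {
    1: "person",
    2: "bicycle",
    3: "car",
    4: "motorcycle",
    6: "bus",
    8: "truck",
}

def label_name(label_id):
    """Return the category name, or a placeholder for IDs missing from the subset."""
    return COCO_SUBSET.get(label_id, f"id {label_id} (not in subset)")

for label_id in [1, 3, 17]:
    print(label_id, label_name(label_id))
```

The placeholder only means the ID is missing from this small table, not that the model produced an invalid label.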
Summary
torchvision provides ready-to-use detection models like Faster R-CNN, Mask R-CNN, and RetinaNet.
These models find objects in images and return a bounding box, a class label, and a confidence score for each one.
Use pretrained models for quick results without training from scratch.