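Faster R-CNN is a two-stage object detector. A Region Proposal Network (RPN) proposes candidate object regions, and a second stage classifies each region and refines its bounding box. It is faster than its predecessors R-CNN and Fast R-CNN because its proposals come from a network that shares convolutional features with the detector, not from a separate algorithm such as selective search.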
Faster R-CNN usage in PyTorch
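import torchvision

# Faster R-CNN with a ResNet-50 FPN backbone and pretrained weights.
# On torchvision 0.13 and later, pretrained=True is deprecated in favor of weights="DEFAULT".
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
model.eval()

# To predict on an image:
# 1. Load and transform the image
# 2. Pass it to the model
# 3. Get predictions with boxes, labels, and scores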
Use pretrained=True (or weights="DEFAULT" on torchvision 0.13 and later) to load weights already trained on the COCO dataset of common objects.
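Call model.eval() to switch the model to inference mode. In training mode, torchvision's detection models expect ground-truth targets and return a dictionary of losses instead of predictions.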
import torchvision

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
model.eval()
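from PIL import Image
from torchvision import transforms

# convert('RGB') guarantees 3 channels even for grayscale or RGBA files
image = Image.open('image.jpg').convert('RGB')
transform = transforms.Compose([
    transforms.ToTensor()  # [H, W, C] uint8 in 0-255 -> [C, H, W] float in 0-1
])
image_tensor = transform(image)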
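import torch

# Disable gradient tracking during inference; the model takes a list of image tensors
with torch.no_grad():
    predictions = model([image_tensor])

# predictions is a list with one dictionary per input image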
This code loads a pretrained Faster R-CNN model, creates a blank white image, transforms it, and runs prediction. It then prints the keys of the prediction dictionary for that image, the shape of the boxes tensor, and the predicted labels and scores.
import torch
from PIL import Image
from torchvision import transforms
import torchvision

# Load pretrained Faster R-CNN model
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
model.eval()

# Load and transform image
image = Image.new('RGB', (300, 300), color='white')  # blank white image
transform = transforms.Compose([
    transforms.ToTensor()
])
image_tensor = transform(image)

# Predict
with torch.no_grad():
    predictions = model([image_tensor])

# Print prediction keys and example outputs
print('Prediction keys:', predictions[0].keys())
print('Boxes shape:', predictions[0]['boxes'].shape)
print('Labels:', predictions[0]['labels'])
print('Scores:', predictions[0]['scores'])
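Faster R-CNN expects its input as a list of float tensors, each with shape [3, height, width] and values between 0 and 1. The images in the list may have different sizes; the model resizes and normalizes them internally.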
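Each prediction dictionary contains boxes (a float tensor of shape [N, 4] in (x1, y1, x2, y2) pixel coordinates of the original image), labels (class IDs, shape [N]), and scores (confidence values, shape [N]).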
On images without detectable objects, such as the blank image above, the outputs can be empty: boxes has shape [0, 4], and labels and scores have length 0.
Faster R-CNN finds objects in images and gives their locations and labels.
Start from a pretrained model, and switch it to evaluation mode with model.eval() before predicting.
Convert images to float tensors with values in 0-1 and pass them to the model as a list.