How to Use Pretrained Object Detection Models in Computer Vision
To use a pretrained object detection model, load the model with pretrained weights, prepare your input image, and run the model to get detected objects with bounding boxes and labels. Libraries like PyTorch's torchvision provide easy access to pretrained models such as Faster R-CNN that you can use directly for inference.
Syntax
Using a pretrained object detection model typically involves these steps:
- Load the model: Import and load a pretrained model with weights.
- Prepare input: Transform your image into the format the model expects.
- Run inference: Pass the image through the model to get predictions.
- Process output: Extract bounding boxes, labels, and scores from the model output.
```python
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.transforms import functional as F
from PIL import Image

# Load pretrained Faster R-CNN model
model = fasterrcnn_resnet50_fpn(pretrained=True)
model.eval()  # Set to evaluation mode

# Load and prepare image
image = Image.open('path_to_image.jpg').convert('RGB')
image_tensor = F.to_tensor(image)

# Run inference
with torch.no_grad():
    predictions = model([image_tensor])

# Extract predictions
boxes = predictions[0]['boxes']
labels = predictions[0]['labels']
scores = predictions[0]['scores']
```
Example
This example shows how to load a pretrained Faster R-CNN model, run it on an image, and print detected objects with their confidence scores.
```python
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.transforms import functional as F
from PIL import Image

# Load pretrained model
model = fasterrcnn_resnet50_fpn(pretrained=True)
model.eval()

# Load image
image = Image.open('test_image.jpg').convert('RGB')
image_tensor = F.to_tensor(image)

# Run inference
with torch.no_grad():
    outputs = model([image_tensor])

# Print detected objects with scores above 0.8
labels_map = {1: 'person', 2: 'bicycle', 3: 'car', 4: 'motorcycle', 5: 'airplane',
              6: 'bus', 7: 'train', 8: 'truck', 9: 'boat', 10: 'traffic light'}
for box, label, score in zip(outputs[0]['boxes'], outputs[0]['labels'], outputs[0]['scores']):
    if score > 0.8:
        print(f"Detected {labels_map.get(label.item(), 'unknown')} "
              f"with confidence {score:.2f} at {box.tolist()}")
```
Output
Detected person with confidence 0.95 at [34.0, 50.0, 200.0, 400.0]
Detected car with confidence 0.87 at [220.0, 80.0, 400.0, 300.0]
Common Pitfalls
- Not setting the model to evaluation mode: Always call `model.eval()` before inference to disable training-time behaviors such as dropout and batch-norm updates.
- Incorrect image preprocessing: The model expects a list of float tensors with values in [0, 1] (for example from `to_tensor()`); torchvision's detection models handle resizing and normalization internally, but passing the wrong shape or dtype causes errors or poor results.
- Ignoring device placement: For faster inference, move both the model and the input tensors to the GPU if one is available using `.to('cuda')`.
- Misinterpreting output: The model returns a list of dictionaries of raw tensors; you must extract the `boxes`, `labels`, and `scores` entries yourself.
```python
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.transforms import functional as F
from PIL import Image

model = fasterrcnn_resnet50_fpn(pretrained=True)
image = Image.open('test_image.jpg').convert('RGB')
image_tensor = F.to_tensor(image)

# Wrong: not setting eval mode. In training mode, torchvision detection
# models expect ground-truth targets and will raise an error here.
with torch.no_grad():
    outputs = model([image_tensor])

# Right: set eval mode first
model.eval()
with torch.no_grad():
    outputs = model([image_tensor])  # Reliable inference
```
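The device-placement pitfall can be handled with a small pattern that works on both CPU-only and GPU machines. A sketch of the idea, using a stand-in tensor instead of reloading the full model:

```python
import torch

# Pick the fastest available device; this degrades gracefully to CPU
# when no GPU is present.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Move both the model and the input to the same device before inference:
#   model = fasterrcnn_resnet50_fpn(pretrained=True).to(device)
#   image_tensor = F.to_tensor(image).to(device)
x = torch.rand(3, 224, 224).to(device)  # stand-in for a real image tensor
print(device.type, x.device.type)
```

Mixing devices (model on GPU, tensor on CPU, or vice versa) raises a runtime error, so moving both together is the safe habit.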
Quick Reference
Tips for using pretrained object detection models:
- Always call `model.eval()` before inference.
- Convert images to tensors with `torchvision.transforms.functional.to_tensor()`.
- Wrap inference in `torch.no_grad()` to save memory by skipping gradient tracking.
- Filter predictions by confidence score to get reliable detections.
- Use a GPU if available for faster processing with `model.to('cuda')` and `image_tensor.to('cuda')`.
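Filtering by confidence, as the tips above recommend, is just thresholding three parallel sequences. A minimal torch-free sketch using a hypothetical `keep_confident` helper (the names and example values are illustrative, not part of any library):

```python
def keep_confident(boxes, labels, scores, threshold=0.8):
    """Keep only detections whose score exceeds the threshold.

    Works on any parallel sequences: plain lists here, or in practice
    the model's output tensors after calling .tolist() on them.
    """
    return [(b, l, s) for b, l, s in zip(boxes, labels, scores) if s > threshold]

detections = keep_confident(
    boxes=[[34, 50, 200, 400], [220, 80, 400, 300], [5, 5, 20, 20]],
    labels=['person', 'car', 'bird'],
    scores=[0.95, 0.87, 0.42],
)
print(detections)  # only the person (0.95) and car (0.87) survive
```

A threshold around 0.7-0.9 is a common starting point; the right value depends on how costly false positives are for your application.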
Key Takeaways
- Load pretrained models and set them to evaluation mode before inference.
- Prepare input images as tensors in the correct format the model expects.
- Use torch.no_grad() to run inference efficiently without tracking gradients.
- Extract bounding boxes, labels, and confidence scores from model outputs.
- Filter predictions by confidence to keep only reliable detections.