Instance Segmentation in Python for Computer Vision
To perform instance segmentation in Python, use a pre-trained model such as Mask R-CNN from the torchvision library. Load the model, prepare your image, and run the model to get masks, bounding boxes, labels, and confidence scores for each object instance.
Syntax
Here is the basic syntax to perform instance segmentation using a pre-trained Mask R-CNN model from torchvision:
- model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True): Load the pre-trained Mask R-CNN model. In torchvision 0.13 and later, pretrained=True is deprecated in favor of weights="DEFAULT".
- model.eval(): Set the model to evaluation mode.
- image_tensor = transform(image): Convert the input image to a float tensor with values scaled to the range 0 to 1.
- outputs = model([image_tensor]): Run the model on a list of image tensors to get predictions.
- outputs is a list with one dictionary per input image, containing masks, boxes, labels, and scores for each detected object.
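```python
import torchvision
from torchvision import transforms
from PIL import Image

# Load pre-trained Mask R-CNN
# (torchvision >= 0.13 deprecates pretrained=True in favor of weights="DEFAULT")
model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)
model.eval()

# Prepare image transform: PIL image -> float tensor [C, H, W] in [0, 1]
transform = transforms.Compose([
    transforms.ToTensor(),
])

# Load image; convert to RGB so grayscale or RGBA files still yield 3 channels
image = Image.open('your_image.jpg').convert('RGB')
image_tensor = transform(image)

# Run model (expects a list of image tensors)
outputs = model([image_tensor])
```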
Example
This example loads an image, runs instance segmentation with Mask R-CNN, and prints the number of detected objects and their labels.
```python
import torchvision
from torchvision import transforms
from PIL import Image
import torch

# Load pre-trained Mask R-CNN
model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)
model.eval()

# Prepare image transform: PIL image -> float tensor [C, H, W] in [0, 1]
transform = transforms.Compose([
    transforms.ToTensor(),
])

# Load image; convert to RGB so grayscale or RGBA files still yield 3 channels
image = Image.open('your_image.jpg').convert('RGB')
image_tensor = transform(image)

# Run model without tracking gradients
with torch.no_grad():
    outputs = model([image_tensor])

# Extract results for the first (and only) image
masks = outputs[0]['masks']    # Soft masks for each detected object
labels = outputs[0]['labels']  # Class labels
scores = outputs[0]['scores']  # Confidence scores

# Filter out low confidence detections
threshold = 0.5
keep = scores > threshold
num_objects = keep.sum().item()
filtered_labels = labels[keep]

print(f"Detected {num_objects} objects with confidence > {threshold}.")
print(f"Labels: {filtered_labels.tolist()}")
```
Output
Detected 3 objects with confidence > 0.5.
Labels: [1, 18, 3]
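The integer labels are category IDs from the COCO dataset that the pre-trained model was trained on. A minimal sketch for turning them into readable names follows. The COCO_SUBSET dictionary and the label_names helper are hypothetical names introduced here, and the dictionary covers only a handful of COCO categories for brevity.

```python
# Hypothetical partial lookup table: a few COCO category IDs and their names.
COCO_SUBSET = {1: "person", 2: "bicycle", 3: "car", 17: "cat", 18: "dog"}


def label_names(label_ids, names=COCO_SUBSET):
    """Map integer label IDs to names, falling back to 'id <n>' for unknown IDs."""
    return [names.get(i, f"id {i}") for i in label_ids]


print(label_names([1, 18, 3]))  # ['person', 'dog', 'car']
```

With torchvision 0.13 or later, the full category list should also be available from the weights metadata (MaskRCNN_ResNet50_FPN_Weights.DEFAULT.meta["categories"]), which avoids maintaining a table by hand.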
Common Pitfalls
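- Not setting the model to evaluation mode: In training mode the model expects targets and returns losses, so calling it with images only raises an error instead of returning predictions. Call model.eval() first.
- Incorrect image preprocessing: The model expects each image as a float tensor of shape [C, H, W] with values between 0 and 1. It applies its own mean/std normalization internally, so do not normalize a second time.
- Ignoring confidence scores: Always filter predictions by confidence to avoid false positives.
- Running inference without disabling gradients: Wrap inference in torch.no_grad() to speed it up and reduce memory use, on CPU or GPU.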
```python
import torchvision
from torchvision import transforms
from PIL import Image
import torch

model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)
transform = transforms.Compose([
    transforms.ToTensor(),
])
image = Image.open('your_image.jpg').convert('RGB')
image_tensor = transform(image)

# Wrong: missing model.eval() and torch.no_grad().
# The model is still in training mode, so this call raises an error
# because no targets were passed. It is commented out to keep the script runnable.
# outputs = model([image_tensor])

# Right way:
model.eval()
with torch.no_grad():
    outputs = model([image_tensor])
```
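To make the confidence-filtering advice concrete, here is a small sketch of a helper that keeps only detections above a score threshold and turns the soft masks into binary ones. The helper name filter_prediction, both threshold values, and the synthetic prediction dictionary are assumptions for illustration. The dictionary only mimics the keys and shapes that Mask R-CNN returns (masks of shape [N, 1, H, W] with values between 0 and 1), so the sketch runs without downloading a model.

```python
import torch


def filter_prediction(pred, score_thresh=0.5, mask_thresh=0.5):
    """Keep detections scoring above score_thresh and binarize their masks."""
    keep = pred["scores"] > score_thresh
    return {
        "boxes": pred["boxes"][keep],
        "labels": pred["labels"][keep],
        "scores": pred["scores"][keep],
        # Drop the channel dimension: [N, 1, H, W] -> [N, H, W] boolean masks
        "masks": pred["masks"][keep].squeeze(1) > mask_thresh,
    }


# Synthetic stand-in for outputs[0]: three detections on a 4x4 image
fake_pred = {
    "boxes": torch.tensor([[0.0, 0.0, 2.0, 2.0],
                           [1.0, 1.0, 3.0, 3.0],
                           [0.0, 0.0, 4.0, 4.0]]),
    "labels": torch.tensor([1, 18, 3]),
    "scores": torch.tensor([0.9, 0.3, 0.7]),
    "masks": torch.rand(3, 1, 4, 4),
}

result = filter_prediction(fake_pred)
print(result["labels"].tolist())  # [1, 3]
print(result["masks"].shape)      # torch.Size([2, 4, 4])
print(result["masks"].dtype)      # torch.bool
```

In a real script, outputs[0] from the model would take the place of fake_pred. A threshold of 0.5 is a common starting point, and the best value depends on how costly false positives are for the application.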
Quick Reference
Key steps for instance segmentation in Python:
- Load a pre-trained Mask R-CNN model from torchvision.
- Set the model to evaluation mode with model.eval().
- Transform input images to tensors normalized between 0 and 1.
- Run the model inside torch.no_grad() for inference.
- Filter predictions by confidence scores to keep reliable detections.
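The preprocessing step in the list above can be sketched without torchvision to show what the model actually receives. This is a rough illustration of the scaling that transforms.ToTensor() performs on 8-bit RGB input, not its actual implementation. The helper name to_float_chw and the random stand-in image are made up for the example.

```python
import torch


def to_float_chw(image_hwc_uint8):
    """Convert a uint8 image tensor [H, W, C] in 0..255 to float [C, H, W] in 0..1."""
    return image_hwc_uint8.permute(2, 0, 1).float() / 255.0


# Random stand-in for a decoded 8-bit RGB image of height 4 and width 6
fake_image = torch.randint(0, 256, (4, 6, 3), dtype=torch.uint8)
image_tensor = to_float_chw(fake_image)

print(image_tensor.shape)  # torch.Size([3, 4, 6])
print(image_tensor.dtype)  # torch.float32
print(0.0 <= image_tensor.min().item() <= image_tensor.max().item() <= 1.0)  # True
```

A quick check of shape, dtype, and value range like this can catch the most common preprocessing mistake: passing raw 0 to 255 pixel values to the model.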
Key Takeaways
Use a pre-trained Mask R-CNN model from torchvision for easy instance segmentation.
Always set the model to evaluation mode with model.eval() before inference.
Transform images to tensors normalized between 0 and 1 before passing to the model.
Run inference inside torch.no_grad() to save memory and speed up prediction.
Filter predictions by confidence scores to avoid false positives.