How to do instance segmentation in Python for computer vision

Instance Segmentation in Python for Computer Vision

To do instance segmentation in Python, use a pre-trained model such as Mask R-CNN from the torchvision library. Load the model, prepare your image, and run the model to get a mask, bounding box, class label, and confidence score for each detected object instance.

Syntax

Here is the basic syntax to perform instance segmentation using a pre-trained Mask R-CNN model from torchvision:

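model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT"): Load the pre-trained Mask R-CNN model (torchvision 0.13+; older versions use pretrained=True, which is deprecated from 0.13 on).
model.eval(): Set the model to evaluation mode.
image_tensor = transform(image): Convert the input image to a float tensor of shape [C, H, W] with values scaled to [0, 1].
outputs = model([image_tensor]): Run the model on a list of image tensors to get predictions.
outputs is a list with one dictionary per input image; each dictionary holds the boxes, labels, scores, and masks of the detected objects.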

python

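import torch
import torchvision
from torchvision import transforms
from PIL import Image

# Load pre-trained Mask R-CNN (torchvision 0.13+; use pretrained=True on older versions)
model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

# Prepare image transform: PIL image -> float tensor [C, H, W] with values in [0, 1]
transform = transforms.Compose([
    transforms.ToTensor(),
])

# Load image and force three channels (grayscale or RGBA files would otherwise fail)
image = Image.open('your_image.jpg').convert('RGB')
image_tensor = transform(image)

# Run model without tracking gradients
with torch.no_grad():
    outputs = model([image_tensor])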

Example

This example loads an image, runs instance segmentation with Mask R-CNN, and prints the number of confident detections and their class label IDs.

python

import torchvision
from torchvision import transforms
from PIL import Image
import torch

# Load pre-trained Mask R-CNN (torchvision 0.13+; older versions use pretrained=True)
model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

# Prepare image transform
transform = transforms.Compose([
    transforms.ToTensor(),
])

# Load image and make sure it has three channels
image = Image.open('your_image.jpg').convert('RGB')
image_tensor = transform(image)

# Run model
with torch.no_grad():
    outputs = model([image_tensor])

# Extract results
masks = outputs[0]['masks']  # Soft masks [N, 1, H, W], per-pixel probabilities in [0, 1]
labels = outputs[0]['labels']  # Class labels
scores = outputs[0]['scores']  # Confidence scores

# Filter out low confidence detections
threshold = 0.5
keep = scores > threshold
num_objects = keep.sum().item()
filtered_labels = labels[keep]

print(f"Detected {num_objects} objects with confidence > {threshold}.")
print(f"Labels: {filtered_labels.tolist()}")

Output

Detected 3 objects with confidence > 0.5.
Labels: [1, 18, 3]

The exact count and labels depend on the input image.

Common Pitfalls

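Not setting the model to evaluation mode: a freshly built torchvision detection model is in training mode, where calling it without targets raises an error. Call model.eval() before inference.
Incorrect image preprocessing: The model expects 3-channel float tensors with values in [0, 1]. It applies mean/std normalization and resizing internally, so don't add transforms.Normalize yourself. Convert grayscale or RGBA images with .convert('RGB') first.
Ignoring confidence scores: The model keeps every detection above a low internal score threshold (0.05 by default), so always filter predictions by confidence to avoid false positives.
Running inference without disabling gradients: Wrap inference in torch.no_grad() to speed it up and reduce memory use, on CPU or GPU.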

python

import torchvision
from torchvision import transforms
from PIL import Image
import torch

# Wrong: missing model.eval() and torch.no_grad()
model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)

transform = transforms.Compose([
    transforms.ToTensor(),
])
image = Image.open('your_image.jpg')
image_tensor = transform(image)
outputs = model([image_tensor])  # This runs with gradients and training mode

# Right way:
model.eval()
with torch.no_grad():
    outputs = model([image_tensor])

Quick Reference

Key steps for instance segmentation in Python:

Load a pre-trained Mask R-CNN model from torchvision.
Set the model to evaluation mode with model.eval().
Convert input images to 3-channel float tensors with values in [0, 1].
Run the model inside torch.no_grad() for inference.
Filter predictions by confidence scores to keep reliable detections.

Key Takeaways

Use a pre-trained Mask R-CNN model from torchvision for easy instance segmentation.

Always set the model to evaluation mode with model.eval() before inference.

Convert images to 3-channel float tensors with values in [0, 1] before passing them to the model; it normalizes them internally.

Run inference inside torch.no_grad() to save memory and speed up prediction.

Filter predictions by confidence scores to avoid false positives.