
Instance Segmentation in Python for Computer Vision

To do instance segmentation in Python, use a pre-trained model such as Mask R-CNN from the torchvision library. Load the model, convert your image to a tensor, and run the model to get a mask, bounding box, label, and confidence score for each object instance.

Syntax

Here is the basic syntax to perform instance segmentation using a pre-trained Mask R-CNN model from torchvision:

  • model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT"): Load Mask R-CNN with pre-trained COCO weights (torchvision 0.13 and later; older releases use pretrained=True instead).
  • model.eval(): Set the model to evaluation mode.
  • image_tensor = transform(image): Convert the input image to a float tensor of shape [C, H, W] with values scaled to [0, 1].
  • outputs = model([image_tensor]): Run the model on a list of image tensors to get predictions.
  • outputs is a list with one dictionary per image, holding the masks, boxes, labels, and scores of each detected object.
python
import torch
import torchvision
from torchvision import transforms
from PIL import Image

# Load pre-trained Mask R-CNN (on torchvision < 0.13, use pretrained=True)
model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

# Prepare image transform: PIL image -> float tensor in [0, 1]
transform = transforms.Compose([
    transforms.ToTensor(),
])

# Load image and force 3 channels (handles grayscale and RGBA files)
image = Image.open('your_image.jpg').convert('RGB')
image_tensor = transform(image)

# Run model without tracking gradients
with torch.no_grad():
    outputs = model([image_tensor])

Example

This example loads an image, runs instance segmentation with Mask R-CNN, and prints the number of detected objects and their labels.

python
import torchvision
from torchvision import transforms
from PIL import Image
import torch

# Load pre-trained Mask R-CNN
model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")  # on torchvision < 0.13, use pretrained=True
model.eval()

# Prepare image transform
transform = transforms.Compose([
    transforms.ToTensor(),
])

# Load image
image = Image.open('your_image.jpg').convert('RGB')  # Force 3 channels (handles grayscale and RGBA files)
image_tensor = transform(image)

# Run model
with torch.no_grad():
    outputs = model([image_tensor])

# Extract results
masks = outputs[0]['masks']  # Masks for each detected object
labels = outputs[0]['labels']  # Class labels
scores = outputs[0]['scores']  # Confidence scores

# Filter out low confidence detections
threshold = 0.5
keep = scores > threshold
num_objects = keep.sum().item()
filtered_labels = labels[keep]

print(f"Detected {num_objects} objects with confidence > {threshold}.")
print(f"Labels: {filtered_labels.tolist()}")
Output
Detected 3 objects with confidence > 0.5.
Labels: [1, 18, 3]
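The labels are integer class IDs in the COCO category indexing that the pre-trained model was trained on. A minimal sketch of turning them into names follows; COCO_NAMES is a deliberately partial, hypothetical lookup table written for this illustration, and label_names is a helper name invented here, not part of torchvision.

```python
# Partial lookup for illustration only; IDs follow the COCO indexing
# used by torchvision's pre-trained detection models.
COCO_NAMES = {1: "person", 2: "bicycle", 3: "car", 17: "cat", 18: "dog"}


def label_names(label_ids, names=COCO_NAMES):
    """Map integer class IDs to readable names, marking unknown IDs."""
    return [names.get(i, f"unknown({i})") for i in label_ids]


names = label_names([1, 18, 3])
print(names)  # ['person', 'dog', 'car']
```

In torchvision 0.13 and later, the full category list should be available as MaskRCNN_ResNet50_FPN_Weights.DEFAULT.meta["categories"], which avoids maintaining a table by hand.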

Common Pitfalls

  • Not setting the model to evaluation mode: In training mode, torchvision detection models expect ground-truth targets and return losses, so calling the model with images alone raises an error instead of returning predictions.
  • Incorrect image preprocessing: The model expects a list of float tensors of shape [C, H, W] with values in [0, 1], which is what ToTensor() produces. Mean/std normalization is applied inside the model, so do not apply it yourself.
  • Ignoring confidence scores: The model also returns low-confidence detections, so filter predictions by score to reduce false positives.
  • Not disabling gradients: Wrap inference in torch.no_grad(), on CPU or GPU, to speed it up and reduce memory use.
python
import torchvision
from torchvision import transforms
from PIL import Image
import torch

# Wrong: missing model.eval() and torch.no_grad()
model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")

transform = transforms.Compose([
    transforms.ToTensor(),
])
image = Image.open('your_image.jpg').convert('RGB')
image_tensor = transform(image)
# outputs = model([image_tensor])  # Fails: a freshly loaded model is in training mode and expects targets

# Right way:
model.eval()
with torch.no_grad():
    outputs = model([image_tensor])
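The preprocessing pitfall can be caught early with a small check before calling the model. The function below is a hypothetical helper written for this article (check_image_tensor is not a torchvision function), and the rules it enforces are the assumptions stated above: three channels, a floating-point dtype, and values in [0, 1].

```python
import torch


def check_image_tensor(image_tensor):
    """Raise ValueError if the tensor does not look like a valid model input."""
    if image_tensor.dim() != 3 or image_tensor.shape[0] != 3:
        raise ValueError(f"expected shape [3, H, W], got {tuple(image_tensor.shape)}")
    if not image_tensor.is_floating_point():
        raise ValueError(f"expected a float tensor, got {image_tensor.dtype}")
    lowest, highest = image_tensor.min().item(), image_tensor.max().item()
    if lowest < 0.0 or highest > 1.0:
        raise ValueError(f"expected values in [0, 1], got [{lowest}, {highest}]")
    return image_tensor


# A random float image passes the check
good = check_image_tensor(torch.rand(3, 32, 48))

# A raw uint8 image (values 0..255) is rejected
try:
    check_image_tensor(torch.randint(0, 256, (3, 32, 48), dtype=torch.uint8))
    rejected_uint8 = False
except ValueError:
    rejected_uint8 = True
```

The check is intentionally strict; it flags inputs such as uint8 arrays or already-normalized tensors that would otherwise run but give poor detections.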

Quick Reference

Key steps for instance segmentation in Python:

  • Load a pre-trained Mask R-CNN model from torchvision.
  • Set the model to evaluation mode with model.eval().
  • Transform input images to float tensors with values scaled to [0, 1].
  • Run the model inside torch.no_grad() for inference.
  • Filter predictions by confidence scores to keep reliable detections.
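The filtering step can be applied to every field of a prediction at once, and the soft masks can be turned into binary ones in the same pass. The sketch below assumes the output layout described earlier (boxes, labels, scores, and masks of shape [N, 1, H, W] holding probabilities); filter_detections and both thresholds are illustrative choices, and fake_output is synthetic data, not a real model result.

```python
import torch


def filter_detections(output, score_threshold=0.5, mask_threshold=0.5):
    """Keep detections above score_threshold and binarize their masks."""
    keep = output["scores"] > score_threshold
    return {
        "boxes": output["boxes"][keep],
        "labels": output["labels"][keep],
        "scores": output["scores"][keep],
        # [N, 1, H, W] probabilities -> [N, H, W] boolean masks
        "masks": output["masks"][keep].squeeze(1) > mask_threshold,
    }


# Synthetic stand-in for outputs[0]
fake_output = {
    "boxes": torch.tensor([[0.0, 0.0, 2.0, 2.0],
                           [1.0, 1.0, 3.0, 3.0],
                           [0.0, 0.0, 4.0, 4.0]]),
    "labels": torch.tensor([1, 18, 3]),
    "scores": torch.tensor([0.9, 0.3, 0.7]),
    "masks": torch.rand(3, 1, 4, 4),
}

result = filter_detections(fake_output)
print(result["labels"].tolist())     # [1, 3]
print(tuple(result["masks"].shape))  # (2, 4, 4)
```

A mask threshold of 0.5 is a common starting point; raising it gives tighter masks, lowering it gives more generous ones.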

Key Takeaways

  • Use a pre-trained Mask R-CNN model from torchvision for easy instance segmentation.
  • Always set the model to evaluation mode with model.eval() before inference.
  • Convert images to float tensors with values in [0, 1] before passing them to the model.
  • Run inference inside torch.no_grad() to save memory and speed up prediction.
  • Filter predictions by confidence score to reduce false positives.