
How to Use Mask R-CNN in Computer Vision for Object Detection and Segmentation

To use Mask R-CNN in computer vision, you load a pre-trained model or train your own on labeled images, then run it on input images to get object bounding boxes, class labels, and pixel-level masks. The model's mask branch segments each detected object, extending standard object detection with instance-level segmentation.

Syntax

The typical usage of Mask R-CNN involves loading a pre-trained model, preparing input images, and running the model to get predictions: bounding boxes, class IDs, confidence scores, and masks.

  • Load model: Load a Mask R-CNN model pre-trained on a dataset such as COCO.
  • Prepare input: Convert the image to a tensor (typically uint8) and add a batch dimension.
  • Run inference: Pass images to the model to get detection results.
  • Output: Extract bounding boxes, class labels, confidence scores, and masks for each detected object.
```python
import tensorflow as tf
import tensorflow_hub as hub

# Load pre-trained Mask R-CNN model from TensorFlow Hub
# (tf.saved_model.load only accepts local paths; hub.load handles TF Hub URLs)
model = hub.load('https://tfhub.dev/tensorflow/mask_rcnn/resnet50_v1_fpn_1x/1')

# Prepare input image as a uint8 tensor
image = tf.io.read_file('input.jpg')
image = tf.image.decode_jpeg(image, channels=3)
input_tensor = tf.expand_dims(image, 0)  # Add batch dimension

# Run inference
outputs = model(input_tensor)

# Extract outputs (index 0 selects the single image in the batch)
boxes = outputs['detection_boxes'][0].numpy()       # Bounding boxes [y1, x1, y2, x2], normalized
classes = outputs['detection_classes'][0].numpy()   # Class IDs
scores = outputs['detection_scores'][0].numpy()     # Confidence scores
masks = outputs['detection_masks'][0].numpy()       # Masks for each detected object
```

Example

This example loads a pre-trained Mask R-CNN model, runs it on an input image, and prints detected objects with their bounding boxes and mask shapes.

```python
import tensorflow as tf
import tensorflow_hub as hub
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as patches

# Load pre-trained Mask R-CNN model (hub.load handles TF Hub URLs)
model = hub.load('https://tfhub.dev/tensorflow/mask_rcnn/resnet50_v1_fpn_1x/1')

# Load and prepare image
image_path = tf.keras.utils.get_file('image.jpg', 'https://upload.wikimedia.org/wikipedia/commons/6/60/Toco_Toucan_RWD2.jpg')
image = tf.io.read_file(image_path)
image = tf.image.decode_jpeg(image, channels=3)
input_tensor = tf.expand_dims(image, 0)  # Add batch dimension

# Run inference
outputs = model(input_tensor)

# Extract detection results for the single image in the batch
boxes = outputs['detection_boxes'][0].numpy()
classes = outputs['detection_classes'][0].numpy().astype(np.int32)
scores = outputs['detection_scores'][0].numpy()
masks = outputs['detection_masks'][0].numpy()

# Keep detections with confidence > 0.5
threshold = 0.5
filtered_indices = np.where(scores > threshold)[0]

print(f"Detected {len(filtered_indices)} objects with confidence > {threshold}:")
for i in filtered_indices:
    print(f"Object {i+1}: Class ID={classes[i]}, Score={scores[i]:.2f}, Box={boxes[i]}, Mask shape={masks[i].shape}")

# Optional: visualize the highest-scoring detection
fig, ax = plt.subplots(1)
ax.imshow(image.numpy())

if len(filtered_indices) > 0:
    box = boxes[filtered_indices[0]]
    height, width, _ = image.shape
    y1, x1, y2, x2 = box  # Normalized coordinates
    rect = patches.Rectangle((x1 * width, y1 * height), (x2 - x1) * width, (y2 - y1) * height,
                             linewidth=2, edgecolor='r', facecolor='none')
    ax.add_patch(rect)
    mask = masks[filtered_indices[0]]
    # Resize the low-resolution mask to the full image size before overlaying
    mask_resized = tf.image.resize(mask[..., np.newaxis], (height, width))
    mask_resized = tf.squeeze(mask_resized).numpy()
    ax.imshow(mask_resized, alpha=0.5, cmap='jet')

plt.show()
```
Output

Detected 1 objects with confidence > 0.5:
Object 1: Class ID=16, Score=0.98, Box=[0.234 0.123 0.789 0.654], Mask shape=(28, 28)

Common Pitfalls

  • Incorrect input shape: Mask R-CNN expects a batch dimension; forgetting to add it raises a shape error.
  • Image preprocessing: Skipping the resizing or normalization a given model expects can degrade accuracy.
  • Thresholding: Too low a confidence threshold yields many false positives; too high a threshold misses objects.
  • Mask resizing: Output masks are lower resolution than the input image (often 28×28) and must be resized to overlay correctly.
  • Class IDs: COCO class IDs start at 1; off-by-one indexing produces wrong labels.
```python
import tensorflow as tf

# Wrong: missing batch dimension
# outputs = model(image)  # Raises an error: the model expects shape [1, H, W, 3]

# Right: add a batch dimension
input_tensor = tf.expand_dims(image, 0)
outputs = model(input_tensor)
```
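The class-ID pitfall deserves its own sketch: COCO IDs start at 1 (and some IDs are unused), so looking up labels in a zero-indexed list silently shifts every name. A safer pattern is a dictionary keyed by the raw ID. The label subset below is a hypothetical excerpt for illustration; real code would load the full COCO label map:

```python
# Hypothetical subset of the COCO label map; note IDs start at 1, not 0
COCO_LABELS = {1: 'person', 16: 'bird', 17: 'cat', 18: 'dog'}

def class_name(class_id):
    # Look up by the raw ID -- do NOT subtract 1 and index a zero-based list
    return COCO_LABELS.get(int(class_id), 'unknown')

print(class_name(16))  # bird
print(class_name(99))  # unknown
```

Using a dict with `.get()` also handles the gaps in the COCO numbering (e.g. there is no ID 12) without raising an IndexError.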

Quick Reference

Key points to remember when using Mask R-CNN:

  • Load pre-trained model from TensorFlow Hub or train your own.
  • Input images must have batch dimension and proper preprocessing.
  • Outputs include bounding boxes, class IDs, confidence scores, and masks.
  • Apply confidence threshold to filter detections.
  • Resize masks to match original image size for visualization.
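The filtering and mask-resizing steps above can be rehearsed without downloading a model, using synthetic arrays shaped like the detector's outputs (the scores, mask values, and image size below are made up for illustration):

```python
import numpy as np

# Synthetic outputs shaped like Mask R-CNN detections: 3 candidates, 28x28 masks
scores = np.array([0.98, 0.60, 0.10])
masks = np.random.rand(3, 28, 28)

# 1. Filter detections by confidence threshold
keep = np.where(scores > 0.5)[0]

# 2. Upsample a 28x28 mask to the image size with nearest-neighbor indexing
def resize_mask(mask, height, width):
    rows = np.arange(height) * mask.shape[0] // height
    cols = np.arange(width) * mask.shape[1] // width
    return mask[np.ix_(rows, cols)]

# 3. Binarize at 0.5 to get a hard segmentation mask
full_mask = resize_mask(masks[keep[0]], 480, 640) > 0.5

print(keep)             # indices of confident detections
print(full_mask.shape)  # matches the target image size
```

In practice `tf.image.resize` (as in the example above) gives smoother bilinear upsampling; the nearest-neighbor version here just makes the index arithmetic explicit.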

Key Takeaways

  • Load a pre-trained Mask R-CNN model and prepare input images with a batch dimension.
  • Run inference to get bounding boxes, class labels, confidence scores, and masks for detected objects.
  • Filter detections by confidence score to remove weak predictions.
  • Resize masks to the original image size before overlaying for accurate segmentation visualization.
  • Avoid common mistakes like a missing batch dimension or incorrect image preprocessing.