How to Use Mask RCNN in Computer Vision for Object Detection and Segmentation
To use Mask RCNN in computer vision, you load a pre-trained model or train your own on labeled images, then run it on input images to get object bounding boxes, class labels, and pixel-level masks. The model outputs a mask that segments each detected object, enabling precise object detection and instance segmentation in images.
Syntax
The typical usage of Mask RCNN involves loading a pre-trained model, preparing input images, and running the model to get predictions including bounding boxes, class IDs, scores, and masks.
- Load model: Load a Mask RCNN model pre-trained on a dataset like COCO.
- Prepare input: Resize and normalize images as required.
- Run inference: Pass images to the model to get detection results.
- Output: Extract bounding boxes, class labels, confidence scores, and masks for each detected object.
```python
import tensorflow as tf
import tensorflow_hub as hub  # tf.saved_model.load expects a local path; hub.load accepts TF Hub URLs
import numpy as np

# Load pre-trained Mask RCNN model from TensorFlow Hub
model = hub.load('https://tfhub.dev/tensorflow/mask_rcnn/resnet50_v1_fpn_1x/1')

# Prepare input image as a uint8 tensor
image = tf.io.read_file('input.jpg')
image = tf.image.decode_jpeg(image, channels=3)
input_tensor = tf.expand_dims(image, 0)  # Add batch dimension

# Run inference
outputs = model(input_tensor)

# Extract outputs (index 0 selects the single image in the batch)
boxes = outputs['detection_boxes'][0].numpy()      # Bounding boxes, normalized [y1, x1, y2, x2]
classes = outputs['detection_classes'][0].numpy()  # Class IDs
scores = outputs['detection_scores'][0].numpy()    # Confidence scores
masks = outputs['detection_masks'][0].numpy()      # Masks for each detected object
```
Example
This example loads a pre-trained Mask RCNN model, runs it on an input image, and prints detected objects with their bounding boxes and mask shapes.
```python
import tensorflow as tf
import tensorflow_hub as hub  # hub.load accepts TF Hub URLs; tf.saved_model.load does not
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as patches

# Load pre-trained Mask RCNN model
model = hub.load('https://tfhub.dev/tensorflow/mask_rcnn/resnet50_v1_fpn_1x/1')

# Load and prepare image
image_path = tf.keras.utils.get_file('image.jpg', 'https://upload.wikimedia.org/wikipedia/commons/6/60/Toco_Toucan_RWD2.jpg')
image = tf.io.read_file(image_path)
image = tf.image.decode_jpeg(image, channels=3)
input_tensor = tf.expand_dims(image, 0)  # Add batch dimension

# Run inference
outputs = model(input_tensor)

# Extract detection results for the single image in the batch
boxes = outputs['detection_boxes'][0].numpy()
classes = outputs['detection_classes'][0].numpy().astype(np.int32)
scores = outputs['detection_scores'][0].numpy()
masks = outputs['detection_masks'][0].numpy()

# Filter detections with confidence > 0.5
threshold = 0.5
filtered_indices = np.where(scores > threshold)[0]

print(f"Detected {len(filtered_indices)} objects with confidence > {threshold}:")
for i in filtered_indices:
    print(f"Object {i+1}: Class ID={classes[i]}, Score={scores[i]:.2f}, "
          f"Box={boxes[i]}, Mask shape={masks[i].shape}")

# Optional: visualize first detected box and mask
fig, ax = plt.subplots(1)
ax.imshow(image.numpy())
if len(filtered_indices) > 0:
    box = boxes[filtered_indices[0]]
    height, width, _ = image.shape
    y1, x1, y2, x2 = box  # normalized coordinates
    rect = patches.Rectangle((x1 * width, y1 * height),
                             (x2 - x1) * width, (y2 - y1) * height,
                             linewidth=2, edgecolor='r', facecolor='none')
    ax.add_patch(rect)
    mask = masks[filtered_indices[0]]
    # Resize the low-resolution mask to the image size before overlaying
    mask_resized = tf.image.resize(mask[..., np.newaxis], (height, width))
    mask_resized = tf.squeeze(mask_resized).numpy()
    ax.imshow(mask_resized, alpha=0.5, cmap='jet')
plt.show()
```
Output
Detected 1 objects with confidence > 0.5:
Object 1: Class ID=16, Score=0.98, Box=[0.234 0.123 0.789 0.654], Mask shape=(28, 28)
Common Pitfalls
- Incorrect input shape: Mask RCNN expects a batch dimension; forgetting to add it causes errors.
- Image preprocessing: Not resizing or normalizing images as required can reduce accuracy.
- Thresholding: Too low a confidence threshold produces many false positives; too high a threshold misses real objects.
- Mask resizing: Output masks are lower resolution than the input image (e.g. 28×28) and must be resized to overlay correctly.
- Class IDs: Class IDs start from 1 in the COCO dataset; off-by-one indexing produces wrong labels.
```python
import tensorflow as tf

# Wrong: missing batch dimension
# outputs = model(image)  # Raises an error: the model expects shape [batch, height, width, 3]

# Right: add a batch dimension first
input_tensor = tf.expand_dims(image, 0)
outputs = model(input_tensor)
```
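The thresholding and mask-resizing pitfalls can be sketched with plain NumPy. The scores and mask arrays below are made-up stand-ins for real model outputs, and the nearest-neighbour upsampling is a minimal dependency-free substitute for `tf.image.resize`:

```python
import numpy as np

# Hypothetical detection outputs (shapes mimic the model's real outputs)
scores = np.array([0.98, 0.62, 0.31, 0.05])
masks = np.random.rand(4, 28, 28)  # one low-resolution 28x28 mask per detection

# Pitfall: choose the confidence threshold deliberately; 0.5 is a common default
threshold = 0.5
keep = np.where(scores > threshold)[0]
print(f"Kept {len(keep)} of {len(scores)} detections")  # Kept 2 of 4 detections

# Pitfall: masks are 28x28, not image-sized; upsample before overlaying
height, width = 280, 420
mask = masks[keep[0]] > 0.5  # binarize the soft mask
# Nearest-neighbour resize: repeat each mask cell to cover the full image
mask_full = np.repeat(np.repeat(mask, height // 28, axis=0), width // 28, axis=1)
print(mask_full.shape)  # (280, 420)
```

With the mask at full resolution, it can be alpha-blended over the original image exactly as in the example above.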
Quick Reference
Key points to remember when using Mask RCNN:
- Load pre-trained model from TensorFlow Hub or train your own.
- Input images must have batch dimension and proper preprocessing.
- Outputs include bounding boxes, class IDs, confidence scores, and masks.
- Apply confidence threshold to filter detections.
- Resize masks to match original image size for visualization.
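Because COCO class IDs are 1-based, a small lookup table helps turn raw IDs into readable labels. The dictionary below is a hand-picked excerpt of the standard COCO label map for illustration only; the full map ships with the TensorFlow Object Detection API:

```python
# Excerpt of the 1-based COCO label map (illustrative subset, not the full map)
COCO_LABELS = {1: 'person', 16: 'bird', 17: 'cat', 18: 'dog'}

def label_for(class_id) -> str:
    """Map a detection's class ID to a human-readable label."""
    # .get avoids a KeyError for IDs outside this excerpt
    return COCO_LABELS.get(int(class_id), f'class_{int(class_id)}')

print(label_for(16))  # bird
print(label_for(99))  # class_99
```

Applied to the example output above, class ID 16 resolves to "bird", which matches the toucan test image.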
Key Takeaways
- Load a pre-trained Mask RCNN model and prepare input images with a batch dimension.
- Run inference to get bounding boxes, class labels, confidence scores, and masks for detected objects.
- Filter detections by confidence score to remove weak predictions.
- Resize masks to the original image size before overlaying for accurate segmentation visualization.
- Avoid common mistakes like a missing batch dimension or incorrect image preprocessing.