Mask R-CNN is a popular model in computer vision. What is its main task?
Think about what extra output Mask R-CNN produces beyond bounding boxes.
Mask R-CNN performs instance segmentation: it extends object detection by adding a mask prediction branch that outputs a pixel-level mask for each detected object, alongside its class label and bounding box.
Mask R-CNN builds on Faster R-CNN. Which part is added uniquely in Mask R-CNN?
Consider what extra output Mask R-CNN produces that Faster R-CNN does not.
Mask R-CNN adds a mask prediction branch, running in parallel with the classification and box-regression heads, that outputs a binary mask for each detected object. Faster R-CNN has no such branch.
Assume Mask R-CNN outputs a mask tensor for one detected object. If the mask size is 28x28 pixels, what is the shape of this mask output?
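# Illustrative pseudocode: `model` and `predict_mask` are placeholders, not a specific library API.
mask_output = model.predict_mask(single_object_region)
print(mask_output.shape)
Mask R-CNN outputs masks with a batch or channel dimension.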
The mask output shape includes a channel dimension for the single mask, so it is (1, 28, 28).
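To make the shape concrete, here is a minimal NumPy sketch. It does not run a real Mask R-CNN. The random array is a hypothetical stand-in for the mask branch output for one object, and the 0.5 threshold is an assumed, commonly used cutoff.

```python
import numpy as np

# Hypothetical stand-in for the mask output of one detected object:
# per-pixel probabilities with a leading channel dimension.
rng = np.random.default_rng(0)
mask_output = rng.random((1, 28, 28))

print(mask_output.shape)  # (1, 28, 28)

# Dropping the channel dimension and thresholding (0.5 is an assumption)
# yields a 2-D binary mask for the object.
binary_mask = mask_output[0] > 0.5
print(binary_mask.shape)  # (28, 28)
```

The leading 1 is the channel dimension for the single mask. Indexing with [0] removes it when a plain 2-D mask is needed.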
When assessing Mask R-CNN's performance, which metric specifically measures how well the predicted masks match the true object shapes?
Think about a metric that compares overlap between predicted and true masks.
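Intersection over Union (IoU) measures the overlap between predicted and ground-truth masks as the area of their intersection divided by the area of their union, which makes it well suited to evaluating mask quality.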
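As a rough illustration, mask IoU for two binary masks can be computed as below. The function name mask_iou and the choice to return 0.0 when both masks are empty are assumptions made for this sketch, not part of any particular library.

```python
import numpy as np

def mask_iou(pred, target):
    """IoU between two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    # Assumed convention: two empty masks give an IoU of 0.0.
    return float(intersection) / float(union) if union > 0 else 0.0

# Toy 4x4 masks: each covers two full rows, and they share one row.
pred = np.zeros((4, 4), dtype=bool)
pred[0:2, :] = True      # rows 0-1, 8 pixels
target = np.zeros((4, 4), dtype=bool)
target[1:3, :] = True    # rows 1-2, 8 pixels

# Intersection is row 1 (4 pixels); union is rows 0-2 (12 pixels).
print(mask_iou(pred, target))  # 4 / 12, about 0.333
```

An IoU of 1.0 means the predicted mask matches the ground truth exactly, and values near 0 mean almost no overlap.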
After training Mask R-CNN, you notice bounding boxes are accurate but masks are poor. Which issue is the most likely cause?
Consider what affects mask detail quality in the network.
If the mask branch uses low-resolution features, it cannot capture fine details, which leads to poor masks even when the boxes are good.
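The effect of resolution can be seen in a toy experiment. This is not Mask R-CNN's actual feature pipeline: block averaging stands in for a low-resolution representation, and the helper name through_low_res, the factor of 4, and the 0.5 threshold are all assumptions chosen for illustration.

```python
import numpy as np

def through_low_res(mask, factor):
    """Average-pool a binary mask by `factor`, threshold, and upsample back."""
    h, w = mask.shape
    # Block-average: each coarse cell holds the fraction of "on" pixels.
    blocks = mask.reshape(h // factor, factor, w // factor, factor)
    coarse = blocks.mean(axis=(1, 3)) > 0.5
    # Nearest-neighbour upsampling back to the original size.
    return np.repeat(np.repeat(coarse, factor, axis=0), factor, axis=1)

# A thin one-pixel-wide structure (fine detail).
thin = np.zeros((8, 8), dtype=bool)
thin[:, 3] = True

# A large block-aligned region (coarse shape).
blob = np.zeros((8, 8), dtype=bool)
blob[0:4, 0:4] = True

print(through_low_res(thin, 4).sum())  # 0: the thin structure is lost
print(through_low_res(blob, 4).sum())  # 16: the coarse shape survives
```

The coarse shape, like a bounding box, survives the low-resolution step, while the fine detail disappears. This mirrors the symptom of accurate boxes with poor masks.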