Mask R-CNN overview in Computer Vision - Model Metrics & Evaluation

Which metrics matter for Mask R-CNN, and why

Mask R-CNN finds objects in images and predicts a pixel-level mask for each one. So we care about two things: how well it finds the right objects, and how well it draws their shapes.

The key metrics are:

  • Mean Average Precision (mAP): Measures how well the model finds and correctly labels objects. It averages precision across recall levels and IoU thresholds, then averages over classes.
  • Intersection over Union (IoU): Measures how closely the predicted mask matches the true object shape. Higher IoU means better shape accuracy.
  • Precision and Recall: Precision tells us how many predicted objects are correct. Recall tells us how many true objects were found.

We use these because Mask R-CNN does two things: detect objects and segment their shapes. Both need to be accurate.
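The segmentation half of the job is scored with mask IoU. A minimal sketch with NumPy, using a made-up 4x4 toy example:

```python
import numpy as np

def mask_iou(pred_mask: np.ndarray, true_mask: np.ndarray) -> float:
    """Intersection over Union between two binary masks of the same shape."""
    intersection = np.logical_and(pred_mask, true_mask).sum()
    union = np.logical_or(pred_mask, true_mask).sum()
    return float(intersection) / float(union) if union > 0 else 0.0

# Toy example: the predicted mask covers 2 of the 3 true pixels.
pred = np.zeros((4, 4), dtype=bool)
true = np.zeros((4, 4), dtype=bool)
pred[1, 1:3] = True          # predicted pixels: (1,1), (1,2)
true[1, 1:4] = True          # true pixels:      (1,1), (1,2), (1,3)
print(mask_iou(pred, true))  # 2 pixels in intersection, 3 in union -> ~0.667
```

The same formula applies to bounding boxes; for masks it is computed per pixel, which is why it captures shape errors that box IoU misses.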

Confusion matrix for object detection

For each object class, we can build a confusion matrix like this:

      |                  | Predicted Object    | Predicted No Object |
      |------------------|---------------------|---------------------|
      | Actual Object    | True Positive (TP)  | False Negative (FN) |
      | Actual No Object | False Positive (FP) | True Negative (TN)  |
    

Example for one class:

      TP = 80 (correctly detected objects)
      FP = 20 (wrongly detected objects)
      FN = 10 (missed objects)
      TN = not usually counted in detection (there is no meaningful count of correctly absent objects)
    

Precision = TP / (TP + FP) = 80 / (80 + 20) = 0.8

Recall = TP / (TP + FN) = 80 / (80 + 10) ≈ 0.89
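The arithmetic above can be checked with a few lines (a tiny sketch, nothing Mask R-CNN-specific):

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of predicted objects that are correct."""
    return tp / (tp + fp)

def recall(tp: int, fn: int) -> float:
    """Fraction of true objects that were found."""
    return tp / (tp + fn)

# Counts from the example above.
tp, fp, fn = 80, 20, 10
print(round(precision(tp, fp), 2))  # 0.8
print(round(recall(tp, fn), 2))     # 0.89
```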

Precision vs Recall tradeoff in Mask R-CNN

Imagine a security camera detecting people wearing masks. If the model is very strict, it finds fewer people but is usually right (high precision, low recall).

If it is very loose, it finds almost everyone but sometimes mistakes objects for people (high recall, low precision).

For Mask R-CNN, balancing precision and recall is important. Too many false positives (low precision) means wrong objects or masks. Too many false negatives (low recall) means missing objects.

We adjust thresholds to find the best balance for the task.
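One way to see the tradeoff is to sweep the confidence threshold and recompute both metrics. The detection list and ground-truth count below are hypothetical, for illustration only:

```python
# Hypothetical detections for one class: (confidence, matched_a_true_object).
detections = [(0.95, True), (0.90, True), (0.80, False), (0.75, True),
              (0.60, True), (0.55, False), (0.40, True), (0.30, False)]
total_true_objects = 6  # ground-truth objects; one is never detected at all

results = {}
for threshold in (0.9, 0.7, 0.5, 0.2):
    kept = [correct for conf, correct in detections if conf >= threshold]
    tp = sum(kept)                                 # kept detections that are right
    precision = tp / len(kept) if kept else 0.0
    recall = tp / total_true_objects
    results[threshold] = (precision, recall)
    print(f"threshold={threshold:.1f}  precision={precision:.2f}  recall={recall:.2f}")
```

Lowering the threshold keeps more detections, so recall rises while precision falls; the strict 0.9 threshold is perfectly precise but finds only a third of the objects.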

Good vs Bad metric values for Mask R-CNN

Good values:

  • mAP above 0.7 means the model finds and labels objects well.
  • IoU above 0.5 is the usual minimum for counting a detection as a match; higher IoU means the mask follows the true shape more closely.
  • Precision and recall both above 0.7 means balanced detection.

Bad values:

  • mAP below 0.4 means poor detection and labeling.
  • IoU below 0.3 means masks are not accurate.
  • Precision very low (<0.5) means many false detections.
  • Recall very low (<0.5) means many objects missed.
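mAP values like those above come from averaging per-class average precision (AP). A sketch of AP as the area under the precision-recall curve, using all-point interpolation (one common convention; COCO additionally averages over several IoU thresholds):

```python
import numpy as np

def average_precision(recalls, precisions):
    """AP as area under the precision-recall curve (all-point interpolation).

    `recalls` must be sorted ascending; `precisions` are the matching values.
    """
    r = np.concatenate(([0.0], recalls, [1.0]))
    p = np.concatenate(([0.0], precisions, [0.0]))
    # Replace each precision with the best precision at any higher recall,
    # giving the usual monotonically decreasing envelope.
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    steps = np.where(r[1:] != r[:-1])[0]  # indices where recall increases
    return float(np.sum((r[steps + 1] - r[steps]) * p[steps + 1]))

# Two PR points: (recall 0.5, precision 1.0) and (recall 1.0, precision 0.5).
print(average_precision([0.5, 1.0], [1.0, 0.5]))  # area under the envelope: 0.75
```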

Common pitfalls in Mask R-CNN metrics

  • Ignoring mask quality: Only checking bounding box accuracy misses mask shape errors.
  • Data leakage: Testing on images seen during training inflates metrics.
  • Overfitting: High training mAP but low test mAP means model memorizes training data.
  • Imbalanced classes: Rare objects may have low recall but high overall accuracy.
  • Threshold choice: Changing detection confidence threshold affects precision and recall.
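The imbalanced-classes pitfall is easy to reproduce with numbers. The per-class counts below are hypothetical, chosen to show how an aggregate figure can hide a rare class:

```python
# Hypothetical per-class detection counts.
counts = {
    "person":    {"tp": 950, "fn": 50},   # common class, recall 0.95
    "stop_sign": {"tp": 3,   "fn": 27},   # rare class, recall 0.10
}
total_tp = sum(c["tp"] for c in counts.values())
total_objects = sum(c["tp"] + c["fn"] for c in counts.values())
print(f"overall recall: {total_tp / total_objects:.2f}")  # looks healthy
for name, c in counts.items():
    print(f"{name} recall: {c['tp'] / (c['tp'] + c['fn']):.2f}")
```

This is why per-class metrics (and COCO-style per-class AP) matter: the overall recall of 0.93 says nothing about the stop signs the model almost never finds.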

Self-check question

Your Mask R-CNN model has 98% accuracy but only 12% recall on small objects. Is it good for production?

Answer: No. The model misses most small objects (low recall), so it is not reliable for tasks needing those detections, despite high overall accuracy.
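One set of hypothetical numbers consistent with that scenario shows how class imbalance produces it (all figures below are assumptions for illustration):

```python
# Hypothetical counts reproducing the self-check scenario.
total_regions = 10_000             # candidate regions the model classifies
small_objects = 200                # only 2% actually contain a small object
detected = 24                      # 12% of the small objects are found
missed = small_objects - detected  # 176 false negatives

# Even if ALL other regions are classified correctly, accuracy barely drops.
accuracy = (total_regions - missed) / total_regions
recall_small = detected / small_objects
print(f"accuracy={accuracy:.2%}  small-object recall={recall_small:.0%}")
```

Because small objects are a tiny fraction of all regions, missing nearly all of them costs less than two points of accuracy, which is why accuracy alone is a misleading headline number for detection.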

Key Result
Mask R-CNN performance is best judged by mean Average Precision (mAP) and Intersection over Union (IoU), balancing precision and recall for accurate object detection and mask quality.