
Semantic segmentation vs instance segmentation in Computer Vision - Metrics Comparison

Metrics & Evaluation - Semantic segmentation vs instance segmentation
Which metric matters and WHY

For semantic segmentation, the key metric is mean Intersection over Union (mIoU). It measures how well the model labels each pixel correctly for each class, ignoring individual object instances. This is important because semantic segmentation cares about classifying every pixel into a category.
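The per-class IoU behind mIoU is TP / (TP + FP + FN), averaged over classes. A minimal sketch in plain Python, using small hypothetical 1-D label arrays (0 = background, 1 = cat, 2 = dog):

```python
# Minimal mIoU sketch for semantic segmentation (hypothetical 1-D label arrays).
def miou(pred, gt, num_classes):
    """Mean IoU over classes: IoU_c = TP_c / (TP_c + FP_c + FN_c)."""
    ious = []
    for c in range(num_classes):
        tp = sum(1 for p, g in zip(pred, gt) if p == c and g == c)
        fp = sum(1 for p, g in zip(pred, gt) if p == c and g != c)
        fn = sum(1 for p, g in zip(pred, gt) if p != c and g == c)
        if tp + fp + fn == 0:
            continue  # class absent from both prediction and ground truth
        ious.append(tp / (tp + fp + fn))
    return sum(ious) / len(ious)

gt   = [0, 0, 1, 1, 1, 2, 2, 0]
pred = [0, 1, 1, 1, 2, 2, 2, 0]
print(round(miou(pred, gt, 3), 3))  # -> 0.611
```

Note that every pixel contributes to exactly one class's TP/FP/FN counts; there is no notion of separate object instances.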

For instance segmentation, metrics like Average Precision (AP) at different Intersection over Union (IoU) thresholds matter. AP measures how well the model detects and segments each individual object instance. This is crucial because instance segmentation must separate objects of the same class.
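One common formulation computes AP as the area under the precision-recall curve over detections ranked by confidence. A minimal sketch with hypothetical TP/FP flags (step-wise area, no interpolation):

```python
# AP sketch: detections are pre-sorted by descending confidence; each is
# marked 1 (TP, matched a ground truth above the IoU threshold) or 0 (FP).
def average_precision(is_tp, num_gt):
    """Area under the precision-recall curve, accumulated step-wise."""
    ap, tp, fp, prev_recall = 0.0, 0, 0, 0.0
    for hit in is_tp:
        tp += hit
        fp += 1 - hit
        precision = tp / (tp + fp)
        recall = tp / num_gt
        ap += precision * (recall - prev_recall)
        prev_recall = recall
    return ap

# 4 predictions (best to worst confidence), 3 ground-truth objects:
print(round(average_precision([1, 1, 0, 0], num_gt=3), 3))  # -> 0.667
```

Benchmarks such as COCO average this AP over several IoU thresholds (0.50 to 0.95) and over classes.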

Confusion matrix or equivalent visualization

Semantic segmentation confusion matrix example (pixel-level):

      |                   | Predicted Cat | Predicted Dog | Predicted Background |
      |-------------------|---------------|---------------|----------------------|
      | Cat pixels        | 80            | 10            | 5                    |
      | Dog pixels        | 8             | 70            | 7                    |
      | Background pixels | 3             | 5             | 900                  |

Rows are ground truth, columns are predictions, and TP/FP/FN are defined per class. For Cat: TP = 80, FN = 10 + 5 = 15, FP = 8 + 3 = 11, so IoU(Cat) = 80 / (80 + 15 + 11) ≈ 0.755.
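Per-class IoU can be read straight off such a matrix: the diagonal entry is TP, the rest of the row is FN, the rest of the column is FP. A sketch using the counts above:

```python
# Per-class IoU from a pixel-level confusion matrix.
# Rows = ground truth, columns = predictions; order: cat, dog, background.
cm = [
    [80, 10, 5],
    [8, 70, 7],
    [3, 5, 900],
]

ious = {}
for c, name in enumerate(["cat", "dog", "background"]):
    tp = cm[c][c]
    fn = sum(cm[c]) - tp                  # ground-truth pixels labeled as another class
    fp = sum(row[c] for row in cm) - tp   # other classes' pixels labeled as this class
    ious[name] = tp / (tp + fn + fp)
    print(f"{name}: IoU = {ious[name]:.3f}")

print(f"mIoU = {sum(ious.values()) / len(ious):.3f}")  # -> mIoU = 0.811
```

Notice how the dominant background class gets a high IoU (0.978) while the foreground classes sit near 0.7 to 0.75, which is why averaging over classes matters.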
    

An instance segmentation "confusion matrix" is more complex: predicted masks must first be matched one-to-one to ground-truth masks by IoU overlap before TPs, FPs, and FNs can be counted. A simplified example:

      Ground Truth Instances: 3
      Predicted Instances: 4
      Matches (IoU > 0.5): 2 (TP)
      False Positives: 2
      False Negatives: 1
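The matching step above can be sketched with a greedy one-to-one assignment, representing masks as sets of pixel indices (a simplification; real evaluators work on binary masks and iterate predictions in confidence order):

```python
# Greedy matching of predicted instance masks to ground truth (masks as pixel sets).
def match_instances(preds, gts, iou_thresh=0.5):
    """Return (TP, FP, FN) after one-to-one greedy matching by IoU."""
    def iou(a, b):
        return len(a & b) / len(a | b)

    matched_gt = set()
    tp = 0
    for p in preds:  # ideally iterate in descending confidence order
        best, best_iou = None, iou_thresh
        for i, g in enumerate(gts):
            if i in matched_gt:
                continue  # each ground truth can match at most one prediction
            score = iou(p, g)
            if score >= best_iou:
                best, best_iou = i, score
        if best is not None:
            matched_gt.add(best)
            tp += 1
    return tp, len(preds) - tp, len(gts) - tp

# Toy example mirroring the numbers above: 4 predictions, 3 ground truths.
gts = [set(range(0, 10)), set(range(20, 30)), set(range(40, 50))]
preds = [set(range(0, 9)), set(range(21, 30)), set(range(60, 70)), set(range(80, 90))]
print(match_instances(preds, gts))  # -> (2, 2, 1)
```

The one-to-one constraint is the key difference from semantic metrics: two predictions covering the same object yield one TP and one FP, not two TPs.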
    
Precision vs Recall tradeoff with examples

In semantic segmentation, high recall means most pixels of a class are found, while high precision means few pixels are wrongly labeled. For example, in medical scans, missing tumor pixels (low recall) is usually worse than labeling some extra healthy pixels as tumor (lower precision).

In instance segmentation, high precision means detected objects are mostly correct, while high recall means most objects are found. For example, in self-driving cars, missing a pedestrian (low recall) is dangerous, so recall is prioritized even if some false detections occur.
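The tradeoff is usually controlled by the confidence threshold: lowering it finds more objects (higher recall) at the cost of more false detections (lower precision). A sketch with hypothetical detections, each tagged with whether it matched a ground truth (5 ground-truth objects assumed):

```python
# Precision/recall tradeoff: sweep the confidence threshold over hypothetical
# detections given as (confidence, matched_a_ground_truth) pairs.
detections = [(0.95, True), (0.90, True), (0.80, False),
              (0.60, True), (0.40, False), (0.30, True)]
num_gt = 5

results = {}
for thresh in (0.85, 0.5, 0.25):
    kept = [is_tp for conf, is_tp in detections if conf >= thresh]
    tp = sum(kept)
    results[thresh] = (tp / len(kept), tp / num_gt)  # (precision, recall)
    print(f"threshold {thresh}: precision={results[thresh][0]:.2f}, "
          f"recall={results[thresh][1]:.2f}")
```

At threshold 0.85 precision is perfect but recall is only 0.40; at 0.25 recall doubles to 0.80 while precision drops, which is the pedestrian-detection regime described above.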

What "good" vs "bad" metric values look like

Semantic segmentation:

  • Good mIoU: > 70% means most pixels are correctly labeled.
  • Bad mIoU: < 40% means many pixels are mislabeled.

Instance segmentation:

  • Good AP: > 50% means most objects are correctly detected and segmented.
  • Bad AP: < 20% means many objects are missed or wrongly segmented.

Common pitfalls in metrics

  • Accuracy paradox: High pixel accuracy in semantic segmentation can be misleading if background dominates.
  • Data leakage: Overlap between training and test images can artificially inflate metrics.
  • Overfitting: Very high training mIoU or AP but low test scores means the model memorizes the training data rather than generalizing.
  • Ignoring instance separation: Using semantic metrics for instance tasks misses errors in separating objects.

Self-check question

Your instance segmentation model has 98% pixel accuracy but only 12% recall on detecting objects. Is it good for production? Why or why not?

Answer: No, it is not good. High pixel accuracy can come from correctly labeling background pixels, but 12% recall means the model misses most objects. For instance segmentation, finding objects (high recall) is critical, so this model would fail in real use.
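The scenario can be reproduced with simple hypothetical counts, showing how a background-dominated image makes pixel accuracy nearly meaningless for instance-level quality:

```python
# Hypothetical counts illustrating the accuracy paradox from the question above.
total_pixels = 1_000_000
correct_pixels = 980_000              # mostly easy background pixels
pixel_accuracy = correct_pixels / total_pixels

gt_objects, detected_objects = 25, 3  # only 3 of 25 instances found
object_recall = detected_objects / gt_objects

print(f"pixel accuracy: {pixel_accuracy:.0%}, object recall: {object_recall:.0%}")
# -> pixel accuracy: 98%, object recall: 12%
```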

Key Result
Semantic segmentation focuses on pixel-level accuracy (mIoU), while instance segmentation requires precise object detection and separation (AP), balancing precision and recall differently.