Medical image segmentation basics in Computer Vision - Model Metrics & Evaluation

Which metrics matter for medical image segmentation, and why

In medical image segmentation, we want to measure how well the model separates important areas, like tumors, from the rest of the image. The key metrics are the Dice coefficient and Intersection over Union (IoU). Both measure the overlap between the region the model predicts and the true region annotated by doctors.

Dice and IoU tell us how much the predicted shape matches the real shape. High values mean the model is good at finding the exact regions, which is critical for treatment planning.

Confusion matrix for segmentation (pixel-wise)

      |                 | Predicted Positive  | Predicted Negative  |
      |-----------------|---------------------|---------------------|
      | Actual Positive | True Positive (TP)  | False Negative (FN) |
      | Actual Negative | False Positive (FP) | True Negative (TN)  |

    TP: Pixels correctly predicted as part of the target region.
    FP: Pixels wrongly predicted as part of the target region.
    FN: Pixels missed by the model but actually part of the target.
    TN: Pixels correctly predicted as background.

From these, Dice = 2*TP / (2*TP + FP + FN) and IoU = TP / (TP + FP + FN).
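The formulas above can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: the function names (`confusion_counts`, `dice`, `iou`) and the tiny 4x4 masks are made up for this example, and it uses plain lists where a real pipeline would normally use array libraries. Returning 1.0 when both masks are empty is one common convention, assumed here.

```python
def confusion_counts(pred, truth):
    """Count TP, FP, FN, TN over two equal-length sequences of 0/1 pixel labels."""
    if len(pred) != len(truth):
        raise ValueError("pred and truth must have the same number of pixels")
    tp = fp = fn = tn = 0
    for p, t in zip(pred, truth):
        if p and t:
            tp += 1      # predicted target, really target
        elif p:
            fp += 1      # predicted target, really background
        elif t:
            fn += 1      # predicted background, really target
        else:
            tn += 1      # predicted background, really background
    return tp, fp, fn, tn


def dice(tp, fp, fn):
    denom = 2 * tp + fp + fn
    # Assumed convention: two empty masks count as perfect agreement.
    return 1.0 if denom == 0 else 2 * tp / denom


def iou(tp, fp, fn):
    denom = tp + fp + fn
    # Same assumed convention for the empty case.
    return 1.0 if denom == 0 else tp / denom


# Hypothetical 4x4 masks, flattened row by row (1 = target, 0 = background).
truth = [0, 0, 0, 0,
         0, 1, 1, 0,
         0, 1, 1, 0,
         0, 0, 0, 0]
pred = [0, 0, 0, 0,
        0, 1, 1, 1,
        0, 1, 0, 0,
        0, 0, 0, 0]

tp, fp, fn, tn = confusion_counts(pred, truth)
print(tp, fp, fn, tn)                       # 3 1 1 11
print(dice(tp, fp, fn), iou(tp, fp, fn))    # 0.75 0.6
```

Note that TN never appears in either formula, which is why a large background does not inflate Dice or IoU.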

Precision vs Recall tradeoff with examples

Precision measures what fraction of the pixels predicted as target are truly part of the target: Precision = TP / (TP + FP). High precision means few false alarms.

Recall measures what fraction of the true target pixels the model found: Recall = TP / (TP + FN). High recall means few misses.

In medical segmentation, missing a tumor pixel (low recall) can be dangerous. So recall is often more important.

Example: If the model marks many pixels as tumor but some are wrong (low precision), doctors can review and remove mistakes. But if the model misses tumor pixels (low recall), it risks missing disease.
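To make the tradeoff concrete, here is a small sketch comparing two made-up models on a lesion of 100 true pixels. The pixel counts and the names `over_segmenting` and `under_segmenting` are hypothetical, chosen only to illustrate the two failure modes. Returning 0.0 when a denominator is zero is an assumed convention.

```python
def precision(tp, fp):
    # Fraction of predicted target pixels that are correct.
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    # Fraction of true target pixels that were found.
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical pixel counts for a lesion with 100 true pixels.
over_segmenting = {"tp": 95, "fp": 60, "fn": 5}    # marks too much
under_segmenting = {"tp": 50, "fp": 2, "fn": 50}   # marks too little

for name, c in [("over", over_segmenting), ("under", under_segmenting)]:
    p = precision(c["tp"], c["fp"])
    r = recall(c["tp"], c["fn"])
    print(f"{name}: precision={p:.2f} recall={r:.2f}")
# over: precision=0.61 recall=0.95
# under: precision=0.96 recall=0.50
```

The over-segmenting model produces extra pixels for a doctor to review, but the under-segmenting model silently drops half of the lesion.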

What good vs bad metric values look like

Good: As a rough guide, Dice and IoU above 0.8 show strong overlap, meaning the model closely matches the true region. Acceptable values depend on the task and on how small the target structure is.

Bad: Dice below 0.5 means poor overlap, so the model misses or wrongly marks many pixels.

Precision and recall should both be high (above 0.8) for reliable segmentation.
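When reading these thresholds, it helps to remember that Dice and IoU are computed from the same counts, so one can be converted into the other: Dice = 2*IoU / (1 + IoU), which follows directly from the two formulas. For any imperfect prediction, IoU is the lower of the two numbers. A minimal sketch (the function names are chosen for this example):

```python
def dice_to_iou(d):
    # IoU = Dice / (2 - Dice), valid for Dice in [0, 1].
    return d / (2 - d)


def iou_to_dice(j):
    # Dice = 2 * IoU / (1 + IoU), valid for IoU in [0, 1].
    return 2 * j / (1 + j)


print(round(dice_to_iou(0.8), 3))   # 0.667
print(round(iou_to_dice(0.8), 3))   # 0.889
print(round(dice_to_iou(0.5), 3))   # 0.333
```

So a Dice of 0.8 corresponds to an IoU of about 0.67, and an IoU of 0.8 corresponds to a Dice of about 0.89. Always check which metric a reported score refers to before comparing models.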

Common pitfalls in segmentation metrics

Accuracy paradox: Since most pixels are background, a model predicting all background can have high accuracy but zero usefulness.
Data leakage: Using images from the same patient in both training and testing can artificially inflate metrics.
Overfitting: Very high training Dice but low test Dice means the model has memorized the training images and fails on new ones.
Ignoring class imbalance: Small target regions need metrics like Dice, not accuracy, for a fair evaluation.
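One way to avoid the data leakage pitfall is to split by patient rather than by image, so that every image from a given patient lands on the same side of the split. The sketch below assumes each record is a `(patient_id, image_name)` pair. The function name `split_by_patient`, the record layout, and the file names are hypothetical and used only for illustration.

```python
import random
from collections import defaultdict


def split_by_patient(records, test_fraction=0.2, seed=0):
    """Split (patient_id, image_name) records so no patient is in both sets."""
    by_patient = defaultdict(list)
    for patient_id, image in records:
        by_patient[patient_id].append(image)

    # Shuffle patients (not images) with a fixed seed for reproducibility.
    patients = sorted(by_patient)
    random.Random(seed).shuffle(patients)

    # Hold out at least one patient for testing.
    n_test = max(1, round(len(patients) * test_fraction))
    test_patients = set(patients[:n_test])

    train = [(p, img) for p, img in records if p not in test_patients]
    test = [(p, img) for p, img in records if p in test_patients]
    return train, test


# Hypothetical dataset: 5 patients with 3 scans each.
records = [(f"patient_{i}", f"scan_{i}_{k}.png") for i in range(5) for k in range(3)]
train, test = split_by_patient(records)

train_ids = {p for p, _ in train}
test_ids = {p for p, _ in test}
print(len(train), len(test))    # 12 3
print(train_ids & test_ids)     # set()
```

If scikit-learn is available, `GroupShuffleSplit` or `GroupKFold` from `sklearn.model_selection` provide the same kind of grouped split.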

Self-check question

Your model has 98% accuracy but 12% recall on tumor pixels. Is it good for production? Why or why not?

Answer: No, it is not good. The high accuracy is misleading because most pixels are background. The very low recall means the model misses almost all tumor pixels, which is dangerous in medical use.
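The numbers in this question can be reproduced with a small worked example. The pixel counts below are invented so that the image has 10,000 pixels, 200 of them tumor, and the results match 98% accuracy and 12% recall. They do not come from a real model.

```python
# Hypothetical counts: 10,000 pixels, 200 of which are tumor.
tp, fn = 24, 176      # tumor pixels found / missed
fp, tn = 24, 9776     # background pixels wrongly marked / correctly ignored
total = tp + fp + fn + tn

accuracy = (tp + tn) / total
recall = tp / (tp + fn)
dice = 2 * tp / (2 * tp + fp + fn)

# A model that predicts "background" everywhere is right on every
# background pixel (tn + fp of them) and wrong on every tumor pixel.
all_background_accuracy = (tn + fp) / total

print(f"accuracy={accuracy:.2f} recall={recall:.2f} dice={dice:.2f}")
# accuracy=0.98 recall=0.12 dice=0.19
print(f"all-background accuracy={all_background_accuracy:.2f}")
# all-background accuracy=0.98
```

In this example the model's accuracy is no better than that of predicting background everywhere, while the Dice score of about 0.19 exposes the poor overlap.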

Key Result

The Dice coefficient and recall are the key metrics for judging whether a medical image segmentation model is accurate and safe.