U-Net architecture in Computer Vision - Model Metrics & Evaluation

U-Net is mainly used for image segmentation: it labels each pixel as part of an object or as background. The key metrics are the Dice coefficient and Intersection over Union (IoU), which measure how well the predicted mask matches the true mask. Accuracy alone can be misleading because most pixels may be background; Dice and IoU focus on overlap, which is what matters for segmentation quality.
|                 | Predicted Positive  | Predicted Negative  |
|-----------------|---------------------|---------------------|
| Actual Positive | True Positive (TP)  | False Negative (FN) |
| Actual Negative | False Positive (FP) | True Negative (TN)  |
Total pixels = TP + FP + TN + FN
Dice coefficient = 2 * TP / (2 * TP + FP + FN)
IoU = TP / (TP + FP + FN)
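The two formulas above can be computed directly from a pair of binary masks. A minimal sketch with NumPy (the toy masks and the `dice_iou` helper are illustrative, not from any particular library):

```python
import numpy as np

def dice_iou(pred, true):
    """Dice coefficient and IoU for two boolean masks of the same shape."""
    tp = np.logical_and(pred, true).sum()    # object pixels predicted as object
    fp = np.logical_and(pred, ~true).sum()   # background pixels predicted as object
    fn = np.logical_and(~pred, true).sum()   # object pixels predicted as background
    dice = 2 * tp / (2 * tp + fp + fn)
    iou = tp / (tp + fp + fn)
    return dice, iou

# Toy 1-D "masks": 3 TP, 1 FP, 1 FN
pred = np.array([1, 1, 1, 1, 0, 0], dtype=bool)
true = np.array([1, 1, 1, 0, 1, 0], dtype=bool)
dice, iou = dice_iou(pred, true)
# Dice = 2*3 / (6 + 1 + 1) = 0.75; IoU = 3 / (3 + 1 + 1) = 0.6
```

Note that IoU is always less than or equal to Dice for the same masks, so the two thresholds quoted later (0.8 vs 0.7) are not interchangeable.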
This matrix counts pixels, not images. TP means pixels correctly labeled as object. FP means pixels wrongly labeled as object. FN means object pixels missed.
Precision means how many predicted object pixels are correct. High precision means few false positives (wrongly labeled pixels).
Recall means how many true object pixels were found. High recall means few false negatives (missed pixels).
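Precision and recall follow from the same pixel counts. A minimal sketch, reusing hypothetical toy masks:

```python
import numpy as np

def precision_recall(pred, true):
    """Pixel-wise precision and recall for two boolean masks."""
    tp = np.logical_and(pred, true).sum()
    fp = np.logical_and(pred, ~true).sum()
    fn = np.logical_and(~pred, true).sum()
    precision = tp / (tp + fp)   # fraction of predicted object pixels that are correct
    recall = tp / (tp + fn)      # fraction of true object pixels that were found
    return precision, recall

# Same toy masks as before: 3 TP, 1 FP, 1 FN
pred = np.array([1, 1, 1, 1, 0, 0], dtype=bool)
true = np.array([1, 1, 1, 0, 1, 0], dtype=bool)
p, r = precision_recall(pred, true)
# precision = 3/4 = 0.75; recall = 3/4 = 0.75
```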
For medical images, missing a tumor pixel (low recall) is worse than marking some extra pixels (lower precision). So recall is more important.
For autonomous driving, marking too many pixels as obstacles (low precision) can cause unnecessary stops. So precision matters more.
Dice and IoU balance precision and recall by measuring overlap.
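One way to see this balance: the Dice coefficient is algebraically identical to the F1 score, the harmonic mean of precision and recall. A quick numeric check with hypothetical pixel counts:

```python
tp, fp, fn = 30, 10, 20  # hypothetical pixel counts, for illustration only

precision = tp / (tp + fp)                       # 0.75
recall = tp / (tp + fn)                          # 0.6
f1 = 2 * precision * recall / (precision + recall)
dice = 2 * tp / (2 * tp + fp + fn)

assert abs(f1 - dice) < 1e-12  # Dice is exactly the F1 score
```

So a high Dice score requires both precision and recall to be reasonably high; it cannot be gamed by maximizing only one of them.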
- Good: Dice > 0.8 and IoU > 0.7 usually mean the model segments objects well (these are rules of thumb; acceptable values vary by task).
- Bad: Dice < 0.5 or IoU < 0.4 means poor overlap: many pixels are wrong.
- High accuracy (>90%) can be misleading if background dominates and object pixels are few.
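The accuracy trap in the last bullet is easy to demonstrate. In this hypothetical example, a model that predicts "all background" scores 99.75% accuracy while its Dice is zero:

```python
import numpy as np

# Hypothetical 100x100 image: a 5x5 object in a sea of background.
true = np.zeros((100, 100), dtype=bool)
true[:5, :5] = True

pred = np.zeros((100, 100), dtype=bool)  # model predicts "all background"

accuracy = (pred == true).mean()          # 0.9975 -- looks excellent
tp = np.logical_and(pred, true).sum()     # 0
fp = np.logical_and(pred, ~true).sum()    # 0
fn = np.logical_and(~pred, true).sum()    # 25 object pixels, all missed
dice = 2 * tp / (2 * tp + fp + fn)        # 0.0 -- segmentation failed entirely
```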
- Accuracy paradox: High accuracy but poor segmentation if background pixels dominate.
- Data leakage: Using test images that are too similar to the training set (e.g., neighboring slices from the same patient) can inflate metrics.
- Overfitting: Very high training Dice but low test Dice means model memorizes training masks.
- Ignoring class imbalance: Small objects may be missed if metrics do not focus on them.
No, it is not good. The high accuracy likely comes from many background pixels correctly labeled. But 12% recall means the model misses 88% of tumor pixels. This is dangerous in medical use because tumors are not detected. You need to improve recall even if accuracy drops.
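The arithmetic behind this answer can be made concrete. With hypothetical counts for a scan where tumor pixels are 1% of the image, 99% accuracy coexists comfortably with 12% recall:

```python
# Hypothetical 1000x1000 scan: 1% of pixels are tumor.
total = 1_000_000
tumor = 10_000

tp = 1_200                    # recall = tp / tumor = 12%
fn = tumor - tp               # 8_800 tumor pixels missed
fp = 200                      # a few background pixels mislabeled
tn = total - tumor - fp       # 989_800 background pixels correct

accuracy = (tp + tn) / total            # 0.991 -- looks excellent
recall = tp / (tp + fn)                 # 0.12 -- 88% of the tumor is missed
dice = 2 * tp / (2 * tp + fp + fn)      # ~0.21 -- poor overlap
```

The huge true-negative count dominates accuracy, so accuracy barely registers the missed tumor; recall and Dice expose it immediately.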