Computer Vision · ~8 mins


Metrics & Evaluation - Why detection localizes objects in images
Which metric matters for this concept and WHY

In object detection, the key metric is Intersection over Union (IoU). It measures how well the predicted box matches the true object box. A higher IoU means the model localizes the object more accurately. Along with IoU, mean Average Precision (mAP) is used to evaluate both detection and localization quality together. These metrics matter because detection is not just about saying "there is an object" but also showing exactly where it is in the image.
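IoU is simple enough to compute by hand. Below is a minimal sketch, assuming axis-aligned boxes in `(x1, y1, x2, y2)` corner format; the function name and box format are illustrative, not from any particular library:

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    # Clamp to zero: non-overlapping boxes have no intersection area
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

Identical boxes give IoU = 1.0, disjoint boxes give 0.0, and partial overlaps fall in between, which is what makes IoU a natural localization score.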

Confusion matrix or equivalent visualization (ASCII)
True Positive (TP): Correctly detected and localized objects (IoU > threshold)
False Positive (FP): Detected boxes with no matching true object or low IoU
False Negative (FN): True objects missed by the detector

Example confusion matrix for detection:

           | Detected Object | No Detection |
-----------|-----------------|--------------|
Object     |       TP        |      FN      |
No Object  |       FP        |      TN      |

Note: TN (True Negative) is less meaningful in detection because the background is large.
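The TP/FP/FN counts above come from matching predicted boxes to ground-truth boxes. A simplified sketch of that matching, assuming each ground-truth box can be claimed by at most one prediction (real evaluators also sort predictions by confidence first; names and the greedy strategy here are illustrative):

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes (x1, y1, x2, y2)."""
    inter = (max(0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
             * max(0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1])))
    union = ((box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
             + (box_b[2] - box_b[0]) * (box_b[3] - box_b[1]) - inter)
    return inter / union if union > 0 else 0.0

def count_tp_fp_fn(preds, gts, iou_thresh=0.5):
    """Greedily match predictions to unmatched ground-truth boxes."""
    matched = set()
    tp = 0
    for p in preds:
        best_iou, best_j = 0.0, None
        for j, g in enumerate(gts):
            if j in matched:
                continue  # each ground-truth box counts at most once
            v = iou(p, g)
            if v > best_iou:
                best_iou, best_j = v, j
        if best_j is not None and best_iou >= iou_thresh:
            matched.add(best_j)
            tp += 1
    fp = len(preds) - tp   # predictions with no good match
    fn = len(gts) - tp     # ground-truth objects nobody matched
    return tp, fp, fn
```

Note how the IoU threshold appears directly in the counting: a detection with IoU just below the threshold is scored as both a false positive and a missed object.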
    
Precision vs Recall tradeoff with concrete examples

Precision means when the model says "object here," it is usually right. High precision means few false alarms.

Recall means the model finds most of the objects present. High recall means few missed objects.

For example, in self-driving cars, missing a pedestrian (low recall) is dangerous, so recall is critical. But too many false alarms (low precision) can also confuse the system.

Balancing precision and recall depends on the use case. The IoU threshold used for matching also shifts the numbers: a stricter threshold demands tighter localization, so loosely localized detections no longer count as true positives. Measured recall drops, and only detections with accurate boxes are rewarded.
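Both quantities follow directly from the TP/FP/FN counts. A small sketch with illustrative numbers:

```python
def precision_recall(tp, fp, fn):
    """Precision: of all detections, how many were right.
    Recall: of all real objects, how many were found."""
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall

# Example: 8 correct detections, 2 false alarms, 4 missed objects
p, r = precision_recall(tp=8, fp=2, fn=4)
print(p, r)  # precision 0.8, recall ~0.667
```

With these numbers the detector is trustworthy when it fires (80% of its boxes are right) but still misses a third of the objects, exactly the kind of gap that matters for a pedestrian detector.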

What "good" vs "bad" metric values look like for this use case

Good detection model:

  • IoU > 0.5 for most detected objects
  • High mAP (e.g., > 0.7) showing good detection and localization
  • Precision and recall both above 0.8, meaning few false alarms and few misses

Bad detection model:

  • Low IoU (e.g., < 0.3) meaning boxes do not match objects well
  • Low mAP (e.g., < 0.4) indicating poor detection or localization
  • High false positives (low precision) or many missed objects (low recall)
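The mAP values above come from averaging per-class Average Precision, which is the area under the precision-recall curve built from confidence-ranked detections. A simplified sketch of AP for one class, assuming each detection has already been labeled TP or FP by IoU matching (input format and names are illustrative):

```python
def average_precision(scored_matches, num_gt):
    """All-point interpolated AP for one class.

    scored_matches: list of (confidence, is_true_positive) pairs, one per
    detection. num_gt: total number of ground-truth objects for the class.
    """
    scored = sorted(scored_matches, key=lambda m: -m[0])  # rank by confidence
    tp = fp = 0
    points = []  # (recall, precision) after each detection is admitted
    for _, is_tp in scored:
        tp += 1 if is_tp else 0
        fp += 0 if is_tp else 1
        points.append((tp / num_gt, tp / (tp + fp)))
    # Precision envelope: precision at recall r becomes the max precision
    # achieved at any recall >= r (smooths the sawtooth before integrating).
    for i in range(len(points) - 2, -1, -1):
        points[i] = (points[i][0], max(points[i][1], points[i + 1][1]))
    # Integrate the envelope over recall
    ap, prev_recall = 0.0, 0.0
    for recall, precision in points:
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap
```

mAP is then the mean of this AP over all object classes (and, in COCO-style evaluation, over several IoU thresholds as well).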

Metrics pitfalls
  • Accuracy paradox: In images with few objects, a model that predicts no objects can have high accuracy but is useless.
  • Ignoring IoU: Counting detections without checking localization quality can mislead about model performance.
  • Data leakage: Testing on images very similar to training can inflate metrics falsely.
  • Overfitting: Very high training mAP but low test mAP means the model memorizes training images, not generalizing.
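The accuracy paradox is easy to see with hypothetical numbers, assuming 10,000 candidate regions of which only 20 contain an object:

```python
total, positives = 10_000, 20

# A degenerate model that predicts "no object" everywhere:
tp, fp = 0, 0
tn = total - positives   # 9,980 background regions counted as correct
fn = positives           # all 20 real objects missed

accuracy = (tp + tn) / total   # looks excellent
recall = tp / (tp + fn)        # finds nothing
print(accuracy, recall)
```

Accuracy comes out at 99.8% while recall is exactly zero: the metric is dominated by the easy background negatives, which is why detection benchmarks report precision, recall, and mAP instead.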

Self-check question

Your object detection model has 98% accuracy but only 12% recall on detecting pedestrians. Is it good for production? Why or why not?

Answer: No, it is not good. The high accuracy is misleading because most of the image is background (no pedestrians), so predicting "no pedestrian" is counted as correct almost everywhere. The 12% recall means the model misses nearly nine out of ten pedestrians, which is unacceptable in safety-critical applications like self-driving cars.

Key Result
IoU and mAP are the key metrics for judging how well detection models both find objects and localize them accurately.