In object detection, bounding boxes show where objects are in images. The key metric for judging a box is Intersection over Union (IoU). It measures how much the predicted box overlaps the ground-truth box, on a scale from 0 (no overlap) to 1 (identical boxes). A higher IoU means a better prediction, because the score drops when the box is either misplaced or wrongly sized.
Bounding box representation in Computer Vision - Model Metrics & Evaluation
Bounding box evaluation uses an IoU threshold to decide whether a prediction is correct (True Positive) or wrong (False Positive). For example:
Ground Truth Box: [x1=30, y1=40, x2=70, y2=80]
Predicted Box: [x1=35, y1=45, x2=75, y2=85]
IoU = Area of Overlap / Area of Union = 1225 / 1975 ≈ 0.62
If IoU >= 0.5, count as True Positive (TP)
Else, count as False Positive (FP)
Confusion counts:
TP = 1
FP = 0
FN = 0 (missed boxes)
TN = Not used in bounding box detection
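The arithmetic behind this example can be checked with a short script. This is a minimal sketch, assuming boxes are given as (x1, y1, x2, y2) corner coordinates; the function name `iou` and the variable names are illustrative, not taken from any particular library.

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (x1, y1, x2, y2), with x1 < x2 and y1 < y2."""
    # Corners of the intersection rectangle
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Width and height are clamped to zero when the boxes do not overlap
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


ground_truth = (30, 40, 70, 80)
predicted = (35, 45, 75, 85)

score = iou(ground_truth, predicted)
label = "TP" if score >= 0.5 else "FP"
print(f"IoU = {score:.3f} -> {label}")  # IoU = 0.620 -> TP
```

Both boxes have area 1600 and share a 35 x 35 region, so the union is 1600 + 1600 - 1225 = 1975 and the prediction clears the 0.5 threshold.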
Precision is the fraction of predicted boxes that are correct: TP / (TP + FP). High precision means few false boxes.
Recall is the fraction of true boxes that are found: TP / (TP + FN). High recall means few missed objects.
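Both definitions reduce to simple ratios over the TP, FP, and FN counts. The sketch below assumes those counts are already available; the helper name `precision_recall` and the example counts are hypothetical.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) from detection counts."""
    # Precision: share of predicted boxes that were correct
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    # Recall: share of ground-truth boxes that were found
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall


# Hypothetical counts: 8 correct boxes, 2 false boxes, 4 missed objects
p, r = precision_recall(tp=8, fp=2, fn=4)
print(f"precision = {p:.2f}, recall = {r:.2f}")  # precision = 0.80, recall = 0.67
```

Returning 0.0 when a denominator is zero is one possible convention; some evaluation tools report such cases as undefined instead.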
Example:
- If you want to avoid false alarms (like detecting a cat where there is none), focus on high precision.
- If you want to find all objects (like spotting every car in traffic), focus on high recall.
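Usually, increasing recall lowers precision and vice versa. In practice this tradeoff is controlled mainly by the confidence threshold used to keep or discard predictions. The IoU threshold sets how strict the matching is: raising it turns borderline True Positives into False Positives and leaves their objects unmatched, which tends to lower both precision and recall.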
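One way to see the tradeoff is to sweep the detector's confidence threshold. Confidence scores are not part of the example above, so the sketch below assumes each detection carries a score and a flag saying whether it matched a ground-truth box at IoU >= 0.5; all values are made up for illustration.

```python
# Hypothetical detections: (confidence score, matched a ground-truth box?)
detections = [
    (0.95, True), (0.90, True), (0.80, False), (0.70, True),
    (0.60, False), (0.40, True), (0.30, False),
]
num_ground_truth = 5  # assumed number of objects; one is never detected


def pr_at(threshold):
    """Precision and recall when only detections scoring >= threshold are kept."""
    kept = [is_tp for score, is_tp in detections if score >= threshold]
    tp = sum(kept)
    fp = len(kept) - tp
    # With no kept detections, precision is reported as 0.0 (a convention)
    precision = tp / (tp + fp) if kept else 0.0
    recall = tp / num_ground_truth
    return precision, recall


for t in (0.25, 0.50, 0.85):
    p, r = pr_at(t)
    print(f"threshold {t:.2f}: precision = {p:.2f}, recall = {r:.2f}")
# threshold 0.25: precision = 0.57, recall = 0.80
# threshold 0.50: precision = 0.60, recall = 0.60
# threshold 0.85: precision = 1.00, recall = 0.40
```

Keeping only high-confidence boxes removes false alarms but also discards some correct detections, so precision rises while recall falls.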
Good: IoU >= 0.7 means boxes overlap well. Precision and recall above 0.8 show reliable detection.
Bad: IoU < 0.5 means poor overlap. Precision or recall below 0.5 means many wrong or missed boxes.
Example: IoU = 0.3 means the predicted box and the true box share less than a third of their combined area, so the detection is poor.
Common pitfalls:
- Ignoring IoU threshold: Counting all predicted boxes as correct without an overlap check leads to falsely high precision and recall.
- Data leakage: Testing on images seen during training inflates metrics.
- Overfitting: The model predicts training boxes almost perfectly but fails on new images, so recall (and often precision) drops on unseen data.
- Confusing precision and recall: High precision with low recall means the predicted boxes are mostly right but many objects are missed.
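The first pitfall can be made concrete by counting TP, FP, and FN with and without an overlap check. This is a simplified sketch: it matches predictions greedily in the order given, whereas real evaluators typically process predictions by descending confidence. The function names and the second pair of boxes are hypothetical.

```python
def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def count_matches(predictions, ground_truths, iou_threshold=0.5):
    """Greedy one-to-one matching; returns (tp, fp, fn)."""
    unmatched = list(ground_truths)
    tp = 0
    for pred in predictions:
        # Pick the still-unmatched ground-truth box with the highest IoU
        best = max(unmatched, key=lambda gt: iou(pred, gt), default=None)
        if best is not None and iou(pred, best) >= iou_threshold:
            unmatched.remove(best)  # each true box can be matched only once
            tp += 1
    fp = len(predictions) - tp
    fn = len(unmatched)
    return tp, fp, fn


ground_truths = [(30, 40, 70, 80), (100, 100, 140, 140)]
predictions = [(35, 45, 75, 85), (200, 200, 240, 240)]

print(count_matches(predictions, ground_truths, iou_threshold=0.5))  # (1, 1, 1)
print(count_matches(predictions, ground_truths, iou_threshold=0.0))  # (2, 0, 0)
```

With the threshold effectively disabled, the second prediction is counted as correct even though it does not touch any object, which is exactly how the metrics become inflated.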
Your model has 98% accuracy but an average IoU of 0.4 and recall of 30%. Is it good?
Answer: No. High accuracy here is misleading because accuracy does not reflect bounding box quality. An average IoU of 0.4 is low, meaning the predicted boxes overlap the objects poorly. Recall of 30% means most objects are missed. The model needs improvement to detect objects correctly and completely.