In object tracking, we want to know how well the model follows each object over time. The key metrics are Multiple Object Tracking Accuracy (MOTA) and Multiple Object Tracking Precision (MOTP). MOTA summarizes how many mistakes the tracker makes (missed objects, false positives, and identity switches) relative to the number of ground-truth objects. MOTP measures how close the predicted positions are to the true positions for the objects that were matched. Together, these metrics describe both the correctness and the localization precision of tracking.
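As a rough illustration, MOTA and MOTP can be computed from per-frame error counts. The sketch below follows the usual CLEAR MOT definitions (MOTA = 1 - (misses + false positives + ID switches) / ground-truth objects; MOTP = mean localization error over matched pairs). The `FrameStats` container, its field names, and the numbers are assumptions made for this example and do not come from any particular library.

```python
from dataclasses import dataclass


@dataclass
class FrameStats:
    """Per-frame tracking counts (hypothetical container for this sketch)."""
    num_gt: int             # ground-truth objects visible in the frame
    misses: int             # ground-truth objects with no matched track (FN)
    false_positives: int    # tracks with no matching ground-truth object (FP)
    id_switches: int        # matched objects whose track ID changed
    match_distances: list   # localization error per matched pair, e.g. in pixels


def mota(frames):
    """MOTA = 1 - (sum of all errors) / (sum of ground-truth objects)."""
    total_gt = sum(f.num_gt for f in frames)
    if total_gt == 0:
        raise ValueError("MOTA is undefined without ground-truth objects")
    errors = sum(f.misses + f.false_positives + f.id_switches for f in frames)
    return 1.0 - errors / total_gt


def motp(frames):
    """MOTP as the mean distance over all matched pairs (lower is better)."""
    distances = [d for f in frames for d in f.match_distances]
    if not distances:
        raise ValueError("MOTP is undefined without matched pairs")
    return sum(distances) / len(distances)


# Made-up numbers for two frames with four ground-truth objects each.
example_frames = [
    FrameStats(num_gt=4, misses=1, false_positives=0, id_switches=0,
               match_distances=[2.0, 3.0, 1.0]),
    FrameStats(num_gt=4, misses=0, false_positives=1, id_switches=1,
               match_distances=[2.0, 2.0, 3.0, 1.0]),
]

print(f"MOTA: {mota(example_frames):.3f}")  # 1 - 3/8 = 0.625
print(f"MOTP: {motp(example_frames):.3f}")  # 14 / 7 = 2.000
```

Note that some benchmarks report MOTP as an average overlap rather than a distance, in which case higher is better; this sketch uses the distance form.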
Object tracking basics in Computer Vision - Model Metrics & Evaluation
Tracking Confusion Matrix (simplified):
| | Predicted Object Present | Predicted Object Absent |
|------|--------------------------|-------------------------|
| Object actually present | True Positive (TP) | False Negative (FN) |
| Object actually absent | False Positive (FP) | True Negative (TN) |
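TP: An object is present and the tracker tracks it correctly
FP: Tracker reports an object where none exists
FN: Tracker misses an object that is present
TN: No object is present and the tracker reports none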
Total objects = TP + FN
Total predictions = TP + FP
Precision (TP / Total predictions) tells us what fraction of tracked objects are actually correct. High precision means few false alarms.
Recall (TP / Total objects) tells us what fraction of true objects the tracker found. High recall means few missed objects.
Example: In a security camera, if the tracker has high precision but low recall, it rarely mistakes background for people but misses some people. If it has high recall but low precision, it finds almost all people but sometimes thinks shadows are people.
Balancing precision and recall depends on the use case. For safety, high recall is often more important to avoid missing objects.
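The precision and recall trade-off described above can be sketched directly from TP, FP, and FN counts. The two trackers and their counts below are hypothetical, chosen only to mirror the security camera example; returning 0.0 for an empty denominator is a convention picked for this sketch.

```python
def precision(tp, fp):
    """Fraction of tracker predictions that correspond to real objects."""
    predictions = tp + fp
    # Convention for this sketch: report 0.0 when the tracker predicted nothing.
    return tp / predictions if predictions else 0.0


def recall(tp, fn):
    """Fraction of real objects that the tracker found."""
    objects = tp + fn
    # Convention for this sketch: report 0.0 when there were no objects.
    return tp / objects if objects else 0.0


# Hypothetical counts for two trackers watching the same 100 people.
cautious = {"tp": 40, "fp": 2, "fn": 60}   # rarely wrong, but misses many people
eager = {"tp": 95, "fp": 55, "fn": 5}      # finds almost everyone, flags shadows too

for name, c in [("cautious", cautious), ("eager", eager)]:
    p = precision(c["tp"], c["fp"])
    r = recall(c["tp"], c["fn"])
    print(f"{name}: precision={p:.2f} recall={r:.2f}")
# cautious: precision=0.95 recall=0.40
# eager: precision=0.63 recall=0.95
```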
Good tracking:
- MOTA close to 1.0 (near 100%) means very few errors.
- Low MOTP error distance (small pixel difference between predicted and true positions) means precise localization.
- High precision and recall (both above 0.8) mean the tracker finds most objects and rarely makes mistakes.
Bad tracking:
- MOTA below 0.5 means many missed or wrong tracks (MOTA can even be negative when errors outnumber ground-truth objects).
- High MOTP error distance means poor localization.
- Low precision means many false alarms; low recall means many missed objects.
Common pitfalls:
- Ignoring ID switches: The tracker may confuse object identities, which hurts tracking quality but may not show in simple accuracy.
- Overfitting to training videos: Tracker works well on known scenes but fails in new environments.
- Data leakage: Using future frames or ground truth in training can inflate metrics unfairly.
- Accuracy paradox: High accuracy can be misleading if many frames have no objects (tracker predicts no object and is 'correct').
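To make the ID switch pitfall concrete, here is a minimal sketch that counts identity switches from per-frame matches. It assumes the matching between ground-truth objects and predicted tracks has already been done, and that each frame is given as a dictionary from ground-truth ID to predicted track ID. The function name `count_id_switches` and the data layout are illustrative assumptions, not a standard API.

```python
def count_id_switches(matched_ids):
    """Count how often a ground-truth object changes its predicted track ID.

    matched_ids: list of dicts, one per frame, mapping
    ground-truth ID -> predicted track ID for matched objects only.
    """
    last_track = {}   # most recent track ID seen for each ground-truth object
    switches = 0
    for frame in matched_ids:
        for gt_id, track_id in frame.items():
            previous = last_track.get(gt_id)
            if previous is not None and previous != track_id:
                switches += 1
            last_track[gt_id] = track_id
    return switches


# Hypothetical sequence: both people are detected in almost every frame,
# so detection-level counts look fine, yet the identities get swapped.
matched_ids = [
    {"person_a": 1, "person_b": 2},
    {"person_a": 1, "person_b": 2},
    {"person_a": 2, "person_b": 1},   # identities swapped: 2 switches
    {"person_a": 2},                  # person_b missed: a miss, not a switch
    {"person_a": 2, "person_b": 1},   # person_b re-found with its last ID
]

print(count_id_switches(matched_ids))  # 2
```

Because every matched object here still counts as a true positive, precision and recall alone would not reveal the swap; a metric that includes ID switches, such as MOTA, would.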
Your object tracker has 98% accuracy but only 12% recall on objects. Is it good for production? Why or why not?
Answer: No, it is not good. The high accuracy likely comes from many frames without objects, so predicting no object is often correct. But 12% recall means the tracker misses 88% of objects, which is poor for tracking. The tracker fails to find most objects, so it is unreliable.
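The arithmetic behind this answer can be checked with a small worked example. The confusion-matrix counts below are hypothetical, chosen only so that accuracy comes out at 98% and recall at 12%; the question itself does not specify them.

```python
# Hypothetical counts over 10,000 evaluated cases (assumed for illustration).
tp = 12      # objects present and found
fn = 88      # objects present but missed
fp = 112     # tracker reported an object where none existed
tn = 9788    # no object present and none reported

total = tp + fn + fp + tn              # 10,000
accuracy = (tp + tn) / total           # (12 + 9788) / 10000 = 0.98
recall = tp / (tp + fn)                # 12 / 100 = 0.12

# A "tracker" that always predicts "no object" is right on every
# object-free case and wrong on every case with an object.
baseline_accuracy = (tn + fp) / total  # 9900 / 10000 = 0.99
baseline_recall = 0 / (tp + fn)        # finds nothing: 0.0

print(f"tracker:  accuracy={accuracy:.2f} recall={recall:.2f}")
print(f"baseline: accuracy={baseline_accuracy:.2f} recall={baseline_recall:.2f}")
# tracker:  accuracy=0.98 recall=0.12
# baseline: accuracy=0.99 recall=0.00
```

With these assumed counts, a model that never reports any object would score even higher accuracy than the tracker, which is why accuracy alone says little when most cases contain no object.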