Object tracking basics in Computer Vision - Model Metrics & Evaluation

Metrics & Evaluation - Object tracking basics

Which metrics matter for object tracking and why

In object tracking, we want to know how well the model follows each object over time. The two standard metrics are Multiple Object Tracking Accuracy (MOTA) and Multiple Object Tracking Precision (MOTP). MOTA counts the tracker's mistakes (missed objects, false positives, and identity switches) relative to the number of ground-truth objects. MOTP measures how close the predicted positions of correctly matched objects are to their true positions. Together, these metrics describe both the correctness and the localization precision of tracking.
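As a rough sketch of how these two metrics can be computed, the snippet below assumes the commonly used form MOTA = 1 - (FN + FP + ID switches) / (ground-truth objects), with all counts summed over every frame, and treats MOTP as the mean distance of matched pairs. The function names and the example counts are hypothetical and chosen only for illustration.

```python
def mota(false_negatives, false_positives, id_switches, num_gt):
    """MOTA = 1 - (FN + FP + IDSW) / GT, with counts summed over all frames."""
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt


def motp(match_distances):
    """Mean localization error over all matched (ground truth, prediction) pairs."""
    if not match_distances:
        raise ValueError("need at least one matched pair")
    return sum(match_distances) / len(match_distances)


# Hypothetical totals for a short video (assumed numbers, not real results).
score = mota(false_negatives=30, false_positives=20, id_switches=5, num_gt=500)
error = motp([2.0, 3.5, 1.5, 4.0])  # pixel distances of four matched pairs

print(f"MOTA = {score:.2f}")
print(f"MOTP = {error:.2f} px")
```

Because the error count is divided by the number of ground-truth objects, MOTA drops below zero when the tracker makes more errors than there are objects to track.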

Confusion matrix or equivalent visualization

    Tracking Confusion Matrix (simplified):

                     | Predicted Object Present | Predicted Object Absent |
    -----------------|--------------------------|-------------------------|
    Object present   | True Positive (TP)       | False Negative (FN)     |
    No object        | False Positive (FP)      | True Negative (TN)      |

    TP: Correctly tracked objects
    FP: Tracker reports an object but none exists
    FN: Tracker misses an object
    TN: No object exists and none is reported

    Total objects = TP + FN
    Total predictions = TP + FP
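The two totals above are the denominators of recall and precision. A minimal sketch, assuming the counts have already been accumulated over all frames (the helper name and the example counts are hypothetical):

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / total predictions, recall = TP / total objects."""
    total_predictions = tp + fp
    total_objects = tp + fn
    # Returning 0.0 for an empty denominator is an assumed convention.
    precision = tp / total_predictions if total_predictions > 0 else 0.0
    recall = tp / total_objects if total_objects > 0 else 0.0
    return precision, recall


# Hypothetical counts: 80 correct tracks, 20 false alarms, 40 misses.
p, r = precision_recall(tp=80, fp=20, fn=40)
print(f"precision = {p:.2f}, recall = {r:.2f}")
```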

Precision vs Recall tradeoff with concrete examples

Precision tells us what fraction of the objects reported by the tracker are actually correct. High precision means few false alarms.

Recall tells us what fraction of the true objects the tracker found. High recall means few missed objects.

Example: In a security camera, if the tracker has high precision but low recall, it rarely mistakes background for people but misses some people. If it has high recall but low precision, it finds almost all people but sometimes thinks shadows are people.

How to balance precision and recall depends on the use case. In safety-critical applications, high recall is often more important, because a missed object usually costs more than a false alarm.
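One common way this tradeoff shows up in practice is the detection confidence threshold. The sketch below uses made-up detections and assumes each one has already been labeled as matching a real person or not. Raising the threshold removes false alarms but also drops real people.

```python
# Hypothetical detections: (confidence score, True if it matches a real person).
detections = [(0.95, True), (0.90, True), (0.80, True), (0.70, False),
              (0.60, True), (0.40, False), (0.30, True), (0.20, False)]
num_real_objects = 6  # assumed: one real person is never detected at any score


def precision_recall_at(threshold):
    """Keep detections scoring at or above the threshold, then score them."""
    kept = [is_real for score, is_real in detections if score >= threshold]
    tp = sum(kept)
    # Treating "no detections kept" as precision 1.0 is an assumed convention.
    precision = tp / len(kept) if kept else 1.0
    recall = tp / num_real_objects
    return precision, recall


for threshold in (0.75, 0.25):
    p, r = precision_recall_at(threshold)
    print(f"threshold={threshold:.2f}  precision={p:.2f}  recall={r:.2f}")
```

With these numbers the strict threshold gives perfect precision but finds only half of the people, while the loose threshold finds most people at the cost of two false alarms.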

What "good" vs "bad" metric values look like for object tracking

Good tracking:

MOTA close to 1.0 (near 100%) means very few errors.
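MOTP with a low error distance (a small pixel difference) means precise localization. Some benchmarks report MOTP as bounding-box overlap instead, in which case higher is better.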
High precision and recall (both above 0.8) means tracker finds most objects and rarely makes mistakes.

Bad tracking:

MOTA below 0.5 means many missed or wrong tracks. MOTA can even be negative when the tracker makes more errors than there are true objects.
MOTP high error distance means poor localization.
Low precision means many false alarms; low recall means many missed objects.

Metrics pitfalls in object tracking

Ignoring ID switches: The tracker may confuse object identities, which hurts tracking quality but does not show up in precision, recall, or simple accuracy.
Overfitting to training videos: Tracker works well on known scenes but fails in new environments.
Data leakage: Evaluating on videos that were used for training, or letting the tracker see future frames or ground truth at test time, inflates metrics unfairly.
Accuracy paradox: High accuracy can be misleading if many frames have no objects (tracker predicts no object and is 'correct').
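The ID switch pitfall can be made concrete with a small sketch. It assumes each frame has already been reduced to a mapping from ground-truth object to the track ID the tracker assigned to it. The function name and the frame data are hypothetical.

```python
def count_id_switches(frames):
    """Count how often a ground-truth object changes its assigned track ID.

    frames: list of dicts mapping ground-truth id -> matched track id.
    """
    last_track = {}
    switches = 0
    for matches in frames:
        for gt_id, track_id in matches.items():
            if gt_id in last_track and last_track[gt_id] != track_id:
                switches += 1
            last_track[gt_id] = track_id
    return switches


# Hypothetical sequence: both people are found in every frame.
frames = [
    {"person_a": 1, "person_b": 2},
    {"person_a": 1, "person_b": 2},
    {"person_a": 2, "person_b": 1},  # the tracker swaps the two identities
    {"person_a": 2, "person_b": 1},
]
print(count_id_switches(frames))
```

Every object is matched in every frame here, so per-frame precision and recall are both perfect, yet the tracker has swapped the two identities. Only a metric that counts ID switches reveals the problem.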

Self-check question

Your object tracker has 98% accuracy but only 12% recall on objects. Is it good for production? Why or why not?

Answer: No, it is not good. The high accuracy likely comes from many frames without objects, so predicting no object is often correct. But 12% recall means the tracker misses 88% of objects, which is poor for tracking. The tracker fails to find most objects, so it is unreliable.
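One set of counts that produces exactly these figures is sketched below. The numbers are invented for illustration and are not taken from a real tracker.

```python
# Hypothetical counts over 11,000 frame-level decisions (assumed numbers).
tp, fn, fp, tn = 30, 220, 0, 10750
total = tp + fn + fp + tn

accuracy = (tp + tn) / total
recall = tp / (tp + fn)

# A "tracker" that always predicts no object is correct on every empty case.
baseline_accuracy = (tn + fp) / total

print(f"accuracy = {accuracy:.2%}")
print(f"recall = {recall:.2%}")
print(f"always-absent baseline accuracy = {baseline_accuracy:.2%}")
```

The baseline that never reports an object scores almost the same accuracy, which shows that the 98% figure says very little about tracking ability.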

Key Result

MOTA and MOTP are key metrics showing tracking correctness and precision; balance precision and recall to avoid missing or falsely detecting objects.