
3D object detection in Computer Vision - Model Metrics & Evaluation

Which metric matters for 3D object detection and WHY

In 3D object detection, the goal is to localize objects accurately in 3D space. The key metrics are Intersection over Union (IoU) and Average Precision (AP). IoU measures how much a predicted 3D box overlaps with the ground-truth box; a higher IoU means better localization. A prediction counts as correct only when its IoU with a ground-truth box exceeds a chosen threshold. AP then summarizes how well the model finds objects without producing many false detections. We also track Recall, which tells us whether the model finds most of the real objects, and Precision, which tells us whether the detections it makes are correct. Together, these metrics capture both detection accuracy and localization quality.
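To make the IoU definition concrete, here is a minimal sketch for axis-aligned 3D boxes given as (x_min, y_min, z_min, x_max, y_max, z_max) tuples. This is a simplification: real benchmarks typically use rotated boxes, which need a more involved intersection computation, but the overlap-over-union idea is the same. The function name and box format here are illustrative, not from any particular library.

```python
def iou_3d(box_a, box_b):
    """IoU of two axis-aligned 3D boxes: (x_min, y_min, z_min, x_max, y_max, z_max)."""
    # Overlap length along each axis (zero if the boxes do not intersect on that axis)
    dx = max(0.0, min(box_a[3], box_b[3]) - max(box_a[0], box_b[0]))
    dy = max(0.0, min(box_a[4], box_b[4]) - max(box_a[1], box_b[1]))
    dz = max(0.0, min(box_a[5], box_b[5]) - max(box_a[2], box_b[2]))
    inter = dx * dy * dz

    vol_a = (box_a[3] - box_a[0]) * (box_a[4] - box_a[1]) * (box_a[5] - box_a[2])
    vol_b = (box_b[3] - box_b[0]) * (box_b[4] - box_b[1]) * (box_b[5] - box_b[2])
    union = vol_a + vol_b - inter
    return inter / union if union > 0 else 0.0
```

For example, two unit-volume boxes offset by one unit on every axis share a 1x1x1 corner, giving IoU = 1 / (8 + 8 - 1) = 1/15, which is why even modest misalignment in 3D hurts IoU much more than in 2D.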

Confusion matrix for 3D object detection

3D object detection is more complex than simple classification, but we can think of detections as:

      +----------------+----------------+
      |                | Predicted Box  |
      |                | Present | None |
      +----------------+---------+------+
      | True Box       | TP      | FN   |
      | Present        |         |      |
      +----------------+---------+------+
      | True Box       | FP      | TN   |
      | Absent         |         |      |
      +----------------+---------+------+
    

Here, TP means the model correctly found a 3D box matching a real object with enough overlap (IoU above threshold). FP means the model found a box where no object exists. FN means the model missed a real object. TN is less common in detection but means correctly not detecting where no object is.
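The TP/FP/FN counts above come from matching predicted boxes to ground-truth boxes. A common simple scheme is greedy matching: each prediction claims the best-overlapping unmatched ground-truth box, and the match counts as a TP only if the IoU clears the threshold. This is a minimal sketch (benchmark toolkits usually also sort predictions by confidence first); the function names are illustrative.

```python
def count_tp_fp_fn(pred_boxes, gt_boxes, iou_fn, iou_thresh=0.5):
    """Greedy matching: each ground-truth box can be matched to at most one prediction."""
    matched = set()  # indices of ground-truth boxes already claimed
    tp = 0
    for pred in pred_boxes:
        # Find the best-overlapping unmatched ground-truth box for this prediction
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(gt_boxes):
            if j in matched:
                continue
            iou = iou_fn(pred, gt)
            if iou > best_iou:
                best_iou, best_j = iou, j
        if best_iou >= iou_thresh:
            tp += 1                # enough overlap: true positive
            matched.add(best_j)
    fp = len(pred_boxes) - tp       # predictions with no matching object
    fn = len(gt_boxes) - tp         # real objects the model missed
    return tp, fp, fn
```

Note there is no TN count here: "correctly detecting nothing" covers all the empty space in the scene, which is why TN is rarely reported for detection.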

Precision vs Recall tradeoff with examples

Imagine a self-driving car detecting pedestrians in 3D space:

  • High Precision, Low Recall: The car only signals pedestrians when very sure. Few false alarms, but it might miss some pedestrians. This is safer for avoiding false stops but risky if it misses people.
  • High Recall, Low Precision: The car signals many possible pedestrians, catching almost all real ones but also many false alarms. This avoids missing anyone but may cause unnecessary stops.

We want a balance depending on the use case. For safety, high recall is often more important to avoid missing objects.
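The tradeoff above is usually controlled by the confidence threshold: raising it keeps only confident detections (higher precision, lower recall), lowering it keeps more detections (higher recall, lower precision). A minimal sketch, assuming each detection has already been matched to ground truth and reduced to a (confidence, is_true_positive) pair:

```python
def precision_recall_at_threshold(detections, num_gt, conf_thresh):
    """detections: list of (confidence, is_true_positive) pairs, pre-matched to ground truth.
    num_gt: total number of real objects in the evaluation set."""
    kept = [(c, is_tp) for c, is_tp in detections if c >= conf_thresh]
    tp = sum(1 for _, is_tp in kept if is_tp)
    fp = len(kept) - tp
    precision = tp / (tp + fp) if kept else 1.0  # no detections kept: no false alarms
    recall = tp / num_gt if num_gt else 0.0
    return precision, recall
```

Sweeping conf_thresh over a small example makes the tradeoff visible: a strict threshold might keep only one correct detection (precision 1.0, recall 0.25), while a loose one keeps everything, including false alarms (precision and recall both around 0.75).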

What "good" vs "bad" metric values look like for 3D object detection

Good 3D detection models typically have:

  • Average Precision (AP): Above 70% is good; below 50% is poor.
  • IoU Threshold: Usually 0.5 or 0.7; a higher threshold means stricter matching. (This is an evaluation setting, not a model score, so always report which threshold the AP was computed at.)
  • Recall: Above 80% means most objects are found; below 50% means many misses.
  • Precision: Above 80% means few false detections; below 50% means many false alarms.

Bad models might have low AP, low recall (missing objects), or low precision (many false boxes). Good models balance these well.
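AP ties these numbers together: it is the area under the precision-recall curve obtained by sweeping the confidence threshold. This is a simplified sketch without the interpolation smoothing that benchmarks such as COCO or KITTI apply, assuming detections have already been matched at a fixed IoU threshold and reduced to (confidence, is_true_positive) pairs:

```python
def average_precision(detections, num_gt):
    """AP as the area under the precision-recall curve.
    detections: (confidence, is_true_positive) pairs; num_gt: number of real objects."""
    dets = sorted(detections, key=lambda d: -d[0])  # highest confidence first
    tp_cum = fp_cum = 0
    ap, prev_recall = 0.0, 0.0
    for _, is_tp in dets:
        if is_tp:
            tp_cum += 1
        else:
            fp_cum += 1
        precision = tp_cum / (tp_cum + fp_cum)
        recall = tp_cum / num_gt
        ap += precision * (recall - prev_recall)  # rectangle under the curve
        prev_recall = recall
    return ap
```

A perfect detector (all detections correct, all objects found) scores AP = 1.0; every false alarm ranked above a correct detection drags the curve, and thus the AP, down.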

Common pitfalls in 3D object detection metrics
  • Ignoring IoU thresholds: Reporting AP without a clear IoU cutoff can mislead about localization quality.
  • Data leakage: Using test data in training inflates metrics falsely.
  • Overfitting: Very high training AP but low test AP means the model memorizes training data, not generalizing.
  • Class imbalance: Many background points but few objects can make accuracy look high but detection poor.
  • Confusing precision and recall: Precision is about false alarms, recall about missed objects. Mixing them leads to wrong conclusions.
Self-check question

Your 3D object detection model has 98% accuracy but only 12% recall on detecting pedestrians. Is it good for production? Why or why not?

Answer: No, it is not good. The high accuracy is misleading because most points are background (no pedestrians), so the model guesses "no pedestrian" most times correctly. But 12% recall means it misses 88% of pedestrians, which is dangerous for safety. High recall is critical to detect almost all pedestrians.

Key Result
Average Precision computed at a clearly stated IoU threshold, together with recall, is the key measure of 3D object detection quality.