For object detection models like those in torchvision, the key metric is Mean Average Precision (mAP). This metric measures how well the model finds and correctly labels objects in images. It balances both precision (how many detected objects are correct) and recall (how many real objects are found). Since detection involves locating objects and classifying them, mAP gives a clear picture of overall performance.
Other useful metrics include Precision and Recall at different Intersection over Union (IoU) thresholds, which show how strict the model is about matching predicted boxes to real boxes.