In computer vision, the right metric depends on the task. For image classification, accuracy is the fraction of images that were labeled correctly. For object detection, precision and recall matter because you want to find all objects (high recall) while avoiding false alarms (high precision). For segmentation, metrics like IoU (Intersection over Union) measure how closely predicted regions overlap the ground-truth regions. Choosing the right metric tells you whether your model actually solves the problem.
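To make IoU concrete, here is a minimal sketch in plain Python. The function name `mask_iou` and the two small masks are illustrative assumptions, not part of any library; real pipelines usually compute this on arrays.

```python
def mask_iou(pred, target):
    """IoU between two binary masks given as equally sized lists of 0/1 rows."""
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                intersection += 1  # pixel is in both masks
            if p or t:
                union += 1         # pixel is in at least one mask
    # Assumed convention: two empty masks count as a perfect match.
    return intersection / union if union else 1.0


# Hypothetical 3x4 masks: the prediction misses one column of the true region.
pred_mask = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
true_mask = [
    [0, 1, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 0, 0],
]

iou = mask_iou(pred_mask, true_mask)
print(f"IoU = {iou:.2f}")  # 4 shared pixels / 6 pixels in the union = 0.67
```

An IoU of 1.0 means the predicted region matches the ground truth exactly; 0.0 means no overlap at all.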
Model evaluation best practices in Computer Vision - Model Metrics & Evaluation
Confusion Matrix for Image Classification (3 classes):

                 Predicted
Actual      Cat    Dog    Bird
Cat          50      2       3
Dog           4     45       1
Bird          2      3      40

Total samples = 150
From this:
- Accuracy = (50+45+40)/150 = 135/150 = 0.9 (90%)
- Precision for Cat = TP/(TP+FP) = 50/(50+4+2) = 50/56 ≈ 0.89
- Recall for Cat = TP/(TP+FN) = 50/(50+2+3) = 50/55 ≈ 0.91
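The same arithmetic can be sketched in plain Python. The helper names `precision` and `recall` are illustrative; the matrix values are the ones from the table, with rows as actual classes and columns as predicted classes.

```python
labels = ["Cat", "Dog", "Bird"]
# Rows are actual classes, columns are predicted classes.
matrix = [
    [50, 2, 3],
    [4, 45, 1],
    [2, 3, 40],
]

total = sum(sum(row) for row in matrix)
# Correct predictions sit on the diagonal.
accuracy = sum(matrix[i][i] for i in range(len(labels))) / total


def precision(k):
    predicted_k = sum(row[k] for row in matrix)  # column sum = TP + FP
    return matrix[k][k] / predicted_k


def recall(k):
    actual_k = sum(matrix[k])  # row sum = TP + FN
    return matrix[k][k] / actual_k


print(f"accuracy = {accuracy:.2f}")
for k, name in enumerate(labels):
    print(f"{name}: precision = {precision(k):.2f}, recall = {recall(k):.2f}")
```

For Cat this reproduces the values above: the column sum 50+4+2 gives the precision denominator, and the row sum 50+2+3 gives the recall denominator.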
Imagine a face recognition system unlocking your phone. You want high precision so it doesn't unlock for strangers (few false positives). But if it misses your face sometimes, that is okay (lower recall). On the other hand, a security camera detecting intruders needs high recall to catch all threats, even if it sometimes raises false alarms (lower precision). Understanding this tradeoff helps pick the right balance for your use case.
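One common way this tradeoff shows up is through the decision threshold applied to a model's confidence scores. The sketch below uses made-up scores and labels (all values are hypothetical) to show how raising the threshold favors precision and lowering it favors recall.

```python
# Hypothetical detector scores; label 1 = intruder present, 0 = no intruder.
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20]
labels = [1, 1, 0, 1, 0, 0, 1, 0]


def precision_recall(threshold):
    """Precision and recall when scores >= threshold are flagged as positive."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    # Assumed convention: precision is 1.0 when nothing is flagged.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


for threshold in (0.85, 0.50, 0.25):
    p, r = precision_recall(threshold)
    print(f"threshold={threshold:.2f}  precision={p:.2f}  recall={r:.2f}")
```

With this toy data, the strict threshold (0.85) gives perfect precision but finds only half of the positives, which suits the phone-unlock case; the loose threshold (0.25) finds every positive at the cost of more false alarms, which suits the security camera.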
What counts as a good metric depends on the task. For example, in medical image diagnosis, recall above 0.95 is a reasonable target because missing a disease is dangerous. In a photo tagging app, accuracy above 0.85 may be enough for user satisfaction. Low precision or recall is a warning sign: precision below 0.5 means more than half of the detections are wrong, and recall below 0.5 means more than half of the real objects are missed. Always compare metrics to your problem's needs.
- Accuracy paradox: High accuracy can be misleading if classes are imbalanced. For example, if 95% of images are background, a model that always predicts "background" reaches 95% accuracy while never detecting a rare object.
- Data leakage: Using test images in training inflates metrics falsely.
- Overfitting: Very high training accuracy with low test accuracy means the model has memorized the training images and does not generalize well.
- Ignoring metric context: Using only accuracy when recall matters can hide poor performance.
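The accuracy paradox is easy to reproduce. The sketch below assumes a hypothetical dataset of 100 images, 95 of them background, and a "model" that always predicts background.

```python
# Hypothetical dataset: 95 background images (0) and 5 images with an object (1).
true_labels = [0] * 95 + [1] * 5
# A trivial "model" that always predicts background.
pred_labels = [0] * len(true_labels)

correct = sum(1 for y, p in zip(true_labels, pred_labels) if y == p)
accuracy = correct / len(true_labels)

true_positives = sum(1 for y, p in zip(true_labels, pred_labels) if y == 1 and p == 1)
actual_positives = sum(true_labels)
recall = true_positives / actual_positives

print(f"accuracy = {accuracy:.2f}")  # 0.95
print(f"recall   = {recall:.2f}")    # 0.00
```

Reporting recall next to accuracy exposes the problem immediately: the model is right 95% of the time and still finds none of the objects.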
Your object detection model has 98% accuracy but only 12% recall on detecting rare objects. Is it good for production? Why or why not?
Answer: No, it is not good. The low recall means the model misses most rare objects, which could be critical. High accuracy is misleading here because most images do not have the rare object, so the model is mostly correct by saying "no object". Improving recall is essential.
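To see how both numbers can hold at once, here is a small worked example. The counts are hypothetical, chosen only so that they reproduce 98% accuracy and 12% recall.

```python
# Hypothetical counts chosen to reproduce the numbers in the question.
true_negatives = 1075   # images without the rare object, correctly rejected
false_positives = 0     # assumed: the model never raises a false alarm
true_positives = 3      # rare objects found
false_negatives = 22    # rare objects missed

total = true_negatives + false_positives + true_positives + false_negatives
accuracy = (true_positives + true_negatives) / total
recall = true_positives / (true_positives + false_negatives)

print(f"accuracy = {accuracy:.2f}")  # 0.98
print(f"recall   = {recall:.2f}")    # 0.12
```

Under these assumed counts, the model misses 22 of 25 rare objects, yet the 1075 easy negatives keep accuracy at 98%.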
Practice
Solution
Step 1: Understand the purpose of a test set
The test set is data the model has never seen before, used to check real-world performance.
Step 2: Compare test set role with other options
Options B, C, and D do not relate to evaluation but to training or model design.
Final Answer:
To check how well the model performs on new, unseen data -> Option A
Quick Check:
Test set = unseen data check [OK]
- Confusing test set with training set
- Thinking test set speeds up training
- Believing test set changes model size
Solution
Step 1: Recall the correct function name in scikit-learn
The function to split data is called train_test_split, with parameters like test_size and random_state.
Step 2: Check the options for correct syntax
Only train_test_split(data, test_size=0.2, random_state=42) uses the correct function name and parameters; others are invalid or do not exist.
Final Answer:
train_test_split(data, test_size=0.2, random_state=42) -> Option B
Quick Check:
Correct function = train_test_split [OK]
- Using wrong function names
- Missing required parameters
- Confusing order of train and test
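As a short usage sketch of scikit-learn's `train_test_split`: the file names and labels below are hypothetical stand-ins for a real image dataset, and `stratify=labels` is an optional extra that keeps class proportions similar in both splits.

```python
from sklearn.model_selection import train_test_split

# Hypothetical stand-ins for image file names and their class labels.
images = [f"img_{i}.jpg" for i in range(10)]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

# For each input array the function returns the train part first, then the test part.
train_images, test_images, train_labels, test_labels = train_test_split(
    images, labels, test_size=0.2, random_state=42, stratify=labels
)

print(len(train_images), len(test_images))  # 8 2
```

Fixing `random_state` makes the split reproducible, which matters when you compare models on the same held-out images.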
from sklearn.metrics import accuracy_score
true_labels = [1, 0, 1, 1, 0]
pred_labels = [1, 0, 0, 1, 0]
accuracy = accuracy_score(true_labels, pred_labels)
print("{:.2f}".format(round(accuracy, 2)))
Solution
Step 1: Compare true and predicted labels
True: [1, 0, 1, 1, 0], Predicted: [1, 0, 0, 1, 0]. Matches at positions 0, 1, 3, 4 (4 correct out of 5).
Step 2: Calculate accuracy
Accuracy = correct predictions / total = 4/5 = 0.8. Rounded to 2 decimals is 0.80.
Final Answer:
0.80 -> Option C
Quick Check:
Accuracy = 4/5 = 0.80 [OK]
- Counting wrong matches
- Not rounding accuracy
- Confusing accuracy with precision
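The hand calculation can be cross-checked without any library. This is a minimal sketch; the variable names `matches` and `manual_accuracy` are illustrative.

```python
true_labels = [1, 0, 1, 1, 0]
pred_labels = [1, 0, 0, 1, 0]

# One True/False entry per position: did the prediction match the true label?
matches = [t == p for t, p in zip(true_labels, pred_labels)]
manual_accuracy = sum(matches) / len(matches)

print(matches)                   # [True, True, False, True, True]
print(f"{manual_accuracy:.2f}")  # 0.80
```

Only position 2 differs, so 4 of 5 predictions are correct.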
Solution
Step 1: Understand unusual accuracy pattern
Test accuracy higher than training is unusual and often means test data was seen during training.
Step 2: Identify cause from options
Data leakage means test data was accidentally used in training, causing inflated test accuracy.
Final Answer:
Data leakage between training and test sets -> Option A
Quick Check:
High test accuracy > training = data leakage [OK]
- Assuming underfitting causes higher test accuracy
- Ignoring data leakage possibility
- Blaming test set size without evidence
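One simple way to check for this kind of leakage is to look for test images whose contents also appear in the training set. The sketch below hashes raw bytes; `content_hash`, `find_leaks`, and the sample data are hypothetical, and in practice the bytes would be read from image files.

```python
import hashlib


def content_hash(image_bytes):
    """Fingerprint of the raw file contents."""
    return hashlib.sha256(image_bytes).hexdigest()


def find_leaks(train_items, test_items):
    """Return names of test items whose bytes also appear in the training set."""
    train_hashes = {content_hash(data) for _, data in train_items}
    return [name for name, data in test_items if content_hash(data) in train_hashes]


# Hypothetical (name, raw bytes) pairs standing in for image files.
train_items = [("train/a.jpg", b"\x01\x02\x03"), ("train/b.jpg", b"\x04\x05\x06")]
test_items = [("test/x.jpg", b"\x07\x08\x09"), ("test/y.jpg", b"\x04\x05\x06")]

leaks = find_leaks(train_items, test_items)
print(leaks)  # ['test/y.jpg']
```

This only catches byte-identical duplicates. Resized, cropped, or re-encoded copies of the same image would need a different comparison, which this sketch does not attempt.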
Solution
Step 1: Understand the problem of rare object detection
Rare objects mean data is imbalanced; many negatives, few positives.
Step 2: Choose metric suitable for imbalanced data
F1 score balances precision (correct positive predictions) and recall (finding all positives), ideal for rare classes.
Final Answer:
F1 score, because it balances precision and recall for imbalanced data -> Option D
Quick Check:
Rare class? Use F1 score [OK]
- Using accuracy which hides imbalance
- Confusing regression metrics with classification
- Misunderstanding confusion matrix purpose
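F1 is the harmonic mean of precision and recall, so it stays low whenever either one is low. The sketch below computes it from raw counts; the function name `f1_from_counts` and the counts themselves are hypothetical.

```python
def f1_from_counts(tp, fp, fn):
    """F1 score from true positives, false positives, and false negatives."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Hypothetical rare class: 25 real objects, 3 found, no false alarms.
f1 = f1_from_counts(tp=3, fp=0, fn=22)
print(f"F1 = {f1:.2f}")  # precision 1.00, recall 0.12 -> F1 = 0.21
```

Even with perfect precision, the low recall pulls F1 down to about 0.21, which is why it is harder to fool than accuracy on rare classes.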
