In computer vision, the choice of metric depends on the task. For image classification, accuracy is common because it shows how often the model guesses right. For object detection, precision and recall matter more because we want to find all objects (high recall) but avoid false alarms (high precision). For segmentation, metrics like Intersection over Union (IoU) measure how well the predicted area matches the real object. Choosing the right metric helps us know if the model is truly good at its job.
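To make the IoU idea concrete, here is a minimal sketch in plain Python. The function name mask_iou and the two tiny 4x4 masks are made up for illustration, and returning 0.0 when both masks are empty is an assumed convention, not a standard.

```python
def mask_iou(pred_mask, true_mask):
    """Intersection over Union of two binary masks given as lists of rows."""
    intersection = 0
    union = 0
    for pred_row, true_row in zip(pred_mask, true_mask):
        for p, t in zip(pred_row, true_row):
            if p and t:
                intersection += 1  # pixel is in both masks
            if p or t:
                union += 1         # pixel is in at least one mask
    # Assumed convention: two empty masks give IoU 0.0.
    return intersection / union if union else 0.0


# Hypothetical ground-truth object: a 2x2 square.
true_mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
# Hypothetical prediction: covers the object plus one extra column.
pred_mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 0, 0],
]

iou = mask_iou(pred_mask, true_mask)
print(f"IoU = {iou:.3f}")  # 4 shared pixels / 6 pixels in the union
```

The prediction covers the whole object but spills over by two pixels, so the IoU is 4/6 rather than 1.0.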
What computer vision encompasses - Model Metrics & Evaluation
Which metric matters for this concept and WHY
Confusion matrix or equivalent visualization (ASCII)
For image classification (e.g., cat vs dog):
                 Predicted
                 Cat    Dog
Actual   Cat      50      5
         Dog       3     42
TP (Cat) = 50, FP (Cat) = 3, FN (Cat) = 5, TN (Cat) = 42
This matrix helps calculate precision and recall for each class.
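As a worked example, the counts from the cat vs dog matrix can be plugged into the usual formulas. This is a small sketch with Cat as the positive class; the variable names are illustrative.

```python
# Counts taken from the cat vs dog matrix, with Cat as the positive class.
tp, fn = 50, 5   # actual cats: predicted Cat / predicted Dog
fp, tn = 3, 42   # actual dogs: predicted Cat / predicted Dog

precision = tp / (tp + fp)                   # of everything predicted Cat, how much was Cat
recall = tp / (tp + fn)                      # of all actual cats, how many were found
accuracy = (tp + tn) / (tp + fp + fn + tn)   # share of all predictions that were right

print(f"precision = {precision:.3f}")  # 50 / 53
print(f"recall    = {recall:.3f}")     # 50 / 55
print(f"accuracy  = {accuracy:.3f}")   # 92 / 100
```

Repeating the calculation with Dog as the positive class swaps the roles of the counts (TP = 42, FP = 5, FN = 3).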
Precision vs Recall tradeoff with concrete examples
Imagine a security camera detecting people entering a store:
- High precision: The camera rarely mistakes objects for people. Few false alarms. Good if you want to avoid bothering staff with false alerts.
- High recall: The camera catches almost every person, even if some false alarms happen. Good if missing a person is costly, like for safety monitoring.
Balancing precision and recall depends on what matters more: avoiding false alarms or missing real detections.
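One common way to move along this tradeoff is to change the confidence threshold at which a detection counts as an alert. The sketch below uses ten made-up (score, label) pairs for the security camera; the data, the function name precision_recall, and the choice to report precision as 1.0 when no alerts are raised are all assumptions for illustration.

```python
# Hypothetical camera outputs: (confidence score, ground truth), 1 = person, 0 = not a person.
detections = [
    (0.95, 1), (0.90, 1), (0.85, 0), (0.80, 1), (0.70, 1),
    (0.60, 0), (0.55, 1), (0.40, 0), (0.30, 1), (0.20, 0),
]


def precision_recall(detections, threshold):
    """Precision and recall when every score >= threshold raises an alert."""
    tp = sum(1 for score, label in detections if score >= threshold and label == 1)
    fp = sum(1 for score, label in detections if score >= threshold and label == 0)
    fn = sum(1 for score, label in detections if score < threshold and label == 1)
    # Assumed convention: with no alerts at all, precision is reported as 1.0.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


for threshold in (0.9, 0.5, 0.1):
    p, r = precision_recall(detections, threshold)
    print(f"threshold={threshold:.1f}  precision={p:.3f}  recall={r:.3f}")
```

With this toy data, a strict threshold of 0.9 gives no false alarms but finds only 2 of the 6 people, while a loose threshold of 0.1 finds all 6 at the cost of 4 false alarms.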
What "good" vs "bad" metric values look like for this use case
For a face recognition system:
- Good: Accuracy above 95%, precision and recall above 90%. The system correctly identifies faces with few mistakes.
- Bad: Accuracy around 60%, precision or recall below 50%. The system often misses faces or wrongly identifies people.
Good metrics mean the system is reliable and useful in real life.
Metrics pitfalls (accuracy paradox, data leakage, overfitting indicators)
- Accuracy paradox: On imbalanced data (e.g., 99% background, 1% object), a model that always predicts background gets 99% accuracy but never finds an object, so it is useless.
- Data leakage: If test images are too similar to training images (for example, near-duplicates or frames from the same video), the metrics look better than they should and the model won't work as well on new data.
- Overfitting: Very high training accuracy but low test accuracy means the model has memorized the training images instead of learning general patterns.
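The accuracy paradox is easy to reproduce. The sketch below builds 1,000 hypothetical labels with the 99% / 1% split mentioned above and scores a "model" that always predicts background; all names and numbers are illustrative.

```python
# 1,000 hypothetical samples: 990 background (0) and 10 containing the object (1).
labels = [0] * 990 + [1] * 10
predictions = [0] * len(labels)  # a "model" that always says background

correct = sum(1 for y, y_hat in zip(labels, predictions) if y == y_hat)
true_positives = sum(1 for y, y_hat in zip(labels, predictions) if y == 1 and y_hat == 1)

accuracy = correct / len(labels)
recall = true_positives / sum(labels)  # sum(labels) = number of actual objects

print(f"accuracy = {accuracy:.2f}")  # high, because background dominates
print(f"recall   = {recall:.2f}")    # zero, because no object is ever found
```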
Self-check: Your model has 98% accuracy but 12% recall on detecting rare objects. Is it good?
No, it is not good. The high accuracy likely comes from many images without the rare object. The very low recall means the model misses most of the rare objects, which defeats the purpose of detection. You need to improve recall to catch more rare objects.
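To see how both numbers can be true at once, here is one hypothetical set of counts that produces exactly 98% accuracy and 12% recall. The counts are invented for illustration; many other combinations give the same figures.

```python
# Hypothetical counts for 10,000 images, 200 of which contain the rare object.
tp, fn = 24, 176     # rare objects found / rare objects missed
fp, tn = 24, 9776    # false alarms / correctly ignored images

total = tp + fn + fp + tn
accuracy = (tp + tn) / total
recall = tp / (tp + fn)

# Baseline: a model that never predicts the rare object at all.
always_negative_accuracy = (fp + tn) / total

print(f"accuracy = {accuracy:.2f}")
print(f"recall   = {recall:.2f}")
print(f"always-negative baseline accuracy = {always_negative_accuracy:.2f}")
```

In this scenario the model's accuracy is no better than a baseline that never detects anything, which is why recall is the number to watch here.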
Key Result
In computer vision, choosing metrics like accuracy, precision, recall, or IoU depends on the task to properly evaluate model performance.
Practice
1. What is the main goal of computer vision?
easy
Solution
Step 1: Understand the purpose of computer vision
Computer vision is about making computers see and understand visual data like images and videos.
Step 2: Compare options with this purpose
Only "To help computers understand images and videos" matches this goal; the others are unrelated to computer vision.
Final Answer:
To help computers understand images and videos -> Option A
Quick Check:
Computer vision = understanding images/videos [OK]
Hint: Remember: computer vision means 'computer sees' [OK]
Common Mistakes:
- Confusing computer vision with programming speed
- Thinking it's about internet or games
2. Which of these is a common task in computer vision?
easy
Solution
Step 1: Identify tasks related to computer vision
Computer vision tasks include recognizing objects, faces, and reading text from images or videos.
Step 2: Match options to these tasks
Only "Recognizing objects in images" describes a computer vision task.
Final Answer:
Recognizing objects in images -> Option D
Quick Check:
Object recognition = computer vision task [OK]
Hint: Think about what computers 'see' in pictures [OK]
Common Mistakes:
- Choosing unrelated tasks like compiling or emailing
- Confusing computer vision with other computer tasks
3. Given this code snippet, what will it print?
import cv2
image = cv2.imread('cat.jpg')
print(type(image))
medium
Solution
Step 1: Understand cv2.imread output
cv2.imread reads an image file and returns a NumPy array representing the image pixels. If the file cannot be read, it returns None instead of raising an error.
Step 2: Check the type printed
Printing type(image) will show <class 'numpy.ndarray'> if the image loads correctly.
Final Answer:
<class 'numpy.ndarray'> -> Option A
Quick Check:
cv2.imread returns numpy array [OK]
Hint: cv2.imread returns image as numpy array [OK]
Common Mistakes:
- Thinking it returns NoneType if file exists
- Confusing with string type
- Assuming cv2 is missing
4. This code tries to detect faces. What is wrong?
import cv2
face_cascade = cv2.CascadeClassifier('haarcascade_frontalface.xml')
image = cv2.imread('people.jpg')
faces = face_cascade.detectMultiScale(image)
print(len(faces))
medium
Solution
Step 1: Check input type for detectMultiScale
Haar cascade detection works on pixel intensities, so detectMultiScale is conventionally given a grayscale image, but the code passes the color image returned by cv2.imread.
Step 2: Identify the fix
Convert the image to grayscale with cv2.cvtColor(image, cv2.COLOR_BGR2GRAY) before detection.
Final Answer:
detectMultiScale needs a grayscale image -> Option C
Quick Check:
Face detection needs grayscale input [OK]
Hint: Convert to grayscale before running a Haar cascade [OK]
Common Mistakes:
- Wrong cascade filename
- Using wrong cv2 function name
- Incorrect print syntax
5. You want to build a system that reads text from photos of street signs. Which computer vision task should you use?
hard
Solution
Step 1: Understand the task requirement
Reading text from images means extracting characters and words from pictures.
Step 2: Match task to computer vision methods
OCR is the process of recognizing text in images, which makes it the right fit for reading street signs.
Final Answer:
Optical character recognition (OCR) -> Option B
Quick Check:
Text reading = OCR task [OK]
Hint: Text in images? Use OCR technology [OK]
Common Mistakes:
- Choosing object detection for text
- Confusing classification with text reading
- Using segmentation which separates regions
