In computer vision, common tasks include recognizing objects, detecting faces, or segmenting images. The key metrics to evaluate these tasks are accuracy, precision, recall, and F1 score. These metrics tell us how well the machine "sees" and understands images. For example, precision shows how many detected objects are actually correct, while recall shows how many real objects the machine found. We use these metrics because they help us measure if the machine is making good decisions when interpreting images.
Why computer vision teaches machines to see - Why Metrics Matter
Confusion Matrix Example for Object Detection:

              Predicted Yes   Predicted No
Actual Yes    TP = 80         FN = 20
Actual No     FP = 10         TN = 90
Total samples = 80 + 20 + 10 + 90 = 200
Precision = TP / (TP + FP) = 80 / (80 + 10) ≈ 0.89
Recall = TP / (TP + FN) = 80 / (80 + 20) = 0.80
F1 Score = 2 * (0.89 * 0.80) / (0.89 + 0.80) ≈ 0.84
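The arithmetic above can be reproduced with a few lines of plain Python. This is a minimal sketch: the function name classification_metrics is made up for illustration, and the zero-division guards are an added assumption rather than part of the original example.

```python
def classification_metrics(tp, fn, fp, tn):
    """Return accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Counts taken from the confusion matrix above.
metrics = classification_metrics(tp=80, fn=20, fp=10, tn=90)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

With these counts the script prints an accuracy of 0.85 alongside the precision, recall, and F1 values worked out by hand.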
Imagine a self-driving car that uses computer vision to detect pedestrians. Here, high recall is very important because missing a pedestrian (false negative) can cause accidents. So, the system should find almost all pedestrians, even if it sometimes mistakes other objects for people (lower precision).
On the other hand, a photo app that tags friends in pictures needs high precision. It should avoid tagging the wrong person (false positive) to keep users happy, even if it misses some friends (lower recall).
Balancing precision and recall depends on the goal. Computer vision models must be tuned to fit the real-life needs of their task.
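One common way to tune this balance is to move the confidence threshold at which a detection is accepted. The sketch below assumes a detector that outputs a score per candidate. The scores, labels, and the helper name precision_recall_at are invented for illustration and do not come from a real model.

```python
def precision_recall_at(scores, labels, threshold):
    """Return (precision, recall) when scores >= threshold count as detections."""
    predicted = [score >= threshold for score in scores]
    tp = sum(1 for p, y in zip(predicted, labels) if p and y == 1)
    fp = sum(1 for p, y in zip(predicted, labels) if p and y == 0)
    fn = sum(1 for p, y in zip(predicted, labels) if not p and y == 1)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical detector scores and ground-truth labels (1 = real object).
scores = [0.95, 0.90, 0.85, 0.70, 0.60, 0.40, 0.30, 0.20]
labels = [1, 1, 1, 0, 1, 0, 1, 0]

for threshold in (0.8, 0.5, 0.25):
    precision, recall = precision_recall_at(scores, labels, threshold)
    print(f"threshold={threshold:.2f} precision={precision:.2f} recall={recall:.2f}")
```

On this toy data, lowering the threshold from 0.8 to 0.25 raises recall from 0.60 to 1.00 while precision drops from 1.00 to about 0.71. A pedestrian detector would lean toward the low threshold and a photo-tagging app toward the high one.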
Good metrics: As a rough rule of thumb, precision and recall above 0.85 usually mean the model sees well, although the level that counts as good enough depends on the task. For example, precision = 0.90 and recall = 0.88 means the model finds most objects and is mostly correct.
Bad metrics: Precision or recall below 0.50 means the model struggles. For example, precision = 0.40 means 60% of detections are false alarms, and recall = 0.45 means 55% of real objects are missed.
Accuracy alone can be misleading if the dataset is unbalanced (e.g., many images without objects). So, precision and recall give a clearer picture.
- Accuracy paradox: If most images have no objects, a model that always says "no object" can have high accuracy but is useless.
- Data leakage: If test images are too similar to training images, metrics look great but the model fails on new images.
- Overfitting: Very high training accuracy but low test accuracy means the model memorizes images instead of learning to see.
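The accuracy paradox from the list above is easy to reproduce. The sketch below assumes a made-up dataset of 100 images in which only 5 contain an object, and a "model" that always predicts "no object".

```python
# Hypothetical unbalanced dataset: 5 images with an object, 95 without.
labels = [1] * 5 + [0] * 95

# A useless model that always answers "no object".
predictions = [0] * len(labels)

correct = sum(1 for p, y in zip(predictions, labels) if p == y)
accuracy = correct / len(labels)

true_positives = sum(1 for p, y in zip(predictions, labels) if p == 1 and y == 1)
recall = true_positives / sum(labels)

print(f"accuracy={accuracy:.2f} recall={recall:.2f}")
```

Accuracy comes out at 0.95 even though recall is 0.00, which is why accuracy should be read together with precision and recall on unbalanced data.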
No, it is not good. The model finds only 12% of actual stop signs, which is very low recall. Even though accuracy is high, the model misses most stop signs, which is dangerous for real driving. High recall is critical here to avoid accidents.
Practice
Solution
Step 1: Understand the purpose of computer vision
Computer vision is about teaching machines to see and understand visual data like images and videos.
Step 2: Identify the correct goal
The goal is not about speed, storage, or battery but about interpreting visual information.
Final Answer:
To help machines understand and interpret images and videos -> Option B
Quick Check:
Computer vision = understanding images/videos [OK]
- Confusing computer vision with hardware improvements
- Thinking it only stores data
- Mixing vision with battery or speed
Solution
Step 1: Recall how images are stored digitally
Images are stored as grids of pixels, each with color or brightness values, forming a matrix.
Step 2: Match the correct representation
Only a matrix of pixel values correctly represents image data for machines.
Final Answer:
A matrix of pixel values -> Option C
Quick Check:
Image data = pixel matrix [OK]
- Choosing text descriptions instead of pixel data
- Thinking images are single numbers
- Confusing images with sounds
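To make the "pixel matrix" idea concrete, here is a small sketch using NumPy arrays, which is also the form in which OpenCV returns images. The tiny 2x3 images and their pixel values are invented for illustration.

```python
import numpy as np

# A 2x3 grayscale image: one brightness value (0-255) per pixel.
gray = np.array([[0, 64, 128],
                 [64, 128, 255]], dtype=np.uint8)

# A 2x3 color image: three channel values per pixel.
color = np.zeros((2, 3, 3), dtype=np.uint8)
color[0, 0] = (255, 0, 0)  # pure blue if the channels are in OpenCV's BGR order

print(gray.shape)    # (rows, columns)
print(color.shape)   # (rows, columns, channels)
print(gray[1, 2])    # brightness of the pixel in row 1, column 2
```

Indexing is row first, then column, so gray[1, 2] reads the bottom-right pixel of this small image.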
What is the shape of edges if the input image shape is (100, 100)?
import cv2
image = cv2.imread('photo.jpg', 0)
edges = cv2.Canny(image, 100, 200)
print(edges.shape)
Solution
Step 1: Understand Canny edge detection output size
Canny edge detection returns an image of the same size as the input image.
Step 2: Check input image shape
The input image shape is (100, 100), so the output edges will also have shape (100, 100).
Final Answer:
(100, 100) -> Option D
Quick Check:
Canny output shape = input shape [OK]
- Assuming edges shrink image size
- Thinking edges enlarge image
- Confusing shape with number of edges
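An edge map keeps the input shape because it stores one edge decision per pixel. The sketch below illustrates this with a much simpler stand-in for Canny: it only thresholds the gradient magnitude, so it runs with NumPy alone and needs no image file. The function name gradient_edge_map, the synthetic image, and the threshold are assumptions made for illustration. It is not how cv2.Canny is implemented.

```python
import numpy as np


def gradient_edge_map(image, threshold):
    """Simplified edge map: threshold the gradient magnitude (not Canny)."""
    # np.gradient returns one array per axis, each with the input's shape.
    grad_rows, grad_cols = np.gradient(image.astype(np.float64))
    magnitude = np.hypot(grad_cols, grad_rows)
    # 255 marks an edge pixel, 0 marks a non-edge pixel.
    return (magnitude > threshold).astype(np.uint8) * 255


# Synthetic 100x100 image: dark left half, bright right half.
image = np.zeros((100, 100), dtype=np.uint8)
image[:, 50:] = 200

edges = gradient_edge_map(image, threshold=50)
print(image.shape, edges.shape)
```

Both shapes print as (100, 100). Only the pixels next to the brightness step are marked as edges, and the array size does not change.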
import cv2
image = cv2.imread('photo.jpg')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
cv2.imshow('Gray Image', gray)
cv2.waitKey(0)
cv2.destroyAllWindows()
Solution
Step 1: Check image reading method
cv2.imread reads the image in color by default, which is fine for conversion.
Step 2: Verify color conversion usage
cv2.cvtColor with cv2.COLOR_BGR2GRAY correctly converts color image to grayscale.
Step 3: Confirm display functions
cv2.imshow, cv2.waitKey, and cv2.destroyAllWindows are used properly to show the image.
Final Answer:
No error, code works correctly -> Option A
Quick Check:
Correct grayscale conversion code [OK]
- Thinking cv2.imread needs grayscale flag always
- Misusing cv2.cvtColor parameters
- Forgetting to call cv2.waitKey
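To show what a BGR-to-grayscale conversion computes, the sketch below applies the standard luminance weights (0.299 for red, 0.587 for green, 0.114 for blue) with NumPy. The helper name bgr_to_gray and the four test pixels are invented for illustration. OpenCV uses its own fixed-point arithmetic, so its results may differ by one gray level for some inputs.

```python
import numpy as np


def bgr_to_gray(image_bgr):
    """Weighted sum of the channels; expects channel order B, G, R."""
    blue = image_bgr[..., 0].astype(np.float64)
    green = image_bgr[..., 1].astype(np.float64)
    red = image_bgr[..., 2].astype(np.float64)
    gray = 0.114 * blue + 0.587 * green + 0.299 * red
    return np.rint(gray).astype(np.uint8)


# One row of four pixels in BGR order: blue, green, red, white.
pixels = np.array([[[255, 0, 0], [0, 255, 0], [0, 0, 255], [255, 255, 255]]],
                  dtype=np.uint8)

gray = bgr_to_gray(pixels)
print(gray.shape)
print(gray.tolist())
```

The channel axis disappears, so a (1, 4, 3) color image becomes a (1, 4) grayscale image. Green contributes most to the brightness and blue least.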
Solution
Step 1: Identify useful preprocessing steps for digit recognition
Converting to grayscale simplifies data, normalizing scales pixel values, and edge detection highlights important features.
Step 2: Evaluate other options
Color conversion and noise addition can confuse the model; resizing too large or converting to text is not helpful; raw images may have noise and irrelevant info.
Final Answer:
Convert images to grayscale, normalize pixel values, and detect edges -> Option A
Quick Check:
Preprocessing = grayscale + normalize + edges [OK]
- Using color images unnecessarily
- Adding noise that confuses model
- Skipping normalization
- Ignoring edge detection benefits
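Of the preprocessing steps named in the solution, normalization is the simplest to show in code. The sketch below assumes 8-bit grayscale input and scales it to the range 0.0 to 1.0. The helper name normalize_pixels and the sample values are made up for illustration, and other scaling schemes are equally possible.

```python
import numpy as np


def normalize_pixels(image):
    """Scale 8-bit pixel values from [0, 255] to [0.0, 1.0]."""
    return image.astype(np.float32) / 255.0


# Hypothetical 2x3 grayscale patch of a digit image.
digit = np.array([[0, 128, 255],
                  [255, 128, 0]], dtype=np.uint8)

normalized = normalize_pixels(digit)
print(normalized.dtype, normalized.min(), normalized.max())
```

Keeping every input in the same small range is one reason normalization tends to help training, since no image dominates just because it is brighter.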
