When we turn images into numbers (pixels and channels), we want to check how well our model understands these numbers to recognize or classify images. The key metrics are accuracy for simple tasks, and precision, recall, and F1 score when classes are uneven or mistakes have different costs. These metrics tell us if the model correctly identifies images or confuses them.
Image as numerical data (pixels, channels) in Computer Vision - Model Metrics & Evaluation
Metrics & Evaluation - Image as numerical data (pixels, channels)
Which metric matters for this concept and WHY
Confusion matrix or equivalent visualization (ASCII)
Confusion Matrix Example for 3 classes (Cat, Dog, Bird):
                 Predicted
              Cat   Dog   Bird
True  Cat      50     2      3
      Dog       4    45      1
      Bird      2     3     40
Total samples = 150
Here, diagonal numbers (50, 45, 40) are correct predictions.
Off-diagonal numbers are mistakes.
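The matrix above can be turned into per-class metrics directly. The following is a minimal sketch using NumPy; the variable names are illustrative, and it assumes rows are true classes and columns are predicted classes, as in the table.

```python
import numpy as np

# Rows are true classes, columns are predicted classes (Cat, Dog, Bird).
cm = np.array([[50, 2, 3],
               [4, 45, 1],
               [2, 3, 40]])

correct = np.diag(cm)                 # correct predictions per class
accuracy = correct.sum() / cm.sum()   # 135 / 150
precision = correct / cm.sum(axis=0)  # correct / all predicted as that class
recall = correct / cm.sum(axis=1)     # correct / all truly in that class
f1 = 2 * precision * recall / (precision + recall)

for name, p, r, f in zip(["Cat", "Dog", "Bird"], precision, recall, f1):
    print(f"{name}: precision={p:.3f} recall={r:.3f} f1={f:.3f}")
print(f"accuracy={accuracy:.3f}")
```

Column sums give the denominator for precision and row sums give the denominator for recall, which is why the two can differ for the same class.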
Precision vs Recall tradeoff with concrete examples
Imagine a model that finds cats in photos:
- High precision: When it says "cat," it is almost always right. Good if you want to avoid false alarms, like tagging a dog as a cat.
- High recall: It finds almost all cats, even if some mistakes happen. Good if missing a cat is worse, like in wildlife monitoring.
Choosing precision or recall depends on what mistakes cost more in your task.
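One common way this tradeoff shows up is through the decision threshold applied to a model's scores. The sketch below uses made-up labels and scores (both hypothetical, not from any real model) to show that a strict threshold favors precision and a lenient one favors recall.

```python
import numpy as np

# Hypothetical labels: 1 = photo contains a cat, 0 = it does not.
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
# Hypothetical model scores for the "cat" class.
scores = np.array([0.95, 0.80, 0.60, 0.30, 0.70, 0.40, 0.20, 0.10, 0.05, 0.02])

def precision_recall(threshold):
    """Return (precision, recall) when predicting "cat" for scores >= threshold."""
    y_pred = scores >= threshold
    tp = np.sum(y_pred & (y_true == 1))
    fp = np.sum(y_pred & (y_true == 0))
    fn = np.sum(~y_pred & (y_true == 1))
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn)
    return float(precision), float(recall)

strict = precision_recall(0.75)   # few "cat" predictions, no false alarms
lenient = precision_recall(0.25)  # every cat found, some false alarms
print(f"strict:  precision={strict[0]:.2f} recall={strict[1]:.2f}")
print(f"lenient: precision={lenient[0]:.2f} recall={lenient[1]:.2f}")
```

With these numbers the strict threshold reaches precision 1.00 at recall 0.50, while the lenient one reaches recall 1.00 at precision 0.67.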
What "good" vs "bad" metric values look like for this use case
For image data tasks:
- Good: As a rough guide on a reasonably balanced dataset, accuracy above 90% with precision and recall above 85% for each class suggests the model has learned useful patterns from the pixel data.
- Bad: Accuracy below 60%, or precision/recall below 50% for any class, suggests the model struggles to interpret pixel values correctly.
Metrics pitfalls (accuracy paradox, data leakage, overfitting indicators)
- Accuracy paradox: If one class is very common, a model that always predicts that class can have high accuracy but poor real performance.
- Data leakage: If test images are too similar to training images, metrics look better than they should and the model won't generalize to new images.
- Overfitting: Very high training accuracy but low test accuracy means the model memorizes pixels instead of learning patterns.
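The accuracy paradox is easy to reproduce. The sketch below builds a hypothetical imbalanced test set (the 980/20 split is an assumption chosen for illustration) and scores a trivial "model" that always predicts the majority class.

```python
import numpy as np

# Hypothetical imbalanced test set: 980 "other" images (0), 20 "bird" images (1).
y_true = np.array([0] * 980 + [1] * 20)
# A trivial "model" that always predicts the majority class.
y_pred = np.zeros_like(y_true)

accuracy = np.mean(y_pred == y_true)
bird_recall = np.sum((y_pred == 1) & (y_true == 1)) / np.sum(y_true == 1)
print(f"accuracy={accuracy:.2f}, bird recall={bird_recall:.2f}")
# prints: accuracy=0.98, bird recall=0.00
```

The model never finds a single bird, yet its accuracy looks excellent, which is why per-class recall should be checked alongside accuracy.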
Self-check question
Your image classifier has 98% accuracy but only 12% recall on the rare "bird" class. Is it good for production? Why or why not?
Answer: No, because it misses most birds. High accuracy is misleading if the model ignores rare classes. You need better recall to catch birds reliably.
Key Result
Accuracy alone can be misleading; precision and recall reveal how well the model interprets pixel data for each class.
Practice
1. What does each pixel in a color image usually represent?
easy
Solution
Step 1: Understand pixel representation in color images
Each pixel stores values for the red, green, and blue channels to show color.
Step 2: Compare options to pixel data
Only "A set of numbers for red, green, and blue colors" correctly describes pixels as sets of RGB numbers.
Final Answer:
A set of numbers for red, green, and blue colors -> Option D
Quick Check:
Pixel = RGB values [OK]
Hint: Pixels hold RGB numbers, not text or sound [OK]
Common Mistakes:
- Thinking pixels store text labels
- Confusing pixel with brightness only
- Assuming pixels represent sound
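To make the idea concrete, here is a small sketch of a two-pixel image stored as a NumPy array. The 8-bit range (0 to 255 per channel) and the specific colors are assumptions chosen for illustration.

```python
import numpy as np

# A 1x2 image: one pure-red pixel and one white pixel, 8 bits per channel.
image = np.array([[[255, 0, 0], [255, 255, 255]]], dtype=np.uint8)

# Each pixel is a set of three numbers: red, green, blue.
red, green, blue = (int(v) for v in image[0, 0])
print(image.shape)       # (1, 2, 3)
print(red, green, blue)  # 255 0 0
```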
2. Which Python code correctly creates a 3x3 image with 3 color channels filled with zeros?
easy
Solution
Step 1: Recall numpy zeros syntax
np.zeros takes the shape as a single argument; for a multi-dimensional array that is a tuple, like (3, 3, 3).
Step 2: Check each option's syntax
image = np.zeros((3, 3, 3)) uses the correct tuple and function call syntax. The others have syntax errors or omit the np. prefix.
Final Answer:
image = np.zeros((3, 3, 3)) -> Option A
Quick Check:
np.zeros((3,3,3)) creates 3x3 RGB image [OK]
Hint: Use np.zeros with shape tuple inside parentheses [OK]
Common Mistakes:
- Passing multiple arguments instead of a tuple
- Using square brackets instead of parentheses
- Forgetting np. prefix
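The sketch below shows the correct call next to the first mistake in the list. The dtype=np.uint8 argument is an addition for illustration (np.zeros defaults to float64); it is not part of the quiz answer.

```python
import numpy as np

# Correct: the shape is passed as one tuple.
image = np.zeros((3, 3, 3), dtype=np.uint8)
print(image.shape)  # (3, 3, 3)
print(image.dtype)  # uint8

# Mistake: passing the dimensions as separate arguments.
# The second argument is interpreted as the dtype, so NumPy raises TypeError.
try:
    np.zeros(3, 3, 3)
except TypeError as exc:
    print("TypeError:", exc)
```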
3. Given this code:
import numpy as np
image = np.array([[[255, 0, 0], [0, 255, 0]],
                  [[0, 0, 255], [255, 255, 0]]])
print(image.shape)
What is the output?
medium
Solution
Step 1: Analyze the array structure
The array has 2 rows, each with 2 pixels, and each pixel has 3 color values (RGB).
Step 2: Determine shape order
Shape is (height=2, width=2, channels=3), so (2, 2, 3).
Final Answer:
(2, 2, 3) -> Option C
Shape = (rows, cols, channels) = (2, 2, 3) [OK]
Hint: Shape is (height, width, channels) in that order [OK]
Common Mistakes:
- Mixing up dimensions order
- Counting channels as first dimension
- Assuming square shape without checking
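Unpacking the shape into named variables is one way to avoid mixing up the dimension order. This sketch reuses the array from the question; the variable names are illustrative.

```python
import numpy as np

image = np.array([[[255, 0, 0], [0, 255, 0]],
                  [[0, 0, 255], [255, 255, 0]]])

# The order is (rows, columns, channels).
height, width, channels = image.shape
print(height, width, channels)  # 2 2 3

# Index as image[row, column] to get one pixel's channel values.
pixel = image[1, 0]
print(pixel.tolist())  # [0, 0, 255]
```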
4. What is wrong with this code snippet for accessing the green channel of an image?
green_channel = image[:, :, 1:2]
medium
Solution
Step 1: Understand slicing with 1:2
Slicing with 1:2 keeps the channel dimension, returning shape (height, width, 1).
Step 2: Compare with expected 2D array
To get a 2D array, use index 1 without a slice, like image[:, :, 1].
Final Answer:
It returns a 3D array instead of 2D -> Option A
Quick Check:
Slicing with 1:2 keeps channel dim [OK]
Hint: Use single index, not slice, for 2D channel array [OK]
Common Mistakes:
- Using slice returns extra dimension
- Confusing channel indices
- Assuming it changes original image
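The difference between the slice and the integer index is easiest to see by comparing shapes. The sketch below uses a hypothetical 4x5 image whose green channel is filled with a constant.

```python
import numpy as np

image = np.zeros((4, 5, 3), dtype=np.uint8)  # hypothetical 4x5 RGB image
image[:, :, 1] = 200                         # fill the green channel

sliced = image[:, :, 1:2]  # a slice keeps the channel axis
indexed = image[:, :, 1]   # an integer index drops it
print(sliced.shape)   # (4, 5, 1)
print(indexed.shape)  # (4, 5)

# Both hold the same values; only the number of dimensions differs.
print(np.array_equal(sliced[:, :, 0], indexed))  # True
```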
5. You have a grayscale image stored as a 2D array with shape (100, 100). You want to convert it to a 3-channel RGB image by repeating the grayscale values across all channels. Which code correctly does this?
hard
Solution
Step 1: Understand the goal
We want to create a 3D array where each pixel's grayscale value repeats in 3 channels.
Step 2: Check each method
- rgb_image = np.stack([gray_image]*3, axis=2) stacks the grayscale image 3 times along a new channel axis, giving shape (100, 100, 3). Correct.
- rgb_image = np.repeat(gray_image, 3) flattens the data before repeating (no axis given), so the result is 1D with the wrong shape.
- rgb_image = gray_image.reshape(100, 100, 3) raises an error, because 10,000 values cannot fill a (100, 100, 3) array; reshape does not add data.
- rgb_image = np.concatenate(gray_image, 3) is called with invalid arguments: it expects a sequence of arrays and a valid axis.
Final Answer:
rgb_image = np.stack([gray_image]*3, axis=2) -> Option B
Quick Check:
Stack repeats grayscale across channels [OK]
Hint: Use np.stack with axis=2 to add channels [OK]
Common Mistakes:
- Using np.repeat without axis
- Reshaping without adding channel dimension
- Wrong function syntax for concatenation
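The sketch below runs the correct option and two of the incorrect ones so the resulting shapes can be compared. The random grayscale image is a stand-in for real data and is an assumption of this example.

```python
import numpy as np

# Hypothetical 100x100 grayscale image with 8-bit values.
rng = np.random.default_rng(0)
gray_image = rng.integers(0, 256, size=(100, 100), dtype=np.uint8)

# Correct: stack three copies along a new last axis.
rgb_image = np.stack([gray_image] * 3, axis=2)
print(rgb_image.shape)  # (100, 100, 3)

# Every channel equals the original grayscale values.
print(all(np.array_equal(rgb_image[:, :, c], gray_image) for c in range(3)))  # True

# np.repeat without an axis flattens first, giving a 1D result.
repeated = np.repeat(gray_image, 3)
print(repeated.shape)  # (30000,)

# reshape cannot invent the missing values.
try:
    gray_image.reshape(100, 100, 3)
except ValueError as exc:
    print("ValueError:", exc)
```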
