
Image as numerical data (pixels, channels) in Computer Vision - Model Pipeline Trace

Model Pipeline - Image as numerical data (pixels, channels)

This pipeline shows how an image is turned into numbers that a computer can understand. It breaks down the image into pixels and color channels, then uses these numbers to train a model that learns to recognize patterns.

Data Flow - 4 Stages
1. Input Image
   Input: 1 image of 28x28 pixels with 3 color channels
   Operation: Raw image loaded as height x width x channels
   Output: 28 rows x 28 columns x 3 channels
   Example: A photo of a red apple represented as a 28x28 grid with RGB values
2. Normalization
   Input: 28 rows x 28 columns x 3 channels
   Operation: Scale pixel values from 0-255 to 0-1 by dividing each value by 255
   Output: 28 rows x 28 columns x 3 channels
   Example: Pixel value 255 becomes 1.0; pixel value 0 stays 0.0
3. Flattening
   Input: 28 rows x 28 columns x 3 channels
   Operation: Convert the 3D image data into a 1D array for model input
   Output: 2352 features (28 * 28 * 3)
   Example: The 3D pixel grid becomes a single list of 2352 numbers
4. Model Training
   Input: 2352 features
   Operation: Train a simple neural network to classify images
   Output: Probabilities for each class
   Example: The model predicts 0.8 probability for 'apple' and 0.2 for 'banana'
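The four stages above can be sketched in NumPy as follows. This is a minimal sketch, not the page's own implementation: the random image stands in for a real photo, and the two-class linear layer (the names `weights`, `bias`, and `classes` are hypothetical) is untrained, so its probabilities mean nothing beyond summing to 1.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stage 1 (input): a hypothetical stand-in for a loaded 28x28 RGB photo,
# stored as (height, width, channels) with 8-bit values 0-255.
image = rng.integers(0, 256, size=(28, 28, 3), dtype=np.uint8)

# Stage 2 (normalization): divide by 255 so every value falls in [0.0, 1.0].
normalized = image.astype(np.float32) / 255.0

# Stage 3 (flattening): 28 * 28 * 3 = 2352 values in one 1D feature vector.
features = normalized.reshape(-1)

# Stage 4 (model output): a hypothetical, untrained linear layer for two
# classes, followed by softmax to turn raw scores into probabilities.
classes = ["apple", "banana"]
weights = rng.normal(0.0, 0.01, size=(features.size, len(classes)))
bias = np.zeros(len(classes))
scores = features @ weights + bias
exp_scores = np.exp(scores - scores.max())  # subtract the max for stability
probs = exp_scores / exp_scores.sum()

print(image.shape, image.dtype)  # (28, 28, 3) uint8
print(features.shape)            # (2352,)
print(dict(zip(classes, probs.round(3).tolist())))
```

For a batch of N images with shape (N, 28, 28, 3), the same flattening step is `batch.reshape(len(batch), -1)`, which gives an (N, 2352) matrix with one row per image.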
Training Trace - Epoch by Epoch

Epoch 1: ************ (loss=1.2)
Epoch 2: *********    (loss=0.9)
Epoch 3: *******      (loss=0.7)
Epoch 4: *****        (loss=0.5)
Epoch 5: ****         (loss=0.4)
Epoch  Loss ↓  Accuracy ↑  Observation
1      1.2     0.45        Model starts learning; accuracy is low
2      0.9     0.60        Loss decreases and accuracy improves
3      0.7     0.72        Model is learning important features
4      0.5     0.82        Good improvement; model is converging
5      0.4     0.88        Loss is low and accuracy is high; training succeeded
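A trace like this comes from a loop that, once per epoch, runs the training examples through the model, measures the loss, and adjusts the weights. Below is a minimal NumPy sketch under stated assumptions: the dataset is invented (noisy reddish versus bluish 28x28 images, not real apples and bananas), the "network" is a single dense layer with softmax, and training uses plain full-batch gradient descent. Its printed numbers will not match the table above; what carries over is the pattern of loss falling and accuracy rising.

```python
import numpy as np

rng = np.random.default_rng(42)


def fake_images(n, base_rgb):
    """Hypothetical stand-in data: n noisy 28x28 RGB images around one color."""
    noise = rng.normal(0.0, 40.0, size=(n, 28, 28, 3))
    return np.clip(np.array(base_rgb) + noise, 0, 255).astype(np.uint8)


# Invented two-class dataset: class 0 is reddish, class 1 is bluish.
images = np.concatenate([fake_images(100, (200, 30, 30)),
                         fake_images(100, (30, 30, 200))])
labels = np.array([0] * 100 + [1] * 100)

# Same preprocessing as the pipeline: scale to [0, 1], then flatten.
x = (images.astype(np.float64) / 255.0).reshape(len(images), -1)  # (200, 2352)

# One dense layer + softmax (multinomial logistic regression).
w = np.zeros((x.shape[1], 2))
b = np.zeros(2)
lr = 0.001  # small step: with 2352 inputs, large steps overshoot
history = []

for epoch in range(1, 6):
    # Forward pass: scores -> softmax probabilities.
    scores = x @ w + b
    scores -= scores.max(axis=1, keepdims=True)  # avoid overflow in exp
    probs = np.exp(scores)
    probs /= probs.sum(axis=1, keepdims=True)

    # Mean cross-entropy loss and accuracy, measured before this epoch's update.
    loss = -np.log(probs[np.arange(len(labels)), labels]).mean()
    acc = (probs.argmax(axis=1) == labels).mean()
    history.append((loss, acc))
    print(f"Epoch {epoch}: loss={loss:.3f} accuracy={acc:.2f}")

    # Backward pass: d(loss)/d(scores) = (probs - one_hot) / N.
    grad = probs.copy()
    grad[np.arange(len(labels)), labels] -= 1.0
    grad /= len(labels)
    w -= lr * (x.T @ grad)
    b -= lr * grad.sum(axis=0)
```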
Prediction Trace - 4 Layers
Layer 1: Input Image
Layer 2: Normalization
Layer 3: Flattening
Layer 4: Neural Network Prediction
Model Quiz - 3 Questions
Test your understanding
What does normalization do to the image pixel values?
A. Changes image size from 28x28 to 14x14
B. Scales pixel values from 0-255 to 0-1
C. Converts color image to black and white
D. Flattens the image into a 1D array
Key Insight
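Images are made of pixels arranged by height, width, and color channels. Turning these pixels into numbers lets a model learn patterns. Normalizing puts every value on the same 0-1 scale, and flattening turns the 3D grid into the 1D feature vector that a simple fully connected network expects; together they prepare the data so the model can learn from it and make predictions.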

Practice

1. What does each pixel in a color image usually represent?
easy
A. A single number representing brightness only
B. A sound wave frequency
C. A text label describing the image
D. A set of numbers for red, green, and blue colors

Solution

  1. Step 1: Understand pixel representation in color images

    Each pixel stores values for red, green, and blue channels to show color.
  2. Step 2: Compare options to pixel data

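    Only option D describes a pixel as a set of red, green, and blue (RGB) numbers. A single brightness value (option A) describes a grayscale pixel, and sound (B) or text labels (C) are not pixel data at all.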
  3. Final Answer:

    A set of numbers for red, green, and blue colors -> Option D
  4. Quick Check:

    Pixel = RGB values [OK]
Hint: Pixels hold RGB numbers, not text or sound [OK]
Common Mistakes:
  • Thinking pixels store text labels
  • Confusing pixel with brightness only
  • Assuming pixels represent sound
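To make Step 1 concrete, here is a small NumPy sketch using a hypothetical 2x2 image: indexing one position of a color image returns three channel values, while a grayscale image stores a single brightness value per pixel.

```python
import numpy as np

# A hypothetical 2x2 RGB image stored as (height, width, channels), uint8.
image = np.array([[[255, 0, 0], [0, 255, 0]],
                  [[0, 0, 255], [255, 255, 255]]], dtype=np.uint8)

pixel = image[0, 0]   # one color pixel = three numbers (R, G, B)
print(pixel)          # R=255, G=0, B=0: pure red
print(image[1, 1])    # all three channels at 255: white

# A hypothetical grayscale image: one brightness number per pixel.
gray = np.array([[0, 128], [200, 255]], dtype=np.uint8)
print(gray[0, 1])     # a single value, not a triple
```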
2. Which Python code correctly creates a 3x3 image with 3 color channels filled with zeros?
easy
A. image = np.zeros((3, 3, 3))
B. image = np.zeros(3, 3, 3)
C. image = np.zeros[3, 3, 3]
D. image = zeros((3, 3, 3))

Solution

  1. Step 1: Recall numpy zeros syntax

    np.zeros takes the shape as its first argument; a multi-dimensional shape must be passed as a single tuple, like (3, 3, 3).
  2. Step 2: Check each option's syntax

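    image = np.zeros((3, 3, 3)) calls the function with one shape tuple. Option B passes three separate arguments; the extra ones are read as dtype and order, which raises a TypeError. Option C uses square brackets, which tries to index the function instead of calling it. Option D omits the np. prefix and raises a NameError unless zeros was imported directly.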
  3. Final Answer:

    image = np.zeros((3, 3, 3)) -> Option A
  4. Quick Check:

    np.zeros((3,3,3)) creates 3x3 RGB image [OK]
Hint: Use np.zeros with shape tuple inside parentheses [OK]
Common Mistakes:
  • Passing multiple arguments instead of a tuple
  • Using square brackets instead of parentheses
  • Forgetting np. prefix
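The difference between the correct call and option B can be checked directly. This sketch also shows one detail the question leaves out: np.zeros defaults to float64, so for 8-bit pixel data (0-255) you would typically pass dtype=np.uint8 yourself.

```python
import numpy as np

# Option A: the whole shape goes in one tuple: (height, width, channels).
image = np.zeros((3, 3, 3))
print(image.shape, image.dtype)  # (3, 3, 3) float64

# For 8-bit pixel data (0-255), request an unsigned 8-bit dtype explicitly.
image_u8 = np.zeros((3, 3, 3), dtype=np.uint8)
print(image_u8.dtype)  # uint8

# Option B: the extra positional arguments are read as dtype and order,
# so NumPy rejects the call with a TypeError.
option_b_failed = False
try:
    np.zeros(3, 3, 3)
except TypeError:
    option_b_failed = True
print("option B failed:", option_b_failed)
```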
3. Given this code:
import numpy as np
image = np.array([[[255, 0, 0], [0, 255, 0]],
                  [[0, 0, 255], [255, 255, 0]]])
print(image.shape)

What is the output?
medium
A. (2, 3, 2)
B. (3, 2, 2)
C. (2, 2, 3)
D. (3, 3, 3)

Solution

  1. Step 1: Analyze the array structure

    The outer list holds 2 rows, each row holds 2 pixels, and each pixel holds 3 color values (RGB).
  2. Step 2: Determine shape order

    Shape is (height=2, width=2, channels=3), so (2, 2, 3).
  3. Final Answer:

    (2, 2, 3) -> Option C
  4. Quick Check:

    Shape = (rows, cols, channels) = (2, 2, 3) [OK]
Hint: Shape is (height, width, channels) in that order [OK]
Common Mistakes:
  • Mixing up dimensions order
  • Counting channels as first dimension
  • Assuming square shape without checking
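The shape can be unpacked into named dimensions to confirm the (height, width, channels) order. The last lines relate to the "channels as first dimension" mistake: some libraries, PyTorch for example, expect channels-first (C, H, W) input instead, and np.transpose reorders the axes without changing any values.

```python
import numpy as np

image = np.array([[[255, 0, 0], [0, 255, 0]],
                  [[0, 0, 255], [255, 255, 0]]])

height, width, channels = image.shape  # channels-last (HWC) layout
print(height, width, channels)         # 2 2 3
print(image[0, 1])                     # top-right pixel: pure green

# Reorder to channels-first (CHW): axis 2 (channels) moves to the front.
chw = np.transpose(image, (2, 0, 1))
print(chw.shape)                       # (3, 2, 2)
```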
4. What is wrong with this code snippet for accessing the green channel of an image?
green_channel = image[:, :, 1:2]
medium
A. It returns a 3D array instead of 2D
B. It causes an index error
C. It accesses the red channel instead
D. It modifies the original image

Solution

  1. Step 1: Understand slicing with 1:2

    Slicing with 1:2 keeps the channel dimension, returning shape (height, width, 1).
  2. Step 2: Compare with expected 2D array

    To get a 2D array, use index 1 without slice, like image[:, :, 1].
  3. Final Answer:

    It returns a 3D array instead of 2D -> Option A
  4. Quick Check:

    Slicing with 1:2 keeps channel dim [OK]
Hint: Use single index, not slice, for 2D channel array [OK]
Common Mistakes:
  • Using slice returns extra dimension
  • Confusing channel indices
  • Assuming it changes original image
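A quick shape check makes the difference visible. The 4x5 image here is hypothetical. The last line relates to the third common mistake: both results are views that share memory with the image, so reading them never changes it, but writing into them would, unless you take a .copy() first.

```python
import numpy as np

image = np.zeros((4, 5, 3), dtype=np.uint8)  # hypothetical 4x5 RGB image
image[..., 1] = 200                          # fill the green channel

sliced = image[:, :, 1:2]  # a slice keeps the channel axis
indexed = image[:, :, 1]   # an integer index drops it
print(sliced.shape)        # (4, 5, 1)
print(indexed.shape)       # (4, 5)

# np.squeeze removes the length-1 axis if you already have the 3D result.
print(np.squeeze(sliced, axis=2).shape)  # (4, 5)

# Basic indexing returns a view that shares memory with the original.
print(np.shares_memory(image, indexed))  # True
```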
5. You have a grayscale image stored as a 2D array with shape (100, 100). You want to convert it to a 3-channel RGB image by repeating the grayscale values across all channels. Which code correctly does this?
hard
A. rgb_image = np.repeat(gray_image, 3)
B. rgb_image = np.stack([gray_image]*3, axis=2)
C. rgb_image = gray_image.reshape(100, 100, 3)
D. rgb_image = np.concatenate(gray_image, 3)

Solution

  1. Step 1: Understand the goal

    We want to create a 3D array where each pixel's grayscale value repeats in 3 channels.
  2. Step 2: Check each method

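    rgb_image = np.stack([gray_image]*3, axis=2) stacks three copies of the grayscale image along a new last (channel) axis, giving shape (100, 100, 3). rgb_image = np.repeat(gray_image, 3) has no axis argument, so it flattens the input first and returns a 1D array of 30000 values. rgb_image = gray_image.reshape(100, 100, 3) raises a ValueError, because reshaping cannot add data: 10000 values cannot fill 30000 slots. rgb_image = np.concatenate(gray_image, 3) misuses the function: np.concatenate expects a sequence of arrays to join along an existing axis, and axis 3 does not exist here.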
  3. Final Answer:

    rgb_image = np.stack([gray_image]*3, axis=2) -> Option B
  4. Quick Check:

    Stack repeats grayscale across channels [OK]
Hint: Use np.stack with axis=2 to add channels [OK]
Common Mistakes:
  • Using np.repeat without axis
  • Reshaping without adding channel dimension
  • Wrong function syntax for concatenation
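The sketch below checks option B against two equivalent NumPy idioms, using a hypothetical 100x100 grayscale image: adding a length-1 channel axis and repeating along it, or broadcasting a read-only view that avoids copying the data. It also confirms why options A and C fail.

```python
import numpy as np

# Hypothetical 100x100 grayscale image with values 0-255.
gray_image = (np.arange(100 * 100) % 256).reshape(100, 100).astype(np.uint8)

# Option B: stack three copies along a new last (channel) axis.
rgb_stack = np.stack([gray_image] * 3, axis=2)

# Equivalent: add a length-1 channel axis, then repeat along that axis.
rgb_repeat = np.repeat(gray_image[:, :, np.newaxis], 3, axis=2)

# Equivalent read-only view that avoids copying the pixel data.
rgb_view = np.broadcast_to(gray_image[:, :, np.newaxis], (100, 100, 3))

print(rgb_stack.shape)                 # (100, 100, 3)
print(np.repeat(gray_image, 3).shape)  # (30000,): option A flattens first

# Option C: 10000 values cannot fill a (100, 100, 3) array of 30000 slots.
reshape_failed = False
try:
    gray_image.reshape(100, 100, 3)
except ValueError:
    reshape_failed = True
print("reshape failed:", reshape_failed)
```

np.stack and np.repeat with an axis both allocate a new array, which you need if the channels will later be edited independently; np.broadcast_to is cheaper but its result cannot be written to.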