Computer Vision · ~12 mins

Image as numerical data (pixels, channels) in Computer Vision - Model Pipeline Trace


This pipeline shows how an image is turned into numbers a computer can work with. The image is broken down into pixels and color channels, and those numbers are then used to train a model that learns to recognize patterns.

Data Flow - 4 Stages
Stage 1: Input Image
Input: 1 image of 28x28 pixels with 3 color channels
Operation: Raw image loaded as height x width x channels
Output: 28 rows x 28 columns x 3 channels
Example: A photo of a red apple represented as a 28x28 grid with RGB values

Stage 2: Normalization
Input: 28 rows x 28 columns x 3 channels
Operation: Scale pixel values from 0-255 to 0-1
Output: 28 rows x 28 columns x 3 channels
Example: Pixel value 255 becomes 1.0, pixel value 0 stays 0.0

Stage 3: Flattening
Input: 28 rows x 28 columns x 3 channels
Operation: Convert 3D image data into a 1D array for model input
Output: 2352 features (28*28*3)
Example: The 3D pixel grid becomes a long list of 2352 numbers

Stage 4: Model Training
Input: 2352 features
Operation: Train a simple neural network to classify images
Output: Probabilities for each class
Example: Model predicts 0.8 probability for 'apple', 0.2 for 'banana'
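The preprocessing stages above (load, normalize, flatten) can be sketched in a few lines of NumPy. This is a minimal illustration; the random array stands in for a real 28x28 RGB photo, which in practice you would load with an image library.

```python
import numpy as np

# Stage 1: Input Image - a stand-in for a real 28x28 RGB photo
# (values 0-255, shape = height x width x channels)
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(28, 28, 3), dtype=np.uint8)

# Stage 2: Normalization - scale pixel values from 0-255 down to 0-1
normalized = image.astype(np.float32) / 255.0

# Stage 3: Flattening - turn the 3D grid into a 1D feature vector
features = normalized.flatten()

print(image.shape)     # (28, 28, 3)
print(features.shape)  # (2352,)
```

After these steps, every pixel value lies between 0.0 and 1.0, and the 28x28x3 grid has become a single row of 2352 numbers ready for a model.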
Training Trace - Epoch by Epoch

Epoch 1: ************ (loss=1.2)
Epoch 2: *********    (loss=0.9)
Epoch 3: *******      (loss=0.7)
Epoch 4: *****        (loss=0.5)
Epoch 5: ****         (loss=0.4)
Epoch | Loss ↓ | Accuracy ↑ | Observation
------|--------|------------|--------------------------------------------
1     | 1.2    | 0.45       | Model starts learning, accuracy is low
2     | 0.9    | 0.60       | Loss decreases, accuracy improves
3     | 0.7    | 0.72       | Model is learning important features
4     | 0.5    | 0.82       | Good improvement, model is converging
5     | 0.4    | 0.88       | Loss low, accuracy high, training successful
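The epoch-by-epoch pattern in the table can be reproduced with a minimal training loop. This sketch uses plain NumPy logistic regression on synthetic data (the features and labels are made up for illustration, not taken from a real image dataset), but the loop structure, forward pass, loss, accuracy, and gradient step, is the same shape as real training.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in for flattened image features: 100 samples x 2352 features,
# with two classes (think 'apple' vs 'banana')
X = rng.normal(size=(100, 2352)).astype(np.float32)
y = (X[:, 0] > 0).astype(np.float32)  # toy labels for illustration

w = np.zeros(2352, dtype=np.float32)  # model weights
b = 0.0                               # bias
lr = 0.1                              # learning rate

for epoch in range(1, 6):
    # Forward pass: linear score -> sigmoid probability
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    # Cross-entropy loss and accuracy for this epoch
    loss = -np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))
    acc = np.mean((p > 0.5) == y)
    # Gradient descent step
    w -= lr * (X.T @ (p - y) / len(y))
    b -= lr * np.mean(p - y)
    print(f"Epoch {epoch}: loss={loss:.3f} acc={acc:.2f}")
```

As in the table, the loss shrinks and the accuracy climbs with each epoch as the weights move down the loss gradient.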
Prediction Trace - 4 Layers
Layer 1: Input Image
Layer 2: Normalization
Layer 3: Flattening
Layer 4: Neural Network Prediction
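The final layer turns raw scores into class probabilities, which is how the model can report "0.8 apple, 0.2 banana". A minimal softmax sketch (the logit values here are made up to reproduce that example split):

```python
import numpy as np

def softmax(z):
    # Subtract the max logit for numerical stability before exponentiating
    e = np.exp(z - np.max(z))
    return e / e.sum()

# Hypothetical final-layer scores (logits) for classes ['apple', 'banana']
logits = np.array([1.4, 0.0])
probs = softmax(logits)
print(probs)  # roughly [0.80, 0.20]
```

Whatever the logits are, softmax always yields non-negative values that sum to 1, so the output can be read directly as a probability for each class.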
Model Quiz - 3 Questions
Test your understanding
Q1. What does normalization do to the image pixel values?
A. Changes image size from 28x28 to 14x14
B. Scales pixel values from 0-255 to 0-1
C. Converts color image to black and white
D. Flattens the image into a 1D array
Key Insight
Images are made of pixels arranged by height, width, and color channel. Turning those pixels into numbers is what lets a model learn visual patterns: normalization puts the values on a consistent scale, and flattening reshapes the grid into the 1D input the model expects, so it can learn from the data and make predictions.