
Why detection localizes objects in images in Computer Vision - Model Pipeline Impact


Model Pipeline - Why detection localizes objects in images

This pipeline shows how an object detection model finds and draws boxes around objects in images. It learns to spot where objects are and what they are by looking at many labeled pictures.

Data Flow - 6 Stages

1. Input Image

1 image x H x W x 3 channels (any size) → Load and resize image to fixed size → 1 image x 640 x 640 x 3 channels

A photo of a dog and a cat resized to 640x640 pixels

↓

2. Preprocessing

1 image x 640 x 640 x 3 channels → Normalize pixel values to 0-1 range → 1 image x 640 x 640 x 3 channels

Pixel values changed from 0-255 to 0.0-1.0
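
The resize and normalization steps can be sketched in plain Python. This is a minimal illustration, not the pipeline's actual code: the helper names `resize_nearest` and `normalize` are hypothetical, nearest-neighbor sampling is an assumed resize method (real pipelines often use bilinear interpolation), and a 2x2 image stands in for a real photo.

```python
def resize_nearest(image, out_h, out_w):
    """Resize a nested-list image (rows of [r, g, b] pixels) by nearest-neighbor sampling."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[row * in_h // out_h][col * in_w // out_w] for col in range(out_w)]
        for row in range(out_h)
    ]


def normalize(image):
    """Scale 8-bit channel values from 0-255 to the 0.0-1.0 range."""
    return [[[channel / 255.0 for channel in pixel] for pixel in row] for row in image]


# A tiny 2x2 RGB "photo" standing in for a real image.
tiny = [
    [[0, 0, 0], [255, 255, 255]],
    [[255, 0, 0], [0, 0, 255]],
]
resized = resize_nearest(tiny, 4, 4)  # a real pipeline would target 640 x 640
prepared = normalize(resized)
```

After these two steps every image has the same shape and value range, which is what lets the model process them in fixed-size batches.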

↓

3. Feature Extraction

1 image x 640 x 640 x 3 channels → Apply convolutional layers to find patterns → 1 image x 80 x 80 x 256 features

Edges and textures detected in smaller feature maps
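
The drop from 640 x 640 to 80 x 80 corresponds to an overall downsampling factor of 8. One way to get there, shown here as an assumption rather than the model's actual architecture, is three 3x3 convolutions with stride 2 and padding 1. The sketch below applies the standard convolution output-size formula; `conv_output_size` is a hypothetical helper name.

```python
def conv_output_size(size, kernel, stride, padding):
    """Spatial output size of a convolution: floor((size + 2*padding - kernel) / stride) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


size = 640
sizes = [size]
# Three 3x3 convolutions with stride 2 and padding 1 (an assumed layout).
for _ in range(3):
    size = conv_output_size(size, kernel=3, stride=2, padding=1)
    sizes.append(size)

print(sizes)  # [640, 320, 160, 80]
```

Each cell of the resulting 80 x 80 map summarizes roughly an 8 x 8 pixel patch of the input, which is why later stages can map feature-map positions back to image coordinates.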

↓

4. Region Proposal

1 image x 80 x 80 x 256 features → Suggest possible object locations (bounding boxes) → 1 image x 1000 proposals x 4 coordinates

Boxes like [x1, y1, x2, y2] around possible objects
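
As a rough illustration of where the 1000 proposals can come from, the sketch below places one square candidate box on every cell of the 80 x 80 feature map and keeps the 1000 highest-scoring ones. The function names, the single 64-pixel box size, and the placeholder scores are all assumptions; a trained proposal network would predict the scores and typically use several box shapes per cell.

```python
def make_anchors(grid_size, stride, box_size, image_size):
    """Return one square candidate box [x1, y1, x2, y2] centered on each feature-map cell."""
    half = box_size / 2
    anchors = []
    for row in range(grid_size):
        for col in range(grid_size):
            # Center of this cell in input-image pixels.
            cx = (col + 0.5) * stride
            cy = (row + 0.5) * stride
            box = [cx - half, cy - half, cx + half, cy + half]
            # Clip to the image so coordinates stay valid.
            anchors.append([min(max(v, 0.0), float(image_size)) for v in box])
    return anchors


def top_proposals(anchors, scores, k):
    """Keep the k boxes with the highest objectness score."""
    ranked = sorted(zip(scores, anchors), key=lambda pair: pair[0], reverse=True)
    return [box for _, box in ranked[:k]]


anchors = make_anchors(grid_size=80, stride=8, box_size=64, image_size=640)
# Placeholder scores; a trained model would predict these from the features.
scores = [((i * 37) % 101) / 100 for i in range(len(anchors))]
proposals = top_proposals(anchors, scores, k=1000)
print(len(anchors), len(proposals))  # 6400 1000
```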

↓

5. Classification and Localization

1 image x 1000 proposals x 4 coordinates → Predict object class and refine box coordinates → 1 image x N detected objects x (class + box coords)

Detected 'dog' with box [50, 60, 200, 220]
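
For a single proposal, this stage does two things: it picks a class and it nudges the box. The sketch below assumes a softmax over raw class scores and the common center-and-size offset encoding (dx, dy, dw, dh) for box refinement; the source does not say which encoding the model uses, and the scores, offsets, and function names are hypothetical.

```python
import math


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def refine_box(box, deltas):
    """Shift and rescale [x1, y1, x2, y2] using (dx, dy, dw, dh) offsets."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    cx, cy = x1 + w / 2, y1 + h / 2
    dx, dy, dw, dh = deltas
    cx, cy = cx + dx * w, cy + dy * h          # move the center
    w, h = w * math.exp(dw), h * math.exp(dh)  # resize the box
    return [cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2]


classes = ["background", "dog", "cat"]
probs = softmax([0.2, 3.1, 0.9])  # hypothetical raw scores for one proposal
label = classes[probs.index(max(probs))]
box = refine_box([30, 40, 190, 200], deltas=(0.125, 0.125, 0.0, 0.0))
print(label, box)  # dog [50.0, 60.0, 210.0, 220.0]
```

Predicting offsets relative to the proposal, rather than absolute coordinates, keeps the regression targets small regardless of where the object sits in the image.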

↓

6. Non-Maximum Suppression

1 image x N detected objects x (class + box coords) → Remove lower-scoring boxes that heavily overlap a higher-scoring box of the same object → 1 image x M final objects x (class + box coords), M ≤ N

One final box each for the dog and the cat, with duplicate boxes removed
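
A minimal sketch of greedy non-maximum suppression follows. It measures overlap with intersection over union (IoU) and drops any box that overlaps an already-kept box of the same class by more than a threshold. The 0.5 threshold, the per-class handling, and the example detections are assumptions for illustration.

```python
def iou(a, b):
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(detections, iou_threshold=0.5):
    """Greedy NMS over (label, score, box) tuples, applied per class."""
    kept = []
    for det in sorted(detections, key=lambda d: d[1], reverse=True):
        label, _, box = det
        # Keep the box only if no higher-scoring box of the same class overlaps it too much.
        if all(k[0] != label or iou(box, k[2]) <= iou_threshold for k in kept):
            kept.append(det)
    return kept


detections = [
    ("dog", 0.92, [50, 60, 200, 220]),
    ("dog", 0.75, [55, 65, 205, 225]),  # near-duplicate of the first dog box
    ("cat", 0.88, [300, 100, 420, 260]),
]
final = nms(detections)
print([(label, score) for label, score, _ in final])  # [('dog', 0.92), ('cat', 0.88)]
```

Here the two dog boxes have an IoU of about 0.88, so only the higher-scoring one survives, while the cat box is untouched because it belongs to a different class.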

Training Trace - Epoch by Epoch


Loss curve (loss vs. epoch): loss falls from 2.5 at epoch 1 to 0.4 at epoch 20, as listed in the table below.

Epoch	Loss ↓	Accuracy ↑	Observation
1	2.5	0.30	Model starts learning, loss high, accuracy low
5	1.2	0.55	Loss decreases, model better at finding objects
10	0.8	0.70	Model improves localization and classification
15	0.5	0.82	Good balance of accuracy and low loss
20	0.4	0.88	Model converges with high accuracy

Prediction Trace - 5 Layers

Layer 1: Input Image

Layer 2: Feature Extraction

Layer 3: Region Proposal

Layer 4: Classification and Localization

Layer 5: Non-Maximum Suppression

Model Quiz - 3 Questions

Test your understanding

Why does the model resize images before detection?

A. To remove colors from the image

B. To increase the number of objects

C. To make all images the same size for the model

D. To change the image format to text

Key Insight

Object detection models learn to find and locate objects by first extracting features, then proposing many candidate boxes, then classifying and refining each candidate, and finally using non-maximum suppression to keep the best box for each object. This step-by-step process is what lets the model say both what an object is and where it is in the image.