
Why detection localizes objects in images in Computer Vision - Model Pipeline Impact

Model Pipeline - Why detection localizes objects in images

This pipeline shows how an object detection model finds the objects in an image and draws a bounding box around each one. The model learns where objects are and what they are by training on many images labeled with boxes and class names.

Data Flow - 6 Stages
1. Input Image
   Input: 1 image x 640 x 640 x 3 channels
   Operation: Load and resize image to fixed size
   Output: 1 image x 640 x 640 x 3 channels
   Example: A photo of a dog and a cat resized to 640x640 pixels

2. Preprocessing
   Input: 1 image x 640 x 640 x 3 channels
   Operation: Normalize pixel values to 0-1 range
   Output: 1 image x 640 x 640 x 3 channels
   Example: Pixel values changed from 0-255 to 0.0-1.0

3. Feature Extraction
   Input: 1 image x 640 x 640 x 3 channels
   Operation: Apply convolutional layers to find patterns
   Output: 1 image x 80 x 80 x 256 features
   Example: Edges and textures detected in smaller feature maps

4. Region Proposal
   Input: 1 image x 80 x 80 x 256 features
   Operation: Suggest possible object locations (bounding boxes)
   Output: 1 image x 1000 proposals x 4 coordinates
   Example: Boxes like [x1, y1, x2, y2] around possible objects

5. Classification and Localization
   Input: 1 image x 1000 proposals x 4 coordinates
   Operation: Predict object class and refine box coordinates
   Output: 1 image x N detected objects x (class + box coords)
   Example: Detected 'dog' with box [50, 60, 200, 220]

6. Non-Maximum Suppression
   Input: 1 image x N detected objects x (class + box coords)
   Operation: Remove overlapping boxes to keep best ones
   Output: 1 image x M final objects x (class + box coords)
   Example: Final boxes around dog and cat without overlap
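
Stage 6 can be sketched in a few lines of plain Python. This is a minimal greedy, per-class non-maximum suppression written for illustration, not the implementation of any particular library. The confidence scores, the second dog box, the cat box, and the 0.5 IoU threshold are all assumptions; only the dog box [50, 60, 200, 220] comes from the stage 5 example.

```python
def iou(box_a, box_b):
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Width and height of the overlap are clamped at zero for disjoint boxes.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(detections, iou_threshold=0.5):
    """Greedy per-class NMS over (label, score, box) tuples."""
    # Visit detections from highest to lowest score.
    ordered = sorted(detections, key=lambda d: d[1], reverse=True)
    kept = []
    for label, score, box in ordered:
        # Keep a box unless a higher-scoring box of the same class overlaps it too much.
        suppressed = any(
            k_label == label and iou(box, k_box) > iou_threshold
            for k_label, _, k_box in kept
        )
        if not suppressed:
            kept.append((label, score, box))
    return kept


# Hypothetical detections: scores and all boxes except the first are made up.
detections = [
    ("dog", 0.92, [50, 60, 200, 220]),
    ("dog", 0.80, [55, 65, 205, 225]),   # near-duplicate of the first dog box
    ("cat", 0.88, [300, 100, 450, 260]),
]
final = nms(detections)
for label, score, box in final:
    print(label, score, box)
```

In this example the two dog boxes overlap with an IoU of about 0.88, so only the higher-scoring one survives. The cat box overlaps neither dog box and is kept.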
Training Trace - Epoch by Epoch

[Plot: loss on the y-axis (0.0 to 2.5) against epochs on the x-axis (1 to 20), falling as training proceeds]
Epoch | Loss ↓ | Accuracy ↑ | Observation
1     | 2.5    | 0.30       | Model starts learning, loss high, accuracy low
5     | 1.2    | 0.55       | Loss decreases, model better at finding objects
10    | 0.8    | 0.70       | Model improves localization and classification
15    | 0.5    | 0.82       | Good balance of accuracy and low loss
20    | 0.4    | 0.88       | Model converges with high accuracy
Prediction Trace - 5 Layers
Layer 1: Input Image
Layer 2: Feature Extraction
Layer 3: Region Proposal
Layer 4: Classification and Localization
Layer 5: Non-Maximum Suppression
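
The shapes in the data flow imply that feature extraction shrinks each side of the image by a factor of 640 / 80 = 8, so every cell of the 80 x 80 feature map corresponds to an 8 x 8 pixel patch of the input. That link between feature cells and pixel positions is what lets later stages express boxes in image coordinates. The sketch below works through the arithmetic. The function names are hypothetical, and it assumes uniform downsampling and ignores the fact that a real convolutional cell also sees pixels outside its own patch.

```python
IMAGE_SIZE = 640     # input side length from stage 1
FEATURE_SIZE = 80    # feature-map side length from stage 3
STRIDE = IMAGE_SIZE // FEATURE_SIZE  # pixels per feature-map cell


def cell_to_patch(row, col, stride=STRIDE):
    """Pixel box [x1, y1, x2, y2] covered by one feature-map cell."""
    x1 = col * stride
    y1 = row * stride
    return [x1, y1, x1 + stride, y1 + stride]


def box_to_cells(box, stride=STRIDE):
    """Feature-map cells overlapped by a pixel box.

    Returns (row_start, row_end, col_start, col_end), with end indices exclusive.
    Expects integer pixel coordinates.
    """
    x1, y1, x2, y2 = box
    col_start = x1 // stride
    row_start = y1 // stride
    col_end = -(-x2 // stride)   # integer ceiling division
    row_end = -(-y2 // stride)
    return (row_start, row_end, col_start, col_end)


print("stride:", STRIDE)
print("cell (7, 6) covers pixels:", cell_to_patch(7, 6))
print("dog box covers cells:", box_to_cells([50, 60, 200, 220]))
```

For the dog box [50, 60, 200, 220] this gives rows 7 to 27 and columns 6 to 24 of the feature map, which is the region of features the later stages would draw on for that object.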
Model Quiz - 3 Questions
Test your understanding
Why does the model resize images before detection?
A. To remove colors from the image
B. To increase the number of objects
C. To make all images the same size for the model
D. To change the image format to text
Key Insight
Object detection models learn to find and locate objects in steps: they extract features, propose many candidate boxes, classify each candidate and refine its coordinates, and finally keep only the best non-overlapping boxes with their class labels. This step-by-step process is what lets the model localize objects accurately in images.
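
The "refine box coordinates" step in stage 5 is not specified further here. One common way to do it, used in the R-CNN family of detectors, is to predict four offsets per proposal: a shift of the box center relative to the box size, and a log-scale change of width and height. The sketch below assumes that parameterization; the function name, the proposal, and the offset values are hypothetical.

```python
import math


def refine_box(proposal, deltas):
    """Apply (dx, dy, dw, dh) offsets to an [x1, y1, x2, y2] proposal.

    Assumes the R-CNN style parameterization: dx and dy shift the center by a
    fraction of the proposal's width and height, dw and dh scale the size by exp().
    """
    x1, y1, x2, y2 = proposal
    w = x2 - x1
    h = y2 - y1
    cx = x1 + 0.5 * w
    cy = y1 + 0.5 * h

    dx, dy, dw, dh = deltas
    new_cx = cx + dx * w
    new_cy = cy + dy * h
    new_w = w * math.exp(dw)
    new_h = h * math.exp(dh)

    return [
        new_cx - 0.5 * new_w,
        new_cy - 0.5 * new_h,
        new_cx + 0.5 * new_w,
        new_cy + 0.5 * new_h,
    ]


# Hypothetical proposal and predicted offsets.
proposal = [40, 50, 200, 210]
deltas = (0.125, 0.125, 0.0, 0.0)
refined = refine_box(proposal, deltas)
print(refined)
```

With these offsets the 160 x 160 proposal keeps its size and moves 20 pixels right and 20 pixels down. Predicting relative offsets instead of raw coordinates keeps the targets on a similar scale for small and large boxes.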