Custom object detection dataset in Computer Vision - Model Pipeline Trace

Model Pipeline - Custom object detection dataset

This pipeline traces how a custom object detection dataset is prepared and used to train a model, and how the trained model then predicts objects in new images.

Data Flow - 5 Stages

1. Raw dataset

1000 images of varying sizes → Collect images and bounding box annotations with labels → 1000 images with bounding boxes and labels

Example: Image1 has a cat at (50,30,150,130) and a dog at (200,100,300,250), assuming (x_min, y_min, x_max, y_max) pixel coordinates.

↓

2. Preprocessing

1000 images with bounding boxes → Resize images to 224x224, normalize pixel values, and scale bounding boxes by the same factors → 1000 images of 224x224 x 3 channels with normalized pixel values and updated bounding boxes

Example: Image1 resized to 224x224; the cat box (50,30,150,130) becomes (20,12,60,52), a scale factor of 0.4 on both axes, which corresponds to a 560x560 original.
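The box adjustment in this stage is a per-axis rescale. The sketch below is a minimal illustration, not the pipeline's actual code: the helper name `scale_box` is hypothetical, boxes are assumed to be (x_min, y_min, x_max, y_max), and the 560x560 original size is inferred from the example numbers rather than stated in the source.

```python
def scale_box(box, orig_size, new_size):
    """Scale an (x_min, y_min, x_max, y_max) box to a resized image.

    orig_size and new_size are (width, height) tuples.
    """
    x_min, y_min, x_max, y_max = box
    sx = new_size[0] / orig_size[0]  # horizontal scale factor
    sy = new_size[1] / orig_size[1]  # vertical scale factor
    return (x_min * sx, y_min * sy, x_max * sx, y_max * sy)


# Assumption: Image1 was 560x560 before resizing (implied by the 0.4 scale).
cat = scale_box((50, 30, 150, 130), orig_size=(560, 560), new_size=(224, 224))
dog = scale_box((200, 100, 300, 250), orig_size=(560, 560), new_size=(224, 224))

print([round(v) for v in cat])  # [20, 12, 60, 52]
print([round(v) for v in dog])  # [80, 40, 120, 100]
```

When the original image is not square, `sx` and `sy` differ, so each axis has to be scaled with its own factor.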

↓

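3. Data augmentation

1000 images of 224x224 with bounding boxes → Apply random flips and color jitter; flips require updating the bounding boxes, color jitter does not → 1000 augmented images of 224x224 with updated bounding boxes

Example: Image1 flipped horizontally; each box's x-coordinates are mirrored about the image width (x becomes 224 - x) and the y-coordinates stay the same.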
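A horizontal flip mirrors the x-coordinates and leaves y untouched. The following is a small sketch under the assumption that boxes are (x_min, y_min, x_max, y_max) in continuous pixel coordinates; `hflip_box` is a hypothetical helper name, and some libraries use `width - 1 - x` for discrete pixel indices instead.

```python
def hflip_box(box, image_width):
    """Mirror an (x_min, y_min, x_max, y_max) box across the vertical centre line."""
    x_min, y_min, x_max, y_max = box
    # The old right edge becomes the new left edge, and vice versa.
    return (image_width - x_max, y_min, image_width - x_min, y_max)


flipped = hflip_box((20, 12, 60, 52), image_width=224)
print(flipped)  # (164, 12, 204, 52)
```

Flipping twice returns the original box, which makes a convenient sanity check for augmentation code.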

↓

4. Train/test split

1000 augmented images with bounding boxes → Split dataset into 800 training and 200 testing images → 800 training images and 200 testing images, each with bounding boxes

Training set: 800 images, Testing set: 200 images
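An 80/20 split can be sketched with the standard library alone. The function name, the fixed seed, and the `image_ids` list are illustrative assumptions; the source does not say how the split is performed.

```python
import random


def train_test_split(samples, test_fraction=0.2, seed=0):
    """Shuffle sample indices reproducibly and split them into train and test lists."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)  # seeded so the split is repeatable
    n_test = int(round(len(samples) * test_fraction))
    test = [samples[i] for i in indices[:n_test]]
    train = [samples[i] for i in indices[n_test:]]
    return train, test


# Hypothetical image identifiers standing in for the 1000 annotated images.
image_ids = [f"image_{i:04d}" for i in range(1000)]
train_ids, test_ids = train_test_split(image_ids)
print(len(train_ids), len(test_ids))  # 800 200
```

Splitting by image identifier keeps every box of an image on the same side of the split.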

↓

5. Model input preparation

800 training images of 224x224 with bounding boxes → Convert bounding boxes and labels into the model-specific tensor format → 800 training samples with image tensors and target tensors

Sample: image tensor of shape (3,224,224); target dict with a boxes tensor of shape (N,4) and a labels tensor of shape (N,), where N is the number of objects in the image
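The sample layout can be illustrated without a deep learning framework. In this sketch, plain nested lists stand in for tensors, `make_sample` and `shape` are hypothetical helpers, and the `class_to_id` mapping is an assumption (many detectors reserve 0 for background). A real pipeline would convert these lists to framework tensors.

```python
def shape(nested):
    """Return the shape of a rectangular nested list, e.g. [[1, 2], [3, 4]] -> (2, 2)."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0] if nested else None
    return tuple(dims)


def make_sample(image_chw, annotations, class_to_id):
    """Pair a channels-first image with a target dict of boxes and integer labels."""
    boxes = [[float(v) for v in a["box"]] for a in annotations]  # shape (N, 4)
    labels = [class_to_id[a["label"]] for a in annotations]      # shape (N,)
    return image_chw, {"boxes": boxes, "labels": labels}


class_to_id = {"cat": 1, "dog": 2}  # assumed mapping, not from the source
image = [[[0.0] * 224 for _ in range(224)] for _ in range(3)]  # dummy (3, 224, 224) image
annotations = [
    {"label": "cat", "box": (20, 12, 60, 52)},
    {"label": "dog", "box": (80, 40, 120, 100)},  # illustrative dog box
]

image, target = make_sample(image, annotations, class_to_id)
print(shape(image), shape(target["boxes"]), shape(target["labels"]))
# (3, 224, 224) (2, 4) (2,)
```

Here N is 2 because the image contains two annotated objects; N varies from image to image, which is why targets are usually kept as a list of per-image dicts rather than one stacked array.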

Training Trace - Epoch by Epoch

Loss
2.5 |*****
2.0 |**** 
1.5 |***  
1.0 |**   
0.5 |*    
0.0 +-----
      1 2 3 4 5 Epochs

Epoch	Loss ↓	Accuracy ↑	Observation
1	2.5	0.15	High loss and low accuracy as model starts learning
2	1.8	0.35	Loss decreases, accuracy improves as model learns object features
3	1.2	0.55	Model better detects objects, bounding box predictions improve
4	0.9	0.70	Loss continues to decrease, accuracy rises steadily
5	0.7	0.78	Model converging, good detection performance

Prediction Trace - 6 Layers

Layer 1: Input image preprocessing

Layer 2: Feature extraction (CNN layers)

Layer 3: Region proposal network

Layer 4: Bounding box regression and classification

Layer 5: Non-maximum suppression

Layer 6: Output prediction
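Of the layers above, non-maximum suppression is the easiest to show in isolation. The sketch below is a greedy, class-agnostic version under assumed conventions: (x_min, y_min, x_max, y_max) boxes, an IoU threshold of 0.5, and made-up scores. The source does not specify which NMS variant or threshold the model uses.

```python
def iou(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes that overlap it too much."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep  # indices of kept boxes, highest score first


# Two near-duplicate detections of one object plus a separate detection.
boxes = [(20, 12, 60, 52), (22, 14, 62, 54), (80, 40, 120, 100)]
scores = [0.90, 0.80, 0.95]  # illustrative confidence scores
kept = nms(boxes, scores)
print(kept)  # [2, 0]
```

The second box overlaps the first with an IoU of about 0.82, so it is suppressed, while the third box does not overlap either and survives.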

Model Quiz - 3 Questions

Test your understanding

What happens to the bounding boxes during image resizing in preprocessing?

A. They are adjusted to match the new image size

B. They remain the same as in the original image

C. They are removed and recreated later

D. They are converted to grayscale

Key Insight

This visualization shows that careful dataset preparation, with bounding boxes kept in sync through every resize and flip, followed by step-by-step training, improves detection: over five epochs the loss falls from 2.5 to 0.7 and accuracy rises from 0.15 to 0.78.