
Face detection with deep learning in Computer Vision - Model Pipeline Trace


Model Pipeline - Face detection with deep learning

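This pipeline detects faces in images using a deep learning model. It resizes and normalizes an input image, extracts convolutional features, proposes candidate face boxes, classifies and refines those boxes, and removes duplicates. The model is first trained on labeled face images and then used to predict face locations in new images.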

Data Flow - 6 Stages

1. Input Image

1 image x 480 x 640 x 3 channels (height x width x channels) → Load and resize image to a fixed size → 1 image x 224 x 224 x 3 channels

A 640x480 photo of a person, resized to 224 x 224 pixels with three RGB color channels
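A minimal sketch of the resize step, assuming images are held as NumPy arrays in (batch, height, width, channels) order. It uses nearest-neighbour sampling for simplicity; `resize_nearest` is a hypothetical helper, and real pipelines usually use bilinear resizing from a library such as Pillow or OpenCV. Note that squashing a 640x480 photo into a square also changes its aspect ratio.

```python
import numpy as np

def resize_nearest(batch, out_h=224, out_w=224):
    """Nearest-neighbour resize of an (N, H, W, C) batch to (N, out_h, out_w, C)."""
    _, in_h, in_w, _ = batch.shape
    # Map each output row/column back to the source row/column it samples from.
    rows = np.arange(out_h) * in_h // out_h
    cols = np.arange(out_w) * in_w // out_w
    # rows[:, None] and cols[None, :] broadcast to an (out_h, out_w) grid of source pixels.
    return batch[:, rows[:, None], cols[None, :], :]

# A 640x480 RGB photo stored height-first as (1, 480, 640, 3), filled with dummy values.
photo = (np.arange(480 * 640 * 3) % 256).astype(np.uint8).reshape(1, 480, 640, 3)
resized = resize_nearest(photo)
print(resized.shape)  # (1, 224, 224, 3)
```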

↓

2. Preprocessing

1 image x 224 x 224 x 3 channels → Normalize pixel values to the range 0-1 → 1 image x 224 x 224 x 3 channels

Pixel values rescaled from 0-255 to 0.0-1.0 by dividing by 255
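The normalization step is a single division. A minimal sketch, assuming 8-bit NumPy images; `normalize` is a hypothetical helper name. Some pretrained backbones additionally subtract a per-channel mean and divide by a standard deviation, which this sketch leaves out.

```python
import numpy as np

def normalize(batch):
    """Rescale uint8 pixels from 0-255 to float32 values in 0.0-1.0."""
    return batch.astype(np.float32) / 255.0

pixels = np.array([[[[0, 128, 255]]]], dtype=np.uint8)  # shape (1, 1, 1, 3)
scaled = normalize(pixels)
# scaled[0, 0, 0] is approximately [0.0, 0.502, 1.0]
```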

↓

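3. Feature Extraction

1 image x 224 x 224 x 3 channels → Pass image through convolutional layers to extract features → 1 image x 14 x 14 x 256 feature maps

Early convolutional layers respond to simple patterns such as edges; deeper layers combine them into larger shapes, such as parts of a face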
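The page does not name the convolutional backbone, so the exact layers are unknown. Going from 224 x 224 to 14 x 14 implies an overall downsampling factor of 16 (224 / 16 = 14). A sketch of the shape arithmetic, assuming four 3 x 3 convolutions with stride 2 and padding 1; the 256 channels would come from the number of filters in the last layer. Both helper names are hypothetical.

```python
def conv_output_size(size, kernel=3, stride=2, padding=1):
    """Spatial size after one convolution: floor((size + 2*padding - kernel) / stride) + 1."""
    return (size + 2 * padding - kernel) // stride + 1

def trace_sizes(size, num_downsamples=4):
    """Spatial size after each of several stride-2 convolutions."""
    sizes = [size]
    for _ in range(num_downsamples):
        size = conv_output_size(size)
        sizes.append(size)
    return sizes

print(trace_sizes(224))  # [224, 112, 56, 28, 14]
```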

↓

4. Face Region Proposal

1 image x 14 x 14 x 256 feature maps → Generate candidate face boxes using a region proposal network (RPN) → 1 image x 300 candidate boxes x 4 coordinates

300 boxes, each given as [x_min, y_min, x_max, y_max]
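A sketch of where the 300 candidates could come from, assuming Faster R-CNN-style anchors: a fixed set of reference boxes centred on every cell of the 14 x 14 feature map. The stride of 16, the anchor sizes (32, 64, 128 px) and the aspect ratios (0.5, 1, 2) are illustrative assumptions, and the objectness scores are random placeholders standing in for the RPN's learned scoring head. A real RPN also applies learned box offsets and NMS before keeping the top 300; `generate_anchors` and `top_k_proposals` are hypothetical names.

```python
import numpy as np

def generate_anchors(feat_h, feat_w, stride=16, sizes=(32, 64, 128), ratios=(0.5, 1.0, 2.0)):
    """Return anchors as [x_min, y_min, x_max, y_max] in input-image pixels."""
    base = []
    for size in sizes:
        for ratio in ratios:
            # Keep the area near size**2 while making height / width equal to ratio.
            w = size / np.sqrt(ratio)
            h = size * np.sqrt(ratio)
            base.append([-w / 2, -h / 2, w / 2, h / 2])
    base = np.array(base)                                   # (A, 4)
    # Centre of each feature-map cell, mapped back to image coordinates.
    xs = (np.arange(feat_w) + 0.5) * stride
    ys = (np.arange(feat_h) + 0.5) * stride
    cx, cy = np.meshgrid(xs, ys)                            # each (feat_h, feat_w)
    centres = np.stack([cx, cy, cx, cy], axis=-1).reshape(-1, 1, 4)
    return (centres + base[None, :, :]).reshape(-1, 4)      # (feat_h * feat_w * A, 4)

def top_k_proposals(boxes, objectness, k=300):
    """Keep the k boxes with the highest objectness scores."""
    order = np.argsort(-objectness)[:k]
    return boxes[order]

anchors = generate_anchors(14, 14)                          # 14 * 14 * 9 = 1764 anchors
objectness = np.random.default_rng(0).random(len(anchors))  # placeholder scores
proposals = top_k_proposals(anchors, objectness)
print(proposals.shape)  # (300, 4)
```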

↓

5. Classification and Bounding Box Regression

1 image x 300 candidate boxes x features for each box (taken from the shared feature maps) → Classify each box as face or background and refine its coordinates → 1 image x 300 boxes with class scores and refined coordinates

Example: box 1 gets a face score of 0.95, and its coordinates are adjusted to fit the face more tightly
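A minimal sketch of this stage's two outputs for one box, assuming the classifier emits two logits per box (background, face) and the regressor emits offsets in the common R-CNN-style (dx, dy, dw, dh) parameterization: shifts relative to box width and height, and log-scale size changes. The logits and offsets below are made-up values chosen so the face score comes out at 0.95; `softmax` and `apply_deltas` are hypothetical helpers.

```python
import numpy as np

def softmax(logits):
    """Turn per-box logits into probabilities along the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def apply_deltas(boxes, deltas):
    """Refine [x_min, y_min, x_max, y_max] boxes with (dx, dy, dw, dh) offsets."""
    w = boxes[:, 2] - boxes[:, 0]
    h = boxes[:, 3] - boxes[:, 1]
    cx = boxes[:, 0] + 0.5 * w
    cy = boxes[:, 1] + 0.5 * h
    dx, dy, dw, dh = deltas.T
    new_cx, new_cy = cx + dx * w, cy + dy * h        # shift the centre
    new_w, new_h = w * np.exp(dw), h * np.exp(dh)    # rescale width and height
    return np.stack([new_cx - new_w / 2, new_cy - new_h / 2,
                     new_cx + new_w / 2, new_cy + new_h / 2], axis=1)

logits = np.array([[0.0, np.log(19.0)]])             # columns: background, face
scores = softmax(logits)                             # face probability 19 / 20 = 0.95
box = np.array([[100.0, 100.0, 200.0, 200.0]])
refined = apply_deltas(box, np.array([[0.1, 0.0, 0.0, 0.0]]))
print(refined)  # shifted right by 0.1 * width: [[110. 100. 210. 200.]]
```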

↓

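6. Non-Maximum Suppression (NMS)

1 image x 300 boxes with scores → Remove overlapping boxes, keeping the highest-scoring detection for each face → 1 image x 5 final face boxes (in this example)

5 boxes remain: each is the highest-scoring box in its group, and no two overlap by more than the overlap (IoU) threshold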
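A minimal sketch of greedy NMS, assuming boxes in [x_min, y_min, x_max, y_max] format and overlap measured as intersection-over-union (IoU). The 0.5 threshold is an illustrative assumption; detectors usually also drop boxes below a score threshold before this step. `iou` and `nms` are hypothetical helper names.

```python
import numpy as np

def iou(box, boxes):
    """Intersection-over-union between one box and an array of boxes."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def nms(boxes, scores, iou_threshold=0.5):
    """Greedily keep the best box, drop boxes that overlap it too much, repeat."""
    order = np.argsort(-scores)              # indices from highest to lowest score
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        overlaps = iou(boxes[best], boxes[rest])
        order = rest[overlaps <= iou_threshold]
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))  # [0, 2]: box 1 overlaps box 0 with IoU 81/119 ≈ 0.68
```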

Training Trace - Epoch by Epoch


[Chart: training loss and accuracy over epochs 1-5]

Epoch	Loss ↓	Accuracy ↑	Observation
1	1.2	0.60	Model starts learning basic face features
2	0.9	0.72	Loss decreases as model improves face detection
3	0.7	0.80	Model learns better bounding box predictions
4	0.5	0.87	Face classification accuracy improves
5	0.4	0.91	Loss is still falling, though by less than in earlier epochs; accuracy reaches 0.91
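The page does not say which loss the Loss column tracks. For a two-stage detector like the one traced here, a common choice (assumed in this sketch) is a multi-task loss: cross-entropy on the face/background classification plus a smooth L1 penalty on the box offsets, counted only for proposals labeled as faces. The function names and the two-proposal batch are hypothetical.

```python
import numpy as np

def cross_entropy(probs, labels):
    """Mean negative log-probability of the correct class (0 = background, 1 = face)."""
    eps = 1e-12  # avoid log(0)
    return float(-np.mean(np.log(probs[np.arange(len(labels)), labels] + eps)))

def smooth_l1(pred, target, beta=1.0):
    """Quadratic for small errors, linear for large ones; summed per box, averaged over boxes."""
    diff = np.abs(pred - target)
    loss = np.where(diff < beta, 0.5 * diff ** 2 / beta, diff - 0.5 * beta)
    return float(loss.sum(axis=1).mean())

def detection_loss(class_probs, labels, pred_deltas, target_deltas, box_weight=1.0):
    """Classification loss on all proposals plus box loss on face proposals only."""
    cls = cross_entropy(class_probs, labels)
    face = labels == 1
    box = smooth_l1(pred_deltas[face], target_deltas[face]) if face.any() else 0.0
    return cls + box_weight * box

probs = np.array([[0.1, 0.9], [0.8, 0.2]])  # predicted (background, face) probabilities
labels = np.array([1, 0])                    # proposal 0 is a face, proposal 1 is background
pred = np.array([[0.1, 0.0, 0.0, 0.0], [0.5, 0.5, 0.0, 0.0]])
target = np.zeros((2, 4))
loss = detection_loss(probs, labels, pred, target)
print(round(loss, 4))  # cross-entropy ≈ 0.1643 plus box loss 0.005
```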

Prediction Trace - 6 Stages

Stage 1: Input Image

Stage 2: Preprocessing

Stage 3: Feature Extraction (Conv Layers)

Stage 4: Region Proposal Network

Stage 5: Classification and Box Refinement

Stage 6: Non-Maximum Suppression

Model Quiz

Test your understanding

What is the purpose of the Non-Maximum Suppression step?

A. To resize the input image to a fixed size

B. To normalize pixel values between 0 and 1

C. To remove overlapping boxes and keep the best face detections

D. To extract features from the image using convolution

Key Insight

This visualization shows how a deep learning model learns to detect faces by extracting features, proposing candidate face regions, and refining its predictions. During training, accuracy rises as the loss falls, and post-processing with Non-Maximum Suppression removes duplicate boxes so that each face is reported once.