Face landmark detection in Computer Vision - Model Pipeline Trace

Model Pipeline - Face landmark detection

This pipeline detects key points on a face, like eyes, nose, and mouth corners. It helps computers understand face shapes and expressions.

Data Flow - 4 Stages

1. Input Image

1 image x 480 x 480 x 3 channels → Raw face image in color → 1 image x 480 x 480 x 3 channels

A photo of a person's face with eyes, nose, mouth visible

↓

2. Preprocessing

1 image x 480 x 480 x 3 channels → Resize image to 128 x 128 and normalize pixel values to the range 0-1 → 1 image x 128 x 128 x 3 channels

Resized and scaled face image ready for model input
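
The preprocessing step can be sketched as follows. This is a minimal illustration, not the pipeline's actual code: the function name `preprocess`, the use of NumPy, and nearest-neighbor resizing are all assumptions, since the source only states the target size (128 x 128) and value range (0-1).

```python
import numpy as np

def preprocess(image, size=128):
    """Resize an H x W x 3 uint8 image to size x size and scale it to [0, 1].

    Nearest-neighbor sampling is an assumption; the source does not say
    which interpolation method is used.
    """
    h, w = image.shape[:2]
    # For each target row/column, pick the matching source row/column.
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    resized = image[rows[:, None], cols[None, :]]
    # Convert 0-255 integers to 0-1 floats.
    scaled = resized.astype(np.float32) / 255.0
    # Add the batch dimension: (1, size, size, 3).
    return scaled[None, ...]

# Stand-in for a 480 x 480 color photo (random pixels, for shape checking only).
image = np.random.randint(0, 256, size=(480, 480, 3), dtype=np.uint8)
batch = preprocess(image)
print(batch.shape)  # (1, 128, 128, 3)
```

A production pipeline would more likely use bilinear interpolation from an image library, but the input and output shapes would be the same.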

↓

3. Feature Extraction

1 image x 128 x 128 x 3 channels → Convolutional layers extract face features → 1 tensor x 32 x 32 x 64 channels

Feature map highlighting edges and textures of face parts
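
The source does not say which layers turn a 128 x 128 x 3 image into a 32 x 32 x 64 feature map. One plausible configuration, assumed here purely for illustration, is two 3 x 3 convolutions with stride 2 and padding 1, producing 32 and then 64 channels. The sketch below only traces the shapes with the standard convolution output-size formula; the helper names are hypothetical.

```python
def conv_output_size(n, kernel=3, stride=2, padding=1):
    """Spatial output size of a convolution: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * padding - kernel) // stride + 1

def trace_shapes(size=128, channels=(32, 64)):
    """Return the (height, width, channels) shape after each assumed conv layer."""
    shapes = [(size, size, 3)]
    for out_channels in channels:
        size = conv_output_size(size)
        shapes.append((size, size, out_channels))
    return shapes

shapes = trace_shapes()
print(shapes)  # [(128, 128, 3), (64, 64, 32), (32, 32, 64)]
```

Each stride-2 layer halves the spatial resolution, so two of them account for the 4x reduction from 128 to 32.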

↓

4. Landmark Regression

1 tensor x 32 x 32 x 64 channels → Fully connected layers predict 68 (x, y) landmark coordinates → 1 vector x 136 values (68 points x 2 coordinates)

Coordinates like (34, 45) for left eye corner, (60, 80) for nose tip
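
To make the output layout concrete, here is a small sketch of how the 136-value vector could be handled. Two things are assumed rather than stated in the source: that the values are interleaved as x0, y0, x1, y1, and so on, and that coordinates are expressed in pixels of the 128 x 128 model input. The helper `to_original_pixels` is hypothetical.

```python
import numpy as np

# The 32 x 32 x 64 feature map is flattened before the fully connected layers.
features = np.zeros((1, 32, 32, 64), dtype=np.float32)
flat = features.reshape(1, -1)
print(flat.shape)  # (1, 65536)

# Stand-in for the model output: one vector of 136 values.
output = np.arange(136, dtype=np.float32).reshape(1, 136)
# Assumed layout: x0, y0, x1, y1, ... so each row becomes one (x, y) point.
landmarks = output.reshape(-1, 68, 2)
print(landmarks.shape)  # (1, 68, 2)

def to_original_pixels(points, model_size=128, original_size=480):
    """Map coordinates from the resized frame back to the original image."""
    return points * (original_size / model_size)

# The two example points from the text, scaled by 480 / 128 = 3.75.
example = np.array([[34.0, 45.0], [60.0, 80.0]])
print(to_original_pixels(example))  # [[127.5, 168.75], [225.0, 300.0]]
```

Scaling back matters whenever landmarks are drawn on the original 480 x 480 photo rather than on the resized copy.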

Training Trace - Epoch by Epoch


Loss
0.15 | *
0.12 |  *
0.09 |   *
0.06 |    *
0.03 |     *
      ----------------
       1 5 10 15 Epochs

Epoch	Loss ↓	Accuracy ↑	Observation
1	0.15	0.60	Model starts learning basic face features
5	0.08	0.75	Landmark predictions improve, loss decreases
10	0.04	0.85	Model accurately detects most landmarks
15	0.03	0.88	Training converges with low error
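
The table does not define how loss or accuracy is computed. A common choice for landmark regression, assumed here, is mean squared error for the loss and the fraction of landmarks that fall within a distance threshold of the ground truth for accuracy. The function names, the 0.05 threshold, and the sample points below are illustrative only, and coordinates are assumed to be normalized to the range 0-1.

```python
import numpy as np

def mse_loss(pred, target):
    """Mean squared error over all coordinate values."""
    return float(np.mean((pred - target) ** 2))

def landmark_accuracy(pred, target, threshold=0.05):
    """Fraction of landmarks whose Euclidean error is below the threshold."""
    distances = np.linalg.norm(pred - target, axis=-1)
    return float(np.mean(distances < threshold))

# Four made-up ground-truth landmarks in normalized coordinates.
target = np.array([[0.25, 0.5], [0.5, 0.5], [0.5, 0.75], [0.75, 0.25]])
# Predictions with small errors on three points and a large error on one.
pred = target + np.array([[0.01, 0.0], [0.0, 0.02], [0.1, 0.0], [0.0, 0.0]])

print(mse_loss(pred, target))
print(landmark_accuracy(pred, target))  # 0.75
```

Under these definitions, a falling loss and a rising accuracy both indicate that predicted points are moving closer to their true positions.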

Prediction Trace - 4 Layers

Layer 1: Input Image

Layer 2: Convolutional Layers

Layer 3: Fully Connected Layers

Layer 4: Output Coordinates

Model Quiz - 3 Questions

Test your understanding

What is the shape of the model output for face landmarks?

A. Vector with 136 values representing 68 (x, y) points

B. Image of size 128 x 128 with 3 channels

C. Tensor of size 32 x 32 x 64

D. Single scalar value representing face score

Key Insight

Face landmark detection models learn to find key face points by extracting features from images and regressing (x, y) coordinates from those features. In the training trace, loss falls from 0.15 to 0.03 and accuracy rises from 0.60 to 0.88 over 15 epochs, which means the model's predicted landmarks move steadily closer to their true positions.