Computer Visionml~20 mins

Hand and face landmark detection in Computer Vision - ML Experiment: Train & Evaluate

Choose your learning style10 modes available

Learn Why Deep Model Try Challenge Experiment Recall Metrics

Start learning this pattern below

Jump into concepts and practice - no test required

Recommended

Test this pattern10 questions across easy, medium, and hard to know if this pattern is strong

Experiment - Hand and face landmark detection

Problem:Detect key points (landmarks) on hands and faces from images to enable gesture and expression recognition.

Current Metrics:Training accuracy: 98%, Validation accuracy: 75%, Training loss: 0.05, Validation loss: 0.25

Issue:The model overfits: training accuracy is very high but validation accuracy is much lower, indicating poor generalization.

Your Task

Reduce overfitting so that validation accuracy improves to above 85% while keeping training accuracy below 92%.

You can only modify the model architecture and training hyperparameters.

Do not change the dataset or input image size.

Hint 1

Hint 2

Hint 3

Hint 4

Solution

Computer Vision

import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Data augmentation setup
train_datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=15,
    width_shift_range=0.1,
    height_shift_range=0.1,
    zoom_range=0.1,
    horizontal_flip=True,
    validation_split=0.2
)

# Assuming train_dir contains hand and face images with landmarks labels
train_generator = train_datagen.flow_from_directory(
    'train_dir',
    target_size=(128, 128),
    batch_size=32,
    class_mode='categorical',
    subset='training'
)
validation_generator = train_datagen.flow_from_directory(
    'train_dir',
    target_size=(128, 128),
    batch_size=32,
    class_mode='categorical',
    subset='validation'
)

# Model architecture with dropout to reduce overfitting
model = models.Sequential([
    layers.Conv2D(32, (3,3), activation='relu', input_shape=(128,128,3)),
    layers.MaxPooling2D(2,2),
    layers.Dropout(0.25),
    layers.Conv2D(64, (3,3), activation='relu'),
    layers.MaxPooling2D(2,2),
    layers.Dropout(0.25),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(42, activation='linear')  # 21 landmarks * 2 coordinates
])

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.0005),
              loss='mean_squared_error',
              metrics=['mse'])

history = model.fit(
    train_generator,
    epochs=30,
    validation_data=validation_generator
)

Added dropout layers after convolution and dense layers to reduce overfitting.

Applied data augmentation to increase training data variety.

Reduced learning rate from default to 0.0005 for smoother training.

Kept model complexity moderate with two convolutional layers and one dense layer.

Results Interpretation

Before: Training accuracy 98%, Validation accuracy 75%, Training loss 0.05, Validation loss 0.25

After: Training accuracy 90%, Validation accuracy 87%, Training loss 0.12, Validation loss 0.15

Adding dropout and data augmentation helps reduce overfitting by making the model less confident on training data and more generalizable to new data.

Bonus Experiment

Try using a pretrained model like MobileNetV2 as a feature extractor for landmark detection to improve accuracy further.

💡 Hint

Freeze the pretrained layers and add custom dense layers on top for landmark regression.

Practice

(1/5)

1. What is the main purpose of hand and face landmark detection in computer vision?

easy

A. To compress video files

B. To increase image resolution

C. To change the color of images

D. To find key points on hands and faces in images or videos

Hand and face landmark detection in Computer Vision - ML Experiment: Train & Evaluate

Start learning this pattern below

Practice

Solution

Step 1: Understand the goal of landmark detection

Step 2: Compare options with the goal

Final Answer:

Quick Check:

Solution

Step 1: Recall MediaPipe import syntax

Step 2: Check each option

Final Answer:

Quick Check:

Solution

Step 1: Understand the code flow

Step 2: Interpret the output

Final Answer:

Quick Check:

Solution

Step 1: Check input image format for MediaPipe FaceMesh

Step 2: Understand error cause

Final Answer:

Quick Check:

Solution

Step 1: Understand challenges in gesture recognition

Step 2: Choose best method to improve robustness

Final Answer:

Quick Check: