Action recognition basics in Computer Vision - ML Experiment: Train & Evaluate

Experiment - Action recognition basics

Problem: We want to teach a computer to recognize simple actions like walking, running, and jumping from short video clips.

Current Metrics: Training accuracy: 95%, Validation accuracy: 65%

Issue: The model is overfitting. It performs very well on training data but poorly on new, unseen videos.

Your Task

Reduce overfitting so that validation accuracy improves to at least 80%, while keeping training accuracy below 90%.

You can only change the model architecture and training settings.

Do not add more data or change the dataset.

Keep the input video length and preprocessing the same.

Solution

import tensorflow as tf
from tensorflow.keras import layers, models

# Define a simple 3D CNN model for action recognition
model = models.Sequential([
    layers.Input(shape=(16, 64, 64, 3)),  # 16 frames of 64x64 RGB per clip
    layers.Conv3D(32, kernel_size=(3,3,3), activation='relu'),
    layers.BatchNormalization(),
    layers.MaxPooling3D(pool_size=(1,2,2)),
    layers.Dropout(0.3),

    layers.Conv3D(64, kernel_size=(3,3,3), activation='relu'),
    layers.BatchNormalization(),
    layers.MaxPooling3D(pool_size=(2,2,2)),
    layers.Dropout(0.4),

    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(3, activation='softmax')  # 3 classes: walking, running, jumping
])

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.0005),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Assume X_train, y_train, X_val, y_val are preloaded video data and labels
# model.fit(X_train, y_train, epochs=30, batch_size=16, validation_data=(X_val, y_val))
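
To see where most of this network's capacity sits, it can help to trace tensor shapes through the layers by hand. The sketch below does that arithmetic in plain Python, assuming the Keras defaults used in the listing (`padding='valid'` and stride 1 for the convolutions, pooling stride equal to the pool size). The helper names are illustrative and not part of Keras.

```python
def conv3d_valid(shape, kernel, filters):
    """Output shape of a stride-1, 'valid'-padded Conv3D on (frames, height, width, channels)."""
    t, h, w, _ = shape
    kt, kh, kw = kernel
    return (t - kt + 1, h - kh + 1, w - kw + 1, filters)

def maxpool3d(shape, pool):
    """Output shape of MaxPooling3D with stride equal to the pool size."""
    t, h, w, c = shape
    pt, ph, pw = pool
    return (t // pt, h // ph, w // pw, c)

shape = (16, 64, 64, 3)                      # 16 frames of 64x64 RGB
shape = conv3d_valid(shape, (3, 3, 3), 32)   # (14, 62, 62, 32)
shape = maxpool3d(shape, (1, 2, 2))          # (14, 31, 31, 32)
shape = conv3d_valid(shape, (3, 3, 3), 64)   # (12, 29, 29, 64)
shape = maxpool3d(shape, (2, 2, 2))          # (6, 14, 14, 64)

flat_features = shape[0] * shape[1] * shape[2] * shape[3]
dense_params = flat_features * 128 + 128     # weights plus biases of Dense(128)

print(shape, flat_features, dense_params)    # (6, 14, 14, 64) 75264 9633920
```

Under these assumptions the first Dense layer alone holds about 9.6 million parameters, which is plausibly why the heaviest dropout rate (0.5) is applied around it.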

Added BatchNormalization layers after Conv3D layers to stabilize and speed up training.

Added Dropout layers with rates 0.3, 0.4, and 0.5 to reduce overfitting.

Reduced the learning rate from 0.001 to 0.0005 so that weight updates are smaller and training is more stable.

Kept the architecture small (two convolutional blocks and one hidden dense layer) to limit model capacity.
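
For intuition about what the Dropout layers in this list do, here is a minimal pure-Python sketch of inverted dropout, the scheme Keras uses. During training each activation is zeroed with probability `rate` and the survivors are scaled by `1 / (1 - rate)`; at inference the input passes through unchanged. The function name and the `rng` parameter are illustrative, not a Keras API.

```python
import random

def dropout(values, rate, training=True, rng=None):
    """Inverted dropout on a list of floats."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(values)                  # inference: pass through unchanged
    rng = rng or random.Random()
    scale = 1.0 / (1.0 - rate)               # keeps the expected activation the same
    return [v * scale if rng.random() >= rate else 0.0 for v in values]

activations = [1.0] * 10
train_out = dropout(activations, rate=0.5, rng=random.Random(0))
infer_out = dropout(activations, rate=0.5, training=False)

print(train_out)  # each entry is either 0.0 or 2.0
print(infer_out)  # identical to the input
```

Because a different random subset of units is silenced on every step, the network cannot rely on any single feature, which is the regularizing effect the exercise is after.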

Results Interpretation

Before: Training accuracy was 95%, validation accuracy was 65%, showing strong overfitting.

After: Training accuracy dropped to 88%, validation accuracy improved to 82%, indicating better generalization.

Adding dropout and batch normalization, lowering the learning rate, and keeping the model small together reduce overfitting and improve the model's ability to recognize actions in new videos.
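
The before-and-after comparison comes down to the gap between training and validation accuracy. As a small worked check that uses only the figures reported in this section (the helper names are illustrative):

```python
def generalization_gap(train_acc, val_acc):
    """Training minus validation accuracy, in percentage points."""
    return train_acc - val_acc

def meets_targets(train_acc, val_acc, max_train=90, min_val=80):
    """Task targets: validation accuracy of at least 80%, training accuracy below 90%."""
    return val_acc >= min_val and train_acc < max_train

before = (95, 65)  # (training %, validation %) before the changes
after = (88, 82)   # (training %, validation %) after the changes

print(generalization_gap(*before), meets_targets(*before))  # 30 False
print(generalization_gap(*after), meets_targets(*after))    # 6 True
```

The gap shrinks from 30 to 6 percentage points, and both task targets are met.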

Bonus Experiment

Try using a pretrained 3D CNN model like I3D or C3D and fine-tune it on the same dataset to improve accuracy further.

💡 Hint

Use transfer learning by freezing early layers and training only the last few layers with a low learning rate.
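
A hedged sketch of the freezing step: in Keras, a layer is excluded from weight updates by setting its `trainable` attribute to `False` before the model is compiled. The helper below is duck-typed so that it runs without downloading any pretrained weights. `StandInLayer` and `freeze_all_but_last` are hypothetical names, and with a real pretrained model you would pass `model.layers` instead of the stand-in list.

```python
class StandInLayer:
    """Minimal stand-in for a Keras layer: just a name and a trainable flag."""
    def __init__(self, name):
        self.name = name
        self.trainable = True

def freeze_all_but_last(layers, n_trainable):
    """Freeze every layer except the last `n_trainable`, which stay trainable."""
    cutoff = max(len(layers) - n_trainable, 0)
    for index, layer in enumerate(layers):
        layer.trainable = index >= cutoff
    return layers

# Hypothetical pretrained backbone (four blocks) plus a new classification head.
backbone = [StandInLayer(f"conv_block_{i}") for i in range(1, 5)]
head = [StandInLayer("classifier")]
layers = freeze_all_but_last(backbone + head, n_trainable=2)

for layer in layers:
    print(layer.name, layer.trainable)
```

After freezing, the model would be compiled again with a small learning rate, for example `Adam(learning_rate=1e-4)`; that value is an assumption, not something the exercise specifies.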

Practice

1. What is the main goal of action recognition in computer vision?

easy

A. To generate captions for images

B. To detect objects in images

C. To enhance image resolution

D. To identify human movements in videos

Solution

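Step 1: Understand the purpose of action recognition

Action recognition identifies which action is being performed, typically by a person, across the frames of a video.

Step 2: Compare with other tasks

Generating captions is image captioning, detecting objects is object detection, and enhancing resolution is super-resolution. None of these classifies movement over time.

Final Answer: D. To identify human movements in videos

Quick Check: Action recognition means recognizing movements in video, which matches option D.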
