
Action recognition basics in Computer Vision - ML Experiment: Train & Evaluate

Experiment - Action recognition basics
Problem: We want to teach a computer to recognize simple actions such as walking, running, and jumping from short video clips.
Current metrics: training accuracy 95%, validation accuracy 65%.
Issue: The model is overfitting. It performs very well on training data but poorly on new, unseen videos.
Your Task
Reduce overfitting so that validation accuracy improves to at least 80%, while keeping training accuracy below 90%.
You can only change the model architecture and training settings.
Do not add more data or change the dataset.
Keep the input video length and preprocessing the same.
Solution
import tensorflow as tf
from tensorflow.keras import layers, models

# Define a simple 3D CNN model for action recognition
model = models.Sequential([
    layers.Input(shape=(16, 64, 64, 3)),  # 16 frames of 64x64 RGB video
    layers.Conv3D(32, kernel_size=(3,3,3), activation='relu'),
    layers.BatchNormalization(),
    layers.MaxPooling3D(pool_size=(1,2,2)),
    layers.Dropout(0.3),

    layers.Conv3D(64, kernel_size=(3,3,3), activation='relu'),
    layers.BatchNormalization(),
    layers.MaxPooling3D(pool_size=(2,2,2)),
    layers.Dropout(0.4),

    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(3, activation='softmax')  # 3 classes: walking, running, jumping
])

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.0005),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

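# Assume X_train and X_val are preloaded clips of shape (num_clips, 16, 64, 64, 3),
# and y_train and y_val are integer labels 0-2 (as sparse_categorical_crossentropy expects).
# model.fit(X_train, y_train, epochs=30, batch_size=16, validation_data=(X_val, y_val))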
Added BatchNormalization layers after Conv3D layers to stabilize and speed up training.
Added Dropout layers with rates 0.3, 0.4, and 0.5 to reduce overfitting.
Reduced learning rate from 0.001 to 0.0005 for smoother learning.
Kept the architecture shallow, with two Conv3D blocks and a single hidden Dense layer, to limit model capacity.
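To make "simple" concrete, the sketch below traces each layer's output shape (batch dimension omitted) through the solution model by hand. It also counts the weights of the Conv3D and Dense layers. The shape arithmetic assumes the Keras defaults the solution relies on: 'valid' padding, stride 1 for Conv3D, and a pooling stride equal to pool_size. The helper functions are hypothetical illustrations. In practice, `model.summary()` reports the same information directly, and the BatchNormalization layers add a few hundred more parameters.

```python
# Trace output shapes through the solution model by hand.
# Assumptions (Keras defaults used in the solution): 'valid' padding,
# stride 1 for Conv3D, and pooling stride equal to pool_size.

def conv3d_valid(shape, filters, kernel=(3, 3, 3)):
    """Output shape of a stride-1, 'valid' Conv3D on (T, H, W, C)."""
    t, h, w, _ = shape
    kt, kh, kw = kernel
    return (t - kt + 1, h - kh + 1, w - kw + 1, filters)

def maxpool3d(shape, pool):
    """Output shape of MaxPooling3D with strides == pool_size, 'valid' padding."""
    t, h, w, c = shape
    pt, ph, pw = pool
    return (t // pt, h // ph, w // pw, c)

def conv3d_params(in_channels, filters, kernel=(3, 3, 3)):
    kt, kh, kw = kernel
    return kt * kh * kw * in_channels * filters + filters  # weights + biases

def dense_params(in_units, units):
    return in_units * units + units  # weights + biases

shape = (16, 64, 64, 3)
shape = conv3d_valid(shape, 32)        # (14, 62, 62, 32)
shape = maxpool3d(shape, (1, 2, 2))    # (14, 31, 31, 32)
shape = conv3d_valid(shape, 64)        # (12, 29, 29, 64)
shape = maxpool3d(shape, (2, 2, 2))    # (6, 14, 14, 64)

flat = shape[0] * shape[1] * shape[2] * shape[3]  # features after Flatten

params = {
    "conv1": conv3d_params(3, 32),
    "conv2": conv3d_params(32, 64),
    "dense128": dense_params(flat, 128),
    "output": dense_params(128, 3),
}
total = sum(params.values())
for name, n in params.items():
    print(f"{name:>9}: {n:>10,} params ({n / total:.1%})")
```

Running it shows that the Dense(128) layer after Flatten holds over 99% of the roughly 9.7 million Conv/Dense parameters. If further capacity reduction were needed, one option worth trying is to replace Flatten with `layers.GlobalAveragePooling3D()`.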
Results Interpretation

Before: Training accuracy was 95%, validation accuracy was 65%, showing strong overfitting.

After: Training accuracy dropped to 88% and validation accuracy rose to 82%. This shrinks the train/validation gap from 30 to 6 percentage points and meets both targets, indicating better generalization.
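The task's success criteria can be written as a small check. This is a minimal sketch: the function names are hypothetical, and the inputs are the before/after numbers reported above. With a Keras `History` object returned by `model.fit`, the inputs would typically be `history.history['accuracy'][-1]` and `history.history['val_accuracy'][-1]`.

```python
# Check a run against the task's targets: validation accuracy of at least 80%
# while training accuracy stays below 90%.

def meets_task_criteria(train_acc: float, val_acc: float,
                        min_val: float = 0.80, max_train: float = 0.90) -> bool:
    """Return True if the run satisfies both targets from the task statement."""
    return val_acc >= min_val and train_acc < max_train

def generalization_gap(train_acc: float, val_acc: float) -> float:
    """Train minus validation accuracy, in percentage points."""
    return round((train_acc - val_acc) * 100, 1)

runs = {"before": (0.95, 0.65), "after": (0.88, 0.82)}
for name, (train_acc, val_acc) in runs.items():
    print(name, meets_task_criteria(train_acc, val_acc),
          generalization_gap(train_acc, val_acc))
```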

Together, adding dropout and batch normalization, lowering the learning rate, and keeping the model simple reduced overfitting and improved the model's ability to recognize actions in new videos.
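To show what the Dropout layers do during training, here is a NumPy sketch of inverted dropout. This is the behavior the Keras `Dropout` layer documents: each unit is zeroed with probability `rate`, kept units are scaled by 1 / (1 - rate), and nothing changes at inference. The function name and the all-ones test input are illustrative assumptions, not code from the solution.

```python
import numpy as np

def inverted_dropout(x: np.ndarray, rate: float, training: bool,
                     rng: np.random.Generator) -> np.ndarray:
    """Zero each unit with probability `rate` during training and scale the
    survivors by 1 / (1 - rate) so the expected activation is unchanged.
    At inference time the input passes through untouched."""
    if not training or rate == 0.0:
        return x
    keep_mask = rng.random(x.shape) >= rate  # True with probability 1 - rate
    return x * keep_mask / (1.0 - rate)

rng = np.random.default_rng(0)
activations = np.ones((1000, 128))  # e.g. outputs of the Dense(128) layer
dropped = inverted_dropout(activations, rate=0.5, training=True, rng=rng)

print("fraction zeroed:", np.mean(dropped == 0))   # close to 0.5
print("mean activation:", dropped.mean())          # close to 1.0
print("inference unchanged:", np.array_equal(
    inverted_dropout(activations, 0.5, training=False, rng=rng), activations))
```

Each training step sees a different random subnetwork. As a result, no single unit can memorize training clips on its own, which is why the gap between training and validation accuracy tends to shrink.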
Bonus Experiment
Try taking a pretrained 3D CNN such as I3D or C3D and fine-tuning it on the same dataset to improve accuracy further.
💡 Hint
Use transfer learning by freezing early layers and training only the last few layers with a low learning rate.
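Here is a minimal Keras sketch of the freezing pattern described in the hint. Keras does not bundle I3D or C3D, so `build_backbone` below is a hypothetical, randomly initialized stand-in. With a real pretrained model, you would load its weights instead and keep the same freezing logic. The layer names, the number of unfrozen layers, and the learning rate of 1e-5 are illustrative choices, not tuned values.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_backbone() -> tf.keras.Model:
    # Hypothetical stand-in for a pretrained 3D CNN (weights are random here).
    return models.Sequential([
        layers.Input(shape=(16, 64, 64, 3)),
        layers.Conv3D(32, (3, 3, 3), activation='relu', name='early_conv1'),
        layers.MaxPooling3D((1, 2, 2)),
        layers.Conv3D(64, (3, 3, 3), activation='relu', name='early_conv2'),
        layers.MaxPooling3D((2, 2, 2)),
        layers.Conv3D(128, (3, 3, 3), activation='relu', name='late_conv'),
        layers.GlobalAveragePooling3D(),
    ], name='backbone')

def build_finetune_model(backbone: tf.keras.Model, num_trainable: int,
                         num_classes: int = 3) -> tf.keras.Model:
    # Freeze every backbone layer except the last `num_trainable` ones.
    cut = len(backbone.layers) - num_trainable
    for layer in backbone.layers[:cut]:
        layer.trainable = False
    for layer in backbone.layers[cut:]:
        layer.trainable = True

    inputs = layers.Input(shape=(16, 64, 64, 3))
    # training=False keeps any BatchNormalization layers in inference mode,
    # as the Keras transfer-learning guide recommends when fine-tuning.
    x = backbone(inputs, training=False)
    x = layers.Dropout(0.5)(x)
    outputs = layers.Dense(num_classes, activation='softmax')(x)  # new head
    model = models.Model(inputs, outputs)

    # A low learning rate so the unfrozen pretrained layers change slowly.
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model

backbone = build_backbone()
model = build_finetune_model(backbone, num_trainable=2)
```

In this sketch, only `late_conv` and the new Dense head receive gradient updates; the two early convolutions keep their (pretrained) weights. Unfreezing more layers gives the model more flexibility, but it also raises the risk of overfitting on a small dataset.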