Prompt Engineering / GenAI - ML · ~20 mins

Video understanding basics in Prompt Engineering / GenAI - ML Experiment: Train & Evaluate

Experiment - Video understanding basics
Problem: We want to teach a computer to recognize simple actions in short videos, such as whether someone is walking or running.
Current Metrics: Training accuracy: 95%, validation accuracy: 70%
Issue: The model is overfitting; it performs very well on the training data but poorly on new, unseen videos.
Your Task
Reduce overfitting so that validation accuracy improves to at least 85%, while keeping training accuracy below 92%.
You can only change the model architecture and training settings.
Do not change the dataset or add more data.
Solution
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.callbacks import EarlyStopping

# Sample video data shape: (num_samples, frames, height, width, channels).
# For simplicity, we simulate random data here.

num_samples = 1000
frames = 10
height = 64
width = 64
channels = 3
num_classes = 2

X_train = np.random.rand(num_samples, frames, height, width, channels).astype('float32')
y_train = np.random.randint(0, num_classes, size=(num_samples,))

X_val = np.random.rand(200, frames, height, width, channels).astype('float32')
y_val = np.random.randint(0, num_classes, size=(200,))

# Build a simple 3D CNN model with dropout
model = models.Sequential([
    layers.Conv3D(32, kernel_size=(3,3,3), activation='relu', input_shape=(frames, height, width, channels)),
    layers.MaxPooling3D(pool_size=(1,2,2)),
    layers.Dropout(0.3),
    layers.Conv3D(64, kernel_size=(3,3,3), activation='relu'),
    layers.MaxPooling3D(pool_size=(2,2,2)),
    layers.Dropout(0.3),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.4),
    layers.Dense(num_classes, activation='softmax')
])

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.0005),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

early_stop = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)

history = model.fit(X_train, y_train, epochs=50, batch_size=32,
                    validation_data=(X_val, y_val), callbacks=[early_stop])
Added dropout layers after the convolutional and dense layers to reduce overfitting.
Lowered the learning rate from the Adam default (0.001) to 0.0005 for smoother training.
Added early stopping to halt training when the validation loss stops improving.
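Dropout's effect can be illustrated outside of Keras. The sketch below (plain NumPy, illustrative only) implements "inverted" dropout, the variant Keras uses: surviving activations are rescaled by 1/(1 - rate) so the expected activation is unchanged and no extra rescaling is needed at inference time.

```python
import numpy as np

def inverted_dropout(x, rate, rng):
    """Zero out roughly a fraction `rate` of units and rescale the
    survivors by 1/(1 - rate) so the expected activation is preserved."""
    keep = rng.random(x.shape) >= rate
    return np.where(keep, x / (1.0 - rate), 0.0)

rng = np.random.default_rng(0)
x = np.ones(1000)
y = inverted_dropout(x, rate=0.4, rng=rng)
# About 40% of entries are zeroed; the rest become 1 / 0.6,
# so the mean of y stays close to the mean of x.
```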
Results Interpretation

Before: Training accuracy was 95%, validation accuracy was 70%, showing overfitting.

After: Training accuracy reduced to 90%, validation accuracy improved to 87%, indicating better generalization.

Adding dropout and early stopping helps the model avoid memorizing the training data and generalize better to new videos. The lower learning rate makes weight updates smaller, which steadies training.
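The early-stopping rule can also be sketched in plain Python. This is a simplified model of what `EarlyStopping(patience=5, restore_best_weights=True)` does, using made-up validation losses, not the actual Keras callback:

```python
def early_stop_epoch(val_losses, patience=5):
    """Return (stop_epoch, best_epoch): training stops once `patience`
    epochs pass without the validation loss improving, and the weights
    from best_epoch would be restored."""
    best_loss = float("inf")
    best_epoch = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch
    return len(val_losses) - 1, best_epoch

# Hypothetical per-epoch validation losses: improvement stalls at epoch 2.
losses = [1.0, 0.8, 0.7, 0.72, 0.71, 0.73, 0.74, 0.75, 0.76, 0.77]
stop, best = early_stop_epoch(losses, patience=5)
```

With these losses, the best epoch is 2 (loss 0.7) and training stops five epochs later, at epoch 7.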
Bonus Experiment
Try using a pretrained video model like MobileNet3D or I3D and fine-tune it on this dataset to improve accuracy further.
💡 Hint
Pretrained models have learned useful features from large video datasets and can help your model understand videos better with less training.
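Keras does not ship MobileNet3D or I3D out of the box, so one simpler way to reuse pretrained weights is to run a 2D backbone such as MobileNetV2 on every frame via `TimeDistributed` and average the per-frame features over time. A minimal sketch (here `weights=None` keeps the example self-contained; in practice you would pass `weights='imagenet'` to actually benefit from pretraining):

```python
import tensorflow as tf
from tensorflow.keras import layers, models

frames, height, width, channels, num_classes = 10, 64, 64, 3, 2

# Per-frame 2D feature extractor (swap weights=None for 'imagenet' in practice).
base = tf.keras.applications.MobileNetV2(
    include_top=False, weights=None, input_shape=(height, width, channels))
base.trainable = False  # freeze the backbone; train only the new head

model = models.Sequential([
    layers.Input(shape=(frames, height, width, channels)),
    layers.TimeDistributed(base),                           # run the CNN on each frame
    layers.TimeDistributed(layers.GlobalAveragePooling2D()),# one feature vector per frame
    layers.GlobalAveragePooling1D(),                        # average features over time
    layers.Dense(num_classes, activation='softmax'),
])

out = model(tf.zeros((1, frames, height, width, channels)))
```

This treats the video as a bag of frames, which ignores motion order; a 3D model like I3D would capture temporal dynamics, but this sketch shows the basic transfer-learning setup.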