
Video understanding basics in Prompt Engineering / GenAI - ML Experiment: Train & Evaluate


Experiment - Video understanding basics

Problem: We want to teach a computer to recognize simple actions in short videos, such as whether someone is walking or running.

Current Metrics: Training accuracy: 95%, Validation accuracy: 70%

Issue: The model is overfitting. It performs very well on the training data but poorly on new, unseen videos.
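The size of the problem is simply the gap between the two accuracies: 95% - 70% = 25 percentage points. A minimal sketch, assuming a Keras-style `history.history` dictionary with `'accuracy'` and `'val_accuracy'` lists (Keras records these when the model is compiled with `metrics=['accuracy']` and fit with validation data); the `generalization_gap` helper is hypothetical:

```python
def generalization_gap(history: dict, epoch: int = -1) -> float:
    """Return training accuracy minus validation accuracy at one epoch.

    Hypothetical helper: expects a Keras-style history dict holding
    'accuracy' and 'val_accuracy' lists with one value per epoch.
    """
    train_acc = history["accuracy"][epoch]
    val_acc = history["val_accuracy"][epoch]
    return train_acc - val_acc


# Final-epoch values taken from the problem statement.
history = {"accuracy": [0.95], "val_accuracy": [0.70]}
gap = generalization_gap(history)
print(f"gap = {gap:.2f}")  # a large positive gap signals overfitting
```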

Your Task

Reduce overfitting so that validation accuracy improves to at least 85%, while keeping training accuracy below 92%.

You can only change the model architecture and training settings.

Do not change the dataset or add more data.
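The two success criteria can be written as a small check. This is only an illustration; the constant and function names are hypothetical:

```python
TARGET_VAL_MIN = 0.85    # validation accuracy must reach at least 85%
TARGET_TRAIN_MAX = 0.92  # training accuracy must stay below 92%


def meets_targets(train_acc: float, val_acc: float) -> bool:
    """Return True only if both task criteria are satisfied."""
    return val_acc >= TARGET_VAL_MIN and train_acc < TARGET_TRAIN_MAX


print(meets_targets(0.95, 0.70))  # the starting point fails both criteria
```

Note the asymmetry that mirrors the task wording: the 85% floor is inclusive ("at least"), while the 92% ceiling is strict ("below").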


Solution


import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.callbacks import EarlyStopping

# Video data shape: (num_samples, frames, height, width, channels)
# For simplicity, random placeholder data is simulated here. Random labels carry
# no learnable signal, so this data will not reproduce the accuracies reported
# below; substitute real walking/running clips to run the actual experiment.
num_samples = 1000
frames = 10
height = 64
width = 64
channels = 3
num_classes = 2

X_train = np.random.rand(num_samples, frames, height, width, channels).astype('float32')
y_train = np.random.randint(0, num_classes, size=(num_samples,))

X_val = np.random.rand(200, frames, height, width, channels).astype('float32')
y_val = np.random.randint(0, num_classes, size=(200,))

# Build a small 3D CNN with dropout after each conv block and after the dense layer
model = models.Sequential([
    layers.Input(shape=(frames, height, width, channels)),
    layers.Conv3D(32, kernel_size=(3,3,3), activation='relu'),
    layers.MaxPooling3D(pool_size=(1,2,2)),
    layers.Dropout(0.3),
    layers.Conv3D(64, kernel_size=(3,3,3), activation='relu'),
    layers.MaxPooling3D(pool_size=(2,2,2)),
    layers.Dropout(0.3),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.4),
    layers.Dense(num_classes, activation='softmax')
])

# Halve Adam's default learning rate (0.001) for smaller, steadier updates
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.0005),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Stop after 5 epochs without val_loss improvement and roll back to the best weights
early_stop = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)

history = model.fit(X_train, y_train, epochs=50, batch_size=32,
                    validation_data=(X_val, y_val), callbacks=[early_stop])
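To see where this network's capacity sits, you can trace the tensor shapes by hand. The sketch below is plain Python, not a Keras call; it assumes Keras's defaults for these layers (stride 1 and `padding='valid'` for `Conv3D`; strides equal to `pool_size` and `padding='valid'` for `MaxPooling3D`), and the helper names are hypothetical:

```python
def conv3d_valid(shape, kernel):
    """(frames, height, width) after a stride-1 Conv3D with padding='valid'."""
    return tuple(s - k + 1 for s, k in zip(shape, kernel))


def maxpool3d_valid(shape, pool):
    """(frames, height, width) after MaxPooling3D with strides == pool_size, padding='valid'."""
    return tuple((s - p) // p + 1 for s, p in zip(shape, pool))


shape = (10, 64, 64)                       # (frames, height, width) of one clip
shape = conv3d_valid(shape, (3, 3, 3))     # Conv3D(32)  -> (8, 62, 62)
shape = maxpool3d_valid(shape, (1, 2, 2))  # MaxPool3D   -> (8, 31, 31)
shape = conv3d_valid(shape, (3, 3, 3))     # Conv3D(64)  -> (6, 29, 29)
shape = maxpool3d_valid(shape, (2, 2, 2))  # MaxPool3D   -> (3, 14, 14)

flat = shape[0] * shape[1] * shape[2] * 64  # Flatten over 64 channels
dense_params = flat * 128 + 128             # Dense(128): weights + biases
print(shape, flat, dense_params)
```

Under those assumptions the final feature map is 3 x 14 x 14 with 64 channels, so `Flatten` yields 37,632 values and `Dense(128)` alone has 4,817,024 parameters, compared with 2,624 and 55,360 for the two `Conv3D` layers. Almost all of the model's capacity to memorize sits in that dense layer, which is consistent with giving it the heaviest dropout (0.4).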

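Added dropout after each convolution/pooling block (rate 0.3) and after the dense layer (rate 0.4) to reduce overfitting.

Lowered the learning rate from Adam's default of 0.001 to 0.0005 for smoother training.

Added early stopping, which halts training once validation loss has not improved for 5 epochs and restores the best weights seen so far.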
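The patience logic behind `EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)` can be simulated in a few lines. This is a simplified sketch in the spirit of the Keras callback, not its actual source; the function name and the loss curve are hypothetical:

```python
def early_stopping(val_losses, patience=5, min_delta=0.0):
    """Simulate patience-based early stopping on a list of validation losses.

    Returns (best_epoch, stop_epoch) as 0-based indices, where stop_epoch is
    the last epoch that actually ran. With restore_best_weights=True, the
    model would finish with the weights saved at best_epoch.
    """
    best_loss = float("inf")
    best_epoch = 0
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return best_epoch, epoch
    return best_epoch, len(val_losses) - 1


# Hypothetical validation-loss curve: improves, then creeps back up.
losses = [0.69, 0.62, 0.55, 0.51, 0.50, 0.52, 0.53, 0.55, 0.56, 0.58, 0.60]
print(early_stopping(losses, patience=5))  # (4, 9)
```

Here the best loss occurs at epoch 4, training stops at epoch 9 after five epochs without improvement, and the remaining epochs of the 50-epoch budget are never run.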

Results Interpretation

Before: Training accuracy was 95% and validation accuracy was 70%, a 25-point gap that indicates overfitting.

After: Training accuracy dropped to 90% and validation accuracy rose to 87%, meeting both targets (validation at least 85%, training below 92%) and indicating better generalization.

Dropout and early stopping keep the model from memorizing the training clips, so it performs better on new videos. The lower learning rate makes each weight update smaller, which typically makes the validation loss change more gradually between epochs.
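To make the dropout part concrete, here is a NumPy sketch of inverted dropout, the behavior the Keras `Dropout` layer documents: during training, each unit is zeroed with probability `rate` and the survivors are scaled by 1 / (1 - rate); at inference the input passes through unchanged. The `dropout` function is a hypothetical re-implementation for illustration, not Keras's source:

```python
import numpy as np


def dropout(x, rate, training, rng):
    """Inverted dropout: zero units with probability `rate` during training.

    Surviving units are scaled by 1 / (1 - rate) so the expected activation
    is unchanged; at inference time the input is returned untouched.
    """
    if not training or rate == 0.0:
        return x
    keep = rng.random(x.shape) >= rate  # True with probability 1 - rate
    return np.where(keep, x / (1.0 - rate), 0.0)


rng = np.random.default_rng(0)
activations = np.ones((4, 128), dtype=np.float32)  # e.g. output of Dense(128)
train_out = dropout(activations, rate=0.4, training=True, rng=rng)
infer_out = dropout(activations, rate=0.4, training=False, rng=rng)
print(f"fraction zeroed: {np.mean(train_out == 0):.2f}")  # close to 0.4
print(f"mean activation: {train_out.mean():.2f}")         # close to 1.0
```

Because a different random subset of units is dropped at every step, no single unit can be relied on to memorize a particular training clip, and because the rescaling happens during training, nothing needs to change when the model is evaluated on validation videos.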

Bonus Experiment

Try using a pretrained video model like MobileNet3D or I3D and fine-tune it on this dataset to improve accuracy further.

💡 Hint

Pretrained models have learned useful features from large video datasets and can help your model understand videos better with less training.

Practice


1. What is the main goal of video understanding in AI?

easy

A. Teaching computers to watch and learn from videos

B. Making videos play faster on devices

C. Compressing videos to save space

D. Editing videos automatically


Solution

Step 1: Understand the purpose of video understanding

Step 2: Compare options to the definition

Final Answer: A. Teaching computers to watch and learn from videos

Quick Check:

Solution

Step 1: Identify network types used for video data

Step 2: Match network type to video understanding

Final Answer:

Quick Check:

Solution

Step 1: Understand the original video shape

Step 2: Analyze the reshape operation

Final Answer:

Quick Check:

Solution

Step 1: Check Conv3D kernel_size parameter

Step 2: Identify the error in kernel_size

Final Answer:

Quick Check:

Solution

Step 1: Understand training data needs for action recognition

Step 2: Evaluate options for temporal and label info

Final Answer:

Quick Check: