
Video understanding basics in Prompt Engineering / GenAI - Practice Problems & Coding Challenges

Challenge - 5 Problems
🧠 Conceptual
intermediate
What is the main purpose of video understanding in AI?

Imagine you have a video of a soccer game. What does video understanding help AI do with this video?

A. Recognize actions, objects, and events happening in the video
B. Only extract the audio from the video
C. Convert the video into a text document without any analysis
D. Change the video colors to black and white
💡 Hint

Think about what you want a smart system to learn from watching a video.

Predict Output
intermediate
Output of frame extraction code snippet

Assume video.mp4 is a readable color video that is 640 pixels wide and 480 pixels high, with at least three frames. What will this Python code print?

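import cv2

# Open the video file for reading.
cap = cv2.VideoCapture('video.mp4')
count = 0
while True:
    # ret is False when no frame could be read (end of file or read failure).
    ret, frame = cap.read()
    # Stop after three frames, or as soon as reading fails.
    if not ret or count == 3:
        break
    print(f'Frame {count} shape:', frame.shape)
    count += 1
# Release the capture handle.
cap.release()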
A.
Frame 0 shape: (480, 640, 3)
Frame 1 shape: (480, 640, 3)
Frame 2 shape: (480, 640, 3)
B.
Frame 0 shape: (640, 480, 3)
Frame 1 shape: (640, 480, 3)
Frame 2 shape: (640, 480, 3)
C.
Frame 0 shape: (480, 640)
Frame 1 shape: (480, 640)
Frame 2 shape: (480, 640)
D. No output because of error reading video
💡 Hint

Remember that OpenCV reads frames as height x width x channels.
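To make the hint concrete, here is a minimal sketch that uses a synthetic NumPy array in place of a real decoded frame. The 1280x720 size and the helper name describe_frame are illustrative assumptions, not part of the challenge. It shows that the array shape lists rows (height) first, then columns (width), then color channels.

```python
import numpy as np


def describe_frame(frame):
    """Return the height, width, and channel count of an image array."""
    height, width = frame.shape[:2]
    # Grayscale frames are 2-D arrays, so they have no channel axis.
    channels = frame.shape[2] if frame.ndim == 3 else 1
    return {"height": height, "width": width, "channels": channels}


# Synthetic stand-in for a color frame that is 1280 wide and 720 high.
fake_frame = np.zeros((720, 1280, 3), dtype=np.uint8)
info = describe_frame(fake_frame)
print(info)
```

Note that the width comes second in the shape even though video sizes are usually quoted as width x height.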

Model Choice
advanced
Best model type for action recognition in videos

You want to build an AI that recognizes actions like running or jumping in videos. Which model type is best suited?

A. Convolutional Neural Network (CNN) applied only on single images
B. Recurrent Neural Network (RNN) processing video frame sequences
C. Simple linear regression model
D. 3D Convolutional Neural Network (3D CNN) that analyzes spatial and temporal data
💡 Hint

Think about a model that can understand both space (image) and time (video).
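One way to see how a convolution can cover both space and time is to look at output sizes. The sketch below applies the standard convolution size formula (with dilation assumed to be 1) to the three axes of a clip: frames, height, and width. The helper name conv3d_output_shape and the 16-frame, 112x112 clip are hypothetical values chosen for illustration.

```python
def conv3d_output_shape(input_shape, kernel, stride=(1, 1, 1), padding=(0, 0, 0)):
    """Compute (frames, height, width) after one 3D convolution.

    Applies out = floor((in + 2 * pad - kernel) / stride) + 1 on each axis.
    """
    return tuple(
        (size + 2 * p - k) // s + 1
        for size, k, s, p in zip(input_shape, kernel, stride, padding)
    )


# Hypothetical clip: 16 frames of 112x112 pixels.
clip_shape = (16, 112, 112)

# A 3x3x3 kernel spans 3 consecutive frames, so it mixes information over time.
spatiotemporal = conv3d_output_shape(clip_shape, kernel=(3, 3, 3))

# A 1x3x3 kernel covers one frame at a time, so the frame axis is untouched.
per_frame = conv3d_output_shape(clip_shape, kernel=(1, 3, 3))

print(spatiotemporal, per_frame)
```

The frame axis shrinks only when the kernel extends over more than one frame, which is the sense in which such a kernel looks at motion as well as appearance.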

Metrics
advanced
Choosing the right metric for video classification

You trained a video classification model to label videos into categories. Which metric best shows how well your model predicts the correct category?

A. Mean Squared Error (MSE)
B. Accuracy
C. BLEU score
D. Perplexity
💡 Hint

Think about a metric that counts correct predictions out of total predictions.
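A metric that counts correct predictions out of total predictions can be sketched in a few lines. The function name accuracy and the category labels below are made up for illustration.

```python
def accuracy(true_labels, predicted_labels):
    """Fraction of predictions that exactly match the true labels."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("at least one label is required")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Hypothetical labels for four videos; the third prediction is wrong.
true_labels = ["sports", "cooking", "news", "sports"]
predicted_labels = ["sports", "cooking", "sports", "sports"]
score = accuracy(true_labels, predicted_labels)
print(score)  # 0.75
```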

🔧 Debug
expert
Debugging a video captioning model output error

You have a video captioning model that generates text descriptions, but it always outputs an empty string. What is the most likely cause?

A. The video frames are too large in resolution
B. The optimizer learning rate is too high
C. The model's vocabulary is empty or not loaded properly
D. The video file format is unsupported
💡 Hint

Think about what the model needs to produce words in captions.
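As a rough illustration of why a caption generator depends on its word list, the sketch below turns predicted token ids into text through an id-to-word mapping. The function decode_caption, the ids, and the tiny vocabulary are hypothetical, and skipping unknown ids is an assumption of this sketch; real captioning pipelines may handle them differently.

```python
def decode_caption(token_ids, id_to_token):
    """Join the words for each predicted id, skipping ids not in the mapping."""
    words = [id_to_token[i] for i in token_ids if i in id_to_token]
    return " ".join(words)


# Hypothetical model output: three predicted token ids.
predicted_ids = [4, 7, 2]

# With a populated mapping, the ids become a readable caption.
vocab = {4: "a", 7: "dog", 2: "runs"}
good_caption = decode_caption(predicted_ids, vocab)

# With an empty mapping, no id can be turned into a word.
empty_caption = decode_caption(predicted_ids, {})

print(repr(good_caption), repr(empty_caption))
```

In this sketch the predicted ids are identical in both calls, so the difference in output comes entirely from the mapping.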