Action recognition basics in Computer Vision - Practice Problems & Coding Challenges

Challenge - 5 Problems
🧠 Conceptual
intermediate
What is the main input type for action recognition models?

Action recognition models analyze data to identify what action is happening. What kind of input data do these models mainly use?

A. Single images showing a moment in time
B. Text descriptions of actions
C. Audio recordings of sounds related to actions
D. Sequences of images or video clips showing movement over time
💡 Hint

Think about how you recognize actions yourself. Do you need just one picture or a series of pictures?
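To make the hint concrete, here is a minimal sketch using a toy one-dimensional "video". Everything in it (the helper names `make_frame`, `bright_position`, `direction_of_motion`, and the five-pixel frames) is hypothetical and chosen only for illustration. It builds two clips that share an identical middle frame but show opposite motion, so the difference between them only shows up when frames are compared over time.

```python
# Toy 1-D "video": each frame is a row of pixels with exactly one bright pixel.
def make_frame(position, width=5):
    """Return a frame with a single bright pixel at `position`."""
    return [1 if i == position else 0 for i in range(width)]


def bright_position(frame):
    """Return the index of the bright pixel in a frame."""
    return frame.index(1)


def direction_of_motion(clip):
    """Label a clip by comparing the object's position in the first and last frame."""
    delta = bright_position(clip[-1]) - bright_position(clip[0])
    if delta > 0:
        return "moving right"
    if delta < 0:
        return "moving left"
    return "static"


clip_right = [make_frame(p) for p in (1, 2, 3)]
clip_left = list(reversed(clip_right))

# The middle frame is the same in both clips, so on its own it carries
# no information about which way the object is moving.
same_middle_frame = clip_right[1] == clip_left[1]

print(same_middle_frame)                # True
print(direction_of_motion(clip_right))  # moving right
print(direction_of_motion(clip_left))   # moving left
```

Real action recognition models learn such temporal cues from data rather than using a hand-written rule like this one.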

Model Choice
intermediate
Which model type is best suited for capturing temporal information in action recognition?

To understand actions, models must capture how things change over time. Which model type is designed to handle sequences and temporal data?

A. Feedforward Neural Networks without loops
B. Recurrent Neural Networks (RNNs) or Long Short-Term Memory (LSTM) networks
C. Convolutional Neural Networks (CNNs) only
D. Support Vector Machines (SVMs)
💡 Hint

Think about models that remember past information to understand sequences.
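As a rough illustration of what "remembering past information" means, the sketch below runs a single recurrent unit over two sequences that contain the same values in opposite order. The weights `w_x` and `w_h` are hand-picked assumptions, not learned parameters, and `run_recurrent` and `mean_pool` are hypothetical helper names. An order-insensitive summary (the mean) cannot tell the sequences apart, while the recurrent state can.

```python
import math


def run_recurrent(sequence, w_x=1.0, w_h=0.5):
    """Run a one-unit recurrent cell: h_t = tanh(w_x * x_t + w_h * h_{t-1})."""
    h = 0.0  # initial hidden state
    for x in sequence:
        # The new state depends on the current input AND the previous state.
        h = math.tanh(w_x * x + w_h * h)
    return h


def mean_pool(sequence):
    """Order-insensitive summary: the average of all values."""
    return sum(sequence) / len(sequence)


rising = [0.0, 0.5, 1.0]
falling = [1.0, 0.5, 0.0]

# Same values, different order: the mean is identical...
print(mean_pool(rising) == mean_pool(falling))
# ...but the final hidden state differs, because the recurrence is order-sensitive.
print(run_recurrent(rising) != run_recurrent(falling))
```

An LSTM adds gating on top of this basic recurrence; the sketch only shows the carried-over state, not the gates.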

Predict Output
advanced
What is the output shape of a 3D CNN model for action recognition given input shape (batch_size=8, frames=16, height=64, width=64, channels=3) and 10 action classes?

Consider a 3D CNN model that takes video clips as input. The input shape is (8, 16, 64, 64, 3) representing batch size, frames, height, width, and color channels. The model outputs predictions for 10 action classes. What is the shape of the output tensor?

input_shape = (8, 16, 64, 64, 3)
num_classes = 10
# Model outputs class probabilities for each video in the batch
A. (8, 16, 10)
B. (16, 10)
C. (8, 10)
D. (8, 64, 64, 10)
💡 Hint

The model predicts one action class per video clip in the batch.
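The shape bookkeeping behind this hint can be traced by hand. The sketch below does so for a hypothetical architecture that the question does not specify: one 3x3x3 convolution (stride 1, padding 1, 32 filters), one 2x2x2 max pool, a global average pool, and a linear classifier. The layer choices and the names `conv_out` and `trace_shapes` are assumptions for illustration; the arithmetic uses the standard output-size formula for convolutions and pooling without dilation.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length along one axis for a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_shapes(input_shape, num_classes, conv_channels=32):
    """Return (stage name, shape) pairs for a small, hypothetical 3D CNN.

    Shapes use the channels-last layout from the question:
    (batch, frames, height, width, channels).
    """
    batch, frames, height, width, _channels = input_shape
    shapes = [("input", input_shape)]

    # 3x3x3 convolution, stride 1, padding 1: spatial and temporal sizes are kept.
    f, h, w = (conv_out(s, kernel=3, stride=1, padding=1) for s in (frames, height, width))
    shapes.append(("conv3d", (batch, f, h, w, conv_channels)))

    # 2x2x2 max pool, stride 2: frames, height and width are halved.
    f, h, w = (conv_out(s, kernel=2, stride=2) for s in (f, h, w))
    shapes.append(("pool3d", (batch, f, h, w, conv_channels)))

    # Global average pooling collapses frames, height and width into one vector per clip.
    shapes.append(("global_avg_pool", (batch, conv_channels)))

    # The linear classifier maps each clip's vector to one score per class.
    shapes.append(("classifier", (batch, num_classes)))
    return shapes


for name, shape in trace_shapes((8, 16, 64, 64, 3), num_classes=10):
    print(f"{name:16s} {shape}")
```

Whatever the intermediate layers look like, only the batch axis survives from the input to the final prediction; the frame and spatial axes are reduced away before classification.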

Metrics
advanced
Which metric is most appropriate to evaluate an action recognition model on a balanced multi-class dataset?

You trained an action recognition model on a dataset with 10 balanced classes. Which metric best measures how well your model predicts the correct action?

A. Accuracy
B. Root Mean Squared Error (RMSE)
C. Mean Squared Error (MSE)
D. Precision for one class only
💡 Hint

Think about a metric that counts how many predictions are exactly right out of all predictions.
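The metric described in the hint (exactly right predictions divided by all predictions) can be written in a few lines. This is a minimal sketch with made-up labels; the function name `accuracy` and the example data are illustrative and not taken from any particular library.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the true class label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Made-up example: one sample per class for 10 classes, with the
# last two predictions swapped (two mistakes out of ten).
y_true = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y_pred = [0, 1, 2, 3, 4, 5, 6, 7, 9, 8]

print(accuracy(y_true, y_pred))  # 0.8
```

With balanced classes, every class contributes equally to this number, which is why a single overall fraction is easy to interpret here.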

🔧 Debug
expert
Why does this action recognition training code raise a shape mismatch error?

Consider this PyTorch training snippet for an action recognition model:

outputs = model(inputs)  # outputs shape: (8, 10)
labels = labels.unsqueeze(1)  # labels shape: (8, 1)
loss = criterion(outputs, labels)

Why does this code raise a shape mismatch error during loss calculation?

A. Because labels need to be a 1D tensor of shape (8,) for CrossEntropyLoss
B. Because outputs should have shape (8, 1) to match labels
C. Because inputs and labels have different batch sizes
D. Because criterion expects labels to be one-hot encoded
💡 Hint

Check the expected label shape for PyTorch's CrossEntropyLoss.
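For class-index targets, PyTorch's `CrossEntropyLoss` takes scores of shape (N, C) and one integer label per sample, shape (N,). The framework-free sketch below mimics that contract in plain Python so the shapes can be inspected without installing anything. `cross_entropy` and `squeeze_labels` are hypothetical helpers written for this illustration, not PyTorch functions, and the error they raise is not PyTorch's actual error message.

```python
import math


def cross_entropy(logits, labels):
    """Mean cross-entropy for logits of shape (N, C) and integer labels of shape (N,)."""
    if len(labels) != len(logits):
        raise ValueError("logits and labels must have the same batch size")
    total = 0.0
    for row, label in zip(logits, labels):
        if not isinstance(label, int):
            # A nested entry such as [3] means labels has shape (N, 1), not (N,).
            raise TypeError("expected one integer class index per sample")
        # Numerically stable log-sum-exp of the row.
        m = max(row)
        log_sum_exp = m + math.log(sum(math.exp(v - m) for v in row))
        # Negative log-probability of the true class.
        total += log_sum_exp - row[label]
    return total / len(logits)


def squeeze_labels(labels):
    """Flatten labels of shape (N, 1) into shape (N,)."""
    return [row[0] for row in labels]


logits = [[0.0] * 10 for _ in range(8)]  # shape (8, 10), uniform scores
labels_2d = [[i] for i in range(8)]      # shape (8, 1), like the unsqueezed labels

try:
    cross_entropy(logits, labels_2d)
    rejected = False
except TypeError:
    rejected = True

loss = cross_entropy(logits, squeeze_labels(labels_2d))  # labels now shape (8,)
print(rejected, round(loss, 4))
```

In the PyTorch snippet above, the analogous change would be to drop the `unsqueeze(1)` call or undo it with `labels.squeeze(1)` before computing the loss.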