Action recognition models analyze data to identify what action is happening. What kind of input data do these models mainly use?
Think about how you recognize actions yourself. Do you need just one picture or a series of pictures?
Action recognition models need to see how things change over time, so they use sequences of images or video clips, not just single images.
To understand actions, models must capture how things change over time. Which model type is designed to handle sequences and temporal data?
Think about models that remember past information to understand sequences.
RNNs and LSTMs are designed to process sequences and keep track of past information, making them suitable for temporal data like videos.
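To make this concrete, here is a minimal sketch (layer sizes are illustrative, not from the original) of an LSTM consuming one feature vector per video frame while carrying a hidden state that summarizes the frames seen so far:

```python
import torch
import torch.nn as nn

# Hypothetical setup: each of 16 frames has already been encoded
# into a 128-dimensional feature vector.
lstm = nn.LSTM(input_size=128, hidden_size=64, batch_first=True)

frame_features = torch.randn(4, 16, 128)  # (batch, frames, feature_dim)
outputs, (h_n, c_n) = lstm(frame_features)

print(outputs.shape)  # per-frame hidden states: (4, 16, 64)
print(h_n.shape)      # final hidden state summarizing the sequence: (1, 4, 64)
```

The final hidden state `h_n` can be fed to a classifier head to predict the action for the whole clip.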
Consider a 3D CNN model that takes video clips as input. The input shape is (8, 16, 64, 64, 3) representing batch size, frames, height, width, and color channels. The model outputs predictions for 10 action classes. What is the shape of the output tensor?
input_shape = (8, 16, 64, 64, 3)  # (batch, frames, height, width, channels)
num_classes = 10
# Model outputs class probabilities for each video in the batch
The model predicts one action class per video clip in the batch.
The model outputs one prediction per video clip, so the output shape is (batch_size, num_classes) = (8, 10).
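A tiny 3D CNN (layer sizes chosen for illustration, not specified in the original) can verify this shape end to end:

```python
import torch
import torch.nn as nn

class Tiny3DCNN(nn.Module):
    """Minimal sketch mapping video clips to per-class scores."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.conv = nn.Conv3d(3, 8, kernel_size=3, padding=1)
        self.pool = nn.AdaptiveAvgPool3d(1)
        self.fc = nn.Linear(8, num_classes)

    def forward(self, x):
        # PyTorch's Conv3d expects (batch, channels, frames, height, width),
        # so move the channel axis ahead of the frame axis.
        x = x.permute(0, 4, 1, 2, 3)
        x = torch.relu(self.conv(x))
        x = self.pool(x).flatten(1)   # (batch, 8)
        return self.fc(x)             # (batch, num_classes)

clips = torch.randn(8, 16, 64, 64, 3)  # (batch, frames, height, width, channels)
logits = Tiny3DCNN()(clips)
print(logits.shape)  # (8, 10): one score per class, per clip
```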
You trained an action recognition model on a dataset with 10 balanced classes. Which metric best measures how well your model predicts the correct action?
Think about a metric that counts how many predictions are exactly right out of all predictions.
Accuracy measures the proportion of correct predictions over all predictions, which is suitable for balanced multi-class classification.
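The computation itself is a single ratio; with hypothetical predictions and labels:

```python
# Accuracy = correct predictions / total predictions.
predictions = [3, 1, 4, 1, 5, 9, 2, 6]
labels      = [3, 1, 4, 0, 5, 9, 2, 0]

correct = sum(p == y for p, y in zip(predictions, labels))
accuracy = correct / len(labels)
print(accuracy)  # 6 correct out of 8 -> 0.75
```

Note that accuracy is only a fair summary here because the classes are balanced; with skewed class frequencies a model can score high accuracy while ignoring rare actions.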
Consider this PyTorch training snippet for an action recognition model:
outputs = model(inputs)           # outputs shape: (8, 10)
labels = labels.unsqueeze(1)      # labels shape: (8, 1)
loss = criterion(outputs, labels)
Why does this code raise a shape mismatch error during loss calculation?
Check the expected label shape for PyTorch's CrossEntropyLoss.
PyTorch's CrossEntropyLoss expects labels as a 1D tensor of class indices with shape (batch_size,), not as a 2D tensor. Unsqueezing the labels adds an extra dimension, producing shape (8, 1) and causing the mismatch.
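The fix is simply to leave the labels 1D. A minimal sketch with random tensors standing in for real model outputs:

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()

outputs = torch.randn(8, 10)         # (batch, num_classes) raw logits
labels = torch.randint(0, 10, (8,))  # 1D tensor of class indices, shape (8,)

loss = criterion(outputs, labels)    # works: labels stay 1D
print(loss)                          # scalar loss tensor

# By contrast, labels.unsqueeze(1) would give shape (8, 1) and
# CrossEntropyLoss would raise a RuntimeError on the shape mismatch.
```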