Complete the code to load video frames for action recognition.
import cv2

cap = cv2.VideoCapture('video.mp4')
ret, frame = cap.[1]()
if ret:
    print('Frame loaded')
cap.release()
The read() method grabs the next frame from the capture object, returning a success flag (ret) and the frame itself.
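Action-recognition models usually operate on a fixed number of frames sampled evenly across the clip rather than every frame. A minimal sketch of uniform index sampling (the helper name sample_indices and the segment-midpoint strategy are assumptions, not part of the exercise):

```python
def sample_indices(total_frames, num_samples):
    """Pick num_samples frame indices spread evenly across the clip."""
    if total_frames <= 0 or num_samples <= 0:
        return []
    step = total_frames / num_samples
    # take the middle of each of num_samples equal segments,
    # clamped to the last valid frame index
    return [min(int(step * i + step / 2), total_frames - 1)
            for i in range(num_samples)]
```

With OpenCV you could then seek to each index via cap.set(cv2.CAP_PROP_POS_FRAMES, idx) before calling read().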
Complete the code to extract features from frames using a pretrained CNN model.
from torchvision import models, transforms
import torch
from PIL import Image
import cv2

model = models.resnet18(pretrained=True)
model.eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])
])

# OpenCV frames are BGR NumPy arrays; convert to an RGB PIL image
# before applying the torchvision transforms
frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
input_tensor = preprocess(Image.fromarray(frame_rgb))
input_batch = input_tensor.unsqueeze(0)
with torch.no_grad():
    features = model.[1](input_batch)
The forward method runs the input batch through the model and returns the output features. (In practice, calling model(input_batch) is preferred, since it invokes forward through the module's __call__ machinery.)
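Per-frame CNN features are often pooled into a single clip-level descriptor before the action classifier. A minimal pure-Python sketch of mean pooling (the helper name mean_pool is an assumption; with tensors you would typically write torch.stack(feats).mean(0)):

```python
def mean_pool(frame_features):
    """Average a list of equal-length feature vectors element-wise."""
    n = len(frame_features)
    dim = len(frame_features[0])
    return [sum(f[d] for f in frame_features) / n for d in range(dim)]
```

This gives one vector per clip regardless of how many frames were sampled.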
Fix the error in the code to correctly compute accuracy for action recognition predictions.
correct = 0
for pred, label in zip(predictions, labels):
    if pred == [1]:
        correct += 1
accuracy = correct / len(labels)
print(f'Accuracy: {accuracy:.2f}')
We compare each predicted label (pred) against the corresponding ground-truth label (label) and count the matches before dividing by the total.
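The counting loop above can also be collapsed into a single expression; a sketch using the same variable names as the exercise:

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the ground-truth labels."""
    return sum(p == t for p, t in zip(predictions, labels)) / len(labels)
```

sum over a generator of booleans counts the True values, so no explicit counter is needed.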
Fill both blanks to create a dictionary of frame indices and their corresponding action labels.
frame_labels = {i: [1] for i, [2] in enumerate(predicted_actions)}

Both blanks are action: enumerate yields (index, action) pairs, so action is the loop variable in blank [2] and the dictionary value in blank [1], mapped to each frame index i.
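With the blanks filled in as described, the comprehension behaves like this (predicted_actions below is made-up example data, not from the exercise):

```python
predicted_actions = ['walk', 'walk', 'run']
# map each frame index to its predicted action label
frame_labels = {i: action for i, action in enumerate(predicted_actions)}
```

Each key is a frame index and each value is that frame's predicted action.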
Fill all three blanks to filter frames with confidence above threshold and create a result dictionary.
result = {frame_id: [1] for frame_id, (label, conf) in frame_data.items() if conf [2] [3]}

We keep each frame's label only when its confidence is greater than 0.8.
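Filled in per the explanation (keeping labels with conf > 0.8), the comprehension works like this; frame_data below is made-up example data:

```python
frame_data = {0: ('walk', 0.95), 1: ('run', 0.40), 2: ('jump', 0.85)}
# keep only frames whose confidence clears the threshold
result = {frame_id: label
          for frame_id, (label, conf) in frame_data.items()
          if conf > 0.8}
```

The tuple in each dictionary value is unpacked directly in the comprehension's for clause, so label and conf are both available in the filter and the value expression.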