Which step typically comes first in a computer vision project workflow?
Think about what you need before training a model.
The first step is to collect and label the dataset. Without data, you cannot train or evaluate a model.
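After images are collected and labeled, part of the labeled data is usually held out so the model can be evaluated on examples it never saw during training. The sketch below assumes the labeled data is a list of (image_path, label) pairs. The function name split_labeled_data, the 80/10/10 proportions, the seed, and the file names are hypothetical choices for illustration, not a prescribed recipe.

```python
import random


def split_labeled_data(samples, val_fraction=0.1, test_fraction=0.1, seed=42):
    """Split hypothetical (image_path, label) pairs into train/validation/test lists.

    The default fractions and seed are illustrative assumptions, not recommendations.
    """
    shuffled = list(samples)               # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed -> reproducible split
    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * val_fraction)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


# Hypothetical labeled samples: the file names and class indices are made up.
samples = [(f"img_{i:03d}.jpg", i % 3) for i in range(100)]
train, val, test = split_labeled_data(samples)
print(len(train), len(val), len(test))
```

For class-imbalanced data, a stratified split (splitting each class separately) keeps the label proportions similar across the three sets.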
You want to classify images into 10 categories. Which model architecture is most suitable to start with?
Consider which model type is designed to process images.
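Convolutional neural networks (CNNs) are designed for image data: their convolutional layers capture local spatial features such as edges and textures. RNNs are suited to sequences, linear regression predicts continuous numeric values, and K-Means is an unsupervised clustering method.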
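As a sketch of what starting with a CNN can look like in PyTorch, the hypothetical SmallCNN below stacks two convolution blocks and ends in a linear layer with 10 outputs, one per category. The layer widths and the 32x32 RGB input size are assumptions chosen for illustration, not a tuned architecture.

```python
import torch
import torch.nn as nn


class SmallCNN(nn.Module):
    """A minimal CNN for 10-class image classification (illustrative sizes only)."""

    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            # Convolutions learn local spatial filters; each max-pool halves H and W.
            nn.Conv2d(3, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        # Global average pooling makes the classifier head independent of input size.
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)
        x = self.pool(x).flatten(1)  # (N, 64, 1, 1) -> (N, 64)
        return self.classifier(x)    # raw logits, one per class


model = SmallCNN(num_classes=10)
logits = model(torch.randn(4, 3, 32, 32))  # batch of 4 random 32x32 RGB images
print(logits.shape)  # torch.Size([4, 10])
```

The logits would typically be passed to nn.CrossEntropyLoss during training, which applies softmax internally.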
After training a CV model, you get the following confusion matrix for 3 classes:
[[50, 2, 3], [4, 45, 1], [2, 3, 48]]
What is the overall accuracy?
Accuracy = (sum of diagonal) / (sum of all elements).
Sum of the diagonal = 50 + 45 + 48 = 143; total = 50 + 2 + 3 + 4 + 45 + 1 + 2 + 3 + 48 = 158; accuracy = 143/158 ≈ 0.905, or about 91%.
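The same arithmetic can be checked with NumPy: the trace of the confusion matrix counts correct predictions, and the sum of all entries counts every prediction. The sketch assumes rows are true classes and columns are predicted classes; that convention does not change overall accuracy.

```python
import numpy as np

# Confusion matrix from the question (assumed: rows = true class, columns = predicted).
cm = np.array([[50, 2, 3],
               [4, 45, 1],
               [2, 3, 48]])

correct = np.trace(cm)  # sum of the diagonal: 50 + 45 + 48
total = cm.sum()        # every prediction in the matrix
accuracy = correct / total
print(correct, total, round(accuracy, 3))  # 143 158 0.905
```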
What error, if any, will this code raise when applying data augmentation with torchvision transforms?
import torchvision.transforms as T
from PIL import Image
transform = T.Compose([
    T.RandomHorizontalFlip(p=0.5),  # flip left-right with 50% probability
    T.ToTensor(),                   # PIL image -> float tensor of shape (C, H, W) in [0, 1]
    T.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),  # per channel: (x - mean) / std
])
image = Image.open('image.jpg')
augmented = transform(image)
Check the number of channels in the image and the mean/std length.
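If image.jpg is a standard RGB JPEG, ToTensor produces a 3-channel tensor that matches the three mean and three std values, so Normalize succeeds and no error is raised. If the image were grayscale (1 channel) or had 4 channels (for example CMYK), the channel count would not match the 3-value mean/std and Normalize would typically fail with a shape-mismatch RuntimeError; calling .convert('RGB') on the opened image guarantees 3 channels.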
You train a deep CNN for object detection. Which learning rate choice is most likely to cause unstable training with loss oscillations?
Higher learning rates can cause unstable updates.
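A learning rate of 0.01 is often too high for deep CNNs, particularly with adaptive optimizers such as Adam (PyTorch's default Adam learning rate is 0.001). Steps that large overshoot the minimum, so the loss oscillates instead of decreasing steadily, and training may diverge.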
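The link between step size and oscillation can be seen on a toy one-parameter problem. The sketch below runs plain gradient descent on f(x) = 0.5 * c * x**2 with an assumed curvature c = 250, chosen only so the three rates behave differently. Each update multiplies x by (1 - lr * c): once lr * c exceeds 1, the parameter overshoots the minimum and flips sign every step, and once it exceeds 2, the loss grows every step. Real CNN loss surfaces are far more complex, so these specific thresholds do not transfer to actual networks.

```python
def gradient_descent(lr, curvature=250.0, x0=1.0, steps=6):
    """Minimize f(x) = 0.5 * curvature * x**2 with plain gradient descent.

    The gradient is curvature * x, so each step multiplies x by (1 - lr * curvature).
    The curvature value is an illustrative assumption.
    """
    x = x0
    history = [x]
    for _ in range(steps):
        x = x - lr * curvature * x  # standard update: x <- x - lr * grad
        history.append(x)
    return history


for lr in (0.001, 0.006, 0.01):
    losses = [0.5 * 250.0 * x ** 2 for x in gradient_descent(lr)]
    print(lr, [round(v, 3) for v in losses])
```

With these settings, lr = 0.001 shrinks x smoothly toward the minimum, lr = 0.006 overshoots back and forth while still converging, and lr = 0.01 overshoots by more each step, so the loss grows.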