Which of the following best describes dataset bias in computer vision?
Think about how the data used to train the model affects its ability to generalize.
Dataset bias occurs when the training data lacks diversity or is skewed, so the model learns patterns that do not generalize well to new, different data.
You have a dataset with images labeled as 'cat' or 'dog'. The dataset has 900 cat images and 100 dog images. What is the ratio of cat images to dog images?
from collections import Counter

labels = ['cat'] * 900 + ['dog'] * 100
class_counts = Counter(labels)
ratio = class_counts['cat'] / class_counts['dog']
print(ratio)
Divide the number of cat images by the number of dog images.
There are 900 cat images and 100 dog images, so the ratio is 900/100 = 9.0.
Given a dataset with image counts per class, which bar chart correctly shows the bias towards one class?
import matplotlib.pyplot as plt

classes = ['cat', 'dog', 'rabbit']
counts = [900, 100, 50]
plt.bar(classes, counts, color=['blue', 'orange', 'green'])
plt.title('Image Counts per Class')
plt.xlabel('Class')
plt.ylabel('Number of Images')
plt.show()
Look for the chart where one class has many more images than others.
The dataset has 900 cat images, far more than dog (100) or rabbit (50), so the cat bar should be the tallest by a wide margin.
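The same skew can also be checked numerically. The following is a minimal sketch that reuses the counts from the question; the names `shares` and `imbalance_ratio` are illustrative choices, not part of the original exercise.

```python
# Class counts taken from the question above.
counts = {'cat': 900, 'dog': 100, 'rabbit': 50}

total = sum(counts.values())

# Fraction of the dataset that each class occupies.
shares = {name: n / total for name, n in counts.items()}

# Largest class divided by smallest class: a simple imbalance measure.
imbalance_ratio = max(counts.values()) / min(counts.values())

for name, share in shares.items():
    print(f"{name}: {share:.1%}")
print(f"imbalance ratio: {imbalance_ratio}")
```

A ratio far above 1 signals that one class dominates, which is what the tall cat bar shows visually.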
Consider a model trained on a biased dataset made up mostly of cats. It reaches 95% accuracy on cats but only 60% on dogs. What is the likely cause?
Think about how the imbalance in training data affects model performance on different classes.
Because the training set contains far more cat images than dog images, the model learns cat features well but dog features poorly, which lowers its accuracy on dogs.
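A gap like this only becomes visible when accuracy is reported per class. Below is a minimal sketch of that calculation; the helper `per_class_accuracy` and the toy labels are made up for illustration and are sized to reproduce the 95% and 60% figures from the question.

```python
from collections import defaultdict

def per_class_accuracy(y_true, y_pred):
    """Return a dict mapping each true label to the accuracy on that label."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return {label: correct[label] / total[label] for label in total}

# Hypothetical evaluation set: 20 cats (19 predicted correctly)
# and 5 dogs (3 predicted correctly).
y_true = ['cat'] * 20 + ['dog'] * 5
y_pred = ['cat'] * 19 + ['dog'] + ['dog'] * 3 + ['cat'] * 2

acc = per_class_accuracy(y_true, y_pred)
overall = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

print(acc)
print(overall)
```

In this toy example the overall accuracy is 88%, which hides how much worse the model does on the minority class.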
You have a dataset with strong bias: 90% of images are indoor scenes, 10% outdoor. You want to train a model that works well on both. Which approach is best?
Think about how to balance the dataset to reduce bias.
Augmenting the outdoor images (for example with flips, crops, or color changes) increases the number of outdoor training examples, which balances the dataset and helps the model learn features from both indoor and outdoor scenes.
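One way to picture the balancing step is random oversampling of the minority class. This is a rough sketch under stated assumptions: the file names are hypothetical, `oversample_minority` is an illustrative helper, and in a real pipeline each repeated sample would normally pass through a random image transform so the copies are not identical.

```python
import random

def oversample_minority(samples, seed=0):
    """Repeat samples from smaller classes until every class matches the largest."""
    rng = random.Random(seed)

    # Group (path, label) pairs by label.
    by_label = {}
    for item in samples:
        by_label.setdefault(item[1], []).append(item)

    target = max(len(group) for group in by_label.values())

    balanced = []
    for group in by_label.values():
        balanced.extend(group)
        # Draw extra samples with replacement to reach the target size.
        balanced.extend(rng.choices(group, k=target - len(group)))

    rng.shuffle(balanced)
    return balanced

# Hypothetical 90/10 indoor/outdoor split, as in the question.
samples = [(f"indoor_{i}.jpg", "indoor") for i in range(90)]
samples += [(f"outdoor_{i}.jpg", "outdoor") for i in range(10)]

balanced = oversample_minority(samples)
n_indoor = sum(1 for _, label in balanced if label == "indoor")
n_outdoor = sum(1 for _, label in balanced if label == "outdoor")
print(n_indoor, n_outdoor)
```

Oversampling alone only repeats existing outdoor images; pairing it with augmentation is what adds variety to the repeated samples.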