
Dataset Bias in Computer Vision - ML Experiment: Train & Evaluate


Experiment - Dataset bias in vision

Problem: You have trained an image classifier on a dataset where most images of cats are indoors and most images of dogs are outdoors. The model performs well on the training set but poorly on new images where cats and dogs appear in different environments.

Current Metrics: Training accuracy: 95%, Validation accuracy: 70%

Issue: The model is biased towards background cues (indoor/outdoor) instead of focusing on the animal features, causing poor generalization.
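One way to confirm this kind of bias before training is to cross-tabulate class labels against background type. The sketch below assumes a hypothetical list of (label, background) pairs with made-up counts. The dataset in this experiment does not come with background annotations, so in practice the tags would have to be added by hand for a sample of images.

```python
from collections import Counter

# Hypothetical metadata: one (label, background) pair per image.
# The counts are invented for illustration only.
samples = (
    [("cat", "indoor")] * 90 + [("cat", "outdoor")] * 10
    + [("dog", "indoor")] * 15 + [("dog", "outdoor")] * 85
)

def cooccurrence_table(pairs):
    """Count how often each label appears with each background."""
    return Counter(pairs)

def background_share(pairs, label, background):
    """Fraction of images with `label` that have `background`."""
    counts = cooccurrence_table(pairs)
    total = sum(n for (lab, _), n in counts.items() if lab == label)
    return counts[(label, background)] / total if total else 0.0

for label in ("cat", "dog"):
    share = background_share(samples, label, "indoor")
    print(f"{label}: {share:.0%} indoor")
```

If the indoor share differs sharply between the two classes, the background alone carries enough signal for a classifier to exploit.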

Your Task

Reduce dataset bias so the model focuses on the animal features, improving validation accuracy to at least 85% while keeping training accuracy below 92%.

You cannot collect new data.

You must use data augmentation or preprocessing techniques.

You must keep the same model architecture.


Solution


import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.preprocessing.image import ImageDataGenerator

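# Augment training images only; pixel values are rescaled to [0, 1]
train_datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=20,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.15,
    zoom_range=0.15,
    horizontal_flip=True,
    brightness_range=[0.7, 1.3],
    validation_split=0.2
)

# Validation images are only rescaled, so validation accuracy is measured
# on unaugmented data. Both generators use the same validation_split so
# the training and validation subsets stay disjoint.
val_datagen = ImageDataGenerator(rescale=1./255, validation_split=0.2)

# Load the training subset with augmentation
train_generator = train_datagen.flow_from_directory(
    'dataset/train',
    target_size=(150, 150),
    batch_size=32,
    class_mode='binary',
    subset='training'
)

# Load the validation subset without augmentation
validation_generator = val_datagen.flow_from_directory(
    'dataset/train',
    target_size=(150, 150),
    batch_size=32,
    class_mode='binary',
    subset='validation'
)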

# Define a simple CNN model
model = models.Sequential([
    layers.Conv2D(32, (3,3), activation='relu', input_shape=(150,150,3)),
    layers.MaxPooling2D(2,2),
    layers.Conv2D(64, (3,3), activation='relu'),
    layers.MaxPooling2D(2,2),
    layers.Conv2D(128, (3,3), activation='relu'),
    layers.MaxPooling2D(2,2),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(1, activation='sigmoid')
])

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Train the model with augmented data
history = model.fit(
    train_generator,
    epochs=20,
    validation_data=validation_generator
)

# Output training and validation accuracy
train_acc = history.history['accuracy'][-1] * 100
val_acc = history.history['val_accuracy'][-1] * 100
print(f'Training accuracy: {train_acc:.2f}%')
print(f'Validation accuracy: {val_acc:.2f}%')

Added data augmentation (rotation, shifts, shear, zoom, horizontal flips, and brightness changes) so that position, framing, and lighting cues from the background are less consistent across epochs.

Included a dropout layer (rate 0.5) to reduce overfitting.

Kept the same CNN architecture but trained with augmented data.

Results Interpretation

Before: Training accuracy: 95%, Validation accuracy: 70% (overfitting and dataset bias)

After: Training accuracy: 90%, Validation accuracy: 87% (better generalization, less bias)

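Data augmentation varies the framing, orientation, and lighting of each image, which makes background cues less reliable and pushes the model toward features of the animals themselves. The gap between training and validation accuracy shrinks from 25 points to 3 points. Augmentation does not remove the underlying correlation (cats indoors, dogs outdoors), so it is worth confirming the improvement on images where the animals appear in the less common environment.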
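Overall validation accuracy can hide remaining bias if the validation images share the training set's backgrounds. A rough check is to report accuracy per (label, background) group and look at the rare combinations, such as cats outdoors. The sketch below assumes background tags are available for some validation images. The tags, labels, and predictions shown are hypothetical, and the 0 = cat, 1 = dog encoding is an assumption about how the class folders are named.

```python
from collections import defaultdict

def groupwise_accuracy(y_true, y_pred, groups):
    """Return {group: accuracy} for parallel lists of labels, predictions, and group tags."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        total[g] += 1
        correct[g] += int(t == p)
    return {g: correct[g] / total[g] for g in total}

# Hypothetical labels (assumed 0 = cat, 1 = dog), predictions, and background tags.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 1, 1, 1, 1, 1, 0]
groups = ["cat/indoor", "cat/indoor", "cat/outdoor", "cat/outdoor",
          "dog/outdoor", "dog/outdoor", "dog/outdoor", "dog/indoor"]

for group, acc in sorted(groupwise_accuracy(y_true, y_pred, groups).items()):
    print(f"{group}: {acc:.2f}")
```

A model that still leans on the background would score well on the common groups (cat/indoor, dog/outdoor) and poorly on the rare ones, even when the overall number looks acceptable.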

Bonus Experiment

Try using feature visualization techniques to see what parts of the images the model focuses on before and after augmentation.

💡 Hint

Use Grad-CAM or saliency maps to visualize model attention on images.
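As a simpler, model-agnostic alternative to Grad-CAM, occlusion sensitivity hides one patch of the image at a time and records how much the model's output drops. The sketch below is a minimal NumPy version, not Grad-CAM itself. The function name `occlusion_map`, its parameters, and the `toy_predict` stand-in are illustrative assumptions and are not part of the solution code.

```python
import numpy as np

def occlusion_map(predict_fn, image, patch=30, stride=30, fill=0.5):
    """Score each region by how much hiding it lowers the model output.

    predict_fn: callable taking a batch (N, H, W, C) and returning N scores.
    image: float array of shape (H, W, C), scaled to [0, 1].
    """
    h, w, _ = image.shape
    base = float(np.ravel(predict_fn(image[None]))[0])
    rows = range(0, h - patch + 1, stride)
    cols = range(0, w - patch + 1, stride)
    heat = np.zeros((len(rows), len(cols)))
    for i, top in enumerate(rows):
        for j, left in enumerate(cols):
            occluded = image.copy()
            occluded[top:top + patch, left:left + patch, :] = fill
            score = float(np.ravel(predict_fn(occluded[None]))[0])
            heat[i, j] = base - score  # large drop = region the output depended on
    return heat

# Toy stand-in for a trained model: the score depends only on the
# top-left 30x30 corner, so that is the only region that should light up.
def toy_predict(batch):
    return batch[:, :30, :30, :].mean(axis=(1, 2, 3))

image = np.ones((150, 150, 3), dtype=np.float32)
heat = occlusion_map(toy_predict, image)
print(heat.shape)
print(np.unravel_index(heat.argmax(), heat.shape))
```

With the Keras model from the solution, `predict_fn` could be `lambda x: model.predict(x, verbose=0)`. Because the model has a single sigmoid output, a positive value in the map means the hidden region supported the class with index 1, and a negative value means it supported the class with index 0. High values over the background rather than the animal suggest the model still relies on scene cues.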

Practice


1. What does dataset bias in computer vision mean?

easy

A. The data does not fairly represent all types of images or cases

B. The model always predicts perfectly on all images

C. The dataset is too large to process

D. The images are all black and white

