Computer Vision · ~20 mins

MixUp strategy in Computer Vision - ML Experiment: Train & Evaluate

Experiment - MixUp strategy
Problem: You are training an image classifier on a dataset of pictures of cats and dogs. The model reaches 95% accuracy on the training data but only 78% on the validation data.
Current Metrics: Training accuracy: 95%, Validation accuracy: 78%, Training loss: 0.15, Validation loss: 0.55
Issue: The model is overfitting: it performs very well on the training data but poorly on unseen validation data.
Your Task
Reduce overfitting by applying the MixUp data augmentation strategy to improve validation accuracy to at least 85% while keeping training accuracy below 92%.
You must keep the same model architecture.
You can only modify the data loading and training process to include MixUp.
Do not change the optimizer or learning rate.
Solution
import tensorflow as tf
import numpy as np

# Load example dataset (cats vs dogs) - using CIFAR-10 for demonstration
(x_train, y_train), (x_val, y_val) = tf.keras.datasets.cifar10.load_data()

# Filter dataset for classes 3 (cat) and 5 (dog)
train_filter = np.where((y_train == 3) | (y_train == 5))[0]
val_filter = np.where((y_val == 3) | (y_val == 5))[0]

x_train, y_train = x_train[train_filter], y_train[train_filter]
x_val, y_val = x_val[val_filter], y_val[val_filter]

# Convert labels to 0 (cat) and 1 (dog)
y_train = (y_train == 5).astype(np.float32)
y_val = (y_val == 5).astype(np.float32)

# Normalize images
x_train = x_train.astype('float32') / 255.0
x_val = x_val.astype('float32') / 255.0

# Define MixUp function: blend pairs of images and labels with a coefficient
# drawn from a Beta(alpha, alpha) distribution (one lambda per batch)
def mixup(batch_x, batch_y, alpha=0.2):
    batch_size = batch_x.shape[0]
    lam = np.random.beta(alpha, alpha)
    index = np.random.permutation(batch_size)  # random pairing within the batch
    mixed_x = lam * batch_x + (1 - lam) * batch_x[index]
    mixed_y = lam * batch_y + (1 - lam) * batch_y[index]
    return mixed_x, mixed_y

# Create TensorFlow dataset
batch_size = 64
train_ds = tf.data.Dataset.from_tensor_slices((x_train, y_train))
train_ds = train_ds.shuffle(len(x_train)).batch(batch_size)  # shuffle over the full training set

val_ds = tf.data.Dataset.from_tensor_slices((x_val, y_val)).batch(batch_size)

# Define model (simple CNN)
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation='relu', input_shape=(32, 32, 3)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation='relu'),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Custom training loop with MixUp
epochs = 10

for epoch in range(epochs):
    print(f'Epoch {epoch+1}/{epochs}')
    # Training
    for step, (batch_x, batch_y) in enumerate(train_ds):
        batch_x, batch_y = mixup(batch_x.numpy(), batch_y.numpy(), alpha=0.2)
        loss, acc = model.train_on_batch(batch_x, batch_y)
        if step % 50 == 0:
            print(f'Step {step}, Loss: {loss:.4f}, Accuracy: {acc:.4f}')
    # Validation
    val_loss, val_acc = model.evaluate(val_ds, verbose=0)
    print(f'Validation Loss: {val_loss:.4f}, Validation Accuracy: {val_acc:.4f}')
Added MixUp data augmentation to training batches by mixing pairs of images and labels.
Implemented a custom training loop to apply MixUp before each training step.
Kept the model architecture and optimizer unchanged.
Applied MixUp only during training, not during validation.
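As an aside, some MixUp implementations draw one mixing coefficient per example instead of one per batch. This variant is not required by the task above; the sketch below (a hypothetical `mixup_per_sample` helper) shows how the per-batch version could be adapted, assuming NHWC image batches:

```python
import numpy as np

def mixup_per_sample(batch_x, batch_y, alpha=0.2):
    """Per-sample variant of MixUp: each example gets its own lambda."""
    batch_size = batch_x.shape[0]
    # One coefficient per example, drawn from Beta(alpha, alpha)
    lam = np.random.beta(alpha, alpha, size=batch_size).astype(np.float32)
    lam_x = lam.reshape(-1, 1, 1, 1)  # broadcast over height, width, channels
    lam_y = lam.reshape(-1, 1)        # broadcast over label dimensions
    index = np.random.permutation(batch_size)  # random pairing within the batch
    mixed_x = lam_x * batch_x + (1 - lam_x) * batch_x[index]
    mixed_y = lam_y * batch_y + (1 - lam_y) * batch_y[index]
    return mixed_x, mixed_y
```

Per-sample coefficients add a little more variety per batch at the cost of slightly more bookkeeping; in practice both variants are used.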
Results Interpretation

Before MixUp: Training accuracy: 95%, Validation accuracy: 78%, Training loss: 0.15, Validation loss: 0.55

After MixUp: Training accuracy: 90%, Validation accuracy: 86%, Training loss: 0.25, Validation loss: 0.40

MixUp helps reduce overfitting by making the model see blended images and labels, which improves its ability to generalize to new data. This results in better validation accuracy and a smaller gap between training and validation performance.
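To make the label blending concrete, here is a tiny worked example (the 0.75 coefficient is chosen just for illustration): a cat image (label 0) mixed with a dog image (label 1) produces a soft target rather than a hard 0/1 label.

```python
# With lam = 0.75, mixing a cat (label 0.0) with a dog (label 1.0)
# gives the soft label 0.75 * 0.0 + 0.25 * 1.0 = 0.25,
# i.e. the model is trained toward "25% dog" for the blended image.
lam = 0.75
cat_label, dog_label = 0.0, 1.0
mixed_label = lam * cat_label + (1 - lam) * dog_label
print(mixed_label)  # 0.25
```

These soft targets prevent the model from becoming overconfident on any single training example, which is the regularizing effect behind the smaller train/validation gap.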
Bonus Experiment
Try using a higher alpha value for the beta distribution in MixUp and observe how it affects validation accuracy and training stability.
💡 Hint
Increasing alpha makes the mixing more balanced between pairs, which can increase regularization but might also make training harder.
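One way to build intuition before rerunning the full experiment is to sample lambda directly and see how alpha changes its distribution; this small sketch (alpha values are illustrative) counts how often the draws fall near the extremes, where images are barely mixed:

```python
import numpy as np

# How alpha shapes the Beta(alpha, alpha) distribution MixUp draws lambda from.
# Small alpha concentrates mass near 0 and 1 (nearly unmixed pairs);
# larger alpha concentrates mass near 0.5 (heavily blended pairs).
rng = np.random.default_rng(42)
for alpha in (0.2, 1.0, 4.0):
    lam = rng.beta(alpha, alpha, size=10_000)
    # Fraction of draws near an extreme (lam < 0.1 or lam > 0.9)
    extreme = np.mean((lam < 0.1) | (lam > 0.9))
    print(f'alpha={alpha}: extreme fraction={extreme:.2f}')
```

The extreme fraction shrinks as alpha grows, which matches the hint: more balanced mixing means stronger regularization but noisier training signals.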