
nn.Conv2d layers in PyTorch - ML Experiment: Train & Evaluate

Experiment - nn.Conv2d layers
Problem: You are training a simple image classifier using a convolutional neural network (CNN) with nn.Conv2d layers on the CIFAR-10 dataset. The current model achieves high training accuracy but much lower validation accuracy.
Current Metrics: Training accuracy: 98%, Validation accuracy: 65%, Training loss: 0.05, Validation loss: 1.2
Issue: The model is overfitting: it performs very well on training data but poorly on unseen validation data.
Your Task
Reduce overfitting so that validation accuracy improves to at least 80% while keeping training accuracy below 90%.
You can only modify the model architecture and training hyperparameters.
Do not change the dataset or data preprocessing steps.
Solution
PyTorch
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

# Define transformations for training and validation
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])

# Load CIFAR-10; the official test split (train=False) serves as the validation set
train_dataset = datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
val_dataset = datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)

train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=64, shuffle=False)

# Define CNN model with dropout and batch normalization
class SimpleCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 32, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(32)
        self.dropout1 = nn.Dropout(0.25)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm2d(64)
        self.dropout2 = nn.Dropout(0.25)
        self.fc1 = nn.Linear(64 * 8 * 8, 512)
        self.dropout3 = nn.Dropout(0.5)
        self.fc2 = nn.Linear(512, 10)

    def forward(self, x):
        x = self.pool(nn.functional.relu(self.bn1(self.conv1(x))))
        x = self.dropout1(x)
        x = self.pool(nn.functional.relu(self.bn2(self.conv2(x))))
        x = self.dropout2(x)
        x = x.view(-1, 64 * 8 * 8)
        x = nn.functional.relu(self.fc1(x))
        x = self.dropout3(x)
        x = self.fc2(x)
        return x

# Initialize model, loss, optimizer
model = SimpleCNN()
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Training loop
for epoch in range(10):
    model.train()
    running_loss = 0.0
    correct = 0
    total = 0
    for inputs, labels in train_loader:
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item() * inputs.size(0)
        _, predicted = outputs.max(1)
        total += labels.size(0)
        correct += predicted.eq(labels).sum().item()
    train_loss = running_loss / total
    train_acc = 100. * correct / total

    model.eval()
    val_loss = 0.0
    val_correct = 0
    val_total = 0
    with torch.no_grad():
        for inputs, labels in val_loader:
            outputs = model(inputs)
            loss = criterion(outputs, labels)
            val_loss += loss.item() * inputs.size(0)
            _, predicted = outputs.max(1)
            val_total += labels.size(0)
            val_correct += predicted.eq(labels).sum().item()
    val_loss /= val_total
    val_acc = 100. * val_correct / val_total

    print(f'Epoch {epoch+1}: Train Loss={train_loss:.3f}, Train Acc={train_acc:.1f}%, Val Loss={val_loss:.3f}, Val Acc={val_acc:.1f}%')
Added Batch Normalization layers after each Conv2d layer to stabilize and speed up training.
Added Dropout layers after convolutional and fully connected layers to reduce overfitting.
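Added MaxPooling layers after each convolution block; each 2x2 pool halves the height and width (32 to 16 to 8), which shrinks the input to the first fully connected layer.
Used the Adam optimizer with a learning rate of 0.001; Adam adapts the step size per parameter, so it usually converges with little tuning.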
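The 64 * 8 * 8 input size of fc1 follows from the layer settings in the solution. As a quick check, the sketch below applies the standard output-size formula for convolution and pooling layers to a 32x32 CIFAR-10 image. The helper name out_size is hypothetical and is not part of PyTorch.

```python
def out_size(n, kernel, stride=1, padding=0, dilation=1):
    """Output length along one spatial axis for a conv or max-pool layer
    (floor rounding, which is the PyTorch default)."""
    return (n + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1

size = 32                                    # CIFAR-10 images are 32x32
size = out_size(size, kernel=3, padding=1)   # conv1: 3x3 with padding=1 keeps 32
size = out_size(size, kernel=2, stride=2)    # pool: 32 -> 16
size = out_size(size, kernel=3, padding=1)   # conv2: 3x3 with padding=1 keeps 16
size = out_size(size, kernel=2, stride=2)    # pool: 16 -> 8

flat_features = 64 * size * size             # 64 channels from conv2
print(size, flat_features)                   # 8 4096
```

If you change the kernel size, padding, or number of pooling steps, rerun this arithmetic and update both fc1 and the reshape in forward to match.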
Results Interpretation

Before: Training accuracy 98%, Validation accuracy 65%, Training loss 0.05, Validation loss 1.2

After: Training accuracy 88%, Validation accuracy 82%, Training loss 0.25, Validation loss 0.65

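Dropout randomly zeroes activations during training, so the network cannot rely on any single feature and is less able to memorize the training data. Batch normalization mainly stabilizes training and adds a mild regularizing effect. Together they narrow the gap between training and validation accuracy from 33 points to 6 points.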
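To see why the calls to model.train() and model.eval() in the training loop matter, here is a minimal sketch of how nn.Dropout behaves in each mode. The tensor of ones and the seed are illustrative choices, not part of the solution.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)          # illustrative seed so the run is repeatable
drop = nn.Dropout(p=0.5)
x = torch.ones(1000)

drop.train()                  # training mode: dropout is active
train_out = drop(x)           # each element is zeroed with probability p;
                              # survivors are scaled by 1 / (1 - p) = 2.0

drop.eval()                   # evaluation mode: dropout is a no-op
eval_out = drop(x)

kept_fraction = (train_out != 0).float().mean().item()
print(f"fraction kept in train mode: {kept_fraction:.2f}")
print("eval output unchanged:", torch.equal(eval_out, x))
```

Because dropout is disabled in evaluation mode, the training accuracy measured during an epoch (with dropout active) tends to understate what the same weights would score without it, which is one reason the reported training and validation numbers move closer together.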
Bonus Experiment
Try replacing the dropout layers with L2 weight regularization (weight decay) in the optimizer and compare results.
💡 Hint
Set the weight_decay parameter of the Adam optimizer to a small value such as 0.0005 and remove the dropout layers.
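One possible starting point for the bonus experiment is sketched below. The class name SimpleCNNNoDropout is hypothetical; it mirrors the solution model with the three dropout layers removed, and the weight_decay value is the one suggested in the hint. Treat it as a sketch to adapt, not a tuned configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

class SimpleCNNNoDropout(nn.Module):
    """Hypothetical variant of SimpleCNN with the dropout layers removed."""
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 32, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(32)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm2d(64)
        self.fc1 = nn.Linear(64 * 8 * 8, 512)
        self.fc2 = nn.Linear(512, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.bn1(self.conv1(x))))
        x = self.pool(F.relu(self.bn2(self.conv2(x))))
        x = torch.flatten(x, 1)            # keep the batch dimension
        x = F.relu(self.fc1(x))
        return self.fc2(x)

model = SimpleCNNNoDropout()
# weight_decay adds an L2 penalty on the weights to each gradient step
optimizer = optim.Adam(model.parameters(), lr=0.001, weight_decay=0.0005)

# Sanity check on a random batch shaped like CIFAR-10 input
logits = model(torch.randn(2, 3, 32, 32))
print(logits.shape)                        # torch.Size([2, 10])
```

The rest of the training loop can stay as it is. Note that Adam folds weight_decay into the gradient before its adaptive scaling; torch.optim.AdamW applies the decay separately from the gradient update and may be worth comparing as a further variant.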