
Sequential model shortcut in PyTorch - ML Experiment: Train & Evaluate

Experiment - Sequential model shortcut
Problem: You have a simple neural network, built with PyTorch's nn.Sequential, that classifies digits from the MNIST dataset. The model currently reaches 98% training accuracy but only 85% validation accuracy.
Current Metrics: Training accuracy: 98%, Validation accuracy: 85%, Training loss: 0.05, Validation loss: 0.35
Issue: The model is overfitting. Training accuracy is 13 percentage points higher than validation accuracy, and validation loss (0.35) is far above training loss (0.05).
Your Task
Reduce overfitting by improving validation accuracy to at least 90% while keeping training accuracy below 95%.
You must keep using the PyTorch Sequential model.
You cannot change the dataset or increase its size.
You can only modify the model architecture or training hyperparameters.
Solution
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

# Data preparation
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,))
])

train_dataset = datasets.MNIST(root='./data', train=True, download=True, transform=transform)
val_dataset = datasets.MNIST(root='./data', train=False, download=True, transform=transform)

train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=1000, shuffle=False)

# Define model with dropout and batch normalization
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28*28, 256),
    nn.BatchNorm1d(256),
    nn.ReLU(),
    nn.Dropout(0.3),
    nn.Linear(256, 128),
    nn.BatchNorm1d(128),
    nn.ReLU(),
    nn.Dropout(0.3),
    nn.Linear(128, 10)
)

# Loss and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Training loop
for epoch in range(10):
    model.train()
    total_loss = 0
    correct = 0
    for data, target in train_loader:
        optimizer.zero_grad()
        output = model(data)
        loss = criterion(output, target)
        loss.backward()
        optimizer.step()
        total_loss += loss.item() * data.size(0)
        pred = output.argmax(dim=1)
        correct += pred.eq(target).sum().item()
    train_loss = total_loss / len(train_loader.dataset)
    train_acc = correct / len(train_loader.dataset) * 100

    model.eval()
    val_loss = 0
    val_correct = 0
    with torch.no_grad():
        for data, target in val_loader:
            output = model(data)
            loss = criterion(output, target)
            val_loss += loss.item() * data.size(0)
            pred = output.argmax(dim=1)
            val_correct += pred.eq(target).sum().item()
    val_loss /= len(val_loader.dataset)
    val_acc = val_correct / len(val_loader.dataset) * 100

    print(f'Epoch {epoch+1}: Train Loss={train_loss:.4f}, Train Acc={train_acc:.2f}%, Val Loss={val_loss:.4f}, Val Acc={val_acc:.2f}%')
Added nn.Dropout layers with a dropout rate of 0.3 after each ReLU activation to reduce overfitting.
Added nn.BatchNorm1d layers after each hidden linear layer to stabilize and speed up training.
Kept the learning rate at 0.001 and trained for 10 epochs.
Results Interpretation

Before: Training accuracy: 98%, Validation accuracy: 85%, Training loss: 0.05, Validation loss: 0.35

After: Training accuracy: 93%, Validation accuracy: 91%, Training loss: 0.15, Validation loss: 0.25

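Dropout randomly zeroes a fraction of the activations (30% here) on every training step, so the network cannot depend on any single unit and has to learn more redundant features. Batch normalization normalizes each hidden layer's activations per mini-batch, which stabilizes training and can add a mild regularizing effect. Together they narrow the gap between training and validation accuracy. Note that training accuracy is measured with dropout active (model.train()), while validation accuracy is measured with dropout disabled (model.eval()), which is part of why the training figure drops.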
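To make the train/eval difference concrete, here is a minimal sketch of how nn.Dropout behaves in each mode. It is an illustration only, not part of the solution: the tensor of ones, the seed, and the variable names are assumptions chosen for the demo.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # fixed seed so the random dropout mask is reproducible

drop = nn.Dropout(p=0.3)   # same rate as in the solution model
x = torch.ones(1, 1000)    # illustrative input: 1000 activations, all equal to 1

# Training mode: each element is zeroed with probability p,
# and the surviving elements are scaled by 1 / (1 - p).
drop.train()
y_train = drop(x)
zero_fraction = (y_train == 0).float().mean().item()
survivor_value = y_train[y_train != 0][0].item()

# Evaluation mode: dropout is a no-op and the input passes through unchanged.
drop.eval()
y_eval = drop(x)

print(f"fraction zeroed in train mode: {zero_fraction:.3f}")
print(f"value of surviving elements:   {survivor_value:.4f}")
print(f"eval output equals input:      {torch.equal(y_eval, x)}")
```

The fraction zeroed should land close to 0.3, and the surviving values are scaled up so that the expected activation stays the same between training and evaluation. This is why calling model.train() and model.eval() at the right points in the loop matters.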
Bonus Experiment
Try replacing the dropout layers with L2 weight regularization (weight decay) in the optimizer and compare results.
💡 Hint
Set the weight_decay parameter of the Adam optimizer to a small value such as 0.0005 and remove the dropout layers.
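One possible starting point for the bonus experiment is sketched below. It assumes the same layer sizes as the solution with the nn.Dropout layers removed; the names model_wd and optimizer_wd and the dummy batch are hypothetical and exist only for this sketch. The data loaders and training loop from the solution can be reused unchanged.

```python
import torch
import torch.nn as nn
import torch.optim as optim

# Same architecture as the solution, minus the nn.Dropout layers
model_wd = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 256),
    nn.BatchNorm1d(256),
    nn.ReLU(),
    nn.Linear(256, 128),
    nn.BatchNorm1d(128),
    nn.ReLU(),
    nn.Linear(128, 10),
)

# weight_decay adds an L2 penalty on the parameters to each gradient update
optimizer_wd = optim.Adam(model_wd.parameters(), lr=0.001, weight_decay=0.0005)

# Smoke test on a dummy batch shaped like MNIST images: (batch, channel, height, width)
dummy = torch.randn(4, 1, 28, 28)
logits = model_wd(dummy)
print(logits.shape)
```

Passing model_wd.parameters() applies the decay to every parameter, including the batch normalization weights and biases. Whether this matches or beats dropout on validation accuracy is something to measure by running both versions for the same number of epochs; the value 0.0005 is only the hint's suggested starting point.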