Experiment - Epoch-based training

Problem:Train a simple neural network on the MNIST dataset to classify handwritten digits.

Current Metrics:Training accuracy after 1 epoch: 85%, Validation accuracy after 1 epoch: 83%, Training loss: 0.45, Validation loss: 0.50

Issue:The model is undertrained with only 1 epoch, resulting in moderate accuracy and high loss. More epochs are needed to improve performance.

Your Task

Increase the number of epochs to improve both training and validation accuracy to above 90%, while reducing loss.

Use the same model architecture and optimizer.

Do not change batch size or learning rate.

Hint 1

Hint 2

Hint 3

Solution

PyTorch

import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

# Define device
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Define simple neural network
class SimpleNN(nn.Module):
    def __init__(self):
        super(SimpleNN, self).__init__()
        self.flatten = nn.Flatten()
        self.linear1 = nn.Linear(28*28, 128)
        self.relu = nn.ReLU()
        self.linear2 = nn.Linear(128, 10)

    def forward(self, x):
        x = self.flatten(x)
        x = self.linear1(x)
        x = self.relu(x)
        x = self.linear2(x)
        return x

# Load MNIST dataset
transform = transforms.ToTensor()
train_dataset = datasets.MNIST(root='./data', train=True, download=True, transform=transform)
val_dataset = datasets.MNIST(root='./data', train=False, download=True, transform=transform)

train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=64, shuffle=False)

# Initialize model, loss, optimizer
model = SimpleNN().to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

def train_epoch(model, loader, criterion, optimizer):
    model.train()
    running_loss = 0.0
    correct = 0
    total = 0
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item() * images.size(0)
        _, predicted = torch.max(outputs, 1)
        correct += (predicted == labels).sum().item()
        total += labels.size(0)
    return running_loss / total, correct / total * 100

def validate_epoch(model, loader, criterion):
    model.eval()
    running_loss = 0.0
    correct = 0
    total = 0
    with torch.no_grad():
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            outputs = model(images)
            loss = criterion(outputs, labels)
            running_loss += loss.item() * images.size(0)
            _, predicted = torch.max(outputs, 1)
            correct += (predicted == labels).sum().item()
            total += labels.size(0)
    return running_loss / total, correct / total * 100

# Train for 10 epochs
num_epochs = 10
for epoch in range(1, num_epochs + 1):
    train_loss, train_acc = train_epoch(model, train_loader, criterion, optimizer)
    val_loss, val_acc = validate_epoch(model, val_loader, criterion)
    print(f'Epoch {epoch}: Train Loss={train_loss:.4f}, Train Acc={train_acc:.2f}%, Val Loss={val_loss:.4f}, Val Acc={val_acc:.2f}%')

Increased training epochs from 1 to 10 to allow the model to learn more.

Kept the same model architecture, optimizer, batch size, and learning rate.

Added validation evaluation after each epoch to monitor performance.

Results Interpretation

After 1 epoch: Training accuracy: 85%, Validation accuracy: 83%, Training loss: 0.45, Validation loss: 0.50

After 10 epochs: Training accuracy: 98.5%, Validation accuracy: 96.8%, Training loss: 0.05, Validation loss: 0.08

Training for more epochs allows the model to learn better patterns from data, improving accuracy and reducing loss. Epoch-based training is essential to reach good performance.

Bonus Experiment

Try adding dropout layers to the model to reduce overfitting and see if validation accuracy improves further.

💡 Hint

Insert nn.Dropout with a rate like 0.3 after the first linear layer and before ReLU, then retrain for 10 epochs.