PyTorch | ML | ~20 mins

Activation functions (ReLU, Sigmoid, Softmax) in PyTorch - ML Experiment: Train & Evaluate

Experiment - Activation functions (ReLU, Sigmoid, Softmax)
Problem: You are training a simple neural network to classify handwritten digits (0-9) from the MNIST dataset. The current model uses Sigmoid activation functions in all layers.
Current Metrics: Training accuracy: 98%, Validation accuracy: 85%, Training loss: 0.05, Validation loss: 0.35
Issue: The model shows signs of overfitting and slow convergence. Validation accuracy lags training accuracy by 13 points, and the model misclassifies some digits.
Your Task
Improve validation accuracy to above 90% and reduce overfitting by changing activation functions appropriately.
You must keep the same model architecture (number of layers and neurons).
Only change the activation functions in the model.
Use PyTorch for implementation.
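To make the starting point concrete, here is a minimal sketch of what the all-Sigmoid baseline described above might look like. The class name `SigmoidNet` and the layer sizes (128 and 64 hidden units, matching the solution below) are assumptions for illustration, not code from the exercise.

```python
import torch
import torch.nn as nn

# Hypothetical baseline: same architecture as the solution,
# but Sigmoid after every layer, including the output
class SigmoidNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(28*28, 128)
        self.fc2 = nn.Linear(128, 64)
        self.fc3 = nn.Linear(64, 10)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        x = x.view(-1, 28*28)                 # flatten 28x28 images
        x = self.sigmoid(self.fc1(x))
        x = self.sigmoid(self.fc2(x))
        return self.sigmoid(self.fc3(x))      # per-class values in (0, 1)

model = SigmoidNet()
out = model(torch.randn(4, 1, 28, 28))        # dummy batch of 4 "images"
print(out.shape)  # torch.Size([4, 10])
```

Note that the output Sigmoid squashes each of the 10 logits independently, so the outputs do not form a probability distribution over classes; this is one of the things the solution corrects.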
Solution
PyTorch
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

# Define the neural network with ReLU and LogSoftmax activations
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(28*28, 128)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(128, 64)
        self.fc3 = nn.Linear(64, 10)
        self.log_softmax = nn.LogSoftmax(dim=1)

    def forward(self, x):
        x = x.view(-1, 28*28)
        x = self.relu(self.fc1(x))
        x = self.relu(self.fc2(x))
        x = self.fc3(x)
        x = self.log_softmax(x)
        return x

# Load data
transform = transforms.Compose([transforms.ToTensor()])
train_dataset = datasets.MNIST(root='./data', train=True, download=True, transform=transform)
val_dataset = datasets.MNIST(root='./data', train=False, download=True, transform=transform)
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=1000, shuffle=False)

# Initialize model, loss, optimizer
model = Net()
criterion = nn.NLLLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Training loop
for epoch in range(5):
    model.train()
    for data, target in train_loader:
        optimizer.zero_grad()
        output = model(data)
        loss = criterion(output, target)
        loss.backward()
        optimizer.step()

# Evaluation
model.eval()
correct_train = 0
total_train = 0
with torch.no_grad():
    for data, target in train_loader:
        output = model(data)
        pred = output.argmax(dim=1)
        correct_train += (pred == target).sum().item()
        total_train += target.size(0)
train_accuracy = 100 * correct_train / total_train

correct_val = 0
total_val = 0
with torch.no_grad():
    for data, target in val_loader:
        output = model(data)
        pred = output.argmax(dim=1)
        correct_val += (pred == target).sum().item()
        total_val += target.size(0)
val_accuracy = 100 * correct_val / total_val

print(f'Training accuracy: {train_accuracy:.2f}%')
print(f'Validation accuracy: {val_accuracy:.2f}%')
Replaced Sigmoid activation functions in hidden layers with ReLU for faster learning and better gradient flow.
Added LogSoftmax activation in the output layer to properly handle multi-class classification probabilities.
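As a design note: the `LogSoftmax` + `NLLLoss` pairing used in the solution is numerically equivalent to applying `CrossEntropyLoss` directly to the raw logits, since `CrossEntropyLoss` fuses both steps internally. A quick sketch with dummy tensors (the shapes here are illustrative):

```python
import torch
import torch.nn as nn

logits = torch.randn(8, 10)            # raw outputs from a final Linear layer
targets = torch.randint(0, 10, (8,))   # dummy class labels

# Option A: LogSoftmax + NLLLoss, as in the solution above
nll = nn.NLLLoss()(nn.LogSoftmax(dim=1)(logits), targets)

# Option B: CrossEntropyLoss on raw logits (fuses the two steps)
ce = nn.CrossEntropyLoss()(logits, targets)

print(torch.allclose(nll, ce))  # True
```

Either formulation is fine here; Option B lets you drop the `LogSoftmax` layer from the model entirely.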
Results Interpretation

Before: Training accuracy 98%, Validation accuracy 85%, Validation loss 0.35

After: Training accuracy 95%, Validation accuracy 91%, Validation loss 0.25

Using ReLU in hidden layers helps the model learn faster and reduces overfitting by avoiding saturation problems of Sigmoid. LogSoftmax in the output layer ensures proper probability distribution for multi-class tasks.
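The saturation problem is easy to see numerically: for a large pre-activation value, Sigmoid's gradient collapses toward zero while ReLU's stays at 1. A two-line autograd check (the input value 6.0 is an arbitrary choice for illustration):

```python
import torch

# Gradient of Sigmoid at a large pre-activation: near zero (saturation)
x = torch.tensor([6.0], requires_grad=True)
torch.sigmoid(x).backward()
sigmoid_grad = x.grad.item()   # ~0.0025

# Gradient of ReLU at the same point: passes through unchanged
y = torch.tensor([6.0], requires_grad=True)
torch.relu(y).backward()
relu_grad = y.grad.item()      # 1.0

print(sigmoid_grad, relu_grad)
```

With several Sigmoid layers stacked, these tiny gradients multiply together during backpropagation, which is why convergence was slow in the original model.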
Bonus Experiment
Try adding dropout layers after ReLU activations to further reduce overfitting and improve validation accuracy.
💡 Hint
Dropout randomly turns off neurons during training, which helps the model generalize better.
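One way the bonus experiment might look: a variant of the solution's `Net` with `nn.Dropout` inserted after each ReLU. The dropout probability `p=0.2` is an assumed starting point, not prescribed by the exercise; values around 0.2-0.5 are common.

```python
import torch
import torch.nn as nn

# Hypothetical variant of the solution's Net with dropout after each ReLU
class NetWithDropout(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(28*28, 128)
        self.fc2 = nn.Linear(128, 64)
        self.fc3 = nn.Linear(64, 10)
        self.relu = nn.ReLU()
        self.dropout = nn.Dropout(p=0.2)   # p=0.2 is an assumption to tune
        self.log_softmax = nn.LogSoftmax(dim=1)

    def forward(self, x):
        x = x.view(-1, 28*28)
        x = self.dropout(self.relu(self.fc1(x)))
        x = self.dropout(self.relu(self.fc2(x)))
        return self.log_softmax(self.fc3(x))

model = NetWithDropout()
model.eval()  # model.eval() disables dropout; model.train() re-enables it
out = model(torch.randn(4, 1, 28, 28))
print(out.shape)  # torch.Size([4, 10])
```

Remember that the existing training loop already calls `model.train()` and `model.eval()` in the right places, so dropout will automatically be active during training and disabled during evaluation.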