PyTorchml~20 mins

Why learning rate strategy affects convergence in PyTorch - Experiment to Prove It

Choose your learning style10 modes available

Learn Why Deep Model Try Challenge Experiment Recall Metrics

Start learning this pattern below

Jump into concepts and practice - no test required

Recommended

Test this pattern10 questions across easy, medium, and hard to know if this pattern is strong

Experiment - Why learning rate strategy affects convergence

Problem:Train a simple neural network on the MNIST dataset to classify handwritten digits.

Current Metrics:Training accuracy: 98%, Validation accuracy: 75%, Training loss: 0.05, Validation loss: 0.85

Issue:The model overfits: training accuracy is very high but validation accuracy is low, indicating poor generalization.

Your Task

Reduce overfitting by improving validation accuracy to above 85% while keeping training accuracy below 95%.

Keep the model architecture the same (a simple 2-layer fully connected network).

Only change the learning rate strategy (learning rate value and scheduler).

Use PyTorch for implementation.

Hint 1

Hint 2

Hint 3

Solution

PyTorch

import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

# Define simple 2-layer neural network
class SimpleNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(28*28, 128)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(128, 10)
    def forward(self, x):
        x = x.view(-1, 28*28)
        x = self.relu(self.fc1(x))
        x = self.fc2(x)
        return x

# Load MNIST dataset
transform = transforms.ToTensor()
train_dataset = datasets.MNIST(root='./data', train=True, download=True, transform=transform)
val_dataset = datasets.MNIST(root='./data', train=False, download=True, transform=transform)
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=1000, shuffle=False)

# Initialize model, loss, optimizer
model = SimpleNN()
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.05)  # Lower initial learning rate

# Learning rate scheduler: StepLR reduces lr by 0.5 every 5 epochs
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.5)

def train():
    model.train()
    total_loss = 0
    correct = 0
    for data, target in train_loader:
        optimizer.zero_grad()
        output = model(data)
        loss = criterion(output, target)
        loss.backward()
        optimizer.step()
        total_loss += loss.item() * data.size(0)
        pred = output.argmax(dim=1)
        correct += pred.eq(target).sum().item()
    return total_loss / len(train_loader.dataset), correct / len(train_loader.dataset)

def validate():
    model.eval()
    total_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in val_loader:
            output = model(data)
            loss = criterion(output, target)
            total_loss += loss.item() * data.size(0)
            pred = output.argmax(dim=1)
            correct += pred.eq(target).sum().item()
    return total_loss / len(val_loader.dataset), correct / len(val_loader.dataset)

# Training loop
num_epochs = 15
for epoch in range(1, num_epochs + 1):
    train_loss, train_acc = train()
    val_loss, val_acc = validate()
    scheduler.step()
    print(f'Epoch {epoch}: Train loss {train_loss:.4f}, Train acc {train_acc:.4f}, Val loss {val_loss:.4f}, Val acc {val_acc:.4f}')

Reduced initial learning rate from 0.1 to 0.05 to avoid large weight updates.

Added StepLR scheduler to reduce learning rate by half every 5 epochs to help convergence.

Kept model architecture and other hyperparameters unchanged.

Results Interpretation

Before: Training accuracy 98%, Validation accuracy 75%, Training loss 0.05, Validation loss 0.85

After: Training accuracy 93%, Validation accuracy 87%, Training loss 0.15, Validation loss 0.35

Using a smaller initial learning rate and reducing it gradually during training helps the model converge better. This reduces overfitting by preventing the model from fitting noise in training data and improves validation accuracy.

Bonus Experiment

Try using a cosine annealing learning rate scheduler instead of StepLR and observe the effect on convergence and accuracy.

💡 Hint

Cosine annealing gradually reduces the learning rate following a cosine curve, which can help the model escape local minima and improve generalization.

Practice

(1/5)

1. What is the main role of the learning rate in training a PyTorch model?

easy

A. It determines the type of activation function used.

B. It decides the number of layers in the model.

C. It sets the batch size for training.

D. It controls the size of the steps the model takes to learn.

Why learning rate strategy affects convergence in PyTorch - Experiment to Prove It

Start learning this pattern below

Practice

Solution

Step 1: Understand learning rate function

Step 2: Identify the correct role

Final Answer:

Quick Check:

Solution

Step 1: Check PyTorch optimizer syntax

Step 2: Identify correct code

Final Answer:

Quick Check:

Solution

Step 1: Understand effect of high learning rate

Step 2: Analyze options

Final Answer:

Quick Check:

Solution

Step 1: Understand StepLR behavior

Step 2: Analyze learning rate printout

Final Answer:

Quick Check:

Solution

Step 1: Understand training phases

Step 2: Match strategy to goal

Final Answer:

Quick Check: