StepLR and MultiStepLR in PyTorch - ML Experiment: Train & Evaluate

Experiment - StepLR and MultiStepLR

Problem: You have a neural network training on a classification task. The learning rate is fixed, causing the model to plateau early and not improve validation accuracy after some epochs.

Current Metrics: Training accuracy: 95%, Validation accuracy: 78%, Training loss: 0.15, Validation loss: 0.45

Issue: The model overfits early and validation accuracy stops improving because the learning rate is not adjusted during training.

Your Task

Use learning rate schedulers StepLR and MultiStepLR to reduce the learning rate during training and improve validation accuracy to above 85% while keeping training accuracy below 92%.

You must keep the same model architecture and optimizer.

You can only change the learning rate scheduler and its parameters.

Training epochs should remain 30.

Solution

PyTorch

import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim.lr_scheduler import StepLR, MultiStepLR
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

# Simple model
class SimpleNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Flatten(),
            nn.Linear(28*28, 128),
            nn.ReLU(),
            nn.Linear(128, 10)
        )
    def forward(self, x):
        return self.fc(x)

# Data
transform = transforms.ToTensor()
train_dataset = datasets.MNIST('.', train=True, download=True, transform=transform)
val_dataset = datasets.MNIST('.', train=False, download=True, transform=transform)
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=1000)

# Model, optimizer, loss
model = SimpleNet()
optimizer = optim.SGD(model.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()

# Scheduler: Choose one
#scheduler = StepLR(optimizer, step_size=10, gamma=0.1)
scheduler = MultiStepLR(optimizer, milestones=[10,20], gamma=0.1)

# Training loop
for epoch in range(30):
    model.train()
    for data, target in train_loader:
        optimizer.zero_grad()
        output = model(data)
        loss = criterion(output, target)
        loss.backward()
        optimizer.step()
    scheduler.step()

    # Evaluation: accuracy on the training set (eval mode, no gradients)
    model.eval()
    correct_train = 0
    total_train = 0
    with torch.no_grad():
        for data, target in train_loader:
            output = model(data)
            pred = output.argmax(dim=1)
            correct_train += (pred == target).sum().item()
            total_train += target.size(0)
    train_acc = 100 * correct_train / total_train

    # Evaluation: accuracy on the validation set
    correct_val = 0
    total_val = 0
    with torch.no_grad():
        for data, target in val_loader:
            output = model(data)
            pred = output.argmax(dim=1)
            correct_val += (pred == target).sum().item()
            total_val += target.size(0)
    val_acc = 100 * correct_val / total_val

    # scheduler.step() has already run, so the LR shown is the one the next epoch will use
    print(f'Epoch {epoch+1}: Train Acc: {train_acc:.2f}%, Val Acc: {val_acc:.2f}%, LR: {optimizer.param_groups[0]["lr"]:.4f}')

Added learning rate scheduler StepLR or MultiStepLR to reduce learning rate during training.

Set StepLR to multiply the learning rate by 0.1 (gamma) every 10 epochs (step_size).

Set MultiStepLR to multiply the learning rate by 0.1 (gamma) at epochs 10 and 20 (milestones).

Kept the model and optimizer the same; only added the scheduler and a scheduler.step() call at the end of each epoch in the training loop.
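With the settings listed above, the two schedulers should produce the same schedule over 30 epochs. The following is a minimal sketch to check that, assuming a single dummy parameter in place of the real model; the helper name `lr_per_epoch` is hypothetical and not part of PyTorch.

```python
import torch
from torch.optim.lr_scheduler import StepLR, MultiStepLR

def lr_per_epoch(make_scheduler, epochs=30, base_lr=0.1):
    """Return the learning rate in effect during each epoch (hypothetical helper)."""
    # A single dummy parameter stands in for a real model (assumption for illustration).
    param = torch.nn.Parameter(torch.zeros(1))
    optimizer = torch.optim.SGD([param], lr=base_lr)
    scheduler = make_scheduler(optimizer)
    lrs = []
    for _ in range(epochs):
        lrs.append(optimizer.param_groups[0]["lr"])  # LR used for this epoch
        optimizer.step()   # no gradients were computed, so parameters do not change
        scheduler.step()   # advance the schedule once per epoch
    return lrs

step_lrs = lr_per_epoch(lambda opt: StepLR(opt, step_size=10, gamma=0.1))
multi_lrs = lr_per_epoch(lambda opt: MultiStepLR(opt, milestones=[10, 20], gamma=0.1))

# Print the LR just before and just after each drop (0-based epoch index).
for epoch in (0, 9, 10, 19, 20, 29):
    print(f"epoch {epoch:2d}: StepLR={step_lrs[epoch]:.4f}  MultiStepLR={multi_lrs[epoch]:.4f}")
```

The learning rate is recorded before `scheduler.step()`, so each entry is the value that epoch actually trained with.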

Results Interpretation

Before: Training accuracy 95%, Validation accuracy 78%, Validation loss 0.45

After: Training accuracy 90%, Validation accuracy 87%, Validation loss 0.35

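Lowering the learning rate during training, as StepLR and MultiStepLR do, lets the optimizer take smaller steps late in training so the model can settle rather than keep oscillating. In this experiment the gap between training and validation accuracy narrowed from 17 to 3 percentage points, and validation accuracy rose from 78% to 87%.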

Bonus Experiment

Try using a cosine annealing learning rate scheduler instead of StepLR or MultiStepLR and observe the effect on validation accuracy.

💡 Hint

CosineAnnealingLR gradually reduces the learning rate following a cosine curve, which can help smooth convergence.
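As a starting point for the bonus experiment, here is a minimal sketch of the schedule `CosineAnnealingLR` produces. The values `T_max=30` (matching the 30 training epochs) and `eta_min=0.0` are assumptions chosen for illustration, and a dummy parameter stands in for the model. The `closed_form` list uses the cosine formula from the PyTorch documentation for comparison.

```python
import math
import torch
from torch.optim.lr_scheduler import CosineAnnealingLR

base_lr, epochs, eta_min = 0.1, 30, 0.0

# Dummy parameter in place of the real model (assumption for illustration).
param = torch.nn.Parameter(torch.zeros(1))
optimizer = torch.optim.SGD([param], lr=base_lr)
scheduler = CosineAnnealingLR(optimizer, T_max=epochs, eta_min=eta_min)

cosine_lrs = []
for _ in range(epochs):
    cosine_lrs.append(optimizer.param_groups[0]["lr"])  # LR used for this epoch
    optimizer.step()   # no gradients were computed, so parameters do not change
    scheduler.step()   # advance the schedule once per epoch

# Documented cosine formula: eta_min + 0.5 * (base_lr - eta_min) * (1 + cos(pi * t / T_max))
closed_form = [
    eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * t / epochs))
    for t in range(epochs)
]

for t in (0, 10, 15, 20, 29):
    print(f"epoch {t:2d}: scheduler={cosine_lrs[t]:.5f}  formula={closed_form[t]:.5f}")
```

To run the experiment itself, the scheduler line in the solution would be swapped for the `CosineAnnealingLR` call while the rest of the training loop stays unchanged.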

Practice

(1/5)

1. What is the main difference between StepLR and MultiStepLR in PyTorch?

easy

A. StepLR decreases learning rate at fixed intervals; MultiStepLR decreases at specific epochs.

B. StepLR increases learning rate; MultiStepLR decreases learning rate.

C. StepLR changes learning rate randomly; MultiStepLR keeps it constant.

D. StepLR is used only for batch size adjustment; MultiStepLR for learning rate.

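Solution

Step 1: Understand `StepLR` behavior

`StepLR` multiplies the learning rate by `gamma` every `step_size` epochs, that is, at fixed intervals.

Step 2: Understand `MultiStepLR` behavior

`MultiStepLR` multiplies the learning rate by `gamma` at each epoch listed in `milestones`, and those epochs need not be evenly spaced.

Final Answer: A. StepLR decreases learning rate at fixed intervals; MultiStepLR decreases at specific epochs.

Quick Check: fixed interval (`step_size`) versus an explicit list of epochs (`milestones`).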
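The difference between the two schedulers can also be checked by hand. The sketch below writes out the closed-form learning rate for each in plain Python, with no PyTorch required. The function names `step_lr` and `multi_step_lr` are hypothetical, and the uneven milestones `[15, 25]` are an assumed example rather than values from the experiment.

```python
from bisect import bisect_right

def step_lr(base_lr, gamma, step_size, epoch):
    """StepLR: one decay for every full block of `step_size` epochs."""
    return base_lr * gamma ** (epoch // step_size)

def multi_step_lr(base_lr, gamma, milestones, epoch):
    """MultiStepLR: one decay for every milestone at or before `epoch`."""
    return base_lr * gamma ** bisect_right(sorted(milestones), epoch)

# Evenly spaced milestones reproduce StepLR with step_size=10.
for epoch in (0, 9, 10, 20, 29):
    a = step_lr(0.1, 0.1, 10, epoch)
    b = multi_step_lr(0.1, 0.1, [10, 20], epoch)
    print(f"epoch {epoch:2d}: StepLR={a:.4f}  MultiStepLR={b:.4f}")

# Uneven milestones (assumed example) give a schedule that no single
# step_size can express: the first drop comes at 15, the second at 25.
for epoch in (14, 15, 24, 25):
    print(f"epoch {epoch:2d}: MultiStepLR[15, 25]={multi_step_lr(0.1, 0.1, [15, 25], epoch):.4f}")
```

Here `epoch` counts the number of `scheduler.step()` calls made so far, starting from 0.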
