
StepLR and MultiStepLR in PyTorch - ML Experiment: Train & Evaluate

Experiment - StepLR and MultiStepLR
Problem: You are training a neural network on a classification task. The learning rate is fixed, so the model plateaus early and validation accuracy stops improving after a few epochs.
Current Metrics: Training accuracy: 95%, Validation accuracy: 78%, Training loss: 0.15, Validation loss: 0.45
Issue: The model overfits early, and validation accuracy stalls because the learning rate is never adjusted during training.
Your Task
Use the StepLR or MultiStepLR learning rate scheduler to lower the learning rate during training and raise validation accuracy above 85% while keeping training accuracy below 92%.
You must keep the same model architecture and optimizer.
You can only change the learning rate scheduler and its parameters.
Training epochs should remain 30.
Solution
PyTorch
import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim.lr_scheduler import StepLR, MultiStepLR
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

# Simple model
class SimpleNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Flatten(),
            nn.Linear(28*28, 128),
            nn.ReLU(),
            nn.Linear(128, 10)
        )
    def forward(self, x):
        return self.fc(x)

# Data
transform = transforms.ToTensor()
train_dataset = datasets.MNIST('.', train=True, download=True, transform=transform)
val_dataset = datasets.MNIST('.', train=False, download=True, transform=transform)
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=1000)

# Model, optimizer, loss
model = SimpleNet()
optimizer = optim.SGD(model.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()

# Scheduler: Choose one
#scheduler = StepLR(optimizer, step_size=10, gamma=0.1)
scheduler = MultiStepLR(optimizer, milestones=[10,20], gamma=0.1)

# Training loop
for epoch in range(30):
    model.train()
    for data, target in train_loader:
        optimizer.zero_grad()
        output = model(data)
        loss = criterion(output, target)
        loss.backward()
        optimizer.step()
    scheduler.step()  # update the learning rate once per epoch, after the optimizer steps

    # Evaluation: accuracy on the training set
    model.eval()
    correct_train = 0
    total_train = 0
    with torch.no_grad():
        for data, target in train_loader:
            output = model(data)
            pred = output.argmax(dim=1)
            correct_train += (pred == target).sum().item()
            total_train += target.size(0)
    train_acc = 100 * correct_train / total_train

    # Evaluation: accuracy on the validation set
    correct_val = 0
    total_val = 0
    with torch.no_grad():
        for data, target in val_loader:
            output = model(data)
            pred = output.argmax(dim=1)
            correct_val += (pred == target).sum().item()
            total_val += target.size(0)
    val_acc = 100 * correct_val / total_val

    print(f'Epoch {epoch+1}: Train Acc: {train_acc:.2f}%, Val Acc: {val_acc:.2f}%, LR: {optimizer.param_groups[0]["lr"]:.4f}')
Added a learning rate scheduler (StepLR or MultiStepLR) to lower the learning rate during training.
StepLR multiplies the learning rate by gamma=0.1 every 10 epochs.
MultiStepLR multiplies the learning rate by gamma=0.1 at epochs 10 and 20.
Kept the model and optimizer unchanged; only added the scheduler and a scheduler.step() call at the end of each epoch.
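
With step_size=10 and milestones=[10, 20], the two schedulers should produce the same learning rates over a 30-epoch run. The sketch below checks this without training anything. It assumes a single dummy parameter in place of the model, and lr_trace is a hypothetical helper name, not part of PyTorch.

```python
import torch
from torch.optim.lr_scheduler import StepLR, MultiStepLR

def lr_trace(make_scheduler, epochs=30, base_lr=0.1):
    """Return the learning rate in effect during each epoch."""
    # Dummy parameter standing in for the model (assumption for illustration).
    param = torch.nn.Parameter(torch.zeros(1))
    param.grad = torch.zeros_like(param)  # zero gradient: optimizer.step() changes nothing
    optimizer = torch.optim.SGD([param], lr=base_lr)
    scheduler = make_scheduler(optimizer)
    trace = []
    for _ in range(epochs):
        trace.append(optimizer.param_groups[0]["lr"])
        optimizer.step()   # keeps the optimizer-then-scheduler call order
        scheduler.step()
    return trace

step_trace = lr_trace(lambda opt: StepLR(opt, step_size=10, gamma=0.1))
multi_trace = lr_trace(lambda opt: MultiStepLR(opt, milestones=[10, 20], gamma=0.1))

# Print the rate at the start of each 10-epoch block for both schedulers.
for epoch in (0, 10, 20):
    print(f"epoch {epoch}: StepLR {step_trace[epoch]:.4f}, MultiStepLR {multi_trace[epoch]:.4f}")
```

The schedules only differ once the milestones are unevenly spaced, which is the case MultiStepLR is meant for.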
Results Interpretation

Before: Training accuracy 95%, Validation accuracy 78%, Validation loss 0.45

After: Training accuracy 90%, Validation accuracy 87%, Validation loss 0.35

Lowering the learning rate with a scheduler such as StepLR or MultiStepLR makes the optimizer take smaller steps late in training. This can reduce overfitting, help the model converge more smoothly, and improve validation accuracy.
Bonus Experiment
Try using a cosine annealing learning rate scheduler instead of StepLR or MultiStepLR and observe the effect on validation accuracy.
💡 Hint
CosineAnnealingLR gradually reduces the learning rate following a cosine curve, which can help smooth convergence.
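
As a starting point for the bonus experiment, the sketch below traces the rates produced by CosineAnnealingLR over 30 epochs. It assumes T_max=30 (one half cosine cycle across the whole run), eta_min=0.0, and a dummy parameter in place of the model. These values are illustrative choices, not tuned settings.

```python
import torch
from torch.optim.lr_scheduler import CosineAnnealingLR

# Dummy parameter standing in for the model (assumption for illustration).
param = torch.nn.Parameter(torch.zeros(1))
param.grad = torch.zeros_like(param)  # zero gradient: optimizer.step() changes nothing
optimizer = torch.optim.SGD([param], lr=0.1)

# T_max is the number of scheduler steps needed to go from the initial rate to eta_min.
scheduler = CosineAnnealingLR(optimizer, T_max=30, eta_min=0.0)

cosine_trace = []
for epoch in range(30):
    cosine_trace.append(optimizer.param_groups[0]["lr"])
    optimizer.step()
    scheduler.step()

# The rate falls slowly at first, fastest around the midpoint, then slowly again.
for epoch in (0, 5, 15, 25, 29):
    print(f"epoch {epoch}: lr = {cosine_trace[epoch]:.5f}")
```

To use it in the solution above, replace the MultiStepLR line with the CosineAnnealingLR line and leave the training loop unchanged.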

Practice

1. What is the main difference between StepLR and MultiStepLR in PyTorch?
easy
A. StepLR decreases learning rate at fixed intervals; MultiStepLR decreases at specific epochs.
B. StepLR increases learning rate; MultiStepLR decreases learning rate.
C. StepLR changes learning rate randomly; MultiStepLR keeps it constant.
D. StepLR is used only for batch size adjustment; MultiStepLR for learning rate.

Solution

  1. Step 1: Understand StepLR behavior

    StepLR reduces the learning rate by a factor every fixed number of epochs (step size).
  2. Step 2: Understand MultiStepLR behavior

    MultiStepLR reduces the learning rate at specific epochs defined by a list of milestones.
  3. Final Answer:

    StepLR decreases learning rate at fixed intervals; MultiStepLR decreases at specific epochs. -> Option A
  4. Quick Check:

    StepLR fixed steps, MultiStepLR specific milestones [OK]
Hint: StepLR uses fixed steps; MultiStepLR uses milestone epochs [OK]
Common Mistakes:
  • Confusing increase vs decrease of learning rate
  • Thinking StepLR changes learning rate randomly
  • Mixing learning rate with batch size adjustments
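
The difference can also be written as two closed-form rules. The sketch below is plain Python, not PyTorch. It assumes the scheduler is stepped once per epoch with epochs counted from 0. step_lr and multi_step_lr are hypothetical helper names.

```python
from bisect import bisect_right

def step_lr(base_lr, epoch, step_size, gamma):
    """StepLR rule: one decay for every full step_size epochs completed."""
    return base_lr * gamma ** (epoch // step_size)

def multi_step_lr(base_lr, epoch, milestones, gamma):
    """MultiStepLR rule: one decay for every milestone already reached."""
    return base_lr * gamma ** bisect_right(sorted(milestones), epoch)

# At epoch 12, step_size=5 has triggered two decays (at epochs 5 and 10).
lr_fixed = step_lr(0.1, epoch=12, step_size=5, gamma=0.1)

# At epoch 12, only the milestone at 3 has been reached; the one at 30 has not.
lr_milestones = multi_step_lr(0.1, epoch=12, milestones=[3, 30], gamma=0.1)

print(f"StepLR rule: {lr_fixed:.4f}, MultiStepLR rule: {lr_milestones:.4f}")
```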
2. Which of the following is the correct way to create a StepLR scheduler in PyTorch that reduces learning rate every 5 epochs by a factor of 0.1?
easy
A. scheduler = StepLR(optimizer, step_size=5, gamma=0.1)
B. scheduler = StepLR(optimizer, milestones=[5], gamma=0.1)
C. scheduler = MultiStepLR(optimizer, step_size=5, gamma=0.1)
D. scheduler = MultiStepLR(optimizer, milestones=[5], gamma=0.1)

Solution

  1. Step 1: Recall StepLR parameters

    StepLR takes step_size (int) and gamma (decay factor).
  2. Step 2: Identify correct syntax

    scheduler = StepLR(optimizer, step_size=5, gamma=0.1) uses step_size=5 and gamma=0.1, which matches the requirement.
  3. Final Answer:

    scheduler = StepLR(optimizer, step_size=5, gamma=0.1) -> Option A
  4. Quick Check:

    StepLR uses step_size, not milestones [OK]
Hint: StepLR uses step_size, MultiStepLR uses milestones list [OK]
Common Mistakes:
  • Using milestones parameter with StepLR
  • Confusing MultiStepLR and StepLR syntax
  • Passing step_size as a list
3. Given the following code, what will be the learning rate after epoch 7?
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = MultiStepLR(optimizer, milestones=[3, 6], gamma=0.1)
for epoch in range(8):
    scheduler.step()
    print(f"Epoch {epoch}: lr = {optimizer.param_groups[0]['lr']}")
medium
A. 0.01
B. 0.001
C. 0.1
D. 0.0001

Solution

  1. Step 1: Understand milestones and gamma

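    The learning rate is multiplied by gamma=0.1 when the scheduler's epoch counter reaches 3 and again when it reaches 6.
  2. Step 2: Calculate learning rate at epoch 7

    Initial lr=0.1; at milestone 3: 0.1*0.1=0.01; at milestone 6: 0.01*0.1=0.001. No milestone remains after that, so the last line printed (Epoch 7) shows lr=0.001, up to floating-point rounding.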
  3. Final Answer:

    0.001 -> Option B
  4. Quick Check:

    Two milestones reduce lr twice: 0.1 -> 0.01 -> 0.001 [OK]
Hint: Multiply lr by gamma at each milestone passed [OK]
Common Mistakes:
  • Forgetting to apply gamma at both milestones
  • Assuming lr changes before first milestone
  • Confusing StepLR with MultiStepLR behavior
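
The loop from this question can be run with a dummy parameter to confirm the result. In the sketch below, the parameter and the lrs list are assumptions added for illustration. An optimizer.step() call is also added before scheduler.step() to avoid PyTorch's call-order warning; it does not change the learning rates.

```python
import torch
from torch.optim.lr_scheduler import MultiStepLR

# Dummy parameter standing in for the model (assumption for illustration).
param = torch.nn.Parameter(torch.zeros(1))
param.grad = torch.zeros_like(param)  # zero gradient: optimizer.step() changes nothing
optimizer = torch.optim.SGD([param], lr=0.1)
scheduler = MultiStepLR(optimizer, milestones=[3, 6], gamma=0.1)

lrs = []
for epoch in range(8):
    optimizer.step()
    scheduler.step()
    lrs.append(optimizer.param_groups[0]["lr"])
    print(f"Epoch {epoch}: lr = {lrs[-1]:.4f}")
```

Because scheduler.step() runs before the print, the first drop appears on the line labelled Epoch 2 (the third call) and the second on the line labelled Epoch 5 (the sixth call). The last line still shows 0.001.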
4. Identify the error in this code snippet using StepLR:
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
scheduler = StepLR(optimizer, milestones=[10, 20], gamma=0.5)
for epoch in range(25):
    scheduler.step()
    print(optimizer.param_groups[0]['lr'])
medium
A. scheduler.step() must be called after optimizer.step() inside loop.
B. Optimizer Adam cannot be used with StepLR scheduler.
C. StepLR does not accept milestones parameter; use step_size instead.
D. Gamma value must be greater than 1 for StepLR.

Solution

  1. Step 1: Check StepLR parameters

    StepLR expects step_size, not milestones.
  2. Step 2: Identify misuse of milestones

    Passing milestones raises a TypeError because StepLR has no such parameter; a valid call is, for example, StepLR(optimizer, step_size=10, gamma=0.5).
  3. Final Answer:

    StepLR does not accept milestones parameter; use step_size instead. -> Option C
  4. Quick Check:

    StepLR uses step_size, not milestones [OK]
Hint: StepLR uses step_size, not milestones list [OK]
Common Mistakes:
  • Using milestones with StepLR
  • Thinking Adam optimizer is incompatible
  • Misunderstanding gamma parameter range
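
The sketch below reproduces the error and then runs a corrected version. It assumes a dummy parameter in place of the model, and step_size=10 is just one valid choice, not a value given in the question.

```python
import torch
from torch.optim.lr_scheduler import StepLR

# Dummy parameter standing in for the model (assumption for illustration).
param = torch.nn.Parameter(torch.zeros(1))
param.grad = torch.zeros_like(param)  # zero gradient: optimizer.step() changes nothing
optimizer = torch.optim.Adam([param], lr=0.01)

# The call from the question: StepLR has no 'milestones' parameter.
try:
    StepLR(optimizer, milestones=[10, 20], gamma=0.5)
    error_name = None
except TypeError as exc:
    error_name = type(exc).__name__
    print(f"{error_name}: {exc}")

# Corrected call: StepLR takes an integer step_size. Adam works with it as usual.
scheduler = StepLR(optimizer, step_size=10, gamma=0.5)
for epoch in range(25):
    optimizer.step()
    scheduler.step()

final_lr = optimizer.param_groups[0]["lr"]
print(f"lr after 25 epochs: {final_lr:.4f}")  # halved at epochs 10 and 20
```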
5. You want to train a model for 30 epochs. You want the learning rate to drop by 0.1 at epochs 10 and 20, and then again every 5 epochs after epoch 20. Which scheduler setup correctly achieves this?
hard
A. Use StepLR with step_size=10 and gamma=0.1
B. Use StepLR with step_size=5 and gamma=0.1
C. Use MultiStepLR with milestones=[10, 20, 25, 30] and gamma=0.1
D. Use MultiStepLR with milestones=[10, 20] and gamma=0.1, then StepLR with step_size=5 after epoch 20

Solution

  1. Step 1: Understand the requirement

    The learning rate drops at epochs 10 and 20, then every 5 epochs after 20. In a 30-epoch run that means drops at epochs 10, 20, 25, and 30.
  2. Step 2: Analyze scheduler options

    StepLR can only drop at one fixed interval counted from the start of training. MultiStepLR accepts any increasing list of epochs, so an irregular schedule can be written out as milestones.
  3. Step 3: Evaluate options

    Use StepLR with step_size=10 and gamma=0.1 drops only every 10 epochs and misses epoch 25; Use StepLR with step_size=5 and gamma=0.1 drops every 5 epochs from the start, adding unwanted drops at epochs 5 and 15; Use MultiStepLR with milestones=[10, 20] and gamma=0.1, then StepLR with step_size=5 after epoch 20 is not a single scheduler setup, because a StepLR counts from the point it is created and would have to be swapped in manually at epoch 20; Use MultiStepLR with milestones=[10, 20, 25, 30] and gamma=0.1 lists exactly the required epochs.
  4. Final Answer:

    Use MultiStepLR with milestones=[10, 20, 25, 30] and gamma=0.1 -> Option C
  5. Quick Check:

    Milestones 10, 20, 25, 30 cover every required drop in 30 epochs [OK]
Hint: Write an irregular schedule out as MultiStepLR milestones [OK]
Common Mistakes:
  • Using StepLR for a schedule with uneven gaps
  • Misplacing milestones or step_size values
  • Assuming two schedulers on one optimizer take turns automatically
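
One way to check a candidate setup for this question is to record the epochs at which the learning rate actually changes. The sketch below does this for MultiStepLR with milestones=[10, 20, 25, 30]. It assumes a dummy parameter in place of the model and a starting rate of 0.1, which the question does not specify.

```python
import torch
from torch.optim.lr_scheduler import MultiStepLR

# Dummy parameter standing in for the model (assumption for illustration).
param = torch.nn.Parameter(torch.zeros(1))
param.grad = torch.zeros_like(param)  # zero gradient: optimizer.step() changes nothing
optimizer = torch.optim.SGD([param], lr=0.1)  # starting rate is an assumed value
scheduler = MultiStepLR(optimizer, milestones=[10, 20, 25, 30], gamma=0.1)

drop_epochs = []
for epoch in range(30):
    lr_before = optimizer.param_groups[0]["lr"]
    optimizer.step()
    scheduler.step()
    lr_after = optimizer.param_groups[0]["lr"]
    if lr_after < lr_before:
        # The scheduler's epoch counter is now epoch + 1, where the new rate takes effect.
        drop_epochs.append(epoch + 1)

final_lr = optimizer.param_groups[0]["lr"]
print("drops at scheduler epochs:", drop_epochs)
print(f"final lr: {final_lr:.6f}")
```

The same loop works for any other scheduler: change only the line that constructs it.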