In training neural networks, why do we use learning rate schedulers?
Think about how changing the step size affects learning progress.
Learning rate schedulers adjust the learning rate during training to help the model converge better and avoid jumping over minima.
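A minimal sketch of the idea (the dummy parameter and gamma value are illustrative), using ExponentialLR, which multiplies the learning rate by a constant factor after every epoch:

```python
import torch
import torch.optim as optim

# A single dummy parameter so the optimizer has something to manage.
param = torch.nn.Parameter(torch.zeros(1))
optimizer = optim.SGD([param], lr=0.1)
# ExponentialLR multiplies the lr by gamma after each scheduler.step().
scheduler = optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.9)

lrs = []
for epoch in range(5):
    optimizer.step()
    scheduler.step()
    lrs.append(optimizer.param_groups[0]['lr'])

print(lrs)  # steadily decreasing: 0.09, 0.081, ...
```

Each epoch the step size shrinks by 10%, so early training takes large steps and later training takes small, careful ones.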
Given the PyTorch code below, what is the learning rate printed after 3 epochs?
import torch
import torch.optim as optim

model_params = [torch.nn.Parameter(torch.randn(2, 2, requires_grad=True))]
optimizer = optim.SGD(model_params, lr=0.1)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=2, gamma=0.5)
for epoch in range(3):
    optimizer.step()
    scheduler.step()
lr = optimizer.param_groups[0]['lr']
print(f"Learning rate after 3 epochs: {lr}")
StepLR multiplies the learning rate by gamma every step_size epochs.
StepLR multiplies the lr by gamma=0.5 every 2 epochs. After 2 epochs lr = 0.1 × 0.5 = 0.05; after epoch 3 it is still 0.05, since the next decay would only occur at epoch 4.
You want the learning rate to cyclically increase and decrease during training to help escape local minima. Which PyTorch scheduler should you choose?
Look for a scheduler that explicitly cycles the learning rate.
CyclicLR cycles the learning rate between bounds, increasing and decreasing it during training.
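A minimal sketch of CyclicLR in triangular mode (the bounds and step size are illustrative; note that the default cycle_momentum=True requires a momentum-based optimizer, so plain SGD here disables it):

```python
import torch
import torch.optim as optim

param = torch.nn.Parameter(torch.zeros(1))
optimizer = optim.SGD([param], lr=0.01)
# Cycle the lr between base_lr and max_lr: 4 steps up, then 4 steps down.
scheduler = optim.lr_scheduler.CyclicLR(
    optimizer, base_lr=0.001, max_lr=0.01,
    step_size_up=4, mode='triangular', cycle_momentum=False)

lrs = []
for _ in range(8):  # one full cycle
    optimizer.step()
    scheduler.step()
    lrs.append(optimizer.param_groups[0]['lr'])

print(lrs)  # rises to 0.01 at step 4, then falls back toward 0.001
```

The periodic rise to max_lr is what lets the optimizer occasionally take large steps and jump out of sharp local minima.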
During training, you apply a learning rate scheduler that reduces the learning rate when validation loss plateaus. What effect do you expect on the training loss curve?
Reducing learning rate on plateau helps fine-tune the model.
Lowering the learning rate when progress stalls helps the model settle into a better minimum, producing a smoother, continued decrease in training loss.
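A minimal sketch of this behavior with ReduceLROnPlateau (the simulated loss values, factor, and patience are illustrative):

```python
import torch
import torch.optim as optim

param = torch.nn.Parameter(torch.zeros(1))
optimizer = optim.SGD([param], lr=0.1)
# Halve the lr once the metric passed to step() fails to improve
# for more than `patience` consecutive epochs.
scheduler = optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode='min', factor=0.5, patience=2)

# Simulated validation losses: improving, then plateauing.
val_losses = [1.0, 0.8, 0.6, 0.6, 0.6, 0.6]
for loss in val_losses:
    optimizer.step()
    scheduler.step(loss)

print(optimizer.param_groups[0]['lr'])  # 0.05 after the plateau triggers a cut
```

Unlike StepLR or CosineAnnealingLR, this scheduler reacts to the metric itself, so the lr only drops when training has actually stalled.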
Consider this PyTorch code snippet:
import torch
import torch.optim as optim
params = [torch.nn.Parameter(torch.randn(1, requires_grad=True))]
optimizer = optim.SGD(params, lr=0.1)
scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=5)
for epoch in range(10):
    optimizer.step()
    scheduler.step()
    print(f"Epoch {epoch+1}: lr = {optimizer.param_groups[0]['lr']}")

The learning rate reaches its minimum after 5 epochs and then climbs back up, rather than decreasing smoothly over all 10 epochs as intended. What is the cause?
Check the meaning of T_max parameter in CosineAnnealingLR.
T_max is the half-period of the cosine: the learning rate anneals from its initial value down to eta_min over T_max epochs, then rises back along the same cosine. With T_max=5 the schedule bottoms out at epoch 5 and increases again; set T_max=10 for a single smooth decay over 10 epochs (or use CosineAnnealingWarmRestarts if periodic restarts are actually wanted).
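A corrected sketch, with T_max matched to the number of training epochs so the lr follows one half-cosine from 0.1 down toward eta_min (default 0):

```python
import torch
import torch.optim as optim

param = torch.nn.Parameter(torch.zeros(1))
optimizer = optim.SGD([param], lr=0.1)
# T_max now equals the total number of epochs: one smooth decay, no rebound.
scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=10)

lrs = []
for epoch in range(10):
    optimizer.step()
    scheduler.step()
    lrs.append(optimizer.param_groups[0]['lr'])

print(lrs)  # strictly decreasing, ending near 0
```

The decay is slow at the start and end of training and fastest in the middle, which is the characteristic shape of cosine annealing.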