How to Use CosineAnnealingLR in PyTorch for Learning Rate Scheduling
Use torch.optim.lr_scheduler.CosineAnnealingLR by passing your optimizer, the total number of iterations for one cycle (T_max), and optionally the minimum learning rate (eta_min). It adjusts the learning rate along a cosine curve, helping your model converge smoothly.
Syntax
The CosineAnnealingLR scheduler requires these main parameters:
- optimizer: The optimizer whose learning rate you want to schedule.
- T_max: The number of iterations (usually epochs) for one cosine cycle.
- eta_min (optional): The minimum learning rate value after annealing, default is 0.
- last_epoch (optional): The index of last epoch, default is -1 to start fresh.
This scheduler updates the learning rate each epoch or step to follow a cosine decay from the initial learning rate down to eta_min.
python
torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max, eta_min=0, last_epoch=-1)
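Under the hood, the schedule follows the closed-form cosine curve lr(t) = eta_min + (base_lr - eta_min) * (1 + cos(pi * t / T_max)) / 2. A minimal sketch of that formula in plain Python (no optimizer involved; cosine_lr is a helper name chosen here for illustration):

```python
import math

def cosine_lr(base_lr, eta_min, T_max, t):
    """Closed-form cosine-annealed learning rate at step t."""
    return eta_min + (base_lr - eta_min) * (1 + math.cos(math.pi * t / T_max)) / 2

print(f"t=0:  {cosine_lr(0.1, 0.01, 10, 0):.5f}")   # 0.10000 (the base LR)
print(f"t=5:  {cosine_lr(0.1, 0.01, 10, 5):.5f}")   # 0.05500 (midpoint)
print(f"t=10: {cosine_lr(0.1, 0.01, 10, 10):.5f}")  # 0.01000 (eta_min)
```

Note how the curve is flat near the start and end of the cycle and steepest in the middle, which is what distinguishes it from a linear decay.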
Example
This example shows how to use CosineAnnealingLR with a simple optimizer and print the learning rate for each epoch over 10 epochs.
python
import torch
import torch.nn as nn
import torch.optim as optim

# Dummy model
model = nn.Linear(10, 2)

# Optimizer with initial lr 0.1
optimizer = optim.SGD(model.parameters(), lr=0.1)

# CosineAnnealingLR scheduler for 10 epochs
scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=10, eta_min=0.01)

for epoch in range(10):
    # Training step would go here
    optimizer.step()

    # Print current learning rate
    lr = optimizer.param_groups[0]['lr']
    print(f"Epoch {epoch+1}: Learning Rate = {lr:.5f}")

    # Step the scheduler
    scheduler.step()
Output
Epoch 1: Learning Rate = 0.10000
Epoch 2: Learning Rate = 0.09780
Epoch 3: Learning Rate = 0.09141
Epoch 4: Learning Rate = 0.08145
Epoch 5: Learning Rate = 0.06891
Epoch 6: Learning Rate = 0.05500
Epoch 7: Learning Rate = 0.04109
Epoch 8: Learning Rate = 0.02855
Epoch 9: Learning Rate = 0.01859
Epoch 10: Learning Rate = 0.01220
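One detail worth knowing: the schedule is periodic with period 2 * T_max, so if training continues past T_max the learning rate climbs back toward the base value instead of staying at eta_min. A sketch that deliberately runs twice as long as T_max (the dummy model and loop mirror the example above):

```python
import torch
import torch.optim as optim

model = torch.nn.Linear(10, 2)
optimizer = optim.SGD(model.parameters(), lr=0.1)
scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=10, eta_min=0.01)

lrs = []
for epoch in range(20):  # deliberately run past T_max
    optimizer.step()
    lrs.append(optimizer.param_groups[0]['lr'])
    scheduler.step()

print(f"LR at epoch 11: {lrs[10]:.5f}")  # at the bottom of the cycle, near eta_min
print(f"LR at epoch 20: {lrs[19]:.5f}")  # climbing back toward the base rate
```

If you want the rate to stay at the floor, stop stepping the scheduler after T_max epochs, or use a scheduler designed for restarts.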
Common Pitfalls
- Not calling scheduler.step() each epoch: The learning rate won't update without this call.
- Confusing optimizer.step() and scheduler.step() order: Always call optimizer.step() before scheduler.step() in each epoch.
- Setting T_max incorrectly: It should match the number of epochs or iterations in one cosine cycle; otherwise, the schedule won't behave as expected.
- Ignoring eta_min: If you want a minimum learning rate above zero, set eta_min explicitly.
python
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(10, 2)

# Wrong order example
optimizer = optim.SGD(model.parameters(), lr=0.1)
scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=10)

for epoch in range(10):
    scheduler.step()  # Wrong: scheduler.step() before optimizer.step()
    optimizer.step()
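A related pitfall when resuming training: rather than computing last_epoch by hand, you can checkpoint and restore the scheduler's state alongside the optimizer's. A sketch assuming a checkpoint dict of your own design (the key names here are arbitrary):

```python
import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(10, 2)
optimizer = optim.SGD(model.parameters(), lr=0.1)
scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=10, eta_min=0.01)

# Train for a few epochs, then checkpoint
for epoch in range(3):
    optimizer.step()
    scheduler.step()

checkpoint = {
    'model': model.state_dict(),
    'optimizer': optimizer.state_dict(),
    'scheduler': scheduler.state_dict(),  # remembers where in the cosine cycle we are
}

# Later: rebuild the objects and restore their state
model2 = nn.Linear(10, 2)
optimizer2 = optim.SGD(model2.parameters(), lr=0.1)
scheduler2 = optim.lr_scheduler.CosineAnnealingLR(optimizer2, T_max=10, eta_min=0.01)
model2.load_state_dict(checkpoint['model'])
optimizer2.load_state_dict(checkpoint['optimizer'])
scheduler2.load_state_dict(checkpoint['scheduler'])

print(scheduler2.get_last_lr())  # resumes at the original scheduler's current LR
```

In a real training script you would pass the checkpoint dict through torch.save and torch.load; it is kept in memory here to keep the sketch self-contained.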
Quick Reference
Summary tips for using CosineAnnealingLR:
- Initialize with your optimizer and total epochs (T_max).
- Call optimizer.step() before scheduler.step() each epoch.
- Set eta_min if you want a minimum learning rate above zero.
- Use this scheduler to smoothly reduce the learning rate and potentially improve training stability.
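Putting these tips together, a minimal end-to-end loop might look like the following. The random tensors and cross-entropy loss are stand-ins for your own data pipeline:

```python
import torch
import torch.nn as nn
import torch.optim as optim

# Dummy data standing in for a real dataset
x = torch.randn(32, 10)
y = torch.randint(0, 2, (32,))

model = nn.Linear(10, 2)
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)
scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=10, eta_min=0.01)

for epoch in range(10):
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()   # update weights first...
    scheduler.step()   # ...then advance the schedule

# After T_max steps the LR has annealed down to (approximately) eta_min
print(f"final LR: {optimizer.param_groups[0]['lr']:.5f}")
```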
Key Takeaways
- Use CosineAnnealingLR to smoothly decrease the learning rate following a cosine curve.
- Always call optimizer.step() before scheduler.step() each epoch.
- Set T_max to the number of epochs for one full cosine cycle.
- Optionally set eta_min to define the minimum learning rate.
- Not calling scheduler.step() each epoch prevents learning rate updates.