PyTorch · How-To · Beginner · 4 min read

How to Use CosineAnnealingLR in PyTorch for Learning Rate Scheduling

Use torch.optim.lr_scheduler.CosineAnnealingLR by passing your optimizer, the total number of iterations (T_max), and optionally the minimum learning rate (eta_min). It adjusts the learning rate following a cosine curve, helping your model converge smoothly.

📐 Syntax

The CosineAnnealingLR scheduler requires these main parameters:

  • optimizer: The optimizer whose learning rate you want to schedule.
  • T_max: The number of iterations (usually epochs) over which the learning rate anneals from its initial value down to eta_min. This is half of a full cosine period, so the rate starts rising again if you keep stepping past T_max.
  • eta_min (optional): The minimum learning rate value after annealing, default is 0.
  • last_epoch (optional): The index of last epoch, default is -1 to start fresh.

This scheduler updates the learning rate each epoch or step to follow a cosine decay from the initial learning rate down to eta_min.

python
torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max, eta_min=0, last_epoch=-1)
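The schedule follows the closed-form cosine formula given in the PyTorch docs: eta_t = eta_min + (eta_initial - eta_min) * (1 + cos(pi * t / T_max)) / 2. A minimal pure-Python sketch (the helper name cosine_annealing_lr is just for illustration):

```python
import math

def cosine_annealing_lr(base_lr, t, T_max, eta_min=0.0):
    # Closed-form cosine annealing value at step t:
    # starts at base_lr (t=0), reaches eta_min at t=T_max
    return eta_min + (base_lr - eta_min) * (1 + math.cos(math.pi * t / T_max)) / 2

# With base_lr=0.1, T_max=10, eta_min=0.01:
print(round(cosine_annealing_lr(0.1, 0, 10, 0.01), 5))   # 0.1 at the start
print(round(cosine_annealing_lr(0.1, 5, 10, 0.01), 5))   # 0.055 at the midpoint
print(round(cosine_annealing_lr(0.1, 10, 10, 0.01), 5))  # 0.01 at T_max
```

Because the curve is flat near t=0 and t=T_max, the learning rate changes slowly at the start and end of the cycle and fastest in the middle.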

💻 Example

This example shows how to use CosineAnnealingLR with a simple optimizer and print the learning rate for each epoch over 10 epochs.

python
import torch
import torch.nn as nn
import torch.optim as optim

# Dummy model
model = nn.Linear(10, 2)

# Optimizer with initial lr 0.1
optimizer = optim.SGD(model.parameters(), lr=0.1)

# CosineAnnealingLR scheduler for 10 epochs
scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=10, eta_min=0.01)

for epoch in range(10):
    # Training step would go here
    optimizer.step()
    # Print current learning rate
    lr = optimizer.param_groups[0]['lr']
    print(f"Epoch {epoch+1}: Learning Rate = {lr:.5f}")
    # Step the scheduler
    scheduler.step()
Output
Epoch 1: Learning Rate = 0.10000
Epoch 2: Learning Rate = 0.09780
Epoch 3: Learning Rate = 0.09141
Epoch 4: Learning Rate = 0.08145
Epoch 5: Learning Rate = 0.06891
Epoch 6: Learning Rate = 0.05500
Epoch 7: Learning Rate = 0.04109
Epoch 8: Learning Rate = 0.02855
Epoch 9: Learning Rate = 0.01859
Epoch 10: Learning Rate = 0.01220
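The example above steps the scheduler once per epoch. You can also call scheduler.step() after every batch; in that case set T_max to the total number of batches in the run. A sketch of this pattern (num_epochs and batches_per_epoch are illustrative placeholders, and the backward pass is omitted):

```python
import torch.nn as nn
import torch.optim as optim
from torch.optim.lr_scheduler import CosineAnnealingLR

model = nn.Linear(10, 2)
optimizer = optim.SGD(model.parameters(), lr=0.1)

num_epochs = 5           # illustrative values
batches_per_epoch = 20
# Spread one cosine decay over every batch of the whole run
scheduler = CosineAnnealingLR(optimizer, T_max=num_epochs * batches_per_epoch)

for epoch in range(num_epochs):
    for batch in range(batches_per_epoch):
        optimizer.step()   # loss.backward() would precede this
        scheduler.step()   # per-batch update

# After T_max steps the rate has annealed to eta_min (0 by default)
print(optimizer.param_groups[0]['lr'])
```

Per-batch stepping gives a smoother curve, which matters most when epochs are few and batches are many.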

⚠️ Common Pitfalls

  • Not calling scheduler.step() each epoch: The learning rate won't update without this call.
  • Confusing optimizer.step() and scheduler.step() order: Always call optimizer.step() before scheduler.step() in each epoch.
  • Setting T_max incorrectly: It should match the number of epochs (or steps, if you step per batch) you want the decay to span; if training continues past T_max, the learning rate starts rising again.
  • Ignoring eta_min: If you want a minimum learning rate above zero, set eta_min explicitly.
python
import torch.optim as optim

# Wrong order example
optimizer = optim.SGD(model.parameters(), lr=0.1)
scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=10)

for epoch in range(10):
    scheduler.step()  # Wrong: scheduler.step() before optimizer.step()
    optimizer.step()
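The T_max pitfall is easy to see from the closed-form schedule: past T_max the cosine turns around and the learning rate climbs back toward its initial value. A pure-Python sketch (the helper lr_at is just for illustration) of training 10 epochs with T_max mistakenly set to 5:

```python
import math

def lr_at(t, base_lr=0.1, T_max=5, eta_min=0.0):
    # Closed-form cosine annealing value at step t
    return eta_min + (base_lr - eta_min) * (1 + math.cos(math.pi * t / T_max)) / 2

# T_max=5 but 10 epochs of training: the rate bottoms out at
# epoch 5 and then rises back toward 0.1 instead of staying low.
lrs = [round(lr_at(t), 5) for t in range(10)]
print(lrs)
```

If you actually want this cyclic behavior, use CosineAnnealingWarmRestarts instead of an undersized T_max.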

📊 Quick Reference

Summary tips for using CosineAnnealingLR:

  • Initialize with your optimizer and total epochs (T_max).
  • Call optimizer.step() before scheduler.step() each epoch.
  • Set eta_min if you want a minimum learning rate above zero.
  • Use this scheduler to smoothly reduce learning rate and potentially improve training stability.

Key Takeaways

  • Use CosineAnnealingLR to smoothly decrease the learning rate along a cosine curve.
  • Always call optimizer.step() before scheduler.step() each epoch.
  • Set T_max to the number of epochs (or steps) over which the rate should anneal down to eta_min.
  • Optionally set eta_min to define the minimum learning rate.
  • Without a scheduler.step() call each epoch, the learning rate never updates.