CosineAnnealingLR in PyTorch

Introduction

CosineAnnealingLR lowers the learning rate smoothly along a cosine curve. It starts at the optimizer's initial learning rate and reaches a minimum value (eta_min) after T_max scheduler steps. The decrease is gentle at the beginning and end and faster in the middle, so training avoids the abrupt jumps of step-based schedules.

When you want the learning rate to decrease gradually during training instead of staying fixed.
When you want to avoid sudden drops in learning rate (as in step decay) that can destabilize training.
When you want to improve training stability by adjusting the learning rate along a smooth curve.
When you want to try a schedule that may help the model escape poor local minima.
When you want a cyclical schedule: if training continues past T_max steps, the learning rate rises smoothly back toward its initial value instead of staying at eta_min.
Syntax
PyTorch
torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max, eta_min=0, last_epoch=-1, verbose=False)

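optimizer: The optimizer whose learning rate you want to adjust.

T_max: Maximum number of iterations, i.e. the number of scheduler.step() calls (epochs or batches, depending on where you call it) over which the learning rate falls from its initial value to eta_min. This is half of a full cosine period.

eta_min: Minimum learning rate, reached after T_max steps. Default: 0.

last_epoch: Index of the last step, used when resuming a schedule. The default -1 starts the schedule from the optimizer's current learning rate.

verbose: Prints a message on each update. This argument is deprecated in recent PyTorch releases; print scheduler.get_last_lr() instead.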
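The schedule has a closed form. The PyTorch documentation gives it for the case where only this scheduler changes the learning rate: eta_t = eta_min + 0.5 * (eta_max - eta_min) * (1 + cos(pi * t / T_max)), where eta_max is the initial learning rate. The sketch below evaluates that formula in plain Python; cosine_annealing_lr is a hypothetical helper name, not a PyTorch function.

```python
import math

def cosine_annealing_lr(t, base_lr, T_max, eta_min=0.0):
    """Hypothetical helper: learning rate after t calls to scheduler.step().

    For t > T_max the cosine simply keeps going, so the value rises again.
    """
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * t / T_max))

# Trace one annealing phase with base_lr=0.1, T_max=10, eta_min=0
for t in range(11):
    print(t, round(cosine_annealing_lr(t, base_lr=0.1, T_max=10), 5))
```

At t=0 the value equals the initial learning rate, at t=T_max it equals eta_min, and at t=2*T_max it is back at the initial learning rate.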

Examples
The learning rate decreases from its initial value to 0 (the default eta_min) along a cosine curve over 10 steps.
PyTorch
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=10)
The learning rate decreases along a cosine curve to a minimum of 0.001 over 20 steps.
PyTorch
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=20, eta_min=0.001)
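To check the second example end to end, here is a minimal sketch that builds an optimizer around a single dummy parameter (an assumption made only so the snippet is self-contained) and records the learning rate after each of the 20 steps with scheduler.get_last_lr().

```python
import torch

# A single dummy parameter is enough to build an optimizer for this demo.
param = torch.nn.Parameter(torch.zeros(1))
optimizer = torch.optim.SGD([param], lr=0.1)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=20, eta_min=0.001)

lrs = []
for step in range(20):
    # A real loop would compute a loss and call backward() here.
    optimizer.step()   # no gradients yet, so SGD leaves the parameter unchanged
    scheduler.step()   # advance the cosine schedule by one step
    lrs.append(scheduler.get_last_lr()[0])

print(f"after  1 step : {lrs[0]:.5f}")
print(f"after 20 steps: {lrs[-1]:.5f}")
```

The recorded values decrease at every step and end at eta_min after step 20.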
Sample Model

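This code shows how the learning rate changes over 10 steps using CosineAnnealingLR with T_max=5 and eta_min=0.01. The learning rate starts at 0.1, decreases smoothly to 0.01 over the first 5 steps, and then, because the cosine continues, rises smoothly back to 0.1 over the next 5 steps.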

PyTorch
import torch
import torch.nn as nn
import torch.optim as optim

# Simple model
model = nn.Linear(2, 1)

# Optimizer
optimizer = optim.SGD(model.parameters(), lr=0.1)

# Scheduler with T_max=5 steps
scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=5, eta_min=0.01)

print('Step | Learning Rate')
for step in range(10):
    # Dummy training step
    optimizer.zero_grad()
    dummy_input = torch.tensor([[1.0, 2.0]])
    output = model(dummy_input)
    loss = output.sum()
    loss.backward()
    optimizer.step()

    # Step the scheduler
    scheduler.step()

    # Print current learning rate
    lr = optimizer.param_groups[0]['lr']
    print(f'{step+1:4d} | {lr:.5f}')
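Output:
Step | Learning Rate
   1 | 0.09141
   2 | 0.06891
   3 | 0.04109
   4 | 0.01859
   5 | 0.01000
   6 | 0.01859
   7 | 0.04109
   8 | 0.06891
   9 | 0.09141
  10 | 0.10000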
Important Notes

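The learning rate follows a cosine curve from the initial value down to eta_min over T_max steps.

After T_max steps, CosineAnnealingLR does not jump back to the initial value: the cosine continues, so the learning rate rises smoothly back to its initial value over the next T_max steps (a full period is 2 * T_max). For warm restarts that reset the learning rate abruptly, PyTorch provides a separate scheduler, CosineAnnealingWarmRestarts.

Call scheduler.step() after optimizer.step(). Call it once per epoch if T_max counts epochs, or once per batch if T_max counts batches.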
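When the scheduler is stepped once per batch rather than once per epoch, T_max has to be expressed in batches. The sketch below assumes hypothetical sizes (3 epochs of 4 batches) and a dummy batch reused on every iteration; it shows the order optimizer.step() then scheduler.step() inside the batch loop.

```python
import torch

# Hypothetical sizes, chosen small for illustration
num_epochs, batches_per_epoch = 3, 4

model = torch.nn.Linear(2, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
# The scheduler is stepped once per batch, so T_max is counted in batches
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=num_epochs * batches_per_epoch
)

inputs = torch.randn(8, 2)  # dummy batch reused every iteration
for epoch in range(num_epochs):
    for _ in range(batches_per_epoch):
        optimizer.zero_grad()
        loss = model(inputs).pow(2).mean()
        loss.backward()
        optimizer.step()   # update weights with the current learning rate
        scheduler.step()   # then move the schedule forward by one batch
    print(f"epoch {epoch + 1}: lr = {scheduler.get_last_lr()[0]:.5f}")
```

With T_max equal to the total number of batches, the learning rate reaches eta_min (0 by default) exactly at the end of training instead of after the first few epochs.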

Summary

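CosineAnnealingLR moves the learning rate smoothly along a cosine curve between its initial value and eta_min.

It helps training by avoiding sudden learning rate changes.

Use it by setting T_max, the number of steps to anneal from the initial learning rate down to the minimum, and optionally eta_min for the minimum learning rate.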

Practice

1. What is the main purpose of using CosineAnnealingLR in PyTorch training?
easy
A. To stop training early when accuracy is high
B. To increase the batch size during training
C. To smoothly adjust the learning rate in a wave-like pattern
D. To shuffle the training data every epoch

Solution

  1. Step 1: Understand the role of learning rate schedulers

    Learning rate schedulers adjust the learning rate during training to improve convergence.
  2. Step 2: Identify what CosineAnnealingLR does

    CosineAnnealingLR changes the learning rate smoothly following a cosine curve, avoiding sudden jumps.
  3. Final Answer:

    To smoothly adjust the learning rate in a wave-like pattern -> Option C
  4. Quick Check:

    CosineAnnealingLR = smooth wave learning rate [OK]
Hint: CosineAnnealingLR changes learning rate smoothly like a wave [OK]
Common Mistakes:
  • Thinking it changes batch size
  • Confusing it with early stopping
  • Assuming it shuffles data
2. Which of the following is the correct way to create a CosineAnnealingLR scheduler in PyTorch with a cycle length of 10 epochs and minimum learning rate 0.001?
easy
A. scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=10, eta_min=0.001)
B. scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, max_T=10, min_lr=0.001)
C. scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=10)
D. scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=10, min_lr=0.001)

Solution

  1. Step 1: Check the official PyTorch parameter names

    The correct parameters are T_max for cycle length and eta_min for minimum learning rate.
  2. Step 2: Match parameters with options

    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=10, eta_min=0.001) uses T_max=10 and eta_min=0.001, which is correct syntax.
  3. Final Answer:

    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=10, eta_min=0.001) -> Option A
  4. Quick Check:

    Use T_max and eta_min parameters [OK]
Hint: Use T_max and eta_min exactly as parameter names [OK]
Common Mistakes:
  • Using wrong parameter names like max_T or min_lr
  • Omitting eta_min when needed
  • Swapping parameter order incorrectly
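The parameter names can be checked directly, because Python raises a TypeError for keyword arguments a constructor does not accept. A minimal sketch, using a dummy parameter only so the optimizer can be built:

```python
import torch

optimizer = torch.optim.SGD([torch.nn.Parameter(torch.zeros(1))], lr=0.1)

# Option A: the documented keyword names
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=10, eta_min=0.001)
print(scheduler.T_max, scheduler.eta_min)

# Option B: misspelled keywords are rejected by Python's argument binding
wrong_names_rejected = False
try:
    torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, max_T=10, min_lr=0.001)
except TypeError as err:
    wrong_names_rejected = True
    print("TypeError:", err)
```

Option C would also construct without error, but it silently uses the default eta_min=0 instead of the requested 0.001.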
3. Given the code below, what will be the learning rate after 5 calls to scheduler.step() if initial lr is 0.1, T_max=10, and eta_min=0?
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=10, eta_min=0)
for _ in range(5):
    scheduler.step()
print(optimizer.param_groups[0]['lr'])
medium
A. 0.0
B. Approximately 0.0707
C. 0.1
D. 0.05

Solution

  1. Step 1: Understand CosineAnnealingLR formula

    Learning rate after t calls to step() is: eta_min + 0.5*(initial_lr - eta_min)*(1 + cos(pi * t / T_max))
  2. Step 2: Calculate learning rate at t=5

    lr = 0 + 0.5*0.1*(1 + cos(pi*5/10)) = 0.05*(1 + cos(pi/2)) = 0.05*(1 + 0) = 0.05 (the printed value may differ from 0.05 only by floating-point rounding).
  3. Final Answer:

    0.05 -> Option D
  4. Quick Check:

    Cosine formula at step 5 = 0.05 [OK]
Hint: Use cosine formula: lr = eta_min + 0.5*(lr0 - eta_min)*(1+cos(pi*t/T_max)) at t=5 = 0.05 [OK]
Common Mistakes:
  • Assuming lr stays constant
  • Confusing step count indexing
  • Ignoring eta_min in calculation
  • Miscalculating to ~0.0707
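The arithmetic in Step 2 can be reproduced in a few lines of plain Python using the closed-form formula from the solution; no PyTorch objects are involved.

```python
import math

lr0, eta_min, T_max, t = 0.1, 0.0, 10, 5

# Closed-form cosine annealing value after t calls to scheduler.step()
lr = eta_min + 0.5 * (lr0 - eta_min) * (1 + math.cos(math.pi * t / T_max))
print(lr)

# At the midpoint t = T_max / 2, cos(pi/2) = 0, so the LR is halfway between lr0 and eta_min
midpoint = (lr0 + eta_min) / 2
print(abs(lr - midpoint) < 1e-12)
```

The midpoint observation is a quick mental shortcut: halfway through T_max, the learning rate is always the average of the initial value and eta_min.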
4. Identify the error in the following code snippet using CosineAnnealingLR:
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=5)
for epoch in range(10):
    train()
    scheduler.step()
medium
A. scheduler.step() should be called before train()
B. No error, code is correct
C. T_max should be equal to total epochs (10) not 5
D. Learning rate should be set to 0.1 for Adam optimizer

Solution

  1. Step 1: Understand scheduler.step() timing

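    Standard PyTorch practice is to call scheduler.step() after the epoch's optimizer updates (here, after train()), so the new learning rate applies to the next epoch.
  2. Step 2: Verify the code

    The loop trains with the current learning rate and then steps the scheduler, which is the correct order. T_max=5 is not an error either: the learning rate reaches eta_min (0 by default) after 5 epochs and then, because the cosine continues, rises back to 0.01 by epoch 10. Whether that shape is desirable is a design choice, not a bug.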
  3. Final Answer:

    No error, code is correct -> Option B
  4. Quick Check:

    train() then scheduler.step() [OK]
Hint: Call scheduler.step() after train() [OK]
Common Mistakes:
  • Thinking step() goes before train()
  • Requiring T_max = total epochs
  • Assuming Adam requires a specific learning rate
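To see what T_max=5 does over 10 epochs in this question, a plain-Python sketch can evaluate the closed form with the question's values (initial lr 0.01, default eta_min 0). It assumes, as the PyTorch documentation does for this formula, that nothing else modifies the learning rate; lr_after is a hypothetical helper.

```python
import math

lr0, eta_min, T_max = 0.01, 0.0, 5   # values from the question (eta_min is the default 0)

def lr_after(epoch):
    # Closed form, assuming only the scheduler changes the learning rate
    return eta_min + 0.5 * (lr0 - eta_min) * (1 + math.cos(math.pi * epoch / T_max))

schedule = [lr_after(epoch) for epoch in range(1, 11)]
for epoch, lr in enumerate(schedule, start=1):
    print(f"after epoch {epoch:2d}: lr = {lr:.5f}")
```

The learning rate bottoms out at 0 after epoch 5 and climbs back to 0.01 after epoch 10, which is valid behaviour rather than an error.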
5. You want to train a model for 50 epochs using CosineAnnealingLR, split into 2 cosine cycles of equal length. How should you set T_max and why?
hard
A. Set T_max=25 so that each of the two cycles lasts 25 epochs
B. Set T_max=50 so that a single cycle spans all 50 epochs
C. Set T_max=100 so that only half a cycle fits into 50 epochs
D. Set T_max=10 so that five cycles fit into 50 epochs

Solution

  1. Step 1: Understand T_max meaning

    T_max is the number of epochs for one cycle: CosineAnnealingLR takes T_max epochs to go from the initial learning rate down to eta_min (half of a full cosine period).
  2. Step 2: Calculate T_max for 2 cycles in 50 epochs

    To fit 2 cycles into 50 epochs, each cycle should last 50 / 2 = 25 epochs, so T_max=25. With CosineAnnealingLR the learning rate decays over epochs 0 to 25 and, because the cosine continues, rises back to the initial value over epochs 25 to 50. If both cycles must be decays that restart from the initial learning rate, use CosineAnnealingWarmRestarts with T_0=25 instead.
  3. Final Answer:

    Set T_max=25 so that each of the two cycles lasts 25 epochs -> Option A
  4. Quick Check:

    Two cycles = total epochs / 2 = 25 [OK]
Hint: Divide total epochs by number of cycles for T_max [OK]
Common Mistakes:
  • Setting T_max equal to total epochs for multiple cycles
  • Confusing half and full cycles
  • Choosing T_max larger than total epochs
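To see what T_max=25 does over 50 epochs, a pure-Python sketch compares the CosineAnnealingLR closed form with a warm-restart schedule of the same cycle length. Both helpers are hypothetical: warm_restarts mirrors the documented formula of PyTorch's CosineAnnealingWarmRestarts with T_mult=1, evaluated at integer epochs, without calling PyTorch.

```python
import math

lr0, eta_min, T = 0.1, 0.0, 25  # T plays the role of T_max (or T_0 for warm restarts)

def cosine_annealing(epoch):
    # CosineAnnealingLR closed form: the cosine keeps going past T_max
    return eta_min + 0.5 * (lr0 - eta_min) * (1 + math.cos(math.pi * epoch / T))

def warm_restarts(epoch):
    # SGDR-style schedule with T_mult=1: the position in the cycle resets every T epochs
    return eta_min + 0.5 * (lr0 - eta_min) * (1 + math.cos(math.pi * (epoch % T) / T))

for epoch in (0, 12, 24, 25, 37, 49, 50):
    print(epoch, round(cosine_annealing(epoch), 4), round(warm_restarts(epoch), 4))
```

Both schedules reach eta_min near epoch 25, but CosineAnnealingLR then climbs smoothly back up, while the warm-restart schedule jumps to the initial learning rate at epoch 25 and decays a second time.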