CosineAnnealingLR lowers the learning rate smoothly along a cosine curve, from its initial value down to a minimum. This avoids abrupt learning rate changes during training.
CosineAnnealingLR in PyTorch
torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max, eta_min=0, last_epoch=-1, verbose=False)
optimizer: The optimizer whose learning rate you want to adjust.
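T_max: Number of scheduler steps (epochs or iterations) over which the learning rate anneals from its initial value to eta_min, i.e. half of a full cosine period.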
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=10)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=20, eta_min=0.001)
This code prints the learning rate over 10 steps using CosineAnnealingLR with T_max=5. The learning rate starts at 0.1 and decreases smoothly to 0.01 over the first 5 steps, then rises back along the same cosine curve and returns to 0.1 at step 10.
import torch
import torch.nn as nn
import torch.optim as optim

# Simple model
model = nn.Linear(2, 1)

# Optimizer
optimizer = optim.SGD(model.parameters(), lr=0.1)

# Scheduler with T_max=5 steps
scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=5, eta_min=0.01)

print('Step | Learning Rate')
for step in range(10):
    # Dummy training step
    optimizer.zero_grad()
    dummy_input = torch.tensor([[1.0, 2.0]])
    output = model(dummy_input)
    loss = output.sum()
    loss.backward()
    optimizer.step()

    # Step the scheduler
    scheduler.step()

    # Print current learning rate
    lr = optimizer.param_groups[0]['lr']
    print(f'{step+1:4d} | {lr:.5f}')
The learning rate follows a cosine curve from the initial value down to eta_min.
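After T_max steps the learning rate does not restart: it climbs back toward the initial value along the same cosine curve, so the schedule repeats every 2 * T_max steps. For abrupt restarts, PyTorch provides CosineAnnealingWarmRestarts.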
Use scheduler.step() after each optimizer step to update the learning rate.
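The notes above can be checked against the closed-form cosine expression that the schedule follows. The sketch below uses plain Python only; the helper name cosine_annealing_lr is made up for illustration and is not part of PyTorch. It assumes the same settings as the example (initial lr 0.1, T_max=5, eta_min=0.01).

```python
import math

def cosine_annealing_lr(step, base_lr, t_max, eta_min=0.0):
    """Closed-form cosine annealing value after `step` scheduler steps (illustrative helper)."""
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * step / t_max))

# Steps 0..10 with the example's settings: base lr 0.1, T_max=5, eta_min=0.01.
schedule = [cosine_annealing_lr(t, base_lr=0.1, t_max=5, eta_min=0.01) for t in range(11)]

print('Step | Learning Rate')
for t, lr in enumerate(schedule):
    print(f'{t:4d} | {lr:.5f}')
```

The values fall from 0.1 at step 0 to 0.01 at step 5 and then climb back to 0.1 at step 10, which is the down-then-up shape described above rather than a sudden restart.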
CosineAnnealingLR changes the learning rate smoothly along a cosine curve.
It helps training by avoiding sudden learning rate changes.
Use it by setting T_max to the number of steps the decay should take and, optionally, eta_min to the minimum learning rate.
Practice
What is the main purpose of CosineAnnealingLR in PyTorch training?
Solution
Step 1: Understand the role of learning rate schedulers
Learning rate schedulers adjust the learning rate during training to improve convergence.
Step 2: Identify what CosineAnnealingLR does
CosineAnnealingLR changes the learning rate smoothly following a cosine curve, avoiding sudden jumps.
Final Answer:
To smoothly adjust the learning rate in a wave-like pattern -> Option C
Quick Check:
CosineAnnealingLR = smooth wave learning rate [OK]
- Thinking it changes batch size
- Confusing it with early stopping
- Assuming it shuffles data
Which option correctly creates a CosineAnnealingLR scheduler in PyTorch with a cycle length of 10 epochs and minimum learning rate 0.001?
Solution
Step 1: Check the official PyTorch parameter names
The correct parameters are T_max for cycle length and eta_min for minimum learning rate.
Step 2: Match parameters with options
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=10, eta_min=0.001) uses T_max=10 and eta_min=0.001, which is correct syntax.
Final Answer:
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=10, eta_min=0.001) -> Option A
Quick Check:
Use T_max and eta_min parameters [OK]
- Using wrong parameter names like max_T or min_lr
- Omitting eta_min when needed
- Swapping parameter order incorrectly
What is the learning rate printed after 5 calls to scheduler.step() if initial lr is 0.1, T_max=10, and eta_min=0?
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=10, eta_min=0)
for _ in range(5):
    scheduler.step()
print(optimizer.param_groups[0]['lr'])
Solution
Step 1: Understand CosineAnnealingLR formula
Learning rate after t calls to step() is: eta_min + 0.5*(initial_lr - eta_min)*(1 + cos(pi * t / T_max))
Step 2: Calculate learning rate at t=5
lr = 0 + 0.5*0.1*(1 + cos(pi*5/10)) = 0.05*(1 + cos(pi/2)) = 0.05*(1 + 0) = 0.05 (up to floating-point rounding).
Final Answer:
0.05 -> Option D
Quick Check:
Cosine formula at step 5 = 0.05 [OK]
- Assuming lr stays constant
- Confusing step count indexing
- Ignoring eta_min in calculation
- Miscalculating to ~0.0707
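The calculation above can be cross-checked by running the scheduler itself. This is a minimal sketch, not a definitive test: the quiz never defines model, so nn.Linear(2, 1) is assumed here, and optimizer.step() is called before each scheduler.step() only to follow PyTorch's expected call order (no gradients exist, so it changes nothing).

```python
import math
import torch

# Assumed model; the quiz snippet does not say what `model` is.
model = torch.nn.Linear(2, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=10, eta_min=0)

for _ in range(5):
    optimizer.step()   # no gradients yet, so parameters are left unchanged
    scheduler.step()

lr_from_scheduler = optimizer.param_groups[0]['lr']
# Same closed form as in Step 1, evaluated at t=5.
lr_from_formula = 0 + 0.5 * (0.1 - 0) * (1 + math.cos(math.pi * 5 / 10))

print(lr_from_scheduler, lr_from_formula)
```

Both numbers should agree with 0.05 up to floating-point rounding, since step 5 is the midpoint of the decay from 0.1 to 0.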
Identify the error, if any, in this code using CosineAnnealingLR:
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=5)
for epoch in range(10):
    train()
    scheduler.step()
Solution
Step 1: Understand scheduler.step() timing
Standard PyTorch practice is to call scheduler.step() after train() to update LR for the next epoch.
Step 2: Verify the code
The loop trains with the current LR and then steps, which is correct. T_max=5 is valid for 10 epochs: after epoch 5 the learning rate rises back along the cosine curve rather than raising an error.
Final Answer:
No error, code is correct -> Option B
Quick Check:
train() then scheduler.step() [OK]
- Thinking step() goes before train()
- Requiring T_max = total epochs
- Dictating specific LR for Adam
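To see which learning rate each epoch actually trains with, the loop from this question can be instrumented. The sketch below is illustrative only: train_one_epoch is a hypothetical stand-in for the quiz's train(), and the model and dummy input are assumptions.

```python
import torch

# Assumed model; the quiz snippet does not define it.
model = torch.nn.Linear(2, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=5)

def train_one_epoch():
    # Hypothetical stand-in for train(): one forward/backward pass on dummy data.
    optimizer.zero_grad()
    loss = model(torch.tensor([[1.0, 2.0]])).sum()
    loss.backward()
    optimizer.step()

lrs_used = []
for epoch in range(10):
    lrs_used.append(optimizer.param_groups[0]['lr'])  # LR in effect during this epoch
    train_one_epoch()
    scheduler.step()  # sets the LR for the next epoch

for epoch, lr in enumerate(lrs_used):
    print(f'epoch {epoch}: lr={lr:.5f}')
```

With the default eta_min=0, the epoch at index 5 runs with a learning rate of zero before the rate rises again, which is one practical reason to set eta_min above zero or to choose T_max to match the total number of epochs.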
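You are training for 50 epochs and want to use CosineAnnealingLR with 2 cycles of learning rate decay. How should you set T_max and why?
Solution
Step 1: Understand T_max meaning
T_max is the number of epochs the learning rate takes to anneal from its initial value to eta_min.
Step 2: Calculate T_max for 2 cycles in 50 epochs
Splitting 50 epochs into 2 equal phases gives 25 epochs each, so T_max=25. Note that CosineAnnealingLR does not restart: with T_max=25 the learning rate decays for 25 epochs and then rises back over the next 25. If both phases should be decays that start from the initial learning rate, CosineAnnealingWarmRestarts with T_0=25 gives that behavior.
Final Answer:
Set T_max=25 to have two full cosine cycles over 50 epochs -> Option A
Quick Check:
Two cycles = total epochs / 2 = 25 [OK]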
- Setting T_max equal to total epochs for multiple cycles
- Confusing half and full cycles
- Choosing T_max larger than total epochs
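The difference between half and full cycles is easier to see with numbers. The sketch below compares, in plain Python, the closed-form curve that CosineAnnealingLR follows with T_max=25 against the curve that CosineAnnealingWarmRestarts follows with T_0=25 and T_mult=1. The function names and the initial learning rate of 0.1 are assumptions made for illustration; the question does not state a learning rate.

```python
import math

BASE_LR, ETA_MIN, PERIOD = 0.1, 0.0, 25  # BASE_LR is an assumed value

def no_restart(epoch):
    # Curve followed by CosineAnnealingLR with T_max=PERIOD: decays, then rises back.
    return ETA_MIN + 0.5 * (BASE_LR - ETA_MIN) * (1 + math.cos(math.pi * epoch / PERIOD))

def warm_restart(epoch):
    # Curve followed by CosineAnnealingWarmRestarts with T_0=PERIOD, T_mult=1:
    # the position inside the cycle resets to 0 every PERIOD epochs.
    return ETA_MIN + 0.5 * (BASE_LR - ETA_MIN) * (1 + math.cos(math.pi * (epoch % PERIOD) / PERIOD))

for epoch in (0, 12, 24, 25, 26, 37, 49):
    print(f'epoch {epoch:2d} | no restart {no_restart(epoch):.5f} | warm restart {warm_restart(epoch):.5f}')
```

At epoch 25 the first curve sits at its minimum and starts climbing, while the second jumps back to the initial learning rate and decays a second time.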
