What is CosineAnnealingLR in PyTorch?

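CosineAnnealingLR is a PyTorch learning rate scheduler that lowers the learning rate smoothly along a cosine curve, from its starting value down to a minimum (eta_min) over T_max calls to scheduler.step(). The rate falls slowly at first, fastest in the middle, and slowly again near the end, so it never jumps abruptly. If you keep stepping past T_max, it rises back along the same curve, which gives the schedule its wave-like shape. The idea is to take large steps early in training and smaller, more careful steps later.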
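When only the scheduler changes the learning rate, the PyTorch documentation gives the closed form lr_t = eta_min + 0.5*(lr_0 - eta_min)*(1 + cos(pi*t/T_max)). The pure-Python sketch below evaluates that formula for one full decay; it does not call PyTorch, and cosine_annealing_lr is a hypothetical helper name.

```python
import math


def cosine_annealing_lr(t: int, base_lr: float, t_max: int, eta_min: float = 0.0) -> float:
    """Closed-form cosine annealing LR after t scheduler steps (hypothetical helper)."""
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * t / t_max))


# Learning rate for steps 0..10 with base_lr=0.1 and T_max=10 (eta_min defaults to 0).
schedule = [cosine_annealing_lr(t, base_lr=0.1, t_max=10) for t in range(11)]
for t, lr in enumerate(schedule):
    print(f"step {t:2d}: lr = {lr:.4f}")
```

The printed values change little near steps 0 and 10 and most around step 5, which is the smooth shape described above.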

CosineAnnealingLR in PyTorch - Syntax, Examples & Explanation

Practice


1. What is the main purpose of using CosineAnnealingLR in PyTorch training?

easy

A. To stop training early when accuracy is high

B. To increase the batch size during training

C. To smoothly adjust the learning rate in a wave-like pattern

D. To shuffle the training data every epoch

Solution

Step 1: Understand the role of learning rate schedulers
Learning rate schedulers adjust the learning rate during training to improve convergence.
Step 2: Identify what CosineAnnealingLR does
CosineAnnealingLR changes the learning rate smoothly following a cosine curve, avoiding sudden jumps.
Final Answer:
To smoothly adjust the learning rate in a wave-like pattern -> Option C
Quick Check:
CosineAnnealingLR = smooth wave learning rate [OK]

Hint: CosineAnnealingLR changes learning rate smoothly like a wave [OK]

Common Mistakes:

Thinking it changes batch size
Confusing it with early stopping
Assuming it shuffles data
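To make "avoiding sudden jumps" concrete, the sketch below compares the largest step-to-step change of a cosine schedule with that of a step-decay schedule that halves the learning rate every 5 steps. The step-decay schedule is a comparison of our own choosing (similar in spirit to PyTorch's StepLR); both are computed in plain Python, not with PyTorch.

```python
import math

BASE_LR, T_MAX, STEPS = 0.1, 10, 10

# Cosine annealing (closed form) with eta_min = 0.
cosine = [0.5 * BASE_LR * (1 + math.cos(math.pi * t / T_MAX)) for t in range(STEPS + 1)]
# Step decay used only for comparison: halve the learning rate every 5 steps.
step_decay = [BASE_LR * 0.5 ** (t // 5) for t in range(STEPS + 1)]


def largest_jump(lrs):
    """Largest absolute change between consecutive learning rates."""
    return max(abs(b - a) for a, b in zip(lrs, lrs[1:]))


print(f"cosine max jump:     {largest_jump(cosine):.4f}")
print(f"step-decay max jump: {largest_jump(step_decay):.4f}")
```

Even though the cosine schedule falls further in total (all the way to 0), its biggest single change is smaller than the step schedule's drop from 0.1 to 0.05.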

2. Which of the following is the correct way to create a CosineAnnealingLR scheduler in PyTorch that anneals the learning rate over 10 epochs down to a minimum learning rate of 0.001?

easy

A. scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=10, eta_min=0.001)

B. scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, max_T=10, min_lr=0.001)

C. scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=10)

D. scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=10, min_lr=0.001)

Solution

Step 1: Check the official PyTorch parameter names
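CosineAnnealingLR takes T_max, the number of scheduler steps (here, epochs) over which the learning rate falls from its initial value to the minimum, and eta_min, the minimum learning rate (default 0).
Step 2: Match parameters with options
Option A uses T_max=10 and eta_min=0.001, which is correct. Options B and D use parameter names that do not exist (max_T, min_lr), so the constructor raises a TypeError. Option C runs, but it leaves eta_min at its default of 0 instead of 0.001.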
Final Answer:
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=10, eta_min=0.001) -> Option A
Quick Check:
Use T_max and eta_min parameters [OK]

Hint: Use T_max and eta_min exactly as parameter names [OK]

Common Mistakes:

Using wrong parameter names like max_T or min_lr
Omitting eta_min when needed
Swapping parameter order incorrectly
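The difference between option A and option C can be checked with the documented closed-form formula. In this pure-Python sketch (no PyTorch call; lr_at is a hypothetical helper), eta_min=0.001 stops the decay at 0.001 after T_max=10 steps, while leaving eta_min at its default of 0 lets the learning rate reach 0.

```python
import math


def lr_at(t: int, base_lr: float, t_max: int, eta_min: float = 0.0) -> float:
    """Closed-form CosineAnnealingLR value after t steps (hypothetical helper)."""
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * t / t_max))


with_floor = [lr_at(t, 0.1, t_max=10, eta_min=0.001) for t in range(11)]  # like option A
default_floor = [lr_at(t, 0.1, t_max=10) for t in range(11)]  # like option C

print(f"option A, lr after 10 steps: {with_floor[-1]:.4f}")
print(f"option C, lr after 10 steps: {default_floor[-1]:.4f}")
```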

3. Given the code below, what is the learning rate after 5 calls to scheduler.step() if the initial lr is 0.1, T_max=10, and eta_min=0?

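import torch
model = torch.nn.Linear(4, 1)  # placeholder model for illustration
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=10, eta_min=0)
for _ in range(5):
    optimizer.step()  # normally preceded by a backward pass; called so PyTorch does not warn about step order
    scheduler.step()
print(optimizer.param_groups[0]['lr'])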

medium

A. 0.0

B. Approximately 0.0707

C. 0.1

D. 0.05

Solution

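Step 1: Understand the CosineAnnealingLR formula
When only the scheduler changes the learning rate, the learning rate after t calls to step() is: eta_min + 0.5*(initial_lr - eta_min)*(1 + cos(pi * t / T_max))
Step 2: Calculate the learning rate at t=5
lr = 0 + 0.5*0.1*(1 + cos(pi*5/10)) = 0.05*(1 + cos(pi/2)) = 0.05*(1 + 0) = 0.05. PyTorch applies the update step by step, so the printed value can differ from 0.05 in the last few digits because of floating-point rounding.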
Final Answer:
0.05 -> Option D
Quick Check:
Cosine formula at step 5 = 0.05 [OK]

Hint: Use cosine formula: lr = eta_min + 0.5*(lr0 - eta_min)*(1+cos(pi*t/T_max)) at t=5 = 0.05 [OK]

Common Mistakes:

Assuming lr stays constant
Confusing step count indexing
Ignoring eta_min in calculation
Miscalculating to ~0.0707
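The PyTorch documentation notes that CosineAnnealingLR updates the learning rate recursively on each step() and that, as long as nothing else modifies the learning rate, this matches the closed-form formula used in the solution. The pure-Python sketch below re-implements both forms, modeled on the update rule stated in the docs (it does not import torch, and the helper names are hypothetical), and checks that 5 steps with lr=0.1 and T_max=10 give 0.05 up to rounding.

```python
import math


def closed_form(t, base_lr, t_max, eta_min=0.0):
    """Closed-form cosine annealing value after t steps."""
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * t / t_max))


def recursive(n_steps, base_lr, t_max, eta_min=0.0):
    """Step-by-step update modeled on the recursive rule in the PyTorch docs."""
    lrs = [base_lr]
    lr = base_lr
    for t in range(1, n_steps + 1):
        if (t - 1 - t_max) % (2 * t_max) == 0:
            # Just past a minimum the usual ratio would divide by zero,
            # so the rule adds a fixed increment instead.
            lr = lr + (base_lr - eta_min) * (1 - math.cos(math.pi / t_max)) / 2
        else:
            ratio = (1 + math.cos(math.pi * t / t_max)) / (1 + math.cos(math.pi * (t - 1) / t_max))
            lr = eta_min + (lr - eta_min) * ratio
        lrs.append(lr)
    return lrs


steps = recursive(5, base_lr=0.1, t_max=10)
print(f"recursive after 5 steps:   {steps[5]!r}")
print(f"closed form after 5 steps: {closed_form(5, 0.1, 10)!r}")
```

Printing with !r shows every digit, so any last-digit rounding difference between the two forms is visible.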

4. Identify the error in the following code snippet using CosineAnnealingLR:

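import torch
model = torch.nn.Linear(4, 1)  # placeholder model for illustration
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=5)
def train():  # hypothetical stand-in for one epoch of training
    optimizer.step()
for epoch in range(10):
    train()
    scheduler.step()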

medium

A. scheduler.step() should be called before train()

B. No error, code is correct

C. T_max should be equal to total epochs (10) not 5

D. Learning rate should be set to 0.1 for Adam optimizer

Solution

Step 1: Understand scheduler.step() timing
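Since PyTorch 1.1.0, scheduler.step() should be called after optimizer.step(). Calling it once per epoch after train(), which performs the optimizer steps, updates the LR for the next epoch.
Step 2: Verify the code
The loop trains with the current LR and then steps the scheduler, which is the correct order. T_max=5 with 10 epochs is not an error either: the LR falls from 0.01 to eta_min (0 by default) over the first 5 epochs and then, because the cosine schedule continues past T_max, rises back toward 0.01 over the next 5. Use T_max=10 if you want a single decay across all 10 epochs.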
Final Answer:
No error, code is correct -> Option B
Quick Check:
train() then scheduler.step() [OK]

Hint: Call scheduler.step() after train() [OK]

Common Mistakes:

Thinking step() goes before train()
Requiring T_max = total epochs
Dictating specific LR for Adam
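The sketch below uses the documented closed-form formula to list the learning rate in effect during each of the 10 epochs when scheduler.step() runs after train(), comparing T_max=5 with T_max=10. It is plain Python with a hypothetical helper, not a call into PyTorch, and it assumes nothing else changes the learning rate. With T_max=5 the rate reaches 0 at epoch 5 and then climbs back toward 0.01; with T_max=10 it decreases throughout.

```python
import math

BASE_LR, EPOCHS = 0.01, 10


def lr_for_epoch(epoch: int, t_max: int, eta_min: float = 0.0) -> float:
    """LR in effect during `epoch` when scheduler.step() runs after each epoch's training."""
    return eta_min + 0.5 * (BASE_LR - eta_min) * (1 + math.cos(math.pi * epoch / t_max))


for t_max in (5, 10):
    lrs = [lr_for_epoch(e, t_max) for e in range(EPOCHS)]
    print(f"T_max={t_max:2d}: " + ", ".join(f"{lr:.4f}" for lr in lrs))
```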

5. You want to train a model for 50 epochs using CosineAnnealingLR with 2 cycles of learning rate decay. How should you set T_max and why?

hard

A. Set T_max=25 to have two full cosine cycles over 50 epochs

B. Set T_max=50 to have one full cosine cycle over 50 epochs

C. Set T_max=100 to have half a cosine cycle over 50 epochs

D. Set T_max=10 to have five full cosine cycles over 50 epochs

Solution

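Step 1: Understand T_max meaning
T_max is the number of scheduler steps (here, epochs) the learning rate takes to fall from its initial value to eta_min. Here, one T_max-long stretch counts as one cycle.
Step 2: Calculate T_max for 2 cycles in 50 epochs
To have 2 cycles in 50 epochs, each cycle should last 25 epochs, so T_max=25. Only the first of these is a decay, though: the cosine's full period is 2*T_max, so with CosineAnnealingLR the learning rate climbs back to its initial value over epochs 25 to 50. For two separate decays that each restart from the initial learning rate, use torch.optim.lr_scheduler.CosineAnnealingWarmRestarts with T_0=25.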
Final Answer:
Set T_max=25 to have two full cosine cycles over 50 epochs -> Option A
Quick Check:
Two cycles = total epochs / 2 = 25 [OK]

Hint: Divide total epochs by number of cycles for T_max [OK]

Common Mistakes:

Setting T_max equal to total epochs for multiple cycles
Confusing half and full cycles
Choosing T_max larger than total epochs
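In CosineAnnealingLR the cosine's full period is 2*T_max, so with T_max=25 the learning rate decays over epochs 0 to 25 and then rises back over epochs 25 to 50. PyTorch's torch.optim.lr_scheduler.CosineAnnealingWarmRestarts with T_0=25 instead resets to the initial learning rate every 25 epochs, giving two separate decays. The pure-Python sketch below compares the two shapes using their documented closed forms, assuming T_mult=1 and eta_min=0; it is an illustration, not a call into PyTorch.

```python
import math

BASE_LR, PERIOD = 0.1, 25


def annealing(t: int) -> float:
    """CosineAnnealingLR closed form with T_max=25, eta_min=0."""
    return 0.5 * BASE_LR * (1 + math.cos(math.pi * t / PERIOD))


def warm_restarts(t: int) -> float:
    """CosineAnnealingWarmRestarts closed form with T_0=25, T_mult=1, eta_min=0."""
    # T_cur counts epochs since the last restart, which happens every 25 epochs.
    return 0.5 * BASE_LR * (1 + math.cos(math.pi * (t % PERIOD) / PERIOD))


for epoch in (0, 12, 24, 25, 37, 49):
    print(f"epoch {epoch:2d}: annealing={annealing(epoch):.4f}  warm restarts={warm_restarts(epoch):.4f}")
```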
