CosineAnnealingLR in PyTorch - Practice Problems & Coding Challenges

Challenge - 5 Problems
🧠 Conceptual
intermediate
What does the CosineAnnealingLR scheduler do during training?
Imagine you are training a model with the CosineAnnealingLR scheduler. Which option best describes how the learning rate changes over time?
A. The learning rate oscillates between a maximum and a minimum value following a cosine curve over a set period.
B. The learning rate increases exponentially during training to speed up convergence.
C. The learning rate stays constant until a certain epoch, then drops suddenly to a lower value.
D. The learning rate decreases linearly from the initial value to zero over the total number of epochs.
💡 Hint
Think about how cosine functions behave between 0 and pi.
Predict Output
intermediate
What is the learning rate after 5 epochs?
Given this PyTorch code snippet using CosineAnnealingLR, what is the learning rate at epoch 5?
PyTorch
import torch
import torch.optim as optim

model_params = [torch.nn.Parameter(torch.randn(2, 2, requires_grad=True))]
optimizer = optim.SGD(model_params, lr=0.1)
scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=10, eta_min=0.01)

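lrs = []
for epoch in range(5):  # five scheduler steps, so lrs[-1] is the rate at epoch 5
    optimizer.step()    # step the optimizer first to avoid PyTorch's ordering warning
    scheduler.step()
    lrs.append(optimizer.param_groups[0]['lr'])

print(lrs[-1])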
A. 0.1
B. 0.01
C. 0.075
D. 0.055
💡 Hint
CosineAnnealingLR formula: lr = eta_min + (initial_lr - eta_min) * (1 + cos(pi * epoch / T_max)) / 2
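The formula in the hint can be checked with a few lines of plain Python. This is a minimal sketch of the closed-form expression only, not PyTorch's implementation; the function name `cosine_annealing_lr` and its argument names are chosen here for illustration.

```python
import math


def cosine_annealing_lr(epoch, initial_lr, eta_min, t_max):
    """Closed-form cosine annealing value from the hint (illustrative helper)."""
    cosine_term = (1 + math.cos(math.pi * epoch / t_max)) / 2
    return eta_min + (initial_lr - eta_min) * cosine_term


# Same settings as the snippet in the question: lr=0.1, T_max=10, eta_min=0.01.
lr_at_5 = cosine_annealing_lr(epoch=5, initial_lr=0.1, eta_min=0.01, t_max=10)
print(round(lr_at_5, 6))  # 0.055
```

At epoch 5 the cosine term is cos(pi/2) = 0, so the result sits exactly halfway between the initial learning rate and eta_min.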
Model Choice
advanced
Which scenario benefits most from using CosineAnnealingLR?
You want to train a deep neural network that tends to get stuck in local minima. Which training setup is best suited for CosineAnnealingLR?
A. Training a small linear regression model with a fixed learning rate.
B. Training a deep convolutional network where you want the learning rate to restart periodically to escape local minima.
C. Training a model with very few epochs where learning rate decay is not needed.
D. Training a model with a learning rate that should increase steadily during training.
💡 Hint
Cosine annealing can be combined with warm restarts to help escape local minima. In PyTorch the restart variant is a separate scheduler, CosineAnnealingWarmRestarts.
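For reference, cosine annealing with restarts can be sketched in plain Python. This assumes a fixed cycle length and is not PyTorch's implementation; the function and parameter names below are hypothetical.

```python
import math


def cosine_with_restarts(step, base_lr, eta_min, cycle_len):
    """Cosine decay that jumps back to base_lr every `cycle_len` steps."""
    position = step % cycle_len  # steps since the most recent restart
    cosine_term = 1 + math.cos(math.pi * position / cycle_len)
    return eta_min + 0.5 * (base_lr - eta_min) * cosine_term


# Two 10-step cycles: the rate decays, then restarts at step 10.
schedule = [
    cosine_with_restarts(s, base_lr=0.1, eta_min=0.01, cycle_len=10)
    for s in range(20)
]
for step, lr in enumerate(schedule):
    print(step, round(lr, 4))
```

The sudden jump back to the initial learning rate at each restart is what distinguishes this from a plain cosine schedule, which changes smoothly at every step.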
Hyperparameter
advanced
What effect does increasing T_max have in CosineAnnealingLR?
In the CosineAnnealingLR scheduler, what happens if you increase the T_max parameter while keeping other settings constant?
A. The learning rate decreases faster and reaches eta_min sooner.
B. The learning rate oscillates more frequently between max and min values.
C. The learning rate decreases more slowly and takes longer to reach eta_min.
D. The learning rate stays constant for longer before decreasing.
💡 Hint
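T_max is the number of scheduler steps the learning rate takes to anneal from its initial value to eta_min, which is half of a full cosine period.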
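To see the effect of T_max numerically, the sketch below evaluates the closed-form cosine expression for two values of T_max. It uses plain Python rather than the scheduler itself, and the helper name `cosine_lr` is hypothetical.

```python
import math


def cosine_lr(step, base_lr, eta_min, t_max):
    """Closed-form cosine annealing value (illustrative helper, not a PyTorch API)."""
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * step / t_max))


base_lr, eta_min = 0.1, 0.01
for t_max in (10, 20):
    # Learning rate sampled every 5 steps under each setting.
    schedule = [cosine_lr(s, base_lr, eta_min, t_max) for s in range(0, 21, 5)]
    print(t_max, [round(v, 4) for v in schedule])

# Compare the two settings after the same number of steps.
lr_short = cosine_lr(5, base_lr, eta_min, t_max=10)
lr_long = cosine_lr(5, base_lr, eta_min, t_max=20)
```

With the larger T_max the learning rate is still higher after 5 steps, and eta_min is reached at step 20 instead of step 10.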
🔧 Debug
expert
Why does the learning rate not change as expected?
You wrote this code to use CosineAnnealingLR but the learning rate stays constant at 0.1 for all epochs. What is the most likely cause?
PyTorch
import torch
import torch.optim as optim

model_params = [torch.nn.Parameter(torch.randn(2, 2, requires_grad=True))]
optimizer = optim.SGD(model_params, lr=0.1)
scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=10, eta_min=0.01)

for epoch in range(5):
    # training code here
    print(f"Epoch {epoch} lr: {optimizer.param_groups[0]['lr']}")
    # forgot to call scheduler.step()
A. The scheduler.step() function was not called inside the training loop.
B. The model parameters are not passed correctly to the optimizer.
C. The eta_min parameter is set too low to affect the learning rate.
D. The optimizer learning rate was set too high initially.
💡 Hint
Schedulers need to be updated each epoch to change the learning rate.
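A corrected version of the loop might look like the sketch below. The dummy loss is a placeholder assumption standing in for real training code; the change that matters is the scheduler.step() call at the end of each epoch.

```python
import torch
import torch.optim as optim

model_params = [torch.nn.Parameter(torch.randn(2, 2))]
optimizer = optim.SGD(model_params, lr=0.1)
scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=10, eta_min=0.01)

lrs = []
for epoch in range(5):
    lrs.append(optimizer.param_groups[0]['lr'])
    print(f"Epoch {epoch} lr: {lrs[-1]:.4f}")

    # Placeholder training step (assumption): minimise the sum of squares.
    optimizer.zero_grad()
    loss = (model_params[0] ** 2).sum()
    loss.backward()
    optimizer.step()

    # Advance the schedule once per epoch, after the optimizer update.
    scheduler.step()
```

With the step() call in place, the printed learning rate starts at 0.1 and decreases every epoch.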

Practice

1. What is the main purpose of using CosineAnnealingLR in PyTorch training?
easy
A. To stop training early when accuracy is high
B. To increase the batch size during training
C. To smoothly adjust the learning rate in a wave-like pattern
D. To shuffle the training data every epoch

Solution

  1. Step 1: Understand the role of learning rate schedulers

    Learning rate schedulers adjust the learning rate during training to improve convergence.
  2. Step 2: Identify what CosineAnnealingLR does

    CosineAnnealingLR changes the learning rate smoothly following a cosine curve, avoiding sudden jumps.
  3. Final Answer:

    To smoothly adjust the learning rate in a wave-like pattern -> Option C
  4. Quick Check:

    CosineAnnealingLR = smooth wave learning rate [OK]
Hint: CosineAnnealingLR changes learning rate smoothly like a wave [OK]
Common Mistakes:
  • Thinking it changes batch size
  • Confusing it with early stopping
  • Assuming it shuffles data
2. Which of the following is the correct way to create a CosineAnnealingLR scheduler in PyTorch with a cycle length of 10 epochs and minimum learning rate 0.001?
easy
A. scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=10, eta_min=0.001)
B. scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, max_T=10, min_lr=0.001)
C. scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=10)
D. scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=10, min_lr=0.001)

Solution

  1. Step 1: Check the official PyTorch parameter names

    The correct parameters are T_max for cycle length and eta_min for minimum learning rate.
  2. Step 2: Match parameters with options

    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=10, eta_min=0.001) uses T_max=10 and eta_min=0.001, which is correct syntax.
  3. Final Answer:

    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=10, eta_min=0.001) -> Option A
  4. Quick Check:

    Use T_max and eta_min parameters [OK]
Hint: Use T_max and eta_min exactly as parameter names [OK]
Common Mistakes:
  • Using wrong parameter names like max_T or min_lr
  • Omitting eta_min when needed
  • Swapping parameter order incorrectly
3. Given the code below, what will be the learning rate after 5 calls to scheduler.step() if initial lr is 0.1, T_max=10, and eta_min=0?
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=10, eta_min=0)
for _ in range(5):
    scheduler.step()
print(optimizer.param_groups[0]['lr'])
medium
A. 0.0
B. Approximately 0.0707
C. 0.1
D. 0.05

Solution

  1. Step 1: Understand CosineAnnealingLR formula

    Learning rate after t calls to step() is: eta_min + 0.5*(initial_lr - eta_min)*(1 + cos(pi * t / T_max))
  2. Step 2: Calculate learning rate at t=5

    lr = 0 + 0.5*0.1*(1 + cos(pi*5/10)) = 0.05*(1 + cos(pi/2)) = 0.05*(1 + 0) = 0.05 exactly.
  3. Final Answer:

    0.05 -> Option D
  4. Quick Check:

    Cosine formula at step 5 = 0.05 [OK]
Hint: Use cosine formula: lr = eta_min + 0.5*(lr0 - eta_min)*(1+cos(pi*t/T_max)) at t=5 = 0.05 [OK]
Common Mistakes:
  • Assuming lr stays constant
  • Confusing step count indexing
  • Ignoring eta_min in calculation
  • Miscalculating to ~0.0707
4. Identify the error in the following code snippet using CosineAnnealingLR:
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=5)
for epoch in range(10):
    train()
    scheduler.step()
medium
A. scheduler.step() should be called before train()
B. No error, code is correct
C. T_max should be equal to total epochs (10) not 5
D. Learning rate should be set to 0.1 for Adam optimizer

Solution

  1. Step 1: Understand scheduler.step() timing

    Standard PyTorch practice is to call scheduler.step() after train() to update LR for the next epoch.
  2. Step 2: Verify the code

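    The loop trains with the current learning rate and then steps the scheduler, which is the correct order. T_max=5 is also valid for a 10-epoch run: after reaching eta_min at epoch 5, the learning rate follows the cosine curve back up toward its initial value.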
  3. Final Answer:

    No error, code is correct -> Option B
  4. Quick Check:

    train() then scheduler.step() [OK]
Hint: Call scheduler.step() after train() [OK]
Common Mistakes:
  • Thinking step() goes before train()
  • Requiring T_max = total epochs
  • Dictating specific LR for Adam
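What happens past T_max can be sketched by evaluating the closed-form cosine expression beyond step 5. This is plain Python under the assumption that the schedule keeps following the same cosine curve, with eta_min left at its default of 0; `cosine_lr` is an illustrative helper, not a PyTorch function.

```python
import math


def cosine_lr(step, base_lr, eta_min, t_max):
    """Closed-form cosine annealing value (illustrative helper)."""
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * step / t_max))


# Settings from the question: Adam lr=0.01, T_max=5, eta_min defaults to 0.
schedule = [cosine_lr(s, base_lr=0.01, eta_min=0.0, t_max=5) for s in range(11)]
for step, lr in enumerate(schedule):
    print(step, round(lr, 5))
```

The values fall until step 5 and then rise again, returning to the initial learning rate at step 10.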
5. You want to train a model for 50 epochs using CosineAnnealingLR with 2 cycles of learning rate decay. How should you set T_max and why?
hard
A. Set T_max=25 to have two full cosine cycles over 50 epochs
B. Set T_max=50 to have one full cosine cycle over 50 epochs
C. Set T_max=100 to have half a cosine cycle over 50 epochs
D. Set T_max=10 to have five full cosine cycles over 50 epochs

Solution

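  1. Step 1: Understand T_max meaning

    T_max is the number of epochs the learning rate takes to anneal from its initial value down to eta_min, which is half of a full cosine period.
  2. Step 2: Calculate T_max for 2 cycles in 50 epochs

    Splitting 50 epochs into 2 equal phases gives 25 epochs each, so T_max=25. With plain CosineAnnealingLR the learning rate decays during the first 25 epochs and climbs back during the next 25; for two separate decays with a restart in between, PyTorch provides CosineAnnealingWarmRestarts (T_0=25).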
  3. Final Answer:

    Set T_max=25 to have two full cosine cycles over 50 epochs -> Option A
  4. Quick Check:

    Two cycles = total epochs / 2 = 25 [OK]
Hint: Divide total epochs by number of cycles for T_max [OK]
Common Mistakes:
  • Setting T_max equal to total epochs for multiple cycles
  • Confusing half and full cycles
  • Choosing T_max larger than total epochs