Practice

1. What is the main purpose of using a warmup strategy in PyTorch training?

easy

A. To immediately set the learning rate to its maximum value

B. To gradually increase the learning rate at the start of training

C. To decrease the learning rate throughout the entire training

D. To freeze model weights during the first epochs

Solution

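Step 1: Understand what warmup means
Warmup means starting training with a small learning rate and raising it gradually to the target (base) value over the first few epochs or iterations.
Step 2: Identify the goal of warmup
Early in training, gradients can be large and noisy, so a full-size learning rate can cause big, unstable updates that harm training. Ramping up gives the model a few gentle steps before the full learning rate applies.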
Final Answer:
To gradually increase the learning rate at the start of training -> Option B
Quick Check:
Warmup = gradual learning rate increase [OK]

Hint: Warmup means slowly raising learning rate early [OK]

Common Mistakes:

Thinking warmup immediately sets max learning rate
Confusing warmup with learning rate decay
Assuming warmup freezes model weights
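The idea can be sketched in plain Python, without PyTorch. Linear warmup is a multiplier on the target learning rate that climbs from a small value to 1 and then stays there. The helper name `linear_warmup_factor`, the target LR of 0.1, and the 5-epoch warmup are illustrative assumptions.

```python
def linear_warmup_factor(epoch: int, warmup_epochs: int) -> float:
    """Hypothetical helper: multiplier applied to the target LR at a 0-based epoch index."""
    # Rises by 1/warmup_epochs per epoch, then is capped at 1.0
    return min((epoch + 1) / warmup_epochs, 1.0)


target_lr = 0.1
warmup_epochs = 5
schedule = [target_lr * linear_warmup_factor(e, warmup_epochs) for e in range(8)]
for e, lr in enumerate(schedule):
    print(f"epoch {e + 1}: lr = {lr:.3f}")
```

The LR climbs 0.020, 0.040, 0.060, 0.080 and then holds at 0.100. It never jumps straight to the target.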

2. Which PyTorch class is commonly used to implement a warmup learning rate schedule with a custom function?

easy

A. torch.optim.lr_scheduler.StepLR

B. torch.optim.lr_scheduler.ReduceLROnPlateau

C. torch.optim.lr_scheduler.LambdaLR

D. torch.optim.lr_scheduler.ExponentialLR

Solution

Step 1: Recall PyTorch schedulers for warmup
LambdaLR takes a user-supplied function lr_lambda(epoch) and sets each parameter group's learning rate to base_lr * lr_lambda(epoch).
Step 2: Match scheduler to warmup use
Warmup needs a custom function to increase learning rate gradually, which LambdaLR supports.
Final Answer:
torch.optim.lr_scheduler.LambdaLR -> Option C
Quick Check:
Custom function scheduler = LambdaLR [OK]

Hint: LambdaLR lets you define custom learning rate changes [OK]

Common Mistakes:

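Choosing StepLR, which multiplies the LR by gamma every step_size epochs (a decay, not a ramp-up)
Picking ReduceLROnPlateau, which lowers the LR only when a monitored metric stops improving
Selecting ExponentialLR, which multiplies the LR by gamma every epoch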
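The following minimal sketch runs LambdaLR with a 5-epoch linear warmup next to LinearLR. LinearLR is a built-in scheduler in recent PyTorch releases that interpolates the LR factor from start_factor to end_factor over total_iters steps. The parameter values are assumptions, chosen so that both schedules give 0.02, 0.04, 0.06, 0.08, then 0.1 for a base LR of 0.1.

```python
import torch


def make_optimizer() -> torch.optim.Optimizer:
    # A single dummy parameter is enough to drive a scheduler
    return torch.optim.SGD([torch.nn.Parameter(torch.zeros(1))], lr=0.1)


warmup_epochs = 5

opt_a = make_optimizer()
lambda_sched = torch.optim.lr_scheduler.LambdaLR(
    opt_a, lr_lambda=lambda epoch: min((epoch + 1) / warmup_epochs, 1.0)
)

opt_b = make_optimizer()
linear_sched = torch.optim.lr_scheduler.LinearLR(
    opt_b, start_factor=1 / warmup_epochs, end_factor=1.0, total_iters=warmup_epochs - 1
)

lambda_lrs, linear_lrs = [], []
for epoch in range(8):
    # Record the LR in effect for this epoch
    lambda_lrs.append(opt_a.param_groups[0]["lr"])
    linear_lrs.append(opt_b.param_groups[0]["lr"])
    # No gradients exist, so these optimizer steps update nothing
    opt_a.step()
    opt_b.step()
    lambda_sched.step()
    linear_sched.step()

print([round(lr, 4) for lr in lambda_lrs])
print([round(lr, 4) for lr in linear_lrs])
```

For a straight-line ramp the two are interchangeable. LambdaLR is the more flexible choice when the warmup curve has some other shape.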

3. Given the following PyTorch code snippet, what will be the learning rate at epoch 3?

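import torch

optimizer = torch.optim.SGD([torch.nn.Parameter(torch.randn(2, 2))], lr=0.1)

warmup_epochs = 5
# LR factor: 1/5, 2/5, 3/5, 4/5, then capped at 1
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lambda epoch: min((epoch + 1) / warmup_epochs, 1))

for epoch in range(5):
    # LR in effect for this epoch (forward/backward pass omitted)
    print(f"Epoch {epoch+1} LR: {optimizer.param_groups[0]['lr']}")
    optimizer.step()  # normally follows loss.backward()
    scheduler.step()  # advance the warmup schedule at the end of the epoch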

medium

A. 0.06

B. 0.03

C. 0.10

D. 0.50

Solution

Step 1: Understand the lambda function for LR
LambdaLR sets the LR to base_lr * lr_lambda(epoch). Here the lambda returns (epoch + 1)/5, capped at 1, and base_lr is 0.1.
Step 2: Calculate the LR for the third epoch
Epoch 3 is the third epoch, which is 0-based index epoch=2. The factor is therefore (2+1)/5 = 3/5 = 0.6, and LR = 0.1 * 0.6 = 0.06.
Final Answer:
0.06 -> Option A
Quick Check:
Epoch 3 LR = 0.1 * 3/5 = 0.06 [OK]

Hint: Multiply base LR by (epoch+1)/warmup_epochs [OK]

Common Mistakes:

Using epoch number directly without +1
Confusing epoch index with count
Assuming LR is constant during warmup
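LambdaLR already applies lr_lambda(0) when it is constructed. As a result, whether `scheduler.step()` runs before or after the LR is read shifts every printed value by one epoch. The minimal sketch below compares the two loop orders, assuming the same settings as the question (base LR 0.1, 5 warmup epochs). `lrs_seen` is a hypothetical helper. The dummy parameter never receives gradients, so `optimizer.step()` updates nothing here.

```python
import torch


def lrs_seen(step_before_read: bool, epochs: int = 5) -> list:
    """Hypothetical helper: record the LR read in each epoch under one of two loop orders."""
    param = torch.nn.Parameter(torch.zeros(1))
    optimizer = torch.optim.SGD([param], lr=0.1)
    warmup_epochs = 5
    scheduler = torch.optim.lr_scheduler.LambdaLR(
        optimizer, lr_lambda=lambda epoch: min((epoch + 1) / warmup_epochs, 1)
    )
    seen = []
    for _ in range(epochs):
        if step_before_read:
            optimizer.step()
            scheduler.step()
        seen.append(optimizer.param_groups[0]["lr"])
        if not step_before_read:
            optimizer.step()
            scheduler.step()
    return seen


print("read, then step:", [round(lr, 4) for lr in lrs_seen(False)])
print("step, then read:", [round(lr, 4) for lr in lrs_seen(True)])
```

The first order gives 0.06 for the third epoch. The second order has already advanced the schedule and gives 0.08.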

4. Identify the error in this PyTorch warmup scheduler code:

import torch
optimizer = torch.optim.Adam([torch.nn.Parameter(torch.randn(2, 2))], lr=0.01)
warmup_epochs = 3
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lambda epoch: epoch / warmup_epochs)

for epoch in range(5):
    scheduler.step()
    print(f"Epoch {epoch} LR: {optimizer.param_groups[0]['lr']}")

medium

A. The optimizer should be SGD, not Adam

B. scheduler.step() should be called after optimizer.step()

C. The learning rate is not scaled by base LR

D. The lambda function returns 0 at epoch 0 causing zero LR

Solution

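Step 1: Analyze lambda function behavior at epoch 0
LambdaLR applies the lambda as soon as it is constructed. At epoch 0 the factor is 0/3 = 0, so the LR is 0.01 * 0 = 0.
Step 2: Understand why zero LR is a problem
With LR = 0 the optimizer makes no weight updates, so the first epoch of training is wasted. The lambda is also never capped at 1, so after epoch 3 the LR keeps growing past the base LR of 0.01.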
Final Answer:
The lambda function returns 0 at epoch 0 causing zero LR -> Option D
Quick Check:
Epoch 0 LR = 0 causes no learning [OK]

Hint: Check if lambda returns zero at first epoch [OK]

Common Mistakes:

Ignoring zero LR at start
Thinking optimizer type causes error
Confusing scheduler step order
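LambdaLR sets the LR to base_lr * lr_lambda(epoch), so the factors alone are enough to expose both problems with this lambda. The plain-Python sketch below uses the question's values (base LR 0.01, 3 warmup epochs). The names `buggy_factor` and `fixed_factor` are illustrative. The fix is the capped (epoch + 1) / warmup_epochs pattern used elsewhere in this set.

```python
warmup_epochs = 3
base_lr = 0.01


def buggy_factor(epoch: int) -> float:
    # Lambda from the question: starts at 0 and is never capped
    return epoch / warmup_epochs


def fixed_factor(epoch: int) -> float:
    # Shift by one so epoch 0 is non-zero, and cap at 1.0 once warmup ends
    return min((epoch + 1) / warmup_epochs, 1.0)


for epoch in range(5):
    print(
        f"epoch index {epoch}: buggy lr = {base_lr * buggy_factor(epoch):.5f}, "
        f"fixed lr = {base_lr * fixed_factor(epoch):.5f}"
    )
```

The buggy version starts at 0.00000 and ends above 0.01. The fixed version starts at about 0.00333 and holds at 0.01 from epoch index 2 onward.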

5. You want a warmup strategy that raises the learning rate linearly from near zero up to 0.1 over the first 4 epochs, then keeps it constant at 0.1. The optimizer is created with lr=0.1. Which lr_lambda passed to PyTorch's LambdaLR achieves this?

hard

A. lambda epoch: min((epoch + 1) / 4, 1)

B. lambda epoch: epoch / 4

C. lambda epoch: 1 if epoch >= 4 else 0.1 * epoch

D. lambda epoch: (epoch + 1) * 0.1

Solution

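Step 1: Understand the warmup goal
The factor returned by the lambda multiplies the base LR of 0.1. It should therefore rise linearly to 1 over the first 4 epochs and then stay at 1.
Step 2: Check each lambda function
A: min((epoch + 1) / 4, 1) gives 0.25, 0.5, 0.75, 1.0 for epoch indices 0 to 3 and then stays at 1, so the LR ramps up to 0.1 and holds there.
B: epoch / 4 starts at exactly 0 (a wasted first epoch) and is never capped, so the LR exceeds 0.1 from epoch index 5 onward.
C: 1 if epoch >= 4 else 0.1 * epoch gives 0, 0.1, 0.2, 0.3 and then jumps to 1, which is not a linear ramp.
D: (epoch + 1) * 0.1 grows by 0.1 per epoch with no cap, so it takes 10 epochs to reach 1 and keeps climbing afterward.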
Final Answer:
lambda epoch: min((epoch + 1) / 4, 1) -> Option A
Quick Check:
Linear increase capped at 1 = lambda epoch: min((epoch + 1) / 4, 1) [OK]

Hint: Use min((epoch+1)/warmup_epochs, 1) for linear warmup [OK]

Common Mistakes:

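Not adding +1 to epoch, which makes the first LR zero
Forgetting the min(..., 1) cap, so the LR keeps growing after warmup
Multiplying by 0.1 inside the lambda, even though LambdaLR already multiplies by the base LR
Using a step function instead of a linear increase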
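The four candidates can also be checked mechanically. This plain-Python sketch prints the LR each lambda would produce with a base LR of 0.1. It also tests whether each lambda forms a linear ramp that reaches 1 at the fourth epoch and then holds. The helper `is_linear_warmup_then_constant` is a hypothetical name. Its criteria (equal positive increments during warmup, a factor of exactly 1 afterwards) are one reasonable reading of the requirement.

```python
# Candidate lr_lambda functions from the question
candidates = {
    "A": lambda epoch: min((epoch + 1) / 4, 1),
    "B": lambda epoch: epoch / 4,
    "C": lambda epoch: 1 if epoch >= 4 else 0.1 * epoch,
    "D": lambda epoch: (epoch + 1) * 0.1,
}


def is_linear_warmup_then_constant(fn, warmup_epochs=4, horizon=10):
    """Hypothetical check: equal positive increments during warmup, then factor == 1."""
    factors = [fn(e) for e in range(horizon)]
    ramp = factors[:warmup_epochs]
    increments = [b - a for a, b in zip(ramp, ramp[1:])]
    linear = all(d > 0 and abs(d - increments[0]) < 1e-9 for d in increments)
    # From the last warmup epoch on, the factor must stay at 1
    flat = all(abs(f - 1) < 1e-9 for f in factors[warmup_epochs - 1:])
    return linear and flat


base_lr = 0.1  # optimizer lr; LambdaLR multiplies it by the factor
for name, fn in candidates.items():
    lrs = [round(base_lr * fn(e), 4) for e in range(7)]
    print(name, lrs, "OK" if is_linear_warmup_then_constant(fn) else "fails")
```

Only A passes. B and D are linear but never level off. C is flat at the end but jumps from 0.3 straight to 1.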

Why Warmup strategies in PyTorch? - Purpose & Use Cases