Warmup strategies start training with a small learning rate and raise it gradually to its target value. This avoids large, unstable weight updates in the first steps of training.
Warmup strategies in PyTorch
import torch
from torch.optim.lr_scheduler import LambdaLR

# Assumes model, base_lr, and warmup_steps are already defined

# Define a warmup function
def warmup_lambda(current_step):
    if current_step < warmup_steps:
        return float(current_step) / float(max(1, warmup_steps))
    return 1.0

# Create optimizer
optimizer = torch.optim.Adam(model.parameters(), lr=base_lr)

# Create scheduler with warmup
scheduler = LambdaLR(optimizer, lr_lambda=warmup_lambda)
The warmup function returns a multiplier for the learning rate.
During warmup steps, the multiplier grows from 0 to 1, then stays at 1.
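The multiplier behavior described above can be checked by hand, without PyTorch. This is a minimal sketch: the values `warmup_steps = 4` and `base_lr = 0.1` are illustrative assumptions, and the product `base_lr * multiplier` mirrors how LambdaLR scales the optimizer's initial learning rate.

```python
# Illustrative values, not taken from the text above.
warmup_steps = 4
base_lr = 0.1

def warmup_lambda(current_step):
    """Return the learning-rate multiplier for a given step."""
    if current_step < warmup_steps:
        return current_step / max(1, warmup_steps)
    return 1.0

# Evaluate the multiplier and the resulting learning rate for a few steps.
multipliers = [warmup_lambda(step) for step in range(6)]
effective_lrs = [base_lr * m for m in multipliers]

for step, (m, lr) in enumerate(zip(multipliers, effective_lrs)):
    print(f"step {step}: multiplier={m:.2f}, lr={lr:.3f}")
```

The multiplier is 0 at step 0, reaches 1 at step `warmup_steps`, and stays there afterwards.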
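# Linear warmup over 1000 steps, written with min()
def warmup_lambda(step):
    return min(1.0, step / 1000)

# Linear warmup over 500 steps, written with an explicit branch
def warmup_lambda(step):
    if step < 500:
        return step / 500
    else:
        return 1.0

# Linear warmup over 2000 steps, written as an inline lambda
scheduler = LambdaLR(optimizer, lr_lambda=lambda step: min(1.0, step / 2000))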
This code trains a simple linear model for 10 steps, with the learning rate warmed up over the first 5. The learning rate applied to the first update is 0; it then rises linearly to 0.1 and stays constant for the remaining steps.
import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim.lr_scheduler import LambdaLR

# Simple model
model = nn.Linear(10, 1)

# Parameters
base_lr = 0.1
warmup_steps = 5

# Optimizer
optimizer = optim.SGD(model.parameters(), lr=base_lr)

# Warmup function: multiplier rises linearly from 0 to 1 over warmup_steps
def warmup_lambda(step):
    if step < warmup_steps:
        return float(step) / float(max(1, warmup_steps))
    return 1.0

# Scheduler
scheduler = LambdaLR(optimizer, lr_lambda=warmup_lambda)

# Dummy data
inputs = torch.randn(10, 10)
targets = torch.randn(10, 1)

# Loss
criterion = nn.MSELoss()

print('Step | Learning Rate | Loss')
for step in range(10):
    optimizer.zero_grad()
    outputs = model(inputs)
    loss = criterion(outputs, targets)
    loss.backward()
    # Read the learning rate that this update will actually use
    lr = optimizer.param_groups[0]['lr']
    optimizer.step()
    scheduler.step()  # advance the schedule for the next step
    print(f'{step:4d} | {lr:.4f} | {loss.item():.4f}')
Warmup helps prevent the model from making large, unstable updates early in training.
You can combine warmup with other learning rate schedules for better results.
Adjust warmup_steps based on your dataset size and model complexity.
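As one illustration of combining warmup with another schedule, the sketch below follows a linear warmup with a cosine decay, expressed as a single multiplier function. The cosine decay and the values `warmup_steps = 5` and `total_steps = 20` are assumptions chosen for the example, not a recommendation from the text above.

```python
import math

# Hypothetical settings chosen for illustration.
warmup_steps = 5
total_steps = 20

def warmup_cosine_lambda(step):
    """Linear warmup to 1.0, then cosine decay toward 0.0."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(1.0, progress)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

schedule = [warmup_cosine_lambda(s) for s in range(total_steps + 1)]

for step, factor in enumerate(schedule):
    print(f"step {step:2d}: multiplier={factor:.3f}")
```

A function like this should be usable in the same way as the warmup function, by passing it as `lr_lambda` to LambdaLR.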
Warmup strategies gradually increase learning rate at the start of training.
This helps models learn smoothly and avoid unstable updates.
In PyTorch, LambdaLR with a custom function is a simple way to add warmup.
Practice
Solution
Step 1: Understand what warmup means
Warmup means starting with a low learning rate and increasing it slowly.
Step 2: Identify the goal of warmup
This helps the model learn smoothly and avoid sudden big updates that can harm training.
Final Answer:
To gradually increase the learning rate at the start of training -> Option B
Quick Check:
Warmup = gradual learning rate increase [OK]
- Thinking warmup immediately sets max learning rate
- Confusing warmup with learning rate decay
- Assuming warmup freezes model weights
Solution
Step 1: Recall PyTorch schedulers for warmup
LambdaLR allows defining a custom function to adjust learning rate.
Step 2: Match scheduler to warmup use
Warmup needs a custom function to increase learning rate gradually, which LambdaLR supports.
Final Answer:
torch.optim.lr_scheduler.LambdaLR -> Option C
Quick Check:
Custom function scheduler = LambdaLR [OK]
- Choosing StepLR which uses fixed step decay
- Picking ReduceLROnPlateau which reacts to metrics
- Selecting ExponentialLR which decays exponentially
import torch
optimizer = torch.optim.SGD([torch.nn.Parameter(torch.randn(2, 2))], lr=0.1)
warmup_epochs = 5
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lambda epoch: min((epoch + 1) / warmup_epochs, 1))
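for epoch in range(5):
    # Print the LR in effect for this epoch, then advance the schedule,
    # so the printed value equals 0.1 * lr_lambda(epoch)
    print(f"Epoch {epoch+1} LR: {optimizer.param_groups[0]['lr']}")
    scheduler.step()
Solution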
Step 1: Understand the lambda function for LR
The lambda function returns (epoch+1)/5 until it reaches 1, scaling the base LR 0.1.
Step 2: Calculate LR at epoch 3 (0-based index)
Epoch 3 means epoch=2, so LR factor = (2+1)/5 = 3/5 = 0.6. LR = 0.1 * 0.6 = 0.06.
Final Answer:
0.06 -> Option A
Quick Check:
Epoch 3 LR = 0.1 * 3/5 = 0.06 [OK]
- Using epoch number directly without +1
- Confusing epoch index with count
- Assuming LR is constant during warmup
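The arithmetic in the solution above can be reproduced without PyTorch by evaluating the multiplier directly. This sketch only tabulates `base_lr * factor(epoch)` for each 0-based epoch index; it does not model when `scheduler.step()` is called in a real training loop, and the helper name `factor` is introduced here for illustration.

```python
base_lr = 0.1
warmup_epochs = 5

def factor(epoch):
    """Multiplier used in the exercise: (epoch + 1) / warmup_epochs, capped at 1."""
    return min((epoch + 1) / warmup_epochs, 1)

# Learning rate for each 0-based epoch index.
lrs = [base_lr * factor(epoch) for epoch in range(7)]

for epoch, lr in enumerate(lrs):
    print(f"epoch index {epoch}: factor={factor(epoch):.1f}, lr={lr:.2f}")
```

At epoch index 2 (the third epoch) the factor is 3/5, which gives the 0.06 in the answer; from index 4 onward the factor is capped at 1.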
import torch
optimizer = torch.optim.Adam([torch.nn.Parameter(torch.randn(2, 2))], lr=0.01)
warmup_epochs = 3
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lambda epoch: epoch / warmup_epochs)
for epoch in range(5):
    scheduler.step()
    print(f"Epoch {epoch} LR: {optimizer.param_groups[0]['lr']}")
Solution
Step 1: Analyze lambda function behavior at epoch 0
At epoch 0, lambda returns 0/3 = 0, so LR is zero, which stops learning initially.
Step 2: Understand why zero LR is a problem
Zero LR means no weight updates, which can slow or stop training progress early.
Final Answer:
The lambda function returns 0 at epoch 0 causing zero LR -> Option D
Quick Check:
Epoch 0 LR = 0 causes no learning [OK]
- Ignoring zero LR at start
- Thinking optimizer type causes error
- Confusing scheduler step order
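To make the problem concrete, the sketch below compares the multiplier from the exercise with one possible fix. The fix (adding 1 to the epoch and capping at 1) is an assumption borrowed from the pattern used elsewhere in this section, not the exercise's official answer. The function names `buggy` and `fixed` are hypothetical.

```python
warmup_epochs = 3

def buggy(epoch):
    """Multiplier from the exercise: zero at epoch 0 and never capped."""
    return epoch / warmup_epochs

def fixed(epoch):
    """One possible fix: start above zero and cap the multiplier at 1."""
    return min((epoch + 1) / warmup_epochs, 1.0)

buggy_factors = [buggy(e) for e in range(5)]
fixed_factors = [fixed(e) for e in range(5)]

for e in range(5):
    print(f"epoch {e}: buggy={buggy_factors[e]:.3f}, fixed={fixed_factors[e]:.3f}")
```

Besides starting at zero, the uncapped version keeps growing past 1 after the warmup period, so the learning rate would end up above the base value of 0.01.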
lr_lambda function correctly achieves this in PyTorch's LambdaLR?
Solution
Step 1: Understand the warmup goal
The learning-rate scale factor should rise linearly toward 1 over 4 epochs, then stay at 1.
Step 2: Check each lambda function
lambda epoch: min((epoch + 1) / 4, 1) rises linearly from 0.25 at epoch 0 to 1 at epoch 3 (the fourth epoch), then stays at 1.
Final Answer:
lambda epoch: min((epoch + 1) / 4, 1) -> Option A
Quick Check:
Linear increase capped at 1 = lambda epoch: min((epoch + 1) / 4, 1) [OK]
- Not adding +1 to epoch causing zero start
- Multiplying by 0.1 inside lambda instead of base LR
- Using step function instead of linear increase
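The three mistakes listed above can be compared against the correct lambda by tabulating each one over a few epochs. The original answer options are not reproduced here, so the three alternative lambdas below are hypothetical reconstructions of those mistakes, written only for illustration.

```python
# The first entry is the lambda from the answer; the rest are hypothetical
# examples of the listed mistakes.
candidates = {
    "linear, +1, capped": lambda epoch: min((epoch + 1) / 4, 1),
    "linear, no +1": lambda epoch: min(epoch / 4, 1),
    "scaled by 0.1 inside": lambda epoch: 0.1 * min((epoch + 1) / 4, 1),
    "step function": lambda epoch: 0.25 if epoch < 4 else 1,
}

# Evaluate every candidate for epochs 0 through 5.
table = {name: [fn(e) for e in range(6)] for name, fn in candidates.items()}

for name, factors in table.items():
    print(f"{name:22s} {factors}")
```

Only the first candidate starts above zero, rises in equal increments, and settles at exactly 1. The version scaled inside the lambda never exceeds 0.1, so the base learning rate would effectively be reduced twice.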
