Challenge - 5 Problems
Warmup Mastery
Get all challenges correct to earn this badge!
Test your skills under time pressure!
🧠 Conceptual
intermediate · 2:00 remaining
Why use a learning rate warmup in training?
Which of the following best explains the main reason for using a learning rate warmup at the start of training a neural network?
💡 Hint
Think about how sudden large updates affect a new model's weights.
✅ Explanation
A warmup gradually increases the learning rate from a small value to the target value to avoid large, unstable updates early in training.
❓ Predict Output
intermediate · 2:00 remaining
Output of learning rate scheduler with warmup
What will be the learning rate printed at epoch 3 in the following PyTorch code?
PyTorch
import torch
from torch.optim import SGD
from torch.optim.lr_scheduler import LambdaLR

optimizer = SGD([torch.nn.Parameter(torch.randn(2, 2, requires_grad=True))], lr=0.1)
warmup_epochs = 5

def lr_lambda(epoch):
    if epoch < warmup_epochs:
        return (epoch + 1) / warmup_epochs
    return 1.0

scheduler = LambdaLR(optimizer, lr_lambda=lr_lambda)

for epoch in range(7):
    print(f"Epoch {epoch}: lr = {optimizer.param_groups[0]['lr']}")
    scheduler.step()
💡 Hint
Calculate (epoch + 1) / warmup_epochs * base_lr for epoch 3.
✅ Explanation
At epoch 3, lr_lambda returns (3+1)/5 = 0.8, so lr = 0.8 * 0.1 = 0.08.
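The same multiplier rule can be checked without PyTorch installed. This is a plain-Python sketch; `lr_at` is a hypothetical helper that mirrors the `lr_lambda` logic applied to the base learning rate:

```python
base_lr = 0.1
warmup_epochs = 5

def lr_at(epoch):
    # Linear warmup multiplier for the first warmup_epochs epochs, then 1.0.
    scale = (epoch + 1) / warmup_epochs if epoch < warmup_epochs else 1.0
    return base_lr * scale

print(round(lr_at(3), 4))  # 0.08
```

Epoch 0 gives 0.1 * 1/5 = 0.02, and from epoch 4 onward the learning rate stays at the full 0.1.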
❓ Model Choice
advanced · 2:00 remaining
Choosing warmup strategy for transformer training
You are training a transformer model on a large text dataset. Which warmup strategy is most suitable to stabilize training and improve final accuracy?
💡 Hint
Consider smooth transitions in learning rate for stable training.
✅ Explanation
Linear warmup gradually increases the learning rate, and cosine decay smoothly reduces it, which is effective for transformers.
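The combined schedule can be sketched in plain Python. This is a minimal illustration of the shape, not a specific library's API: linear warmup to the peak learning rate over `warmup_steps`, then cosine decay toward zero over the remaining steps.

```python
import math

def warmup_cosine_lr(step, warmup_steps, total_steps, base_lr):
    """Linear warmup to base_lr, then cosine decay to ~0."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    # Fraction of the post-warmup phase completed, in [0, 1).
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# Example: 100-step run, 10-step warmup, peak lr 0.1.
lrs = [warmup_cosine_lr(s, 10, 100, 0.1) for s in range(100)]
```

In practice the same shape is available off the shelf, e.g. by composing PyTorch schedulers or passing an equivalent lambda to `LambdaLR`.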
❓ Hyperparameter
advanced · 2:00 remaining
Determining warmup steps for a training schedule
If you have a total of 100 epochs and want to use a warmup phase that lasts 10% of training, how many warmup steps should you set?
💡 Hint
Calculate 10% of 100 epochs.
✅ Explanation
10% of 100 epochs is 10 epochs, so set the warmup length to 10 (measured in epochs, assuming the scheduler steps once per epoch).
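The arithmetic, plus the common unit conversion, in a short sketch (the `steps_per_epoch` value is a hypothetical example, not from the question):

```python
total_epochs = 100
warmup_fraction = 0.10                               # 10% of training
warmup_epochs = int(total_epochs * warmup_fraction)  # 10 epochs

# If the scheduler steps once per batch rather than once per epoch,
# convert epochs to optimizer steps using the batches per epoch:
steps_per_epoch = 500                                # hypothetical
warmup_steps = warmup_epochs * steps_per_epoch       # 5000 steps
```

Mixing up these two units is a common source of schedules that warm up far too fast or too slowly.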
❓ Metrics
expert · 2:00 remaining
Effect of warmup on training loss curve
During training with and without learning rate warmup, which difference in the training loss curve is expected?
💡 Hint
Think about stability of updates at the start of training.
✅ Explanation
Warmup prevents large early updates, so loss decreases smoothly. Without warmup, large updates can cause spikes or fluctuations.