Challenge - 5 Problems
Checkpoint Mastery
Get all challenges correct to earn this badge!
Test your skills under time pressure!
❓ Predict Output
Intermediate · 2:00 remaining
What is the output of this PyTorch checkpoint loading code?
Consider the following PyTorch code that saves and loads a model checkpoint including the optimizer state. What will be printed after loading?
PyTorch

```python
import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(2, 1)
optimizer = optim.SGD(model.parameters(), lr=0.1)

# Simulate one optimizer step
optimizer.zero_grad()
output = model(torch.tensor([[1.0, 2.0]]))
loss = output.sum()
loss.backward()
optimizer.step()

# Save checkpoint
checkpoint = {'model_state': model.state_dict(),
              'optimizer_state': optimizer.state_dict()}
torch.save(checkpoint, 'checkpoint.pth')

# Create new model and optimizer
model2 = nn.Linear(2, 1)
optimizer2 = optim.SGD(model2.parameters(), lr=0.1)

# Load checkpoint
loaded = torch.load('checkpoint.pth')
model2.load_state_dict(loaded['model_state'])
optimizer2.load_state_dict(loaded['optimizer_state'])

# Check optimizer state keys
print(sorted(optimizer2.state_dict().keys()))
```
💡 Hint
Look at what keys are stored inside optimizer.state_dict() in PyTorch.
✅ Explanation
optimizer.state_dict() returns a dictionary with two top-level keys: 'state', which holds the internal state for each parameter (such as momentum buffers), and 'param_groups', which holds parameter-group settings including the learning rate. The code therefore prints ['param_groups', 'state'].
❓ Model Choice
Intermediate · 1:30 remaining
Which optimizer state is necessary to save for resuming training exactly?
When saving a checkpoint to resume training later without losing optimizer progress, which part of the optimizer must be saved?
💡 Hint
Think about what the optimizer uses internally to update parameters beyond just hyperparameters.
✅ Explanation
To resume training exactly, you must save the optimizer's internal state (such as momentum buffers) and its parameter groups. This lets the optimizer continue updating parameters as if training had never been interrupted.
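The full save-and-restore pattern can be sketched as follows. Momentum SGD makes the saved state easy to see; the model, file name, and hyperparameters here are illustrative:

```python
import torch
import torch.nn as nn
import torch.optim as optim

# A minimal sketch: save everything needed to resume training exactly.
model = nn.Linear(2, 1)
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

# One training step populates the momentum buffers.
optimizer.zero_grad()
model(torch.tensor([[1.0, 2.0]])).sum().backward()
optimizer.step()

# Save both model weights and optimizer state (buffers + param groups).
checkpoint = {
    'model_state': model.state_dict(),
    'optimizer_state': optimizer.state_dict(),
}
torch.save(checkpoint, 'resume_checkpoint.pth')

# Fresh objects, then restore: the momentum buffer survives the round trip.
model2 = nn.Linear(2, 1)
optimizer2 = optim.SGD(model2.parameters(), lr=0.1, momentum=0.9)
loaded = torch.load('resume_checkpoint.pth')
model2.load_state_dict(loaded['model_state'])
optimizer2.load_state_dict(loaded['optimizer_state'])

print('momentum_buffer' in optimizer2.state_dict()['state'][0])
```

If only the model weights were saved, the restored optimizer would start with empty buffers and the first resumed steps would differ from uninterrupted training.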
❓ Hyperparameter
Advanced · 1:30 remaining
What happens if you load an optimizer state with a different learning rate than the current optimizer?
Suppose you saved an optimizer state with learning rate 0.01 but now you create a new optimizer with learning rate 0.001 and load the saved state. What learning rate will the optimizer use after loading?
💡 Hint
Loading optimizer state_dict overwrites all parameter groups including learning rates.
✅ Explanation
When loading an optimizer state_dict, the parameter groups, including learning rates, are restored from the saved state, overriding any new settings. After loading, the optimizer therefore uses lr=0.01, not 0.001; to keep the new rate, set it again after loading.
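This behavior can be checked directly. A minimal sketch (the model and learning rates are illustrative):

```python
import torch.nn as nn
import torch.optim as optim

# Optimizer whose state we pretend was checkpointed, with lr=0.01.
model = nn.Linear(2, 1)
saved_opt = optim.SGD(model.parameters(), lr=0.01)
saved_state = saved_opt.state_dict()     # param_groups carry lr=0.01

# New optimizer created with a different learning rate.
new_opt = optim.SGD(model.parameters(), lr=0.001)
print(new_opt.param_groups[0]['lr'])     # 0.001 before loading

new_opt.load_state_dict(saved_state)
print(new_opt.param_groups[0]['lr'])     # 0.01 — restored from the saved state
```

To keep the new rate after restoring, reassign it explicitly, e.g. `new_opt.param_groups[0]['lr'] = 0.001` (or use a learning-rate scheduler whose state you also checkpoint).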
🔧 Debug
Advanced · 2:00 remaining
Why does this checkpoint loading code cause a runtime error?
Given this code snippet, why does loading the optimizer state cause a runtime error?
```python
import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(3, 2)
optimizer = optim.Adam(model.parameters(), lr=0.01)
checkpoint = torch.load('checkpoint.pth')
model.load_state_dict(checkpoint['model_state'])
optimizer.load_state_dict(checkpoint['optimizer_state'])
```
Assume the checkpoint was saved from a model with 2 input features instead of 3.
💡 Hint
Check if model parameter shapes match between saved and current model.
✅ Explanation
The parameter shapes in the checkpoint (saved from a 2-input model) do not match the current 3-input model. The failure surfaces first at model.load_state_dict, which raises a RuntimeError reporting a size mismatch for the weight tensor. Even if the model weights were loaded non-strictly, the saved Adam state tensors (exp_avg, exp_avg_sq) would still have the wrong shapes for the current parameters, so the optimizer could not apply them.
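A hypothetical reconstruction of the failure (the checkpoint here is built in-memory rather than read from 'checkpoint.pth', so the shapes are the only difference):

```python
import torch
import torch.nn as nn
import torch.optim as optim

# Checkpoint saved from a model with 2 input features.
old_model = nn.Linear(2, 2)
old_optimizer = optim.Adam(old_model.parameters(), lr=0.01)
old_model(torch.randn(1, 2)).sum().backward()
old_optimizer.step()
checkpoint = {'model_state': old_model.state_dict(),
              'optimizer_state': old_optimizer.state_dict()}

# Current model expects 3 input features.
new_model = nn.Linear(3, 2)
new_optimizer = optim.Adam(new_model.parameters(), lr=0.01)

error_message = None
try:
    # Fails here: saved weight is (2, 2), current weight is (2, 3).
    new_model.load_state_dict(checkpoint['model_state'])
    new_optimizer.load_state_dict(checkpoint['optimizer_state'])
except RuntimeError as exc:
    error_message = str(exc)

print('size mismatch' in error_message)
```

The fix is to load checkpoints only into models with identical architecture, or to rebuild the model from the configuration stored alongside the checkpoint.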
🧠 Conceptual
Expert · 2:30 remaining
Why is saving optimizer state important for training with adaptive optimizers?
Adaptive optimizers like Adam keep internal statistics (e.g., running averages of gradients). Why is saving and restoring the optimizer state critical when resuming training with such optimizers?
💡 Hint
Think about what adaptive optimizers use internally to adjust updates.
✅ Explanation
Adaptive optimizers rely on internal running averages of gradients and squared gradients to adjust updates. Losing this state resets these statistics, changing training behavior and possibly harming convergence.
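These statistics are visible in the optimizer's state dictionary. A short sketch (model and input are illustrative):

```python
import torch
import torch.nn as nn
import torch.optim as optim

# Inspect the per-parameter statistics Adam accumulates.
model = nn.Linear(2, 1)
optimizer = optim.Adam(model.parameters(), lr=0.01)

optimizer.zero_grad()
model(torch.tensor([[1.0, 2.0]])).sum().backward()
optimizer.step()

# Each parameter's entry holds the step count and two running averages:
# exp_avg (first moment) and exp_avg_sq (second moment).
adam_state = optimizer.state_dict()['state'][0]
print(sorted(adam_state.keys()))  # ['exp_avg', 'exp_avg_sq', 'step']
```

A freshly constructed Adam optimizer has an empty 'state', so resuming without restoring it restarts the bias-corrected moment estimates from zero, which typically causes a visible jump in the loss.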