Challenge - 5 Problems
Optimizer Mastery Badge
Get all challenges correct to earn this badge!
Test your skills under time pressure!
🧠 Conceptual
Intermediate · 1:30 remaining
Difference between SGD and Adam optimizers
Which statement correctly describes a key difference between SGD and Adam optimizers?
Attempts: 2 left
💡 Hint
Think about how each optimizer handles learning rates for different parameters.
✗ Incorrect
The Adam optimizer computes an adaptive learning rate for each parameter using running estimates of the first moment (mean) and second moment (uncentered variance) of the gradients. SGD applies a single global learning rate to every parameter unless you change it manually (for example, with a scheduler).
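To make the contrast concrete, here is a minimal plain-Python sketch of one update step under each rule (the gradient values are illustrative, not from the challenge). SGD's step scales with the raw gradient, while Adam's first bias-corrected step is roughly lr regardless of gradient magnitude:

```python
# Hand-rolled single-step comparison of the SGD and Adam update rules.
# Two parameters with very different gradient scales (made-up values).
lr, beta1, beta2, eps = 0.1, 0.9, 0.999, 1e-8
grads = {"w1": 100.0, "w2": 0.001}

# SGD: step = lr * grad, so it scales with gradient magnitude.
sgd_steps = {name: lr * g for name, g in grads.items()}

# Adam, first iteration (t = 1): bias-corrected m_hat = g and v_hat = g^2,
# so the step is lr * g / (|g| + eps) ~= lr * sign(g) for both parameters.
adam_steps = {}
for name, g in grads.items():
    m = (1 - beta1) * g        # first-moment (mean) estimate
    v = (1 - beta2) * g * g    # second-moment (uncentered variance) estimate
    m_hat = m / (1 - beta1)    # bias correction at t = 1
    v_hat = v / (1 - beta2)
    adam_steps[name] = lr * m_hat / (v_hat ** 0.5 + eps)

print(sgd_steps)   # steps of 10.0 and 0.0001 -- wildly different scales
print(adam_steps)  # both steps close to 0.1 -- normalized per parameter
```

This is why a single learning rate that works for one parameter in SGD can be far too large or too small for another, while Adam self-normalizes.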
❓ Predict Output
Intermediate · 2:00 remaining
Output of training loss with SGD vs Adam
Given the following PyTorch training loop snippet, what will be the printed loss values after one update step using SGD and Adam optimizers respectively?
PyTorch
import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(2, 1)
criterion = nn.MSELoss()
x = torch.tensor([[1.0, 2.0]])
y = torch.tensor([[1.0]])

# Using SGD optimizer
optimizer_sgd = optim.SGD(model.parameters(), lr=0.1)

# Forward pass
output = model(x)
loss = criterion(output, y)

# Backward and optimize
optimizer_sgd.zero_grad()
loss.backward()
optimizer_sgd.step()
loss_sgd = criterion(model(x), y).item()

# Reset model weights
model = nn.Linear(2, 1)
optimizer_adam = optim.Adam(model.parameters(), lr=0.1)

output = model(x)
loss = criterion(output, y)
optimizer_adam.zero_grad()
loss.backward()
optimizer_adam.step()
loss_adam = criterion(model(x), y).item()

print(round(loss_sgd, 3), round(loss_adam, 3))
Attempts: 2 left
💡 Hint
Adam usually converges faster due to adaptive learning rates.
✗ Incorrect
After one update step, both optimizers reduce the loss, but the exact printed values vary from run to run: no random seed is set, and the model is re-initialized with fresh random weights before the Adam run, so the two optimizers do not even start from the same point. Over many steps Adam typically reduces the loss faster because it adapts the learning rate per parameter.
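To see the mechanics without random initialization, here is a hand-worked SGD step on the same shape of problem (a 2-input linear model with MSE loss) in plain Python, with fixed, made-up starting weights:

```python
# One SGD step on y_hat = w1*x1 + w2*x2 + b with MSE loss, done by hand.
# Starting weights are arbitrary illustrative values (fixed, not random).
x1, x2, y = 1.0, 2.0, 1.0
w1, w2, b = 0.1, -0.2, 0.0
lr = 0.1

pred = w1 * x1 + w2 * x2 + b      # -0.3
loss_before = (pred - y) ** 2     # 1.69

# MSE gradient: dL/dpred = 2*(pred - y), then the chain rule per weight.
g = 2 * (pred - y)                # -2.6
w1 -= lr * g * x1
w2 -= lr * g * x2
b -= lr * g

pred = w1 * x1 + w2 * x2 + b      # 1.26
loss_after = (pred - y) ** 2      # 0.0676
print(round(loss_before, 4), round(loss_after, 4))
```

With fixed weights the result is deterministic, which is exactly what the quiz snippet lacks.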
❓ Hyperparameter
Advanced · 1:00 remaining
Choosing learning rate for Adam optimizer
Which learning rate value is generally recommended as a good starting point for the Adam optimizer in PyTorch?
Attempts: 2 left
💡 Hint
Adam usually requires smaller learning rates than SGD.
✗ Incorrect
Adam typically works well with a learning rate of about 0.001 as a starting point; this is also PyTorch's default (optim.Adam uses lr=1e-3 if you don't pass one).
❓ Metrics
Advanced · 1:30 remaining
Effect of optimizer on training accuracy
You train the same neural network on a classification task using SGD and Adam optimizers with the same learning rate and number of epochs. Which outcome is most likely regarding training accuracy?
Attempts: 2 left
💡 Hint
Consider how adaptive learning rates affect convergence speed.
✗ Incorrect
Adam usually converges faster and reaches higher accuracy earlier because it adapts learning rates per parameter.
🔧 Debug
Expert · 2:00 remaining
Identifying error in optimizer usage
What error will this PyTorch code raise when trying to train a model with Adam optimizer?
PyTorch
import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(3, 1)
optimizer = optim.Adam(model.parameters(), lr=0.01)
x = torch.randn(5, 3)
y = torch.randn(5, 1)
criterion = nn.MSELoss()

optimizer.zero_grad()
output = model(x)
loss = criterion(output, y)
loss.backward()
optimizer.step()
Attempts: 2 left
💡 Hint
Check the order of zero_grad and backward calls.
✗ Incorrect
As written, this code raises no error: optimizer.zero_grad() is called before loss.backward(), which is the correct order. The classic mistake is calling zero_grad() after backward(), which wipes the freshly computed gradients so that optimizer.step() performs no update. Note this is a silent bug, not a runtime error; conversely, omitting zero_grad() entirely makes gradients accumulate across iterations.
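The semantics can be mimicked without PyTorch by treating the gradient as an explicit buffer and comparing the two call orders (a hand-rolled sketch, not actual autograd):

```python
# Simulate a gradient buffer like PyTorch's param.grad to compare call orders.
lr, g_computed = 0.1, 2.0

# Correct order: zero_grad -> backward (accumulate) -> step
param, grad = 1.0, 0.0
grad = 0.0            # zero_grad()
grad += g_computed    # backward() accumulates into the buffer
param -= lr * grad    # step()
param_correct = param  # 0.8 -- the update was applied

# Buggy order: backward -> zero_grad -> step
param, grad = 1.0, 0.0
grad += g_computed    # backward()
grad = 0.0            # zero_grad() wipes the fresh gradients
param -= lr * grad    # step() is now a no-op
param_buggy = param   # still 1.0 -- silent bug, no exception raised

print(param_correct, param_buggy)
```

The buggy ordering trains nothing while reporting no error, which is why it is a favorite debugging exercise.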