PyTorch · ~20 mins

Optimizers (SGD, Adam) in PyTorch - Practice Problems & Coding Challenges

Challenge - 5 Problems
🧠 Conceptual · intermediate
Difference between SGD and Adam optimizers
Which statement correctly describes a key difference between SGD and Adam optimizers?
A. SGD uses adaptive learning rates based on gradient variance, while Adam uses a constant learning rate.
B. SGD uses momentum by default, while Adam does not use momentum at all.
C. Adam requires manual tuning of learning rate decay, but SGD automatically adjusts the learning rate during training.
D. Adam adapts the learning rate for each parameter individually using estimates of the first and second moments, while SGD uses a fixed learning rate for all parameters.
💡 Hint: Think about how each optimizer handles learning rates for different parameters.
Predict Output · intermediate
Output of training loss with SGD vs Adam
Given the following PyTorch training loop snippet, which statement best describes the loss values printed after one update step with the SGD and Adam optimizers, respectively?
PyTorch
import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(2, 1)
criterion = nn.MSELoss()

x = torch.tensor([[1.0, 2.0]])
y = torch.tensor([[1.0]])

# Using SGD optimizer
optimizer_sgd = optim.SGD(model.parameters(), lr=0.1)

# Forward pass
output = model(x)
loss = criterion(output, y)

# Backward and optimize
optimizer_sgd.zero_grad()
loss.backward()
optimizer_sgd.step()

loss_sgd = criterion(model(x), y).item()

# Reset model weights
model = nn.Linear(2, 1)
optimizer_adam = optim.Adam(model.parameters(), lr=0.1)

output = model(x)
loss = criterion(output, y)
optimizer_adam.zero_grad()
loss.backward()
optimizer_adam.step()
loss_adam = criterion(model(x), y).item()

print(round(loss_sgd, 3), round(loss_adam, 3))
A. Both losses decrease, but Adam's loss decreases more after one step.
B. Both losses remain exactly the same after one step.
C. SGD loss decreases, but Adam loss increases after one step.
D. Adam loss decreases, but SGD loss increases after one step.
💡 Hint: Adam usually converges faster due to adaptive learning rates.
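To check the behavior empirically, here is a seeded variant of the snippet (the `one_step` helper is my own illustration, not part of the problem). With an identical initialization for both runs, it prints the loss before and after one update step for each optimizer; note the snippet in the problem does not reset the random seed, so actual printed values will vary from run to run.

```python
import torch
import torch.nn as nn
import torch.optim as optim

x = torch.tensor([[1.0, 2.0]])
y = torch.tensor([[1.0]])
criterion = nn.MSELoss()

def one_step(optimizer_cls):
    torch.manual_seed(0)          # identical init for a fair comparison
    model = nn.Linear(2, 1)
    opt = optimizer_cls(model.parameters(), lr=0.1)
    before = criterion(model(x), y).item()
    opt.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    opt.step()
    after = criterion(model(x), y).item()
    return before, after

sgd_before, sgd_after = one_step(optim.SGD)
adam_before, adam_after = one_step(optim.Adam)
print(f"SGD:  {sgd_before:.4f} -> {sgd_after:.4f}")
print(f"Adam: {adam_before:.4f} -> {adam_after:.4f}")
```

For this single-example linear model one can work the SGD step out by hand: with error e = pred − y, the prediction moves by −lr·2e·(x₁² + x₂² + 1) = −1.2e, so the new error is −0.2e and the SGD loss shrinks to 0.04× its previous value. Adam's first step is roughly ±lr per parameter regardless of gradient magnitude, so its effect depends on how large the initial error happens to be.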
Hyperparameter · advanced
Choosing learning rate for Adam optimizer
Which learning rate value is generally recommended as a good starting point for the Adam optimizer in PyTorch?
A. 0.01
B. 0.001
C. 0.1
D. 1.0
💡 Hint: Adam usually requires smaller learning rates than SGD.
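A quick way to see PyTorch's own recommendation: if you omit `lr` when constructing `optim.Adam`, the library falls back to its built-in default, which you can read back from `optimizer.defaults`.

```python
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(4, 2)
# Omitting lr uses PyTorch's built-in default for Adam
opt = optim.Adam(model.parameters())
print(opt.defaults["lr"])  # 0.001
```

The default of 1e-3 is a common starting point; larger values like 0.1 often destabilize Adam because its per-parameter steps are already normalized by the second-moment estimate.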
Metrics · advanced
Effect of optimizer on training accuracy
You train the same neural network on a classification task using SGD and Adam optimizers with the same learning rate and number of epochs. Which outcome is most likely regarding training accuracy?
A. SGD always achieves higher training accuracy than Adam.
B. Both optimizers achieve exactly the same training accuracy at all times.
C. Adam achieves higher training accuracy faster than SGD.
D. Adam achieves lower training accuracy than SGD due to overfitting.
💡 Hint: Consider how adaptive learning rates affect convergence speed.
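You can run this scenario yourself on synthetic data. The sketch below (the toy dataset and `train` helper are my own illustration, not from the problem) trains the same small classifier for a fixed number of epochs with each optimizer and reports training accuracy; on a given machine the exact numbers will depend on the PyTorch build.

```python
import torch
import torch.nn as nn
import torch.optim as optim

def train(opt_name, epochs=50):
    torch.manual_seed(0)  # same data and same init for both optimizers
    # Toy 2-class problem: label is the sign of the feature sum
    X = torch.randn(200, 2)
    y = (X.sum(dim=1) > 0).long()
    model = nn.Sequential(nn.Linear(2, 8), nn.ReLU(), nn.Linear(8, 2))
    opt_cls = {"sgd": optim.SGD, "adam": optim.Adam}[opt_name]
    opt = opt_cls(model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(X), y)
        loss.backward()
        opt.step()
    # Training accuracy after the fixed epoch budget
    return (model(X).argmax(dim=1) == y).float().mean().item()

acc_sgd = train("sgd")
acc_adam = train("adam")
print(f"SGD: {acc_sgd:.2f}  Adam: {acc_adam:.2f}")
```

With the same lr for both, Adam's per-parameter scaling typically reaches a given training accuracy in fewer epochs, which is the behavior the question is probing.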
🔧 Debug · expert
Identifying error in optimizer usage
What error, if any, will this PyTorch code raise when training a model with the Adam optimizer?
PyTorch
import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(3, 1)
optimizer = optim.Adam(model.parameters(), lr=0.01)

x = torch.randn(5, 3)
y = torch.randn(5, 1)

criterion = nn.MSELoss()

optimizer.zero_grad()
output = model(x)
loss = criterion(output, y)

loss.backward()
optimizer.step()
A. No error; the code runs correctly.
B. RuntimeError: optimizer.zero_grad() called after loss.backward() instead of before.
C. RuntimeError: Trying to backward through the graph a second time without retaining it.
D. TypeError: optimizer.step() missing a required positional argument.
💡 Hint: Check the order of zero_grad and backward calls.
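A sketch you can run to check which scenario actually errors, assuming a recent PyTorch build: the zero_grad-then-backward-then-step ordering from the snippet executes cleanly, whereas the RuntimeError described in option C only appears if you backpropagate through the same graph a second time without `retain_graph=True`.

```python
import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(3, 1)
optimizer = optim.Adam(model.parameters(), lr=0.01)
x = torch.randn(5, 3)
y = torch.randn(5, 1)

# The conventional order: clear old gradients, backpropagate, then step.
optimizer.zero_grad()
loss = nn.MSELoss()(model(x), y)
loss.backward()
optimizer.step()

# Backpropagating through the same graph again is what raises a RuntimeError,
# because the intermediate buffers were freed by the first backward().
err = None
try:
    loss.backward()
except RuntimeError as e:
    err = str(e)
print(err is not None)  # True
```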