PyTorch · ~20 mins

Optimizers (SGD, Adam) in PyTorch - Practice Problems & Coding Challenges

Challenge - 5 Problems
🧠 Conceptual · intermediate
Difference between SGD and Adam optimizers
Which statement correctly describes a key difference between SGD and Adam optimizers?
A. SGD uses adaptive learning rates based on gradient variance, while Adam uses a constant learning rate.
B. SGD uses momentum by default, while Adam does not use momentum at all.
C. Adam requires manual tuning of learning rate decay, but SGD automatically adjusts the learning rate during training.
D. Adam adapts the learning rate for each parameter individually using estimates of the first and second moments, while SGD uses a fixed learning rate for all parameters.
💡 Hint: Think about how each optimizer handles learning rates for different parameters.
Predict Output · intermediate
Output of training loss with SGD vs Adam
Given the following PyTorch training loop snippet, which statement best describes the loss values printed after one update step with the SGD and Adam optimizers, respectively?
PyTorch
import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(2, 1)
criterion = nn.MSELoss()

x = torch.tensor([[1.0, 2.0]])
y = torch.tensor([[1.0]])

# Using SGD optimizer
optimizer_sgd = optim.SGD(model.parameters(), lr=0.1)

# Forward pass
output = model(x)
loss = criterion(output, y)

# Backward and optimize
optimizer_sgd.zero_grad()
loss.backward()
optimizer_sgd.step()

loss_sgd = criterion(model(x), y).item()

# Reset model weights
model = nn.Linear(2, 1)
optimizer_adam = optim.Adam(model.parameters(), lr=0.1)

output = model(x)
loss = criterion(output, y)
optimizer_adam.zero_grad()
loss.backward()
optimizer_adam.step()
loss_adam = criterion(model(x), y).item()

print(round(loss_sgd, 3), round(loss_adam, 3))
A. Both losses decrease, but Adam's loss decreases more after one step.
B. Both losses remain exactly the same after one step.
C. SGD loss decreases, but Adam loss increases after one step.
D. Adam loss decreases, but SGD loss increases after one step.
💡 Hint: Adam usually converges faster due to adaptive learning rates.
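To check the behavior empirically, here is a seeded variant of the snippet (the `one_step` helper is my own illustration, not part of the problem). With an identical initialization for both runs, it prints the loss before and after one update step for each optimizer; note the snippet in the problem does not reset the random seed, so actual printed values will vary from run to run.

```python
import torch
import torch.nn as nn
import torch.optim as optim

x = torch.tensor([[1.0, 2.0]])
y = torch.tensor([[1.0]])
criterion = nn.MSELoss()

def one_step(optimizer_cls):
    torch.manual_seed(0)          # identical init for a fair comparison
    model = nn.Linear(2, 1)
    opt = optimizer_cls(model.parameters(), lr=0.1)
    before = criterion(model(x), y).item()
    opt.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    opt.step()
    after = criterion(model(x), y).item()
    return before, after

sgd_before, sgd_after = one_step(optim.SGD)
adam_before, adam_after = one_step(optim.Adam)
print(f"SGD:  {sgd_before:.4f} -> {sgd_after:.4f}")
print(f"Adam: {adam_before:.4f} -> {adam_after:.4f}")
```

For this single-example linear model one can work the SGD step out by hand: with error e = pred − y, the prediction moves by −lr·2e·(x₁² + x₂² + 1) = −1.2e, so the new error is −0.2e and the SGD loss shrinks to 0.04× its previous value. Adam's first step is roughly ±lr per parameter regardless of gradient magnitude, so its effect depends on how large the initial error happens to be.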
Hyperparameter · advanced
Choosing learning rate for Adam optimizer
Which learning rate value is generally recommended as a good starting point for the Adam optimizer in PyTorch?
A. 0.01
B. 0.001
C. 0.1
D. 1.0
💡 Hint: Adam usually requires smaller learning rates than SGD.
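A quick way to see PyTorch's own recommendation: if you omit `lr` when constructing `optim.Adam`, the library falls back to its built-in default, which you can read back from `optimizer.defaults`.

```python
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(4, 2)
# Omitting lr uses PyTorch's built-in default for Adam
opt = optim.Adam(model.parameters())
print(opt.defaults["lr"])  # 0.001
```

The default of 1e-3 is a common starting point; larger values like 0.1 often destabilize Adam because its per-parameter steps are already normalized by the second-moment estimate.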
Metrics · advanced
Effect of optimizer on training accuracy
You train the same neural network on a classification task using SGD and Adam optimizers with the same learning rate and number of epochs. Which outcome is most likely regarding training accuracy?
A. SGD always achieves higher training accuracy than Adam.
B. Both optimizers achieve exactly the same training accuracy at all times.
C. Adam achieves higher training accuracy faster than SGD.
D. Adam achieves lower training accuracy than SGD due to overfitting.
💡 Hint: Consider how adaptive learning rates affect convergence speed.
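You can run this scenario yourself on synthetic data. The sketch below (the toy dataset and `train` helper are my own illustration, not from the problem) trains the same small classifier for a fixed number of epochs with each optimizer and reports training accuracy; on a given machine the exact numbers will depend on the PyTorch build.

```python
import torch
import torch.nn as nn
import torch.optim as optim

def train(opt_name, epochs=50):
    torch.manual_seed(0)  # same data and same init for both optimizers
    # Toy 2-class problem: label is the sign of the feature sum
    X = torch.randn(200, 2)
    y = (X.sum(dim=1) > 0).long()
    model = nn.Sequential(nn.Linear(2, 8), nn.ReLU(), nn.Linear(8, 2))
    opt_cls = {"sgd": optim.SGD, "adam": optim.Adam}[opt_name]
    opt = opt_cls(model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(X), y)
        loss.backward()
        opt.step()
    # Training accuracy after the fixed epoch budget
    return (model(X).argmax(dim=1) == y).float().mean().item()

acc_sgd = train("sgd")
acc_adam = train("adam")
print(f"SGD: {acc_sgd:.2f}  Adam: {acc_adam:.2f}")
```

With the same lr for both, Adam's per-parameter scaling typically reaches a given training accuracy in fewer epochs, which is the behavior the question is probing.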
🔧 Debug · expert
Identifying error in optimizer usage
What error, if any, will this PyTorch code raise when training a model with the Adam optimizer?
PyTorch
import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(3, 1)
optimizer = optim.Adam(model.parameters(), lr=0.01)

x = torch.randn(5, 3)
y = torch.randn(5, 1)

criterion = nn.MSELoss()

optimizer.zero_grad()
output = model(x)
loss = criterion(output, y)

loss.backward()
optimizer.step()
A. No error; the code runs correctly.
B. RuntimeError: optimizer.zero_grad() called after loss.backward() instead of before.
C. RuntimeError: Trying to backward through the graph a second time without retaining it.
D. TypeError: optimizer.step() missing a required positional argument.
💡 Hint: Check the order of zero_grad and backward calls.
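A sketch you can run to check which scenario actually errors, assuming a recent PyTorch build: the zero_grad-then-backward-then-step ordering from the snippet executes cleanly, whereas the RuntimeError described in option C only appears if you backpropagate through the same graph a second time without `retain_graph=True`.

```python
import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(3, 1)
optimizer = optim.Adam(model.parameters(), lr=0.01)
x = torch.randn(5, 3)
y = torch.randn(5, 1)

# The conventional order: clear old gradients, backpropagate, then step.
optimizer.zero_grad()
loss = nn.MSELoss()(model(x), y)
loss.backward()
optimizer.step()

# Backpropagating through the same graph again is what raises a RuntimeError,
# because the intermediate buffers were freed by the first backward().
err = None
try:
    loss.backward()
except RuntimeError as e:
    err = str(e)
print(err is not None)  # True
```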