Recall & Review
beginner
What is the main role of an optimizer in training a neural network?
An optimizer updates the model's weights to reduce the difference between predictions and true values, helping the model learn from data.
beginner
What does SGD stand for and how does it update weights?
SGD stands for Stochastic Gradient Descent. It updates weights by moving them a small step opposite to the gradient of the loss, using a fixed learning rate.
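The update rule on this card can be sketched in a few lines of Python. The quadratic loss L(w) = (w − 3)² and the learning rate below are illustrative assumptions, not part of the card:

```python
def sgd_step(w, grad, lr=0.1):
    """One SGD step: move the weight a small step opposite to the gradient."""
    return w - lr * grad

# Toy example: minimize L(w) = (w - 3)**2, whose gradient is 2*(w - 3).
w = 0.0
for _ in range(100):
    grad = 2 * (w - 3)          # dL/dw, computed by hand for this toy loss
    w = sgd_step(w, grad, lr=0.1)

print(round(w, 4))  # converges toward the minimum at w = 3
```

Each step is the same fixed fraction (the learning rate) of the current gradient; SGD has no memory of past gradients.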
intermediate
How does Adam optimizer differ from SGD?
Adam combines ideas from momentum and adaptive learning rates. It adjusts learning rates for each weight individually using estimates of first and second moments of gradients.
intermediate
What are the key hyperparameters of Adam optimizer?
The key hyperparameters are the learning rate, beta1 (decay rate for the first-moment/momentum estimate), beta2 (decay rate for the second-moment/RMS estimate), and epsilon (a small constant that avoids division by zero).
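A minimal sketch of one Adam step for a single scalar weight, showing where each of these hyperparameters enters. The toy loss L(w) = (w − 3)² and all numeric choices are illustrative assumptions, not any particular library's implementation:

```python
def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single scalar weight (t counts steps from 1)."""
    m = beta1 * m + (1 - beta1) * grad       # first moment (momentum)
    v = beta2 * v + (1 - beta2) * grad**2    # second moment (RMS)
    m_hat = m / (1 - beta1**t)               # bias correction for early steps
    v_hat = v / (1 - beta2**t)
    w = w - lr * m_hat / (v_hat**0.5 + eps)  # eps prevents division by zero
    return w, m, v

# Toy example: minimize L(w) = (w - 3)**2.
w, m, v = 0.0, 0.0, 0.0
for t in range(1, 5001):
    grad = 2 * (w - 3)
    w, m, v = adam_step(w, grad, m, v, t, lr=0.01)
# w ends close to the minimum at 3
```

Note how the step size is the gradient's moving average divided by the square root of its moving second moment, so each weight effectively gets its own learning rate.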
intermediate
Why might you choose Adam over SGD for training a model?
Adam often converges faster and needs less learning-rate tuning, because it adapts the learning rate per parameter and uses momentum, which makes it well suited to complex or noisy problems.
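The contrast on this card can be illustrated with a toy experiment (a sketch, not a benchmark): both optimizers minimizing a noisy quadratic. The loss, noise level, learning rates, and step counts are all illustrative assumptions:

```python
import random

random.seed(0)

def noisy_grad(w):
    """Gradient of L(w) = (w - 3)**2 plus Gaussian noise (simulating minibatch noise)."""
    return 2 * (w - 3) + random.gauss(0, 1)

# Plain SGD: fixed learning rate, no memory of past gradients.
w_sgd = 0.0
for _ in range(2000):
    w_sgd -= 0.01 * noisy_grad(w_sgd)

# Adam: momentum (m) plus per-parameter scaling by the RMS of gradients (v).
w_adam, m, v = 0.0, 0.0, 0.0
for t in range(1, 2001):
    g = noisy_grad(w_adam)
    m = 0.9 * m + 0.1 * g
    v = 0.999 * v + 0.001 * g * g
    m_hat = m / (1 - 0.9 ** t)
    v_hat = v / (1 - 0.999 ** t)
    w_adam -= 0.01 * m_hat / (v_hat ** 0.5 + 1e-8)

# Both end near the minimum at w = 3; Adam's averaging of past
# gradients smooths out the noise in each individual gradient.
```

On a one-dimensional quadratic both methods work; Adam's advantages show up most on high-dimensional problems where gradients have very different scales per parameter.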
What does SGD use to update model weights?
SGD updates weights by moving them a small step opposite to the gradient of the loss, scaled by a fixed learning rate.
Which optimizer adapts learning rates for each parameter individually?
Adam adapts learning rates per parameter using estimates of first and second moments of gradients.
What is the purpose of the beta1 parameter in Adam optimizer?
Beta1 controls the decay rate of the moving average of past gradients, acting like momentum.
Which optimizer is generally better for noisy or complex problems?
Adam adapts learning rates and uses momentum, making it better for noisy or complex problems.
What does the epsilon parameter in Adam do?
Epsilon is a small constant added to the denominator to prevent division by zero when the second-moment estimate is close to zero.
Explain how SGD updates model weights during training.
Think about moving weights opposite to the slope of the loss.
Describe the advantages of using Adam optimizer compared to SGD.
Consider how Adam adjusts learning rates and remembers past gradients.