Recall & Review
beginner
What is the main role of an optimizer in training a neural network?
An optimizer updates the model's weights to reduce the difference between predictions and true values, helping the model learn from data.
beginner
What does SGD stand for and how does it update weights?
SGD stands for Stochastic Gradient Descent. It updates weights by moving them a small step opposite to the gradient of the loss, using a fixed learning rate.
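The update rule on this card can be sketched in a few lines of Python. The quadratic loss L(w) = (w − 3)² and the learning rate below are illustrative assumptions, not part of the card:

```python
def sgd_step(w, grad, lr=0.1):
    """One SGD step: move the weight a small step opposite to the gradient."""
    return w - lr * grad

# Toy example: minimize L(w) = (w - 3)**2, whose gradient is 2*(w - 3).
w = 0.0
for _ in range(100):
    grad = 2 * (w - 3)          # dL/dw, computed by hand for this toy loss
    w = sgd_step(w, grad, lr=0.1)

print(round(w, 4))  # converges toward the minimum at w = 3
```

Each step is the same fixed fraction (the learning rate) of the current gradient; SGD has no memory of past gradients.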
intermediate
How does Adam optimizer differ from SGD?
Adam combines ideas from momentum and adaptive learning rates. It adjusts learning rates for each weight individually using estimates of first and second moments of gradients.
intermediate
What are the key hyperparameters of Adam optimizer?
The key hyperparameters are the learning rate, beta1 (decay rate for the first-moment/momentum estimate), beta2 (decay rate for the second-moment/RMS estimate), and epsilon (a small constant that avoids division by zero).
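A minimal sketch of one Adam step for a single scalar weight, showing where each of these hyperparameters enters. The toy loss L(w) = (w − 3)² and all numeric choices are illustrative assumptions, not any particular library's implementation:

```python
def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single scalar weight (t counts steps from 1)."""
    m = beta1 * m + (1 - beta1) * grad       # first moment (momentum)
    v = beta2 * v + (1 - beta2) * grad**2    # second moment (RMS)
    m_hat = m / (1 - beta1**t)               # bias correction for early steps
    v_hat = v / (1 - beta2**t)
    w = w - lr * m_hat / (v_hat**0.5 + eps)  # eps prevents division by zero
    return w, m, v

# Toy example: minimize L(w) = (w - 3)**2.
w, m, v = 0.0, 0.0, 0.0
for t in range(1, 5001):
    grad = 2 * (w - 3)
    w, m, v = adam_step(w, grad, m, v, t, lr=0.01)
# w ends close to the minimum at 3
```

Note how the step size is the gradient's moving average divided by the square root of its moving second moment, so each weight effectively gets its own learning rate.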
intermediate
Why might you choose Adam over SGD for training a model?
Adam often converges faster and needs less learning-rate tuning, because it adapts the learning rate per parameter and uses momentum, which makes it well suited to complex or noisy problems.
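The contrast on this card can be illustrated with a toy experiment (a sketch, not a benchmark): both optimizers minimizing a noisy quadratic. The loss, noise level, learning rates, and step counts are all illustrative assumptions:

```python
import random

random.seed(0)

def noisy_grad(w):
    """Gradient of L(w) = (w - 3)**2 plus Gaussian noise (simulating minibatch noise)."""
    return 2 * (w - 3) + random.gauss(0, 1)

# Plain SGD: fixed learning rate, no memory of past gradients.
w_sgd = 0.0
for _ in range(2000):
    w_sgd -= 0.01 * noisy_grad(w_sgd)

# Adam: momentum (m) plus per-parameter scaling by the RMS of gradients (v).
w_adam, m, v = 0.0, 0.0, 0.0
for t in range(1, 2001):
    g = noisy_grad(w_adam)
    m = 0.9 * m + 0.1 * g
    v = 0.999 * v + 0.001 * g * g
    m_hat = m / (1 - 0.9 ** t)
    v_hat = v / (1 - 0.999 ** t)
    w_adam -= 0.01 * m_hat / (v_hat ** 0.5 + 1e-8)

# Both end near the minimum at w = 3; Adam's averaging of past
# gradients smooths out the noise in each individual gradient.
```

On a one-dimensional quadratic both methods work; Adam's advantages show up most on high-dimensional problems where gradients have very different scales per parameter.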
What does SGD use to update model weights?
SGD updates weights by moving them a small step opposite to the gradient of the loss, scaled by a fixed learning rate.
Which optimizer adapts learning rates for each parameter individually?
Adam adapts learning rates per parameter using estimates of first and second moments of gradients.
What is the purpose of the beta1 parameter in Adam optimizer?
Beta1 controls the decay rate of the moving average of past gradients, acting like momentum.
Which optimizer is generally better for noisy or complex problems?
Adam adapts learning rates and uses momentum, making it better for noisy or complex problems.
What does the epsilon parameter in Adam do?
Epsilon is a small constant added to the denominator to prevent division by zero when the second-moment estimate is close to zero.
Explain how SGD updates model weights during training.
Think about moving weights opposite to the slope of the loss.
Describe the advantages of using Adam optimizer compared to SGD.
Consider how Adam adjusts learning rates and remembers past gradients.