
Optimizers (SGD, Adam) in PyTorch - Cheat Sheet & Quick Revision

Recall & Review
beginner
What is the main role of an optimizer in training a neural network?
An optimizer updates the model's weights to reduce the difference between predictions and true values, helping the model learn from data.
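The optimizer's role shows up concretely in the standard PyTorch training loop: zero the old gradients, compute the loss, backpropagate, then let the optimizer step the weights. A minimal sketch, assuming a made-up toy regression problem (the model, data, and target y = 2x are only for illustration):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(1, 1)                      # tiny model: y = w*x + b
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

x = torch.tensor([[1.0], [2.0], [3.0]])
y = 2 * x                                    # toy target: y = 2x

losses = []
for _ in range(50):
    opt.zero_grad()                          # clear gradients from the last step
    loss = loss_fn(model(x), y)              # difference between predictions and targets
    loss.backward()                          # compute gradients of loss w.r.t. weights
    opt.step()                               # optimizer updates the weights
    losses.append(loss.item())
# losses should decrease over the 50 steps as the model learns
```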
beginner
What does SGD stand for and how does it update weights?
SGD stands for Stochastic Gradient Descent. It updates weights by moving them a small step opposite to the gradient of the loss, using a fixed learning rate.
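The update rule itself, w ← w − lr · ∇L(w), can be written out in plain Python. The one-parameter loss L(w) = (w − 3)² below is a hypothetical toy example; this mirrors what torch.optim.SGD (without momentum) does to each parameter:

```python
def sgd_minimize(w, lr=0.1, steps=100):
    """Minimize the toy loss L(w) = (w - 3)^2 with plain SGD."""
    for _ in range(steps):
        grad = 2 * (w - 3)   # dL/dw: the gradient of the loss
        w = w - lr * grad    # small step opposite the gradient, fixed learning rate
    return w

sgd_minimize(0.0)  # converges toward the minimum at w = 3
```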
intermediate
How does Adam optimizer differ from SGD?
Adam combines ideas from momentum and adaptive learning rates. It adjusts learning rates for each weight individually using estimates of first and second moments of gradients.
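The first- and second-moment estimates can be sketched in plain Python for a single parameter. The toy loss L(w) = (w − 3)² is assumed only for illustration; the moment updates and bias correction follow the standard Adam formulation:

```python
import math

def adam_minimize(w, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=200):
    """Minimize the toy loss L(w) = (w - 3)^2 with a hand-rolled Adam step."""
    m, v = 0.0, 0.0
    for t in range(1, steps + 1):
        g = 2 * (w - 3)                       # gradient of the loss
        m = beta1 * m + (1 - beta1) * g       # first moment: momentum-like running mean
        v = beta2 * v + (1 - beta2) * g * g   # second moment: running mean of squared grads
        m_hat = m / (1 - beta1 ** t)          # bias correction for the early steps
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)  # per-parameter adaptive step
    return w

adam_minimize(0.0)  # approaches the minimum at w = 3
```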
intermediate
What are the key hyperparameters of Adam optimizer?
The key hyperparameters are learning rate, beta1 (momentum decay), beta2 (RMS decay), and epsilon (small number to avoid division by zero).
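In PyTorch these hyperparameters are passed when constructing the optimizer; the values below are Adam's usual defaults, written out explicitly, and the small model is a hypothetical placeholder:

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)  # hypothetical small model
opt = torch.optim.Adam(
    model.parameters(),
    lr=1e-3,              # learning rate: global step size
    betas=(0.9, 0.999),   # beta1 (momentum decay), beta2 (RMS decay)
    eps=1e-8,             # small number added to the denominator to avoid division by zero
)
```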
intermediate
Why might you choose Adam over SGD for training a model?
Adam often converges faster and requires less tuning of learning rate because it adapts learning rates per parameter and uses momentum, making it good for complex or noisy problems.
What does SGD use to update model weights?
A. Gradient of loss and fixed learning rate
B. Random weight changes
C. Second moment estimates
D. Adaptive learning rates per weight
Which optimizer adapts learning rates for each parameter individually?
A. SGD
B. Gradient Descent
C. Adam
D. RMSProp
What is the purpose of the beta1 parameter in Adam optimizer?
A. Controls learning rate
B. Controls momentum decay
C. Prevents division by zero
D. Controls batch size
Which optimizer is generally better for noisy or complex problems?
A. None
B. SGD
C. Vanilla Gradient Descent
D. Adam
What does the epsilon parameter in Adam do?
A. Avoids division by zero
B. Controls momentum
C. Sets learning rate
D. Adjusts batch size
Explain how SGD updates model weights during training.
Think about moving weights opposite to the slope of the loss.
Describe the advantages of using Adam optimizer compared to SGD.
Consider how Adam adjusts learning rates and remembers past gradients.