
Optimizers (SGD, Adam, RMSprop) in TensorFlow - Cheat Sheet & Quick Revision

Recall & Review
beginner
What is the main role of an optimizer in machine learning?
An optimizer helps the model learn by adjusting its internal settings (weights) to reduce errors and improve predictions.
beginner
How does the SGD optimizer update model weights?
SGD (Stochastic Gradient Descent) updates weights by moving them a small step opposite to the error direction, using a fixed learning rate.
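The update described above can be sketched in a few lines of plain Python (the function name and numbers are illustrative, not part of TensorFlow's API):

```python
def sgd_step(w, grad, lr=0.1):
    """One SGD update: move the weight a small, fixed-size step
    opposite to the gradient (the error direction)."""
    return w - lr * grad

# For the loss L(w) = w**2 the gradient is 2*w, so one step from w = 1.0:
w = sgd_step(1.0, grad=2 * 1.0)  # -> 0.8, a step toward the minimum at 0
```

Because the learning rate is fixed, every weight takes the same-sized step for a given gradient, which is why tuning that one number matters so much for SGD.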
intermediate
What makes Adam optimizer different from SGD?
Adam combines ideas from momentum and RMSprop, adapting the learning rate for each weight individually, which makes learning faster and more stable.
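A simplified single-weight sketch of Adam's mechanics (illustrative code, not TensorFlow's implementation; in practice you would use tf.keras.optimizers.Adam):

```python
def adam_step(w, grad, state, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update on a single weight (simplified sketch)."""
    m, v, t = state
    t += 1
    m = beta1 * m + (1 - beta1) * grad        # momentum: moving average of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2   # RMSprop idea: average of squared gradients
    m_hat = m / (1 - beta1 ** t)              # bias correction for the zero-initialized averages
    v_hat = v / (1 - beta2 ** t)
    w -= lr * m_hat / (v_hat ** 0.5 + eps)    # step size adapts per weight
    return w, (m, v, t)

# One step from w = 1.0 with gradient 2.0, starting from empty state:
w, state = adam_step(1.0, grad=2.0, state=(0.0, 0.0, 0))
```

Because each weight keeps its own `m` and `v`, weights with large recent gradients take smaller steps and weights with small gradients take relatively larger ones.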
intermediate
Why is RMSprop useful for training neural networks?
RMSprop adjusts the learning rate for each weight based on recent gradients, helping the model learn well even when gradients vary a lot.
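The same idea as a minimal single-weight sketch (illustrative names; TensorFlow provides this as tf.keras.optimizers.RMSprop):

```python
def rmsprop_step(w, grad, avg_sq, lr=0.01, rho=0.9, eps=1e-7):
    """One RMSprop update on a single weight (simplified sketch)."""
    avg_sq = rho * avg_sq + (1 - rho) * grad ** 2  # moving average of squared gradients
    w -= lr * grad / (avg_sq ** 0.5 + eps)         # scale step by recent gradient magnitude
    return w, avg_sq

# One step from w = 1.0 with gradient 2.0 and no history yet:
w, avg_sq = rmsprop_step(1.0, grad=2.0, avg_sq=0.0)
```

Dividing by the root of the running average shrinks steps where gradients have recently been large and enlarges them where gradients have been small, which stabilizes training when gradient magnitudes vary a lot.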
beginner
Which optimizer would you choose for a simple linear model and why?
SGD is often chosen for simple models because it is straightforward and effective when the learning rate is well tuned.
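To illustrate, a plain-Python sketch of SGD fitting the toy linear model y = 2*x (the data and learning rate here are made up for the example):

```python
# Toy data drawn from y = 2*x; we recover the slope w with plain SGD.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
w, lr = 0.0, 0.05
for _ in range(100):
    for x, y in data:
        grad = 2 * (w * x - y) * x  # gradient of the squared error (w*x - y)**2
        w -= lr * grad
# w ends up very close to the true slope 2.0
```

With a well-chosen fixed learning rate, this simple loop converges quickly, which is exactly why plain SGD is often enough for small linear models.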
Which optimizer adapts the learning rate for each parameter individually?
A. SGD
B. Adam
C. Batch Gradient Descent
D. None of the above

What does SGD stand for?
A. Stochastic Gradient Descent
B. Simple Gradient Descent
C. Sequential Gradient Descent
D. Standard Gradient Descent

Which optimizer uses a moving average of squared gradients to adjust learning rates?
A. SGD
B. Adam
C. RMSprop
D. Momentum

Why might Adam be preferred over SGD?
A. It adapts learning rates and converges faster
B. It is simpler to implement
C. It requires no learning rate
D. It uses less memory

Which optimizer is best described as 'simple and effective with a fixed learning rate'?
A. Adagrad
B. RMSprop
C. Adam
D. SGD
Explain how the Adam optimizer works and why it might be better than SGD for some problems.
Hint: Think about how Adam adapts the learning rate for each weight and uses past gradients.
Describe the differences between SGD, RMSprop, and Adam optimizers in simple terms.
Hint: Focus on how each optimizer changes learning rates and uses past information.